[jira] [Created] (FLINK-22320) Add documentation for newly introduced ALTER TABLE statements

2021-04-16 Thread Jark Wu (Jira)
Jark Wu created FLINK-22320:
---

 Summary: Add documentation for newly introduced ALTER TABLE 
statements
 Key: FLINK-22320
 URL: https://issues.apache.org/jira/browse/FLINK-22320
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / API
Reporter: Jark Wu






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-22316) Support MODIFY column/constraint/watermark for ALTER TABLE statement

2021-04-16 Thread Jark Wu (Jira)
Jark Wu created FLINK-22316:
---

 Summary: Support MODIFY column/constraint/watermark for ALTER 
TABLE statement
 Key: FLINK-22316
 URL: https://issues.apache.org/jira/browse/FLINK-22316
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / API
Reporter: Jark Wu






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-22315) Support ADD column/constraint/watermark for ALTER TABLE statement

2021-04-16 Thread Jark Wu (Jira)
Jark Wu created FLINK-22315:
---

 Summary: Support ADD column/constraint/watermark for ALTER TABLE 
statement
 Key: FLINK-22315
 URL: https://issues.apache.org/jira/browse/FLINK-22315
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / API
Reporter: Jark Wu






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-22314) AggRecordsCombiner should combine buffered records first instead of accumulate on state directly

2021-04-16 Thread Jark Wu (Jira)
Jark Wu created FLINK-22314:
---

 Summary: AggRecordsCombiner should combine buffered records first 
instead of accumulate on state directly
 Key: FLINK-22314
 URL: https://issues.apache.org/jira/browse/FLINK-22314
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Runtime
Reporter: Jark Wu


In Window TVF Aggregation, currently, the {{AggRecordsCombiner}} accumulates 
buffered records on state directly. This is not good for performance. We can 
accumulate records in memory first, and then merge the accumulator into state, 
if the aggregates support the {{merge()}} method. This can reduce a lot of 
state access when using {{COUNT DISTINCT}}.
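
For context, a minimal sketch of a query shape that should benefit from this 
optimization (the table and column names are illustrative, not from the issue):

{code:sql}
-- Window TVF aggregation with COUNT DISTINCT: without in-memory combining,
-- every buffered record triggers a read/write on the distinct-value state.
SELECT window_start, window_end, COUNT(DISTINCT user_id) AS uv
FROM TABLE(
    TUMBLE(TABLE clicks, DESCRIPTOR(ts), INTERVAL '10' MINUTE))
GROUP BY window_start, window_end;
{code}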



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-22302) Restructure SQL "Queries" documentation into one page per operation

2021-04-16 Thread Jark Wu (Jira)
Jark Wu created FLINK-22302:
---

 Summary: Restructure SQL "Queries" documentation into one page per 
operation
 Key: FLINK-22302
 URL: https://issues.apache.org/jira/browse/FLINK-22302
 Project: Flink
  Issue Type: Task
  Components: Documentation, Table SQL / API
Reporter: Jark Wu
Assignee: Jark Wu
 Fix For: 1.13.0


Currently, the "Queries" page has been very large and it's getting longger when 
we supporting more features. We already have separate pages for Joins and CEP. 
I would propose to separate "Queries" into one page per operation. This way we 
can easily add more detailed informations for the operations and more examples.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-22261) Python StreamingModeDataStreamTests failed on Azure

2021-04-13 Thread Jark Wu (Jira)
Jark Wu created FLINK-22261:
---

 Summary: Python StreamingModeDataStreamTests failed on Azure
 Key: FLINK-22261
 URL: https://issues.apache.org/jira/browse/FLINK-22261
 Project: Flink
  Issue Type: Bug
  Components: API / Python
Affects Versions: 1.13.0
Reporter: Jark Wu
 Fix For: 1.13.0


https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=16443=logs=9cada3cb-c1d3-5621-16da-0f718fb86602=8d78fe4f-d658-5c70-12f8-4921589024c3

{code}
2021-04-13T11:49:32.1640428Z === FAILURES ===
2021-04-13T11:49:32.1641478Z _ StreamingModeDataStreamTests.test_keyed_process_function_with_state __
2021-04-13T11:49:32.1641744Z 
2021-04-13T11:49:32.1642074Z self = 
2021-04-13T11:49:32.1642359Z 
2021-04-13T11:49:32.1642606Z     def test_keyed_process_function_with_state(self):
2021-04-13T11:49:32.1644412Z         self.env.set_parallelism(1)
2021-04-13T11:49:32.1644941Z         self.env.get_config().set_auto_watermark_interval(2000)
2021-04-13T11:49:32.1645447Z         self.env.set_stream_time_characteristic(TimeCharacteristic.EventTime)
2021-04-13T11:49:32.1647182Z         data_stream = self.env.from_collection([(1, 'hi', '1603708211000'),
2021-04-13T11:49:32.1648276Z                                                  (2, 'hello', '1603708224000'),
2021-04-13T11:49:32.1661775Z                                                  (3, 'hi', '1603708226000'),
2021-04-13T11:49:32.1663379Z                                                  (4, 'hello', '1603708289000'),
2021-04-13T11:49:32.1665197Z                                                  (5, 'hi', '1603708291000'),
2021-04-13T11:49:32.1666200Z                                                  (6, 'hello', '1603708293000')],
2021-04-13T11:49:32.1666827Z                                                 type_info=Types.ROW([Types.INT(), Types.STRING(),
2021-04-13T11:49:32.1667449Z                                                                      Types.STRING()]))
2021-04-13T11:49:32.1667830Z 
2021-04-13T11:49:32.1668351Z         class MyTimestampAssigner(TimestampAssigner):
2021-04-13T11:49:32.1668755Z 
2021-04-13T11:49:32.1669783Z             def extract_timestamp(self, value, record_timestamp) -> int:
2021-04-13T11:49:32.1670386Z                 return int(value[2])
2021-04-13T11:49:32.1670672Z 
2021-04-13T11:49:32.1671063Z         class MyProcessFunction(KeyedProcessFunction):
2021-04-13T11:49:32.1671434Z 
2021-04-13T11:49:32.1671727Z             def __init__(self):
2021-04-13T11:49:32.1672090Z                 self.value_state = None
2021-04-13T11:49:32.1685812Z                 self.list_state = None
2021-04-13T11:49:32.1686276Z                 self.map_state = None
2021-04-13T11:49:32.1686609Z 
2021-04-13T11:49:32.1687039Z             def open(self, runtime_context: RuntimeContext):
2021-04-13T11:49:32.1688350Z                 value_state_descriptor = ValueStateDescriptor('value_state', Types.INT())
2021-04-13T11:49:32.1688953Z                 self.value_state = runtime_context.get_state(value_state_descriptor)
2021-04-13T11:49:32.1689892Z                 list_state_descriptor = ListStateDescriptor('list_state', Types.INT())
2021-04-13T11:49:32.1690492Z                 self.list_state = runtime_context.get_list_state(list_state_descriptor)
2021-04-13T11:49:32.1691407Z                 map_state_descriptor = MapStateDescriptor('map_state', Types.INT(), Types.STRING())
2021-04-13T11:49:32.1692052Z                 self.map_state = runtime_context.get_map_state(map_state_descriptor)
2021-04-13T11:49:32.1692481Z 
2021-04-13T11:49:32.1693134Z             def process_element(self, value, ctx):
2021-04-13T11:49:32.1693632Z                 current_value = self.value_state.value()
2021-04-13T11:49:32.1694106Z                 self.value_state.update(value[0])
2021-04-13T11:49:32.1694573Z                 current_list = [_ for _ in self.list_state.get()]
2021-04-13T11:49:32.1695051Z                 self.list_state.add(value[0])
2021-04-13T11:49:32.1695445Z                 map_entries_string = []
2021-04-13T11:49:32.1695902Z                 for k, v in self.map_state.items():
2021-04-13T11:49:32.1696822Z                     map_entries_string.append(str(k) + ': ' + str(v))
2021-04-13T11:49:32.1697700Z                 map_entries_string = '{' + ', '.join(map_entries_string) + '}'
2021-04-13T11:49:32.1698483Z                 self.map_state.put(value[0], value[1])
2021-04-13T11:49:32.1698941Z                 current_key = ctx.get_current_key()
2021-04-13T11:49:32.1699840Z                 yield "current key: {}, current value state: {}, current list state: {}, " \
2021-04-13T11:49:32.1700593Z                       "current map state: {}, current value: {}".format(str(current_key),
2021-

[jira] [Created] (FLINK-22161) 'Run Mesos WordCount test' failed on Azure

2021-04-08 Thread Jark Wu (Jira)
Jark Wu created FLINK-22161:
---

 Summary: 'Run Mesos WordCount test' failed on Azure
 Key: FLINK-22161
 URL: https://issues.apache.org/jira/browse/FLINK-22161
 Project: Flink
  Issue Type: Bug
  Components: Deployment / Mesos
Reporter: Jark Wu
 Fix For: 1.13.0


https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=16210=logs=c88eea3b-64a0-564d-0031-9fdcd7b8abee=ff888d9b-cd34-53cc-d90f-3e446d355529

{code}
Apr 08 14:24:10 Step 1/2 : FROM ubuntu:xenial
Head https://registry-1.docker.io/v2/library/ubuntu/manifests/xenial: received unexpected HTTP status: 500 Internal Server Error
Apr 08 14:24:12 Command: build_image failed. Retrying...
Apr 08 14:24:12 Building Mesos Docker container
Apr 08 14:24:12 Sending build context to Docker daemon  6.144kB

Apr 08 14:24:12 Step 1/2 : FROM ubuntu:xenial
Head https://registry-1.docker.io/v2/library/ubuntu/manifests/xenial: received unexpected HTTP status: 500 Internal Server Error
Apr 08 14:24:13 Command: build_image failed. Retrying...
Apr 08 14:24:13 Command: build_image failed 5 times.

Apr 08 14:24:13 ERROR: Could not build mesos image. Aborting...
Error: No such container: mesos-master
The MVN_REPO variable is not set. Defaulting to a blank string.
Removing network docker-mesos-cluster-network
Network docker-mesos-cluster-network not found.
Apr 08 14:24:14 [FAIL] Test script contains errors.
Apr 08 14:24:14 Checking for errors...
Apr 08 14:24:14 No errors in log files.
Apr 08 14:24:14 Checking for exceptions...
Apr 08 14:24:14 No exceptions in log files.
Apr 08 14:24:14 Checking for non-empty .out files...
grep: /home/vsts/work/1/s/flink-dist/target/flink-1.13-SNAPSHOT-bin/flink-1.13-SNAPSHOT/log/*.out: No such file or directory
Apr 08 14:24:14 No non-empty .out files.
Apr 08 14:24:14 
Apr 08 14:24:14 [FAIL] 'Run Mesos WordCount test' failed after 0 minutes and 10 seconds! Test exited with exit code 1
Apr 08 14:24:14 
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-22160) Test Window TVF based aggregation and TopN

2021-04-08 Thread Jark Wu (Jira)
Jark Wu created FLINK-22160:
---

 Summary: Test Window TVF based aggregation and TopN
 Key: FLINK-22160
 URL: https://issues.apache.org/jira/browse/FLINK-22160
 Project: Flink
  Issue Type: Test
  Components: Table SQL / API
Reporter: Jark Wu
 Fix For: 1.13.0


In FLINK-19604 
([FLIP-145|https://cwiki.apache.org/confluence/display/FLINK/FLIP-145%3A+Support+SQL+windowing+table-valued+function]),
 we introduced a new syntax to express Window Aggregate and Window TopN. For 
Window Aggregate, we have also introduced a new window kind: cumulate windows. 

The scope of this task is to make sure:

1. The old window aggregate syntax ({{GROUP BY TUMBLE(...)}}) can be rewritten 
using the new syntax and produces the same results (see the sketch after this 
list). Note that session windows are not supported yet in the new syntax.
2. Verify that the new CUMULATE window works as expected.
3. Verify that the new Window TopN works as expected.
4. Failure, recovery, and rescale cases: results are still correct.
5. Window emitting: a window should fire once the watermark advances past the 
window end (we can manually generate source data with monotonically and slowly 
increasing timestamps).
6. The feature is well-documented.
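
A minimal sketch of the rewrite in item 1, assuming a table {{clicks}} with an 
event-time attribute {{ts}} (the names are illustrative):

{code:sql}
-- Old syntax: the TUMBLE group window function.
SELECT TUMBLE_START(ts, INTERVAL '1' HOUR) AS window_start, COUNT(*)
FROM clicks
GROUP BY TUMBLE(ts, INTERVAL '1' HOUR);

-- New FLIP-145 syntax: the TUMBLE table-valued function.
SELECT window_start, window_end, COUNT(*)
FROM TABLE(
    TUMBLE(TABLE clicks, DESCRIPTOR(ts), INTERVAL '1' HOUR))
GROUP BY window_start, window_end;
{code}

Both queries should produce the same per-hour counts.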


Note: the documentation for this feature is still in progress (FLINK-22159); 
for testing the feature, we can use the FLIP documentation as a guide for 
now. 






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-22159) Add documentation for the new window TVF based operations

2021-04-08 Thread Jark Wu (Jira)
Jark Wu created FLINK-22159:
---

 Summary: Add documentation for the new window TVF based operations
 Key: FLINK-22159
 URL: https://issues.apache.org/jira/browse/FLINK-22159
 Project: Flink
  Issue Type: Sub-task
Reporter: Jark Wu
Assignee: Jark Wu
 Fix For: 1.13.0


In the 1.13 release, we have supported the window TVF based aggregation and 
TopN from FLIP-145. We should add documentation for them. We may also need to 
restructure the "Queries" page.




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-22080) Refactor SqlClient for better testing

2021-03-31 Thread Jark Wu (Jira)
Jark Wu created FLINK-22080:
---

 Summary: Refactor SqlClient for better testing
 Key: FLINK-22080
 URL: https://issues.apache.org/jira/browse/FLINK-22080
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / Client
Reporter: Jark Wu
 Fix For: 1.13.0


Currently, we use a JUnit rule, {{TerminalStreamsResource}}, to replace the 
{{System.in}} and {{System.out}} streams in order to capture the output of 
SqlClient. However, this is not safe, especially when used by multiple tests. 

We should refactor {{SqlClient}} to expose a new testing-purpose {{main}} 
method that accepts a custom {{InputStream}} and {{OutputStream}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Apply external 3 days for FLINK-20387 after feature freeze

2021-03-31 Thread Jark Wu
+1 for more days for this FLIP.

Changing the data type is massive work, and supporting windowing on
timestamp_ltz is also a very complex job when considering DST. It would be
great if we could have more days to finalize this work.

Best,
Jark

On Thu, 1 Apr 2021 at 03:46, Timo Walther  wrote:

> Hi everyone,
>
> I support Leonard's request. It was foreseeable that the changes of
> FLIP-162 would be massive and would take some time. By looking at PRs such as
>
> https://github.com/apache/flink/pull/15280
>
> I would also vote for giving a bit more time for proper reviews and
> finalizing this story for consistency.
>
> Regards,
> Timo
>
>
>
> On 31.03.21 17:56, Leonard Xu wrote:
> > Hi,  Dawid & Guowei
> >
> > Sorry to apply the extension, I want to apply 3 days for ticket
> [FLINK-20387] Support column of TIMESTAMP WITH LOCAL ZONE TIME type as
> rowtime[1],  it is the last ticket of FLIP-162[2] which aims to solve
> various time zone issues and offer a consistent time function behavior.  We
> experienced a long discussion for this FLIP and I took some efforts to
> resolve the  tricky daylight saving time problem. In fact, I have been
> working hard to promote this FLIP recently.
> >
> > But I really hope that this feature can join 1.13. The motivation is not
> because I want to complete this FLIP personally, but from the user's
> perspective that we can provide them a consistent time behavior and
> resolves the time zone issues naturally, we believe this will greatly
> improve the user experience, thus I’m asking to apply 3 days for only this
> ticket FLINK-20387.
> >
> > Best,
> > Leonard
> >
> > [1] https://issues.apache.org/jira/browse/FLINK-20387
> > [2]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-162%3A+Consistent+Flink+SQL+time+function+behavior
> >
> >
> >> On 31 Mar 2021, at 17:54, Guowei Ma wrote:
> >>
> >> Hi, community:
> >>
> >> Friendly reminder that today (3.31) is the last day of feature
> development. Under normal circumstances, you will not be able to submit new
> features starting tomorrow (4.1). Tomorrow we will create 1.13.0-rc0 for
> testing; you are welcome to help test together.
> >> After the test is relatively stable, we will cut the release-1.13
> branch.
> >>
> >> Best,
> >> Dawid & Guowei
> >>
> >>
> >> On Mon, Mar 29, 2021 at 5:17 PM Till Rohrmann wrote:
> >> +1 for the 31st of March for the feature freeze.
> >>
> >> Cheers,
> >> Till
> >>
> >> On Mon, Mar 29, 2021 at 10:12 AM Robert Metzger wrote:
> >>
> >>> +1 for March 31st for the feature freeze.
> >>>
> >>>
> >>>
> >>> On Fri, Mar 26, 2021 at 3:39 PM Dawid Wysakowicz
> >>> wrote:
> >>>
>  Thank you Thomas! I'll definitely check the issue you linked.
> 
>  Best,
> 
>  Dawid
> 
>  On 23/03/2021 20:35, Thomas Weise wrote:
> > Hi Dawid,
> >
> > Thanks for the heads up.
> >
> > Regarding the "Rebase and merge" button. I find that merge option
> >>> useful,
> > especially for small simple changes and for backports. The following
>  should
> > help to safeguard from the issue encountered previously:
> > https://github.com/jazzband/pip-tools/issues/1085
> >
> > Thanks,
> > Thomas
> >
> >
> > On Tue, Mar 23, 2021 at 4:58 AM Dawid Wysakowicz
> > wrote:
> >
> >> Hi devs, users!
> >>
> >> 1. *Feature freeze date*
> >>
> >> We are approaching the end of March which we agreed would be the
> time
>  for
> >> a Feature Freeze. From the knowledge I've gathered so far it still
> seems
>  to
> >> be a viable plan. I think it is a good time to agree on a particular
>  date,
> >> when it should happen. We suggest *(end of day CEST) March 31st*
> >> (Wednesday next week) as the feature freeze time.
> >>
> >> Similarly as last time, we want to create RC0 on the day after the
>  feature
> >> freeze, to make sure the RC creation process is running smoothly,
> and
> >>> to
> >> have a common testing reference point.
> >>
> >> Having said that, let us repeat, after Robert & Dian from the previous
> >> release, what a Feature Freeze means:
> >>
> >> *B) What does feature freeze mean?* After the feature freeze, no new
> >> features are allowed to be merged to master. Only bug fixes and
> >> documentation improvements.
> >> The release managers will revert new feature commits after the
> feature
> >> freeze.
> >> Rationale: The goal of the feature freeze phase is to improve the
> >>> system
> >> stability by addressing known bugs. New features tend to introduce
> new
> >> instabilities, which would prolong the release process.
> >> If you need to merge a new feature after the freeze, please open a
> >> 

[jira] [Created] (FLINK-22011) Support local global optimization for window aggregation in runtime

2021-03-29 Thread Jark Wu (Jira)
Jark Wu created FLINK-22011:
---

 Summary: Support local global optimization for window aggregation 
in runtime
 Key: FLINK-22011
 URL: https://issues.apache.org/jira/browse/FLINK-22011
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Runtime
Reporter: Jark Wu
Assignee: Jark Wu






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-21997) Add back missing jdbc.md zh docs

2021-03-26 Thread Jark Wu (Jira)
Jark Wu created FLINK-21997:
---

 Summary: Add back missing jdbc.md zh docs
 Key: FLINK-21997
 URL: https://issues.apache.org/jira/browse/FLINK-21997
 Project: Flink
  Issue Type: Task
  Components: Connectors / JDBC, Documentation, Table SQL / Ecosystem
Reporter: Jark Wu
Assignee: jjiey


We lost some Chinese doc pages, such as the Chinese `jdbc.md`, when migrating 
the doc website from Jekyll to Hugo (https://github.com/apache/flink/pull/14903).

We should add them back.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-21909) Unify API and implementation for Hive and Filesystem source connector

2021-03-21 Thread Jark Wu (Jira)
Jark Wu created FLINK-21909:
---

 Summary: Unify API and implementation for Hive and Filesystem 
source connector
 Key: FLINK-21909
 URL: https://issues.apache.org/jira/browse/FLINK-21909
 Project: Flink
  Issue Type: Sub-task
  Components: Connectors / FileSystem, Connectors / Hive
Reporter: Jark Wu


This should give the Filesystem source connector all the abilities of the Hive 
source connector (including the watermark ability). 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-21908) Support HiveSource to emit watermarks which are extracted from partition values

2021-03-21 Thread Jark Wu (Jira)
Jark Wu created FLINK-21908:
---

 Summary: Support HiveSource to emit watermarks which are extracted 
from partition values
 Key: FLINK-21908
 URL: https://issues.apache.org/jira/browse/FLINK-21908
 Project: Flink
  Issue Type: Sub-task
  Components: Connectors / Hive
Reporter: Jark Wu






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-21907) Support watermark syntax for Hive DDL dialect

2021-03-21 Thread Jark Wu (Jira)
Jark Wu created FLINK-21907:
---

 Summary: Support watermark syntax for Hive DDL dialect
 Key: FLINK-21907
 URL: https://issues.apache.org/jira/browse/FLINK-21907
 Project: Flink
  Issue Type: Sub-task
Reporter: Jark Wu






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-21906) Support computed column syntax for Hive DDL dialect

2021-03-21 Thread Jark Wu (Jira)
Jark Wu created FLINK-21906:
---

 Summary: Support computed column syntax for Hive DDL dialect
 Key: FLINK-21906
 URL: https://issues.apache.org/jira/browse/FLINK-21906
 Project: Flink
  Issue Type: Sub-task
  Components: Connectors / Hive
Reporter: Jark Wu






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-21905) Hive streaming source should use FIFO FileSplitAssigner instead of LIFO

2021-03-21 Thread Jark Wu (Jira)
Jark Wu created FLINK-21905:
---

 Summary: Hive streaming source should use FIFO FileSplitAssigner 
instead of LIFO
 Key: FLINK-21905
 URL: https://issues.apache.org/jira/browse/FLINK-21905
 Project: Flink
  Issue Type: Sub-task
  Components: Connectors / Hive
Reporter: Jark Wu


Currently, the Hive streaming source uses {{SimpleAssigner}}, which hands out 
splits in LIFO order. This results in out-of-order partition reading even if 
we add partition splits in order. Therefore, we should use a FIFO 
{{FileSplitAssigner}} instead. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-21899) Introduce SOURCE_WATERMARK built-in function to preserve watermark from source

2021-03-21 Thread Jark Wu (Jira)
Jark Wu created FLINK-21899:
---

 Summary: Introduce SOURCE_WATERMARK built-in function to preserve 
watermark from source
 Key: FLINK-21899
 URL: https://issues.apache.org/jira/browse/FLINK-21899
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / API
Reporter: Jark Wu


The SOURCE_WATERMARK function doesn't have a concrete implementation. Its 
{{eval()}} method should throw a meaningful exception message indicating that 
the function should only be used in DDL to declare a watermark preserved from 
the source system.
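
For illustration, the intended DDL usage might look like the following sketch 
(the connector and all names here are illustrative; calling the function 
anywhere outside a watermark clause should raise the exception described above):

{code:sql}
CREATE TABLE kafka_orders (
  order_id INT,
  ts TIMESTAMP(3),
  -- preserve the watermark provided by the source system
  WATERMARK FOR ts AS SOURCE_WATERMARK
) WITH (
  'connector' = 'kafka',
  'topic' = 'orders',
  'properties.bootstrap.servers' = 'localhost:9092',
  'format' = 'json'
);
{code}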



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-21894) Update type of HiveTablePartition#partitionSpec from Map<String, Object> to Map<String, String>

2021-03-21 Thread Jark Wu (Jira)
Jark Wu created FLINK-21894:
---

 Summary: Update type of HiveTablePartition#partitionSpec from 
Map<String, Object> to Map<String, String>
 Key: FLINK-21894
 URL: https://issues.apache.org/jira/browse/FLINK-21894
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / Hive
Reporter: Jark Wu
Assignee: Jark Wu


In order to support watermarks for the Hive connector (FLINK-21871), we need to 
extract time from the partition string values. However, currently 
{{HiveTablePartition#partitionSpec}} uses the {{Map<String, Object>}} type to 
store the partition spec, which is hard to convert back to string values. 
Therefore, I propose to use the raw {{Map<String, String>}} type instead, which 
is easier to convert into {{Map<String, Object>}} when needed. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-21892) Add documentation for the DESC statement syntax

2021-03-21 Thread Jark Wu (Jira)
Jark Wu created FLINK-21892:
---

 Summary: Add documentation for the DESC statement syntax
 Key: FLINK-21892
 URL: https://issues.apache.org/jira/browse/FLINK-21892
 Project: Flink
  Issue Type: Sub-task
  Components: Documentation, Table SQL / API
Reporter: Jark Wu


We supported {{DESC}} as an abbreviation of the DESCRIBE statement in 
FLINK-21847. We should update the documentation page: 
https://ci.apache.org/projects/flink/flink-docs-master/docs/dev/table/sql/describe/
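
The page should note that both forms are equivalent, e.g. (the table name is 
illustrative):

{code:sql}
DESCRIBE orders;
-- same output as:
DESC orders;
{code}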



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-21881) Support local global optimization for window aggregation

2021-03-19 Thread Jark Wu (Jira)
Jark Wu created FLINK-21881:
---

 Summary: Support local global optimization for window aggregation
 Key: FLINK-21881
 URL: https://issues.apache.org/jira/browse/FLINK-21881
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Planner
Reporter: Jark Wu






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-21871) Support watermark for Hive and Filesystem streaming source

2021-03-19 Thread Jark Wu (Jira)
Jark Wu created FLINK-21871:
---

 Summary: Support watermark for Hive and Filesystem streaming source
 Key: FLINK-21871
 URL: https://issues.apache.org/jira/browse/FLINK-21871
 Project: Flink
  Issue Type: New Feature
  Components: Connectors / FileSystem, Connectors / Hive, Table SQL / 
API
Reporter: Jark Wu
Assignee: Jark Wu
 Fix For: 1.13.0


Hive and Filesystem already support streaming sources. However, they don't 
support watermarks on the source. That means users can't leverage the streaming 
source to perform Flink's powerful streaming analysis, e.g. window aggregates, 
interval joins, and so on. 

In order to enable more Hive users to leverage Flink for streaming analysis, 
and to cooperate with the new optimized window-TVF operations (FLIP-145), we 
need to support watermarks for Hive and Filesystem. 

### How to emit watermarks in Hive and Filesystem

Fact data in Hive is usually partitioned by date/time, e.g. 
{{pt_day=2021-03-19, pt_hour=10}}. In this case, once the data of partition 
{{pt_day=2021-03-19, pt_hour=10}} has been emitted, 
we know that all the data before {{2021-03-19 11:00:00}} has arrived, so we can 
emit a watermark value of {{2021-03-19 11:00:00}}. We call this a partition 
watermark. 

The partition watermark is much better than a record watermark (a watermark 
extracted from records, e.g. {{ts - INTERVAL '1' MINUTE}}). 
In the above example, with a partition watermark, the window [10:00, 11:00) is 
triggered when pt_hour=10 is finished.
With a record watermark, the window [10:00, 11:00) is only triggered when 
pt_hour=11 arrives, which adds one partition's worth of delay to the pipeline. 

Therefore, we first focus on supporting partition watermarks for Hive and 
Filesystem.

### Example

In order to support such watermarks, we propose using the following DDL to 
define a Hive table with watermark defined:

{code:sql}
-- using hive dialect
CREATE TABLE hive_table (
  x int, 
  y string,
  z int,
  rowtime timestamp,
  WATERMARK FOR rowtime AS SOURCE_WATERMARK
) PARTITIONED BY (pt_day string, pt_hour string) 
TBLPROPERTIES (
  'streaming-source.enable'='true',
  'streaming-source.monitor-interval'='1s',
  'partition.time-extractor.timestamp-pattern'='$pt_day $pt_hour:00:00',
  'streaming-source.partition-interval'='1h'
);

-- window aggregate on the hive table
SELECT window_start, window_end, COUNT(*), MAX(y), SUM(z)
FROM TABLE(
    TUMBLE(TABLE hive_table, DESCRIPTOR(rowtime), INTERVAL '1' HOUR))
GROUP BY window_start, window_end;
{code}

For filesystem connector, the DDL can be:

{code:sql}
CREATE TABLE fs_table (
x int,
y string,
z int,
ts TIMESTAMP(3),
pt_day string,
pt_hour string,
WATERMARK FOR ts AS SOURCE_WATERMARK
) PARTITIONED BY (pt_day, pt_hour)
  WITH (
'connector' = 'filesystem',
'path' = '/path/to/file',
'format' = 'parquet',

'streaming-source.enable'='true',
'streaming-source.monitor-interval'='1s',
'partition.time-extractor.timestamp-pattern'='$pt_day $pt_hour:00:00',
'streaming-source.partition-interval'='1h'
);
{code}

I will explain the new function/configuration. 


### SOURCE_WATERMARK built-in function

FLIP-66 [1] proposed a {{SYSTEM_WATERMARK}} function for watermarks preserved 
in the underlying source system. 
However, the SYSTEM prefix sounds like a Flink system generated value, while 
actually this is a SOURCE system generated value. 
So I propose to use {{SOURCE_WATERMARK}} instead; this also keeps the concept 
aligned with the API of 
{{org.apache.flink.table.descriptors.Rowtime#watermarksFromSource}}.


### Table Options for Watermark

- {{partition.time-extractor.timestamp-pattern}}: this option already exists. 
It is used to extract/convert a partition value to a timestamp value.
- {{streaming-source.partition-interval}}: this is a new option. It indicates 
the minimal time interval of the partitions and is used to calculate the 
correct watermark when a partition is finished: watermark = 
partition-timestamp + time-interval. For example, finishing partition 
{{pt_day=2021-03-19, pt_hour=10}} with a 1h interval yields the watermark 
{{2021-03-19 10:00:00}} + 1h = {{2021-03-19 11:00:00}}.

### How to support watermark for existing Hive tables

We all know that we can't create a new table for an existing Hive table, so we 
should support altering an existing Hive table to add the watermark 
information. This can be supported by the new ALTER TABLE syntax proposed in 
FLINK-21634. Because watermarks, computed columns, and table options are all 
encoded in Hive table parameters, other systems (e.g. Hive MR, Spark) can 
still read the Hive table as usual. 

{code:sql}
ALTER TABLE hive_table ADD (
  WATERMARK FOR ts AS SOURCE_WATERMARK
);

ALTER TABLE hive_table SET (
  'streaming-source.enable'='true',
  'streaming-source.monitor-interval'='1s',
  'partition.time-extractor.timestamp-pattern'='$pt_day $pt_hour:00:00',
  'streaming-source.partition-watermark.advance'='1h'
);
{code}

[jira] [Created] (FLINK-21774) Do not display column names when the return set is empty in SQL Client

2021-03-13 Thread Jark Wu (Jira)
Jark Wu created FLINK-21774:
---

 Summary: Do not display column names when the return set is empty in 
SQL Client
 Key: FLINK-21774
 URL: https://issues.apache.org/jira/browse/FLINK-21774
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Client
Reporter: Jark Wu


Currently, SQL Client will display column names even if the return set is empty:

{code}
SHOW MODULES;
+-------------+
| module name |
+-------------+
0 row in set
!ok
{code}

Mature databases, e.g. MySQL, only show "Empty set" instead of column 
names:

{code}
mysql> show tables;
Empty set (0.00 sec)
{code}

We can improve this by simply omitting the column name header. 
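
A possible result after the change (the exact wording is to be decided):

{code}
SHOW MODULES;
Empty set
!ok
{code}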



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-21741) Support SHOW JARS statement in SQL Client

2021-03-11 Thread Jark Wu (Jira)
Jark Wu created FLINK-21741:
---

 Summary: Support SHOW JARS statement in SQL Client
 Key: FLINK-21741
 URL: https://issues.apache.org/jira/browse/FLINK-21741
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Client
Reporter: Jark Wu






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-21742) Support REMOVE JAR statement in SQL Client

2021-03-11 Thread Jark Wu (Jira)
Jark Wu created FLINK-21742:
---

 Summary: Support REMOVE JAR statement in SQL Client
 Key: FLINK-21742
 URL: https://issues.apache.org/jira/browse/FLINK-21742
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Client
Reporter: Jark Wu






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-21725) DataTypeExtractor extracts wrong field ordering for Tuple12

2021-03-10 Thread Jark Wu (Jira)
Jark Wu created FLINK-21725:
---

 Summary: DataTypeExtractor extracts wrong field ordering for 
Tuple12
 Key: FLINK-21725
 URL: https://issues.apache.org/jira/browse/FLINK-21725
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner
Affects Versions: 1.12.2, 1.11.3, 1.13.0
Reporter: Jark Wu


The following test can reproduce the problem:

{code:java}
/** Emit Tuple12 result. */
public static class JavaTableFuncTuple12
        extends TableFunction<
                Tuple12<
                        String,
                        String,
                        String,
                        String,
                        String,
                        String,
                        Integer,
                        Integer,
                        Integer,
                        Integer,
                        Integer,
                        Integer>> {
    private static final long serialVersionUID = -8258882510989374448L;

    public void eval(String str) {
        collect(
                Tuple12.of(
                        str + "_a",
                        str + "_b",
                        str + "_c",
                        str + "_d",
                        str + "_e",
                        str + "_f",
                        str.length(),
                        str.length() + 1,
                        str.length() + 2,
                        str.length() + 3,
                        str.length() + 4,
                        str.length() + 5));
    }
}
{code}

{code:scala}
@Test
def testCorrelateTuple12(): Unit = {
  val util = streamTestUtil()
  util.addTableSource[(Int, Long, String)]("MyTable", 'a, 'b, 'c)
  val function = new JavaTableFuncTuple12
  util.addTemporarySystemFunction("func1", function)
  val sql =
    """
      |SELECT *
      |FROM MyTable, LATERAL TABLE(func1(c)) AS T
      |""".stripMargin

  util.verifyExecPlan(sql)
}
{code}

{code}
// output plan
Correlate(invocation=[func1($cor0.c)], correlate=[table(func1($cor0.c))], 
select=[a,b,c,f0,f1,f10,f11,f2,f3,f4,f5,f6,f7,f8,f9], 
rowType=[RecordType(INTEGER a, BIGINT b, VARCHAR(2147483647) c, 
VARCHAR(2147483647) f0, VARCHAR(2147483647) f1, INTEGER f10, INTEGER f11, 
VARCHAR(2147483647) f2, VARCHAR(2147483647) f3, VARCHAR(2147483647) f4, 
VARCHAR(2147483647) f5, INTEGER f6, INTEGER f7, INTEGER f8, INTEGER f9)], 
joinType=[INNER])
+- LegacyTableSourceScan(table=[[default_catalog, default_database, MyTable, 
source: [TestTableSource(a, b, c)]]], fields=[a, b, c])
{code}

Note that there is no problem if using the legacy {{tEnv.registerFunction}} to 
register the function, because it uses {{TypeInformation}}. However, there is 
a problem when using {{tEnv.createTemporaryFunction}} or the {{CREATE 
FUNCTION}} syntax, because they use {{TypeInference}}.

Note this problem exists in the latest 1.11, 1.12, and master branches. 

I think the problem might lie in this line: 
https://github.com/apache/flink/blob/c6997c97c575d334679915c328792b8a3067cfb5/flink-table/flink-table-common/src/main/java/org/apache/flink/table/types/extraction/DataTypeExtractor.java#L562

because it orders field names alphabetically.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] FLIP-162 follow-up discussion

2021-03-09 Thread Jark Wu
Thanks Leonard,

I'm also +1 for not introducing this fallback option.
It's error-prone to mix the implementations of the wrong and the correct
behavior.
And it's better to educate users in the right way in one version instead of
spanning multiple versions.

Best,
Jark

On Tue, 9 Mar 2021 at 15:15, Kurt Young  wrote:

> Hi Leonard,
>
> Thanks for this careful consideration. Given the fallback option will
> eventually change the behavior twice, which means
> potentially break user's job twice, I would also +1 to not introduce it.
>
> Best,
> Kurt
>
> On Fri, Mar 5, 2021 at 3:00 PM Leonard Xu  wrote:
>
>> Hi, all
>>
>> As discussed in FLIP-162, we agreed that the current time functions'
>> behavior is incorrect and planned to introduce the option
>> *table.exec.fallback-legacy-time-function* to let users fall back to the
>> incorrect behavior.
>>
>> (1) The option is convenient for users who want to upgrade to 1.13 but
>> don't want to change their SQL jobs; those users need to configure the
>> option value. *This is the first time users are affected by these wrong
>> functions.*
>>
>> (2) But we didn't consider that the option will be deleted after one or
>> two major versions; users will have to change their SQL jobs again at that
>> point. *This is the second time users are affected by these wrong
>> functions.*
>>
>> (3) Besides, maintaining two sets of functions is prone to bugs.
>>
>> I've discussed with some community developers offline; they tend to solve
>> these functions at once, i.e. correct the wrong functions directly and not
>> introduce this option.
>>
>> Considering that we will delete the configuration eventually, and comparing
>> hurting users twice and bothering them for a long time against hurting
>> them once, I would rather hurt users once.
>> *Thus I am also +1* that we should directly correct these wrong functions
>> and remove the wrong behavior at the same time.
>>
>>
>> If we can reach a consensus in this thread, I think we can remove this
>> option support in FLIP-162.
>> How do you think?
>>
>> Best,
>> Leonard
>>
>>
>>
>>
>>


[jira] [Created] (FLINK-21669) Support "table.dml-sync" option to execute DML statements synchronously in TableEnvironment and SQL Client

2021-03-08 Thread Jark Wu (Jira)
Jark Wu created FLINK-21669:
---

 Summary: Support "table.dml-sync" option to execute DML statements 
synchronizely in TableEnvironment and SQL Client
 Key: FLINK-21669
 URL: https://issues.apache.org/jira/browse/FLINK-21669
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / API, Table SQL / Client
Reporter: Jark Wu






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-21664) Support STATEMENT SET syntax in TableEnvironment

2021-03-08 Thread Jark Wu (Jira)
Jark Wu created FLINK-21664:
---

 Summary: Support STATEMENT SET syntax in TableEnvironment
 Key: FLINK-21664
 URL: https://issues.apache.org/jira/browse/FLINK-21664
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / API
Reporter: Jark Wu






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-21647) 'Run kubernetes session test (default input)' failed on Azure

2021-03-07 Thread Jark Wu (Jira)
Jark Wu created FLINK-21647:
---

 Summary: 'Run kubernetes session test (default input)' failed on 
Azure
 Key: FLINK-21647
 URL: https://issues.apache.org/jira/browse/FLINK-21647
 Project: Flink
  Issue Type: Bug
  Components: Deployment / Kubernetes
Affects Versions: 1.13.0
Reporter: Jark Wu


https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=14236=logs=c88eea3b-64a0-564d-0031-9fdcd7b8abee=ff888d9b-cd34-53cc-d90f-3e446d355529=2247



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-21634) ALTER TABLE statement enhancement

2021-03-05 Thread Jark Wu (Jira)
Jark Wu created FLINK-21634:
---

 Summary: ALTER TABLE statement enhancement
 Key: FLINK-21634
 URL: https://issues.apache.org/jira/browse/FLINK-21634
 Project: Flink
  Issue Type: New Feature
  Components: Table SQL / API, Table SQL / Client
Reporter: Jark Wu


We already introduced the ALTER TABLE statement in FLIP-69 [1], but it only 
supports renaming a table and changing table options. One useful feature of 
the ALTER TABLE statement is modifying the schema. This is also heavily 
required by integrations with data lakes (e.g. Iceberg). 

Therefore, I propose to support following ALTER TABLE statements:

*Add Column*

{code:sql}
ALTER TABLE <table_name> 
ADD COLUMN <column_name> <data_type> [COMMENT column_comment]
{code}

This follows SQL standard 2011, Section 11.10. Iceberg[2], Trino[3], and 
MySQL[4] do the same. 

*Drop Column*

{code:sql}
ALTER TABLE <table_name> DROP COLUMN <column_name>
{code}

This follows SQL standard 2011, Section 11.10. Iceberg[2], Trino[3], and 
MySQL[4] do the same. 


*Alter Column*

{code:sql}
ALTER TABLE <table_name> ALTER COLUMN <column_name> 
  SET DATA TYPE <data_type> [COMMENT column_comment]
{code}

This follows SQL standard 2011, Section 11.10. Same as PG[5], and similar to 
Iceberg[2], Trino[3], and MySQL[4].

*Rename Column*
{code:sql}
ALTER TABLE <table_name> RENAME COLUMN <column_name> TO <new_column_name>
{code}

This is not in the SQL standard, but is also very useful. It follows the 
syntax of Iceberg[2], Trino[3], and MySQL[4].

*Unset Options*
{code:sql}
ALTER TABLE <table_name> RESET (key1, key2, ...)
{code}

Outside the SQL standard, but useful. This has been discussed in FLINK-17845. 
We use {{RESET}} to keep aligned with the {{SET key=value}} and {{RESET key}} 
statements proposed in FLIP-163. PG[5] also uses the {{RESET}} keyword.

For example:

{code:sql}
-- add a new column 
ALTER TABLE mytable ADD COLUMN new_column STRING COMMENT 'new_column docs';

-- drop an old column
ALTER TABLE prod.db.sample DROP COLUMN legacy_name;

-- rename column name
ALTER TABLE prod.db.sample RENAME COLUMN data TO payload;

-- alter column type
ALTER TABLE prod.db.sample ALTER COLUMN measurement 
  SET DATA TYPE double COMMENT 'unit is bytes per second';
{code}


[1]: 
https://cwiki.apache.org/confluence/display/FLINK/FLIP+69+-+Flink+SQL+DDL+Enhancement
[2]: http://iceberg.apache.org/spark-ddl/#alter-table-alter-column
[3]: https://trino.io/docs/current/sql/alter-table.html
[4]: https://dev.mysql.com/doc/refman/8.0/en/alter-table.html
[5]: https://www.postgresql.org/docs/9.1/sql-altertable.html



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-21618) Introduce a new integration test framework for SQL Client

2021-03-04 Thread Jark Wu (Jira)
Jark Wu created FLINK-21618:
---

 Summary: Introduce a new integration test framework for SQL Client
 Key: FLINK-21618
 URL: https://issues.apache.org/jira/browse/FLINK-21618
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / Client
Reporter: Jark Wu
Assignee: Jark Wu
 Fix For: 1.13.0


Currently, adding tests for a new feature in SQL Client is difficult. There is 
no clear clue about where to add tests. Besides, all the tests in the SQL 
Client module are more or less unit tests; there are no integration tests. 
That's why we see many little bugs in this module. 

An end-to-end component path of SQL Client is:
SqlClient => CliClient => parse sql => invoke 

For example, {{CliClientTest}} only tests the SQL parser in CliClient, and 
{{LocalExecutorITCase}} only tests {{LocalExecutor}}. Therefore, this issue 
aims to resolve this problem by introducing a new integration framework to 
test SQL Client end-to-end. 






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Deprecation and removal of the legacy SQL planner

2021-03-04 Thread Jark Wu
big +1 from my side.

Best,
Jark

On Thu, 4 Mar 2021 at 20:59, Leonard Xu  wrote:

> +1 for the roadmap.
>
> Thanks Timo for driving this.
>
> Best,
> Leonard
>
> > > On 4 Mar 2021, at 20:40, Timo Walther wrote:
> >
> > Last call for feedback on this topic.
> >
> > It seems everyone agrees to finally complete FLIP-32. Since FLIP-32 has
> been accepted for a very long time, I think we don't need another voting
> thread for executing the last implementation step. Please let me know if
> you think differently.
> >
> > I will start deprecating the affected classes and interfaces beginning
> of next week.
> >
> > Regards,
> > Timo
> >
> >
> > On 26.02.21 15:46, Seth Wiesman wrote:
> >> Strong +1
> >> Having two planners is confusing to users and the diverging semantics
> make
> >> it difficult to provide useful learning material. It is time to rip the
> >> bandage off.
> >> Seth
> >> On Fri, Feb 26, 2021 at 12:54 AM Kurt Young  wrote:
> >>>
> >>> Hi Timo,
> >>>
> >>> First of all I want to thank you for introducing this planner design
> back
> >>> in 1.9, this is a great work
> >>> that allows lots of blink features to be merged to Flink in a
> reasonably
> >>> short time. It greatly
> >>> accelerates the evolution speed of Table & SQL.
> >>>
> >>> Everything comes with a cost, as you said, right now we are facing the
> >>> overhead of maintaining
> >>> two planners and it causes bugs and also increases imbalance between
> these
> >>> two planners. As
> >>> a developer and also for the good of all Table & SQL users, I also
> think
> >>> it's better for us to be more
> >>> focused on a single planner.
> >>>
> >>> Your proposed roadmap looks good to me, +1 from my side and thanks
> >>> again for all your efforts!
> >>>
> >>> Best,
> >>> Kurt
> >>>
> >>>
> >>> On Thu, Feb 25, 2021 at 5:01 PM Timo Walther 
> wrote:
> >>>
>  Hi everyone,
> 
>  since Flink 1.9 we have supported two SQL planners. Most of the
> original
>  plan of FLIP-32 [1] has been implemented. The Blink code merge has
> been
>  completed and many additional features have been added exclusively to
>  the new planner. The new planner is now in a much better shape than
> the
>  legacy one.
> 
>  In order to avoid user confusion, reduce duplicate code, and improve
>  maintainability and testing times of the Flink project as a whole we
>  would like to propose the following steps to complete FLIP-32:
> 
>  In Flink 1.13:
>  - Deprecate the `flink-table-planner` module
>  - Deprecate `BatchTableEnvironment` for both Java, Scala, and Python
> 
>  In Flink 1.14:
>  - Drop `flink-table-planner` early
>  - Drop many deprecated interfaces and API on demand
>  - Rename `flink-table-planner-blink` to `flink-table-planner`
>  - Rename `flink-table-runtime-blink` to `flink-table-runtime`
>  - Remove references of "Blink" in the code base
> 
>  This will have an impact on users that still use DataSet API together
>  with Table API. With this change we will not support converting
> between
>  DataSet API and Table API anymore. We hope to compensate the missing
>  functionality in the new unified TableEnvironment and/or the batch
> mode
>  in DataStream API during 1.14 and 1.15. For this, we are looking for
>  further feedback which features are required in Table API/DataStream
> API
>  to have a smooth migration path.
> 
>  Looking forward to your feedback.
> 
>  Regards,
>  Timo
> 
>  [1]
> 
> 
> >>>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-32%3A+Restructure+flink-table+for+future+contributions
> 
> >>>
> >
>
>


[jira] [Created] (FLINK-21579) Support "SHOW USER FUNCTIONS" statement

2021-03-03 Thread Jark Wu (Jira)
Jark Wu created FLINK-21579:
---

 Summary: Support "SHOW USER FUNCTIONS" statement
 Key: FLINK-21579
 URL: https://issues.apache.org/jira/browse/FLINK-21579
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / API, Table SQL / Client
Reporter: Jark Wu


Currently, {{SHOW FUNCTIONS}} lists all user-defined and built-in functions, 
which is very verbose. We already introduced the {{listUserDefinedFunctions()}} 
method on {{TableEnvironment}}. I think we can introduce a dedicated syntax to 
show only user-defined functions, which simply calls 
{{listUserDefinedFunctions()}}. That would be very useful. 

I propose to use {{SHOW USER FUNCTIONS}}, which is also supported by Snowflake 
[1] and Spark [2].
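
A hypothetical session (the function name and implementation class are 
illustrative):

{code:sql}
CREATE TEMPORARY FUNCTION my_upper AS 'com.example.MyUpper';

SHOW USER FUNCTIONS;
-- +---------------+
-- | function name |
-- +---------------+
-- |      my_upper |
-- +---------------+
{code}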


[1]: https://docs.snowflake.com/en/sql-reference/sql/show-user-functions.html
[2]: 
https://docs.databricks.com/spark/latest/spark-sql/language-manual/sql-ref-syntax-aux-show-functions.html



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] FLIP-162: Consistent Flink SQL time function behavior

2021-03-02 Thread Jark Wu
+1 (binding)

Best,
Jark

On Tue, 2 Mar 2021 at 10:42, Leonard Xu  wrote:

> Hi all,
>
> I would like to start the vote for FLIP-162 [1], which has been discussed
> and
> reached a consensus in the discussion thread [2].
>
> Please vote +1 to approve the FLIP, or -1 with a comment.
>
> The vote will be open until March 5th (72h), unless there is an objection
> or not enough votes.
>
> Best,
> Leonard
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-162%3A+Consistent+Flink+SQL+time+function+behavior
> [2]
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-162-Consistent-Flink-SQL-time-function-behavior-tc48116.html


Re: [VOTE] FLIP-163: SQL Client Improvements

2021-03-02 Thread Jark Wu
+1 to the updated FLIP.

Best,
Jark

On Wed, 3 Mar 2021 at 10:30, Leonard Xu  wrote:

> +1 (non-binding)
>
> The updated FLIP looks well.
>
> Best,
> Leonard
>
>
> > On 2 Mar 2021, at 22:27, Shengkai Fang wrote:
> >
> > already updated the FLIP[2]. It seems the vote has lasted  for a long
> time.
>
>


[jira] [Created] (FLINK-21553) WindowDistinctAggregateITCase#testHopWindow_Cube is unstable

2021-03-01 Thread Jark Wu (Jira)
Jark Wu created FLINK-21553:
---

 Summary: WindowDistinctAggregateITCase#testHopWindow_Cube is 
unstable
 Key: FLINK-21553
 URL: https://issues.apache.org/jira/browse/FLINK-21553
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner
Reporter: Jark Wu
 Fix For: 1.13.0


See 
https://dev.azure.com/imjark/Flink/_build/results?buildId=422=logs=d1352042-8a7d-50b6-3946-a85d176b7981=b2322052-d503-5552-81e2-b3a532a1d7e8




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS]FLIP-163: SQL Client Improvements

2021-03-01 Thread Jark Wu
about the option. If the
> > > summary
> > > >>> has errors, please correct me.
> > > >>>
> > > >>> `table.dml-sync`:
> > > >>> - take effect for `executeMultiSql` and sql client
> > > >>> - benefit: SQL script portability. One script for all platforms.
> > > >>> - drawback: Don't work for `TableEnvironment#executeSql`.
> > > >>>
> > > >>> `table.multi-dml-sync`:
> > > >>> - take effect for `executeMultiSql` and sql client
> > > >>> - benefit: SQL script portability
> > > >>> - drawback: It's confusing when the sql script has one dml statement
> > but
> > > >>> needs to set the option `table.multi-dml-sync`
> > > >>>
> > > >>> `client.dml-sync`:
> > > >>> - take effect for sql client only
> > > >>> - benefit: clear definition.
> > > >>> - drawback: Every platform needs to define its own option. Bad SQL
> > > script
> > > >>> portability.
> > > >>>
> > > >>> Just as Jark said, I think the `table.dml-sync` is a good choice if
> > we
> > > >> can
> > > >>> extend its scope and make this option works for `executeSql`.
> > > >>> It's straightforward and users can use this option now in table
> api.
> > > The
> > > >>> drawback is the  `TableResult#await` plays the same role as the
> > option.
> > > >> I
> > > >>> don't think the drawback is really critical because many systems
> have
> > > >>> commands play the same role with the different names.
> > > >>>
> > > >>> Best,
> > > >>> Shengkai
> > > >>>
> > > >>> Timo Walther  于2021年2月25日周四 下午4:23写道:
> > > >>>
> > > >>>> The `table.` prefix is meant to be a general option in the table
> > > >>>> ecosystem. Not necessarily attached to Table API or SQL Client.
> > That's
> > > >>>> why SQL Client is also located in the `flink-table` module.
> > > >>>>
> > > >>>> My main concern is the SQL script portability. Declaring the
> > > sync/async
> > > >>>> behavior will happen in many SQL scripts. And users should be
> easily
> > > >>>> switch from SQL Client to some commercial product without the need
> > of
> > > >>>> changing the script again.
> > > >>>>
> > > >>>> Sure, we can change from `sql-client.dml-sync` to `table.dml-sync`
> > > later
> > > >>>> but that would mean introducing future confusion. An app name
> (what
> > > >>>> `sql-client` kind of is) should not be part of a config option key
> > if
> > > >>>> other apps will need the same kind of option.
> > > >>>>
> > > >>>> Regards,
> > > >>>> Timo
> > > >>>>
> > > >>>>
> > > >>>> On 24.02.21 08:59, Jark Wu wrote:
> > > >>>>>>  From my point of view, I also prefer "sql-client.dml-sync",
> > > >>>>> because the behavior of this configuration is very clear.
> > > >>>>> Even if we introduce a new config in the future, e.g.
> > > `table.dml-sync`,
> > > >>>>> we can also deprecate the sql-client one.
> > > >>>>>
> > > >>>>> Introducing a "table."  configuration without any implementation
> > > >>>>> will confuse users a lot, as they expect it should take effect on
> > > >>>>> the Table API.
> > > >>>>>
> > > >>>>> If we want to introduce an unified "table.dml-sync" option, I
> > prefer
> > > >>>>> it should be implemented on Table API and affect all the DMLs on
> > > >>>>> Table API (`tEnv.executeSql`, `Table.executeInsert`,
> > `StatementSet`),
> > > >>>>> as I have mentioned before [1].
> > > >>>>>
> > > >>>>>> It would be very straightforward that it affects all the DMLs on
> > SQL
> > > >> CLI
> > > >>>>> and
> > > >>>>> TableEnvironment (including `executeSql`, `StatementSet`,

[jira] [Created] (FLINK-21542) Add documentation for supporting INSERT INTO specific columns

2021-03-01 Thread Jark Wu (Jira)
Jark Wu created FLINK-21542:
---

 Summary: Add documentation for supporting INSERT INTO specific 
columns
 Key: FLINK-21542
 URL: https://issues.apache.org/jira/browse/FLINK-21542
 Project: Flink
  Issue Type: New Feature
  Components: Table SQL / API
Reporter: Jark Wu
 Fix For: 1.13.0


We have supported INSERT INTO specific columns in FLINK-18726, but have not 
added documentation yet. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Splitting User support mailing list

2021-03-01 Thread Jark Wu
I also have some concerns about splitting Python and SQL,
because I have seen SQL questions reported by users that turned out to be
related to deployment or the state backend.

Best,
Jark

On Mon, 1 Mar 2021 at 17:15, Konstantin Knauf 
wrote:

> Hi Roman,
>
> I slightly +1 for a list dedicated to Statefun users, but -1 for splitting
> up the rest. I think there are still a lot of crosscutting concerns between
> Python, DataStream, Table API and SQL where users of another API can also
> help out, too. It also requires users to think about which lists to
> subscribe/write to, instead of simply subscribing to one list.
>
> Why do you think the quality and speed of answers would improve with
> dedicated lists?
>
> Best,
>
> Konstantin
>
>
>
>
>
> On Mon, Mar 1, 2021 at 10:09 AM xiao...@ysstech.com 
> wrote:
>
> > Hi Roman,
> >
> > This is a very good idea. I look forward to the "sub-lists" being set up
> > officially as soon as possible, and to sharing development experience and
> > problems with friends in a particular field.
> >
> > Regards,
> > yue
> >
> >
> >
> > xiao...@ysstech.com
> >
> > From: Roman Khachatryan
> > Date: 2021-03-01 16:48
> > To: dev
> > Subject: [DISCUSS] Splitting User support mailing list
> > Hi everyone,
> >
> > I'd like to propose to extract several "sub-lists" from our user mailing
> > list (u...@flink.apache.org).
> >
> > For example,
> > - user-sql@flink.a.o (SQL/TableAPI)
> > - user-statefun@f.a.o (StateFun)
> > - user-py@f.a.o (Python)
> > And u...@flink.apache.org will remain the main or "default" list.
> >
> > That would improve the quality and speed of the answers and allow
> > developers to concentrate on the relevant topics.
> >
> > At the downside, this would lessen the exposure to the various Flink
> areas
> > for lists maintainers.
> >
> > What do you think?
> >
> > Regards,
> > Roman
> >
>
>
> --
>
> Konstantin Knauf | Head of Product
>
> +49 160 91394525
>
>
> Follow us @VervericaData Ververica 
>
>
> --
>
> Join Flink Forward  - The Apache Flink
> Conference
>
> Stream Processing | Event Driven | Real Time
>
> --
>
> Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany
>
> --
> Ververica GmbH
> Registered at Amtsgericht Charlottenburg: HRB 158244 B
> Managing Directors: Yip Park Tung Jason, Jinwei (Kevin) Zhang, Karl Anton
> Wehner
>


Re: [DISCUSS]FLIP-163: SQL Client Improvements

2021-02-23 Thread Jark Wu
From my point of view, I also prefer "sql-client.dml-sync",
because the behavior of this configuration is very clear.
Even if we introduce a new config in the future, e.g. `table.dml-sync`,
we can also deprecate the sql-client one.

Introducing a "table."  configuration without any implementation
will confuse users a lot, as they expect it should take effect on
the Table API.

If we want to introduce an unified "table.dml-sync" option, I prefer
it should be implemented on Table API and affect all the DMLs on
Table API (`tEnv.executeSql`, `Table.executeInsert`, `StatementSet`),
as I have mentioned before [1].

> It would be very straightforward that it affects all the DMLs on SQL CLI
and
TableEnvironment (including `executeSql`, `StatementSet`,
`Table#executeInsert`, etc.).
This can also make SQL CLI easy to support this configuration by passing
through to the TableEnv.
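
To make the pass-through idea concrete, here is a rough sketch of how such an
option could be declared with Flink's ConfigOptions API (the option name and
default come from this proposal, not from any released version):

{code:java}
import org.apache.flink.configuration.ConfigOption;
import org.apache.flink.configuration.ConfigOptions;

public final class SqlClientOptions {
    private SqlClientOptions() {}

    // Hypothetical option from this discussion; not part of any Flink release.
    public static final ConfigOption<Boolean> DML_SYNC =
            ConfigOptions.key("sql-client.dml-sync")
                    .booleanType()
                    .defaultValue(false)
                    .withDescription(
                            "Whether DML statements submitted in the SQL Client "
                                    + "block until the job finishes.");
}
{code}

The SQL Client would then only need to read this option and decide whether to
call `await()` on the returned TableResult.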

Best,
Jark


[1]:
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-163-SQL-Client-Improvements-tp48354p48665.html


On Wed, 24 Feb 2021 at 10:39, Kurt Young  wrote:

> If we all agree the option should only be handled by sql client, then why
> don't we
> just call it `sql-client.dml-sync`? As you said, calling it
> `table.dml-sync` but has no
> affection in `TableEnv.executeSql("INSERT INTO")` will also cause a big
> confusion for
> users.
>
> The only concern I saw is if we introduce
> "TableEnvironment.executeMultiSql()" in the
> future, how do we control the synchronization between statements? TBH I
> don't really
> see a strong requirement for such interfaces. Right now, we have a pretty
> clear semantic
> of `TableEnv.executeSql`, and it's very convenient for users if they want
> to execute multiple
> sql statements. They can simulate either synced or async execution with
> this building block.
>
> This will introduce slight overhead for users, but compared to the
> confusion we might
> cause if we introduce such a method of our own, I think it's better to wait
> for some more
> feedback.
>
> Best,
> Kurt
>
>
> On Tue, Feb 23, 2021 at 9:45 PM Timo Walther  wrote:
>
> > Hi Kurt,
> >
> > we can also shorten it to `table.dml-sync` if that would help. Then it
> > would confuse users that do a regular `.executeSql("INSERT INTO")` in a
> > notebook session.
> >
> > In any case users will need to learn the semantics of this option.
> > `table.multi-dml-sync` should be described as "If you are in a multi-
> > statement environment, execute DMLs synchronously." I don't have a
> > strong opinion on shortening it to `table.dml-sync`.
> >
> > Just to clarify the implementation: The option should be handled by the
> > SQL Client only, but the name can be shared accross platforms.
> >
> > Regards,
> > Timo
> >
> >
> > On 23.02.21 09:54, Kurt Young wrote:
> > > Sorry for the late reply, but I'm confused by `table.multi-dml-sync`.
> > >
> > > IIUC this config will take effect with 2 use cases:
> > > 1. SQL client, either interactive mode or executing multiple statements
> > via
> > > -f. In most cases,
> > > there will be only one INSERT INTO statement but we are controlling the
> > > sync/async behavior
> > > with "*multi-dml*-sync". I think this will confuse a lot of users.
> > Besides,
> > >
> > > 2. TableEnvironment#executeMultiSql(), but this is future work, we are
> > also
> > > not sure if we will
> > > really introduce this in the future.
> > >
> > > I would prefer to introduce this option for only sql client. For
> > platforms
> > > Timo mentioned which
> > > need to control such behavior, I think it's easy and flexible to
> > introduce
> > > one on their own.
> > >
> > > Best,
> > > Kurt
> > >
> > >
> > > On Sat, Feb 20, 2021 at 10:23 AM Shengkai Fang 
> > wrote:
> > >
> > >> Hi everyone.
> > >>
> > >> Sorry for the late response.
> > >>
> > >> For `execution.runtime-mode`, I think it's much better than
> > >> `table.execution.mode`. Thanks for Timo's suggestions!
> > >>
> > >> For `SHOW CREATE TABLE`, I'm +1 with Jark's comments. We should
> clarify
> > the
> > >> usage of the SHOW CREATE TABLE statements. It should allow
> > specifying
> > >> a fully-qualified table and only work for tables that are
> > >> created by SQL statements.
> > >>
> > >> I have updated the FLIP with s

[jira] [Created] (FLINK-21456) TableResult#print() should correctly stringify values of TIMESTAMP type in SQL format

2021-02-23 Thread Jark Wu (Jira)
Jark Wu created FLINK-21456:
---

 Summary: TableResult#print() should correctly stringify values of 
TIMESTAMP type in SQL format
 Key: FLINK-21456
 URL: https://issues.apache.org/jira/browse/FLINK-21456
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / API
Reporter: Jark Wu


Currently {{TableResult#print()}} simply uses {{Object#toString()}} as the string 
representation of the fields. This is not SQL compliant, because for TIMESTAMP 
and TIMESTAMP_LTZ, the string representation should be {{2021-02-23 17:30:00}} 
instead of {{2021-02-23T17:30:00Z}}.


Note: we may need to update {{PrintUtils#rowToString(Row)}} and also SQL Client 
which invokes this method. 
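
For illustration, the difference in plain Java (a sketch of the desired conversion, not the actual {{PrintUtils}} code):

{code:java}
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class TimestampFormatExample {
    public static void main(String[] args) {
        LocalDateTime ts = LocalDateTime.of(2021, 2, 23, 17, 30, 0);
        // Object#toString() produces the ISO-8601 style with a 'T' separator
        // (and even drops the seconds when they are zero):
        System.out.println(ts); // 2021-02-23T17:30
        // The SQL-style rendering uses a space separator:
        DateTimeFormatter sqlFormat = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");
        System.out.println(ts.format(sqlFormat)); // 2021-02-23 17:30:00
    }
}
{code}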



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] FLIP-163: SQL Client Improvements

2021-02-21 Thread Jark Wu
+1 (binding)

Best,
Jark

On Mon, 22 Feb 2021 at 11:06, Shengkai Fang  wrote:

> Hi devs
>
> It seems we have reached consensus on FLIP-163[1] in the discussion[2]. So
> I'd like to start the vote for this FLIP.
>
> Please vote +1 to approve the FLIP, or -1 with a comment.
>
> The vote will be open for 72 hours, until Feb. 25 2021 12:00 AM UTC+8,
> unless there's an objection.
>
> Best,
> Shengkai
>


Re: [DISCUSS]FLIP-163: SQL Client Improvements

2021-02-15 Thread Jark Wu
Hi Ingo,

1) I think you are right, the table path should be fully-qualified.

2) I think this is also a good point. The SHOW CREATE TABLE
only aims to print DDL for the tables registered using SQL CREATE TABLE
DDL.
If a table is registered using Table API,  e.g.
`StreamTableEnvironment#createTemporaryView(String, DataStream)`,
currently it's not possible to print DDL for such tables.
I think we should point it out in the FLIP.
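
For illustration, a sketch of the behavior we are discussing (SHOW CREATE TABLE
is still only proposed at this point, so this would not run on a current
release; table name and connector are made up):

{code:java}
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class ShowCreateTableSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().inStreamingMode().build());
        tEnv.executeSql("CREATE TABLE orders (id INT) WITH ('connector' = 'datagen')");
        // Per point 1), the printed DDL should use the fully-qualified
        // identifier, e.g. `default_catalog`.`default_database`.`orders`.
        tEnv.executeSql("SHOW CREATE TABLE orders").print();
        // Per point 2), a table registered via the Table API (e.g.
        // createTemporaryView) could not be printed this way.
    }
}
{code}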

Best,
Jark



On Mon, 15 Feb 2021 at 21:33, Ingo Bürk  wrote:

> Hi all,
>
> I have a couple questions about the SHOW CREATE TABLE statement.
>
> 1) Contrary to the example in the FLIP I think the returned DDL should
> always have the table identifier fully-qualified. Otherwise the DDL depends
> on the current context (catalog/database), which could be surprising,
> especially since "the same" table can behave differently if created in
> different catalogs.
> 2) How should this handle tables which cannot be fully characterized by
> properties only? I don't know if there's an example for this yet, but
> hypothetically this is not currently a requirement, right? This isn't as
> much of a problem if this syntax is SQL-client-specific, but if it's
> general Flink SQL syntax we should consider this (one way or another).
>
>
> Regards
> Ingo
>
> On Fri, Feb 12, 2021 at 3:53 PM Timo Walther  wrote:
>
> > Hi Shengkai,
> >
> > thanks for updating the FLIP.
> >
> > I have one last comment for the option `table.execution.mode`. Should we
> > already use the global Flink option `execution.runtime-mode` instead?
> >
> > We are using Flink's options where possible (e.g. `pipeline.name` and
> > `parallelism.default`), why not also for batch/streaming mode?
> >
> > The description of the option matches to the Blink planner behavior:
> >
> > ```
> > Among other things, this controls task scheduling, network shuffle
> > behavior, and time semantics.
> > ```
> >
> > Regards,
> > Timo
> >
> > On 10.02.21 06:30, Shengkai Fang wrote:
> > > Hi, guys.
> > >
> > > I have updated the FLIP.  It seems we have reached agreement. Maybe we
> > can
> > > start the vote soon. If anyone has other questions, please leave your
> > > comments.
> > >
> > > Best,
> > > Shengkai
> > >
> > >> On Tue, Feb 9, 2021 at 7:52 PM, Rui Li wrote:
> > >
> > >> Hi guys,
> > >>
> > >> The conclusion sounds good to me.
> > >>
> > >> On Tue, Feb 9, 2021 at 5:39 PM Shengkai Fang 
> wrote:
> > >>
> > >>> Hi, Timo, Jark.
> > >>>
> > >>> I am fine with the new option name.
> > >>>
> > >>> Best,
> > >>> Shengkai
> > >>>
> > >>> On Tue, Feb 9, 2021 at 5:35 PM, Timo Walther wrote:
> > >>>
> > >>>> Yes, `TableEnvironment#executeMultiSql()` can be future work.
> > >>>>
> > >>>> @Rui, Shengkai: Are you also fine with this conclusion?
> > >>>>
> > >>>> Thanks,
> > >>>> Timo
> > >>>>
> > >>>> On 09.02.21 10:14, Jark Wu wrote:
> > >>>>> I'm fine with `table.multi-dml-sync`.
> > >>>>>
> > >>>>> My previous concern about "multi" is that DML in CLI looks like
> > >> single
> > >>>>> statement.
> > >>>>> But we can treat CLI as a multi-line accepting statements from
> > >> opening
> > >>> to
> > >>>>> closing.
> > >>>>> Thus, I'm fine with `table.multi-dml-sync`.
> > >>>>>
> > >>>>> So the conclusion is `table.multi-dml-sync` (false by default), and
> > >> we
> > >>>> will
> > >>>>> support this config
> > >>>>> in SQL CLI first, will support it in
> > >> TableEnvironment#executeMultiSql()
> > >>>> in
> > >>>>> the future, right?
> > >>>>>
> > >>>>> Best,
> > >>>>> Jark
> > >>>>>
> > >>>>> On Tue, 9 Feb 2021 at 16:37, Timo Walther 
> > >> wrote:
> > >>>>>
> > >>>>>> Hi everyone,
> > >>>>>>
> > >>>>>> I understand Rui's concerns. `table.dml-sync` should not apply to
> > >>>>>> regular `executeSql`. Actually, this option makes only sense when
> > >>>>>> executing multi statements. Once we have a
> > >>>>>> `TableEnvironment.executeMultiSql()` this config could be
> > >> considered.
> > >>>>>>
> > >>>>>> Maybe we can find a better generic name? Other platforms will also
> > >>> need
> > >>>>>> to have this config option, which is why I would like to avoid a
> SQL
> > >>>>>> Client specific option. Otherwise every platform has to come up
> with
> > >>>>>> this important config option separately.
> > >>>>>>
> > >>>>>> Maybe `table.multi-dml-sync` or `table.multi-stmt-sync`? Or other
> > >>> opinions?
> > >>>>>>
> > >>>>>> Regards,
> > >>>>>> Timo
> > >>>>>>
> > >>>>>> On 09.02.21 08:50, Shengkai Fang wrote:
> > >>>>>>> Hi, all.
> > >>>>>>>
> > >>>>>>> I think it may confuse users. The main problem is that we have
> no
> > >>>> means
> > >>>>>>> to detect conflicting configuration, e.g. users set the option to
> > >> true
> > >>>> and
> > >>>>>>> use `TableResult#await` together.
> > >>>>>>>
> > >>>>>>> Best,
> > >>>>>>> Shengkai.
> > >>>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>
> > >>>>
> > >>>>
> > >>>
> > >>
> > >>
> > >> --
> > >> Best regards!
> > >> Rui Li
> > >>
> > >
> >
> >
>


Re: [VOTE] FLIP-164: Improve Schema Handling in Catalogs

2021-02-12 Thread Jark Wu
+1 (binding)

Best,
Jark

> 2021年2月12日 20:37,Dawid Wysakowicz  写道:
> 
> +1 (binding)
> 
> Best,
> 
> Dawid
> 
> On 12/02/2021 13:33, Timo Walther wrote:
>> Hi everyone,
>> 
>> I'd like to start a vote on FLIP-164 [1] which was discussed in [2].
>> 
>> The vote will be open for at least 72 hours. Unless there are any
>> objections, I'll close it by February 17th, 2021 (due to weekend) if
>> we have received sufficient votes.
>> 
>> [1]
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-164%3A+Improve+Schema+Handling+in+Catalogs
>> [2]
>> https://lists.apache.org/thread.html/r79064a448d38d5b3d091dfb4703ae3eb94d5c6dd3970f120be7d5b13%40%3Cdev.flink.apache.org%3E
>> 
>> Regards,
>> Timo
> 



Re: [DISCUSS] FLIP-164: Improve Schema Handling in Catalogs

2021-02-09 Thread Jark Wu
Hi Timo,

1) I'm fine with `Column`, but are we going to introduce new interfaces
for `UniqueConstraint` and `WatermarkSpec`? If we want to introduce
a new stack, it would be better to have a different name, otherwise,
it's easy to use a wrong class for users.

Best,
Jark

On Wed, 10 Feb 2021 at 09:49, Rui Li  wrote:

> I see. Makes sense to me. Thanks Timo for the detailed explanation!
>
> On Tue, Feb 9, 2021 at 9:48 PM Timo Walther  wrote:
>
> > Hi Rui,
> >
> > 1. It depends whether you would like to declare (unresolved) or use
> > (resolved) a schema. In catalogs and APIs, people would actually like to
> > declare a schema. Because the schema might reference objects from other
> > catalogs etc. However, whenever the schema comes out of the framework it
> > is fully resolved and people can use it to configure their UI, connector,
> etc.
> > 2. No, `getTable` doesn't have to return a resolved schema. Actually,
> > this was my initial design (see Rejected Alternatives 1) where we pass
> > the SchemaResolver into the Catalog. However, a catalog must not deal
> > with resolution. When storing a table we need a resolved schema to
> > persist the fully expanded properties; however, when reading those
> > properties in again the schema can be resolved in a later stage.
> >
> > Regards,
> > Timo
> >
> > On 09.02.21 14:07, Rui Li wrote:
> > > Hi Timo,
> > >
> > > Thanks for the FLIP. It looks good to me overall. I have two questions.
> > > 1. When should we use a resolved schema and when to use an unresolved
> > one?
> > > 2. The FLIP mentions only resolved tables/views can be stored into a
> > > catalog. Does that mean the getTable method should also return a
> resolved
> > > object?
> > >
> > > On Tue, Feb 9, 2021 at 6:29 PM Timo Walther 
> wrote:
> > >
> > >> Hi Jark,
> > >>
> > >> thanks for your feedback. Let me answer some of your comments:
> > >>
> > >> 1) Since we decided to build an entire new stack, we can also
> introduce
> > >> better names for columns, constraints, and watermark spec. My goal was
> > >> to shorten the names during this refactoring. Therefore, `TableSchema`
> > >> becomes `Schema` and `TableColumn` becomes `Column`. This also fits
> > >> better to a `CatalogView` that has a schema but is actually not a
> table
> > >> but a view. So `Column` is very generic. What do you think?
> > >>
> > >> 2) `ComputedColumn` and `WatermarkSpec` of the new generation will
> store
> > >> `ResolvedExpression`.
> > >>
> > >> 3) I adopted most of the methods from `TableSchema` in
> `ResolvedSchema`.
> > >> However, I skipped `getColumnDataTypes()` because the behavior is not
> > >> clear to me. Should it include computed columns or virtual metadata
> > >> columns? I think we should force users to think about what they
> require.
> > >> Otherwise we implicitly introduce bugs.
> > >>
> > >> Regards,
> > >> Timo
> > >>
> > >> On 09.02.21 10:56, Jark Wu wrote:
> > >>> Hi Timo,
> > >>>
> > >>> The messy TableSchema confuses many developers.
> > >>> It's great to see we can finally come up with a clean interface
> > hierarchy
> > >>> and still backward compatible.
> > >>>
> > >>> Thanks for preparing the nice FLIP. It looks good to me. I have some
> > >> minor
> > >>> comments:
> > >>>
> > >>> 1) Should `ResolvedSchema#getColumn(int)` returns `TableColumn`
> instead
> > >> of
> > >>> `Column`?
> > >>>
> > >>> 2) You mentioned ResolvedSchema should store ResolvedExpression,
> should
> > >> we
> > >>> extend
> > >>> `ComputedColumn` and `WatermarkSpec` to allow
> `ResolvedExpression`?
> > >>>
> > >>> 3) `ResolvedSchema` aims to replace `TableSchema`, it would be better
> > to
> > >>> add un-deprecated
> > >>> methods of `TableSchema` into `ResolvedSchema`
> > >>> (e.g. `getColumnDataTypes()`).
> > >>> Then users can have a smooth migration.
> > >>>
> > >>> Best,
> > >>> Jark
> > >>>
> > >>> On Mon, 8 Feb 2021 at 20:21, Dawid Wysakowicz <
> dwysakow...@apache.org>
> > >>> wrote:
> > >>>
> > >>>

Re: [DISCUSS] FLIP-164: Improve Schema Handling in Catalogs

2021-02-09 Thread Jark Wu
Hi Timo,

The messy TableSchema confuses many developers.
It's great to see we can finally come up with a clean interface hierarchy
while staying backward compatible.

Thanks for preparing the nice FLIP. It looks good to me. I have some minor
comments:

1) Should `ResolvedSchema#getColumn(int)` return `TableColumn` instead of
`Column`?

2) You mentioned ResolvedSchema should store ResolvedExpression, should we
extend
  `ComputedColumn` and `WatermarkSpec` to allow `ResolvedExpression`?

3) `ResolvedSchema` aims to replace `TableSchema`, it would be better to
add un-deprecated
methods of `TableSchema` into `ResolvedSchema`
(e.g. `getColumnDataTypes()`).
Then users can have a smooth migration.

Best,
Jark

On Mon, 8 Feb 2021 at 20:21, Dawid Wysakowicz 
wrote:

> Hi Timo,
>
> From my perspective the proposed changes look good. I agree it is an
> important step towards FLIP-129 and FLIP-136. Personally I feel
> comfortable voting on the document.
>
> Best,
>
> Dawid
>
> On 05/02/2021 16:09, Timo Walther wrote:
> > Hi everyone,
> >
> > you might have seen that we discussed a better schema API in past as
> > part of FLIP-129 and FLIP-136. We also discussed this topic during
> > different releases:
> >
> > https://issues.apache.org/jira/browse/FLINK-17793
> >
> > Jark and I had an offline discussion how we can finally fix this
> > shortcoming and maintain backwards compatibile for a couple of
> > releases to give people time to update their code.
> >
> > I would like to propose the following FLIP:
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-164%3A+Improve+Schema+Handling+in+Catalogs
> >
> >
> > The FLIP updates the class hierarchy to achieve the following goals:
> >
> > - make it visible whether a schema is resolved or unresolved and when
> > the resolution happens
> > - offer a unified API for FLIP-129, FLIP-136, and catalogs
> > - allow arbitrary data types and expressions in the schema for
> > watermark spec or columns
> > - have access to other catalogs for declaring a data type or
> > expression via CatalogManager
> > - a cleaned up TableSchema
> > - remain backwards compatible in the persisted properties and API
> >
> > Looking forward to your feedback.
> >
> > Thanks,
> > Timo
>
>


Re: [DISCUSS]FLIP-163: SQL Client Improvements

2021-02-09 Thread Jark Wu
I'm fine with `table.multi-dml-sync`.

My previous concern about "multi" is that DML in CLI looks like single
statement.
But we can treat CLI as a multi-line accepting statements from opening to
closing.
Thus, I'm fine with `table.multi-dml-sync`.

So the conclusion is `table.multi-dml-sync` (false by default), and we will
support this config
in SQL CLI first, and will support it in TableEnvironment#executeMultiSql() in
the future, right?

Best,
Jark

On Tue, 9 Feb 2021 at 16:37, Timo Walther  wrote:

> Hi everyone,
>
> I understand Rui's concerns. `table.dml-sync` should not apply to
> regular `executeSql`. Actually, this option makes only sense when
> executing multi statements. Once we have a
> `TableEnvironment.executeMultiSql()` this config could be considered.
>
> Maybe we can find a better generic name? Other platforms will also need
> to have this config option, which is why I would like to avoid a SQL
> Client specific option. Otherwise every platform has to come up with
> this important config option separately.
>
> Maybe `table.multi-dml-sync` or `table.multi-stmt-sync`? Or other opinions?
>
> Regards,
> Timo
>
> On 09.02.21 08:50, Shengkai Fang wrote:
> > Hi, all.
> >
> > I think it may confuse users. The main problem is that we have no means
> > to detect conflicting configuration, e.g. users set the option to true and
> > use `TableResult#await` together.
> >
> > Best,
> > Shengkai.
> >
>
>


[jira] [Created] (FLINK-21327) Support window TVF in batch mode

2021-02-08 Thread Jark Wu (Jira)
Jark Wu created FLINK-21327:
---

 Summary: Support window TVF in batch mode
 Key: FLINK-21327
 URL: https://issues.apache.org/jira/browse/FLINK-21327
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Planner
Reporter: Jark Wu


As a batch and streaming unified engine, we should also support running window 
TVFs in batch mode. Then users can run a query in streaming mode to produce 
data in real time, and run the same query in batch mode to backfill data for a 
specific day.
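
For context, a sketch of the query shape this applies to (schema and connector
are illustrative; the TVF syntax is the one introduced by the window TVF work
for 1.13):

{code:java}
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class WindowTvfBackfillSketch {
    public static void main(String[] args) {
        // Streaming mode produces results in real time; the point of this
        // issue is that inBatchMode() should be able to run the same query
        // as a backfill.
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().inStreamingMode().build());
        tEnv.executeSql(
                "CREATE TABLE Orders ("
                        + "  amount DOUBLE,"
                        + "  order_time TIMESTAMP(3),"
                        + "  WATERMARK FOR order_time AS order_time - INTERVAL '5' SECOND"
                        + ") WITH ('connector' = 'datagen')");
        tEnv.executeSql(
                "SELECT window_start, window_end, SUM(amount) "
                        + "FROM TABLE(TUMBLE(TABLE Orders, DESCRIPTOR(order_time), INTERVAL '10' MINUTES)) "
                        + "GROUP BY window_start, window_end").print();
    }
}
{code}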



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS]FLIP-163: SQL Client Improvements

2021-02-08 Thread Jark Wu
Hi Rui,

That's a good point. From the naming of the option, I would expect to get sync
behavior.
It would be very straightforward that it affects all the DMLs on SQL CLI
and
TableEnvironment (including `executeSql`, `StatementSet`,
`Table#executeInsert`, etc.).
This can also make SQL CLI easy to support this configuration by passing
through to the TableEnv.

Best,
Jark

On Tue, 9 Feb 2021 at 10:07, Rui Li  wrote:

> Hi,
>
> Glad to see we have reached consensus on option #2. +1 to it.
>
> Regarding the name, I'm fine with `table.dml-async`. But I wonder whether
> this config also applies to table API. E.g. if a user
> sets table.dml-async=false and calls TableEnvironment::executeSql to run a
> DML, will he get sync behavior?
>
> On Mon, Feb 8, 2021 at 11:28 PM Jark Wu  wrote:
>
>> Ah, I just forgot the option name.
>>
>> I'm also fine with `table.dml-async`.
>>
>> What do you think @Rui Li  @Shengkai Fang
>>  ?
>>
>> Best,
>> Jark
>>
>> On Mon, 8 Feb 2021 at 23:06, Timo Walther  wrote:
>>
>>> Great to hear that. Can someone update the FLIP a final time before we
>>> start a vote?
>>>
>>> We should quickly discuss how we would like to name the config option
>>> for the async/sync mode. I heard voices internally that are strongly
>>> against calling it "detach" due to historical reasons with a Flink job
>>> detach mode. How about `table.dml-async`?
>>>
>>> Thanks,
>>> Timo
>>>
>>>
>>> On 08.02.21 15:55, Jark Wu wrote:
>>> > Thanks Timo,
>>> >
>>> > I'm +1 for option#2 too.
>>> >
>>> > I think we have addressed all the concerns and can start a vote.
>>> >
>>> > Best,
>>> > Jark
>>> >
>>> > On Mon, 8 Feb 2021 at 22:19, Timo Walther  wrote:
>>> >
>>> >> Hi Jark,
>>> >>
>>> >> you are right. Nesting STATEMENT SET and ASYNC might be too verbose.
>>> >>
>>> >> So let's stick to the config option approach.
>>> >>
>>> >> However, I strongly believe that we should not use the batch/streaming
>>> >> mode for deriving semantics. This discussion is similar to time
>>> function
>>> >> discussion. We should not derive sync/async submission behavior from a
>>> >> flag that should only influence runtime operators and the incremental
>>> >> computation. Statements for bounded streams should have the same
>>> >> semantics in batch mode.
>>> >>
>>> >> I think your proposed option 2) is a good tradeoff. For the following
>>> >> reasons:
>>> >>
>>> >> pros:
>>> >> - by default, batch and streaming behave exactly the same
>>> >> - SQL Client CLI behavior does not change compared to 1.12 and remains
>>> >> async for batch and streaming
>>> >> - consistent with the async Table API behavior
>>> >>
>>> >> con:
>>> >> - batch files are not 100% SQL compliant by default
>>> >>
>>> >> The last item might not be an issue since we can expect that users
>>> have
>>> >> long-running jobs and prefer async execution in most cases.
>>> >>
>>> >> Regards,
>>> >> Timo
>>> >>
>>> >>
>>> >> On 08.02.21 14:15, Jark Wu wrote:
>>> >>> Hi Timo,
>>> >>>
>>> >>> Actually, I'm not in favor of explicit syntax `BEGIN ASYNC;... END;`.
>>> >>> Because it makes submitting streaming jobs very verbose, every INSERT
>>> >> INTO
>>> >>> and STATEMENT SET must be wrapped in the ASYNC clause which is
>>> >>> not user-friendly and not backward-compatible.
>>> >>>
>>> >>> I agree we will have unified behavior but this is at the cost of
>>> hurting
>>> >>> our main users.
>>> >>> I'm worried that end users can't understand the technical decision,
>>> and
>>> >>> they would
>>> >>> feel streaming is harder to use.
>>> >>>
>>> >>> If we want to have an unified behavior, and let users decide what's
>>> the
>>> >>> desirable behavior, I prefer to have a config option. A Flink
>>> cluster can
>>> >>> be set to async, then
>>> >>> users don't need to wrap every DM

Re: [DISCUSS]FLIP-163: SQL Client Improvements

2021-02-08 Thread Jark Wu
Ah, I just forgot the option name.

I'm also fine with `table.dml-async`.

What do you think @Rui Li  @Shengkai Fang
 ?

Best,
Jark

On Mon, 8 Feb 2021 at 23:06, Timo Walther  wrote:

> Great to hear that. Can someone update the FLIP a final time before we
> start a vote?
>
> We should quickly discuss how we would like to name the config option
> for the async/sync mode. I heard voices internally that are strongly
> against calling it "detach" due to historical reasons with a Flink job
> detach mode. How about `table.dml-async`?
>
> Thanks,
> Timo
>
>
> On 08.02.21 15:55, Jark Wu wrote:
> > Thanks Timo,
> >
> > I'm +1 for option#2 too.
> >
> > I think we have addressed all the concerns and can start a vote.
> >
> > Best,
> > Jark
> >
> > On Mon, 8 Feb 2021 at 22:19, Timo Walther  wrote:
> >
> >> Hi Jark,
> >>
> >> you are right. Nesting STATEMENT SET and ASYNC might be too verbose.
> >>
> >> So let's stick to the config option approach.
> >>
> >> However, I strongly believe that we should not use the batch/streaming
> >> mode for deriving semantics. This discussion is similar to time function
> >> discussion. We should not derive sync/async submission behavior from a
> >> flag that should only influence runtime operators and the incremental
> >> computation. Statements for bounded streams should have the same
> >> semantics in batch mode.
> >>
> >> I think your proposed option 2) is a good tradeoff. For the following
> >> reasons:
> >>
> >> pros:
> >> - by default, batch and streaming behave exactly the same
> >> - SQL Client CLI behavior does not change compared to 1.12 and remains
> >> async for batch and streaming
> >> - consistent with the async Table API behavior
> >>
> >> con:
> >> - batch files are not 100% SQL compliant by default
> >>
> >> The last item might not be an issue since we can expect that users have
> >> long-running jobs and prefer async execution in most cases.
> >>
> >> Regards,
> >> Timo
> >>
> >>
> >> On 08.02.21 14:15, Jark Wu wrote:
> >>> Hi Timo,
> >>>
> >>> Actually, I'm not in favor of explicit syntax `BEGIN ASYNC;... END;`.
> >>> Because it makes submitting streaming jobs very verbose, every INSERT
> >> INTO
> >>> and STATEMENT SET must be wrapped in the ASYNC clause which is
> >>> not user-friendly and not backward-compatible.
> >>>
> >>> I agree we will have unified behavior but this is at the cost of
> hurting
> >>> our main users.
> >>> I'm worried that end users can't understand the technical decision, and
> >>> they would
> >>> feel streaming is harder to use.
> >>>
> >>> If we want to have an unified behavior, and let users decide what's the
> >>> desirable behavior, I prefer to have a config option. A Flink cluster
> can
> >>> be set to async, then
> >>> users don't need to wrap every DML in an ASYNC clause. This is the
> least
> >>> intrusive
> >>> way to the users.
> >>>
> >>>
> >>> Personally, I'm fine with following options in priority:
> >>>
> >>> 1) sync for batch DML and async for streaming DML
> >>> ==> only breaks batch behavior, but makes both happy
> >>>
> >>> 2) async for both batch and streaming DML, and can be set to sync via a
> >>> configuration.
> >>> ==> compatible, and provides flexible configurable behavior
> >>>
> >>> 3) sync for both batch and streaming DML, and can be
> >>>   set to async via a configuration.
> >>> ==> +0 for this, because it breaks all the compatibility, esp. our main
> >>> users.
> >>>
> >>> Best,
> >>> Jark
> >>>
> >>> On Mon, 8 Feb 2021 at 17:34, Timo Walther  wrote:
> >>>
> >>>> Hi Jark, Hi Rui,
> >>>>
> >>>> 1) How should we execute statements in CLI and in file? Should there
> be
> >>>> a difference?
> >>>> So it seems we have consensus here with unified bahavior. Even though
> >>>> this means we are breaking existing batch INSERT INTOs that were
> >>>> asynchronous before.
> >>>>
> >>>> 2) Should we have different behavior for batch and streaming?

Re: [DISCUSS]FLIP-163: SQL Client Improvements

2021-02-08 Thread Jark Wu
Thanks Timo,

I'm +1 for option#2 too.

I think we have addressed all the concerns and can start a vote.

Best,
Jark

On Mon, 8 Feb 2021 at 22:19, Timo Walther  wrote:

> Hi Jark,
>
> you are right. Nesting STATEMENT SET and ASYNC might be too verbose.
>
> So let's stick to the config option approach.
>
> However, I strongly believe that we should not use the batch/streaming
> mode for deriving semantics. This discussion is similar to time function
> discussion. We should not derive sync/async submission behavior from a
> flag that should only influence runtime operators and the incremental
> computation. Statements for bounded streams should have the same
> semantics in batch mode.
>
> I think your proposed option 2) is a good tradeoff. For the following
> reasons:
>
> pros:
> - by default, batch and streaming behave exactly the same
> - SQL Client CLI behavior does not change compared to 1.12 and remains
> async for batch and streaming
> - consistent with the async Table API behavior
>
> con:
> - batch files are not 100% SQL compliant by default
>
> The last item might not be an issue since we can expect that users have
> long-running jobs and prefer async execution in most cases.
>
> Regards,
> Timo
>
>
> On 08.02.21 14:15, Jark Wu wrote:
> > Hi Timo,
> >
> > Actually, I'm not in favor of explicit syntax `BEGIN ASYNC;... END;`.
> > Because it makes submitting streaming jobs very verbose, every INSERT
> INTO
> > and STATEMENT SET must be wrapped in the ASYNC clause which is
> > not user-friendly and not backward-compatible.
> >
> > I agree we will have unified behavior but this is at the cost of hurting
> > our main users.
> > I'm worried that end users can't understand the technical decision, and
> > they would
> > feel streaming is harder to use.
> >
> > If we want to have an unified behavior, and let users decide what's the
> > desirable behavior, I prefer to have a config option. A Flink cluster can
> > be set to async, then
> > users don't need to wrap every DML in an ASYNC clause. This is the least
> > intrusive
> > way to the users.
> >
> >
> > Personally, I'm fine with following options in priority:
> >
> > 1) sync for batch DML and async for streaming DML
> > ==> only breaks batch behavior, but makes both happy
> >
> > 2) async for both batch and streaming DML, and can be set to sync via a
> > configuration.
> > ==> compatible, and provides flexible configurable behavior
> >
> > 3) sync for both batch and streaming DML, and can be
> >  set to async via a configuration.
> > ==> +0 for this, because it breaks all the compatibility, esp. our main
> > users.
> >
> > Best,
> > Jark
> >
> > On Mon, 8 Feb 2021 at 17:34, Timo Walther  wrote:
> >
> >> Hi Jark, Hi Rui,
> >>
> >> 1) How should we execute statements in CLI and in file? Should there be
> >> a difference?
> >> So it seems we have consensus here with unified behavior. Even though
> >> this means we are breaking existing batch INSERT INTOs that were
> >> asynchronous before.
> >>
> >> 2) Should we have different behavior for batch and streaming?
> >> I think also batch users prefer async behavior because usually even
> >> those pipelines take some time to execute. But we need should stick to
> >> standard SQL blocking semantics.
> >>
> >> What are your opinions on making async explicit in SQL via `BEGIN ASYNC;
> >> ... END;`? This would allow us to really have unified semantics because
> >> batch and streaming would behave the same?
> >>
> >> Regards,
> >> Timo
> >>
> >>
> >> On 07.02.21 04:46, Rui Li wrote:
> >>> Hi Timo,
> >>>
> >>> I agree with Jark that we should provide consistent experience
> regarding
> >>> SQL CLI and files. Some systems even allow users to execute SQL files
> in
> >>> the CLI, e.g. the "SOURCE" command in MySQL. If we want to support that
> >> in
> >>> the future, it's a little tricky to decide whether that should be
> treated
> >>> as CLI or file.
> >>>
> >>> I actually prefer a config option and let users decide what's the
> >>> desirable behavior. But if we have agreed not to use options, I'm also
> >> fine
> >>> with Alternative #1.
> >>>
> >>> On Sun, Feb 7, 2021 at 11:01 AM Jark Wu  wrote:
> >>>
> >>>> Hi Timo,
> >>>>
> >>>>

Re: [DISCUSS]FLIP-163: SQL Client Improvements

2021-02-08 Thread Jark Wu
Hi Timo,

Actually, I'm not in favor of the explicit syntax `BEGIN ASYNC; ... END;`,
because it makes submitting streaming jobs very verbose: every INSERT INTO
and STATEMENT SET must be wrapped in an ASYNC clause, which is
neither user-friendly nor backward-compatible.

I agree we will have unified behavior but this is at the cost of hurting
our main users.
I'm worried that end users can't understand the technical decision, and
they would
feel streaming is harder to use.

If we want to have a unified behavior and let users decide what the
desirable behavior is, I prefer to have a config option. A Flink cluster can
be set to async; then
users don't need to wrap every DML in an ASYNC clause. This is the least
intrusive
way for users.


Personally, I'm fine with following options in priority:

1) sync for batch DML and async for streaming DML
==> only breaks batch behavior, but makes both happy

2) async for both batch and streaming DML, and can be set to sync via a
configuration.
==> compatible, and provides flexible configurable behavior

3) sync for both batch and streaming DML, and can be
set to async via a configuration.
==> +0 for this, because it breaks all the compatibility, esp. our main
users.

Best,
Jark

On Mon, 8 Feb 2021 at 17:34, Timo Walther  wrote:

> Hi Jark, Hi Rui,
>
> 1) How should we execute statements in CLI and in file? Should there be
> a difference?
> So it seems we have consensus here with unified behavior. Even though
> this means we are breaking existing batch INSERT INTOs that were
> asynchronous before.
>
> 2) Should we have different behavior for batch and streaming?
> I think also batch users prefer async behavior because usually even
> those pipelines take some time to execute. But we should stick to
> standard SQL blocking semantics.
>
> What are your opinions on making async explicit in SQL via `BEGIN ASYNC;
> ... END;`? This would allow us to really have unified semantics because
> batch and streaming would behave the same?
>
> Regards,
> Timo
>
>
> On 07.02.21 04:46, Rui Li wrote:
> > Hi Timo,
> >
> > I agree with Jark that we should provide consistent experience regarding
> > SQL CLI and files. Some systems even allow users to execute SQL files in
> > the CLI, e.g. the "SOURCE" command in MySQL. If we want to support that
> in
> > the future, it's a little tricky to decide whether that should be treated
> > as CLI or file.
> >
> > I actually prefer a config option and let users decide what's the
> > desirable behavior. But if we have agreed not to use options, I'm also
> fine
> > with Alternative #1.
> >
> > On Sun, Feb 7, 2021 at 11:01 AM Jark Wu  wrote:
> >
> >> Hi Timo,
> >>
> >> 1) How should we execute statements in CLI and in file? Should there be
> a
> >> difference?
> >> I do think we should unify the behavior of CLI and SQL files. SQL files
> can
> >> be thought of as a shortcut of
> >> "start CLI" => "copy content of SQL files" => "past content in CLI".
> >> Actually, we already did this in kafka_e2e.sql [1].
> >> I think it's hard for users to understand why SQL files behave
> differently
> >> from CLI, all the other systems don't have such a difference.
> >>
> >> If we distinguish SQL files and CLI, should there be a difference in
> JDBC
> >> driver and UI platform?
> >> Personally, they all should have consistent behavior.
> >>
> >> 2) Should we have different behavior for batch and streaming?
> >> I think we all agree streaming users prefer async execution, otherwise
> it's
> >> weird and difficult to use if the
> >> submit script or CLI never exits. On the other hand, batch SQL users
> are
> >> used to SQL statements being
> >> executed in a blocking way.
> >>
> >> Either unified async execution or unified sync execution, will hurt one
> >> side of the streaming
> >> batch users. In order to make both sides happy, I think we can have
> >> different behavior for batch and streaming.
> >> There are many essential differences between batch and stream systems, I
> >> think it's normal to have some
> >> different behaviors, and the behavior doesn't break the unified batch
> >> stream semantics.
> >>
> >>
> >> Thus, I'm +1 to Alternative 1:
> >> We consider batch/streaming mode and block for batch INSERT INTO and
> async
> >> for streaming INSERT INTO/STATEMENT SET.
> >> And this behavior is consistent across CLI and files.
> >>
> >> Best,
> >> Jark
> >>
> >> [1]:
>

Re: [VOTE] FLIP-152: Hive Query Syntax Compatibility

2021-02-07 Thread Jark Wu
Thanks for driving this.

+1

Best,
Jark

On Mon, 8 Feb 2021 at 09:47, Kurt Young  wrote:

> +1
>
> Best,
> Kurt
>
>
> On Sun, Feb 7, 2021 at 7:24 PM Rui Li  wrote:
>
> > Hi everyone,
> >
> > I think we have reached some consensus on FLIP-152 [1] in the discussion
> > thread [2]. So I'd like to start the vote for this FLIP.
> >
> > The vote will be open for 72 hours, until Feb. 10 2021 01:00 PM UTC,
> unless
> > there's an objection.
> >
> > [1]
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-152%3A+Hive+Query+Syntax+Compatibility
> > [2]
> >
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-152-Hive-Query-Syntax-Compatibility-td46928.html
> >
> > --
> > Best regards!
> > Rui Li
> >
>


Re: [DISCUSS]FLIP-163: SQL Client Improvements

2021-02-06 Thread Jark Wu
Hi Timo,

1) How should we execute statements in CLI and in file? Should there be a
difference?
I do think we should unify the behavior of CLI and SQL files. SQL files can
be thought of as a shortcut of
"start CLI" => "copy content of SQL files" => "past content in CLI".
Actually, we already did this in kafka_e2e.sql [1].
I think it's hard for users to understand why SQL files behave differently
from the CLI; all the other systems don't have such a difference.

If we distinguish SQL files and CLI, should there be a difference in JDBC
driver and UI platform?
Personally, they all should have consistent behavior.

2) Should we have different behavior for batch and streaming?
I think we all agree streaming users prefer async execution, otherwise it's
weird and difficult to use if the
submit script or CLI never exits. On the other hand, batch SQL users are
used to SQL statements being
executed in a blocking way.

Either unified async execution or unified sync execution will hurt one
side of the streaming or
batch users. In order to make both sides happy, I think we can have
different behavior for batch and streaming.
There are many essential differences between batch and stream systems, so I
think it's normal to have some
different behaviors, and this doesn't break the unified batch and
stream semantics.


Thus, I'm +1 to Alternative 1:
We consider batch/streaming mode and block for batch INSERT INTO and async
for streaming INSERT INTO/STATEMENT SET.
And this behavior is consistent across CLI and files.

Best,
Jark

[1]:
https://github.com/apache/flink/blob/master/flink-end-to-end-tests/flink-end-to-end-tests-common-kafka/src/test/resources/kafka_e2e.sql

On Fri, 5 Feb 2021 at 21:49, Timo Walther  wrote:

> Hi Jark,
>
> thanks for the summary. I hope we can also find a good long-term
> solution on the async/sync execution behavior topic.
>
> It should be discussed in a bigger round because it is (similar to the
> time function discussion) related to batch-streaming unification where
> we should stick to the SQL standard to some degree but also need to come
> up with good streaming semantics.
>
> Let me summarize the problem again to hear opinions:
>
> - Batch SQL users are used to execute SQL files sequentially (from top
> to bottom).
> - Batch SQL users are used to SQL statements being executed blocking.
> One after the other. Esp. when moving around data with INSERT INTO.
> - Streaming users prefer async execution because unbounded streams are
> more frequent than bounded streams.
> - We decided to make the Flink Table API async because in a programming
> language it is easy to call `.await()` on the result to make it blocking.
> - INSERT INTO statements in the current SQL Client implementation are
> always submitted asynchronously.
> - Other clients such as the Ververica platform allow only one INSERT INTO
> or a STATEMENT SET at the end of a file that will run asynchronously.
>
> Questions:
>
> - How should we execute statements in CLI and in file? Should there be a
> difference?
> - Should we have different behavior for batch and streaming?
> - Shall we solve parts with a config option or is it better to make it
> explicit in the SQL job definition because it influences the semantics
> of multiple INSERT INTOs?
>
> Let me summarize my opinion at the moment:
>
> - SQL files should always be executed blocking by default, because they
> could potentially contain a long list of INSERT INTO statements. This
> would be SQL-standard compliant.
> - If we allow async execution, we should make this explicit in the SQL
> file via `BEGIN ASYNC; ... END;`.
> - In the CLI, we always execute async to maintain the old behavior. We
> can also assume that people are only using the CLI to fire statements
> and close the CLI afterwards.
>
> Alternative 1:
> - We consider batch/streaming mode and block for batch INSERT INTO and
> async for streaming INSERT INTO/STATEMENT SET
>
> What do others think?
>
> Regards,
> Timo
>
>
>
>
> On 05.02.21 04:03, Jark Wu wrote:
> > Hi all,
> >
> > After an offline discussion with Timo and Kurt, we have reached some
> > consensus.
> > Please correct me if I am wrong or missed anything.
> >
> > 1) We will introduce "table.planner" and "table.execution-mode" instead
> of
> > "sql-client" prefix,
> > and add `TableEnvironment.create(Configuration)` interface. These 2
> options
> > can only be used
> > for tableEnv initialization. If used after initialization, Flink should
> > throw an exception. We may
> > support dynamically switching the planner in the future.
> >
> > 2) We will have only one parser,
> > i.e. org.apache.flink.table.delegation.Parser. It accepts a string

[jira] [Created] (FLINK-21305) Cumulative window should accumulate late events belonging to the cleaned slice

2021-02-05 Thread Jark Wu (Jira)
Jark Wu created FLINK-21305:
---

 Summary: Cumulative window should accumulate late events belonging 
to the cleaned slice
 Key: FLINK-21305
 URL: https://issues.apache.org/jira/browse/FLINK-21305
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Planner
Reporter: Jark Wu
 Fix For: 1.13.0


Currently, the CUMULATE window drops elements belonging to the cleaned slices. 
This leads to less accurate results than without the slicing optimization.
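
A sketch of an affected query for reference (names are illustrative):

{code:java}
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class CumulateWindowSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().inStreamingMode().build());
        tEnv.executeSql(
                "CREATE TABLE Orders ("
                        + "  amount DOUBLE,"
                        + "  order_time TIMESTAMP(3),"
                        + "  WATERMARK FOR order_time AS order_time - INTERVAL '5' SECOND"
                        + ") WITH ('connector' = 'datagen')");
        // A late event whose 10-minute slice has already been cleaned is
        // currently dropped, even though later cumulative windows (up to the
        // 1-hour max size) have not fired yet and could still accumulate it.
        tEnv.executeSql(
                "SELECT window_start, window_end, COUNT(*) "
                        + "FROM TABLE(CUMULATE(TABLE Orders, DESCRIPTOR(order_time), "
                        + "INTERVAL '10' MINUTES, INTERVAL '1' HOUR)) "
                        + "GROUP BY window_start, window_end").print();
    }
}
{code}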



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-21304) Support split distinct aggregate for window TVF based aggregate

2021-02-05 Thread Jark Wu (Jira)
Jark Wu created FLINK-21304:
---

 Summary: Support split distinct aggregate for window TVF based 
aggregate
 Key: FLINK-21304
 URL: https://issues.apache.org/jira/browse/FLINK-21304
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Planner
Reporter: Jark Wu
 Fix For: 1.13.0


Currently, the optimization {{table.optimizer.distinct-agg.split.enabled}} is only 
supported for unbounded aggregates. It is a very important optimization for 
skewed distinct aggregates, and we also need it for window TVF 
based aggregates.
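
For reference, this is how the existing switch is enabled today (it is only
documented for unbounded aggregates; the proposal is to make the same option
take effect for window TVF based aggregates):

{code:java}
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class SplitDistinctAggConfigSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().inStreamingMode().build());
        // Existing optimizer option (unbounded aggregates only, today).
        tEnv.getConfig().getConfiguration()
                .setString("table.optimizer.distinct-agg.split.enabled", "true");
    }
}
{code}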



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-21290) Support Projection push down for Window TVF

2021-02-04 Thread Jark Wu (Jira)
Jark Wu created FLINK-21290:
---

 Summary: Support Projection push down for Window TVF
 Key: FLINK-21290
 URL: https://issues.apache.org/jira/browse/FLINK-21290
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Planner
Reporter: Jark Wu


{code:scala}
  @Test
  def testTumble_ProjectionPushDown(): Unit = {
// TODO: [b, c, e, proctime] are never used, should be pruned
val sql =
  """
|SELECT
|   a,
|   window_start,
|   window_end,
|   count(*),
|   sum(d)
|FROM TABLE(TUMBLE(TABLE MyTable, DESCRIPTOR(rowtime), INTERVAL '15' 
MINUTE))
|GROUP BY a, window_start, window_end
  """.stripMargin
util.verifyRelPlan(sql)
  }
{code}

For the above test, currently we get the following plan:

{code}
Calc(select=[a, window_start, window_end, EXPR$3, EXPR$4])
+- WindowAggregate(groupBy=[a], window=[TUMBLE(time_col=[rowtime], size=[15 
min])], select=[a, COUNT(*) AS EXPR$3, SUM(d) AS EXPR$4, start('w$) AS 
window_start, end('w$) AS window_end])
   +- Exchange(distribution=[hash[a]])
  +- Calc(select=[a, d, rowtime])
 +- WatermarkAssigner(rowtime=[rowtime], watermark=[-(rowtime, 
1000:INTERVAL SECOND)])
+- Calc(select=[a, b, c, d, e, rowtime, PROCTIME() AS proctime])
   +- TableSourceScan(table=[[default_catalog, default_database, 
MyTable]], fields=[a, b, c, d, e, rowtime])
{code}

It should be able to prune fields and get the following plan:

{code}
Calc(select=[a, window_start, window_end, EXPR$3, EXPR$4])
+- WindowAggregate(groupBy=[a], window=[TUMBLE(time_col=[rowtime], size=[15 
min])], select=[a, COUNT(*) AS EXPR$3, SUM(d) AS EXPR$4, start('w$) AS 
window_start, end('w$) AS window_end])
   +- Exchange(distribution=[hash[a]])
 +- WatermarkAssigner(rowtime=[rowtime], watermark=[-(rowtime, 
1000:INTERVAL SECOND)])
   +- TableSourceScan(table=[[default_catalog, default_database, 
MyTable]], fields=[a, d, rowtime])
{code}

The reason is that we didn't transpose Project and WindowTableFunction in the 
logical phase. 

{code}
LogicalAggregate(group=[{0, 1, 2}], EXPR$3=[COUNT()], EXPR$4=[SUM($3)])
+- LogicalProject(a=[$0], window_start=[$7], window_end=[$8], d=[$3])
   +- LogicalTableFunctionScan(invocation=[TUMBLE($6, DESCRIPTOR($5), 
90:INTERVAL MINUTE)], rowType=[RecordType(INTEGER a, BIGINT b, 
VARCHAR(2147483647) c, DECIMAL(10, 3) d, BIGINT e, TIME ATTRIBUTE(ROWTIME) 
rowtime, TIME ATTRIBUTE(PROCTIME) proctime, TIMESTAMP(3) window_start, 
TIMESTAMP(3) window_end, TIME ATTRIBUTE(ROWTIME) window_time)])
  +- LogicalProject(a=[$0], b=[$1], c=[$2], d=[$3], e=[$4], rowtime=[$5], 
proctime=[$6])
 +- LogicalWatermarkAssigner(rowtime=[rowtime], watermark=[-($5, 
1000:INTERVAL SECOND)])
+- LogicalProject(a=[$0], b=[$1], c=[$2], d=[$3], e=[$4], 
rowtime=[$5], proctime=[PROCTIME()])
   +- LogicalTableScan(table=[[default_catalog, default_database, 
MyTable]])
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS]FLIP-163: SQL Client Improvements

2021-02-04 Thread Jark Wu
Hi all,

After an offline discussion with Timo and Kurt, we have reached some
consensus.
Please correct me if I am wrong or missed anything.

1) We will introduce "table.planner" and "table.execution-mode" instead of
"sql-client" prefix,
and add `TableEnvironment.create(Configuration)` interface. These 2 options
can only be used
for tableEnv initialization. If used after initialization, Flink should
throw an exception (see the sketch after this list). We may
support dynamically switching the planner in the future.

2) We will have only one parser,
i.e. org.apache.flink.table.delegation.Parser. It accepts a string
statement, and returns a list of Operations. It will first use regex to
match some special statements,
e.g. SET, ADD JAR; others will be delegated to the underlying Calcite
parser. The Parser can
have different implementations, e.g. HiveParser.

3) We only support ADD JAR, REMOVE JAR, SHOW JAR for the Flink dialect, but we
can allow
DELETE JAR, LIST JAR in the Hive dialect through HiveParser.

4) We don't have a conclusion for async/sync execution behavior yet.
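
To make point 1) concrete, a hypothetical sketch; neither the two option names
nor `TableEnvironment.create(Configuration)` exist yet at this point (they are
the proposal itself), so the factory call is left commented out:

{code:java}
import org.apache.flink.configuration.Configuration;

public class TableEnvFromConfigurationSketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        conf.setString("table.planner", "blink");            // proposed option
        conf.setString("table.execution-mode", "streaming"); // proposed option
        // The proposed factory would read both options exactly once here;
        // changing them afterwards should throw an exception.
        // TableEnvironment tEnv = TableEnvironment.create(conf);
    }
}
{code}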

Best,
Jark



On Thu, 4 Feb 2021 at 17:50, Jark Wu  wrote:

> Hi Ingo,
>
> Since we have supported the WITH syntax and SET command since v1.9 [1][2],
> and
> we have never received such complaints, I think it's fine for such
> differences.
>
> Besides, the TBLPROPERTIES clause of CREATE TABLE in Hive also requires
> string literal keys[3],
> and the SET key=value command doesn't allow quoted keys [4].
>
> Best,
> Jark
>
> [1]:
> https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/connect.html
> [2]:
> https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/sqlClient.html#running-sql-queries
> [3]: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL
> [4]: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli
> (search "set mapred.reduce.tasks=32")
>
> On Thu, 4 Feb 2021 at 17:09, Ingo Bürk  wrote:
>
>> Hi,
>>
>> regarding the (un-)quoted question, compatibility is of course an
>> important
>> argument, but in terms of consistency I'd find it a bit surprising that
>> WITH handles it differently than SET, and I wonder if that could cause
>> friction for developers when writing their SQL.
>>
>>
>> Regards
>> Ingo
>>
>> On Thu, Feb 4, 2021 at 9:38 AM Jark Wu  wrote:
>>
>> > Hi all,
>> >
>> > Regarding "One Parser", I think it's not possible for now because
>> Calcite
>> > parser can't parse
>> > special characters (e.g. "-") unless quoting them as string literals.
>> > That's why the WITH option
>> > key are string literals not identifiers.
>> >
>> > SET table.exec.mini-batch.enabled = true and ADD JAR
>> > /local/my-home/test.jar
>> > have the same
>> > problems. That's why we propose two parser, one splits lines into
>> multiple
>> > statements and match special
>> > command through regex which is light-weight, and delegate other
>> statements
>> > to the other parser which is Calcite parser.
>> >
>> > Note: we should stick to the unquoted SET table.exec.mini-batch.enabled
>> =
>> > true syntax,
>> > both for backward-compatibility and easy-to-use, and all the other
>> systems
>> > don't have quotes on the key.
>> >
>> >
>> > Regarding "table.planner" vs "sql-client.planner",
>> > if we want to use "table.planner", I think we should explain clearly
>> what's
>> > the scope it can be used in documentation.
>> > Otherwise, there will be users complaining why the planner doesn't
>> change
>> > when setting the configuration on TableEnv.
>> > Would be better throwing an exception to indicate to users it's not
>> allowed to
>> > change planner after TableEnv is initialized.
>> > However, it seems not easy to implement.
>> >
>> > Best,
>> > Jark
>> >
>> > On Thu, 4 Feb 2021 at 15:49, godfrey he  wrote:
>> >
>> > > Hi everyone,
>> > >
>> > > Regarding "table.planner" and "table.execution-mode"
>> > > If we define that those two options are just used to initialize the
>> > > TableEnvironment, +1 for introducing table options instead of
>> sql-client
>> > > options.
>> > >
>> > > Regarding "the sql client, we will maintain two parsers", I want to
>> give
>> > > more inputs:
>> > > We want to introduce sql-gateway into the Flink project (see FLIP-24 &
>> > > FLIP-91 

Re: [DISCUSS]FLIP-163: SQL Client Improvements

2021-02-04 Thread Jark Wu
Hi Ingo,

Since we have supported the WITH syntax and SET command since v1.9 [1][2],
and
we have never received such complaints, I think such differences are
fine.

Besides, the TBLPROPERTIES clause of CREATE TABLE in Hive also requires
string literal keys[3],
and the SET key=value command doesn't allow quoted keys [4].

Best,
Jark

[1]:
https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/connect.html
[2]:
https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/sqlClient.html#running-sql-queries
[3]: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL
[4]: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli
(search "set mapred.reduce.tasks=32")

On Thu, 4 Feb 2021 at 17:09, Ingo Bürk  wrote:

> Hi,
>
> regarding the (un-)quoted question, compatibility is of course an important
> argument, but in terms of consistency I'd find it a bit surprising that
> WITH handles it differently than SET, and I wonder if that could cause
> friction for developers when writing their SQL.
>
>
> Regards
> Ingo
>
> On Thu, Feb 4, 2021 at 9:38 AM Jark Wu  wrote:
>
> > Hi all,
> >
> > Regarding "One Parser", I think it's not possible for now because Calcite
> > parser can't parse
> > special characters (e.g. "-") unless quoting them as string literals.
> > That's why the WITH option
> > key are string literals not identifiers.
> >
> > SET table.exec.mini-batch.enabled = true and ADD JAR
> > /local/my-home/test.jar
> > have the same
> > problems. That's why we propose two parser, one splits lines into
> multiple
> > statements and match special
> > command through regex which is light-weight, and delegate other
> statements
> > to the other parser which is Calcite parser.
> >
> > Note: we should stick to the unquoted SET table.exec.mini-batch.enabled =
> > true syntax,
> > both for backward-compatibility and easy-to-use, and all the other
> systems
> > don't have quotes on the key.
> >
> >
> > Regarding "table.planner" vs "sql-client.planner",
> > if we want to use "table.planner", I think we should explain clearly
> what's
> > the scope it can be used in documentation.
> > Otherwise, there will be users complaining why the planner doesn't change
> > when setting the configuration on TableEnv.
> > Would be better throwing an exception to indicate to users it's not allowed
> to
> > change planner after TableEnv is initialized.
> > However, it seems not easy to implement.
> >
> > Best,
> > Jark
> >
> > On Thu, 4 Feb 2021 at 15:49, godfrey he  wrote:
> >
> > > Hi everyone,
> > >
> > > Regarding "table.planner" and "table.execution-mode"
> > > If we define that those two options are just used to initialize the
> > > TableEnvironment, +1 for introducing table options instead of
> sql-client
> > > options.
> > >
> > > Regarding "the sql client, we will maintain two parsers", I want to
> give
> > > more inputs:
> > > We want to introduce sql-gateway into the Flink project (see FLIP-24 &
> > > FLIP-91 for more info [1] [2]). In the "gateway" mode, the CLI client
> and
> > > the gateway service will communicate through Rest API. The " ADD JAR
> > > /local/path/jar " will be executed in the CLI client machine. So when
> we
> > > submit a sql file which contains multiple statements, the CLI client
> > needs
> > > to pick out the "ADD JAR" line, and also statements need to be
> submitted
> > or
> > > executed one by one to make sure the result is correct. The sql file
> may
> > be
> > > look like:
> > >
> > > SET xxx=yyy;
> > > create table my_table ...;
> > > create table my_sink ...;
> > > ADD JAR /local/path/jar1;
> > > create function my_udf as comMyUdf;
> > > insert into my_sink select ..., my_udf(xx) from ...;
> > > REMOVE JAR /local/path/jar1;
> > > drop function my_udf;
> > > ADD JAR /local/path/jar2;
> > > create function my_udf as comMyUdf2;
> > > insert into my_sink select ..., my_udf(xx) from ...;
> > >
> > > The lines need to be splitted into multiple statements first in the CLI
> > > client, there are two approaches:
> > > 1. The CLI client depends on the sql-parser: the sql-parser splits the
> > > lines and tells which lines are "ADD JAR".
> > > pro: there is only one parser
> > > cons: It's a little heavy that the CLI

Re: [DISCUSS]FLIP-163: SQL Client Improvements

2021-02-04 Thread Jark Wu
tory offline to determine how we
> > > proceed with this in the near future.
> > >
> > > 1) "whether the table environment has the ability to update itself"
> > >
> > > Maybe there was some misunderstanding. I don't think that we should
> > > support `tEnv.getConfig.getConfiguration.setString("table.planner",
> > > "old")`. Instead I'm proposing to support
> > > `TableEnvironment.create(Configuration)` where planner and execution
> > > mode are read immediately and a subsequent changes to these options
> will
> > > have no effect. We are doing it similar in `new
> > > StreamExecutionEnvironment(Configuration)`. These two ConfigOption's
> > > must not be SQL Client specific but can be part of the core table code
> > > base. Many users would like to get a 100% preconfigured environment
> from
> > > just Configuration. And this is not possible right now. We can solve
> > > both use cases in one change.
> > >
> > > 2) "the sql client, we will maintain two parsers"
> > >
> > > I remember we had some discussion about this and decided that we would
> > > like to maintain only one parser. In the end it is "One Flink SQL"
> where
> > > commands influence each other also with respect to keywords. It should
> > > be fine to include the SQL Client commands in the Flink parser. Of
> > > course the table environment would not be able to handle the
> `Operation`
> > > instance that would be the result but we can introduce hooks to handle
> > > those `Operation`s. Or we introduce parser extensions.
> > >
> > > Can we skip `table.job.async` in the first version? We should further
> > > discuss whether we introduce a special SQL clause for wrapping async
> > > behavior or if we use a config option? Esp. for streaming queries we
> > > need to be careful and should force users to either "one INSERT INTO"
> or
> > > "one STATEMENT SET".
> > >
> > > 3) 4) "HIVE also uses these commands"
> > >
> > > In general, Hive is not a good reference. Aligning the commands more
> > > with the remaining commands should be our goal. We just had a MODULE
> > > discussion where we selected SHOW instead of LIST. But it is true that
> > > JARs are not part of the catalog which is why I would not use
> > > CREATE/DROP. ADD/REMOVE are commonly siblings in the English language.
> > > Take a look at the Java collection API as another example.
> > >
> > > 6) "Most of the commands should belong to the table environment"
> > >
> > > Thanks for updating the FLIP, this makes things easier to understand. It
> > > is good to see that most commands will be available in
> TableEnvironment.
> > > However, I would also support SET and RESET for consistency. Again,
> from
> > > an architectural point of view, if we would allow some kind of
> > > `Operation` hook in table environment, we could check for SQL Client
> > > specific options and forward to regular `TableConfig.getConfiguration`
> > > otherwise. What do you think?
> > >
> > > Regards,
> > > Timo
> > >
> > >
> > > On 03.02.21 08:58, Jark Wu wrote:
> > > > Hi Timo,
> > > >
> > > > I will respond some of the questions:
> > > >
> > > > 1) SQL client specific options
> > > >
> > > > Whether it starts with "table" or "sql-client" depends on where the
> > > > configuration takes effect.
> > > > If it is a table configuration, we should make clear what's the
> > behavior
> > > > when users change
> > > > the configuration in the lifecycle of TableEnvironment.
> > > >
> > > > I agree with Shengkai that `sql-client.planner` and
> > > `sql-client.execution.mode`
> > > > are something special
> > > > that can't be changed after TableEnvironment has been initialized.
> You
> > > can
> > > > see
> > > > `StreamExecutionEnvironment` provides a `configure()` method to
> override
> > > > configuration after
> > > > StreamExecutionEnvironment has been initialized.
> > > >
> > > > Therefore, I think it would be better to still use
> > `sql-client.planner`
> > > > and `sql-client.execution.mode`.
> > > >
> > > > 2) Execution file
> > > >
> > > > From my point of 

[jira] [Created] (FLINK-21265) SQLClientSchemaRegistryITCase.testReading failed with DockerClientException unauthorized

2021-02-03 Thread Jark Wu (Jira)
Jark Wu created FLINK-21265:
---

 Summary: SQLClientSchemaRegistryITCase.testReading failed with 
DockerClientException unauthorized
 Key: FLINK-21265
 URL: https://issues.apache.org/jira/browse/FLINK-21265
 Project: Flink
  Issue Type: Bug
  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
Affects Versions: 1.13.0
Reporter: Jark Wu


https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=12865&view=logs&j=c88eea3b-64a0-564d-0031-9fdcd7b8abee&t=ff888d9b-cd34-53cc-d90f-3e446d355529


{code}
Feb 03 17:50:00 [INFO] ---
Feb 03 17:50:00 [INFO]  T E S T S
Feb 03 17:50:00 [INFO] ---
Feb 03 17:50:56 [INFO] Running 
org.apache.flink.tests.util.kafka.SQLClientSchemaRegistryITCase
Feb 03 17:52:41 [ERROR] Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time 
elapsed: 105.31 s <<< FAILURE! - in 
org.apache.flink.tests.util.kafka.SQLClientSchemaRegistryITCase
Feb 03 17:52:41 [ERROR] 
testReading(org.apache.flink.tests.util.kafka.SQLClientSchemaRegistryITCase)  
Time elapsed: 2.278 s  <<< ERROR!
Feb 03 17:52:41 java.util.concurrent.ExecutionException: 
org.testcontainers.containers.ContainerLaunchException: Container startup failed
Feb 03 17:52:41 at 
java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
Feb 03 17:52:41 at 
java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
Feb 03 17:52:41 at 
org.testcontainers.containers.GenericContainer.start(GenericContainer.java:307)
Feb 03 17:52:41 at 
org.testcontainers.containers.GenericContainer.starting(GenericContainer.java:1021)
Feb 03 17:52:41 at 
org.testcontainers.containers.FailureDetectingExternalResource$1.evaluate(FailureDetectingExternalResource.java:29)
Feb 03 17:52:41 at org.junit.rules.RunRules.evaluate(RunRules.java:20)
Feb 03 17:52:41 at 
org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
Feb 03 17:52:41 at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
Feb 03 17:52:41 at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
Feb 03 17:52:41 at 
org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
Feb 03 17:52:41 at 
org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
Feb 03 17:52:41 at 
org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
Feb 03 17:52:41 at 
org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
Feb 03 17:52:41 at 
org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
Feb 03 17:52:41 at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
Feb 03 17:52:41 at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
Feb 03 17:52:41 at 
java.util.concurrent.FutureTask.run(FutureTask.java:266)
Feb 03 17:52:41 at java.lang.Thread.run(Thread.java:748)
Feb 03 17:52:41 Caused by: 
org.testcontainers.containers.ContainerLaunchException: Container startup failed
Feb 03 17:52:41 at 
org.testcontainers.containers.GenericContainer.doStart(GenericContainer.java:327)
Feb 03 17:52:41 at 
org.testcontainers.containers.KafkaContainer.doStart(KafkaContainer.java:94)
Feb 03 17:52:41 at 
org.testcontainers.containers.GenericContainer.start(GenericContainer.java:308)
Feb 03 17:52:41 at 
java.util.concurrent.CompletableFuture.uniRun(CompletableFuture.java:719)
Feb 03 17:52:41 at 
java.util.concurrent.CompletableFuture$UniRun.tryFire(CompletableFuture.java:701)
Feb 03 17:52:41 at 
java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:456)
Feb 03 17:52:41 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
Feb 03 17:52:41 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
Feb 03 17:52:41 ... 1 more
{code}




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] FLINK-21045: Support 'load module' and 'unload module' SQL syntax

2021-02-03 Thread Jark Wu
Thanks for the summary. LGTM.

On Wed, 3 Feb 2021 at 20:25, Rui Li  wrote:

> Thanks Jane for the summary. Looks good to me.
>
> On Wed, Feb 3, 2021 at 7:53 PM Jane Chan  wrote:
>
> > Hi @Jark, @Timo, I've updated the comments, and please have a look when
> > you're free.
> >
> > Best,
> > Jane
> >
> > On Wed, Feb 3, 2021 at 7:14 PM Jane Chan  wrote:
> >
> > >
> > > Reply @Timo
> > >
> > >> Remove the `used` column for SHOW MODULES. It will always show true.
> > >>
> > > Good catch. It's a copy-paste typo, and I forgot to remove that column.
> > >
> > > How about creating a POJO (static inner class of ModuleManager) called
> > >> `ModuleEntry` or similar.
> > >>
> > > +1 for better encapsulation.
> > >
> > > Reply @Jark
> > >
> > >> A minor comment on `useModules(List<String> names)`: it would be better
> > >> to use varargs here for a more fluent API: `useModules("a", "b", "c")`.
> > >>
> > >  +1, and that's better.
> > >
> > > Do we also need to add these new methods (useModules, listFullModules)
> > >> to TableEnvironment?
> > >>
> > > Yes, indeed.
> > >
> > > Thank you all for polishing this proposal to make it more thorough.
> > >
> > > Best,
> > > Jane
> > >
> > > On Wed, Feb 3, 2021 at 6:41 PM Jark Wu  wrote:
> > >
> > >> A minor comment on `useModules(List<String> names)`: it would be better
> > >> to use varargs here for a more fluent API: `useModules("a", "b", "c")`.
> > >>
> > >> Besides, do we also need to add these new methods (useModules,
> > >> listFullModules) to
> > >> TableEnvironment?
> > >>
> > >> Best,
> > >> Jark
> > >>
> > >> On Wed, 3 Feb 2021 at 18:36, Timo Walther  wrote:
> > >>
> > >> > Thanks for the nice summary Jane. The summary looks great. Some
> minor
> > >> > feedback:
> > >> >
> > >> > - Remove the `used` column for SHOW MODULES. It will always show
> true.
> > >> >
> > >> > - `List<Pair<String, Boolean>> listFullModules()` is a very long
> > >> > signature. And `Pair` should be avoided in code because it is not
> very
> > >> > descriptive. How about creating a POJO (static inner class of
> > >> > ModuleManager) called `ModuleEntry` or similar.
> > >> >
> > >> > Otherwise +1 for the proposal.
> > >> >
> > >> > Regards,
> > >> > Timo
> > >> >
> > >> > On 03.02.21 11:24, Jane Chan wrote:
> > >> > > Hi everyone,
> > >> > >
> > >> > > I did a summary on the Jira issue page [1] since the discussion
> has
> > >> > > achieved a consensus. If there is anything missed or not
> corrected,
> > >> > please
> > >> > > let me know.
> > >> > >
> > >> > > [1] https://issues.apache.org/jira/browse/FLINK-21045#
> > >> > >
> > >> > > Best,
> > >> > > Jane
> > >> > >
> > >> > >
> > >> > >
> > >> > >
> > >> > >
> > >> > > On Wed, Feb 3, 2021 at 1:33 PM Jark Wu  wrote:
> > >> > >
> > >> > >> Hi Jane,
> > >> > >>
> > >> > >> Yes. I think we should fail fast.
> > >> > >>
> > >> > >> Best,
> > >> > >> Jark
> > >> > >>
> > >> > >> On Wed, 3 Feb 2021 at 12:06, Jane Chan 
> > >> wrote:
> > >> > >>
> > >> > >>> Hi everyone,
> > >> > >>>
> > >> > >>> Thanks for the discussion to make this improvement plan clearer.
> > >> > >>>
> > >> > >>> Hi, @Jark, @Rui, and @Timo, I'm collecting the final discussion
> > >> > summaries
> > >> > >>> now and want to confirm one thing that for the statement `USE
> > >> MODULES x
> > >> > >> [,
> > >> > >>> y, z, ...]`, if the module name list contains a nonexistent
> module,
> > >> > shall
> > >> > 

[jira] [Created] (FLINK-21261) Improve digest of physical Expand node

2021-02-03 Thread Jark Wu (Jira)
Jark Wu created FLINK-21261:
---

 Summary: Improve digest of physical Expand node
 Key: FLINK-21261
 URL: https://issues.apache.org/jira/browse/FLINK-21261
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / Planner
Reporter: Jark Wu
Assignee: Jark Wu
 Fix For: 1.13.0


Currently, the digest of {{StreamPhysicalExpand}} only generates field names,
which loses much useful information, e.g. null fields, expand id, expand times.

{code}
Expand(projects=[a, b, c, $f3, $f4, $e])
{code}

The digest of {{BatchPhysicalExpand}} generates an additional projects list, but
the first {{projects}} is redundant information, which we can remove.

{code}
Expand(projects=[a, c, $f2, d, $e, $f2_0], projects=[{a, c, $f2, d, 0 AS $e, 
$f2 AS $f2_0}, {a, c, null AS $f2, null AS d, 3 AS $e, $f2 AS $f2_0}])
{code}

The proposed digest of expand node would be:

{code}
Expand(projects=[{a, c, $f2, d, 0 AS $e, $f2 AS $f2_0}, {a, c, null AS $f2, 
null AS d, 3 AS $e, $f2 AS $f2_0}])
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] FLINK-21045: Support 'load module' and 'unload module' SQL syntax

2021-02-03 Thread Jark Wu
A minor comment on `useModules(List<String> names)`: it would be better
to use varargs here for a more fluent API: `useModules("a", "b", "c")`.

Besides, do we also need to add these new methods (useModules,
listFullModules) to
TableEnvironment?
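To make the suggestion concrete, here is a rough sketch of the two signatures
under discussion (all names are illustrative, not a final interface):

// varargs variant, allowing the fluent call useModules("hive", "core")
void useModules(String... moduleNames);

// a POJO instead of Pair, e.g. a static inner class of ModuleManager
public static final class ModuleEntry {
    private final String name;   // module name
    private final boolean used;  // whether the module is currently enabled

    public ModuleEntry(String name, boolean used) {
        this.name = name;
        this.used = used;
    }
}

// listing all loaded modules together with their "used" flag
List<ModuleEntry> listFullModules();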

Best,
Jark

On Wed, 3 Feb 2021 at 18:36, Timo Walther  wrote:

> Thanks for the nice summary Jane. The summary looks great. Some minor
> feedback:
>
> - Remove the `used` column for SHOW MODULES. It will always show true.
>
> - `List<Pair<String, Boolean>> listFullModules()` is a very long
> signature. And `Pair` should be avoided in code because it is not very
> descriptive. How about creating a POJO (static inner class of
> ModuleManager) called `ModuleEntry` or similar.
>
> Otherwise +1 for the proposal.
>
> Regards,
> Timo
>
> On 03.02.21 11:24, Jane Chan wrote:
> > Hi everyone,
> >
> > I did a summary on the Jira issue page [1] since the discussion has
> > achieved a consensus. If there is anything missed or not corrected,
> please
> > let me know.
> >
> > [1] https://issues.apache.org/jira/browse/FLINK-21045#
> >
> > Best,
> > Jane
> >
> >
> >
> >
> >
> > On Wed, Feb 3, 2021 at 1:33 PM Jark Wu  wrote:
> >
> >> Hi Jane,
> >>
> >> Yes. I think we should fail fast.
> >>
> >> Best,
> >> Jark
> >>
> >> On Wed, 3 Feb 2021 at 12:06, Jane Chan  wrote:
> >>
> >>> Hi everyone,
> >>>
> >>> Thanks for the discussion to make this improvement plan clearer.
> >>>
> >>> Hi, @Jark, @Rui, and @Timo, I'm collecting the final discussion
> summaries
> >>> now and want to confirm one thing that for the statement `USE MODULES x
> >> [,
> >>> y, z, ...]`, if the module name list contains a nonexistent module,
> shall
> >> we
> >>> #1 fail the execution for all of them or #2 enable the rest of the modules
> and
> >>> return a warning to users? My personal preference goes to #1 for
> >>> simplicity. What do you think?
> >>>
> >>> Best,
> >>> Jane
> >>>
> >>> On Tue, Feb 2, 2021 at 3:53 PM Timo Walther 
> wrote:
> >>>
> >>>> +1
> >>>>
> >>>> @Jane Can you summarize our discussion in the JIRA issue?
> >>>>
> >>>> Thanks,
> >>>> Timo
> >>>>
> >>>>
> >>>> On 02.02.21 03:50, Jark Wu wrote:
> >>>>> Hi Timo,
> >>>>>
> >>>>>> Another question is whether a LOAD operation also adds the module to
> >>> the
> >>>>> enabled list by default?
> >>>>>
> >>>>> I would like to add the module to the enabled list by default, the
> >> main
> >>>>> reasons are:
> >>>>> 1) Reordering is an advanced requirement; needing additional USE
> >>>>> statements with the "core" module just to add modules sounds too
> >>>>> burdensome. Most users should be satisfied with only LOAD statements.
> >>>>> 2) We should keep compatibility with TableEnvironment#loadModule().
> >>>>> 3) We are using the LOAD statement instead of CREATE, so I think it's
> >>>> fine
> >>>>> that it does some implicit things.
> >>>>>
> >>>>> Best,
> >>>>> Jark
> >>>>>
> >>>>> On Tue, 2 Feb 2021 at 00:48, Timo Walther 
> >> wrote:
> >>>>>
> >>>>>> Not the module itself but the ModuleManager should handle this case,
> >>>> yes.
> >>>>>>
> >>>>>> Regards,
> >>>>>> Timo
> >>>>>>
> >>>>>>
> >>>>>> On 01.02.21 17:35, Jane Chan wrote:
> >>>>>>> +1 to Jark's proposal
> >>>>>>>
> >>>>>>> To make it clearer,  will `module#getFunctionDefinition()`
> >> return
> >>>> empty
> >>>>>>> supposing the module is loaded but not enabled?
> >>>>>>>
> >>>>>>> Best,
> >>>>>>> Jane
> >>>>>>>
> >>>>>>> On Mon, Feb 1, 2021 at 10:02 PM Timo Walther 
> >>>> wrote:
> >>>>>>>
> >>>>>>>> +1 to 

Re: [DISCUSS]FLIP-163: SQL Client Improvements

2021-02-02 Thread Jark Wu
Hi Timo,

I will respond some of the questions:

1) SQL client specific options

Whether it starts with "table" or "sql-client" depends on where the
configuration takes effect.
If it is a table configuration, we should make clear what's the behavior
when users change
the configuration in the lifecycle of TableEnvironment.

I agree with Shengkai `sql-client.planner` and `sql-client.execution.mode`
are something special
that can't be changed after TableEnvironment has been initialized. You can
see
`StreamExecutionEnvironment` provides `configure()`  method to override
configuration after
StreamExecutionEnvironment has been initialized.

Therefore, I think it would be better to still use  `sql-client.planner`
and `sql-client.execution.mode`.

2) Execution file

From my point of view, there is a big difference between
`sql-client.job.detach` and
`TableEnvironment.executeMultiSql()`: `sql-client.job.detach` will
affect every single DML statement
in the terminal, not only the statements in SQL files. I think the single
DML statement in the interactive
terminal is something like tEnv#executeSql() instead of
tEnv#executeMultiSql.
So I don't like the "multi" and "sql" keyword in `table.multi-sql-async`.
I just find that runtime provides a configuration called
"execution.attached" [1] which is false by default
which specifies if the pipeline is submitted in attached or detached mode.
It provides exactly the same
functionality of `sql-client.job.detach`. What do you think about using
this option?

If we also want to support this config in TableEnvironment, I think it
should also affect the DML execution
 of `tEnv#executeSql()`, not only DMLs in `tEnv#executeMultiSql()`.
Therefore, the behavior may look like this:

val tableResult = tEnv.executeSql("INSERT INTO ...")  ==> async by default
tableResult.await()   ==> manually block until finish
tEnv.getConfig().getConfiguration().setString("execution.attached", "true")
val tableResult2 = tEnv.executeSql("INSERT INTO ...")  ==> sync, don't need
to wait on the TableResult
tEnv.executeMultiSql(
"""
CREATE TABLE   ==> always sync
INSERT INTO ...  => sync, because we set configuration above
SET execution.attached = false;
INSERT INTO ...  => async
""")

On the other hand, I think `sql-client.job.detach`
and `TableEnvironment.executeMultiSql()` should be two separate topics;
as Shengkai mentioned above, the SQL CLI only depends on
`TableEnvironment#executeSql()` to support multi-line statements.
I'm fine with making `executeMultiSql()` clear but don't want it to block
this FLIP; maybe we can discuss this in another thread.


Best,
Jark

[1]:
https://ci.apache.org/projects/flink/flink-docs-master/deployment/config.html#execution-attached

On Wed, 3 Feb 2021 at 15:33, Shengkai Fang  wrote:

> Hi, Timo.
> Thanks for your detailed feedback. I have some thoughts about your
> feedback.
>
> *Regarding #1*: I think the main problem is whether the table environment
> has the ability to update itself. Let's take a simple program as an
> example.
>
>
> ```
> TableEnvironment tEnv = TableEnvironment.create(...);
>
> tEnv.getConfig.getConfiguration.setString("table.planner", "old");
>
>
> tEnv.executeSql("...");
>
> ```
>
> If we regard this option as a table option, users don't have to create
> another table environment manually. In that case, tEnv needs to check
> whether the current mode and planner are the same as before when executeSql
> or explainSql is called. I don't think it's easy work for the table environment,
> especially if users have a StreamExecutionEnvironment but set old planner
> and batch mode. But when we make this option as a sql client option, users
> only use the SET command to change the setting. We can rebuild a new table
> environment when the SET succeeds.
>
>
> *Regarding #2*: I think we need to discuss the implementation before
> continuing this topic. In the sql client, we will maintain two parsers. The
> first parser (client parser) will only match the sql client commands. If the
> client parser can't parse the statement, we will leverage the power of the
> table environment to execute. According to our blueprint,
> TableEnvironment#executeSql is enough for the sql client. Therefore,
> TableEnvironment#executeMultiSql is out-of-scope for this FLIP.
>
> But if we need to introduce the `TableEnvironment.executeMultiSql` in the
> future, I think it's OK to use the option `table.multi-sql-async` rather
> than option `sql-client.job.detach`. But we think the name is not suitable
> because the name is confusing for others. When setting the option false, we
> just mean it will block the execution of the INSERT INTO statement, not DDL
> or others(other sql statements are always executed synchronously). So how
> about `table.job.async`? It only works for the sql-client and the
> executeMultiSql. If we set this value false, the table environment will
> return the result until the job finishes.
>
>
> *Regarding #3, #4*: I still think we should use DELETE JAR and LIST JAR
> 

Re: [DISCUSS] FLINK-21045: Support 'load module' and 'unload module' SQL syntax

2021-02-02 Thread Jark Wu
Hi Jane,

Yes. I think we should fail fast.

Best,
Jark

On Wed, 3 Feb 2021 at 12:06, Jane Chan  wrote:

> Hi everyone,
>
> Thanks for the discussion to make this improvement plan clearer.
>
> Hi, @Jark, @Rui, and @Timo, I'm collecting the final discussion summaries
> now and want to confirm one thing that for the statement `USE MODULES x [,
> y, z, ...]`, if the module name list contains a nonexistent module, shall we
> #1 fail the execution for all of them or #2 enable the rest of the modules and
> return a warning to users? My personal preference goes to #1 for
> simplicity. What do you think?
>
> Best,
> Jane
>
> On Tue, Feb 2, 2021 at 3:53 PM Timo Walther  wrote:
>
> > +1
> >
> > @Jane Can you summarize our discussion in the JIRA issue?
> >
> > Thanks,
> > Timo
> >
> >
> > On 02.02.21 03:50, Jark Wu wrote:
> > > Hi Timo,
> > >
> > >> Another question is whether a LOAD operation also adds the module to
> the
> > > enabled list by default?
> > >
> > > I would like to add the module to the enabled list by default, the main
> > > reasons are:
> > > 1) Reordering is an advanced requirement; needing additional USE
> > > statements with the "core" module just to add modules sounds too
> > > burdensome. Most users should be satisfied with only LOAD statements.
> > > 2) We should keep compatibility with TableEnvironment#loadModule().
> > > 3) We are using the LOAD statement instead of CREATE, so I think it's
> > fine
> > > that it does some implicit things.
> > >
> > > Best,
> > > Jark
> > >
> > > On Tue, 2 Feb 2021 at 00:48, Timo Walther  wrote:
> > >
> > >> Not the module itself but the ModuleManager should handle this case,
> > yes.
> > >>
> > >> Regards,
> > >> Timo
> > >>
> > >>
> > >> On 01.02.21 17:35, Jane Chan wrote:
> > >>> +1 to Jark's proposal
> > >>>
> > >>> To make it clearer, will `module#getFunctionDefinition()` return
> > empty
> > >>> supposing the module is loaded but not enabled?
> > >>>
> > >>> Best,
> > >>> Jane
> > >>>
> > >>> On Mon, Feb 1, 2021 at 10:02 PM Timo Walther 
> > wrote:
> > >>>
> > >>>> +1 to Jark's proposal
> > >>>>
> > >>>> I like the difference between just loading and actually enabling
> these
> > >>>> modules.
> > >>>>
> > >>>> @Rui: I would use the same behavior as catalogs here. You cannot
> > `USE` a
> > >>>> catalog without creating it before.
> > >>>>
> > >>>> Another question is whether a LOAD operation also adds the module to
> > the
> > >>>> enabled list by default?
> > >>>>
> > >>>> Regards,
> > >>>> Timo
> > >>>>
> > >>>> On 01.02.21 13:52, Rui Li wrote:
> > >>>>> If `USE MODULES` implies unloading modules that are not listed,
> does
> > it
> > >>>>> also imply loading modules that are not previously loaded,
> especially
> > >>>> since
> > >>>>> we're mapping modules by name now?
> > >>>>>
> > >>>>> On Mon, Feb 1, 2021 at 8:20 PM Jark Wu  wrote:
> > >>>>>
> > >>>>>> I agree with Timo that the USE implies the specified modules are
> in
> > >> use
> > >>>> in
> > >>>>>> the specified order and others are not used.
> > >>>>>> This would be easier to know what's the result list and order
> after
> > >> the
> > >>>> USE
> > >>>>>> statement.
> > >>>>>> That means: if current modules in order are x, y, z. And `USE
> > MODULES
> > >>>> z, y`
> > >>>>>> means current modules in order are z, y.
> > >>>>>>
> > >>>>>> But I would like to not unload the unmentioned modules in the USE
> > >>>>>> statement. Because it seems strange that USE
> > >>>>>> will implicitly remove modules. In the above example, the user may
> > >> type
> > >>>> the
> > >>>>>> wrong modules list using USE by mistake
> > >>>>&

Re: [DISCUSS] FLIP-152: Hive Query Syntax Compatibility

2021-02-02 Thread Jark Wu
Thanks Rui for the great proposal, I believe this can be very attractive
for many Hive users.

The FLIP looks good to me in general, I only have some minor comments:

1) BlinkParserFactory
IIUC, BlinkParserFactory is located in the flink-table-api-java module with
the Parser interface there.
I suggest renaming it to `ParserFactory`, because it creates Parser instead
of BlinkParser.
And the implementations can be `HiveParserFactory` and
`FlinkParserFactory`.
I think we should avoid the `Blink` keyword in interfaces, blink planner is
already the default planner and
the old planner will be removed in the near future. There will be no
`blink` in the future then.

2) "create a new instance each time getParser is called"
Finding a parser every time getParser is called sounds heavy to me. I
think we can improve this by simply
caching the Parser instance, and creating a new one if the current sql-dialect
is different from the cached Parser's.
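A minimal sketch of the suggested caching, assuming a `ParserFactory`
abstraction and a dialect enum (both names are placeholders, not the final
API):

private SqlDialect cachedDialect;
private Parser cachedParser;

public Parser getParser(SqlDialect currentDialect, ParserFactory factory) {
    // only create a new Parser when the sql-dialect has changed
    if (cachedParser == null || cachedDialect != currentDialect) {
        cachedParser = factory.create(currentDialect);
        cachedDialect = currentDialect;
    }
    return cachedParser;
}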

3) Hive version
How much code needs to be written to support new features in 3.x based on 2.x?
Is this also included in this FLIP/release?
I don't fully understand this because the FLIP says "we can use a newer
version to support older versions."

Best,
Jark

On Wed, 3 Feb 2021 at 11:48, godfrey he  wrote:

> Thanks for bringing up the discussion, Rui!
>
> Regarding the DDL part in the "Introduce HiveParser" section,
> I would like to choose the second option. Because if we could
> use one hive parser to parse all hive SQLs, we need not copy
> Calcite parser code, and the framework and the code will be very simple.
>
> Regarding the "Go Beyond Hive" section, is that the scope of this FLIP ?
> Could you list all the extensions and give some examples ?
>
> One minor suggestion about the name of ParserImplFactory.
> How about renaming ParserImplFactory to DefaultParserFactory ?
>
> Best,
> Godfrey
>
> On Wed, Feb 3, 2021 at 11:16 AM, Rui Li wrote:
>
> > Hi Jingsong,
> >
> > Thanks for your comments and they're very good questions.
> >
> > Regarding # Version, we need to do some tradeoff here. Choosing the
> latest
> > 3.x will cover all the features we want to support. But as you said, 3.x
> > and 2.x can have some differences and requires more efforts to support
> > lower versions. I decided to pick 2.x and evolve from there to support
> new
> > features in 3.x. Because I think most hive users, especially those who
> are
> > likely to be interested in this feature, are still using 2.x or even 1.x.
> > So the priority is to cover 2.x and 1.x first.
> >
> > Regarding # Hive Codes, in my PoC, I simply copy the code and make as few
> > changes as possible. I believe we can do some clean up or refactor to
> > reduce it. With that in mind, I expect it to be over 10k lines of java
> > code, and even more if we count ANTLR grammar files as well.
> >
> > Regarding # Functions, you're right that HiveModule is more of a solution
> > than limitation. I just want to emphasize that HiveModule and hive
> dialect
> > need to be used together to achieve better compatibility.
> >
> > Regarding # Keywords, new hive versions can have more reserved keywords
> > than old versions. Since we're based on hive 2.x code, it may not provide
> > 100% keyword-compatibility to 1.x users. But I expect it to be good
> enough
> > for most cases. If not, we can provide different grammar files for lower
> > versions.
> >
> > On Tue, Feb 2, 2021 at 5:10 PM Jingsong Li 
> wrote:
> >
> > > Thanks Rui for the proposal, I think this FLIP is required by many
> users,
> > > and it is very good to traditional Hive users. I have some confusion:
> > >
> > > # Version
> > >
> > > Which Hive version do you want to choose? Maybe, Hive 3.X and Hive 2.X
> > have
> > > some differences?
> > >
> > > # Hive Codes
> > >
> > > Can you evaluate how much code we need to copy to our
> > flink-hive-connector?
> > > Do we need to change them? We need to maintain them anyway.
> > >
> > > # Functions
> > >
> > > About Hive functions, I don't think it is a limitation, we are using
> > > HiveModule to be compatible with Hive, right? So it is a solution
> instead
> > > of a limitation.
> > >
> > > # Keywords
> > >
> > > Do you think there will be a keyword problem? Or can we be 100%
> > compatible
> > > with Hive?
> > >
> > > On the whole, the FLIP looks very good and I'm looking forward to it.
> > >
> > > Best,
> > > Jingsong
> > >
> > > On Fri, Dec 11, 2020 at 11:35 AM Zhijiang
> > >  wrote:
> > >
> > > > Thanks for the further info and explanations! I have no other
> concerns.
> > > >
> > > > Best,
> > > > Zhijiang
> > > >
> > > >
> > > > --
> > > > From:Rui Li 
> > > > Send Time: Thursday, Dec 10, 2020, 20:35
> > > > To:dev ; Zhijiang 
> > > > Subject:Re: [DISCUSS] FLIP-152: Hive Query Syntax Compatibility
> > > >
> > > > Hi Zhijiang,
> > > >
> > > > Glad to know you're interested in this FLIP. I wouldn't claim 100%
> > > > compatibility with this FLIP. That's because Flink doesn't have the
> > > 

Re: [DISCUSS] FLIP-162: Consistent Flink SQL time function behavior

2021-02-02 Thread Jark Wu
ever),
> >  we could offer an option to make them happy. If it turns out that we had
> > wrong estimation about the user's
> > expectation, we should change the default behavior.
> >
> > Best,
> > Kurt
> >
> >
> > On Tue, Feb 2, 2021 at 4:46 PM Kurt Young  wrote:
> >
> > > Hi Timo,
> > >
> > > I don't think batch-stream unification can deal with all the cases,
> > > especially if
> > > the query involves some non-deterministic functions.
> > >
> > > No matter which option we choose, these queries will have
> > > different results.
> > > For example, if we run the same query in batch mode multiple times,
> it's
> > > also
> > > highly possible that we get different results. Does that mean all the
> > > database
> > > vendors can't deliver batch-batch unification? I don't think so.
> > >
> > > What's really important here is the user's intuition. What do users
> > expect
> > > if
> > > they don't read any documents about these functions? For batch users, I
> > > think
> > > it's already clear enough that all other systems and databases will
> > > evaluate
> > > these functions during query start. And for streaming users, I have
> > > already seen
> > > some users are expecting these functions to be calculated per record.
> > >
> > > Thus I think we can make the behavior determined together with
> execution
> > > mode.
> > > One exception would be PROCTIME(), I think all users would expect this
> > > function
> > > will be calculated for each record. I think SYS_CURRENT_TIMESTAMP is
> > > similar
> > > to PROCTIME(), so we don't have to introduce it.
> > >
> > > Best,
> > > Kurt
> > >
> > >
> > > On Tue, Feb 2, 2021 at 4:20 PM Timo Walther 
> wrote:
> > >
> > >> Hi everyone,
> > >>
> > >> I'm not sure if we should introduce the `auto` mode. Taking all the
> > >> previous discussions around batch-stream unification into account,
> batch
> > >> mode and streaming mode should only influence the runtime efficiency
> and
> > >> incremental computation. The final query result should be the same in
> > >> both modes. Also looking into the long-term future, we might drop the
> > >> mode property and either derive the mode or use different modes for
> > >> parts of the pipeline.
> > >>
> > >> "I think we may need to think more from the users' perspective."
> > >>
> > >> I agree here and that's why I actually would like to let the user
> decide
> > >> which semantics are needed. The config option proposal was my least
> > >> favored alternative. We should stick to the standard and bahavior of
> > >> other systems. For both batch and streaming. And use a simple prefix
> to
> > >> let users decide whether the semantics are per-record or per-query:
> > >>
> > >> CURRENT_TIMESTAMP   -- semantics as all other vendors
> > >>
> > >>
> > >> _CURRENT_TIMESTAMP  -- semantics per record
> > >>
> > >> OR
> > >>
> > >> SYS_CURRENT_TIMESTAMP  -- semantics per record
> > >>
> > >>
> > >> Please check how other vendors are handling this:
> > >>
> > >> SYSDATE  MySql, Oracle
> > >> SYSDATETIME  SQL Server
> > >>
> > >>
> > >> Regards,
> > >> Timo
> > >>
> > >>
> > >> On 02.02.21 07:02, Jingsong Li wrote:
> > >> > +1 for the default "auto" to the
> > "table.exec.time-function-evaluation".
> > >> >
> > >> > From the definition of these functions, in my opinion:
> > >> > - Batch is the instant execution of all records, which is the
> meaning
> > of
> > >> > the word "BATCH", so there is only one time at query-start.
> > >> > - Stream only executes a single record in a moment, so time is
> > >> generated by
> > >> > each record.
> > >> >
> > >> > On the other hand, we should be more careful about consistency with
> > >> other
> > >> > systems.
> > >> >
> > >> > Best,
> > >> > Jingsong
> > >> >
> > >> > On Tue, Feb 2, 2021 at 11:24 AM 

Re: [DISCUSS] FLIP-162: Consistent Flink SQL time function behavior

2021-02-01 Thread Jark Wu
> LOCALTIME/LOCALDATE and
> >>>>>>>>>> LOCALTIMESTAMP for completeness.
> >>>>>>>>>> Yes, LOCALTIMESTAMP returns TIMESTAMP, LOCALTIME returns TIME; their
> >>>>>>>>>> behavior is clear, so I just listed them in the excel [1] in this
> >>>>>>>>>> FLIP's references.
> >>>>>>>>>>
> >>>>>>>>>>> 2) Shall we add aliases for the timestamp types as part of this
> >>>> FLIP? I
> >>>>>>>>>> see Snowflake supports TIMESTAMP_LTZ , TIMESTAMP_NTZ ,
> TIMESTAMP_TZ
> >>>> [1]. I
> >>>>>>>>>> think the discussion was quite cumbersome with the full string
> of
> >>>>>>>>>> `TIMESTAMP WITH LOCAL TIME ZONE`. With this FLIP we are making
> this
> >>>> type
> >>>>>>>>>> even more prominent. And important concepts should have a short
> name
> >>>>>>>>>> because they are used frequently. According to the FLIP, we are
> >>>> introducing
> >>>>>>>>>> the abbreviation already in function names like
> `TO_TIMESTAMP_LTZ`.
> >>>>>>>>>> `TIMESTAMP_LTZ` could be treated similar to `STRING` for
> >>>>>>>>>> `VARCHAR(MAX_INT)`, the serializable string representation would
> >>>> not change.
> >>>>>>>>>>
> >>>>>>>>>> @Timo @Jark
> >>>>>>>>>> Nice idea, I also suffered from the long name during the
> >>>> discussions, the
> >>>>>>>>>> abbreviation will not only help us, but also makes it more
> >>>> convenient for
> >>>>>>>>>> users. I list the abbreviation name mapping to support:
> >>>>>>>>>> TIMESTAMP WITHOUT TIME ZONE <=> TIMESTAMP_NTZ (a synonym of TIMESTAMP)
> >>>>>>>>>> TIMESTAMP WITH LOCAL TIME ZONE <=> TIMESTAMP_LTZ
> >>>>>>>>>> TIMESTAMP WITH TIME ZONE <=> TIMESTAMP_TZ (to be supported in the future)
> >>>>>>>>>>> 3) I'm fine with supporting all conversion classes like
> >>>>>>>>>> java.time.LocalDateTime, java.sql.Timestamp that TimestampType
> >>>> supported
> >>>>>>>>>> for LocalZonedTimestampType. But we agree that Instant stays the
> >>>> default
> >>>>>>>>>> conversion class right? The default extraction defined in [2]
> will
> >>>> not
> >>>>>>>>>> change, correct?
> >>>>>>>>>> Yes, Instant stays the default conversion class. The default
> >>>>>>>>>>
> >>>>>>>>>>> 4) I would remove the comment "Flink supports TIME-related
> types
> >>>> with
> >>>>>>>>>> precision well", because unfortunately this is still not
> correct.
> >>>> We still
> >>>>>>>>>> have issues with TIME(9), it would be great if someone can
> finally
> >>>> fix that
> >>>>>>>>>> though. Maybe the implementation of this FLIP would be a good
> time
> >>>> to fix
> >>>>>>>>>> this issue.
> >>>>>>>>>> You’re right, TIME(9) is not supported yet; I'll take TIME(9) into
> >>>>>>>>>> account in the scope of this FLIP.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> I’ve updated this FLIP [2] according to your suggestions @Jark @Timo
> >>>>>>>>>> I’ll start the vote soon if there’re no objections.
> >>>>>>>>>>
> >>>>>>>>>> Best,
> >>>>>>>>>> Leonard
> >>>>>>>>>>
> >>>>>>>>>> [1]
> >>>>>>>>>>
> >>>>
> https://docs.google.com/spreadsheets/d/1T178krh9xG-WbVpN7mRVJ8bzFnaSJx3l-eg1EWZe_X4

Re: [DISCUSS] FLINK-21045: Support 'load module' and 'unload module' SQL syntax

2021-02-01 Thread Jark Wu
Hi Timo,

> Another question is whether a LOAD operation also adds the module to the
enabled list by default?

I would like to add the module to the enabled list by default, the main
reasons are:
1) Reordering is an advanced requirement; needing additional USE statements
with the "core" module just to add modules sounds too burdensome. Most users
should be satisfied with only LOAD statements.
2) We should keep compatibility with TableEnvironment#loadModule().
3) We are using the LOAD statement instead of CREATE, so I think it's fine
that it does some implicit things.
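A small sketch of what this would mean inside the ModuleManager (field and
method names are illustrative only):

// loaded modules, keyed by name, in insertion order
private final Map<String, Module> loadedModules = new LinkedHashMap<>();
// names of the modules currently in use, in resolution order
private final List<String> usedModules = new ArrayList<>();

public void loadModule(String name, Module module) {
    loadedModules.put(name, module);
    usedModules.add(name); // enabled by default, appended at the end
}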

Best,
Jark

On Tue, 2 Feb 2021 at 00:48, Timo Walther  wrote:

> Not the module itself but the ModuleManager should handle this case, yes.
>
> Regards,
> Timo
>
>
> On 01.02.21 17:35, Jane Chan wrote:
> > +1 to Jark's proposal
> >
> >   To make it clearer, will `module#getFunctionDefinition()` return empty
> > supposing the module is loaded but not enabled?
> >
> > Best,
> > Jane
> >
> > On Mon, Feb 1, 2021 at 10:02 PM Timo Walther  wrote:
> >
> >> +1 to Jark's proposal
> >>
> >> I like the difference between just loading and actually enabling these
> >> modules.
> >>
> >> @Rui: I would use the same behavior as catalogs here. You cannot `USE` a
> >> catalog without creating it before.
> >>
> >> Another question is whether a LOAD operation also adds the module to the
> >> enabled list by default?
> >>
> >> Regards,
> >> Timo
> >>
> >> On 01.02.21 13:52, Rui Li wrote:
> >>> If `USE MODULES` implies unloading modules that are not listed, does it
> >>> also imply loading modules that are not previously loaded, especially
> >> since
> >>> we're mapping modules by name now?
> >>>
> >>> On Mon, Feb 1, 2021 at 8:20 PM Jark Wu  wrote:
> >>>
> >>>> I agree with Timo that the USE implies the specified modules are in
> use
> >> in
> >>>> the specified order and others are not used.
> >>>> This would be easier to know what's the result list and order after
> the
> >> USE
> >>>> statement.
> >>>> That means: if current modules in order are x, y, z. And `USE MODULES
> >> z, y`
> >>>> means current modules in order are z, y.
> >>>>
> >>>> But I would like to not unload the unmentioned modules in the USE
> >>>> statement. Because it seems strange that USE
> >>>> will implicitly remove modules. In the above example, the user may
> type
> >> the
> >>>> wrong modules list using USE by mistake
> >>>>and would like to declare the list again, the user has to create
> the
> >>>> module again with some properties they may not know. Therefore, I
> >> propose
> >>>> the USE statement just specifies the current module lists and doesn't
> >>>> unload modules.
> >>>> Besides that, we may need a new syntax to list all the modules
> including
> >>>> not used but loaded.
> >>>> We can introduce SHOW FULL MODULES for this purpose with an additional
> >>>> `used` column.
> >>>>
> >>>> For example:
> >>>>
> >>>> Flink SQL> list modules:
> >>>> ---
> >>>> | modules |
> >>>> ---
> >>>> | x   |
> >>>> | y   |
> >>>> | z   |
> >>>> ---
> >>>> Flink SQL> USE MODULES z, y;
> >>>> Flink SQL> show modules:
> >>>> ---
> >>>> | modules |
> >>>> ---
> >>>> | z   |
> >>>> | y   |
> >>>> ---
> >>>> Flink SQL> show FULL modules;
> >>>> ---
> >>>> | modules |  used |
> >>>> ---
> >>>> | z   | true  |
> >>>> | y   | true  |
> >>>> | x   | false |
> >>>> ---
> >>>> Flink SQL> USE MODULES z, y, x;
> >>>> Flink SQL> show modules;
> >>>> ---
> >>>> | modules |
> >>>> ---
> >>>> | z   |
> >>>> | y   |
> >>>> | x   |
> >>>> ---
> >>>>
> >>>> What do you think?
> >>>>
> >>>> Best,

Re: [DISCUSS] FLINK-21045: Support 'load module' and 'unload module' SQL syntax

2021-02-01 Thread Jark Wu
I agree with Timo that the USE implies the specified modules are in use in
the specified order and others are not used.
This would be easier to know what's the result list and order after the USE
statement.
That means: if current modules in order are x, y, z. And `USE MODULES z, y`
means current modules in order are z, y.

But I would like to not unload the unmentioned modules in the USE
statement, because it seems strange that USE
would implicitly remove modules. In the above example, if the user types the
wrong module list using USE by mistake
and wants to declare the list again, they would have to create the
modules again with some properties they may not know. Therefore, I propose
that the USE statement just specifies the current module list and doesn't
unload modules.
Besides that, we may need a new syntax to list all the modules, including
those loaded but not used.
We can introduce SHOW FULL MODULES for this purpose with an additional
`used` column.

For example:

Flink SQL> list modules:
---
| modules |
---
| x   |
| y   |
| z   |
---
Flink SQL> USE MODULES z, y;
Flink SQL> show modules:
---
| modules |
---
| z   |
| y   |
---
Flink SQL> show FULL modules;
---
| modules |  used |
---
| z   | true  |
| y   | true  |
| x   | false |
---
Flink SQL> USE MODULES z, y, x;
Flink SQL> show modules;
---
| modules |
---
| z   |
| y   |
| x   |
---

What do you think?

Best,
Jark

On Mon, 1 Feb 2021 at 19:02, Jane Chan  wrote:

> Hi Timo, thanks for the discussion.
>
> It seems we have reached an agreement regarding #3 that <1> Module name should
> better be a simple identifier rather than a string literal. <2> Property
> `type` is redundant and should be removed, and mapping will rely on the
> module name because loading a module multiple times just using a different
> module name doesn't make much sense. <3> We should migrate to the newer API
> rather than the deprecated `TableFactory` class.
>
> Regarding #1, I think the point lies in whether changing the resolution
> order implies an `unload` operation explicitly (i.e., users could sense
> it). What do others think?
>
> Best,
> Jane
>
> On Mon, Feb 1, 2021 at 6:41 PM Timo Walther  wrote:
>
> > IMHO I would rather unload the unmentioned modules. The `USE` statement
> > implicitly implies that the other modules are "not
> > used". What do others think?
> >
> > Regards,
> > Timo
> >
> >
> > On 01.02.21 11:28, Jane Chan wrote:
> > > Hi Jark and Rui,
> > >
> > > Thanks for the discussions.
> > >
> > > Regarding #1, I'm fine with `USE MODULES` syntax, and
> > >
> > >> It can be interpreted as "setting the current order of modules", which
> > is
> > >> similar to "setting the current catalog" for `USE CATALOG`.
> > >>
> > > I would like to confirm that the unmentioned modules remain in the same
> > > relative order? E.g., if there are three loaded modules `X`, `Y`, `Z`,
> > then
> > > `USE MODULES Y, Z` means shifting the order to `Y`, `Z`, `X`.
> > >
> > > Regarding #3, I'm fine with mapping modules purely by name, and I think
> > > Jark raised a good point on making the module name a simple identifier
> > > instead of a string literal. For backward compatibility, since we
> haven't
> > > supported this syntax yet, the affected users are those who defined
> > modules
> > > in the YAML configuration file. Maybe we can eliminate the 'type' from
> > the
> > > 'requiredContext' to make it optional. Thus the proposed mapping
> > mechanism
> > > could use the module name to lookup the suitable factory,  and in the
> > > meanwhile updating documentation to encourage users to simplify their
> > YAML
> > > configuration. And in the long run, we can deprecate the 'type'.
> > >
> > > Best,
> > > Jane
> > >
> > > On Mon, Feb 1, 2021 at 4:19 PM Rui Li  wrote:
> > >
> > >> Thanks Jane for starting the discussion.
> > >>
> > >> Regarding #1, I also prefer `USE MODULES` syntax. It can be
> interpreted
> > as
> > >> "setting the current order of modules", which is similar to "setting
> the
> > >> current catalog" for `USE CATALOG`.
> > >>
> > >> Regarding #3, I'm fine to map modules purely by name because I think
> it
> > >> satisfies all the use cases we have at hand. But I guess we need to
> make
> > 

Re: [DISCUSS] FLINK-21045: Support 'load module' and 'unload module' SQL syntax

2021-01-31 Thread Jark Wu
Thanks Jane for the summary and starting the discussion in the mailing
list.

Here are my thoughts:

1) syntax to reorder modules
I agree with Rui Li it would be quite useful if we can have some syntax to
reorder modules.
I slightly prefer `USE MODULES x, y, z` over `RELOAD MODULES x, y, z`,
because USE conveys more of a sense of taking effect and specifying ordering
than RELOAD.
From my feeling, RELOAD just means we unregister and register the x, y, z
modules again;
it sounds like the other registered modules are still in use and keep their order.

3) mapping modules purely by name
This can definitely improve the usability of loading modules, because
the 'type=' property
looks really redundant. We can think of this as syntactic sugar where the
default type value is the module name.
And we can support specifying the 'type=' property in the future to allow
multiple modules for one module type.
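As a sketch, the factory lookup could simply fall back to the module name
when no 'type' is given (the property and helper names here are
illustrative):

// treat the module name as the default value of the 'type' property
String type = properties.getOrDefault("type", moduleName);
ModuleFactory factory = discoverModuleFactory(type, classLoader);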

Besides, I would like to mention one more change, that the module name
proposed in FLIP-68 is a string literal.
But I think we are all on the same page to change it into a simple
(non-compound) identifier.

LOAD/UNLOAD MODULE 'core'
==>
LOAD/UNLOAD MODULE core


Best,
Jark


On Sat, 30 Jan 2021 at 04:00, Jane Chan  wrote:

> Hi everyone,
>
> I would like to start a discussion on FLINK-21045 [1] about supporting
> `LOAD MODULE` and `UNLOAD MODULE` SQL syntax. It's first proposed by
> FLIP-68 [2] as follows.
>
> -- load a module with the given name and append it to the end of the module
> list
> LOAD MODULE 'name' [WITH ('type'='xxx', 'prop'='myProp', ...)]
>
> --unload a module by name from the module list and other modules remain in
> the same relative positions
> UNLOAD MODULE 'name'
>
> After a round of discussion on the Jira ticket, it seems some unanswered
> questions need more opinions and suggestions.
>
> 1. The way to redefine resolution order easily
>
> Rui Li suggested introducing `USE MODULES` and adding similar
> functionality to the API because
>
> >  1) It's very tedious to unload old modules just to reorder them.
>
>  2) Users may not even know how to "re-load" an old module if it was not
> > initially loaded by the user, e.g. don't know which type to use.
>
>
> Jane Chan wondered about this, since a module is not like a catalog, which
> has a namespace concept one can specify, and `USE` sounds like a
> mutually exclusive concept.
> Maybe `RELOAD MODULES` could express upgrading the priority of the loaded
> module(s).
>
>
> 2. `LOAD/UNLOAD MODULE` v.s. `CREATE/DROP MODULE` syntax
> Jark Wu and Nicholas Jiang proposed to use `CREATE/DROP MODULE` instead
> of `LOAD/UNLOAD MODULE` because
>
> >  1) From a pure SQL user's perspective, maybe `CREATE MODULE + USE
> MODULE`
> > is easier to use rather than `LOAD/UNLOAD`.
> >  2) This will be very similar to what the catalog used now.
>
>
>   Timo Walther would rather stick to the agreed design because
> loading/unloading modules is a concept known from kernels etc.
>
> 3. Simplify the module design by mapping modules purely by name
>
> LOAD MODULE geo_utils
> LOAD MODULE hive WITH ('version'='2.1')  -- no dedicated 'type='/'module='
> but allow only 1 module to be loaded parameterized
> UNLOAD hive
> USE MODULES hive, core
>
>
> Please find more details in the reference link. Looking forward to your
> feedback.
>
> [1] https://issues.apache.org/jira/browse/FLINK-21045#
> <
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-68%3A+Extend+Core+Table+System+with+Pluggable+Modules
> >
> [2]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-68%3A+Extend+Core+Table+System+with+Pluggable+Modules
>
> Best,
> Jane
>


Re: [DISCUSS] FLIP-162: Consistent Flink SQL time function behavior

2021-01-28 Thread Jark Wu
I have a minor suggestion:

I think we may not need to introduce TIMESTAMP_NTZ, we already have the
shortcut
type TIMESTAMP for TIMESTAMP WITHOUT TIME ZONE. I think we will still
suggest
 users use TIMESTAMP even if we have TIMESTAMP_NTZ. Then it seems
introducing
TIMESTAMP_NTZ doesn't help much for users, but introduces more learning
costs.

Best,
Jark





On Thu, 28 Jan 2021 at 18:52, Leonard Xu  wrote:

> Thanks all for sharing your opinions.
>
> Looks like  we’ve reached a consensus about the topic.
>
> @Timo:
> > 1) Are we on the same page that LOCALTIMESTAMP returns TIMESTAMP and not
> TIMESTAMP_LTZ? Maybe we should quickly list also LOCALTIME/LOCALDATE and
> LOCALTIMESTAMP for completeness.
> Yes, LOCALTIMESTAMP returns TIMESTAMP, LOCALTIME returns TIME; their
> behavior is clear, so I just listed them in the excel [1] in this
> FLIP's references.
>
> > 2) Shall we add aliases for the timestamp types as part of this FLIP? I
> see Snowflake supports TIMESTAMP_LTZ , TIMESTAMP_NTZ , TIMESTAMP_TZ [1]. I
> think the discussion was quite cumbersome with the full string of
> `TIMESTAMP WITH LOCAL TIME ZONE`. With this FLIP we are making this type
> even more prominent. And important concepts should have a short name
> because they are used frequently. According to the FLIP, we are introducing
> the abbreviation already in function names like `TO_TIMESTAMP_LTZ`.
> `TIMESTAMP_LTZ` could be treated similar to `STRING` for
> `VARCHAR(MAX_INT)`, the serializable string representation would not change.
>
> @Timo @Jark
> Nice idea, I also suffered from the long name during the discussions, the
> abbreviation will not only help us, but also makes it more convenient for
> users. I list the abbreviation name mapping to support:
> TIMESTAMP WITHOUT TIME ZONE <=> TIMESTAMP_NTZ (a synonym of TIMESTAMP)
> TIMESTAMP WITH LOCAL TIME ZONE <=> TIMESTAMP_LTZ
> TIMESTAMP WITH TIME ZONE <=> TIMESTAMP_TZ (to be supported in the future)
> > 3) I'm fine with supporting all conversion classes like
> java.time.LocalDateTime, java.sql.Timestamp that TimestampType supported
> for LocalZonedTimestampType. But we agree that Instant stays the default
> conversion class right? The default extraction defined in [2] will not
> change, correct?
> Yes, Instant stays the default conversion class. The default
>
> > 4) I would remove the comment "Flink supports TIME-related types with
> precision well", because unfortunately this is still not correct. We still
> have issues with TIME(9), it would be great if someone can finally fix that
> though. Maybe the implementation of this FLIP would be a good time to fix
> this issue.
> You’re right, TIME(9) is not supported yet; I'll take TIME(9) into account
> in the scope of this FLIP.
>
>
> I’ve updated this FLIP [2] according to your suggestions, @Jark @Timo
> I’ll start the vote soon if there’re no objections.
>
> Best,
> Leonard
>
> [1]
> https://docs.google.com/spreadsheets/d/1T178krh9xG-WbVpN7mRVJ8bzFnaSJx3l-eg1EWZe_X4/edit?usp=sharing
> <
> https://docs.google.com/spreadsheets/d/1T178krh9xG-WbVpN7mRVJ8bzFnaSJx3l-eg1EWZe_X4/edit?usp=sharing
> >
> [2]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-162%3A+Consistent+Flink+SQL+time+function+behavior
> <
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-162:+Consistent+Flink+SQL+time+function+behavior>
>
>
> >
> > On 28.01.21 03:18, Jark Wu wrote:
> >> Thanks Leonard for the further investigation.
> >> I think we all agree we should correct the return value of
> >> CURRENT_TIMESTAMP.
> >> Regarding the return type of CURRENT_TIMESTAMP, I also agree
> TIMESTAMP_LTZ
> >> would be more useful worldwide. This may need more effort, but if this
> is
> >> the right direction, we should do it.
> >> Regarding the CURRENT_TIME, if CURRENT_TIMESTAMP returns
> >>  TIMESTAMP_LTZ, then I think CURRENT_TIME shouldn't return TIME_TZ.
> >> Otherwise, CURRENT_TIME will be quite special and strange.
> >> Thus I think it has to return TIME type. Given that we already have
> >> CURRENT_DATE which returns
> >>  DATE WITHOUT TIME ZONE, I think it's fine to return TIME WITHOUT TIME
> ZONE
> >> for CURRENT_TIME.
> >> In a word, the updated FLIP looks good to me. I especially like the
> >> proposed new function TO_TIMESTAMP_LTZ(numeric [, scale]).
> >> This will be very convenient for defining rowtime on a long value, which is
> a
> >> very common case and has been complained about a lot on the mailing list.
> >> Best,
> >> Jark
> >> On Mon, 25 Jan 2021 at 21:12, Kurt Young  wrote:

[jira] [Created] (FLINK-21191) Support reducing buffer for upsert-kafka sink

2021-01-28 Thread Jark Wu (Jira)
Jark Wu created FLINK-21191:
---

 Summary: Support reducing buffer for upsert-kafka sink
 Key: FLINK-21191
 URL: https://issues.apache.org/jira/browse/FLINK-21191
 Project: Flink
  Issue Type: New Feature
  Components: Connectors / Kafka
Reporter: Jark Wu
 Fix For: 1.13.0


Currently, if there is a job agg -> filter -> upsert-kafka, then upsert-kafka
will receive -U and +U for every update instead of only a +U. This will
produce a lot of tombstone messages in Kafka. It's not just about the
unnecessary data volume in Kafka: users may have processes that trigger side
effects when a tombstone record is ingested from a Kafka topic.

A simple solution would be to add a reducing buffer for the upsert-kafka sink, to
reduce the -U and +U before emitting to the underlying sink. This should be
very similar to the implementation of the upsert JDBC sink.

We can even extract the reducing logic out of the JDBC connector and it can be 
reused by other connectors. 
This should be something like `BufferedUpsertSinkFunction`, which has a reducing
buffer and flushes to the underlying SinkFunction
on checkpoint or buffer timeout. We can put it in `flink-connector-base`,
which can be shared by builtin connectors and custom connectors.
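A rough sketch of the reducing buffer idea (names are illustrative, not the
final design):

{code}
// Keyed by the record's upsert key: a later change for the same key
// overwrites the earlier one, so an intermediate -U/+U pair collapses
// into a single +U before anything reaches the underlying sink.
private final Map<RowData, RowData> buffer = new HashMap<>();

void addToBuffer(RowData key, RowData latestRow) {
    buffer.put(key, latestRow);
}

// Called on checkpoint or when the buffer timeout fires.
void flush(Consumer<RowData> underlyingSink) {
    buffer.values().forEach(underlyingSink);
    buffer.clear();
}
{code}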



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] FLIP-162: Consistent Flink SQL time function behavior

2021-01-28 Thread Jark Wu
+1 to have shortcut types TIMESTAMP_LTZ, TIMESTAMP_TZ.

Best,
Jark


On Thu, 28 Jan 2021 at 17:32, Timo Walther  wrote:

> Hi Leonard,
>
> thanks for the great summary and the updated FLIP. I think using
> TIMESTAMP_LTZ for CURRENT_TIMESTAMP/PROCTIME/ROWTIME is a good long-term
> solution. I also discussed this with people of different backgrounds
> internally and everybody seems to agree to the proposed design. I hope
> we can have a stable implementation in 1.13 because a lot of locations
> will be touched for this change: time attributes, watermark generators,
> connectors, formats, converters, functions, windows.
>
> The FLIP is in a very good shape. I think we can start a voting soon if
> there are no objections. I have some last comments:
>
> 1) Are we on the same page that LOCALTIMESTAMP returns TIMESTAMP and not
> TIMESTAMP_LTZ? Maybe we should quickly list also LOCALTIME/LOCALDATE and
> LOCALTIMESTAMP for completeness.
>
> 2) Shall we add aliases for the timestamp types as part of this FLIP? I
> see Snowflake supports TIMESTAMP_LTZ , TIMESTAMP_NTZ , TIMESTAMP_TZ [1].
> I think the discussion was quite cumbersome with the full string of
> `TIMESTAMP WITH LOCAL TIME ZONE`. With this FLIP we are making this type
> even more prominent. And important concepts should have a short name
> because they are used frequently. According to the FLIP, we are
> introducing the abbreviation already in function names like
> `TO_TIMESTAMP_LTZ`. `TIMESTAMP_LTZ` could be treated similar to `STRING`
> for `VARCHAR(MAX_INT)`, the serializable string representation would not
> change.
>
> 3) I'm fine with supporting all conversion classes like
> java.time.LocalDateTime, java.sql.Timestamp that TimestampType supported
>   for LocalZonedTimestampType. But we agree that Instant stays the
> default conversion class right? The default extraction defined in [2]
> will not change, correct?
>
> 4) I would remove the comment "Flink supports TIME-related types with
> precision well", because unfortunately this is still not correct. We
> still have issues with TIME(9), it would be great if someone can finally
> fix that though. Maybe the implementation of this FLIP would be a good
> time to fix this issue.
>
> Regards,
> Timo
>
>
> [1]
>
> https://docs.snowflake.com/en/sql-reference/data-types-datetime.html#timestamp-ltz-timestamp-ntz-timestamp-tz
>
> [2]
>
> https://github.com/apache/flink/blob/master/flink-table/flink-table-common/src/main/java/org/apache/flink/table/types/utils/ClassDataTypeConverter.java
>
> On 28.01.21 03:18, Jark Wu wrote:
> > Thanks Leonard for the further investigation.
> >
> > I think we all agree we should correct the return value of
> > CURRENT_TIMESTAMP.
> > Regarding the return type of CURRENT_TIMESTAMP, I also agree
> TIMESTAMP_LTZ
> > would be more useful worldwide. This may need more effort, but if this is
> > the right direction, we should do it.
> >
> > Regarding the CURRENT_TIME, if CURRENT_TIMESTAMP returns
> >   TIMESTAMP_LTZ, then I think CURRENT_TIME shouldn't return TIME_TZ.
> > Otherwise, CURRENT_TIME will be quite special and strange.
> > Thus I think it has to return TIME type. Given that we already have
> > CURRENT_DATE which returns
> >   DATE WITHOUT TIME ZONE, I think it's fine to return TIME WITHOUT TIME
> ZONE
> > for CURRENT_TIME.
> >
> > In a word, the updated FLIP looks good to me. I especially like the
> > proposed new function TO_TIMESTAMP_LTZ(numeric [, scale]).
> > This will be very convenient for defining rowtime on a long value, which is a
> > very common case and has been complained about a lot on the mailing list.
> >
> >
> > Best,
> > Jark
> >
> >
> >
> >
> >
> > On Mon, 25 Jan 2021 at 21:12, Kurt Young  wrote:
> >
> >> Thanks Leonard for the detailed response and also the bad case about
> option
> >> 1, these all
> >> make sense to me.
> >>
> >> Also nice catch about conversion support of LocalZonedTimestampType, I
> >> think it actually
> >> makes sense to support java.sql.Timestamp as well as
> >> java.time.LocalDateTime. It also has
> >> a slight benefit that we might have a chance to run UDFs that take
> them
> >> as input parameters
> >> after we change the return type.
> >>
> >> Regarding the return type of CURRENT_TIME, I also think timezone
> >> information is not useful.
> >> To not expand this FLIP further, I lean toward keeping it as it is.
> >>
> >> Best,
> >> Kurt
> >>
> >>
> >> On Mon, Jan 25, 2021 at 

[jira] [Created] (FLINK-21176) Translate updates on Confluent Avro Format page

2021-01-27 Thread Jark Wu (Jira)
Jark Wu created FLINK-21176:
---

 Summary: Translate updates on Confluent Avro Format page
 Key: FLINK-21176
 URL: https://issues.apache.org/jira/browse/FLINK-21176
 Project: Flink
  Issue Type: Task
  Components: chinese-translation, Documentation
Reporter: Jark Wu


We have updated examples in FLINK-20999 in commit 
2596c12f7fe6b55bfc8708e1f61d3521703225b3. We should translate the updates to 
Chinese. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] FLIP-162: Consistent Flink SQL time function behavior

2021-01-27 Thread Jark Wu
eans both user and Flink devs need to refactor the code(UDF,
> > builtin
> > >>>> functions, sql pipeline), to be honest, I didn’t see strong
> > motivation that
> > >>>> we have to do the pretty big refactor from user’s perspective and
> > >>>> developer’s perspective.
> > >>>>
> > >>>> In a word, both your suggestion and my proposal can resolve almost all
> > >>>> user problems; the divergence is whether we need to spend considerable
> > >>>> energy just to get a bit more accurate semantics. I think we need a
> > >>>> tradeoff.
> > >>>>
> > >>>>
> > >>>> Best,
> > >>>> Leonard
> > >>>> [1]
> > >>>>
> > https://trino.io/docs/current/functions/datetime.html#current_timestamp
> <
> > >>>>
> > https://trino.io/docs/current/functions/datetime.html#current_timestamp>
> > >>>> [2] https://issues.apache.org/jira/browse/SPARK-30374 <
> > >>>> https://issues.apache.org/jira/browse/SPARK-30374>
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>>> On 2021-01-22, 00:53, Timo Walther wrote:
> > >>>>>
> > >>>>> Hi Leonard,
> > >>>>>
> > >>>>> thanks for working on this topic. I agree that time handling is not
> > >>>> easy in Flink at the moment. We added new time data types (and some
> > are
> > >>>> still not supported which even further complicates things like
> > TIME(9)). We
> > >>>> should definitely improve this situation for users.
> > >>>>>
> > >>>>> This is a pretty opinionated topic and it seems that the SQL
> standard
> > >>>> is not really deciding this but is at least supporting. So let me
> > express
> > >>>> my opinion for the most important functions:
> > >>>>>
> > >>>>> LOCALDATE / LOCALTIME / LOCALTIMESTAMP
> > >>>>>
> > >>>>> --> uses session time zone, returns DATE/TIME/TIMESTAMP
> > >>>>>
> > >>>>> I think those are the most obvious ones because the LOCAL indicates
> > >>>> that the locality should be materialized into the result and any
> time
> > zone
> > >>>> information (coming from session config or data) is not important
> > >>>> afterwards.
> > >>>>>
> > >>>>> CURRENT_DATE/CURRENT_TIME/CURRENT_TIMESTAMP
> > >>>>>
> > >>>>> --> uses session time zone, returns DATE/TIME/TIMESTAMP
> > >>>>>
> > >>>>> I'm very sceptical about this behavior. Almost all mature systems
> > >>>> (Oracle, Postgres) and new high quality systems (Presto, Snowflake)
> > use a
> > >>>> data type with some degree of time zone information encoded. In a
> > >>>> globalized world with businesses spanning different regions, I think
> > we
> > >>>> should do this as well. There should be a difference between
> > >>>> CURRENT_TIMESTAMP and LOCALTIMESTAMP. And users should be able to
> > choose
> > >>>> which behavior they prefer for their pipeline.
> > >>>>>
> > >>>>> If we were to design this from scratch, I would suggest the following:
> > >>>>>
> > >>>>> - drop CURRENT_DATE / CURRENT_TIME and let users pick LOCALDATE /
> > >>>> LOCALTIME for materialized timestamp parts
> > >>>>>
> > >>>>> - CURRENT_TIMESTAMP should return a TIMESTAMP WITH TIME ZONE to
> > >>>> materialize all session time information into every record. It is the
> > >>>> most
> > >>>> generic data type and allows to cast to all other timestamp data
> > types.
> > >>>> This generic ability can be used for filter predicates as well
> either
> > >>>> through implicit or explicit casting.
> > >>>>>
> > >>>>> PROCTIME/ROWTIME should be time functions based on a long value.
> Both
> > >>>> System.currentTimeMillis() and our watermark system work on long values.
> > Those
> > >>>> should return TIMESTAMP WITH LOCAL TIME ZO

[DISCUSS] FLINK-21109: Introduce "retractAccumulators" interface for AggregateFunction in Table/SQL API

2021-01-24 Thread Jark Wu
Hi all,

I would like to propose introducing a new method "retractAccumulators()" to
the `AggregateFunction` in Table/SQL.

*Motivation*

The motivation is to improve the performance of hopping (sliding) windows.
Currently, we have a pane-based (also called sliced) optimization for the
hopping windows in Table/SQL, where each element is accumulated into only a
single pane. Once a window is fired,
we merge multiple panes to get the window result.

For example, HOP(size=10s, slide=2s), a window [0, 10) consists of 5 panes
[0, 2), [2, 4), [4, 6), [6, 8), [8, 10).
And each element will fall into a single pane, e.g. element with timestamp
3 will fall into pane [2, 4).

However, currently the merging of panes happens on JVM heap memory. For
example, when window [0, 10) is going to be fired,
we retrieve the accumulators of the 5 panes and merge them into an
in-memory accumulator.
The performance is not good, because the number of panes may be very large
when the slide is small, e.g. 8640 panes for HOP(1day, 10s).
And the job may OOM when the accumulator is very large, e.g. count
distinct.
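
For concreteness, here is a tiny self-contained sketch of the pane assignment
arithmetic (illustrative only, not Flink's internal code):

public class PaneAssignmentExample {

    // Pane size is the greatest common divisor of window size and slide.
    static long gcd(long a, long b) {
        return b == 0 ? a : gcd(b, a % b);
    }

    public static void main(String[] args) {
        long size = 10_000L;                    // HOP size: 10s in millis
        long slide = 2_000L;                    // HOP slide: 2s
        long paneSize = gcd(size, slide);       // 2s panes: [0,2s), [2s,4s), ...
        long ts = 3_000L;                       // element timestamp: 3s
        long paneStart = ts - (ts % paneSize);  // assigned pane: [2s, 4s)
        System.out.println("pane = [" + paneStart + ", " + (paneStart + paneSize) + ")");
    }
}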

Thus, I would like to introduce a "retractAccumulators()" method which is
an inverse of "merge()".
With the "retractAccumulators()" method, we can reduce the time complexity
from O(N) to O(1).
For example, when window [10, 20) is going to be fired, we only need to
retract the accumulator of pane [8, 10) from the state of the previous
window [8, 18) and merge in the accumulator of pane [18, 20).

This will be a great performance improvement that gives the hopping window
similar performance to the tumbling window, no matter how small the slide
is. And we can avoid OOM, because the merged accumulator lives in state
instead of in memory.

*Public Interface*

We will introduce a contract method "retractAccumulators" which is similar
to the "merge" method.

Retracts a group of accumulator instances from one accumulator instance.
This method is optional, but implementing it can greatly improve the
performance of hopping window aggregates.
Therefore, it is recommended to implement this method when used with
hopping windows.

param: accumulator the accumulator which will keep the retracted aggregate
                   results. It should be noted that the accumulator may
                   contain the previous aggregated results. Therefore users
                   should not replace or clean this instance in the custom
                   retractAccumulators method.
param: retractAccs a java.lang.Iterable<ACC> pointing to a group of
                   accumulators that will be retracted.

public void retractAccumulators(ACC accumulator, java.lang.Iterable<ACC> retractAccs)
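
To illustrate, a minimal COUNT-style aggregate implementing both methods could
look like this (a sketch only; retractAccumulators is the method proposed in
this thread and is not part of any released Flink API):

import org.apache.flink.table.functions.AggregateFunction;

public class CountAgg extends AggregateFunction<Long, CountAgg.CountAcc> {

    public static class CountAcc {
        public long count;
    }

    @Override
    public CountAcc createAccumulator() {
        return new CountAcc();
    }

    @Override
    public Long getValue(CountAcc acc) {
        return acc.count;
    }

    public void accumulate(CountAcc acc, Object value) {
        if (value != null) {
            acc.count += 1;
        }
    }

    // Merges the accumulators of several panes into one (already a contract method).
    public void merge(CountAcc acc, Iterable<CountAcc> others) {
        for (CountAcc other : others) {
            acc.count += other.count;
        }
    }

    // Proposed in this thread (not part of released Flink): the inverse of merge().
    public void retractAccumulators(CountAcc acc, Iterable<CountAcc> retractAccs) {
        for (CountAcc other : retractAccs) {
            acc.count -= other.count;
        }
    }
}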


What do you think?

Best,
Jark


[jira] [Created] (FLINK-21109) Introduce "retractAccumulators" interface for AggregateFunction in Table/SQL API

2021-01-24 Thread Jark Wu (Jira)
Jark Wu created FLINK-21109:
---

 Summary: Introduce "retractAccumulators" interface for 
AggregateFunction in Table/SQL API
 Key: FLINK-21109
 URL: https://issues.apache.org/jira/browse/FLINK-21109
 Project: Flink
  Issue Type: New Feature
  Components: Table SQL / API
Reporter: Jark Wu


*Motivation*

The motivation is to improve the performance of hopping (sliding) windows.
Currently, we have a pane-based (also called sliced) optimization for the hopping 
windows in Table/SQL, where each element is accumulated into a single pane. Once a 
window is fired, we merge multiple panes to get the window result. 

For example, HOP(size=10s, slide=2s), a window [0, 10) consists of 5 panes [0, 
2), [2, 4), [4, 6), [6, 8), [8, 10).
And each element will fall into a single pane, e.g. element with timestamp 3 
will fall into pane [2, 4).

However, currently the merging of panes happens on JVM heap memory. For example, 
when window [0, 10) is going to be fired, we retrieve the accumulators of the 5 
panes and merge them into an in-memory accumulator. 
The performance is not good, because the number of panes may be very large when 
the slide is small, e.g. 8640 panes for HOP(1day, 10s).
And the job may OOM when the accumulator is very large, e.g. containing 
count distinct. 

Thus, I would like to introduce a "retractAccumulators()" method which is an 
inverse of "merge()".
With the "retractAccumulators()" method, we can reduce the time complexity from 
O(N) to O(1).
For example, when window [10, 20) is going to be fired, we only need to 
retract the accumulator of pane [8, 10) from the state of the previous window 
[8, 18) and merge in the accumulator of pane [18, 20). 

This will be a great performance improvement that gives the hopping window 
similar performance to the tumbling window, no matter how small the slide is. 

*Public Interface*

We will introduce a contract method "retractAccumulators" which is similar to 
the "merge" method.


{code}
Retracts a group of accumulator instances from one accumulator instance. This 
method is optional, but implementing it can greatly improve the performance of 
hopping window aggregates.
Therefore, it is recommended to implement this method when used with hopping 
windows. 

param: accumulator the accumulator which will keep the retracted aggregate 
                   results. It should be noted that the accumulator may contain 
                   the previous aggregated results. Therefore users should not 
                   replace or clean this instance in the custom 
                   retractAccumulators method.
param: retractAccs a java.lang.Iterable<ACC> pointing to a group of accumulators 
                   that will be retracted.

public void retractAccumulators(ACC accumulator, java.lang.Iterable<ACC> retractAccs)
{code}




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [ANNOUNCE] New formatting rules are now in effect

2021-01-23 Thread Jark Wu
Hi Matthias,

I also have the same problem when creating a new Java class. This is quite
annoying. Thanks for looking into it and providing a patch. The
patched plugin works well for me.

Regarding the next actions, is it possible to backport this fix to the
google-java-format 1.7.0 series and release a new version for it? The
latest version, 1.7.0.5, was released in Apr 2020. If this doesn't work,
option #2 also sounds good to me, because users need to download the plugin
anyway.

Best,
Jark



On Sun, 24 Jan 2021 at 03:32, Matthias Pohl  wrote:

> // With this one I am curious now how many of you had the same issue
> without complaining: In the engine team there were 4 out of 4.
>
> It's about the error dialog that pops up when creating a new Java class
> file. Additionally, the Java class is generated but is not formatted
> correctly. Reformatting the file helps. Confirming the dialog is annoying,
> though. I started looking into the issue and found an
> UnsupportedOperationException that led me to the actual cause: A bug in the
> google-java-format plugin version 1.7.0.5 is causing this behavior. I
> provided a more detailed description of my findings in FLINK-21106 [1]
> including a compiled version of the patched plugin.
>
> I want to open the discussion on how we want to deal with it. I see three
> options right now:
> 1. We leave it as it is right now as we consider this to be a minor thing.
> 2. We provide the patched google-java-format plugin as part of the docs.
> 3. We upgrade to Java 11 to be able to upgrade the google-java-format
> plugin as it was already mentioned earlier in the thread.
>
> None of the above options seem to be the right one to go for. Any thoughts
> on this?
>
> Best,
> Matthias
>
> [1] https://issues.apache.org/jira/browse/FLINK-21106
>
>
>
> On Wed, Dec 30, 2020 at 11:05 AM Chesnay Schepler 
> wrote:
>
>> 1) No, it is not possible to exclude certain code blocks from formatting.
>> There is a workaround though for this case; you can add an empty comment
>> (//) to the end of a line to prevent subsequent lines from being added
>> to it.
>> https://github.com/google/google-java-format/issues/137
>>
>> Note that any spotless-specific features would likely not help us anyway,
>> unless we'd be fine with not using the IntelliJ plugin.
>>
>> 2) The code style has not been updated yet.
>> The indent choices with the google-java-format is either 2 spaces for
>> everything, or 4 spaces + 8 spaces for continuations.
>> In other words, 4 spaces is simply not an option.
>>
>> On 12/30/2020 9:09 AM, Jark Wu wrote:
>> > Hi,
>> >
>> > I have played with the format plugin these days and found some problems.
>> > Maybe some of them are personal taste.
>> >
>> > 1) Is it possible to disable auto-format for some code blocks?
>> > For example, the format of code generation
>> > StructuredObjectConverter#generateCode is manually
>> >   adjusted for readability. However, the format plugin breaks it and
>> it's
>> > hard to read now.
>> > See before[1] and after[2]. cc @Timo Walther 
>> >
>> > 2) Using 4 spaces or 8 spaces for continuation indent?
>> > AOSP uses 8 spaces for continuation indent.
>> > However, Flink Code Style suggests "Each new line should have one extra
>> > indentation relative to
>> > the line of the called entity" which means 4 spaces.
>> > Personally, I think 4 spaces may be more friendly for Java lambdas.  An
>> > example:
>> >
>> > 8 spaces:
>> >
>> > wrapClassLoader(
>> >         () ->
>> >                 environment
>> >                         .getModules()
>> >                         .forEach(
>> >                                 (name, entry) ->
>> >                                         modules.put(
>> >                                                 name,
>> >                                                 createModule(
>> >                                                         entry.asMap(),
>> >                                                         classLoader))));
>> >
>> > 4 spaces:
>> >
>> > wrapClassLoader(
>> >     () ->
>> >         environment
>> >             .getModules()
>> >             .forEach(
>> >                 (name, entry) ->
>> >                     modules.put(name, createModule(entry.asMap(),
>> >                         classLoader))));
>> >
>> >
>> >
>> > Best,
>> > Jark
>> >
>> > [1]:
>> >
>> https://github.com

Re: Flink Table from KeyedStream

2021-01-21 Thread Jark Wu
Hi Dom,

AFAIK, the Table API will apply a key partitioner based on the join key for the
join operator, [id, data] and [number, metadata] in your case. So the
partitioner of the KeyedStream is not respected.
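
As a workaround sketch (Java; resultTable, tableEnv, and MyKeyedProcessFunction
are placeholders for your own names), you can re-key the joined stream
explicitly instead of relying on reinterpretAsKeyedStream:

// Convert the join result back to a DataStream and key it explicitly.
// An inner equi-join of two append-only streams produces an append-only result.
DataStream<Row> joined = tableEnv.toAppendStream(resultTable, Row.class);
joined
        .keyBy(row -> (String) row.getField(0)) // key by the join key again
        .process(new MyKeyedProcessFunction());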

Best,
Jark

On Thu, 21 Jan 2021 at 21:39, Dominik Wosiński  wrote:

> Hey,
> I was wondering if that's currently possible to use KeyedStream to create a
> properly partitioned Table in Flink 1.11 ? I have a use case where I wanted
> to first join two streams using Flink SQL and then process them via
> *KeyedProcessFunction.* So I do something like:
>
> implicit val env = StreamExecutionEnvironment.getExecutionEnvironment
> implicit val ste = StreamTableEnvironment.create(env)
> val stream1 = env.addSource(someKafkaConsumer)
> val stream2 = env.addSource(otherKafkaConsumer)
> val table1 = ste.createTemporaryView("firstTable",
> stream1.keyBy(_.getId()), $"id", $"data", $"name")
> val table2 = ste.createTemporaryView("secondTable",
> stream2.keyBy(_.getNumber()), $"number", $"userName", $"metadata")
> ste.sqlQuery(
>   """
> |SELECT * from firstTable
> |JOIN secondTable ON id = number AND data = metadata
> |""".stripMargin
> )
>
>
> Will Table API respect the fact that I used `KeyedStream` and will it keep
> the data partitioned by the keys above ?
>
> I am asking since when after the JOIN I tried to *reinterpretAsKeyedStream
> *I
> was getting the *NullPointerException* when accessing the state inside the
> KeyedProcessFunction which suggests that the partitioning has indeed
> changed. So Is it possible to enforce partitioning when working with Table
> API ??
>
> Thanks in Advance,
> Best Regards,
> Dom.
>


[jira] [Created] (FLINK-21069) Configuration "parallelism.default" doesn't take effect for TableEnvironment#explainSql

2021-01-21 Thread Jark Wu (Jira)
Jark Wu created FLINK-21069:
---

 Summary: Configuration "parallelism.default" doesn't take effect 
for TableEnvironment#explainSql
 Key: FLINK-21069
 URL: https://issues.apache.org/jira/browse/FLINK-21069
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / API
Reporter: Jark Wu
 Fix For: 1.13.0


I tried the following test, and the printed node parallelism in json plan is 
not 5. 

{code:scala}
  @Test
  def testExplainAndExecuteSingleSink(): Unit = {
    val tEnv = TableEnvironmentImpl.create(settings)
    val conf = new Configuration()
    conf.setInteger("parallelism.default", 5)
    tEnv.getConfig.addConfiguration(conf)
    TestTableSourceSinks.createCsvTemporarySinkTable(
      tEnv, new TableSchema(Array("first"), Array(STRING)), "MySink1")
    println(tEnv.explainSql("insert into MySink1 select first from MyTable",
      ExplainDetail.JSON_EXECUTION_PLAN))
    tEnv.executeSql("insert into MySink1 select first from MyTable")
  }
{code}

I think the bug is that TableEnvironment#explain does not invoke 
{{PlannerBase#translate(modifyOperations: util.List[ModifyOperation])}}, where 
we apply the configuration to the underlying {{StreamExecutionEnvironment}}.




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Correct time-related function behavior in Flink SQL

2021-01-20 Thread Jark Wu
Great examples to understand the problem and the proposed changes, @Kurt!

Thanks Leonard for investigating this problem.
The time-zone problems around time functions and windows have bothered a
lot of users. It's time to fix them!

The return value changes sound reasonable to me, and keeping the return
type unchanged will minimize the surprise to the users.
Besides that, I think it would be better to mention how this affects the
window behaviors, and the interoperability with DataStream.

I think this definitely deserves a FLIP.



Hi zhisheng,

Do you have examples to illustrate which case will get the wrong window
boundaries?
That will help to verify whether the proposed changes can solve your
problem.

Best,
Jark


On Thu, 21 Jan 2021 at 12:54, zhisheng <173855...@qq.com> wrote:

> Thanks to Leonard Xu for discussing this tricky topic. At present, there
> are many Flink jobs in our production environment that are used to count
> day-level reports (eg: count PV/UV ).
>
>
> If use the default Flink SQL, the window time range of the
> statistics is incorrect, then the statistical results will naturally be
> incorrect.
>
>
> The user needs to deal with the time zone manually in order to solve the
> problem.
>
>
> If Flink itself can solve these time zone issues, then I think it will be
> user-friendly.
>
>
> Thank you
>
>
> Best!
> zhisheng
>
>
> -- Original Message --
> From: "dev" <xbjt...@gmail.com;
> Sent: Tuesday, Jan 19, 2021, 6:35 PM
> To: "dev"
> Subject: Re: [DISCUSS] Correct time-related function behavior in Flink SQL
>
>
>
> I found above example format may mess up in different mail client, I post
> a picture here[1].
>
> Best,
> Leonard
>
> [1]
> https://github.com/leonardBang/flink-sql-etl/blob/master/etl-job/src/main/resources/pictures/CURRRENT_TIMESTAMP.png
> <
> https://github.com/leonardBang/flink-sql-etl/blob/master/etl-job/src/main/resources/pictures/CURRRENT_TIMESTAMP.png;
>
>
> On Jan 19, 2021, at 16:22, Leonard Xu wrote:
>  Hi all,
> 
>  I want to start the discussion about correcting time-related function
> behavior in Flink SQL, this is a tricky topic but I think it’s time to
> address it.
> 
>  Currently some temporal function behaviors are weird to users.
>  1. When users use a PROCTIME() in SQL, the value of PROCTIME()
> has a timezone offset with the wall-clock time in users' local time zone,
> users need to add their local time zone offset manually to get expected
> local timestamp(e.g: Users in Germany need to +1h to get expected local
> timestamp).
> 
>  2. Users can not use
> CURRENT_DATE/CURRENT_TIME/CURRENT_TIMESTAMP to get wall-clock
> timestamp in local time zone, and thus they need write UDF in their SQL
> just for implementing a simple filter like WHERE date_col =
> CURRENT_DATE.
> 
>  3. Another common case is the time window with day
> interval based on PROCTIME(), user plan to put all data from one day into
> the same window, but the window is assigned using timestamp in UTC+0
> timezone rather than the session timezone which leads to the window starts
> with an offset(e.g: Users in China need to add -8h in their business sql
> start and then +8h when output the result, the conversion like a magic for
> users).
> 
>  These problems come from that lots of time-related functions like
> PROCTIME(), NOW(), CURRENT_DATE, CURRENT_TIME and CURRENT_TIMESTAMP are
> returning time values based on UTC+0 time zone.
> 
>  This topic will lead to a comparison of the three types, i.e.
> TIMESTAMP/TIMESTAMP WITHOUT TIME ZONE, TIMESTAMP WITH LOCAL TIME ZONE and
> TIMESTAMP WITH TIME ZONE. In order to better understand the three types, I
> wrote a document[1] to help understand them. You can also learn the
> three timestamp types' behavior in the Hadoop ecosystem from the reference
> link in the doc.
> 
> 
>  I investigated the current behavior of all Flink time-related functions and
> compared them with other DB vendors like Pg, Presto, Hive, Spark, and
> Snowflake. I made an excel sheet [2] to organize them well; we can use it
> for the next discussion. Please let me know if I missed something.
>  From my investigation, I think we need to correct the behavior of the
> functions NOW()/PROCTIME()/CURRENT_DATE/CURRENT_TIME/CURRENT_TIMESTAMP. To
> correct them, we can change the function return type, the function return
> value, or both. All of those ways are
> valid because SQL:2011 does not specify the function return type and every
> SQL engine vendor has its own implementation. For example, the
> CURRENT_TIMESTAMP function:
> 
>  (table columns: FLINK current behavior | existing problem | other vendors'
>  behavior | proposed change)
>
>  CURRENT_TIMESTAMP: returns TIMESTAMP(0) NOT NULL
>  #session timezone: UTC   => 2020-12-28T23:52:52
>  #session timezone: UTC+8 => 2020-12-28T23:52:52
>
>  wall clock:
>  UTC+8: 

[jira] [Created] (FLINK-21054) Implement mini-batch optimized slicing window aggregate operator

2021-01-20 Thread Jark Wu (Jira)
Jark Wu created FLINK-21054:
---

 Summary: Implement mini-batch optimized slicing window aggregate 
operator
 Key: FLINK-21054
 URL: https://issues.apache.org/jira/browse/FLINK-21054
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Runtime
Reporter: Jark Wu
Assignee: Jark Wu
 Fix For: 1.13.0


We have supported cumulative windows in FLINK-19605. However, the current 
cumulative window is not efficient, because the slices are not shared. 

We leverage the slicing ideas proposed in FLINK-7001 and this design doc [1]. 
Slicing is an optimized implementation for hopping, cumulative, and tumbling 
windows. Besides that, we introduce a ManagedMemory-based mini-batch 
optimization for the slicing window aggregate operator, which can tremendously 
reduce state accesses and achieve higher throughput without latency loss.

[1]: 
https://docs.google.com/document/d/1ziVsuW_HQnvJr_4a9yKwx_LEnhVkdlde2Z5l6sx5HlY/edit#
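
As a rough illustration of why slice sharing helps (illustrative arithmetic 
only, not the actual operator code):

{code:java}
import java.util.ArrayList;
import java.util.List;

public class SliceSharingExample {

    // For CUMULATE(step, size): the slice ending at sliceEnd is merged into every
    // window of its cycle whose end is >= sliceEnd, instead of being re-accumulated.
    static List<Long> windowEnds(long sliceEnd, long step, long size) {
        long cycleStart = (sliceEnd - step) / size * size; // start of the enclosing cycle
        List<Long> ends = new ArrayList<>();
        for (long end = sliceEnd; end <= cycleStart + size; end += step) {
            ends.add(end);
        }
        return ends;
    }

    public static void main(String[] args) {
        // CUMULATE(step = 1h, size = 4h): slice [1h, 2h) feeds windows ending at 2h, 3h, 4h.
        System.out.println(windowEnds(2L, 1L, 4L)); // [2, 3, 4]
    }
}
{code}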



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Re: [ANNOUNCE] Welcome Guowei Ma as a new Apache Flink Committer

2021-01-19 Thread Jark Wu
Congratulations Guowei!

Cheers,
Jark

On Wed, 20 Jan 2021 at 14:36, SHI Xiaogang  wrote:

> Congratulations MA!
>
> Regards,
> Xiaogang
>
> Yun Tang wrote on Wed, Jan 20, 2021 at 2:24 PM:
>
> > Congratulations Guowei!
> >
> > Best
> > Yun Tang
> > 
> > From: Yang Wang 
> > Sent: Wednesday, January 20, 2021 13:59
> > To: dev 
> > Subject: Re: Re: [ANNOUNCE] Welcome Guowei Ma as a new Apache Flink
> > Committer
> >
> > Congratulations Guowei!
> >
> >
> > Best,
> > Yang
> >
> > Yun Gao wrote on Wed, Jan 20, 2021 at 1:52 PM:
> >
> > > Congratulations Guowei!
> > >
> > > Best,
> > >  Yun--
> > > Sender:Yangze Guo
> > > Date:2021/01/20 13:48:52
> > > Recipient:dev
> > > Theme:Re: [ANNOUNCE] Welcome Guowei Ma as a new Apache Flink Committer
> > >
> > > Congratulations, Guowei! Well deserved.
> > >
> > >
> > > Best,
> > > Yangze Guo
> > >
> > > On Wed, Jan 20, 2021 at 1:46 PM Xintong Song 
> > > wrote:
> > > >
> > > > Congratulations, Guowei~!
> > > >
> > > >
> > > > Thank you~
> > > >
> > > > Xintong Song
> > > >
> > > >
> > > >
> > > > On Wed, Jan 20, 2021 at 1:42 PM Yuan Mei 
> > wrote:
> > > >
> > > > > Congrats Guowei :-)
> > > > >
> > > > > Best,
> > > > > Yuan
> > > > >
> > > > > On Wed, Jan 20, 2021 at 1:36 PM tison 
> wrote:
> > > > >
> > > > > > Congrats Guowei!
> > > > > >
> > > > > > Best,
> > > > > > tison.
> > > > > >
> > > > > >
> > > > > > Kurt Young wrote on Wed, Jan 20, 2021 at 1:34 PM:
> > > > > >
> > > > > > > Hi everyone,
> > > > > > >
> > > > > > > I'm very happy to announce that Guowei Ma has accepted the
> > > invitation
> > > > > to
> > > > > > > become a Flink committer.
> > > > > > >
> > > > > > > Guowei is a very long term Flink developer, he has been
> extremely
> > > > > helpful
> > > > > > > with
> > > > > > > some important runtime changes, and also been  active with
> > > answering
> > > > > user
> > > > > > > questions as well as discussing designs.
> > > > > > >
> > > > > > > Please join me in congratulating Guowei for becoming a Flink
> > > committer!
> > > > > > >
> > > > > > > Best,
> > > > > > > Kurt
> > > > > > >
> > > > > >
> > > > >
> > >
> >
>


[jira] [Created] (FLINK-21027) Add isKeyValueImmutable() method to KeyedStateBackend interface

2021-01-19 Thread Jark Wu (Jira)
Jark Wu created FLINK-21027:
---

 Summary: Add isKeyValueImmutable() method to KeyedStateBackend 
interface
 Key: FLINK-21027
 URL: https://issues.apache.org/jira/browse/FLINK-21027
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / State Backends
Reporter: Jark Wu


In Table/SQL operators, we have some optimizations that reuse key and record 
objects. For example, we buffer input records in {{BytesMultiMap}} and use a 
reused object to map to the underlying memory segment to reduce byte copying. 

However, if we put the reused key and value into the heap state backend, the 
result will be wrong, because it is not allowed to mutate keys and values in 
the heap state backend. 

Therefore, it would be great if {{KeyedStateBackend}} could expose such an API, 
so that Table/SQL can dynamically decide whether to copy the keys and values 
before putting them into state. 
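
A rough sketch of what such an API could look like (the method name comes from 
this issue; the documented semantics below are one possible reading of the 
motivation, not a settled contract):

{code:java}
public interface KeyedStateBackendExtension {

    /**
     * Returns true if this backend keeps its own immutable copies of keys and
     * values on write (e.g. a serializing backend such as RocksDB), so callers
     * may safely reuse key/value objects afterwards. A heap backend, which
     * stores object references, would return false, and callers like the
     * Table/SQL runtime would then copy reused objects before writing.
     */
    boolean isKeyValueImmutable();
}
{code}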



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [Vote] FLIP-157 Migrate Flink Documentation from Jekyll to Hugo

2021-01-18 Thread Jark Wu
+1

On Tue, 19 Jan 2021 at 01:59, Till Rohrmann  wrote:

> +1,
>
> Cheers,
> Till
>
> On Mon, Jan 18, 2021 at 4:12 PM Chesnay Schepler 
> wrote:
>
> > +1
> > On 1/18/2021 3:50 PM, Seth Wiesman wrote:
> > > Addendum, 72 hours from now is Thursday the 21st :)
> > >
> > > sorry for the mistake.
> > >
> > > Seth
> > >
> > > On Mon, Jan 18, 2021 at 8:41 AM Timo Walther 
> wrote:
> > >
> > >> +1
> > >>
> > >> Thanks for upgrading our docs infrastructure.
> > >>
> > >> Regards,
> > >> Timo
> > >>
> > >> On 18.01.21 15:29, Seth Wiesman wrote:
> > >>> Hi devs,
> > >>>
> > >>> The discussion of FLIP-157 [1] seems to have reached a consensus
> > >>> through the mailing thread [2]. I would like to start a vote for it.
> > >>>
> > >>> The vote will be open until 20th January (72h), unless there is an
> > >>> objection or not enough votes.
> > >>>
> > >>> Best,
> > >>> Seth
> > >>>
> > >>> [1]
> > >>>
> > >>
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-157+Migrate+Flink+Documentation+from+Jekyll+to+Hugo
> > >>> [2]
> > >>>
> > >>
> >
> https://lists.apache.org/thread.html/r88152bf178381c5e3bc2d7b3554cea3d61cff9ac0edb713dc518d9c7%40%3Cdev.flink.apache.org%3E
> > >>
> >
> >
>


[jira] [Created] (FLINK-21005) Introduce new provider for unified Sink API and implement in planner

2021-01-17 Thread Jark Wu (Jira)
Jark Wu created FLINK-21005:
---

 Summary: Introduce new provider for unified Sink API and implement 
in planner
 Key: FLINK-21005
 URL: https://issues.apache.org/jira/browse/FLINK-21005
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / API, Table SQL / Planner
Reporter: Jark Wu
 Fix For: 1.13.0


FLIP-143 [1] introduced the unified sink API. We should add a 
{{SinkRuntimeProvider}} for it and support it in the planner, so that Table/SQL 
users can also use the unified sink API. 


[1]: 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-143%3A+Unified+Sink+API
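
A sketch of the shape such a provider could take (assumed for illustration; 
the final interface is subject to the implementation):

{code:java}
import org.apache.flink.api.connector.sink.Sink;
import org.apache.flink.table.connector.sink.DynamicTableSink;
import org.apache.flink.table.data.RowData;

/** Sketch: the planner would ask this provider for a FLIP-143 unified Sink. */
public interface SinkProvider extends DynamicTableSink.SinkRuntimeProvider {

    /** Creates the unified {@link Sink} instance that writes the table rows. */
    Sink<RowData, ?, ?, ?> createSink();
}
{code}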



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-21002) Support exactly once sink for JDBC in Table SQL API

2021-01-17 Thread Jark Wu (Jira)
Jark Wu created FLINK-21002:
---

 Summary: Support exactly once sink for JDBC in Table SQL API
 Key: FLINK-21002
 URL: https://issues.apache.org/jira/browse/FLINK-21002
 Project: Flink
  Issue Type: New Feature
  Components: Connectors / JDBC, Table SQL / Ecosystem
Reporter: Jark Wu


FLINK-15578 implements the exactly-once JDBC sink based on XA transactions.
We should also expose this feature in the Table API/SQL. 
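
For illustration, this could surface in DDL roughly as below (the 
'sink.semantic' option and its value are hypothetical placeholders, not 
released options; {{tableEnv}} is an existing TableEnvironment):

{code:java}
tableEnv.executeSql(
        "CREATE TABLE orders_sink (\n"
                + "  order_id BIGINT,\n"
                + "  amount DECIMAL(10, 2)\n"
                + ") WITH (\n"
                + "  'connector' = 'jdbc',\n"
                + "  'url' = 'jdbc:mysql://localhost:3306/mydb',\n"
                + "  'table-name' = 'orders',\n"
                + "  'sink.semantic' = 'exactly-once'\n" // hypothetical option
                + ")");
{code}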



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] FLIP-157 Migrate Flink Documentation from Jekyll to Hugo

2021-01-17 Thread Jark Wu
I have tried it in my local env, and the build time is really fast.
Looking forward to Hugo helping us better coordinate the translation
work.

+1 to start a vote.

best,
Jark

On Fri, 15 Jan 2021 at 23:02, Seth Wiesman  wrote:

> Great, if there aren't any other concerns I will open this up for a vote on
> Monday.
>
> Seth
>
> On Thu, Jan 14, 2021 at 9:03 AM Seth Wiesman  wrote:
>
> > Happy to see there is enthusiasm for this change, let me try and answer
> > each of these questions.
> >
> > @Jark Wu  Hugo has proper support for i18n which means
> > we can move content into external files that can be easily translated[1].
> > For reference, Kubernetes has successfully used Hugo's built-in features
> to
> > maintain 14 different languages[2]. Additionally, Hugo's md files are
> > standard markdown which could allow us to integrate with other tooling.
> For
> > example, we may look into using Crowdin for managing translations as the
> > pulsar community does.
> >
> > @Till Rohrmann  None that I have found. In the
> > proof of concept, I have already implemented all the Jekyll functionality
> > we are using in the docs[4]. I have found Hugo shortcodes to be a more
> > flexible alternative to liquid tags.
> >
> > @Chesnay Schepler  Not yet, I do not have access to
> > the build bot (it is PMC only). I will work with INFRA to get Hugo
> > installed if it is not already and Robert has agreed to set-up the build
> > script on the build bot itself.
> >
> > Seth
> >
> > [1] https://gohugo.io/functions/i18n/
> > [2] https://github.com/kubernetes/website/
> > [3] https://github.com/apache/pulsar-translation
> > [4]
> >
> https://github.com/sjwiesman/flink-docs-v2/tree/master/layouts/shortcodes
> >
> >
> >
> > On Thu, Jan 14, 2021 at 7:03 AM David Anderson 
> > wrote:
> >
> >> I've spent a few hours digging into this with Seth, and can report that
> >> this makes working on the docs much less of a chore.
> >>
> >> +1 (with enthusiasm)
> >>
> >> Best,
> >> David
> >>
> >> On Thu, Jan 14, 2021 at 1:34 PM Kostas Kloudas 
> >> wrote:
> >>
> >> > +1 for moving to Hugo.
> >> >
> >> > Cheers,
> >> > Kostas
> >> >
> >> > On Thu, Jan 14, 2021 at 1:27 PM Wei Zhong 
> >> wrote:
> >> > >
> >> > > +1 for migrating to Hugo.
> >> > >
> >> > > Currently we have developed many plugins based on Jekyll because the
> >> > native features of Jekyll cannot meet our needs. It seems all of them
> >> can
> >> > be supported via Hugo shortcodes and will become more concise.
> >> > >
> >> > > Best,
> >> > > Wei
> >> > >
> >> > > > On Jan 14, 2021, at 18:21, Aljoscha Krettek wrote:
> >> > > >
> >> > > > +1
> >> > > >
> >> > > > The build times on Jekyll have just become too annoying for me. I
> >> > realize that that is also a function of how we structure our
> >> documentation,
> >> > and especially how we construct the nav sidebar, but I think overall
> >> moving
> >> > to Hugo is still a benefit.
> >> > > >
> >> > > > Aljoscha
> >> > > >
> >> > > > On 2021/01/13 10:14, Seth Wiesman wrote:
> >> > > >> Hi All,
> >> > > >>
> >> > > >> I would like to start a discussion for FLIP-157: Migrating the
> >> Flink
> >> > docs
> >> > > >> from Jekyll to Hugo.
> >> > > >>
> >> > > >> This will allow us:
> >> > > >>
> >> > > >>  - Proper internationalization
> >> > > >>  - Working Search
> >> > > >>  - Sub-second build time ;)
> >> > > >>
> >> > > >> Please take a look and let me know what you think.
> >> > > >>
> >> > > >> Seth
> >> > > >>
> >> > > >>
> >> >
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-157+Migrate+Flink+Documentation+from+Jekyll+to+Hugo
> >> > >
> >> >
> >>
> >
>


Re: [DISCUSS] Planning Flink 1.13

2021-01-14 Thread Jark Wu
Thanks for starting the 1.13 release cycle and volunteering as the release
managers!

The feature freeze date sounds good to me.

Best,
Jark

On Thu, 14 Jan 2021 at 18:19, Khachatryan Roman 
wrote:

> Thanks for doing this!
>
> Some teams are currently doing (or already finalizing) the "usability
> sprint". I was wondering whether it makes sense to release the implemented
> features without waiting for the major changes. If it works well we could
> decide to shorten the next release as well.
>
> Regards,
> Roman
>
>
> On Thu, Jan 14, 2021 at 10:28 AM Till Rohrmann 
> wrote:
>
> > Thanks for volunteering as the release managers for the 1.13 release
> Guowei
> > and Dawid. I'd also be in favour of targeting the end of March as the
> > feature freeze date.
> >
> > I've created a 1.13 wiki page [1] where we can collect the features we
> want
> > to complete for the 1.13 release.
> >
> > [1] https://cwiki.apache.org/confluence/display/FLINK/1.13+Release
> >
> > Cheers,
> > Till
> >
> > On Thu, Jan 14, 2021 at 3:27 AM Xintong Song 
> > wrote:
> >
> > > Thanks for kicking off the 1.13 release cycle and volunteering as the
> > > release managers.
> > >
> > > +1 for Dawid & Guowei as the 1.13 release managers.
> > > +1 for targeting feature freeze at the end of March
> > >
> > > Thank you~
> > >
> > > Xintong Song
> > >
> > >
> > >
> > > On Wed, Jan 13, 2021 at 10:48 PM Dawid Wysakowicz <
> > dwysakow...@apache.org>
> > > wrote:
> > >
> > > > Hi all,
> > > > With the 1.12 being released some time ago already I thought it would
> > be
> > > > a good time to kickstart the 1.13 release cycle.
> > > >
> > > > What do you think about Guowei and me being the release managers for
> > > > Flink 1.13? We are happy to volunteer for it.
> > > >
> > > > The second topic I wanted to raise was the rough timeline for the
> > > > release. According to our usual 3 months + the release
> > > > testing/stabilising period
> > > > we should aim with the feature freeze for the end of March/beginning
> of
> > > > April. Does that work for everyone?
> > > >
> > > > Let me know what you think.
> > > >
> > > > Best,
> > > > Dawid
> > > >
> > > >
> > > >
> > >
> >
>


Re: [VOTE] Release 1.12.1, release candidate #2

2021-01-14 Thread Jark Wu
+1

- checked/verified signatures and hashes
- reviewed the release pull request
- started cluster for Scala 2.11, ran examples, verified web ui and log
output, nothing unexpected
- started cluster and SQL CLI, run some SQL queries for kafka,
upsert-kafka, elasticsearch, and mysql connectors, nothing unexpected

Best,
Jark

On Thu, 14 Jan 2021 at 12:11, Yang Wang  wrote:

> +1 (non-binding)
>
> * Check the checksum and signatures
> * Build from source successfully
> * Deploy Flink on K8s natively with HA mode and check the related fixes
>   * FLINK-20650, flink binary could work with the updated
> docker-entrypoint.sh in flink-docker
>   * FLINK-20664, support service account for TaskManager pod
>   * FLINK-20648, restore job from savepoint when using Kubernetes based HA
> services
> * Check the webUI and logs without abnormal information
>
> Best,
> Yang
>
> Xintong Song  于2021年1月14日周四 上午10:21写道:
>
> > +1 (non-binding)
> >
> > - verified checksums and signatures
> > - no binaries found in source archive
> > - build from source
> > - played with a couple of example jobs
> > - played with various deployment modes
> > - webui and logs look good
> >
> > On Thu, Jan 14, 2021 at 1:02 AM Chesnay Schepler 
> > wrote:
> >
> > > +1
> > >
> > > - no dependencies have been changed requiring license updates
> > > - nothing seems to be missing from the maven repository
> > > - verified checksums/signatures
> > >
> > > On 1/10/2021 2:23 AM, Xintong Song wrote:
> > > > Hi everyone,
> > > >
> > > > Please review and vote on the release candidate #2 for the version
> > > 1.12.1,
> > > > as follows:
> > > >
> > > > [ ] +1, Approve the release
> > > > [ ] -1, Do not approve the release (please provide specific comments)
> > > >
> > > > The complete staging area is available for your review, which
> includes:
> > > > * JIRA release notes [1],
> > > > * the official Apache source release and binary convenience releases
> to
> > > be
> > > > deployed to dist.apache.org [2], which are signed with the key with
> > > > fingerprint F8E419AA0B60C28879E876859DFF40967ABFC5A4 [3],
> > > > * all artifacts to be deployed to the Maven Central Repository [4],
> > > > * source code tag "release-1.12.1-rc2" [5],
> > > > * website pull request listing the new release and adding
> announcement
> > > blog
> > > > post [6].
> > > >
> > > > The vote will be open for at least 72 hours. It is adopted by
> majority
> > > > approval, with at least 3 PMC affirmative votes.
> > > >
> > > > Thanks,
> > > > Xintong Song
> > > >
> > > > [1]
> > > >
> > >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522=12349459
> > > > [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.12.1-rc2
> > > > [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> > > > [4]
> > > https://repository.apache.org/content/repositories/orgapacheflink-1411
> > > > [5] https://github.com/apache/flink/releases/tag/release-1.12.1-rc2
> > > > [6] https://github.com/apache/flink-web/pull/405
> > > >
> > >
> > >
> >
>


Re: [DISCUSS] FLIP-157 Migrate Flink Documentation from Jekyll to Hugo

2021-01-13 Thread Jark Wu
The build time sounds impressive.

Could you explain more about what internationalization features it
provides?

Best,
Jark


On Thu, 14 Jan 2021 at 01:01, Ufuk Celebi  wrote:

> +1 to do this. I really like what you have build and the advantages to
> Jekyll seem overwhelming to me. Hugo is very flexible and I've seen a few
> other projects use it successfully for docs.
>
> On Wed, Jan 13, 2021, at 5:14 PM, Seth Wiesman wrote:
> > Hi All,
> >
> > I would like to start a discussion for FLIP-157: Migrating the Flink docs
> > from Jekyll to Hugo.
> >
> > This will allow us:
> >
> >- Proper internationalization
> >- Working Search
> >- Sub-second build time ;)
> >
> > Please take a look and let me know what you think.
> >
> > Seth
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-157+Migrate+Flink+Documentation+from+Jekyll+to+Hugo
> >
>


[jira] [Created] (FLINK-20952) Changelog json formats should support inherit options from JSON format

2021-01-12 Thread Jark Wu (Jira)
Jark Wu created FLINK-20952:
---

 Summary: Changelog json formats should support inherit options 
from JSON format
 Key: FLINK-20952
 URL: https://issues.apache.org/jira/browse/FLINK-20952
 Project: Flink
  Issue Type: Bug
  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile), Table 
SQL / Ecosystem
Reporter: Jark Wu
 Fix For: 1.13.0


Recently, we introduced several config options for the json format, e.g. 
FLINK-20861. This reveals a potential problem: adding a small config option 
to json may require touching the debezium-json, canal-json, and maxwell-json 
formats as well. This is verbose and error-prone. We need an abstraction 
mechanism to support reusable options. 
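
One possible shape for such a mechanism (all class, method, and option names 
below are hypothetical, for illustration only):

{code:java}
import java.util.HashSet;
import java.util.Set;
import org.apache.flink.configuration.ConfigOption;
import org.apache.flink.configuration.ConfigOptions;

/** Hypothetical sketch: base JSON options are declared once and reused by wrappers. */
public final class JsonOptionsSketch {

    public static final ConfigOption<Boolean> IGNORE_PARSE_ERRORS =
            ConfigOptions.key("json.ignore-parse-errors")
                    .booleanType()
                    .defaultValue(false)
                    .withDescription("Skip rows with parse errors instead of failing.");

    /** Wrapper formats (debezium-json, canal-json, maxwell-json) reuse this set. */
    public static Set<ConfigOption<?>> inheritableOptions() {
        Set<ConfigOption<?>> options = new HashSet<>();
        options.add(IGNORE_PARSE_ERRORS);
        return options;
    }

    private JsonOptionsSketch() {}
}
{code}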



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [ANNOUNCE] Welcome Danny Cranmer as a new Apache Flink Committer

2021-01-12 Thread Jark Wu
Congratulations Danny!

Best,
Jark

On Tue, 12 Jan 2021 at 17:59, Yangze Guo  wrote:

> Congrats, Danny!
>
> Best,
> Yangze Guo
>
> On Tue, Jan 12, 2021 at 5:55 PM Xingbo Huang  wrote:
> >
> > Congratulations, Danny!
> >
> > Best,
> > Xingbo
> >
> > Dian Fu wrote on Tue, Jan 12, 2021 at 5:48 PM:
> >
> > > Congratulations, Danny!
> > >
> > > Regards,
> > > Dian
> > >
> > > > On Jan 12, 2021, at 5:40 PM, Till Rohrmann wrote:
> > > >
> > > > Congrats and welcome Danny!
> > > >
> > > > Cheers,
> > > > Till
> > > >
> > > > On Tue, Jan 12, 2021 at 10:09 AM Dawid Wysakowicz <
> > > dwysakow...@apache.org>
> > > > wrote:
> > > >
> > > >> Congratulations, Danny!
> > > >>
> > > >> Best,
> > > >>
> > > >> Dawid
> > > >>
> > > >> On 12/01/2021 09:52, Paul Lam wrote:
> > > >>> Congrats, Danny!
> > > >>>
> > > >>> Best,
> > > >>> Paul Lam
> > > >>>
> > > On Jan 12, 2021, at 16:48, Tzu-Li (Gordon) Tai wrote:
> > > 
> > >  Hi everyone,
> > > 
> > >  I'm very happy to announce that the Flink PMC has accepted Danny
> > > >> Cranmer to
> > >  become a committer of the project.
> > > 
> > >  Danny has been extremely helpful on the mailing lists with
> answering
> > > >> user
> > >  questions on the AWS Kinesis connector, and has driven numerous
> new
> > >  features and timely bug fixes for the connector as well.
> > > 
> > >  Please join me in welcoming Danny :)
> > > 
> > >  Cheers,
> > >  Gordon
> > > >>>
> > > >>
> > > >>
> > >
> > >
>

