[jira] [Created] (FLINK-14526) Support Hive version 1.1.0 and 1.1.1

2019-10-24 Thread Rui Li (Jira)
Rui Li created FLINK-14526:
--

 Summary: Support Hive version 1.1.0 and 1.1.1
 Key: FLINK-14526
 URL: https://issues.apache.org/jira/browse/FLINK-14526
 Project: Flink
  Issue Type: Task
  Components: Connectors / Hive
Reporter: Rui Li






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-14525) buffer pool is destroyed

2019-10-24 Thread Saqib (Jira)
Saqib created FLINK-14525:
-

 Summary: buffer pool is destroyed
 Key: FLINK-14525
 URL: https://issues.apache.org/jira/browse/FLINK-14525
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Network
Affects Versions: 1.7.2
Reporter: Saqib


We have a Flink app running in standalone mode. The app runs OK in our non-prod 
environments. However, on our prod server it throws this exception:

Buffer pool is destroyed. 

 

This error is thrown as a RuntimeException on the collect call in the flatMap 
function. The flatMap simply collects a Tuple; the Document is an XML Document 
object.
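
For reference, the shape of the flatMap described above is roughly as follows (a minimal 
sketch only; the class, field names, and exact tuple type are illustrative, not our actual code):

{code:java}
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;
import org.w3c.dom.Document;

// Hypothetical reconstruction of the flatMap described in this report.
public class XmlFlatMap implements FlatMapFunction<String, Tuple2<String, Document>> {

    @Override
    public void flatMap(String value, Collector<Tuple2<String, Document>> out) throws Exception {
        Document doc = parseXml(value); // parse the incoming record into an XML Document
        // The "Buffer pool is destroyed" RuntimeException is reported on this collect call.
        out.collect(Tuple2.of(value, doc));
    }

    private Document parseXml(String value) throws Exception {
        return javax.xml.parsers.DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new java.io.ByteArrayInputStream(value.getBytes("UTF-8")));
    }
}
{code}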

 

As mentioned, this is not happening in the non-prod environments (we have multiple: DEV, 
QA, UAT). The UAT box is spec-ed exactly like our prod host, with 4 CPUs. The Java 
version is the same too.

 

Not sure how to proceed.

 

Thanks

 

 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] FLIP-78: Flink Python UDF Environment and Dependency Management

2019-10-24 Thread Wei Zhong
Hi Max,

Are there any other concerns from your side? I would appreciate it if you could give some 
feedback and vote on this.

Best,
Wei

> On Oct 25, 2019, at 09:33, jincheng sun wrote:
> 
> Hi Thomas,
> 
> Thanks for your explanation. I understand your original intention. I will
> seriously consider this issue. After I have the initial solution, I will
> bring up a further discussion in Beam ML.
> 
> Thanks for your voting. :)
> 
> Best,
> Jincheng
> 
> 
> On Fri, Oct 25, 2019 at 7:32 AM Thomas Weise wrote:
> 
>> Hi Jincheng,
>> 
>> Yes, this topic can be further discussed on the Beam ML. The only reason I
>> brought it up here is that it would be desirable from Beam Flink runner
>> perspective for the artifact staging mechanism that you work on to be
>> reusable.
>> 
>> Stage 1 in Beam is also up to the runner, artifact staging is a service
>> discovered from the job server and that the Flink job server currently uses
>> DFS is not set in stone. My interest was more regarding assumptions
>> regarding the artifact structure, which may or may not allow for reusable
>> implementation.
>> 
>> +1 for the proposal otherwise
>> 
>> Thomas
>> 
>> 
>> On Mon, Oct 21, 2019 at 8:40 PM jincheng sun 
>> wrote:
>> 
>>> Hi Thomas,
>>> 
>>> Thanks for sharing your thoughts. I think improve and solve the
>> limitations
>>> of the Beam artifact staging is good topic(For beam).
>>> 
>>> As I understand it as follows:
>>> 
>>> For Beam(data):
>>>Stage1: BeamClient --> JobService (data will be upload to DFS).
>>>Stage2: JobService(FlinkClient) -->  FlinkJob (operator download
>>> the data from DFS)
>>>Stage3: Operator --> Harness(artifact staging service)
>>> 
>>> For Flink(data):
>>>Stage1: FlinkClient(data(local) upload to BlobServer using distribute
>>> cache) --> Operator (data will be download from BlobServer). Do not
>>> have to depend on DFS.
>>>Stage2: Operator --> Harness(for docker we using artifact staging
>>> service)
>>> 
>>> So, I think Beam have to depend on DFS in Stage1. and Stage2 can using
>>> distribute cache if we remove the dependency of DFS for Beam in
>> Stage1.(Of
>>> course we need more detail here),  we can bring up the discussion in a
>>> separate Beam dev@ ML, the current discussion focuses on Flink 1.10
>>> version
>>> of  UDF Environment and Dependency Management for python, so I recommend
>>> voting in the current ML for Flink 1.10, Beam artifact staging
>> improvements
>>> are discussed in a separate Beam dev@.
>>> 
>>> What do you think?
>>> 
>>> Best,
>>> Jincheng
>>> 
>>> On Mon, Oct 21, 2019 at 10:25 PM Thomas Weise wrote:
>>> 
 Beam artifact staging currently relies on shared file system and there
>>> are
 limitations, for example when running locally with Docker and local FS.
>>> It
 sounds like a distributed cache based implementation might be a good
 (better?) option for artifact staging even for the Beam Flink runner?
 
 If so, can the implementation you propose be compatible with the Beam
 artifact staging service so that it can be plugged into the Beam Flink
 runner?
 
 Thanks,
 Thomas
 
 
 On Mon, Oct 21, 2019 at 2:34 AM jincheng sun >> 
 wrote:
 
> Hi Max,
> 
> Sorry for the late reply. Regarding the issue you mentioned above,
>> I'm
 glad
> to share my thoughts:
> 
>> For process-based execution we use Flink's cache distribution
>> instead
 of
> Beam's artifact staging.
> 
> In current design, we use Flink's cache distribution to upload users'
 files
> from client to cluster in both docker mode and process mode. That is,
> Flink's cache distribution and Beam's artifact staging service work
> together in docker mode.
> 
> 
>> Do we want to implement two different ways of staging artifacts? It
 seems
> sensible to use the same artifact staging functionality also for the
> process-based execution.
> 
> I agree that the implementation will be simple if we use the same
 artifact
> staging functionality also for the process-based execution. However,
>>> it's
> not the best for performance as it will introduce an additional
>> network
> transmission, as in process mode TaskManager and python worker share
>>> the
> same environment, in which case the user files in Flink Distribute
>>> Cache
> can be accessed by python worker directly. We do not need the staging
> service in this case.
> 
>> Apart from being simpler, this would also allow the process-based
> execution to run in other environments than the Flink TaskManager
> environment.
> 
> IMHO, this case is more like docker mode, and we can share or reuse
>> the
> code of Beam docker mode. Furthermore, in this case python worker is
> launched by the operator, so it is always in the same environment as
>>> the
> operator.
> 
> Thanks again for your feedback, and it is valuable for find out the
>>> final
> best 

[jira] [Created] (FLINK-14524) PostgreSQL JDBC sink generates invalid SQL in upsert mode

2019-10-24 Thread Fawad Halim (Jira)
Fawad Halim created FLINK-14524:
---

 Summary: PostgreSQL JDBC sink generates invalid SQL in upsert mode
 Key: FLINK-14524
 URL: https://issues.apache.org/jira/browse/FLINK-14524
 Project: Flink
  Issue Type: Bug
  Components: Connectors / JDBC
Affects Versions: 1.9.1, 1.10.0
Reporter: Fawad Halim


The "upsert" query generated for the PostgreSQL dialect is missing a closing 
parenthesis in the ON CONFLICT clause, causing the INSERT statement to error 
out with the error

 

{{ERROR o.a.f.s.runtime.tasks.StreamTask - Error during disposal of stream 
operator.}}
{{java.lang.RuntimeException: Writing records to JDBC failed.}}
{{ at 
org.apache.flink.api.java.io.jdbc.JDBCUpsertOutputFormat.checkFlushException(JDBCUpsertOutputFormat.java:135)}}
{{ at 
org.apache.flink.api.java.io.jdbc.JDBCUpsertOutputFormat.close(JDBCUpsertOutputFormat.java:184)}}
{{ at 
org.apache.flink.api.java.io.jdbc.JDBCUpsertSinkFunction.close(JDBCUpsertSinkFunction.java:61)}}
{{ at 
org.apache.flink.api.common.functions.util.FunctionUtils.closeFunction(FunctionUtils.java:43)}}
{{ at 
org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.dispose(AbstractUdfStreamOperator.java:117)}}
{{ at 
org.apache.flink.streaming.runtime.tasks.StreamTask.disposeAllOperators(StreamTask.java:585)}}
{{ at 
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:484)}}
{{ at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705)}}
{{ at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530)}}
{{ at java.lang.Thread.run(Thread.java:748)}}
{{Caused by: java.sql.BatchUpdateException: Batch entry 0 INSERT INTO 
"public.temperature"("id", "timestamp", "temperature") VALUES ('sensor_17', 
'2019-10-25 00:39:10-05', 20.27573964210997) ON CONFLICT ("id", "timestamp" DO 
UPDATE SET "id"=EXCLUDED."id", "timestamp"=EXCLUDED."timestamp", 
"temperature"=EXCLUDED."temperature" was aborted: ERROR: syntax error at or 
near "DO"}}
{{ Position: 119 Call getNextException to see other errors in the batch.}}
{{ at 
org.postgresql.jdbc.BatchResultHandler.handleCompletion(BatchResultHandler.java:163)}}
{{ at org.postgresql.jdbc.PgStatement.executeBatch(PgStatement.java:838)}}
{{ at 
org.postgresql.jdbc.PgPreparedStatement.executeBatch(PgPreparedStatement.java:1546)}}
{{ at 
org.apache.flink.api.java.io.jdbc.writer.UpsertWriter$UpsertWriterUsingUpsertStatement.internalExecuteBatch(UpsertWriter.java:177)}}
{{ at 
org.apache.flink.api.java.io.jdbc.writer.UpsertWriter.executeBatch(UpsertWriter.java:117)}}
{{ at 
org.apache.flink.api.java.io.jdbc.JDBCUpsertOutputFormat.flush(JDBCUpsertOutputFormat.java:159)}}
{{ at 
org.apache.flink.api.java.io.jdbc.JDBCUpsertOutputFormat.lambda$open$0(JDBCUpsertOutputFormat.java:124)}}
{{ at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)}}
{{ at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)}}
{{ at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)}}
{{ at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)}}
{{ at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)}}
{{ at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)}}
{{ ... 1 common frames omitted}}
{{Caused by: org.postgresql.util.PSQLException: ERROR: syntax error at or near 
"DO"}}
{{ Position: 119}}
{{ at 
org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2497)}}
{{ at 
org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2233)}}
{{ at 
org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:310)}}
{{ at org.postgresql.jdbc.PgStatement.executeBatch(PgStatement.java:834)}}
{{ ... 12 common frames omitted}}
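
For reference, the problem can be reduced to the shape of the generated ON CONFLICT clause 
(a sketch only, not the actual Flink code; the full broken statement appears in the stack 
trace above):

{code:java}
// Illustration of the difference between the generated clause and valid PostgreSQL
// upsert syntax: the conflict target column list must be closed before DO UPDATE.
public class UpsertSyntaxIllustration {
    public static void main(String[] args) {
        String generated = "ON CONFLICT (\"id\", \"timestamp\" DO UPDATE SET ...";  // missing ')'
        String expected  = "ON CONFLICT (\"id\", \"timestamp\") DO UPDATE SET ..."; // valid syntax
        System.out.println(generated);
        System.out.println(expected);
    }
}
{code}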



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] FLIP-78: Flink Python UDF Environment and Dependency Management

2019-10-24 Thread jincheng sun
Hi Thomas,

Thanks for your explanation. I understand your original intention. I will
seriously consider this issue. After I have the initial solution, I will
bring up a further discussion in Beam ML.

Thanks for your voting. :)

Best,
Jincheng


On Fri, Oct 25, 2019 at 7:32 AM Thomas Weise wrote:

> Hi Jincheng,
>
> Yes, this topic can be further discussed on the Beam ML. The only reason I
> brought it up here is that it would be desirable from Beam Flink runner
> perspective for the artifact staging mechanism that you work on to be
> reusable.
>
> Stage 1 in Beam is also up to the runner, artifact staging is a service
> discovered from the job server and that the Flink job server currently uses
> DFS is not set in stone. My interest was more regarding assumptions
> regarding the artifact structure, which may or may not allow for reusable
> implementation.
>
> +1 for the proposal otherwise
>
> Thomas
>
>
> On Mon, Oct 21, 2019 at 8:40 PM jincheng sun 
> wrote:
>
> > Hi Thomas,
> >
> > Thanks for sharing your thoughts. I think improve and solve the
> limitations
> > of the Beam artifact staging is good topic(For beam).
> >
> > As I understand it as follows:
> >
> > For Beam(data):
> > Stage1: BeamClient --> JobService (data will be upload to DFS).
> > Stage2: JobService(FlinkClient) -->  FlinkJob (operator download
> > the data from DFS)
> > Stage3: Operator --> Harness(artifact staging service)
> >
> > For Flink(data):
> > Stage1: FlinkClient(data(local) upload to BlobServer using distribute
> > cache) --> Operator (data will be download from BlobServer). Do not
> > have to depend on DFS.
> > Stage2: Operator --> Harness(for docker we using artifact staging
> > service)
> >
> > So, I think Beam have to depend on DFS in Stage1. and Stage2 can using
> > distribute cache if we remove the dependency of DFS for Beam in
> Stage1.(Of
> > course we need more detail here),  we can bring up the discussion in a
> > separate Beam dev@ ML, the current discussion focuses on Flink 1.10
> > version
> > of  UDF Environment and Dependency Management for python, so I recommend
> > voting in the current ML for Flink 1.10, Beam artifact staging
> improvements
> > are discussed in a separate Beam dev@.
> >
> > What do you think?
> >
> > Best,
> > Jincheng
> >
> > On Mon, Oct 21, 2019 at 10:25 PM Thomas Weise wrote:
> >
> > > Beam artifact staging currently relies on shared file system and there
> > are
> > > limitations, for example when running locally with Docker and local FS.
> > It
> > > sounds like a distributed cache based implementation might be a good
> > > (better?) option for artifact staging even for the Beam Flink runner?
> > >
> > > If so, can the implementation you propose be compatible with the Beam
> > > artifact staging service so that it can be plugged into the Beam Flink
> > > runner?
> > >
> > > Thanks,
> > > Thomas
> > >
> > >
> > > On Mon, Oct 21, 2019 at 2:34 AM jincheng sun  >
> > > wrote:
> > >
> > > > Hi Max,
> > > >
> > > > Sorry for the late reply. Regarding the issue you mentioned above,
> I'm
> > > glad
> > > > to share my thoughts:
> > > >
> > > > > For process-based execution we use Flink's cache distribution
> instead
> > > of
> > > > Beam's artifact staging.
> > > >
> > > > In current design, we use Flink's cache distribution to upload users'
> > > files
> > > > from client to cluster in both docker mode and process mode. That is,
> > > > Flink's cache distribution and Beam's artifact staging service work
> > > > together in docker mode.
> > > >
> > > >
> > > > > Do we want to implement two different ways of staging artifacts? It
> > > seems
> > > > sensible to use the same artifact staging functionality also for the
> > > > process-based execution.
> > > >
> > > > I agree that the implementation will be simple if we use the same
> > > artifact
> > > > staging functionality also for the process-based execution. However,
> > it's
> > > > not the best for performance as it will introduce an additional
> network
> > > > transmission, as in process mode TaskManager and python worker share
> > the
> > > > same environment, in which case the user files in Flink Distribute
> > Cache
> > > > can be accessed by python worker directly. We do not need the staging
> > > > service in this case.
> > > >
> > > > > Apart from being simpler, this would also allow the process-based
> > > > execution to run in other environments than the Flink TaskManager
> > > > environment.
> > > >
> > > > IMHO, this case is more like docker mode, and we can share or reuse
> the
> > > > code of Beam docker mode. Furthermore, in this case python worker is
> > > > launched by the operator, so it is always in the same environment as
> > the
> > > > operator.
> > > >
> > > > Thanks again for your feedback, and it is valuable for find out the
> > final
> > > > best architecture.
> > > >
> > > > Feel free to correct me if there is anything incorrect.
> > > >
> > > > Best,
> > > > Jincheng
> > > >
> > > > Maximilian 

Re: [VOTE] FLIP-78: Flink Python UDF Environment and Dependency Management

2019-10-24 Thread Thomas Weise
Hi Jincheng,

Yes, this topic can be further discussed on the Beam ML. The only reason I
brought it up here is that it would be desirable from Beam Flink runner
perspective for the artifact staging mechanism that you work on to be
reusable.

Stage 1 in Beam is also up to the runner, artifact staging is a service
discovered from the job server and that the Flink job server currently uses
DFS is not set in stone. My interest was more regarding assumptions
regarding the artifact structure, which may or may not allow for reusable
implementation.

+1 for the proposal otherwise

Thomas
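
(For context, the Flink distributed cache mechanism referred to throughout this thread is 
used roughly as follows; this is only a minimal sketch of the mechanism, not of the proposed 
design, and the file path and job are hypothetical.)

import java.io.File;
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class DistributedCacheSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Client side: register a file; it is shipped to the TaskManagers through the
        // BlobServer rather than a shared DFS.
        env.registerCachedFile("/tmp/udf-deps/requirements.txt", "requirements");

        env.fromElements("a", "b")
            .map(new RichMapFunction<String, String>() {
                @Override
                public String map(String value) throws Exception {
                    // Operator side: the shipped file is available as a local file.
                    File cached = getRuntimeContext().getDistributedCache().getFile("requirements");
                    return value + ":" + cached.length();
                }
            })
            .print();

        env.execute("distributed-cache-sketch");
    }
}

This is the BlobServer-based upload (no DFS required) that the Flink "Stage 1" in the quoted 
message below refers to.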


On Mon, Oct 21, 2019 at 8:40 PM jincheng sun 
wrote:

> Hi Thomas,
>
> Thanks for sharing your thoughts. I think improve and solve the limitations
> of the Beam artifact staging is good topic(For beam).
>
> As I understand it as follows:
>
> For Beam(data):
> Stage1: BeamClient --> JobService (data will be upload to DFS).
> Stage2: JobService(FlinkClient) -->  FlinkJob (operator download
> the data from DFS)
> Stage3: Operator --> Harness(artifact staging service)
>
> For Flink(data):
> Stage1: FlinkClient(data(local) upload to BlobServer using distribute
> cache) --> Operator (data will be download from BlobServer). Do not
> have to depend on DFS.
> Stage2: Operator --> Harness(for docker we using artifact staging
> service)
>
> So, I think Beam have to depend on DFS in Stage1. and Stage2 can using
> distribute cache if we remove the dependency of DFS for Beam in Stage1.(Of
> course we need more detail here),  we can bring up the discussion in a
> separate Beam dev@ ML, the current discussion focuses on Flink 1.10
> version
> of  UDF Environment and Dependency Management for python, so I recommend
> voting in the current ML for Flink 1.10, Beam artifact staging improvements
> are discussed in a separate Beam dev@.
>
> What do you think?
>
> Best,
> Jincheng
>
> On Mon, Oct 21, 2019 at 10:25 PM Thomas Weise wrote:
>
> > Beam artifact staging currently relies on shared file system and there
> are
> > limitations, for example when running locally with Docker and local FS.
> It
> > sounds like a distributed cache based implementation might be a good
> > (better?) option for artifact staging even for the Beam Flink runner?
> >
> > If so, can the implementation you propose be compatible with the Beam
> > artifact staging service so that it can be plugged into the Beam Flink
> > runner?
> >
> > Thanks,
> > Thomas
> >
> >
> > On Mon, Oct 21, 2019 at 2:34 AM jincheng sun 
> > wrote:
> >
> > > Hi Max,
> > >
> > > Sorry for the late reply. Regarding the issue you mentioned above, I'm
> > glad
> > > to share my thoughts:
> > >
> > > > For process-based execution we use Flink's cache distribution instead
> > of
> > > Beam's artifact staging.
> > >
> > > In current design, we use Flink's cache distribution to upload users'
> > files
> > > from client to cluster in both docker mode and process mode. That is,
> > > Flink's cache distribution and Beam's artifact staging service work
> > > together in docker mode.
> > >
> > >
> > > > Do we want to implement two different ways of staging artifacts? It
> > seems
> > > sensible to use the same artifact staging functionality also for the
> > > process-based execution.
> > >
> > > I agree that the implementation will be simple if we use the same
> > artifact
> > > staging functionality also for the process-based execution. However,
> it's
> > > not the best for performance as it will introduce an additional network
> > > transmission, as in process mode TaskManager and python worker share
> the
> > > same environment, in which case the user files in Flink Distribute
> Cache
> > > can be accessed by python worker directly. We do not need the staging
> > > service in this case.
> > >
> > > > Apart from being simpler, this would also allow the process-based
> > > execution to run in other environments than the Flink TaskManager
> > > environment.
> > >
> > > IMHO, this case is more like docker mode, and we can share or reuse the
> > > code of Beam docker mode. Furthermore, in this case python worker is
> > > launched by the operator, so it is always in the same environment as
> the
> > > operator.
> > >
> > > Thanks again for your feedback, and it is valuable for find out the
> final
> > > best architecture.
> > >
> > > Feel free to correct me if there is anything incorrect.
> > >
> > > Best,
> > > Jincheng
> > >
> > > On Wed, Oct 16, 2019 at 4:23 PM Maximilian Michels wrote:
> > >
> > > > I'm also late to the party here :) When I saw the first draft, I was
> > > > thinking how exactly the design doc would tie in with Beam. Thanks
> for
> > > > the update.
> > > >
> > > > A couple of comments with this regard:
> > > >
> > > > > Flink has provided a distributed cache mechanism and allows users
> to
> > > > upload their files using "registerCachedFile" method in
> > > > ExecutionEnvironment/StreamExecutionEnvironment. The python files
> users
> > > > specified through "add_python_file", 

[jira] [Created] (FLINK-14523) Flink Stops Consuming from Kafka after Leader Election

2019-10-24 Thread James B. Fitzgerald (Jira)
James B. Fitzgerald created FLINK-14523:
---

 Summary: Flink Stops Consuming from Kafka after Leader Election
 Key: FLINK-14523
 URL: https://issues.apache.org/jira/browse/FLINK-14523
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Kafka
 Environment: In AWS we run the app on EMR with the following versions:


 * EMR Release - emr-5.15.0
 * Hadoop distribution - Amazon 2.8.3
 * Flink - Flink 1.4.2

We submit the job to the cluster as an EMR step using *command-runner.jar*. 
We submit the job with the following arguments:

{code:java}
"Args": [
  "flink", "run", "-m", "yarn-cluster", 
  "-c", "com.salesforce.sde.streamingsearches.StreamingSearchesJob", 
  "-yst", "-ys", "4", "-yn", "10", "-yjm", "2800", "-ytm", "2800",
  "-ynm", "streaming-searches-prod",
  "-d", "/home/hadoop/streaming-searches-1.0-SNAPSHOT.jar"
]{code}
Additionally we build our application jar with Flink 1.4.2 and Kafka 0.11.

 
Reporter: James B. Fitzgerald


We have a Flink application running in AWS on EMR that streams input from a 
single Kafka topic. Whenever there is a Kafka leader election for any partition 
of the input topic, our Flink application stops consuming from Kafka entirely. 
To begin consuming from Kafka again the YARN app must be killed and restarted. 
We run this same application on premises and in AWS on EMR. We have only 
observed this behavior when it is running on EMR. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Introduce a location-oriented two-stage query mechanism to improve the queryable state.

2019-10-24 Thread bupt_ljy
Hi vino,
+1 for improving the queryable state feature. This reminds me of the 
state-processing-api module, which is very helpful when we analyze state offline. 
However, we currently don't have many ways to know what is happening with the state 
inside a running application, so I feel this has good potential. Since these two 
modules are separate but do similar work (analyzing state), maybe we need to think 
more about their positioning, or integrate them gracefully in the future.
Anyway, this is great work, and it would be good to hear more thoughts 
and use cases.


Best Regards,
Jiayi Liao


 Original Message 
Sender: vino yang
Recipient: dev@flink.apache.org
Date: Tuesday, Oct 22, 2019 15:42
Subject: [DISCUSS] Introduce a location-oriented two-stage query mechanism 
to improve the queryable state.


Hi guys,


Currently, the queryable state client is hard to use, because it requires users to 
know the address of a TaskManager and the port of the proxy. In practice, most users 
do not have deep knowledge of Flink's internals and runtime in production. The 
queryable state clients interact directly with the query state client proxies hosted 
on each TaskExecutor. This design requires users to know too much detail.
 
We introduce a location service component to improve the architecture of the 
queryable state and hide the details of the task executors. We first give a 
brief introduction to our design in Section 2 and then detail the 
implementation in Section 3. Finally, we describe some future work that can be 
done.
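
(For reference, this is roughly what using the current client looks like and why users must 
know a TaskManager host and the proxy port up front; host, port, and state names below are 
hypothetical, and this is only a sketch of the existing API.)

import java.util.concurrent.CompletableFuture;
import org.apache.flink.api.common.JobID;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.queryablestate.client.QueryableStateClient;

public class QueryableStateSketch {
    public static void main(String[] args) throws Exception {
        // Today the client must be pointed at a concrete TaskManager host and proxy port.
        QueryableStateClient client = new QueryableStateClient("taskmanager-host", 9069);

        CompletableFuture<ValueState<Long>> result = client.getKvState(
            JobID.fromHexString(args[0]),              // job id of the running application
            "query-name",                              // queryable state name
            "some-key",                                // key to look up
            BasicTypeInfo.STRING_TYPE_INFO,            // key type
            new ValueStateDescriptor<>("state", Long.class));

        System.out.println(result.get().value());
        client.shutdownAndWait();
    }
}

The proposed location service would remove the need for the hard-coded host and port above.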








I have provided an initial implementation in my Flink repository [2]. One thing to 
note is that we have not changed the existing solution, so it still works according 
to the previous modes.



The design documentation is here[3].


Any suggestions and feedback are welcome and appreciated.


[1]: https://statefun.io/
[2]: https://github.com/yanghua/flink/tree/improve-queryable-state-master
[3]: 
https://docs.google.com/document/d/181qYVIiHQGrc3hCj3QBn1iEHF4bUztdw4XO8VSaf_uI/edit?usp=sharing


Best,
Vino

Intermittent No FileSystem found exception

2019-10-24 Thread Maulik Soneji
Hi everyone,

We are running a Batch job on flink that reads data from GCS and does some
aggregation on this data.
We are intermittently getting the error: `No filesystem found for scheme gs`

We are running Beam version 2.15.0 with FlinkRunner, Flink version: 1.6.4

On remote debugging the task managers, we found that in a few task
managers, the *GcsFileSystemRegistrar is not added to the list of
FileSystem Schemes*. In these task managers, we get this issue.

The collection *SCHEME_TO_FILESYSTEM* is getting modified only in
*setDefaultPipelineOptions* function call in
org.apache.beam.sdk.io.FileSystems class and this function is not getting
called and thus the GcsFileSystemRegistrar is not added to
*SCHEME_TO_FILESYSTEM*.

*Detailed stacktrace:*


java.lang.IllegalArgumentException: No filesystem found for scheme gs
at 
org.apache.beam.sdk.io.FileSystems.getFileSystemInternal(FileSystems.java:463)
at 
org.apache.beam.sdk.io.FileSystems.matchNewResource(FileSystems.java:533)
at 
org.apache.beam.sdk.io.fs.ResourceIdCoder.decode(ResourceIdCoder.java:49)
at 
org.apache.beam.sdk.io.fs.MetadataCoder.decodeBuilder(MetadataCoder.java:62)
at org.apache.beam.sdk.io.fs.MetadataCoder.decode(MetadataCoder.java:58)
at org.apache.beam.sdk.io.fs.MetadataCoder.decode(MetadataCoder.java:36)
at org.apache.beam.sdk.coders.Coder.decode(Coder.java:159)
at org.apache.beam.sdk.coders.KvCoder.decode(KvCoder.java:82)
at org.apache.beam.sdk.coders.KvCoder.decode(KvCoder.java:36)
at 
org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.decode(WindowedValue.java:592)
at 
org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.decode(WindowedValue.java:583)
at 
org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.decode(WindowedValue.java:529)
at 
org.apache.beam.runners.flink.translation.types.CoderTypeSerializer.deserialize(CoderTypeSerializer.java:92)
at 
org.apache.flink.runtime.plugable.NonReusingDeserializationDelegate.read(NonReusingDeserializationDelegate.java:55)
at 
org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer.getNextRecord(SpillingAdaptiveSpanningRecordDeserializer.java:106)
at 
org.apache.flink.runtime.io.network.api.reader.AbstractRecordReader.getNextRecord(AbstractRecordReader.java:72)
at 
org.apache.flink.runtime.io.network.api.reader.MutableRecordReader.next(MutableRecordReader.java:47)
at 
org.apache.flink.runtime.operators.util.ReaderIterator.next(ReaderIterator.java:73)
at org.apache.flink.runtime.operators.NoOpDriver.run(NoOpDriver.java:94)
at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:503)
at 
org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:368)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
at java.lang.Thread.run(Thread.java:748)

In order to resolve this issue, we tried calling the following in the
PTransform's expand function:

FileSystems.setDefaultPipelineOptions(PipelineOptionsFactory.create());

This function call is to make sure that the GcsFileSystemRegistrar is added
to the list, but this hasn't solved the issue.

Can someone please help check why this might be happening and what
can be done to resolve this issue?

Thanks and Regards,
Maulik


[jira] [Created] (FLINK-14522) Adjust GC Cleaner for unsafe memory and Java 11

2019-10-24 Thread Andrey Zagrebin (Jira)
Andrey Zagrebin created FLINK-14522:
---

 Summary: Adjust GC Cleaner for unsafe memory and Java 11 
 Key: FLINK-14522
 URL: https://issues.apache.org/jira/browse/FLINK-14522
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Task
Reporter: Andrey Zagrebin
Assignee: Andrey Zagrebin
 Fix For: 1.10.0


sun.misc.Cleaner is not available in Java 11.
It was moved to jdk.internal.ref.Cleaner in the java.base module.
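
A minimal sketch of the kind of detection involved (illustration only, not the actual Flink 
implementation; note that jdk.internal.ref is not exported by default, so reflective use of 
its members may additionally require --add-exports/--add-opens):

{code:java}
public class CleanerProbe {
    public static void main(String[] args) {
        // Java 8: sun.misc.Cleaner; Java 9+: the class lives in jdk.internal.ref.Cleaner (java.base).
        Class<?> cleanerClass = findFirst("sun.misc.Cleaner", "jdk.internal.ref.Cleaner");
        System.out.println("Found cleaner class: "
            + (cleanerClass == null ? "none" : cleanerClass.getName()));
        // Actually invoking Cleaner.create(Object, Runnable) reflectively on Java 9+ needs the
        // package to be exported to the application, e.g.
        // --add-exports java.base/jdk.internal.ref=ALL-UNNAMED
    }

    private static Class<?> findFirst(String... names) {
        for (String name : names) {
            try {
                return Class.forName(name);
            } catch (ClassNotFoundException e) {
                // try the next candidate
            }
        }
        return null;
    }
}
{code}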



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-14521) CoLocationGroup is not set into JobVertex if the stream node chained with others

2019-10-24 Thread Yun Gao (Jira)
Yun Gao created FLINK-14521:
---

 Summary: CoLocationGroup is not set into JobVertex if the stream 
node chained with others
 Key: FLINK-14521
 URL: https://issues.apache.org/jira/browse/FLINK-14521
 Project: Flink
  Issue Type: Bug
  Components: API / DataStream
Affects Versions: 1.9.1, 1.9.0
Reporter: Yun Gao


StreamingJobGraphGenerator.isChainable does not consider the coLocationGroup. If A -> B 
is chained, the coLocationGroup of the corresponding JobVertex will be set to that of 
the head node, namely A. Therefore, if B has declared a coLocationGroup but A has not, 
the coLocationGroup of B will be ignored.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] FLIP-70 - Support Computed Column for Flink SQL

2019-10-24 Thread Jark Wu
If we move "STORED" to the future work section, it is still unclear whether we
need a consensus on the design of "STORED":

1) No consensus needed: we just put the design effort in the future work part to
continue the work if we want to support it later. So the vote on this FLIP
doesn't cover the future work part.
2) Consensus needed: we target the MVP part for the 1.10 release and postpone
the "STORED" implementation work to a future release. So the vote on this
FLIP covers the future work part.

IMO, the "STORED" keyword makes the design much more complex, and I haven't seen a
requirement for it yet.
In order to reach a consensus on this FLIP soon, I would lean towards #1.

What do you think?

Best,
Jark


On Thu, 24 Oct 2019 at 17:25, Jark Wu  wrote:

> Yes. I think it makes sense to move to "Future Work" section.
>
> Best,
> Jark
>
> On Thu, 24 Oct 2019 at 17:11, Kurt Young  wrote:
>
>> +1 to move to a future section. By deleting it I mean remove from
>> the content describing the current processing procedure.
>>
>> Best,
>> Kurt
>>
>>
>> On Thu, Oct 24, 2019 at 5:01 PM Timo Walther  wrote:
>>
>> > Having an MVP and a limited scope sounds good to me. But I would not
>> > remove the STORED keyword entirely from the document.
>> >
>> > It shows that we have a long-term vision. Instead of deleting this
>> > content, I would move it to a Outlook/Future Work section.
>> >
>> > Regards,
>> > Timo
>> >
>> >
>> > On 24.10.19 10:55, Jark Wu wrote:
>> > > +1 to remove “STORED” related content. We can add them when user
>> > requires.
>> > > Others looks good to me in general.
>> > >
>> > > Thanks,
>> > > Jark
>> > >
>> > >
>> > >> On Oct 24, 2019, at 14:58, Kurt Young wrote:
>> > >>
>> > >> Hi Danny,
>> > >>
>> > >> Thanks for preparing this design document. IMO It's a very useful
>> > >> feature, especially combined with time attribute support to specify
>> > >> watermark in DDL.
>> > >>
>> > >> The design doc looks quite good, but I would suggest to reduce the
>> > >> scope of the first version. Like we don't have to support "STORED"
>> > >> in the first MVP version, and you can also delete related content in
>> > >> document to make it more clean and easier to understand.
>> > >>
>> > >> Best,
>> > >> Kurt
>> > >>
>> > >>
>> > >> On Tue, Sep 17, 2019 at 9:18 PM Qi Luo  wrote:
>> > >>
>> > >>> Fantastic! We're also very interested in this feature.
>> > >>>
>> > >>> +Boxiu
>> > >>>
>> > >>> On Tue, Sep 17, 2019 at 11:31 AM Danny Chan 
>> > wrote:
>> > >>>
>> >  In umbrella task FLINK-10232 we have introduced CREATE TABLE
>> grammar
>> > in
>> >  our new module flink-sql-parser. And we proposed to use computed
>> > column
>> > >>> to
>> >  describe the time attribute of process time in the design doc FLINK
>> > SQL
>> >  DDL, so user may create a table with process time attribute as
>> > follows:
>> >  create table T1(
>> >    a int,
>> >    b bigint,
>> >    c varchar,
>> >    d as PROCTIME,
>> >  ) with (
>> >    'k1' = 'v1',
>> >    'k2' = 'v2'
>> >  );
>> > 
>> >  The column d would be a process time attribute for table T1.
>> > 
>> >  Besides that, computed  columns have several other use cases, such
>> as
>> >  these [2]:
>> > 
>> > 
>> >  • Virtual generated columns can be used as a way to simplify and
>> unify
>> >  queries. A complicated condition can be defined as a generated
>> column
>> > and
>> >  referred to from multiple queries on the table to ensure that all
>> of
>> > them
>> >  use exactly the same condition.
>> >  • Stored generated columns can be used as a materialized cache for
>> >  complicated conditions that are costly to calculate on the fly.
>> >  • Generated columns can simulate functional indexes: Use a
>> generated
>> >  column to define a functional expression and index it. This can be
>> > useful
>> >  for working with columns of types that cannot be indexed directly,
>> > such
>> > >>> as
>> >  JSON columns.
>> >  • For stored generated columns, the disadvantage of this approach
>> is
>> > that
>> >  values are stored twice; once as the value of the generated column
>> and
>> > >>> once
>> >  in the index.
>> >  • If a generated column is indexed, the optimizer recognizes query
>> >  expressions that match the column definition and uses indexes from
>> the
>> >  column as appropriate during query execution(Not supported yet).
>> > 
>> > 
>> > 
>> >  Computed columns are introduced in SQL-SERVER-2016 [1], MYSQL-5.6
>> [2]
>> > and
>> >  ORACLE-11g [3].
>> > 
>> >  This is the design doc:
>> > 
>> > 
>> > >>>
>> >
>> https://docs.google.com/document/d/110TseRtTCphxETPY7uhiHpu-dph3NEesh3mYKtJ7QOY/edit?usp=sharing
>> > 
>> >  Any suggestions are appreciated, thanks.
>> > 
>> >  [1]
>> > 
>> > >>>
>> >
>> 

[jira] [Created] (FLINK-14520) Could not find a suitable table factory, but Factory is available

2019-10-24 Thread Miguel Serrano (Jira)
Miguel Serrano created FLINK-14520:
--

 Summary: Could not find a suitable table factory, but Factory is 
available
 Key: FLINK-14520
 URL: https://issues.apache.org/jira/browse/FLINK-14520
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / API
Affects Versions: 1.9.1, 1.9.0
 Environment: MacOS 10.14.5 and Ubuntu 16.10
Reporter: Miguel Serrano
 Attachments: example.zip

*Description*

Flink can't find the JSON table factory. JsonRowFormatFactory is considered but won't 
match the requested properties.

 

gist with code and error: 
[https://gist.github.com/mserranom/4b2e0088b6000b892c38bd7f93d4fe73]

Attached is a zip file for reproduction.

 

*Error message excerpt* 
{code:java}
org.apache.flink.table.api.TableException: findAndCreateTableSink failed.
at 
org.apache.flink.table.factories.TableFactoryUtil.findAndCreateTableSink(TableFactoryUtil.java:87)
at 
org.apache.flink.table.factories.TableFactoryUtil.findAndCreateTableSink(TableFactoryUtil.java:77)

...

Caused by: org.apache.flink.table.api.NoMatchingTableFactoryException: Could 
not find a suitable table factory for 
'org.apache.flink.table.factories.TableSinkFactory' in
the classpath.

...

The following properties are requested:

connector.path=file://./data.json
connector.property-version=1
connector.type=filesystem
format.derive-schema=true
format.fail-on-missing-field=false
format.property-version=1
format.type=json
schema.0.name=f0
schema.0.type=BIGINT
update-mode=append

...

The following factories have been considered:
org.apache.flink.formats.json.JsonRowFormatFactory
org.apache.flink.table.sources.CsvBatchTableSourceFactory

...
{code}
*Code*
{code:java}
StreamExecutionEnvironment env =
    StreamExecutionEnvironment.getExecutionEnvironment();
EnvironmentSettings settings =
    EnvironmentSettings.newInstance().useOldPlanner().inStreamingMode().build();
StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env, settings);

DataStreamSource<Long> stream = env.fromElements(1L, 21L, 22L);
Table table = tableEnv.fromDataStream(stream);
tableEnv.registerTable("data", table);

tableEnv
    .connect(new FileSystem().path("file://./data.json"))
    .withSchema(new Schema().field("f0", Types.LONG))
    .withFormat(new Json().failOnMissingField(false).deriveSchema())
    .inAppendMode()
    .registerTableSink("sink");

env.execute();
{code}
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-14519) Fail on invoking declineCheckpoint remotely

2019-10-24 Thread Jiayi Liao (Jira)
Jiayi Liao created FLINK-14519:
--

 Summary: Fail on invoking declineCheckpoint remotely
 Key: FLINK-14519
 URL: https://issues.apache.org/jira/browse/FLINK-14519
 Project: Flink
  Issue Type: Bug
  Components: Runtime / State Backends
Affects Versions: 1.9.1
Reporter: Jiayi Liao


When declineCheckpoint is invoked, the reason field of DeclineCheckpoint is not 
serializable if it is a RocksDBException, because org.rocksdb.Status doesn't 
implement Serializable.

Exception:


{panel:title=My title}
Caused by: java.io.IOException: Could not serialize 0th argument of method 
declineCheckpoint. This indicates that the argument type [Ljava.lang.Object; is 
not serializable. Arguments have to be serializable for remote rpc calls.
at 
org.apache.flink.runtime.rpc.messages.RemoteRpcInvocation$MethodInvocation.writeObject(RemoteRpcInvocation.java:186)
at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:1028)
at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1496)
at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
at 
org.apache.flink.util.InstantiationUtil.serializeObject(InstantiationUtil.java:586)
{{ at org.apache.flink.util.SerializedValue.<init>(SerializedValue.java:52)}}
at 
org.apache.flink.runtime.rpc.messages.RemoteRpcInvocation.<init>(RemoteRpcInvocation.java:53)}}
at 
org.apache.flink.runtime.rpc.akka.AkkaInvocationHandler.createRpcInvocationMessage(AkkaInvocationHandler.java:264)
at 
org.apache.flink.runtime.rpc.akka.AkkaInvocationHandler.invokeRpc(AkkaInvocationHandler.java:200)
at 
org.apache.flink.runtime.rpc.akka.AkkaInvocationHandler.invoke(AkkaInvocationHandler.java:129)
at 
org.apache.flink.runtime.rpc.akka.FencedAkkaInvocationHandler.invoke(FencedAkkaInvocationHandler.java:78)
... 18 more
Caused by: java.io.NotSerializableException: org.rocksdb.Status
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1184)
at 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
at 
java.io.ObjectOutputStream.defaultWriteObject(ObjectOutputStream.java:441)
at java.lang.Throwable.writeObject(Throwable.java:985)
at sun.reflect.GeneratedMethodAccessor26.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:1028)
at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1496)
at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
at 
org.apache.flink.runtime.rpc.messages.RemoteRpcInvocation$MethodInvocation.writeObject(RemoteRpcInvocation.java:182)
... 33 more
{panel}
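
For illustration, a minimal sketch (not Flink or RocksDB code) of why the serialization fails: 
a Throwable that is itself serializable still cannot be serialized if it holds a field whose 
type, like org.rocksdb.Status, does not implement Serializable.

{code:java}
import java.io.ByteArrayOutputStream;
import java.io.ObjectOutputStream;

public class NonSerializableCauseSketch {

    // Stand-in for org.rocksdb.Status: a field type that does not implement Serializable.
    static class Status {
        final String message = "IO error";
    }

    // Stand-in for RocksDBException: serializable itself (Exception implements Serializable),
    // but it drags along a non-serializable field, so serializing the whole object fails.
    static class FakeRocksDBException extends Exception {
        final Status status = new Status();
        FakeRocksDBException(String msg) { super(msg); }
    }

    public static void main(String[] args) throws Exception {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(new FakeRocksDBException("checkpoint declined"));
        } catch (java.io.NotSerializableException e) {
            // Prints the offending class, analogous to "NotSerializableException: org.rocksdb.Status".
            System.out.println("NotSerializableException: " + e.getMessage());
        }
    }
}
{code}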




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-14518) Generalize TM->RM payload

2019-10-24 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-14518:


 Summary: Generalize TM->RM payload
 Key: FLINK-14518
 URL: https://issues.apache.org/jira/browse/FLINK-14518
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Coordination
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.10.0


The TaskExecutor currently sends a {{SlotReport}} via heartbeats to the RM. In 
the future we also want to submit a report about available cluster partitions.

We should introduce a generic {{TaskExecutorHeartbeatPayload}} that for now 
only contains the slot report, so we can extend the payload later on more 
easily.
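
A minimal sketch of what such a wrapper could look like (illustrative only; the actual class 
shape is up to the implementation):

{code:java}
import java.io.Serializable;

import org.apache.flink.runtime.taskexecutor.SlotReport;

/**
 * Sketch of a generic heartbeat payload sent from the TaskExecutor to the ResourceManager.
 * For now it only wraps the slot report; further reports (e.g. about available cluster
 * partitions) can be added as additional fields later without changing the RPC signature.
 */
public class TaskExecutorHeartbeatPayload implements Serializable {

    private static final long serialVersionUID = 1L;

    private final SlotReport slotReport;

    public TaskExecutorHeartbeatPayload(SlotReport slotReport) {
        this.slotReport = slotReport;
    }

    public SlotReport getSlotReport() {
        return slotReport;
    }
}
{code}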



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISUCSS] FLIP-80: Expression String Serializable and Deserializable

2019-10-24 Thread Jingsong Li
Thanks Jark and Timo for your explanations.

+1 to use SQL strings. I just took a look at Calcite's unparse; it is
really complex.

Jark, yeah, the only thing needed by ExpressionConverter is a RelBuilder.

Best,
Jingsong Lee
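
(As a small illustration of the round trip discussed in this thread, using only Calcite 
classes; the expression and dialect below are arbitrary, and this is just a sketch, not the 
proposed serialization code.)

import org.apache.calcite.sql.SqlNode;
import org.apache.calcite.sql.dialect.AnsiSqlDialect;
import org.apache.calcite.sql.parser.SqlParser;

public class ExpressionRoundTrip {
    public static void main(String[] args) throws Exception {
        // Parse a SQL expression string into a SqlNode ...
        SqlNode expr = SqlParser.create("SUBSTRING(f0 FROM 1 FOR f7)").parseExpression();
        // ... and unparse it back into a SQL string.
        System.out.println(expr.toSqlString(AnsiSqlDialect.DEFAULT).getSql());
    }
}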

On Thu, Oct 24, 2019 at 5:09 PM Timo Walther  wrote:

> Hi Jark,
>
> +1 for your suggestion. I think it will simplify the design a lot if we
> serialize all expressions as SQL strings and will avoid duplicate parser
> code. Initially, I had concerns that there might be expressions in the
> future that cannot be represented in SQL. But currently I cannot come up
> with a counter example.
>
> Table operations will be a different topic that will require a custom
> string syntax. But expressions as SQL expressions sounds good to me.
>
> @Jingsong: Jark is right. Table API expression strings are outdated and
> error-prone. They will be removed at some point.
>
> Regards,
> Timo
>
>
> On 24.10.19 10:50, Jark Wu wrote:
> > Thanks Jingsong,
> >
> > As discussed in Java Expression DSL, we are planning to drop the Java
> Expression string API.
> > So I think we don’t have a plan to unify #1 and #2. And in the future,
> we may only have SQL parser to parse a string expression.
> > The only thing to consider is, whether can all the resolved expression
> be converted to SqlNode.
> > AFAIK, currently, after expression resolving, all the expressions can be
> converted to SqlNodes.
> >
> > Best,
> > Jark
> >
> > [1]:
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Introduction-of-a-Table-API-Java-Expression-DSL-td27787.html
> <
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Introduction-of-a-Table-API-Java-Expression-DSL-td27787.html
> >
> >
> >> On Oct 24, 2019, at 13:30, Jingsong Li wrote:
> >>
> >> Thanks Jark for your proposal.
> >>
> >> If we introduce a new kind of string presentation for expression, we
> will
> >> have 3 string presentation now:
> >> 1. Java expression string api. We have PlannerExpressionParser to parse
> >> string to Expressions.
> >> 2. Sql string presentation, as you said, we can use calcite classes to
> >> parse and unparse.
> >> 3. New kind of string presentation for serialize.
> >>
> >>  From this point of view, I prefer not to introduce a new kind of string
> >> presentation to reduce the complexity.
> >>
> >> There are some differences between #1 and #2:
> >> - method invoking: "f0.substring(1, f7)" and "SUBSTRING(f0, 1, f7)"
> >> - bigint literal: "1L" and "cast(1 as BIGINT)"
> >>
> >> Now it is two completely independent sets., Whether we can unify #1 and
> #2
> >> into one set, and we all use one parser?
> >>
> >> Best,
> >> Jingsong Lee
> >>
> >> On Tue, Oct 22, 2019 at 7:57 PM Jark Wu  wrote:
> >>
> >>> Hi Timo,
> >>>
> >>> I think it's a good idea to use `SqlParser#parseExpression()` to parse
> >>> literals.
> >>> That means the string format of literal is SQL compatible.
> >>> After some discussion with Kurt, we think why not one more step
> forward,
> >>> i.e. convert the whole expression to SQL format.
> >>>
> >>> For example, the above expression will be converted to:
> >>>
> >>> `cat`.`db`.`func`(`cat`.`db`.`f0`, TIMESTAMP '2019-10-21 12:12:12')
> >>>
> >>> There are some benefits from this:
> >>> 0) the string representation is more readable, and can be manually
> typed
> >>> more easily.
> >>> 1) the string format is SQL syntax, not customized, which means it can
> be
> >>> integrated by third party projects.
> >>> 2) we can reuse Calcite's SqlParser to parse string and
> SqlNode#unparse to
> >>> generate string, this can avoid introducing duplicate code and a custom
> >>> parser.
> >>> 3) no compatible problems.
> >>>
> >>> Regarding to how Expression can be converted into a SQL string, I
> think we
> >>> can leverage some Calcite utils:
> >>>
> >>> ResolvedExpression ---(ExpressionConverter)---> RexNode
> >>> (RexToSqlNodeConverter)---> SqlNode --> SqlNode#unparse()
> >>>
> >>> What do you think?
> >>>
> >>> Best,
> >>> Jark
> >>>
> >>> On Mon, 21 Oct 2019 at 22:08, Timo Walther  wrote:
> >>>
>  Hi Jark,
> 
>  thanks for the proposal. This is a great effort to finalize the new
> API
>  design.
> 
>  I'm wondering if we could simply use the SQL parser like
>  `org.apache.calcite.sql.parser.SqlParser#parseExpression(..)` to parse
>  an expression that contain only literals. This would avoid any
>  discussion as the syntax is already defined by the SQL standard. And
> it
>  would also be very unlikely to have a need for a version.
> 
>  For example:
> 
>  CALL('FUNC', FIELD('f0'), VALUE('TIMESTAMP(3)', TIMESTAMP '2019-10-21
>  12:12:12'))
> 
>  Or even further if the SQL parser allows that:
> 
>  CALL('FUNC', `cat`.`db`.`f0`, TIMESTAMP '2019-10-21 12:12:12')
> 
>  I would find it confusing if we use different representation for
>  literals such as intervals and timestamps in the properties. This
> would
>  also reduce code duplication 

Re: [DISUCSS] FLIP-80: Expression String Serializable and Deserializable

2019-10-24 Thread Jark Wu
Thank you all, then I will update the design doc which should be pretty
simple now...

Best,
Jark

On Thu, 24 Oct 2019 at 17:09, Timo Walther  wrote:

> Hi Jark,
>
> +1 for your suggestion. I think it will simplify the design a lot if we
> serialize all expressions as SQL strings and will avoid duplicate parser
> code. Initially, I had concerns that there might be expressions in the
> future that cannot be represented in SQL. But currently I cannot come up
> with a counter example.
>
> Table operations will be a different topic that will require a custom
> string syntax. But expressions as SQL expressions sounds good to me.
>
> @Jingsong: Jark is right. Table API expression strings are outdated and
> error-prone. They will be removed at some point.
>
> Regards,
> Timo
>
>
> On 24.10.19 10:50, Jark Wu wrote:
> > Thanks Jingsong,
> >
> > As discussed in Java Expression DSL, we are planning to drop the Java
> Expression string API.
> > So I think we don’t have a plan to unify #1 and #2. And in the future,
> we may only have SQL parser to parse a string expression.
> > The only thing to consider is, whether can all the resolved expression
> be converted to SqlNode.
> > AFAIK, currently, after expression resolving, all the expressions can be
> converted to SqlNodes.
> >
> > Best,
> > Jark
> >
> > [1]:
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Introduction-of-a-Table-API-Java-Expression-DSL-td27787.html
> <
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Introduction-of-a-Table-API-Java-Expression-DSL-td27787.html
> >
> >
> >> On Oct 24, 2019, at 13:30, Jingsong Li wrote:
> >>
> >> Thanks Jark for your proposal.
> >>
> >> If we introduce a new kind of string presentation for expression, we
> will
> >> have 3 string presentation now:
> >> 1. Java expression string api. We have PlannerExpressionParser to parse
> >> string to Expressions.
> >> 2. Sql string presentation, as you said, we can use calcite classes to
> >> parse and unparse.
> >> 3. New kind of string presentation for serialize.
> >>
> >>  From this point of view, I prefer not to introduce a new kind of string
> >> presentation to reduce the complexity.
> >>
> >> There are some differences between #1 and #2:
> >> - method invoking: "f0.substring(1, f7)" and "SUBSTRING(f0, 1, f7)"
> >> - bigint literal: "1L" and "cast(1 as BIGINT)"
> >>
> >> Now it is two completely independent sets., Whether we can unify #1 and
> #2
> >> into one set, and we all use one parser?
> >>
> >> Best,
> >> Jingsong Lee
> >>
> >> On Tue, Oct 22, 2019 at 7:57 PM Jark Wu  wrote:
> >>
> >>> Hi Timo,
> >>>
> >>> I think it's a good idea to use `SqlParser#parseExpression()` to parse
> >>> literals.
> >>> That means the string format of literal is SQL compatible.
> >>> After some discussion with Kurt, we think why not one more step
> forward,
> >>> i.e. convert the whole expression to SQL format.
> >>>
> >>> For example, the above expression will be converted to:
> >>>
> >>> `cat`.`db`.`func`(`cat`.`db`.`f0`, TIMESTAMP '2019-10-21 12:12:12')
> >>>
> >>> There are some benefits from this:
> >>> 0) the string representation is more readable, and can be manually
> typed
> >>> more easily.
> >>> 1) the string format is SQL syntax, not customized, which means it can
> be
> >>> integrated by third party projects.
> >>> 2) we can reuse Calcite's SqlParser to parse string and
> SqlNode#unparse to
> >>> generate string, this can avoid introducing duplicate code and a custom
> >>> parser.
> >>> 3) no compatible problems.
> >>>
> >>> Regarding to how Expression can be converted into a SQL string, I
> think we
> >>> can leverage some Calcite utils:
> >>>
> >>> ResolvedExpression ---(ExpressionConverter)---> RexNode
> >>> (RexToSqlNodeConverter)---> SqlNode --> SqlNode#unparse()
> >>>
> >>> What do you think?
> >>>
> >>> Best,
> >>> Jark
> >>>
> >>> On Mon, 21 Oct 2019 at 22:08, Timo Walther  wrote:
> >>>
>  Hi Jark,
> 
>  thanks for the proposal. This is a great effort to finalize the new
> API
>  design.
> 
>  I'm wondering if we could simply use the SQL parser like
>  `org.apache.calcite.sql.parser.SqlParser#parseExpression(..)` to parse
>  an expression that contain only literals. This would avoid any
>  discussion as the syntax is already defined by the SQL standard. And
> it
>  would also be very unlikely to have a need for a version.
> 
>  For example:
> 
>  CALL('FUNC', FIELD('f0'), VALUE('TIMESTAMP(3)', TIMESTAMP '2019-10-21
>  12:12:12'))
> 
>  Or even further if the SQL parser allows that:
> 
>  CALL('FUNC', `cat`.`db`.`f0`, TIMESTAMP '2019-10-21 12:12:12')
> 
>  I would find it confusing if we use different representation for
>  literals such as intervals and timestamps in the properties. This
> would
>  also reduce code duplication as we reuse logic for parsing identifiers
> >>> etc.
> 
>  What do you think?
> 
>  Regards,
> 

Re: [DISCUSS] FLIP-70 - Support Computed Column for Flink SQL

2019-10-24 Thread Jark Wu
Yes. I think it makes sense to move to "Future Work" section.

Best,
Jark

On Thu, 24 Oct 2019 at 17:11, Kurt Young  wrote:

> +1 to move to a future section. By deleting it I mean remove from
> the content describing the current processing procedure.
>
> Best,
> Kurt
>
>
> On Thu, Oct 24, 2019 at 5:01 PM Timo Walther  wrote:
>
> > Having an MVP and a limited scope sounds good to me. But I would not
> > remove the STORED keyword entirely from the document.
> >
> > It shows that we have a long-term vision. Instead of deleting this
> > content, I would move it to a Outlook/Future Work section.
> >
> > Regards,
> > Timo
> >
> >
> > On 24.10.19 10:55, Jark Wu wrote:
> > > +1 to remove “STORED” related content. We can add them when user
> > requires.
> > > Others looks good to me in general.
> > >
> > > Thanks,
> > > Jark
> > >
> > >
> > >> On Oct 24, 2019, at 14:58, Kurt Young wrote:
> > >>
> > >> Hi Danny,
> > >>
> > >> Thanks for preparing this design document. IMO It's a very useful
> > >> feature, especially combined with time attribute support to specify
> > >> watermark in DDL.
> > >>
> > >> The design doc looks quite good, but I would suggest to reduce the
> > >> scope of the first version. Like we don't have to support "STORED"
> > >> in the first MVP version, and you can also delete related content in
> > >> document to make it more clean and easier to understand.
> > >>
> > >> Best,
> > >> Kurt
> > >>
> > >>
> > >> On Tue, Sep 17, 2019 at 9:18 PM Qi Luo  wrote:
> > >>
> > >>> Fantastic! We're also very interested in this feature.
> > >>>
> > >>> +Boxiu
> > >>>
> > >>> On Tue, Sep 17, 2019 at 11:31 AM Danny Chan 
> > wrote:
> > >>>
> >  In umbrella task FLINK-10232 we have introduced CREATE TABLE grammar
> > in
> >  our new module flink-sql-parser. And we proposed to use computed
> > column
> > >>> to
> >  describe the time attribute of process time in the design doc FLINK
> > SQL
> >  DDL, so user may create a table with process time attribute as
> > follows:
> >  create table T1(
> >    a int,
> >    b bigint,
> >    c varchar,
> >    d as PROCTIME,
> >  ) with (
> >    'k1' = 'v1',
> >    'k2' = 'v2'
> >  );
> > 
> >  The column d would be a process time attribute for table T1.
> > 
> >  Besides that, computed  columns have several other use cases, such
> as
> >  these [2]:
> > 
> > 
> >  • Virtual generated columns can be used as a way to simplify and
> unify
> >  queries. A complicated condition can be defined as a generated
> column
> > and
> >  referred to from multiple queries on the table to ensure that all of
> > them
> >  use exactly the same condition.
> >  • Stored generated columns can be used as a materialized cache for
> >  complicated conditions that are costly to calculate on the fly.
> >  • Generated columns can simulate functional indexes: Use a generated
> >  column to define a functional expression and index it. This can be
> > useful
> >  for working with columns of types that cannot be indexed directly,
> > such
> > >>> as
> >  JSON columns.
> >  • For stored generated columns, the disadvantage of this approach is
> > that
> >  values are stored twice; once as the value of the generated column
> and
> > >>> once
> >  in the index.
> >  • If a generated column is indexed, the optimizer recognizes query
> >  expressions that match the column definition and uses indexes from
> the
> >  column as appropriate during query execution(Not supported yet).
> > 
> > 
> > 
> >  Computed columns are introduced in SQL-SERVER-2016 [1], MYSQL-5.6
> [2]
> > and
> >  ORACLE-11g [3].
> > 
> >  This is the design doc:
> > 
> > 
> > >>>
> >
> https://docs.google.com/document/d/110TseRtTCphxETPY7uhiHpu-dph3NEesh3mYKtJ7QOY/edit?usp=sharing
> > 
> >  Any suggestions are appreciated, thanks.
> > 
> >  [1]
> > 
> > >>>
> >
> https://docs.microsoft.com/en-us/sql/relational-databases/tables/specify-computed-columns-in-a-table?view=sql-server-2016
> >  [2]
> > 
> > >>>
> >
> https://dev.mysql.com/doc/refman/5.7/en/create-table-generated-columns.html
> >  [3] https://oracle-base.com/articles/11g/virtual-columns-11gr1
> > 
> >  Best,
> >  Danny Chan
> > 
> > >>>
> >
> >
>


[RESULT][VOTE] FLIP-65: New type inference for Table API UDFs

2019-10-24 Thread Timo Walther

Hi everyone,

the voting time for FLIP-65 has passed.

There were 7 +1 votes, 6 of which are binding:
- Aljoscha (binding)
- Jark (binding)
- Hequn (binding)
- Jincheng (binding)
- Xingcan (binding)
- Rong (binding)
- Jingsong (non-binding)

There were no disapproving votes.

Thus, FLIP-65 has been accepted.

Thanks everyone for joining the discussion and giving feedback,
Timo

On 24.10.19 04:42, Jingsong Li wrote:

+1 (non-binding)

Best,
Jingsong Lee

On Mon, Oct 21, 2019 at 11:36 PM Rong Rong  wrote:


+1 (binding)

Thanks Timo for driving this.

--
Rong

On Mon, Oct 21, 2019 at 8:19 AM  wrote:


+1 (binding)

Best,
Xingcan

-Original Message-
From: jincheng sun 
Sent: Monday, October 21, 2019 5:04
To: dev 
Subject: Re: [VOTE] FLIP-65: New type inference for Table API UDFs

+1(binding)

Best,
Jincheng

On Mon, Oct 21, 2019 at 4:55 PM Jark Wu wrote:


+1 (binding)

Best,
Jark


On Oct 21, 2019, at 16:52, Aljoscha Krettek wrote:

+1 (binding)

Aljoscha


On 21. Oct 2019, at 10:50, Timo Walther  wrote:

Hi everyone,

I would like to start a vote on FLIP-65. The discussion seems to
have reached an agreement.

Please vote for the following design document:



https://cwiki.apache.org/confluence/display/FLINK/FLIP-65%3A+New+type+
inference+for+Table+API+UDFs



The discussion can be found at:



https://lists.apache.org/thread.html/2b8cfd811a927d64a79f47387b7412c3b
98a11fcb9358d3c23ef666c@%3Cdev.flink.apache.org%3E


This voting will be open for at least 72 hours. I'll try to close
it on
2019-10-24 9:00 UTC, unless there is an objection or not enough

votes.


Best,

Timo


















Re: [Discussion] FLIP-79 Flink Function DDL Support

2019-10-24 Thread Timo Walther

Hi Peter,

thanks for your proposal. I left some comments in the FLIP document. I 
agree with Terry that we can have an MVP in Flink 1.10, but we should already 
discuss the bigger picture, as DDL syntax cannot be changed easily once 
released.


In particular, we should discuss how resources for functions are loaded. 
If they are simply added to the JobGraph, they are available to all 
functions and could potentially interfere with each other, right?
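
To make that concern concrete, here is a rough sketch (the DDL keywords, class 
names and jar paths are only assumptions for illustration, not syntax that 
FLIP-79 has settled on):

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class FunctionDdlResourceSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().inStreamingMode().build());

        // Two functions registered from two different external jars.
        // Hypothetical DDL: LANGUAGE / USING JAR are placeholders for this example.
        tEnv.sqlUpdate(
                "CREATE FUNCTION parse_log AS 'com.example.ParseLog' "
                        + "LANGUAGE JAVA USING JAR 'hdfs:///udfs/log-udfs.jar'");
        tEnv.sqlUpdate(
                "CREATE FUNCTION score AS 'com.example.Score' "
                        + "LANGUAGE JAVA USING JAR 'hdfs:///udfs/ml-udfs.jar'");

        // If both jars are simply attached to the JobGraph, they land on the same
        // user class loader, so the two functions (and their transitive
        // dependencies, e.g. different versions of the same library) can clash.
    }
}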


Thanks,
Timo



On 24.10.19 05:32, Terry Wang wrote:

Hi Peter,

Sorry for the late reply. Thanks for your efforts on this; I just looked through 
your design.
I left some comments in the doc about the alter function section and the function 
catalog interface.
IMO, the overall design is OK and we can discuss some of the details further.
I also think it’s necessary to have this awesome feature, limited to basic 
functions (of course better to have all :) ), in the 1.10 release.

Best,
Terry Wang




2019年10月16日 14:19,Peter Huang  写道:

Hi Xuefu,

Thank you for the feedback. I think you are pointing out a similar concern
to Bowen's. Let me describe
how the catalog function and function factory will be changed in the
implementation section.
Then, we can have more discussion in detail.


Best Regards
Peter Huang

On Tue, Oct 15, 2019 at 4:18 PM Xuefu Z  wrote:


Thanks to Peter for the proposal!

I left some comments in the google doc. Besides what Bowen pointed out, I'm
unclear about how things  work end to end from the document. For instance,
SQL DDL-like function definition is mentioned. I guess just having a DDL
for it doesn't explain how it's supported functionally. I think it's better
to have some clarification on what is expected to work and what's for the
future.

Thanks,
Xuefu


On Tue, Oct 15, 2019 at 11:05 AM Bowen Li  wrote:


Hi Zhenqiu,

Thanks for taking on this effort!

A couple questions:
- Though this FLIP is about function DDL, can we also think about how the
created functions can be mapped to CatalogFunction and see if we need to
modify CatalogFunction interface? Syntax changes need to be backed by the
backend.
- Can we define a clearer, smaller scope targeting for Flink 1.10 among all
the proposed changes? The current overall scope seems to be quite wide, and
it may be unrealistic to get everything in a single release, or even a
couple. However, I believe the most common user story can be something as
simple as "being able to create and persist a java class-based udf and use
it later in queries", which will add great value for most Flink users and
is achievable in 1.10.

Bowen

On Sun, Oct 13, 2019 at 10:46 PM Peter Huang  wrote:

Dear Community,

FLIP-79 Flink Function DDL Support
<https://docs.google.com/document/d/16kkHlis80s61ifnIahCj-0IEdy5NJ1z-vGEJd_JuLog/edit#>

This proposal aims to support function DDL with the consideration of SQL
syntax, language compliance, and advanced external UDF lib registration.
The Flink DDL was initiated and discussed in the design doc
<https://docs.google.com/document/d/1TTP-GCC8wSsibJaSUyFZ_5NBAHYEB1FVmPpP7RgDGBA/edit#heading=h.wpsqidkaaoil>
[1] by Shuyi Chen and Timo. As the initial discussion mainly focused on
table, type and view, FLIP-69 [2] extends it with a more detailed discussion
of DDL for catalog, database, and function. Originally the function DDL was
under the scope of FLIP-69. After some discussion with the community, we
found that there are several ongoing efforts, such as FLIP-64 [3], FLIP-65
[4], and FLIP-78 [5]. As they will directly impact the SQL syntax of
function DDL, the proposal wants to describe the problem clearly with the
consideration of existing works and make sure the design aligns with the
efforts on the API change for temporary objects and on type inference for
UDFs defined in different languages.

The FLIP outlines the requirements from related works, and proposes a SQL
syntax to meet those requirements. The corresponding implementation is also
discussed. Please kindly review and give feedback.

Best Regards
Peter Huang






--
Xuefu Zhang

"In Honey We Trust!"





Re: [DISCUSS] FLIP-70 - Support Computed Column for Flink SQL

2019-10-24 Thread Kurt Young
+1 to move it to a future section. By deleting it I meant removing it from
the content that describes the current processing procedure.

Best,
Kurt


On Thu, Oct 24, 2019 at 5:01 PM Timo Walther  wrote:

> Having an MVP and a limited scope sounds good to me. But I would not
> remove the STORED keyword entirely from the document.
>
> It shows that we have a long-term vision. Instead of deleting this
> content, I would move it to a Outlook/Future Work section.
>
> Regards,
> Timo
>
>
> On 24.10.19 10:55, Jark Wu wrote:
> > +1 to remove “STORED” related content. We can add them when user
> requires.
> > Others looks good to me in general.
> >
> > Thanks,
> > Jark
> >
> >
> >> 在 2019年10月24日,14:58,Kurt Young  写道:
> >>
> >> Hi Danny,
> >>
> >> Thanks for preparing this design document. IMO It's a very useful
> >> feature, especially combined with time attribute support to specify
> >> watermark in DDL.
> >>
> >> The design doc looks quite good, but I would suggest to reduce the
> >> scope of the first version. Like we don't have to support "STORED"
> >> in the first MVP version, and you can also delete related content in
> >> document to make it more clean and easier to understand.
> >>
> >> Best,
> >> Kurt
> >>
> >>
> >> On Tue, Sep 17, 2019 at 9:18 PM Qi Luo  wrote:
> >>
> >>> Fantastic! We're also very interested in this feature.
> >>>
> >>> +Boxiu
> >>>
> >>> On Tue, Sep 17, 2019 at 11:31 AM Danny Chan 
> wrote:
> >>>
>  In umbrella task FLINK-10232 we have introduced CREATE TABLE grammar
> in
>  our new module flink-sql-parser. And we proposed to use computed
> column
> >>> to
>  describe the time attribute of process time in the design doc FLINK
> SQL
>  DDL, so user may create a table with process time attribute as
> follows:
>  create table T1(
>    a int,
>    b bigint,
>    c varchar,
>    d as PROCTIME,
>  ) with (
>    'k1' = 'v1',
>    'k2' = 'v2'
>  );
> 
>  The column d would be a process time attribute for table T1.
> 
>  Besides that, computed  columns have several other use cases, such as
>  these [2]:
> 
> 
>  • Virtual generated columns can be used as a way to simplify and unify
>  queries. A complicated condition can be defined as a generated column
> and
>  referred to from multiple queries on the table to ensure that all of
> them
>  use exactly the same condition.
>  • Stored generated columns can be used as a materialized cache for
>  complicated conditions that are costly to calculate on the fly.
>  • Generated columns can simulate functional indexes: Use a generated
>  column to define a functional expression and index it. This can be
> useful
>  for working with columns of types that cannot be indexed directly,
> such
> >>> as
>  JSON columns.
>  • For stored generated columns, the disadvantage of this approach is
> that
>  values are stored twice; once as the value of the generated column and
> >>> once
>  in the index.
>  • If a generated column is indexed, the optimizer recognizes query
>  expressions that match the column definition and uses indexes from the
>  column as appropriate during query execution(Not supported yet).
> 
> 
> 
>  Computed columns are introduced in SQL-SERVER-2016 [1], MYSQL-5.6 [2]
> and
>  ORACLE-11g [3].
> 
>  This is the design doc:
> 
> 
> >>>
> https://docs.google.com/document/d/110TseRtTCphxETPY7uhiHpu-dph3NEesh3mYKtJ7QOY/edit?usp=sharing
> 
>  Any suggestions are appreciated, thanks.
> 
>  [1]
> 
> >>>
> https://docs.microsoft.com/en-us/sql/relational-databases/tables/specify-computed-columns-in-a-table?view=sql-server-2016
>  [2]
> 
> >>>
> https://dev.mysql.com/doc/refman/5.7/en/create-table-generated-columns.html
>  [3] https://oracle-base.com/articles/11g/virtual-columns-11gr1
> 
>  Best,
>  Danny Chan
> 
> >>>
>
>


Re: [DISUCSS] FLIP-80: Expression String Serializable and Deserializable

2019-10-24 Thread Timo Walther

Hi Jark,

+1 for your suggestion. I think it will simplify the design a lot if we 
serialize all expressions as SQL strings and will avoid duplicate parser 
code. Initially, I had concerns that there might be expressions in the 
future that cannot be represented in SQL. But currently I cannot come up 
with a counter example.
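
Just to convince myself that the round trip works, here is a quick sketch of 
the Calcite-based approach Jark describes below (the parser configuration is 
only an assumption for the example):

import org.apache.calcite.avatica.util.Quoting;
import org.apache.calcite.sql.SqlNode;
import org.apache.calcite.sql.dialect.AnsiSqlDialect;
import org.apache.calcite.sql.parser.SqlParseException;
import org.apache.calcite.sql.parser.SqlParser;

public class ExpressionStringRoundTrip {
    public static void main(String[] args) throws SqlParseException {
        // Serialized form of a resolved expression, in plain SQL syntax.
        String serialized = "`f0` + CAST(1 AS BIGINT)";

        // Deserialize: reuse Calcite's parser instead of writing a custom one.
        SqlParser parser = SqlParser.create(
                serialized,
                SqlParser.configBuilder().setQuoting(Quoting.BACK_TICK).build());
        SqlNode expr = parser.parseExpression();

        // Serialize again: unparse the SqlNode back into a SQL string.
        System.out.println(expr.toSqlString(AnsiSqlDialect.DEFAULT).getSql());
    }
}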


Table operations will be a different topic that will require a custom 
string syntax. But representing expressions as SQL strings sounds good to me.


@Jingsong: Jark is right. Table API expression strings are outdated and 
error-prone. They will be removed at some point.


Regards,
Timo


On 24.10.19 10:50, Jark Wu wrote:

Thanks Jingsong,

As discussed in Java Expression DSL, we are planning to drop the Java 
Expression string API.
So I think we don’t have a plan to unify #1 and #2. And in the future, we may 
only have SQL parser to parse a string expression.
The only thing to consider is, whether can all the resolved expression be 
converted to SqlNode.
AFAIK, currently, after expression resolving, all the expressions can be 
converted to SqlNodes.

Best,
Jark

[1]: 
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Introduction-of-a-Table-API-Java-Expression-DSL-td27787.html
 



在 2019年10月24日,13:30,Jingsong Li  写道:

Thanks Jark for your proposal.

If we introduce a new kind of string presentation for expression, we will
have 3 string presentation now:
1. Java expression string api. We have PlannerExpressionParser to parse
string to Expressions.
2. Sql string presentation, as you said, we can use calcite classes to
parse and unparse.
3. New kind of string presentation for serialize.

 From this point of view, I prefer not to introduce a new kind of string
presentation to reduce the complexity.

There are some differences between #1 and #2:
- method invoking: "f0.substring(1, f7)" and "SUBSTRING(f0, 1, f7)"
- bigint literal: "1L" and "cast(1 as BIGINT)"

Now it is two completely independent sets., Whether we can unify #1 and #2
into one set, and we all use one parser?

Best,
Jingsong Lee

On Tue, Oct 22, 2019 at 7:57 PM Jark Wu  wrote:


Hi Timo,

I think it's a good idea to use `SqlParser#parseExpression()` to parse
literals.
That means the string format of literal is SQL compatible.
After some discussion with Kurt, we think why not one more step forward,
i.e. convert the whole expression to SQL format.

For example, the above expression will be converted to:

`cat`.`db`.`func`(`cat`.`db`.`f0`, TIMESTAMP '2019-10-21 12:12:12')

There are some benefits from this:
0) the string representation is more readable, and can be manually typed
more easily.
1) the string format is SQL syntax, not customized, which means it can be
integrated by third party projects.
2) we can reuse Calcite's SqlParser to parse string and SqlNode#unparse to
generate string, this can avoid introducing duplicate code and a custom
parser.
3) no compatible problems.

Regarding to how Expression can be converted into a SQL string, I think we
can leverage some Calcite utils:

ResolvedExpression ---(ExpressionConverter)---> RexNode
(RexToSqlNodeConverter)---> SqlNode --> SqlNode#unparse()

What do you think?

Best,
Jark

On Mon, 21 Oct 2019 at 22:08, Timo Walther  wrote:


Hi Jark,

thanks for the proposal. This is a great effort to finalize the new API
design.

I'm wondering if we could simply use the SQL parser like
`org.apache.calcite.sql.parser.SqlParser#parseExpression(..)` to parse
an expression that contain only literals. This would avoid any
discussion as the syntax is already defined by the SQL standard. And it
would also be very unlikely to have a need for a version.

For example:

CALL('FUNC', FIELD('f0'), VALUE('TIMESTAMP(3)', TIMESTAMP '2019-10-21
12:12:12'))

Or even further if the SQL parser allows that:

CALL('FUNC', `cat`.`db`.`f0`, TIMESTAMP '2019-10-21 12:12:12')

I would find it confusing if we use different representation for
literals such as intervals and timestamps in the properties. This would
also reduce code duplication as we reuse logic for parsing identifiers

etc.


What do you think?

Regards,
Timo


On 18.10.19 12:28, Jark Wu wrote:

Hi everyone,

I would like to start a discussion[1] about how to make Expression

string

serializable and deserializable. Expression is the general interface

for

all kinds of expressions in Flink Table API & SQL, it represents a

logical

tree for producing a computation result. In FLIP-66[2] and FLIP-70[3],

we

introduced watermark and computed column syntax in DDL. The watermark
strategy and computed column are both represented in Expression. In

order

to persist watermark and computed column information in catalog, we

need

to

figure out how to persist and restore Expression.

FLIP-80:




https://docs.google.com/document/d/1LxPEzbPuEVWNixb1L_USv0gFgjRMgoZuMsAecS_XvdE/edit?usp=sharing


Thanks for 

Re: [DISCUSS] FLIP-70 - Support Computed Column for Flink SQL

2019-10-24 Thread Timo Walther
Having an MVP and a limited scope sounds good to me. But I would not 
remove the STORED keyword entirely from the document.


It shows that we have a long-term vision. Instead of deleting this 
content, I would move it to an Outlook/Future Work section.


Regards,
Timo


On 24.10.19 10:55, Jark Wu wrote:

+1 to remove “STORED” related content. We can add them when user requires.
Others looks good to me in general.

Thanks,
Jark



在 2019年10月24日,14:58,Kurt Young  写道:

Hi Danny,

Thanks for preparing this design document. IMO It's a very useful
feature, especially combined with time attribute support to specify
watermark in DDL.

The design doc looks quite good, but I would suggest to reduce the
scope of the first version. Like we don't have to support "STORED"
in the first MVP version, and you can also delete related content in
document to make it more clean and easier to understand.

Best,
Kurt


On Tue, Sep 17, 2019 at 9:18 PM Qi Luo  wrote:


Fantastic! We're also very interested in this feature.

+Boxiu

On Tue, Sep 17, 2019 at 11:31 AM Danny Chan  wrote:


In umbrella task FLINK-10232 we have introduced CREATE TABLE grammar in
our new module flink-sql-parser. And we proposed to use computed column

to

describe the time attribute of process time in the design doc FLINK SQL
DDL, so user may create a table with process time attribute as follows:
create table T1(
  a int,
  b bigint,
  c varchar,
  d as PROCTIME,
) with (
  'k1' = 'v1',
  'k2' = 'v2'
);

The column d would be a process time attribute for table T1.

Besides that, computed  columns have several other use cases, such as
these [2]:


• Virtual generated columns can be used as a way to simplify and unify
queries. A complicated condition can be defined as a generated column and
referred to from multiple queries on the table to ensure that all of them
use exactly the same condition.
• Stored generated columns can be used as a materialized cache for
complicated conditions that are costly to calculate on the fly.
• Generated columns can simulate functional indexes: Use a generated
column to define a functional expression and index it. This can be useful
for working with columns of types that cannot be indexed directly, such

as

JSON columns.
• For stored generated columns, the disadvantage of this approach is that
values are stored twice; once as the value of the generated column and

once

in the index.
• If a generated column is indexed, the optimizer recognizes query
expressions that match the column definition and uses indexes from the
column as appropriate during query execution(Not supported yet).



Computed columns are introduced in SQL-SERVER-2016 [1], MYSQL-5.6 [2] and
ORACLE-11g [3].

This is the design doc:



https://docs.google.com/document/d/110TseRtTCphxETPY7uhiHpu-dph3NEesh3mYKtJ7QOY/edit?usp=sharing


Any suggestions are appreciated, thanks.

[1]


https://docs.microsoft.com/en-us/sql/relational-databases/tables/specify-computed-columns-in-a-table?view=sql-server-2016

[2]


https://dev.mysql.com/doc/refman/5.7/en/create-table-generated-columns.html

[3] https://oracle-base.com/articles/11g/virtual-columns-11gr1

Best,
Danny Chan







Re: [DISUCSS] FLIP-80: Expression String Serializable and Deserializable

2019-10-24 Thread Jark Wu
Thanks Jingsong,

As discussed in the Java Expression DSL thread [1], we are planning to drop the Java 
Expression string API. 
So I think we don’t have a plan to unify #1 and #2. And in the future, we may 
only have the SQL parser to parse a string expression.
The only thing to consider is whether all resolved expressions can be 
converted to SqlNode. 
AFAIK, currently, after expression resolution, all expressions can be 
converted to SqlNodes. 

Best,
Jark

[1]: 
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Introduction-of-a-Table-API-Java-Expression-DSL-td27787.html
 


> 在 2019年10月24日,13:30,Jingsong Li  写道:
> 
> Thanks Jark for your proposal.
> 
> If we introduce a new kind of string presentation for expression, we will
> have 3 string presentation now:
> 1. Java expression string api. We have PlannerExpressionParser to parse
> string to Expressions.
> 2. Sql string presentation, as you said, we can use calcite classes to
> parse and unparse.
> 3. New kind of string presentation for serialize.
> 
> From this point of view, I prefer not to introduce a new kind of string
> presentation to reduce the complexity.
> 
> There are some differences between #1 and #2:
> - method invoking: "f0.substring(1, f7)" and "SUBSTRING(f0, 1, f7)"
> - bigint literal: "1L" and "cast(1 as BIGINT)"
> 
> Now it is two completely independent sets., Whether we can unify #1 and #2
> into one set, and we all use one parser?
> 
> Best,
> Jingsong Lee
> 
> On Tue, Oct 22, 2019 at 7:57 PM Jark Wu  wrote:
> 
>> Hi Timo,
>> 
>> I think it's a good idea to use `SqlParser#parseExpression()` to parse
>> literals.
>> That means the string format of literal is SQL compatible.
>> After some discussion with Kurt, we think why not one more step forward,
>> i.e. convert the whole expression to SQL format.
>> 
>> For example, the above expression will be converted to:
>> 
>> `cat`.`db`.`func`(`cat`.`db`.`f0`, TIMESTAMP '2019-10-21 12:12:12')
>> 
>> There are some benefits from this:
>> 0) the string representation is more readable, and can be manually typed
>> more easily.
>> 1) the string format is SQL syntax, not customized, which means it can be
>> integrated by third party projects.
>> 2) we can reuse Calcite's SqlParser to parse string and SqlNode#unparse to
>> generate string, this can avoid introducing duplicate code and a custom
>> parser.
>> 3) no compatible problems.
>> 
>> Regarding to how Expression can be converted into a SQL string, I think we
>> can leverage some Calcite utils:
>> 
>> ResolvedExpression ---(ExpressionConverter)---> RexNode
>> (RexToSqlNodeConverter)---> SqlNode --> SqlNode#unparse()
>> 
>> What do you think?
>> 
>> Best,
>> Jark
>> 
>> On Mon, 21 Oct 2019 at 22:08, Timo Walther  wrote:
>> 
>>> Hi Jark,
>>> 
>>> thanks for the proposal. This is a great effort to finalize the new API
>>> design.
>>> 
>>> I'm wondering if we could simply use the SQL parser like
>>> `org.apache.calcite.sql.parser.SqlParser#parseExpression(..)` to parse
>>> an expression that contain only literals. This would avoid any
>>> discussion as the syntax is already defined by the SQL standard. And it
>>> would also be very unlikely to have a need for a version.
>>> 
>>> For example:
>>> 
>>> CALL('FUNC', FIELD('f0'), VALUE('TIMESTAMP(3)', TIMESTAMP '2019-10-21
>>> 12:12:12'))
>>> 
>>> Or even further if the SQL parser allows that:
>>> 
>>> CALL('FUNC', `cat`.`db`.`f0`, TIMESTAMP '2019-10-21 12:12:12')
>>> 
>>> I would find it confusing if we use different representation for
>>> literals such as intervals and timestamps in the properties. This would
>>> also reduce code duplication as we reuse logic for parsing identifiers
>> etc.
>>> 
>>> What do you think?
>>> 
>>> Regards,
>>> Timo
>>> 
>>> 
>>> On 18.10.19 12:28, Jark Wu wrote:
 Hi everyone,
 
 I would like to start a discussion[1] about how to make Expression
>> string
 serializable and deserializable. Expression is the general interface
>> for
 all kinds of expressions in Flink Table API & SQL, it represents a
>>> logical
 tree for producing a computation result. In FLIP-66[2] and FLIP-70[3],
>> we
 introduced watermark and computed column syntax in DDL. The watermark
 strategy and computed column are both represented in Expression. In
>> order
 to persist watermark and computed column information in catalog, we
>> need
>>> to
 figure out how to persist and restore Expression.
 
 FLIP-80:
 
>>> 
>> https://docs.google.com/document/d/1LxPEzbPuEVWNixb1L_USv0gFgjRMgoZuMsAecS_XvdE/edit?usp=sharing
 
 Thanks for any feedback!
 
 Best,
 Jark
 
 [1]:
 
>>> 
>> https://docs.google.com/document/d/1LxPEzbPuEVWNixb1L_USv0gFgjRMgoZuMsAecS_XvdE/edit?usp=sharing
 [2]:
 
>>> 
>> 

Re: [DISCUSS] Rename the SQL ANY type to OPAQUE type

2019-10-24 Thread Timo Walther

Thanks everyone for the feedback!

We have a clear favorite which is the RAW type. I will make sure that 
this change is applied to the Flink code base.
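
For reference, after the rename the Table API side would presumably look 
something like the sketch below (the exact factory method is an assumption 
until the change lands; today this is still DataTypes.ANY):

import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.table.api.DataTypes;
import org.apache.flink.table.types.DataType;

public class RawTypeSketch {

    // Some user class that SQL treats as an opaque value; only UDFs that know
    // the class can actually work with it.
    public static class SensorReading {
        public int[] samples;
    }

    public static void main(String[] args) {
        // Assumed post-rename API: a RAW type backed by a TypeInformation-based
        // serializer; the value itself is a black box to the planner and to SQL.
        DataType rawType = DataTypes.RAW(TypeInformation.of(SensorReading.class));
        System.out.println(rawType);
    }
}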


Thanks again,
Timo


On 22.10.19 04:07, Terry Wang wrote:

“OPAQUE” seems a little strange to me.
+ 1 for ‘RAW’.

Best,
Terry Wang




2019年10月22日 09:19,Kurt Young  写道:

+1 to RAW, if there's no better candidate comes up.

Best,
Kurt


On Mon, Oct 21, 2019 at 9:25 PM Timo Walther  wrote:


I would also avoid `UNKNOWN` because of the mentioned reasons.

I'm fine with `RAW`. I will wait another day or two until I conclude the
discussion.

Thanks,
Timo


On 21.10.19 12:23, Jark Wu wrote:

I also think `UNKNOWN` is not suitable here.
Because we already have `UNKNOWN` value in SQL, i.e. the three-valued

logic

(TRUE, FALSE, UNKNOWN) of BOOLEAN type.
It will confuse users here, what's the relationship between them.

Best,
Jark

On Mon, 21 Oct 2019 at 17:53, Paul Lam  wrote:


Hi,

IMHO, `UNKNOWN` does not fully reflects the situation here, because the
types are
actually “known” to users, and users just want to leave them out of

Flink

type system.

+1 for `RAW`, for it's more intuitive than `OPAQUE`.

Best,
Paul Lam


在 2019年10月21日,16:43,Kurt Young  写道:

OPAQUE seems to be a little bit advanced to a lot non-english
speakers (including me). I think Xuefu raised a good alternative:
UNKNOWN. What do you think about it?

Best,
Kurt


On Mon, Oct 21, 2019 at 3:49 PM Aljoscha Krettek 
wrote:


I prefer OPAQUE compared to ANY because any is often the root object

in

an

object hierarchy and would indicate to users the wrong thing.

Aljoscha


On 18. Oct 2019, at 18:41, Xuefu Z  wrote:

Thanks to Timo for bringing up an interesting topic.

Personally, "OPAQUE" doesn't seem very intuitive with respect to

types.

(It

suits pretty well to glasses, thought. :)) Anyway, could we just use
"UNKNOWN", which is more explicit and true reflects its nature?

Thanks,
Xuefu


On Fri, Oct 18, 2019 at 7:51 AM Timo Walther 

wrote:

Hi everyone,

Stephan pointed out that our naming of a generic/blackbox/opaque

type

in

SQL might be not intuitive for users. As the term ANY rather

describes a

"super-class of all types" which is not the case in our type system.

Our

current ANY type stands for a type that is just a blackbox within

SQL,

serialized by some custom serializer, that can only be modified

within

UDFs.

I also gathered feedback from a training instructor and native

English

speaker (David in CC) where I received the following:

"The way I’m thinking about this is this: there’s a concept here

that

people have to become aware of, which is that Flink SQL is able to
operate generically on opaquely typed things — and folks need to be

able

to connect what they see in code examples, etc. with this concept

(which

they may be unaware of initially).
I feel like ANY misses the mark a little bit, but isn’t particularly
bad. I do worry that it may cause some confusion about its purpose

and

power. I think OPAQUE would more clearly express what’s going on."

Also resources like Wikipedia [1] show that this terminology is

common:

"a data type whose concrete data structure is not defined [...] its
values can only be manipulated by calling subroutines that have

access

to the missing information"

I would therefore vote for refactoring the type name because it is

not

used much yet.

Implications are:

- a new parser keyword "OPAQUE" and changed SQL parser

- changes for logical type root, logical type visitors, and their

usages

What do you think?

Thanks,

Timo

[1] https://en.wikipedia.org/wiki/Opaque_data_type




--
Xuefu Zhang

"In Honey We Trust!"











Re: [DISCUSS] FLIP-70 - Support Computed Column for Flink SQL

2019-10-24 Thread Jark Wu
+1 to remove the “STORED”-related content. We can add it back when users require it. 
The rest looks good to me in general. 

Thanks,
Jark


> 在 2019年10月24日,14:58,Kurt Young  写道:
> 
> Hi Danny,
> 
> Thanks for preparing this design document. IMO It's a very useful
> feature, especially combined with time attribute support to specify
> watermark in DDL.
> 
> The design doc looks quite good, but I would suggest to reduce the
> scope of the first version. Like we don't have to support "STORED"
> in the first MVP version, and you can also delete related content in
> document to make it more clean and easier to understand.
> 
> Best,
> Kurt
> 
> 
> On Tue, Sep 17, 2019 at 9:18 PM Qi Luo  wrote:
> 
>> Fantastic! We're also very interested in this feature.
>> 
>> +Boxiu
>> 
>> On Tue, Sep 17, 2019 at 11:31 AM Danny Chan  wrote:
>> 
>>> In umbrella task FLINK-10232 we have introduced CREATE TABLE grammar in
>>> our new module flink-sql-parser. And we proposed to use computed column
>> to
>>> describe the time attribute of process time in the design doc FLINK SQL
>>> DDL, so user may create a table with process time attribute as follows:
>>> create table T1(
>>>  a int,
>>>  b bigint,
>>>  c varchar,
>>>  d as PROCTIME,
>>> ) with (
>>>  'k1' = 'v1',
>>>  'k2' = 'v2'
>>> );
>>> 
>>> The column d would be a process time attribute for table T1.
>>> 
>>> Besides that, computed  columns have several other use cases, such as
>>> these [2]:
>>> 
>>> 
>>> • Virtual generated columns can be used as a way to simplify and unify
>>> queries. A complicated condition can be defined as a generated column and
>>> referred to from multiple queries on the table to ensure that all of them
>>> use exactly the same condition.
>>> • Stored generated columns can be used as a materialized cache for
>>> complicated conditions that are costly to calculate on the fly.
>>> • Generated columns can simulate functional indexes: Use a generated
>>> column to define a functional expression and index it. This can be useful
>>> for working with columns of types that cannot be indexed directly, such
>> as
>>> JSON columns.
>>> • For stored generated columns, the disadvantage of this approach is that
>>> values are stored twice; once as the value of the generated column and
>> once
>>> in the index.
>>> • If a generated column is indexed, the optimizer recognizes query
>>> expressions that match the column definition and uses indexes from the
>>> column as appropriate during query execution(Not supported yet).
>>> 
>>> 
>>> 
>>> Computed columns are introduced in SQL-SERVER-2016 [1], MYSQL-5.6 [2] and
>>> ORACLE-11g [3].
>>> 
>>> This is the design doc:
>>> 
>>> 
>> https://docs.google.com/document/d/110TseRtTCphxETPY7uhiHpu-dph3NEesh3mYKtJ7QOY/edit?usp=sharing
>>> 
>>> Any suggestions are appreciated, thanks.
>>> 
>>> [1]
>>> 
>> https://docs.microsoft.com/en-us/sql/relational-databases/tables/specify-computed-columns-in-a-table?view=sql-server-2016
>>> [2]
>>> 
>> https://dev.mysql.com/doc/refman/5.7/en/create-table-generated-columns.html
>>> [3] https://oracle-base.com/articles/11g/virtual-columns-11gr1
>>> 
>>> Best,
>>> Danny Chan
>>> 
>> 



Re: [VOTE] FLIP-81: Executor-related new ConfigOptions

2019-10-24 Thread Jeff Zhang
+1 (non-binding)

vino yang  于2019年10月24日周四 下午4:06写道:

> +1 (non-binding)
>
> Best,
> Vino
>
> Yang Wang  于2019年10月23日周三 下午9:05写道:
>
> > We could benefit a lot from unifying the cli options and config options.
> >
> > +1(non-binding)
> >
> > Best,
> > Yang
> >
> > Aljoscha Krettek  于2019年10月23日周三 下午5:24写道:
> >
> > > +1 (binding)
> > >
> > > Aljoscha
> > >
> > > > On 23. Oct 2019, at 10:41, Zili Chen  wrote:
> > > >
> > > > Thanks for starting this voting process. I have looked at the FLIP
> and
> > > the
> > > > discussion
> > > > thread. These options added make sense to me.
> > > >
> > > > +1 from my side.
> > > >
> > > > Best,
> > > > tison.
> > > >
> > > >
> > > > Kostas Kloudas  于2019年10月23日周三 下午4:12写道:
> > > >
> > > >> Hi all,
> > > >>
> > > >> This is the voting thread for FLIP-81, as the title says.
> > > >>
> > > >> The FLIP can be found in [1] and the discussion thread in [2].
> > > >>
> > > >> As per the bylaws, the vote will stay open until Friday 26-10 (3
> days)
> > > >> or until it gets 3 votes.
> > > >>
> > > >> Thank you,
> > > >> Kostas
> > > >>
> > > >> [1]
> > > >>
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=133631524
> > > >> [2]
> > > >>
> > >
> >
> https://lists.apache.org/thread.html/a4dd8e0c7b79350c59f5afefc1bc583dac99abcf94caaa8c22017974@%3Cdev.flink.apache.org%3E
> > > >>
> > >
> > >
> >
>


-- 
Best Regards

Jeff Zhang


Re: [VOTE] FLIP-81: Executor-related new ConfigOptions

2019-10-24 Thread vino yang
+1 (non-binding)

Best,
Vino

Yang Wang  于2019年10月23日周三 下午9:05写道:

> We could benefit a lot from unifying the cli options and config options.
>
> +1(non-binding)
>
> Best,
> Yang
>
> Aljoscha Krettek  于2019年10月23日周三 下午5:24写道:
>
> > +1 (binding)
> >
> > Aljoscha
> >
> > > On 23. Oct 2019, at 10:41, Zili Chen  wrote:
> > >
> > > Thanks for starting this voting process. I have looked at the FLIP and
> > the
> > > discussion
> > > thread. These options added make sense to me.
> > >
> > > +1 from my side.
> > >
> > > Best,
> > > tison.
> > >
> > >
> > > Kostas Kloudas  于2019年10月23日周三 下午4:12写道:
> > >
> > >> Hi all,
> > >>
> > >> This is the voting thread for FLIP-81, as the title says.
> > >>
> > >> The FLIP can be found in [1] and the discussion thread in [2].
> > >>
> > >> As per the bylaws, the vote will stay open until Friday 26-10 (3 days)
> > >> or until it gets 3 votes.
> > >>
> > >> Thank you,
> > >> Kostas
> > >>
> > >> [1]
> > >>
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=133631524
> > >> [2]
> > >>
> >
> https://lists.apache.org/thread.html/a4dd8e0c7b79350c59f5afefc1bc583dac99abcf94caaa8c22017974@%3Cdev.flink.apache.org%3E
> > >>
> >
> >
>


[jira] [Created] (FLINK-14517) Handle escape characters for partition path

2019-10-24 Thread Jingsong Lee (Jira)
Jingsong Lee created FLINK-14517:


 Summary: Handle escape characters for partition path
 Key: FLINK-14517
 URL: https://issues.apache.org/jira/browse/FLINK-14517
 Project: Flink
  Issue Type: Sub-task
  Components: Connectors / FileSystem
Reporter: Jingsong Lee
 Fix For: 1.10.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-14516) Remove non credit based network code

2019-10-24 Thread Piotr Nowojski (Jira)
Piotr Nowojski created FLINK-14516:
--

 Summary: Remove non credit based network code
 Key: FLINK-14516
 URL: https://issues.apache.org/jira/browse/FLINK-14516
 Project: Flink
  Issue Type: Task
  Components: Runtime / Network
Affects Versions: 1.9.0
Reporter: Piotr Nowojski


After [a survey on the dev mailing 
list|http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/SURVEY-Dropping-non-Credit-based-Flow-Control-td33714.html]
 the feedback was that the old code path is not used and no longer needed. Because 
of that, we should be safe to drop it and make credit-based flow control the 
only option (currently it's the default).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [SURVEY] Dropping non Credit-based Flow Control

2019-10-24 Thread Piotr Nowojski
Hi,

Thank you all for the feedback. I’ve created a ticket [1] to remove the non 
Credit-based Flow Control code paths.

Piotrek

[1] https://issues.apache.org/jira/browse/FLINK-14516 


> On 23 Oct 2019, at 10:59, Nico Kruber  wrote:
> 
> +1
> 
> I have not heard of a real-world use-case that suffered more than it
> gained and also think it is time to remove the old paths.
> 
> There are, however, still improvements to be made in credit-based flow
> control (like [1]) but that should not stop us from removing the old
> paths if no-one is really using them anyway
> 
> 
> Nico
> 
> [1] https://issues.apache.org/jira/browse/FLINK-10742
> 
> On 21/10/2019 03:14, SHI Xiaogang wrote:
>> +1
>> 
>> Credit-based flow control has long been used in our production environment
>> as well. It works fine and there seems no reason to use non credit-based
>> implementation.
>> 
>> Regards,
>> Xiaogang
>> 
>> Zhu Zhu  于2019年10月19日周六 下午3:01写道:
>> 
>>> +1 to drop the non credit-based flow control.
>>> We have turned to credit-based flow control for long in production. It has
>>> been good for all our cases.
>>> The non credit-based flow control code has been a burden when we are trying
>>> to change the network stack code for new features.
>>> 
>>> Thanks,
>>> Zhu Zhu
>>> 
>>> 
>>> Biao Liu  于2019年10月10日周四 下午5:45写道:
>>> 
 Thanks for start this survey, Piotr.
 
 We have benefitted from credit-based flow control a lot. I can't figure
>>> out
 a reason to use non credit-based model.
 I think we have kept the older code paths long enough (1.5 -> 1.9).
>>> That's
 a big burden to maintain. Especially there are a lot duplicated codes
 between credit-based and non credit-based model.
 
 So +1 to do the cleanup.
 
 Thanks,
 Biao /'bɪ.aʊ/
 
 
 
 On Thu, 10 Oct 2019 at 11:15, zhijiang >>> .invalid>
 wrote:
 
> Thanks for bringing this survey Piotr.
> 
> Actually I was also trying to dropping the non credit-based code path
 from
> release-1.9, and now I think it is the proper time to do it motivated
>>> by
> [3].
> The credit-based mode is as default from Flink 1.5 and it has been
> verified to be stable and reliable in many versions. In Alibaba we are
> always using the default credit-based mode in all products.
> It can reduce much overhead of maintaining non credit-based code path,
>>> so
> +1 from my side to drop it.
> 
> Best,
> Zhijiang
> --
> From:Piotr Nowojski 
> Send Time:2019年10月2日(星期三) 17:01
> To:dev 
> Subject:[SURVEY] Dropping non Credit-based Flow Control
> 
> Hi,
> 
> In Flink 1.5 we have introduced Credit-based Flow Control [1] in the
> network stack. Back then we were aware about potential downsides of it
 [2]
> and we decided to keep the old model in the code base ( configurable by
> setting  `taskmanager.network.credit-model: false` ). Now, that we are
> about to modify internals of the network stack again [3], it might be a
> good time to clean up the code and remove the older code paths.
> 
> Is anyone still using the non default non Credit-based model (
> `taskmanager.network.credit-model: false`)? If so, why?
> 
> Piotrek
> 
> [1] https://flink.apache.org/2019/06/05/flink-network-stack.html <
> https://flink.apache.org/2019/06/05/flink-network-stack.html>
> [2]
> 
 
>>> https://flink.apache.org/2019/06/05/flink-network-stack.html#what-do-we-gain-where-is-the-catch
> <
> 
 
>>> https://flink.apache.org/2019/06/05/flink-network-stack.html#what-do-we-gain-where-is-the-catch
>> 
> [3]
> 
 
>>> https://lists.apache.org/thread.html/a2b58b7b2b24b9bd4814b2aa51d2fc44b08a919eddbb5b1256be5b6a@%3Cdev.flink.apache.org%3E
> <
> 
 
>>> https://lists.apache.org/thread.html/a2b58b7b2b24b9bd4814b2aa51d2fc44b08a919eddbb5b1256be5b6a@%3Cdev.flink.apache.org%3E
>> 
> 
> 
 
>>> 
>> 
> 
> -- 
> Nico Kruber | Solutions Architect
> 
> Follow us @VervericaData Ververica
> --
> Join Flink Forward - The Apache Flink Conference
> Stream Processing | Event Driven | Real Time
> --
> Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany
> --
> Ververica GmbH
> Registered at Amtsgericht Charlottenburg: HRB 158244 B
> Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji
> (Tony) Cheng
> 



Re: [DISCUSS] FLIP-70 - Support Computed Column for Flink SQL

2019-10-24 Thread Kurt Young
Hi Danny,

Thanks for preparing this design document. IMO it's a very useful
feature, especially combined with time attribute support to specify
watermarks in DDL.

The design doc looks quite good, but I would suggest reducing the
scope of the first version. For example, we don't have to support "STORED"
in the first MVP version, and you can also delete the related content from
the document to make it cleaner and easier to understand.
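
To make the scoping concrete, here is a small sketch (hypothetical syntax, 
loosely following the design doc and the FLIP-66 watermark proposal; nothing 
here is final):

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class ComputedColumnSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().inStreamingMode().build());

        // MVP part: a virtual computed column deriving an event-time attribute,
        // combined with a watermark definition. (Assumed DDL shape, not final.)
        tEnv.sqlUpdate(
                "CREATE TABLE orders ("
                        + "  order_id BIGINT,"
                        + "  order_time BIGINT,"
                        + "  rowtime AS TO_TIMESTAMP(FROM_UNIXTIME(order_time)),"
                        + "  WATERMARK FOR rowtime AS rowtime - INTERVAL '5' SECOND"
                        + ") WITH ("
                        + "  'connector.type' = 'kafka'"
                        + ")");

        // Deferred part: a STORED computed column that is materialized on write,
        // e.g. "total AS price * quantity STORED" -- this is what I would drop
        // from the first version.
    }
}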

Best,
Kurt


On Tue, Sep 17, 2019 at 9:18 PM Qi Luo  wrote:

> Fantastic! We're also very interested in this feature.
>
> +Boxiu
>
> On Tue, Sep 17, 2019 at 11:31 AM Danny Chan  wrote:
>
> > In umbrella task FLINK-10232 we have introduced CREATE TABLE grammar in
> > our new module flink-sql-parser. And we proposed to use computed column
> to
> > describe the time attribute of process time in the design doc FLINK SQL
> > DDL, so user may create a table with process time attribute as follows:
> > create table T1(
> >   a int,
> >   b bigint,
> >   c varchar,
> >   d as PROCTIME,
> > ) with (
> >   'k1' = 'v1',
> >   'k2' = 'v2'
> > );
> >
> > The column d would be a process time attribute for table T1.
> >
> > Besides that, computed  columns have several other use cases, such as
> > these [2]:
> >
> >
> > • Virtual generated columns can be used as a way to simplify and unify
> > queries. A complicated condition can be defined as a generated column and
> > referred to from multiple queries on the table to ensure that all of them
> > use exactly the same condition.
> > • Stored generated columns can be used as a materialized cache for
> > complicated conditions that are costly to calculate on the fly.
> > • Generated columns can simulate functional indexes: Use a generated
> > column to define a functional expression and index it. This can be useful
> > for working with columns of types that cannot be indexed directly, such
> as
> > JSON columns.
> > • For stored generated columns, the disadvantage of this approach is that
> > values are stored twice; once as the value of the generated column and
> once
> > in the index.
> > • If a generated column is indexed, the optimizer recognizes query
> > expressions that match the column definition and uses indexes from the
> > column as appropriate during query execution(Not supported yet).
> >
> >
> >
> > Computed columns are introduced in SQL-SERVER-2016 [1], MYSQL-5.6 [2] and
> > ORACLE-11g [3].
> >
> > This is the design doc:
> >
> >
> https://docs.google.com/document/d/110TseRtTCphxETPY7uhiHpu-dph3NEesh3mYKtJ7QOY/edit?usp=sharing
> >
> > Any suggestions are appreciated, thanks.
> >
> > [1]
> >
> https://docs.microsoft.com/en-us/sql/relational-databases/tables/specify-computed-columns-in-a-table?view=sql-server-2016
> > [2]
> >
> https://dev.mysql.com/doc/refman/5.7/en/create-table-generated-columns.html
> > [3] https://oracle-base.com/articles/11g/virtual-columns-11gr1
> >
> > Best,
> > Danny Chan
> >
>


Re: [DISCUSS] FLIP-75: Flink Web UI Improvement Proposal

2019-10-24 Thread lining jing
Hi all, I have updated the backend design in FLIP-75.

Here are some brief introductions:

   - Add a metric for managed memory (FLINK-14406).
   - Expose TaskExecutor resource configurations to the REST API (FLINK-14422).
   - Add TaskManagerResourceInfo in TaskManagerDetailsInfo to show the
   TaskManager resources (FLINK-14435).

I will continue to update the rest of the backend design in the doc.
Let's keep the discussion here; any feedback is appreciated.

Yadong Xie  于2019年9月27日周五 上午10:13写道:

> Hi all
>
> Flink Web UI is the main platform for most users to monitor their jobs and
> clusters. We have reconstructed Flink web in 1.9.0 version, but there are
> still some shortcomings.
>
> This discussion thread aims to provide a better experience for Flink UI
> users.
>
> Here is the design doc I drafted:
>
>
> https://docs.google.com/document/d/1tIa8yN2prWWKJI_fa1u0t6h1r6RJpp56m48pXEyh6iI/edit?usp=sharing
>
>
> The FLIP can be found at [2].
>
> Please keep the discussion here, in the mailing list.
>
> Looking forward to your opinions, any feedbacks are welcome.
>
> [1]:
>
> https://docs.google.com/document/d/1tIa8yN2prWWKJI_fa1u0t6h1r6RJpp56m48pXEyh6iI/edit?usp=sharing
> <
> https://docs.google.com/document/d/1tIa8yN2prWWKJI_fa1u0t6h1r6RJpp56m48pXEyh6iI/edit#
> >
> [2]:
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-75%3A+Flink+Web+UI+Improvement+Proposal
>