[jira] [Created] (FLINK-14342) Remove method FunctionDefinition#getLanguage

2019-10-07 Thread Dian Fu (Jira)
Dian Fu created FLINK-14342:
---

 Summary: Remove method FunctionDefinition#getLanguage 
 Key: FLINK-14342
 URL: https://issues.apache.org/jira/browse/FLINK-14342
 Project: Flink
  Issue Type: Sub-task
  Components: API / Python, Table SQL / API
Affects Versions: 1.10.0
Reporter: Dian Fu
 Fix For: 1.10.0


As discussed in 
[https://github.com/apache/flink/pull/9748#discussion_r329575228], all 
Python *ScalarFunction*s need to implement the interface *PythonFunction*, so 
we can determine whether a ScalarFunction is a Python function by checking 
whether it is an instance of *PythonFunction*. The method 
*FunctionDefinition#getLanguage* is therefore no longer needed.
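
A minimal sketch of the check that would replace the getLanguage() call 
(PythonFunction is the interface from the PR discussion; the surrounding 
variable name is hypothetical):
{code:java}
// Instead of switching on FunctionDefinition#getLanguage(), test whether the
// function implements the PythonFunction marker interface.
boolean isPython = scalarFunction instanceof PythonFunction;
{code}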





[jira] [Created] (FLINK-14341) Flink-python builds with failure: no such option: --prefix

2019-10-07 Thread liupengcheng (Jira)
liupengcheng created FLINK-14341:


 Summary: Flink-python builds with failure: no such option: --prefix
 Key: FLINK-14341
 URL: https://issues.apache.org/jira/browse/FLINK-14341
 Project: Flink
  Issue Type: Bug
  Components: API / Python
Affects Versions: 1.9.0
 Environment: Command: mvn clean install -DskipTests

pip: 7.1.2

python: 2.7.9
Reporter: liupengcheng


{code:java}
[INFO] --- protoc-jar-maven-plugin:3.7.1:run (default) @ flink-python_2.11 ---
Downloading from nexus: 
http://nexus.d.xiaomi.net/nexus/content/groups/public/com/github/os72/protoc-jar/3.7.1/protoc-jar-3.7.1.jar
Downloaded from nexus: 
http://nexus.d.xiaomi.net/nexus/content/groups/public/com/github/os72/protoc-jar/3.7.1/protoc-jar-3.7.1.jar
 (10 MB at 29 MB/s)
[INFO] Protoc version: 3.7.1
protoc-jar: protoc version: 3.7.1, detected platform: linux-x86_64 (linux/amd64)
protoc-jar: embedded: bin/3.7.1/protoc-3.7.1-linux-x86_64.exe
protoc-jar: executing: [/tmp/protocjar8378491914719706170/bin/protoc.exe, 
--version]
libprotoc 3.7.1
[INFO] Protoc command: /tmp/protocjar8378491914719706170/bin/protoc.exe
[INFO] Input directories:
[INFO] /home/liupengcheng/work/git/xiaomi/flink/flink-python/pyflink/proto
[INFO] Output targets:
[INFO] java: 
/home/liupengcheng/work/git/xiaomi/flink/flink-python/target/generated-sources 
(add: main, clean: false, plugin: null, outputOptions: null)
[INFO] 
/home/liupengcheng/work/git/xiaomi/flink/flink-python/target/generated-sources 
does not exist. Creating...
[INFO] Processing (java): flink-fn-execution.proto
protoc-jar: executing: [/tmp/protocjar8378491914719706170/bin/protoc.exe, 
-I/home/liupengcheng/work/git/xiaomi/flink/flink-python/pyflink/proto, 
--java_out=/home/liupengcheng/work/git/xiaomi/flink/flink-python/target/generated-sources,
 
/home/liupengcheng/work/git/xiaomi/flink/flink-python/pyflink/proto/flink-fn-execution.proto]
[INFO] Adding generated sources (java): 
/home/liupengcheng/work/git/xiaomi/flink/flink-python/target/generated-sources
[INFO] 
[INFO] --- exec-maven-plugin:1.5.0:exec (Protos Generation) @ flink-python_2.11 
---
/home/liupengcheng/work/git/xiaomi/flink/flink-python/pyflink/gen_protos.py:49: 
UserWarning: Installing grpcio-tools is recommended for development.
  warnings.warn('Installing grpcio-tools is recommended for development.')
WARNING:root:Installing grpcio-tools into 
/home/liupengcheng/work/git/xiaomi/flink/flink-python/pyflink/../.eggs/grpcio-wheels
Usage:
  /usr/local/bin/python -m pip install [options] <requirement specifier> [package-index-options] ...
  /usr/local/bin/python -m pip install [options] -r <requirements file> [package-index-options] ...
  /usr/local/bin/python -m pip install [options] [-e] <vcs project url> ...
  /usr/local/bin/python -m pip install [options] [-e] <local project path> ...
  /usr/local/bin/python -m pip install [options] <archive url/path> ...

no such option: --prefix
Process Process-1:
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/multiprocessing/process.py", line 258, in 
_bootstrap
self.run()
  File "/usr/local/lib/python2.7/multiprocessing/process.py", line 114, in run
self._target(*self._args, **self._kwargs)
  File 
"/home/liupengcheng/work/git/xiaomi/flink/flink-python/pyflink/gen_protos.py", 
line 126, in _install_grpcio_tools_and_generate_proto_files
'--upgrade', GRPC_TOOLS, "-I"])
  File "/usr/local/lib/python2.7/subprocess.py", line 540, in check_call
raise CalledProcessError(retcode, cmd)
CalledProcessError: Command '['/usr/local/bin/python', '-m', 'pip', 'install', 
'--prefix', 
'/home/liupengcheng/work/git/xiaomi/flink/flink-python/pyflink/../.eggs/grpcio-wheels',
 '--build', 
'/home/liupengcheng/work/git/xiaomi/flink/flink-python/pyflink/../.eggs/grpcio-wheels-build',
 '--upgrade', 'grpcio-tools>=1.3.5,<=1.14.2', '-I']' returned non-zero exit 
status 2
Traceback (most recent call last):
  File 
"/home/liupengcheng/work/git/xiaomi/flink/flink-python/pyflink/gen_protos.py", 
line 146, in <module>
generate_proto_files(force=True)
  File 
"/home/liupengcheng/work/git/xiaomi/flink/flink-python/pyflink/gen_protos.py", 
line 91, in generate_proto_files
raise ValueError("Proto generation failed (see log for details).")
ValueError: Proto generation failed (see log for details).
[ERROR] Command execution failed.

{code}
I found the root cause in the docs: [https://pip.pypa.io/en/stable/news/]. I 
think it's because the `--prefix` option is only supported since pip v8.0.

The Flink build docs say nothing about the required pip version, so I think 
we can use `--install-option="--prefix=xxx"` here for better compatibility.





Re: [DISCUSS] FLIP-68: Extend Core Table System with Modular Plugins

2019-10-07 Thread Jark Wu
Hi Bowen,

Thanks for the proposal. I have two thoughts:

1) Regarding "loadModule", how about
tableEnv.loadModule("xxx" [, propertiesMap]);
tableEnv.unloadModule("xxx");

This makes the API similar to SQL. IMO, a Module instance is not needed
as a parameter and is verbose.
And this makes it easier to load a simple module without any additional
properties, e.g. tEnv.loadModule("GEO"), tEnv.unloadModule("GEO")
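
To make the comparison concrete, a small sketch of the two shapes being
discussed (the "geo" module name and GeoModule class are hypothetical):

  // name + properties variant proposed above
  tableEnv.loadModule("geo");
  tableEnv.loadModule("geo", Collections.singletonMap("precision", "high"));
  tableEnv.unloadModule("geo");

  // name + instance variant from the thread below (as Timo suggests)
  tableEnv.loadModule("geo", new GeoModule());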

2) In the current design, the module interface only defines function metadata,
but no implementations.
I'm wondering how the implementations are called/mapped at runtime? Am I
missing something?

Besides, I left some minor comments in the doc.

Best,
Jark


On Sat, 5 Oct 2019 at 08:42, Xuefu Z  wrote:

> I agree with Timo that the new table APIs need to be consistent. I'd go
> further: a name (or id) is needed for the module definition in the YAML file.
> In the current design, the name is skipped and the type has a double meaning.
>
> Thanks,
> Xuefu
>
> On Fri, Oct 4, 2019 at 5:24 AM Timo Walther  wrote:
>
> > Hi everyone,
> >
> > first, I was also questioning my proposal. But Bowen's proposal of
> > `tEnv.offloadToYaml()` would not work with the current design
> > because we don't know how to serialize a catalog or module into
> > properties. Currently, there is no converter from instance to
> > properties. It is a one way conversion. We can add a `toProperties`
> > method to both Catalog and Module class in the future to solve this.
> > Solving the table environment serializability can be future work.
> >
> > However, I find the current proposal for the TableEnvironment methods is
> > contradicting:
> >
> > tableEnv.loadModule(new Yyy());
> > tableEnv.unloadModule("Xxx");
> >
> > The loading is specified programmatically whereas the unloading requires
> > a string that is not specified in the module itself, but is defined in
> > the factory according to the current design.
> >
> > SQL does it more consistently. There, the name `xxx` is used when
> > loading and unloading the module:
> >
> > LOAD MODULE 'xxx' [WITH ('prop'='myProp', ...)]
> > UNLOAD MODULE 'xxx'
> >
> > How about:
> >
> > tableEnv.loadModule("xxx", new Yyy());
> > tableEnv.unloadModule("xxx");
> >
> > This would be similar to the catalog interfaces. The name is not part of
> > the instance itself.
> >
> > What do you think?
> >
> > Thanks,
> > Timo
> >
> >
> >
> >
> > On 01.10.19 21:17, Bowen Li wrote:
> > > If something like the yaml file is the way to go and achieve such
> > > motivation, we would cover that with current design.
> > >
> > > On Tue, Oct 1, 2019 at 12:05 Bowen Li  wrote:
> > >
> > >> Hi Timo, Dawid,
> > >>
> > >> I've added the suggested SQL and related changes to TableEnvironment
> API
> > >> and other classes to the google doc. Also removed "USE MODULE" and its
> > >> APIs. Will update FLIP wiki once we have a consensus.
> > >>
> > >> W.r.t. descriptor approach, my gut feeling is similar to Dawid's.
> > Besides,
> > >> I feel yaml file would be a better solution to persist serializable
> > state
> > >> of an environment as the file itself is in serializable format
> already.
> > >> Though yaml file only serves SQL CLI at this moment, we may be able to
> > >> extend its reach to Table API and allow users to load/offload a
> > >> TableEnvironment from/to yaml files, as something like
> "TableEnvironment
> > >> tEnv = TableEnvironment.loadFromYaml()" and
> > >> "tEnv.offloadToYaml()" to restore and persist state, and
> try
> > to
> > >> make yaml file more expressive.
> > >>
> > >>
> > >> On Tue, Oct 1, 2019 at 6:47 AM Dawid Wysakowicz <
> dwysakow...@apache.org
> > >
> > >> wrote:
> > >>
> > >>> Hi Timo, Bowen,
> > >>>
> > >>> Unfortunately I did not have enough time to go through all the
> > >>> suggestions in details so I can not comment on the whole FLIP.
> > >>>
> > >>> I just wanted to give my opinion on the "descriptor approach in
> > >>> loadModule" part. I am not sure if we need it here. We might be
> > >>> overthinking this a bit. It definitely makes sense for objects like
> > >>> TableSource/TableSink etc. as they are logical definitions that
> nearly
> > >>> always have to be persisted in a Catalog. I'm not sure if we really
> > need
> > >>> the same for a whole session. If we need a resume session feature,
> the
> > >>> way to go would probably be to keep the session in memory on the
> server
> > >>> side. I fear we will never be able to serialize the whole session
> > >>> entirely (temporary objects, objects derived from DataStream etc.)
> > >>>
> > >>> I think it is ok to use instances for objects like Catalogs or
> Modules
> > >>> and have an overlay on top of that that can create instances from
> > >>> properties.
> > >>>
> > >>> Best,
> > >>>
> > >>> Dawid
> > >>>
> > >>> On 01/10/2019 11:28, Timo Walther wrote:
> >  Hi Bowen,
> > 
> >  thanks for your response.
> > 
> >  Re 2) I also don't have a better approach for this issue. It is
> >  similar to changing the general TableConfig between two 

[jira] [Created] (FLINK-14340) Specify a unique DFSClient name for Hadoop FileSystem

2019-10-07 Thread Congxian Qiu(klion26) (Jira)
Congxian Qiu(klion26) created FLINK-14340:
-

 Summary: Specify a unique DFSClient name for Hadoop FileSystem
 Key: FLINK-14340
 URL: https://issues.apache.org/jira/browse/FLINK-14340
 Project: Flink
  Issue Type: Improvement
  Components: FileSystems
Reporter: Congxian Qiu(klion26)
 Fix For: 1.10.0


Currently, when Flink reads from or writes to HDFS, we do not set a DFSClient 
name for the connections, so we can't tell the connections apart and can't 
quickly find the specific Job or TM they belong to.

This issue proposes adding the {{container_id}} as a unique name when 
initializing the Hadoop FileSystem, so we can easily tell which Job/TM a 
connection belongs to.

 

The core change is to add a line like the following in 
{{org.apache.flink.runtime.fs.hdfs.HadoopFsFactory#create}}:

 
{code:java}
// Tag the HDFS connections created by this process with the container id,
// so a connection can be traced back to a specific Job/TM.
hadoopConfig.set("mapreduce.task.attempt.id", 
System.getenv().getOrDefault(CONTAINER_KEY_IN_ENV, DEFAULT_CONTAINER_ID));{code}
 





Re: [DISCUSS] FLIP-75: Flink Web UI Improvement Proposal

2019-10-07 Thread Xintong Song
2. Sounds good to me.

4. If that is the case, I would suggest making the default page size large,
so that in case of huge log data we have a few large pages rather than lots
of small ones.

Thank you~

Xintong Song



On Tue, Oct 8, 2019 at 11:03 AM Yadong Xie  wrote:

> Hi Xintong Song
>
> 2. We could switch between the detailed mode (including cpu, task heap,
> task off-heap, shuffle, on-heap managed, off-heap managed) and the summary
> mode (only including cpu and mem), which is very easy to do in UI design.
>
> 4. I think the key point is not pagination in the Web UI, but that the REST
> API will totally *break* without pagination in the current design.
> In my opinion, pagination is better than nothing; it is a solution to keep
> the log API working, and it would be great if there were another way to keep
> it working with huge log data.
>
> On Mon, Sep 30, 2019 at 7:19 PM, Xintong Song wrote:
>
> > @Yadong
> >
> > 2. I agree that we can update the task executor ui after flip-56 is done.
> > But I would suggest keeping it under discussion to come up with a proper ui
> > design for task executor resources. I don't think the mentioned image
> from
> > flip-56 is a good choice. That image is a simplified figure with cpu and
> > total memory only, for the purpose of demonstrating dynamic slot
> > allocation. In fact, there are 6 fields to be displayed (cpu, task heap,
> > task off-heap, shuffle, on-heap managed, off-heap managed). If we display
> > cpu and total memory only, then users will be confused when seeing a task
> > executor with enough remaining resources but tasks cannot be deployed
> onto
> > it (because the desired type of memory might be used up).
> >
> > 4. I've been using blink webui, which already has log pagination. It's
> > quite common that we need do search for some keywords (e.g., exception,
> > error, warning) from a large amount of logs for diagnosing problems. I
> find
> > it very inconvenient that I have to click into each page searching for
> the
> > keywords, and I'd rather take the effort to find the original log files
> > from the filesystem to view the log. Personally speaking, if the keyword
> > searching cannot be supported, I would prefer to take some time loading
> the
> > non-paginated logs rather than paginated ones. Or we may at least have a
> > button on the webui for switching between the two alternatives.
> >
> > @Till
> >
> > Thanks for the inputs.
> >
> > Thank you~
> >
> > Xintong Song
> >
> >
> >
> > On Mon, Sep 30, 2019 at 5:55 PM Till Rohrmann 
> > wrote:
> >
> > > For 3. At the moment the log and stdout file serving requires the
> > > TaskExecutor to be running. But in some scenarios when having a NFS, it
> > > should be enough to know where the file is located. However, this
> > > assumption does not hold in the general case.
> > >
> > > Cheers,
> > > Till
> > >
> > > On Mon, Sep 30, 2019 at 11:43 AM Yadong Xie 
> wrote:
> > >
> > > > Hi Xintong Song
> > > >
> > > > Thanks for your comments!
> > > >
> > > > 1. I think it is a good idea that to align CPU and memory usage with
> > > > FLIP-49 if it will release in version 1.10
> > > > 2. We can update the task executor UI design after FLIP-56 merged
> into
> > > > master. Actually, the image
> > > > <
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/download/attachments/125309297/BlinkResourceTM.png?version=1&modificationDate=1566223821000&api=v2
> > > > >
> > > > in FLIP-56 is a good UI design, we can follow it in the Flink web.
> > > > 3. No idea about it, maybe anyone familiar with the runtime part
> could
> > > > answer it? but it would be great to add it to the web UI in my
> opinion.
> > > > 4. I'm not sure whether keyword searching across all the pages may cost
> > too
> > > > many resources in job manager, but I think it would be very useful if
> > the
> > > > REST API could support it.
> > > >
> > > > Best,
> > > > Yadong
> > > >
> > > > On Sun, Sep 29, 2019 at 8:11 PM, Xintong Song wrote:
> > > >
> > > > > Thanks for drafting the FLIP and starting this discussion, Yadong.
> > > > >
> > > > >
> > > > > I have some comments:
> > > > >
> > > > >
> > > > >- I can see that the proposed memory and cpu usage to be
> displayed
> > > (in
> > > > >section 1.1) are aligned with the current ResourceProfile
> fields.
> > > > > However,
> > > > >we are working on changing the memory fields in 1.10 with
> FLIP-49
> > > > [1]. I
> > > > >suggest we align the UI design with the new FLIP-49 memory
> fields.
> > > > >- The task executor overview design (in section 1.2) is based on
> > the
> > > > >current slot model. The coming FLIP-56 [2] which is also planned
> > for
> > > > > 1.10
> > > > >is changing the model so that task executors no longer have
> fixed
> > > > > number of
> > > > >slots, but allocated slots (may have different resources) and
> > > > available
> > > > >resources.
> > > > >   - I can see that there's discussions in the google doc about
> > > using
> > > > >   different color for available resources. However, the
> resource

Re: [DISCUSS] FLIP-75: Flink Web UI Improvement Proposal

2019-10-07 Thread Yadong Xie
Hi Xintong Song

2. We could switch between the detailed mode (including cpu, task heap,
task off-heap, shuffle, on-heap managed, off-heap managed) and the summary
mode (only including cpu and mem), which is very easy to do in UI design.

4. I think the key point is not pagination in the Web UI, but that the REST
API will totally *break* without pagination in the current design.
In my opinion, pagination is better than nothing; it is a solution to keep
the log API working, and it would be great if there were another way to keep
it working with huge log data.

On Mon, Sep 30, 2019 at 7:19 PM, Xintong Song wrote:

> @Yadong
>
> 2. I agree that we can update the task executor ui after flip-56 is done.
> But I would suggest keeping it under discussion to come up with a proper ui
> design for task executor resources. I don't think the mentioned image from
> flip-56 is a good choice. That image is a simplified figure with cpu and
> total memory only, for the purpose of demonstrating dynamic slot
> allocation. In fact, there are 6 fields to be displayed (cpu, task heap,
> task off-heap, shuffle, on-heap managed, off-heap managed). If we display
> cpu and total memory only, then users will be confused when seeing a task
> executor with enough remaining resources but tasks cannot be deployed onto
> it (because the desired type of memory might be used up).
>
> 4. I've been using blink webui, which already has log pagination. It's
> quite common that we need do search for some keywords (e.g., exception,
> error, warning) from a large amount of logs for diagnosing problems. I find
> it very inconvenient that I have to click into each page searching for the
> keywords, and I'd rather take the effort to find the original log files
> from the filesystem to view the log. Personally speaking, if the keyword
> searching cannot be supported, I would prefer to take some time loading the
> non-paginated logs rather than paginated ones. Or we may at least have a
> button on the webui for switching between the two alternatives.
>
> @Till
>
> Thanks for the inputs.
>
> Thank you~
>
> Xintong Song
>
>
>
> On Mon, Sep 30, 2019 at 5:55 PM Till Rohrmann 
> wrote:
>
> > For 3. At the moment the log and stdout file serving requires the
> > TaskExecutor to be running. But in some scenarios when having a NFS, it
> > should be enough to know where the file is located. However, this
> > assumption does not hold in the general case.
> >
> > Cheers,
> > Till
> >
> > On Mon, Sep 30, 2019 at 11:43 AM Yadong Xie  wrote:
> >
> > > Hi Xintong Song
> > >
> > > Thanks for your comments!
> > >
> > > 1. I think it is a good idea that to align CPU and memory usage with
> > > FLIP-49 if it will release in version 1.10
> > > 2. We can update the task executor UI design after FLIP-56 merged into
> > > master. Actually, the image
> > > <
> > >
> >
> https://cwiki.apache.org/confluence/download/attachments/125309297/BlinkResourceTM.png?version=1&modificationDate=1566223821000&api=v2
> > > >
> > > in FLIP-56 is a good UI design, we can follow it in the Flink web.
> > > 3. No idea about it, maybe anyone familiar with the runtime part could
> > > answer it? but it would be great to add it to the web UI in my opinion.
> > > 4. I'm not sure whether keyword searching across all the pages may cost
> too
> > > many resources in job manager, but I think it would be very useful if
> the
> > > REST API could support it.
> > >
> > > Best,
> > > Yadong
> > >
> > > On Sun, Sep 29, 2019 at 8:11 PM, Xintong Song wrote:
> > >
> > > > Thanks for drafting the FLIP and starting this discussion, Yadong.
> > > >
> > > >
> > > > I have some comments:
> > > >
> > > >
> > > >- I can see that the proposed memory and cpu usage to be displayed
> > (in
> > > >section 1.1) are aligned with the current ResourceProfile fields.
> > > > However,
> > > >we are working on changing the memory fields in 1.10 with FLIP-49
> > > [1]. I
> > > >suggest we align the UI design with the new FLIP-49 memory fields.
> > > >- The task executor overview design (in section 1.2) is based on
> the
> > > >current slot model. The coming FLIP-56 [2] which is also planned
> for
> > > > 1.10
> > > >is changing the model so that task executors no longer have fixed
> > > > number of
> > > >slots, but allocated slots (may have different resources) and
> > > available
> > > >resources.
> > > >   - I can see that there's discussions in the google doc about
> > using
> > > >   different color for available resources. However, the resource
> > > > availability
> > > >   for different fields can be different, and may not be simply
> > > > displayed by a
> > > >   different color. E.g., a task executor may have two slot, while
> > > slot
> > > > 1
> > > >   takes (20% cpu, 10% heap mem, 50% managed mem, etc.), slot 2
> > takes
> > > > (10%
> > > >   cpu,  35% heap mem, 0% managed mem etc.), and the remaining
> > > > resources in
> > > >   the task executor are (70% cpu, 55% heap mem, 50% managed mem,
> > > > etc.). How
> > > >

[jira] [Created] (FLINK-14339) The checkpoint ID count wrong on restore savepoint log

2019-10-07 Thread king's uncle (Jira)
king's uncle created FLINK-14339:


 Summary: The checkpoint ID count wrong on restore savepoint log
 Key: FLINK-14339
 URL: https://issues.apache.org/jira/browse/FLINK-14339
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Checkpointing
Affects Versions: 1.8.0
Reporter: king's uncle


I saw the log below when testing Flink restore from a savepoint.
{code:java}
[flink-akka.actor.default-dispatcher-2] INFO 
org.apache.flink.runtime.checkpoint.ZooKeeperCompletedCheckpointStore - 
Recovering checkpoints from ZooKeeper.
[flink-akka.actor.default-dispatcher-2] INFO 
org.apache.flink.runtime.checkpoint.ZooKeeperCompletedCheckpointStore - Found 0 
checkpoints in ZooKeeper.
[flink-akka.actor.default-dispatcher-2] INFO 
org.apache.flink.runtime.checkpoint.ZooKeeperCompletedCheckpointStore - Trying 
to fetch 0 checkpoints from storage.
[flink-akka.actor.default-dispatcher-2] INFO 
org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Starting job 
 from savepoint 
/nfsdata/ecs/flink-savepoints/flink-savepoint-test//201910080158/savepoint-00-003c9b080832
 (allowing non restored state)
[flink-akka.actor.default-dispatcher-2] INFO 
org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Reset the 
checkpoint ID of job  to 12285.
[flink-akka.actor.default-dispatcher-2] INFO 
org.apache.flink.runtime.checkpoint.ZooKeeperCompletedCheckpointStore - 
Recovering checkpoints from ZooKeeper.
[flink-akka.actor.default-dispatcher-2] INFO 
org.apache.flink.runtime.checkpoint.ZooKeeperCompletedCheckpointStore - Found 1 
checkpoints in ZooKeeper.
[flink-akka.actor.default-dispatcher-2] INFO 
org.apache.flink.runtime.checkpoint.ZooKeeperCompletedCheckpointStore - Trying 
to fetch 1 checkpoints from storage.
[flink-akka.actor.default-dispatcher-2] INFO 
org.apache.flink.runtime.checkpoint.ZooKeeperCompletedCheckpointStore - Trying 
to retrieve checkpoint 12284.
[flink-akka.actor.default-dispatcher-2] INFO 
org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Restoring job 
 from latest valid checkpoint: Checkpoint 12284 
@ 0 for .
{code}
You can see that the checkpoint finally restored is 12284, but the log 
prints "Reset the checkpoint ID of job  to 
12285". So I checked the source code.
{code:java}
// Reset the checkpoint ID counter
long nextCheckpointId = savepoint.getCheckpointID() + 1;
checkpointIdCounter.setCount(nextCheckpointId);

LOG.info("Reset the checkpoint ID of job {} to {}.", job, nextCheckpointId);
{code}
I think it should print the restored checkpoint ID instead of the next checkpoint ID.
{code:java}
LOG.info("Reset the checkpoint ID of job {} to {}.", job, 
savepoint.getCheckpointID());
{code}





Re: [VOTE] Release 1.9.1, release candidate #1

2019-10-07 Thread Jark Wu
Hi Zili,

Thanks for reminding me of this. Because of the Chinese National Day and Flink
Forward Europe,
we didn't receive any verification on the 1.9.1 RC1, and I guess we have to
extend the voting time after Flink Forward.
So I'm fine with including FLINK-14315 and building another RC. What do you
think @Till @Jincheng?

I guess FLINK-14315 will be merged soon, as it was approved 4 days ago? Could
you help merge it once it has passed? @Zili Chen

Best,
Jark

On Tue, 8 Oct 2019 at 09:14, Zili Chen  wrote:

> Hi Jark,
>
> I notice a critical bug[1] is marked resolved in 1.9.1 but given 1.9.1
> has been cut I'd like to throw the issue here so that we're sure
> whether or not it is included in 1.9.1.
>
> Best,
> tison.
>
> [1] https://issues.apache.org/jira/browse/FLINK-14315
>
>
> On Mon, Sep 30, 2019 at 3:25 PM, Jark Wu wrote:
>
>>  Hi everyone,
>>
>> Please review and vote on the release candidate #1 for the version 1.9.1,
>> as follows:
>> [ ] +1, Approve the release
>> [ ] -1, Do not approve the release (please provide specific comments)
>>
>>
>> The complete staging area is available for your review, which includes:
>> * JIRA release notes [1],
>> * the official Apache source release and binary convenience releases to be
>> deployed to dist.apache.org [2], which are signed with the key with
>> fingerprint E2C45417BED5C104154F341085BACB5AEFAE3202 [3],
>> * all artifacts to be deployed to the Maven Central Repository [4],
>> * source code tag "release-1.9.1-rc1" [5],
>> * website pull request listing the new release and adding announcement
>> blog
>> post [6].
>>
>> The vote will be open for at least 72 hours.
>> Please cast your votes before *Oct. 3rd 2019, 08:00 UTC*.
>>
>> It is adopted by majority approval, with at least 3 PMC affirmative votes.
>>
>> Thanks,
>> Jark
>>
>> [1]
>>
>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522=12346003
>> [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.9.1-rc1/
>> [3] https://dist.apache.org/repos/dist/release/flink/KEYS
>> [4]
>> https://repository.apache.org/content/repositories/orgapacheflink-1272/
>> [5]
>>
>> https://github.com/apache/flink/commit/4d56de81cb692c68a7d1dbfff13087a5079a8252
>> [6] https://github.com/apache/flink-web/pull/274
>>
>


Re: [COMMITTER] repo locked due to synchronization issues

2019-10-07 Thread Jark Wu
Thanks Rong for updating the wiki pages. 

Best,
Jark

> On Oct 7, 2019, at 18:37, Till Rohrmann wrote:
> 
> Thanks for updating the wiki pages Rong. Looks good to me.
> 
> Cheers,
> Till
> 
> On Sun, Oct 6, 2019 at 7:21 PM Rong Rong  wrote:
> 
>> Thanks everyone for the suggestions. I think the majority of us voted for
>> using Github repo.
>> 
>> I have updated the merging pull request wiki page [1] with the new
>> instructions.
>> In addition, I have also added some general information I found useful when
>> setting up my committer account [2], including how to use Github.
>> 
>> Please kindly take a look. Any comments or suggestions are highly
>> appreciated!
>> 
>> --
>> Rong
>> 
>> [1]
>> https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests
>> [2]
>> 
>> https://cwiki.apache.org/confluence/display/FLINK/General+Information+for+Committers#GeneralInformationforCommitters-UsingGithub
>> 
>> On Mon, Sep 30, 2019 at 2:42 AM Till Rohrmann 
>> wrote:
>> 
>>> Thanks Rong for volunteering. From my side +1 for using Github repo.
>>> 
>>> Cheers,
>>> Till
>>> 
>>> On Fri, Sep 27, 2019 at 9:58 PM Thomas Weise  wrote:
>>> 
 +1 for recommendation to use the github repo
 
 Thanks,
 Thomas
 
 On Fri, Sep 27, 2019 at 9:29 AM Rong Rong  wrote:
 
> +1 to stating one recommended method in the wiki.
> I haven't encountered this often, so I do not have a preference
>>> regarding
> which way to go (Gitbox or Github). However, I've experienced some
> different issues on both Github and Gitbox when setting up new
>>> committer
> credentials.
> 
> If possible, I would like to volunteer to help updating the wiki page
 with
> recommendations once we decided which way to go, along with some
>> useful
> resources I found from other ASF project wikis.
> 
> Thanks,
> Rong
> 
> 
> On Wed, Sep 25, 2019 at 4:41 AM Jark Wu  wrote:
> 
>> +1 to sticking with Github repo.
>> I have encountered once that pushing to ASF repo is blocked in
>> China,
 but
>> GitHub always works for me and has better performance.
>> And this keeps the awesome green button for merging PRs from GitHub
>>> UI.
>> 
>> Best,
>> Jark
>> 
>> On Wed, 25 Sep 2019 at 15:57, Till Rohrmann 
> wrote:
>> 
>>> I think it makes sense to state a recommendation in the wiki.
>>> 
>>> If we decide on something I would be in favour of pushing to
>> Github
>> because
>>> setting credentials up (private keys and signing keys) can
>> happen
>>> in
> one
>>> place. Moreover, one can reuse these credentials to push to
> contributors
>> PR
>>> branches which is super helpful. Hence there is no need to
>> interact
> with
>>> two different systems (Gitbox and Github).
>>> 
>>> Cheers,
>>> Till
>>> 
>>> On Tue, Sep 24, 2019 at 8:03 PM Bowen Li 
 wrote:
>>> 
 Thanks everyone for sharing your practices! The problem seems
>> to
>>> be
>> gone
>>> as
 usual and I am able to push to ASF repo.
 
 Verified by ASF INFRA [1], it's indeed caused by the mixed push
> problem
 that Fabian said. Quote from INFRA, "This issue can sometimes
>>> occur
>> when
 people commit conflicting branches at the same time on gitbox
>> vs
>> github,
>>> so
 we recommend that projects stick with one or the other for
 commits."
 
 Though I'm alright with pushing to Github, can we have a
>> single,
>> standard
 way of pushing commits to ASF repo. Right now we don't seem to
>>> have
>> such
>>> a
 standard way according to wiki [2]. The standardization helps
>> to
 not
>> only
 avoid the issue mentioned above, but also eradicate problems
>>> where,
>> IIRC,
 some committers used to forget to reformat commit messages or
>> squash
> PR's
 commits when merging PRs from Github UI.
 
 That said, I wonder if we can get consensus on pushing
>>> commits
> only
>>> to
 ASF gitbox repo and disable committers' write access to the
>>> Github
>>> mirror?
 
 [1] https://issues.apache.org/jira/browse/INFRA-18992
 [2]
 
>> 
 
>> https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests
 
 On Tue, Sep 24, 2019 at 4:44 AM Hequn Cheng <
>>> chenghe...@gmail.com>
>>> wrote:
 
> I met the same problem. Pushing to the GitHub repo directly
>>> works
>> fine
 and
> it seems will resync the two repos.
> 
> Best, Hequn
> 
> On Tue, Sep 24, 2019 at 4:59 PM Fabian Hueske <
>>> fhue...@gmail.com
> 
>>> wrote:
> 
>> Maybe it's a mix of pushing to the ASF repository and
>> Github
>> mirrors?
>> I'm only pushing to the ASF repositories 

Re: Multiple Taskmanagers per node for standalone cluster

2019-10-07 Thread Xintong Song
I don't think using zookeeper should cause any problem with starting multiple
TMs per node.

For a standalone cluster, having one TM per node is usually the easiest way.
It is easy to configure (simply set the TM resources to the available
resources of the node), and it is more efficient (less TM framework
overhead, no isolation between JVMs of different TMs). But that doesn't
mean Flink cannot have multiple TMs per node, and given your requirement
of job isolation, I think you should do that. You just need to carefully
configure the TM number and resources to make sure they do not exceed the
node's available resources.
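
A minimal sketch of such a setup (hypothetical values; conf/slaves and
conf/flink-conf.yaml are the standard standalone-cluster config files):

  # conf/slaves -- listing a host twice starts two TaskManagers on it
  host-a
  host-a

  # conf/flink-conf.yaml -- one slot per TM for job isolation; size the TM
  # so that (TMs per node x TM memory) fits within the node
  taskmanager.numberOfTaskSlots: 1
  taskmanager.heap.size: 4096m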

I don't have much experience running standalone clusters with multiple TMs
per node. We mainly run Flink on Yarn in our production, with multiple TMs
(Yarn containers) per node (Yarn NM), and it works fine for us.

Thank you~

Xintong Song



On Mon, Oct 7, 2019 at 9:05 PM Ethan Li  wrote:

> Thank you very much Xintong for the information.
>
> I am currently using zookeeper. I guess starting another task manager will
> just work? But I am not sure if multiple TaskManagers per node is a good
> idea with Flink. Do you have any experience on this? Thanks!
>
> Best,
> Ethan
>
>
> On Oct 5, 2019, at 3:43 AM, Xintong Song  wrote:
>
> For having multiple task managers on the same host, you can put multiple
> duplicated lines with the target host in '/conf/slaves'.
> The task managers on the same host will share the same config files on the
> host.
>
> Thank you~
> Xintong Song
>
>
>
> On Sat, Oct 5, 2019 at 5:02 AM Ethan Li  wrote:
>
>> Hello,
>>
>> Does/did anyone try to set up a standalone cluster with multiple
>> TaskManagers per node? We are working on moving to flink-on-yarn solution.
>> But before that happens,  I am thinking about the following setup to  get
>> jobs isolated from each other
>>
>> 1) multiple taskmanager per host
>> 2) 1 taskSlot per TaskManager
>>
>>
>> Currently we have 1 TaskManager per node and many taskSlots per TM; tasks
>> from different jobs will be scheduled into one JVM process and it's
>> basically impossible to debug. One bad job will kill the whole cluster.
>>
>> Could you please share if you have any experience with this and what
>> problems there might be?
>>
>> Thank you very much. Really appreciate it.
>>
>> Best,
>> Ethan
>
>
>


[jira] [Created] (FLINK-14338) Upgrade Calcite version to 1.22 for Flink SQL

2019-10-07 Thread Danny Chen (Jira)
Danny Chen created FLINK-14338:
--

 Summary: Upgrade Calcite version to 1.22 for Flink SQL
 Key: FLINK-14338
 URL: https://issues.apache.org/jira/browse/FLINK-14338
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / API
Affects Versions: 1.10.0
Reporter: Danny Chen


Since FLINK-14092 has been resolved, we can now prepare the upgrade to Calcite 
1.22.

There is some fix-up code introduced by the 1.21 upgrade that can be cleaned up:
 * Move the Java files in the flink-sql-parser module package 
org.apache.calcite.sql to org.apache.flink.sql.parser.type, since CALCITE-3360 
has been resolved
 * Remove SqlFunction from flink-table-planner and flink-table-planner-blink, 
since CALCITE-3360 has been resolved





Re: [VOTE] Release 1.9.1, release candidate #1

2019-10-07 Thread Zili Chen
Hi Jark,

I notice a critical bug[1] is marked resolved in 1.9.1 but given 1.9.1
has been cut I'd like to throw the issue here so that we're sure
whether or not it is included in 1.9.1.

Best,
tison.

[1] https://issues.apache.org/jira/browse/FLINK-14315


On Mon, Sep 30, 2019 at 3:25 PM, Jark Wu wrote:

>  Hi everyone,
>
> Please review and vote on the release candidate #1 for the version 1.9.1,
> as follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
>
>
> The complete staging area is available for your review, which includes:
> * JIRA release notes [1],
> * the official Apache source release and binary convenience releases to be
> deployed to dist.apache.org [2], which are signed with the key with
> fingerprint E2C45417BED5C104154F341085BACB5AEFAE3202 [3],
> * all artifacts to be deployed to the Maven Central Repository [4],
> * source code tag "release-1.9.1-rc1" [5],
> * website pull request listing the new release and adding announcement blog
> post [6].
>
> The vote will be open for at least 72 hours.
> Please cast your votes before *Oct. 3rd 2019, 08:00 UTC*.
>
> It is adopted by majority approval, with at least 3 PMC affirmative votes.
>
> Thanks,
> Jark
>
> [1]
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522=12346003
> [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.9.1-rc1/
> [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> [4]
> https://repository.apache.org/content/repositories/orgapacheflink-1272/
> [5]
>
> https://github.com/apache/flink/commit/4d56de81cb692c68a7d1dbfff13087a5079a8252
> [6] https://github.com/apache/flink-web/pull/274
>


[jira] [Created] (FLINK-14337) HistoryServerTest.testHistoryServerIntegration failed on Travis

2019-10-07 Thread Till Rohrmann (Jira)
Till Rohrmann created FLINK-14337:
-

 Summary: HistoryServerTest.testHistoryServerIntegration failed on 
Travis
 Key: FLINK-14337
 URL: https://issues.apache.org/jira/browse/FLINK-14337
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Affects Versions: 1.9.0, 1.8.2, 1.10.0
Reporter: Till Rohrmann


The {{HistoryServerTest.testHistoryServerIntegration}} failed on Travis with

{code}
[ERROR] testHistoryServerIntegration[Flink version less than 1.4: 
false](org.apache.flink.runtime.webmonitor.history.HistoryServerTest)  Time 
elapsed: 10.667 s  <<< FAILURE!
java.lang.AssertionError: expected:<3> but was:<2>
{code}

https://api.travis-ci.org/v3/job/594533358/log.txt





[jira] [Created] (FLINK-14336) Exceptions in AsyncCheckpointRunnable#run are never logged

2019-10-07 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-14336:


 Summary: Exceptions in AsyncCheckpointRunnable#run are never logged
 Key: FLINK-14336
 URL: https://issues.apache.org/jira/browse/FLINK-14336
 Project: Flink
  Issue Type: Bug
  Components: API / DataStream
Affects Versions: 1.8.0
Reporter: Chesnay Schepler


Exceptions that are thrown in {{AsyncCheckpointRunnable#run}} and subsequently 
being handled in {{AsyncCheckpointRunnable#handleExecutionException}} are never 
logged on the TaskExecutor side, and are only forwarded to the JobMaster.
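
A sketch of the kind of fix this implies (hypothetical; only the class and 
method named above are from the report):
{code:java}
private void handleExecutionException(Exception e) {
    // Log locally before forwarding, so the failure is also visible in the
    // TaskExecutor logs and the stack trace is not lost.
    LOG.warn("Asynchronous part of the checkpoint could not be completed.", e);
    // ... existing forwarding of the failure to the JobMaster ...
}
{code}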





Re: [COMMITTER] repo locked due to synchronization issues

2019-10-07 Thread Till Rohrmann
Thanks for updating the wiki pages Rong. Looks good to me.

Cheers,
Till

On Sun, Oct 6, 2019 at 7:21 PM Rong Rong  wrote:

> Thanks everyone for the suggestions. I think the majority of us voted for
> using Github repo.
>
> I have updated the merging pull request wiki page [1] with the new
> instructions.
> In addition, I have also added some general information I found useful when
> setting up my committer account [2], including how to use Github.
>
> Please kindly take a look. Any comments or suggestions are highly
> appreciated!
>
> --
> Rong
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests
> [2]
>
> https://cwiki.apache.org/confluence/display/FLINK/General+Information+for+Committers#GeneralInformationforCommitters-UsingGithub
>
> On Mon, Sep 30, 2019 at 2:42 AM Till Rohrmann 
> wrote:
>
> > Thanks Rong for volunteering. From my side +1 for using Github repo.
> >
> > Cheers,
> > Till
> >
> > On Fri, Sep 27, 2019 at 9:58 PM Thomas Weise  wrote:
> >
> > > +1 for recommendation to use the github repo
> > >
> > > Thanks,
> > > Thomas
> > >
> > > On Fri, Sep 27, 2019 at 9:29 AM Rong Rong  wrote:
> > >
> > > > +1 to stating one recommended method in the wiki.
> > > > I haven't encountered this often, so I do not have a preference
> > regarding
> > > > which way to go (Gitbox or Github). However, I've experienced some
> > > > different issues on both Github and Gitbox when setting up new
> > committer
> > > > credentials.
> > > >
> > > > If possible, I would like to volunteer to help updating the wiki page
> > > with
> > > > recommendations once we decided which way to go, along with some
> useful
> > > > resources I found from other ASF project wikis.
> > > >
> > > > Thanks,
> > > > Rong
> > > >
> > > >
> > > > On Wed, Sep 25, 2019 at 4:41 AM Jark Wu  wrote:
> > > >
> > > > > +1 to sticking with Github repo.
> > > > > I have encountered once that pushing to ASF repo is blocked in
> China,
> > > but
> > > > > GitHub always works for me and has better performance.
> > > > > And this keeps the awesome green button for merging PRs from GitHub
> > UI.
> > > > >
> > > > > Best,
> > > > > Jark
> > > > >
> > > > > On Wed, 25 Sep 2019 at 15:57, Till Rohrmann 
> > > > wrote:
> > > > >
> > > > > > I think it makes sense to state a recommendation in the wiki.
> > > > > >
> > > > > > If we decide on something I would be in favour of pushing to
> Github
> > > > > because
> > > > > > setting credentials up (private keys and signing keys) can
> happen
> > in
> > > > one
> > > > > > place. Moreover, one can reuse these credentials to push to
> > > > contributors
> > > > > PR
> > > > > > branches which is super helpful. Hence there is no need to
> interact
> > > > with
> > > > > > two different systems (Gitbox and Github).
> > > > > >
> > > > > > Cheers,
> > > > > > Till
> > > > > >
> > > > > > On Tue, Sep 24, 2019 at 8:03 PM Bowen Li 
> > > wrote:
> > > > > >
> > > > > > > Thanks everyone for sharing your practices! The problem seems
> to
> > be
> > > > > gone
> > > > > > as
> > > > > > > usual and I am able to push to ASF repo.
> > > > > > >
> > > > > > > Verified by ASF INFRA [1], it's indeed caused by the mixed push
> > > > problem
> > > > > > > that Fabian said. Quote from INFRA, "This issue can sometimes
> > occur
> > > > > when
> > > > > > > people commit conflicting branches at the same time on gitbox
> vs
> > > > > github,
> > > > > > so
> > > > > > > we recommend that projects stick with one or the other for
> > > commits."
> > > > > > >
> > > > > > > Though I'm alright with pushing to Github, can we have a
> single,
> > > > > standard
> > > > > > > way of pushing commits to ASF repo. Right now we don't seem to
> > have
> > > > > such
> > > > > > a
> > > > > > > standard way according to wiki [2]. The standardization helps
> to
> > > not
> > > > > only
> > > > > > > avoid the issue mentioned above, but also eradicate problems
> > where,
> > > > > IIRC,
> > > > > > > some committers used to forget to reformat commit messages or
> squash
> > > > PR's
> > > > > > > commits when merging PRs from Github UI.
> > > > > > >
> > > > > > > That said, I wonder if we can get consensus on pushing
> > commits
> > > > only
> > > > > > to
> > > > > > > ASF gitbox repo and disable committers' write access to the
> > Github
> > > > > > mirror?
> > > > > > >
> > > > > > > [1] https://issues.apache.org/jira/browse/INFRA-18992
> > > > > > > [2]
> > > > > > >
> > > > >
> > >
> https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests
> > > > > > >
> > > > > > > On Tue, Sep 24, 2019 at 4:44 AM Hequn Cheng <
> > chenghe...@gmail.com>
> > > > > > wrote:
> > > > > > >
> > > > > > > > I met the same problem. Pushing to the GitHub repo directly
> > works
> > > > > fine
> > > > > > > and
> > > > > > > > it seems will resync the two repos.
> > > > > > > >
> > > > > > > > Best, Hequn
> > > > > > > >
> > > > > > > > On Tue, Sep 24, 2019 at 4:59 PM Fabian Hueske <
> > fhue...@gmail.com
> > 

Re: [CODE STYLE] Parameters of method are always final

2019-10-07 Thread Arvid Heise
Regarding Piotr's comment: IntelliJ does support adding/removing final on
parameters automatically.

You could even automatically add them on save with the Save Actions plugin
[1].

[1] https://plugins.jetbrains.com/plugin/7642-save-actions
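
To make the rule under discussion concrete, a small hypothetical example of
what checkstyle's ParameterAssignment would flag, and the usual rewrite:

  // Flagged by ParameterAssignment: the parameter is reassigned.
  int clamp(int value, int max) {
      if (value > max) {
          value = max;
      }
      return value;
  }

  // Compliant rewrite: the parameter stays untouched.
  int clampCompliant(int value, int max) {
      return Math.min(value, max);
  }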

On Mon, Oct 7, 2019 at 11:22 AM Piotr Nowojski  wrote:

> After checking out the check style modules mentioned by Tison, I really do
> not see a point of enforcing/adding `final`. With ParameterAssignment [1]
> it’s redundant and will cause problems that I mentioned before.
>
> Additionally enabling ParameterAssignment [1] would be probably much
> easier in our code base compared to enabling FinalParameters [2].
>
> Still, I'm not sure if that's worth the trouble. I would be in
> favour of doing it only if enabling ParameterAssignment [1] doesn't require
> many changes in our code base.
>
> Piotrek
>
> [1]
> https://checkstyle.sourceforge.io/config_coding.html#ParameterAssignment <
> https://checkstyle.sourceforge.io/config_coding.html#ParameterAssignment>
> [2] https://checkstyle.sourceforge.io/config_misc.html#FinalParameters <
> https://checkstyle.sourceforge.io/config_misc.html#FinalParameters>
>
> > On 7 Oct 2019, at 11:09, Dawid Wysakowicz 
> wrote:
> >
> > Hi all,
> >
> > My quick take on it.
> >
> > 1. If I were to introduce any rule on that I agree with Aljoscha, we
> > should rather enforce the `final` keyword rather than the opposite.
> >
> > 2. I think it does not make sense to enforce rules on such an
> > unimportant (in my opinion) topic. Generally I agree with Piotr that if
> > we want to introduce some rule we should have a way to automatically
> > enforce it.
> >
> > 3. I also agree with Piotr that problems with parameter reassignment
> > appear nearly exclusively in very long methods. If we break long
> > methods into shorter ones, then there is usually no problem with
> > parameter reassignment.
> >
> > 4. I would be ok with adding a point to our code style guidelines, but
> > honestly I am a bit skeptical. In the end we are not writing a book on
> > how to write a good software, but we rather list the most important
> > problems that we've seen so far in Flink code base and how to better
> > solve it. I'm not sure that's the case with the parameters reassignment.
> >
> > Best,
> >
> > Dawid
> >
> > On 07/10/2019 10:11, Zili Chen wrote:
> >> Thanks for your thoughts Arvid & Piotr,
> >>
> >> I checked out the effect of ParameterAssignment[1] and it does
> >> prevent code from modifying parameters, which I argued above is
> >> the most value introduced by the `final` modifier on a parameter.
> >>
> >> So firstly, I think it's good to enable ParameterAssignment in our
> >> codebase.
> >>
> >> Besides, there is no rule to forbid `final` modifier of parameter.
> >> Instead, there is a rule to enforce `final` modifier[2] but given [1]
> >> enabled it is actually redundant.
> >>
> >> The main purpose of reducing `final` modifiers on parameters, given that
> >> modifying parameters is already forbidden, is to remove the redundant
> >> modifier so that we don't have declarations like
> >>
> >> T fn(
> >>  final ArgumentType1 argument1,
> >>  final ArgumentType2 argument2,
> >>  ...)
> >>
> >> because we don't actually mark final everywhere, which might cause some
> >> confusion.
> >>
> >> Given that [1] is enforced, these `final` modifiers are indeed redundant,
> >> so we can add a convention to drop them, which is a net win.
> >>
> >> Best,
> >> tison.
> >>
> >> [1]
> https://checkstyle.sourceforge.io/config_coding.html#ParameterAssignment
> >> [2] https://checkstyle.sourceforge.io/config_misc.html#FinalParameters
> >>
> >>
> >>
> >> On Mon, Oct 7, 2019 at 3:49 PM, Piotr Nowojski wrote:
> >>
> >>> Hi,
> >>>
> >>> Couple of arguments to counter the proposal of making the `final`
> keyword
> >>> obligatory. Can we prepare a code style/IDE settings to add it
> >>> automatically? If not, I would be strongly against it, since:
> >>>
> >>> - Intellij’s automatic refactor actions will not work properly.
> >>> - I don’t think it’s a big deal. I don’t remember having any issues
> with
> >>> the lack or presence of the `final` keyword.
> >>> - `final` is pretty much useless in most of the cases (it’s not `const`
> >>> and it works only for the primitive types).
> >>> - I don’t like the extra overhead of doing something of very little
> extra
> >>> value. Especially the extra hassle of going back & forth during the
> reviews
> >>> (both as a contributor & as a reviewer).
> >>> - If missing `final` keyword caused some confusion, because
> surprisingly a
> >>> parameter was modified somewhere in the function and it wasn’t
> obviously
> >>> visible, the function is doing probably too many things and it’s too
> >>> long/too complicated…
> >>>
> >>> Generally speaking, I’m against adding minor things to our codestyle
> that
> >>> can not be enforced and added automatically.
> >>>
> >>> Piotrek
> >>>
>  On 7 Oct 2019, at 09:13, Arvid Heise  wrote:
> 
>  Hi guys,
> 
>  

Re: Testing DataStreams

2019-10-07 Thread Chesnay Schepler
Are you asking how the elements from the source are distributed across 
the subtasks of the next operation?


That differs a bit across operations; in this case (a map/sink after a 
source) AFAIK they are distributed in a round-robin manner.


On 07/10/2019 11:26, Dominik Wosiński wrote:

Thanks a lot for your answer here Chesnay. I have one more question:
since `fromElements` creates the stream with parallelism 1, and I
can see that the env created for my local machine has a default parallelism
of 12, I assume that the entries from the stream are propagated to the
first operators with some scheme? Or am I missing something?

Thanks in advance,
Best Regards,
Dom.

On Mon, Oct 7, 2019 at 11:08 AM, Chesnay Schepler wrote:


Are you referring to "ExampleIntegrationTest"? If so, then I'm afraid
this test is slightly misleading since the order isn't guaranteed in this
case.

1) As long as the parallelism of the sink is 1 the elements should arrive
in order.

2) The order is maintained if parallelism=1 since elements cannot overtake
each other in a single stream.

If the parallelism is increased by a subsequent operation O1, then the
individual subtasks of O1 will still see a sorted stream.
If an operation O2 after O1 has a lower parallelism than O1 then it will
not see a sorted stream, since the outputs of O1-subtasks may interleave at
will. This is the reason why the "ExampleIntegrationTest" is incorrect;
while the 2 sink instances receive a sorted input they are adding them into
a single collection, interleaving data.

This is fine:

env.fromElements(1L, 21L, 22L)
.map(x -> x * 2)
.setParallelism(2)
.addSink(...)  // sink call lost in the archive
.setParallelism(2);

This is not:

env.fromElements(1L, 21L, 22L)
.map(x -> x * 2)
.setParallelism(2)
.addSink(...)  // sink call lost in the archive
.setParallelism(1);

In other words, if you never reduce the parallelism your functions should
be fine.
If you have to reduce the parallelism then you must re-sort the stream (or,
realistically, window) somehow.
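
A common way to make such a test deterministic regardless of parallelism is
to sort the collected results before asserting. A minimal sketch (CollectSink
is the static-list sink from the documentation's ExampleIntegrationTest; the
test method itself is hypothetical; imports: org.junit.Test, java.util.*,
StreamExecutionEnvironment, and the static assertEquals):

  @Test
  public void mapDoublesValues() throws Exception {
      StreamExecutionEnvironment env =
          StreamExecutionEnvironment.getExecutionEnvironment();
      env.setParallelism(2);

      env.fromElements(1L, 21L, 22L)
         .map(x -> x * 2)
         .returns(Long.class)  // type hint for the lambda
         .addSink(new CollectSink());

      env.execute();

      // Sort both sides so the assertion no longer depends on arrival order.
      List<Long> results = new ArrayList<>(CollectSink.values);
      Collections.sort(results);
      assertEquals(Arrays.asList(2L, 42L, 44L), results);
  }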

On 02/10/2019 23:32, Dominik Wosiński wrote:

Hello,
I have a question, since I am observing quite weird behavior. In the
documentation[1] the example of FlinkMiniCluster usage shows that we can
expect the results to appear in the same order as they were injected into the
stream by use of *fromElements()*. I mean that the Java version of the code is
using assertEquals for lists, which will be true only if the ArrayLists have
the same elements in the same order. On the other hand, the Scala version of
this code uses a different matcher that only asserts that all elements are
actually present in the list.
So I have two questions here:

1) For the code below, can we be sure that the output will have the same
order as the input?
For some reason the code returns the elements in quite a random order in the
sink. I was actually sure that this is the expected behavior, but this piece
of documentation made me wonder.

val env = StreamExecutionEnvironment.getExecutionEnvironment

env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
env.fromElements("A", "B", "C", "D", "E", "F")
 .addSink(new TestSink)
env.execute()

class TestSink extends SinkFunction[String] {
   override def invoke(value: String): Unit =
synchronized{TestSink.vals.add(value)}
}

object TestSink {
   val vals = new ConcurrentLinkedQueue[String]()
}

 2) Is there a way to enforce that order is kept for an env with parallelism
= 1? If I want to test some function or set of functions that depend on
the order of the events, like for example detecting the beginning and the
end of a pattern, can I somehow ensure the order for testing
purposes?


Best Regards,
Dom.
[1]https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/testing.html







Re: Testing DataStreams

2019-10-07 Thread Dominik Wosiński
Thanks a lot for your answer here Chesnay. I have one more question:
since `fromElements` creates the stream with parallelism 1, and I
can see that the env created for my local machine has a default parallelism
of 12, I assume that the entries from the stream are propagated to the
first operators with some scheme? Or am I missing something?

Thanks in advance,
Best Regards,
Dom.

On Mon, Oct 7, 2019 at 11:08 AM, Chesnay Schepler wrote:

> Are you referring to "ExampleIntegrationTest"? If so, then I'm afraid
> this test is slightly misleading since the order isn't guaranteed in this
> case.
>
> 1) As long as the parallelism of the sink is 1 the elements should arrive
> in order.
>
> 2) The order is maintained if parallelism=1 since elements cannot overtake
> each other in a single stream.
>
> If the parallelism is increased by a subsequent operation O1, then the
> individual subtasks of O1 will still see a sorted stream.
> If an operation O2 after O1 has a lower parallelism than O1 then it will
> not see a sorted stream, since the outputs of O1-subtasks may interleave at
> will. This is the reason why the "ExampleIntegrationTest" is incorrect;
> while the 2 sink instances receive a sorted input they are adding them into
> a single collection, interleaving data.
>
> This is fine:
>
> env.fromElements(1L, 21L, 22L)
>    .map(x -> x * 2)
>    .setParallelism(2)
>    .addSink(...)  // sink call lost in the archive
>    .setParallelism(2);
>
> This is not:
>
> env.fromElements(1L, 21L, 22L)
>    .map(x -> x * 2)
>    .setParallelism(2)
>    .addSink(...)  // sink call lost in the archive
>    .setParallelism(1);
>
> In other words, if you never reduce the parallelism your functions should
> be fine.
> If you have to reduce the parallelism then you must re-sort the stream (or,
> realistically, window) somehow.
>
> On 02/10/2019 23:32, Dominik Wosiński wrote:
>
> Hello,
> I have a question, since I am observing quite weird behavior. In the
> documentation[1] the example of FlinkMiniCluster usage shows that we can
> expect the results to appear in the same order as they were injected into the
> stream by use of *fromElements()*. I mean that the Java version of the code is
> using assertEquals for lists, which will be true only if the ArrayLists have
> the same elements in the same order. On the other hand, the Scala version of
> this code uses a different matcher that only asserts that all elements are
> actually present in the list.
> So I have two questions here:
>
> 1) For the code below, can we be sure that the output will have the same
> order as the input?
> For some reason the code returns the elements in quite a random order in the
> sink. I was actually sure that this is the expected behavior, but this piece
> of documentation made me wonder.
>
> val env = StreamExecutionEnvironment.getExecutionEnvironment
>
> env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
> env.fromElements("A", "B", "C", "D", "E", "F")
> .addSink(new TestSink)
> env.execute()
>
> class TestSink extends SinkFunction[String] {
>   override def invoke(value: String): Unit =
> synchronized{TestSink.vals.add(value)}
> }
>
> object TestSink {
>   val vals = new ConcurrentLinkedQueue[String]()
> }
>
>  2) Is there a way to enforce that order is kept for an env with parallelism
> = 1? If I want to test some function or set of functions that depend on
> the order of the events, like for example detecting the beginning and the
> end of a pattern, can I somehow ensure the order for testing
> purposes?
>
>
> Best Regards,
> Dom.
> [1]https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/testing.html
>
>
>


Re: [CODE STYLE] Parameters of method are always final

2019-10-07 Thread Piotr Nowojski
After checking out the check style modules mentioned by Tison, I really do not 
see a point of enforcing/adding `final`. With ParameterAssignment [1] it’s 
redundant and will cause problems that I mentioned before.

Additionally enabling ParameterAssignment [1] would be probably much easier in 
our code base compared to enabling FinalParameters [2].

Still, I'm not sure if that's worth the trouble. I would be in favour 
of doing it only if enabling ParameterAssignment [1] doesn't require many 
changes in our code base.

Piotrek

[1] https://checkstyle.sourceforge.io/config_coding.html#ParameterAssignment 

[2] https://checkstyle.sourceforge.io/config_misc.html#FinalParameters 


> On 7 Oct 2019, at 11:09, Dawid Wysakowicz  wrote:
> 
> Hi all,
> 
> My quick take on it.
> 
> 1. If I were to introduce any rule on that I agree with Aljoscha, we
> should rather enforce the `final` keyword rather than the opposite.
> 
> 2. I think it does not make sense to enforce rules on such an
> unimportant (in my opinion) topic. Generally I agree with Piotr that if
> we want to introduce some rule we should have a way to automatically
> enforce it.
> 
> 3. I also agree with Piotr that problems with parameter reassignment
> appear nearly exclusively in very long methods. If we break long
> methods into shorter ones, then there is usually no problem with
> parameter reassignment.
> 
> 4. I would be ok with adding a point to our code style guidelines, but
> honestly I am a bit skeptical. In the end we are not writing a book on
> how to write good software; rather, we list the most important
> problems that we've seen so far in the Flink code base and how to
> solve them better. I'm not sure that's the case with parameter reassignment.
> 
> Best,
> 
> Dawid
> 
> On 07/10/2019 10:11, Zili Chen wrote:
>> Thanks for your thoughts Arvid & Piotr,
>> 
>> I checked out the effect of ParameterAssignment [1] and it does
>> prevent code from modifying parameters, which I argued above is
>> the main value introduced by the `final` modifier on a parameter.
>> 
>> So firstly, I think it's good to enable ParameterAssignment in our
>> codebase.
>> 
>> Besides, there is no rule to forbid `final` modifier of parameter.
>> Instead, there is a rule to enforce `final` modifier[2] but given [1]
>> enabled it is actually redundant.
>> 
>> The main purpose of reducing `final` modifiers on parameters, given that
>> modifying parameters is already forbidden, is to remove the redundant
>> modifier so that we don't have declarations like
>> 
>> T fn(
>>  final ArgumentType1 argument1,
>>  final ArgumentType2 argument2,
>>  ...)
>> 
>> because we actually don't mark final everywhere, so it might cause some
>> confusion.
>> 
>> Given that [1] is enforced, these `final` modifiers are indeed redundant, so
>> we can add a convention to drop `final` modifiers from parameters, which is
>> a net win.
>> 
>> Best,
>> tison.
>> 
>> [1] https://checkstyle.sourceforge.io/config_coding.html#ParameterAssignment
>> [2] https://checkstyle.sourceforge.io/config_misc.html#FinalParameters
>> 
>> 
>> 
>> Piotr Nowojski  于2019年10月7日周一 下午3:49写道:
>> 
>>> Hi,
>>> 
>>> A couple of arguments to counter the proposal of making the `final` keyword
>>> obligatory. Can we prepare code style/IDE settings to add it
>>> automatically? If not, I would be strongly against it, since:
>>> 
>>> - IntelliJ’s automatic refactor actions will not work properly.
>>> - I don’t think it’s a big deal. I don’t remember having any issues with
>>> the lack or presence of the `final` keyword.
>>> - `final` is pretty much useless in most of the cases (it’s not `const`
>>> and it works only for the primitive types).
>>> - I don’t like the extra overhead of doing something of very little extra
>>> value. Especially the extra hassle of going back & forth during the reviews
>>> (both as a contributor & as a reviewer).
>>> - If missing `final` keyword caused some confusion, because surprisingly a
>>> parameter was modified somewhere in the function and it wasn’t obviously
>>> visible, the function is doing probably too many things and it’s too
>>> long/too complicated…
>>> 
>>> Generally speaking, I’m against adding minor things to our codestyle that
>>> cannot be enforced and added automatically.
>>> 
>>> Piotrek
>>> 
 On 7 Oct 2019, at 09:13, Arvid Heise  wrote:
 
 Hi guys,
 
 I'm a bit torn. In general, +1 for making parameters effectively final.
 
 The question is how to enforce it. We can make it explicit (like Aljoscha
 said). All IDEs will immediately show warnings/errors for violations. It
 would allow us to migrate the code gradually.
 
 Another option is to use a checkstyle rule [1]. Then, we could omit the
 final keyword and rely on checkstyle checks as we do for quite a few
>>> other
 things. A hard checkstyle rule would probably fail on a 

Re: Testing DataStreams

2019-10-07 Thread Chesnay Schepler

I've filed FLINK-14335 for fixing the Java example.

On 07/10/2019 11:08, Chesnay Schepler wrote:
Are you referring to "ExampleIntegrationTest"? If so, then I'm afraid 
this test is slightly misleading since the order isn't guaranteed in 
this case.


1) As long as the parallelism of the sink is 1 the elements should 
arrive in order.


2) The order is maintained if parallelism=1 since elements cannot 
overtake each other in a single stream.


If the parallelism is increased by a subsequent operation O1, then the 
individual subtasks of O1 will still see a sorted stream.
If an operation O2 after O1 has a lower parallelism than O1 then it 
will not see a sorted stream, since the outputs of O1-subtasks may 
interleave at will. This is the reason why the 
"ExampleIntegrationTest" is incorrect; while the 2 sink instances 
receive a sorted input they are adding them into a single collection, 
interleaving data.


This is fine:

env.fromElements(1L, 21L, 22L)
   .map(x -> x * 2)
   .setParallelism(2)
   .addSink(new CollectSink())
   .setParallelism(2);

This is not:

env.fromElements(1L, 21L, 22L)
   .map(x -> x * 2)
   .setParallelism(2)
   .addSink(new CollectSink())
   .setParallelism(1);

In other words, if you never reduce the parallelism your functions 
should be fine.
If you have to reduce the parallelism then you must re-sort the stream 
(or realistically, window) somehow.


On 02/10/2019 23:32, Dominik Wosiński wrote:

Hello,
I have a question, since I am observing quite weird behavior. In the
documentation[1] the example of FlinkMiniCluster usage shows that we can
expect the results to appear in the same order as they were injected into
the stream by use of *fromElements()*. I mean that the Java version of the
code uses assertEquals on a list, which will be true only if the ArrayLists
have the same elements in the same order. On the other hand, the Scala
version of this code uses a different matcher that only asserts that all
elements are actually present in the list.
So I have two questions here:

1) For the code below, can we be sure that the output will have the same
order as the input?
For some reason the code returns the elements in quite random order in
the sink. I was actually sure that this is the expected behavior, but
this piece of documentation made me wonder.

val env = StreamExecutionEnvironment.getExecutionEnvironment

env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
env.fromElements("A", "B", "C", "D", "E", "F")
 .addSink(new TestSink)
env.execute()

class TestSink extends SinkFunction[String] {
   override def invoke(value: String): Unit =
    synchronized { TestSink.vals.add(value) }
}

object TestSink {
   val vals = new ConcurrentLinkedQueue[String]()
}

  2) Is there a reason order is enforced for an env with parallelism = 1?
If I want to test some function or set of functions that depend on the
order of the events, like for example detecting the beginning and the end
of a pattern, can I somehow ensure the order for testing purposes?


Best Regards,
Dom.
[1]
https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/testing.html 










[jira] [Created] (FLINK-14335) Java version of ExampleIntegrationTest in testing docs is incorrect

2019-10-07 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-14335:


 Summary: Java version of ExampleIntegrationTest in testing docs is 
incorrect
 Key: FLINK-14335
 URL: https://issues.apache.org/jira/browse/FLINK-14335
 Project: Flink
  Issue Type: Task
  Components: Documentation, Tests
Affects Versions: 1.9.0
Reporter: Chesnay Schepler
 Fix For: 1.10.0, 1.9.1


The Java version of the ExampleIntegrationTest is incorrect since it assumes 
elements to arrive in the sink in order, but this isn't guaranteed since there 
are 2 sink subtasks mutating a shared collection.

The Scala example was modified correctly; it checks that elements are contained 
without verifying the order.

{code}
public class ExampleIntegrationTest {
...

// configure your test environment
env.setParallelism(2);

...

// create a stream of custom elements and apply transformations
env.fromElements(1L, 21L, 22L)
.map(new IncrementMapFunction())
.addSink(new CollectSink());

// execute
env.execute();

// verify your results
assertEquals(Lists.newArrayList(2L, 42L, 44L), CollectSink.values);
}

// create a testing sink
private static class CollectSink implements SinkFunction<Long> {

// must be static
public static final List<Long> values = new ArrayList<>();

@Override
public synchronized void invoke(Long value) throws Exception {
values.add(value);
}
}
}
{code}
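
A sketch of one possible fix is to make the assertion order-insensitive, for
example by sorting the collected values first (uses java.util.Collections; an
order-insensitive matcher such as Hamcrest's containsInAnyOrder would work
equally well):

{code}
// verify your results without relying on arrival order
List<Long> actual = new ArrayList<>(CollectSink.values);
Collections.sort(actual);
assertEquals(Lists.newArrayList(2L, 42L, 44L), actual);
{code}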



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [CODE STYLE] Parameters of method are always final

2019-10-07 Thread Dawid Wysakowicz
Hi all,

My quick take on it.

1. If we were to introduce any rule on this, I agree with Aljoscha: we
should rather enforce the `final` keyword than the opposite.

2. I think it does not make sense to enforce rules on such an
unimportant (in my opinion) topic. Generally I agree with Piotr that if
we want to introduce some rule, we should have a way to enforce it
automatically.

3. I also agree with Piotr that problems with parameter reassignment
appear nearly exclusively in very long methods. If we break long
methods into shorter ones, there is usually no problem with
parameter reassignment.

4. I would be ok with adding a point to our code style guidelines, but
honestly I am a bit skeptical. In the end we are not writing a book on
how to write good software; rather, we list the most important
problems that we've seen so far in the Flink code base and how to
solve them better. I'm not sure that's the case with parameter reassignment.

Best,

Dawid

On 07/10/2019 10:11, Zili Chen wrote:
> Thanks for your thoughts Arvid & Piotr,
>
> I checked out the effect of ParameterAssignment [1] and it does
> prevent code from modifying parameters, which I argued above is
> the main value introduced by the `final` modifier on a parameter.
>
> So firstly, I think it's good to enable ParameterAssignment in our
> codebase.
>
> Besides, there is no rule to forbid `final` modifier of parameter.
> Instead, there is a rule to enforce `final` modifier[2] but given [1]
> enabled it is actually redundant.
>
> The main purpose of reducing `final` modifiers on parameters, given that
> modifying parameters is already forbidden, is to remove the redundant
> modifier so that we don't have declarations like
>
> T fn(
>   final ArgumentType1 argument1,
>   final ArgumentType2 argument2,
>   ...)
>
> because we actually don't mark final everywhere, so it might cause some
> confusion.
>
> Given that [1] is enforced, these `final` modifiers are indeed redundant, so
> we can add a convention to drop `final` modifiers from parameters, which is
> a net win.
>
> Best,
> tison.
>
> [1] https://checkstyle.sourceforge.io/config_coding.html#ParameterAssignment
> [2] https://checkstyle.sourceforge.io/config_misc.html#FinalParameters
>
>
>
> Piotr Nowojski  于2019年10月7日周一 下午3:49写道:
>
>> Hi,
>>
>> A couple of arguments to counter the proposal of making the `final` keyword
>> obligatory. Can we prepare code style/IDE settings to add it
>> automatically? If not, I would be strongly against it, since:
>>
>> - IntelliJ’s automatic refactor actions will not work properly.
>> - I don’t think it’s a big deal. I don’t remember having any issues with
>> the lack or presence of the `final` keyword.
>> - `final` is pretty much useless in most of the cases (it’s not `const`
>> and it works only for the primitive types).
>> - I don’t like the extra overhead of doing something of very little extra
>> value. Especially the extra hassle of going back & forth during the reviews
>> (both as a contributor & as a reviewer).
>> - If missing `final` keyword caused some confusion, because surprisingly a
>> parameter was modified somewhere in the function and it wasn’t obviously
>> visible, the function is doing probably too many things and it’s too
>> long/too complicated…
>>
>> Generally speaking, I’m against adding minor things to our codestyle that
>> cannot be enforced and added automatically.
>>
>> Piotrek
>>
>>> On 7 Oct 2019, at 09:13, Arvid Heise  wrote:
>>>
>>> Hi guys,
>>>
>>> I'm a bit torn. In general, +1 for making parameters effectively final.
>>>
>>> The question is how to enforce it. We can make it explicit (like Aljoscha
>>> said). All IDEs will immediately show warnings/errors for violations. It
>>> would allow us to migrate the code gradually.
>>>
>>> Another option is to use a checkstyle rule [1]. Then, we could omit the
>>> final keyword and rely on checkstyle checks as we do for quite a few
>> other
>>> things. A hard checkstyle rule would probably fail on a good portion of
>> the
>>> current code base. But we would also remove reassignment to parameters
>>> (which I consider an anti-pattern).
>>>
>>> If we opt not to enforce it, then -1 for removing final keywords from my
>>> side.
>>>
>>> Best,
>>>
>>> Arvid
>>>
>>> [1]
>>>
>> https://checkstyle.sourceforge.io/apidocs/com/puppycrawl/tools/checkstyle/checks/coding/ParameterAssignmentCheck.html
>>> On Fri, Oct 4, 2019 at 1:22 PM Zili Chen  wrote:
>>>
 Hi Aljoscha,

 I totally agree with you on field topic. Of course it makes significant
 difference whether
 or not a field is final and IDE/compiler can help on checking.

 Here are several thoughts about final modifier on parameters and why I
 propose this one:

 1. parameters should be always final

 As described above, there is no reason a parameter to be non-final. So
 different with field,
 a field can be final or non-final based on whether or not it is
>> immutable.
 Thus with such a
 code style guide in our community, 

Re: Testing DataStreams

2019-10-07 Thread Chesnay Schepler
Are you referring to "ExampleIntegrationTest"? If so, then I'm afraid 
this test is slightly misleading since the order isn't guaranteed in 
this case.


1) As long as the parallelism of the sink is 1 the elements should 
arrive in order.


2) The order is maintained if parallelism=1 since elements cannot 
overtake each other in a single stream.


If the parallelism is increased by a subsequent operation O1, then the 
individual subtasks of O1 will still see a sorted stream.
If an operation O2 after O1 has a lower parallelism than O1 then it will 
not see a sorted stream, since the outputs of O1-subtasks may interleave 
at will. This is the reason why the "ExampleIntegrationTest" is 
incorrect; while the 2 sink instances receive a sorted input they are 
adding them into a single collection, interleaving data.


This is fine:

env.fromElements(1L, 21L, 22L)
   .map(x -> x * 2)
   .setParallelism(2)
   .addSink(new CollectSink())
   .setParallelism(2);

This is not:

env.fromElements(1L, 21L, 22L)
   .map(x -> x * 2)
   .setParallelism(2)
   .addSink(new CollectSink())
   .setParallelism(1);

In other words, if you never reduce the parallelism your functions 
should be fine.
If you have to reduce the parallelism then you must re-sort the stream 
(or realistically, window) somehow.
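
For the testing question quoted below, the simplest way to get a
deterministic order is to keep the whole pipeline at parallelism 1. A sketch
(reusing the CollectSink from the documentation example):

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    env.setParallelism(1); // no operator exceeds 1, so order is preserved end-to-end

    env.fromElements(1L, 21L, 22L)
       .map(x -> x * 2)
       .addSink(new CollectSink());
    env.execute();

    // an order-sensitive assertion is now safe
    assertEquals(Lists.newArrayList(2L, 42L, 44L), CollectSink.values);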


On 02/10/2019 23:32, Dominik Wosiński wrote:

Hello,
I have a question, since I am observing quite weird behavior. In the
documentation[1] the example of FlinkMiniCluster usage shows that we can
expect the results to appear in the same order as they were injected into the
stream by use of *fromElements()*. I mean that the Java version of the code
uses assertEquals on a list, which will be true only if the ArrayLists have the
same elements in the same order. On the other hand, the Scala version of
this code uses a different matcher that only asserts that all elements are
actually present in the list.
So I have two questions here:

1) For the code below, can we be sure that the output will have the same
order as the input?
For some reason the code returns the elements in quite random order in the
sink. I was actually sure that this is the expected behavior, but this piece
of documentation made me wonder.

val env = StreamExecutionEnvironment.getExecutionEnvironment

env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
env.fromElements("A", "B", "C", "D", "E", "F")
 .addSink(new TestSink)
env.execute()

class TestSink extends SinkFunction[String] {
   override def invoke(value: String): Unit =
    synchronized { TestSink.vals.add(value) }
}

object TestSink {
   val vals = new ConcurrentLinkedQueue[String]()
}

  2) Is there a reason order is enforced for an env with parallelism
= 1? If I want to test some function or set of functions that depend on
the order of the events, like for example detecting the beginning and the
end of a pattern, can I somehow ensure the order for testing
purposes?


Best Regards,
Dom.
[1]
https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/testing.html





[jira] [Created] (FLINK-14334) ElasticSearch docs refer to non-existent ExceptionUtils.containsThrowable

2019-10-07 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-14334:


 Summary: ElasticSearch docs refer to non-existent 
ExceptionUtils.containsThrowable
 Key: FLINK-14334
 URL: https://issues.apache.org/jira/browse/FLINK-14334
 Project: Flink
  Issue Type: Task
  Components: Connectors / ElasticSearch, Documentation
Affects Versions: 1.8.0
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.10.0, 1.9.1, 1.8.3






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [CODE STYLE] Parameters of method are always final

2019-10-07 Thread Zili Chen
Thanks for your thoughts Arvid & Piotr,

I checked out the effect of ParameterAssignment [1] and it does
prevent code from modifying parameters, which I argued above is
the main value introduced by the `final` modifier on a parameter.

So firstly, I think it's good to enable ParameterAssignment in our
codebase.

Besides, there is no rule to forbid `final` modifier of parameter.
Instead, there is a rule to enforce `final` modifier[2] but given [1]
enabled it is actually redundant.

The main purpose of reducing `final` modifiers on parameters, given that
modifying parameters is already forbidden, is to remove the redundant
modifier so that we don't have declarations like

T fn(
  final ArgumentType1 argument1,
  final ArgumentType2 argument2,
  ...)

because we actually don't mark final everywhere, so it might cause some
confusion.

Given that [1] is enforced, these `final` modifiers are indeed redundant, so
we can add a convention to drop `final` modifiers from parameters, which is
a net win.

Best,
tison.

[1] https://checkstyle.sourceforge.io/config_coding.html#ParameterAssignment
[2] https://checkstyle.sourceforge.io/config_misc.html#FinalParameters



Piotr Nowojski  于2019年10月7日周一 下午3:49写道:

> Hi,
>
> A couple of arguments to counter the proposal of making the `final` keyword
> obligatory. Can we prepare code style/IDE settings to add it
> automatically? If not, I would be strongly against it, since:
>
> - IntelliJ’s automatic refactor actions will not work properly.
> - I don’t think it’s a big deal. I don’t remember having any issues with
> the lack or presence of the `final` keyword.
> - `final` is pretty much useless in most of the cases (it’s not `const`
> and it works only for the primitive types).
> - I don’t like the extra overhead of doing something of very little extra
> value. Especially the extra hassle of going back & forth during the reviews
> (both as a contributor & as a reviewer).
> - If missing `final` keyword caused some confusion, because surprisingly a
> parameter was modified somewhere in the function and it wasn’t obviously
> visible, the function is doing probably too many things and it’s too
> long/too complicated…
>
> Generally speaking, I’m against adding minor things to our codestyle that
> cannot be enforced and added automatically.
>
> Piotrek
>
> > On 7 Oct 2019, at 09:13, Arvid Heise  wrote:
> >
> > Hi guys,
> >
> > I'm a bit torn. In general, +1 for making parameters effectively final.
> >
> > The question is how to enforce it. We can make it explicit (like Aljoscha
> > said). All IDEs will immediately show warnings/errors for violations. It
> > would allow us to migrate the code gradually.
> >
> > Another option is to use a checkstyle rule [1]. Then, we could omit the
> > final keyword and rely on checkstyle checks as we do for quite a few
> other
> > things. A hard checkstyle rule would probably fail on a good portion of
> the
> > current code base. But we would also remove reassignment to parameters
> > (which I consider an anti-pattern).
> >
> > If we opt not to enforce it, then -1 for removing final keywords from my
> > side.
> >
> > Best,
> >
> > Arvid
> >
> > [1]
> >
> https://checkstyle.sourceforge.io/apidocs/com/puppycrawl/tools/checkstyle/checks/coding/ParameterAssignmentCheck.html
> >
> > On Fri, Oct 4, 2019 at 1:22 PM Zili Chen  wrote:
> >
> >> Hi Aljoscha,
> >>
> >> I totally agree with you on field topic. Of course it makes significant
> >> difference whether
> >> or not a field is final and IDE/compiler can help on checking.
> >>
> >> Here are several thoughts about final modifier on parameters and why I
> >> propose this one:
> >>
> >> 1. parameters should be always final
> >>
> >> As described above, there is no reason a parameter to be non-final. So
> >> different with field,
> >> a field can be final or non-final based on whether or not it is
> immutable.
> >> Thus with such a
> >> code style guide in our community, we can work towards a codebase every
> >> parameter is
> >> effectively final.
> >>
> >> 2. parameter final cannot be inherited
> >>
> >> Unfortunately Java doesn't keep final modifier of method parameter when
> >> inheritance. So
> >> even you mark a parameter as final in an interface or super class, you
> have
> >> to re-mark it
> >> as final in its implementation or subclass. From another perspective,
> final
> >> modifier of
> >> parameter is a local attribute of parameter so that we can narrow
> possible
> >> effect during
> >> review.
> >>
> >> 3. IDE can lint difference between effective final and mutable parameter
> >>
> >> It is true that IDE such as Intellij IDEA can lint difference between
> >> effective final and
> >> mutable parameter(with underline). So that with this code style what we
> >> lose is that
> >> we cannot get a compile time error if someone modifies parameter in the
> >> method body.
> >> But as mentioned in 1, by no means we allow a parameter to be modified.
> If
> >> we agree
> >> on this statement, then we hopefully converge in a codebase that no
> >> 

[VOTE] FLIP-74: Flink JobClient API

2019-10-07 Thread Zili Chen
Hi all,

I would like to start the vote for FLIP-74[1], which is discussed and
reached a consensus in the discussion thread[2].

The vote will be open until Oct. 9th (72h starting on Oct. 7th), unless
there is an objection or not enough votes.

Best,
tison.

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-74%3A+Flink+JobClient+API
[2]
https://lists.apache.org/x/thread.html/b2e22a45aeb94a8d06b50c4de078f7b23d9ff08b8226918a1a903768@%3Cdev.flink.apache.org%3E


Re: [VOTE] FLIP-57: Rework FunctionCatalog, latest updated

2019-10-07 Thread Dawid Wysakowicz
+1 for the FLIP.

Best,

Dawid

On 07/10/2019 08:45, Bowen Li wrote:
> Hi all,
>
> I'd like to start a new voting thread for FLIP-57 [1] on its latest status,
> superseding the earlier vote in [2], as we've reached consensus in [2] and [3].
>
> This vote will be open for a minimum of 3 days, till 6:45am UTC, Oct 10.
>
> Thanks,
> Bowen
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-57%3A+Rework+FunctionCatalog
> [2] https://www.mail-archive.com/dev@flink.apache.org/msg30180.html
> [3]
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-57-Rework-FunctionCatalog-td32291.html#a32613
>





Re: [DISCUSS] FLIP-76: Unaligned checkpoints

2019-10-07 Thread Piotr Nowojski
Hi Arvid,

Thanks for coming up with this FLIP. I think it addresses the issues raised in 
the previous mailing list discussion [2]. 

For the record: +1 from my side to implement this.

Piotrek

> On 30 Sep 2019, at 14:31, Arvid Heise  wrote:
> 
> Hi Devs,
> 
> I would like to start the formal discussion about FLIP-76 [1], which
> improves the checkpoint latency in systems under backpressure, where a
> checkpoint can take hours to complete in the worst case. I recommend the
> thread "checkpointing under backpressure" [2] to get a good idea why users
> are not satisfied with the current behavior. The key points:
> 
>   - Since the checkpoint barrier flows much slower through the
>   back-pressured channels, the other channels and their upstream operators
>   are effectively blocked during checkpointing.
>   - The checkpoint barrier takes a long time to reach the sinks causing
>   long checkpointing times. A longer checkpointing time in turn means that
>   the checkpoint will be fairly outdated once done. Since a heavily utilized
>   pipeline is inherently more fragile, we may run into a vicious cycle of
>   late checkpoints, crash, recovery to a rather outdated checkpoint, more
>   back pressure, and even later checkpoints, which would result in little to
>   no progress in the application.
> 
> The FLIP proposes "unaligned checkpoints" which improves the current state,
> such that
> 
>   - Upstream processes can continue to produce data, even if some operator
>   still waits on a checkpoint barrier on a specific input channel.
>   - Checkpointing times are heavily reduced across the execution graph,
>   even for operators with a single input channel.
>   - End-users will see more progress even in unstable environments as more
>   up-to-date checkpoints will avoid too many recomputations.
>   - Facilitate faster rescaling.
> 
> The key idea is to allow checkpoint barriers to be forwarded to downstream
> tasks before the synchronous part of the checkpointing has been conducted
> (see Fig. 1). To that end, we need to store in-flight data as part of the
> checkpoint as described in greater details in this FLIP.
> 
> Although the basic idea was already sketched in [2], we would like to get
> broader feedback in this dedicated mail thread.
> 
> Best,
> 
> Arvid
> 
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-76%3A+Unaligned+Checkpoints
> [2]
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Checkpointing-under-backpressure-td31616.html
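
As a rough mental model of the key idea (illustrative pseudocode only, with
hypothetical method names, not the proposed implementation):

    // Aligned checkpointing (today): block each channel once its barrier
    // arrives, wait for the barrier on *all* input channels, then snapshot
    // and forward the barrier.
    void onBarrierAligned(Channel ch, Barrier b) {
        block(ch);
        if (allChannelsDeliveredBarrier(b)) {
            snapshotOperatorState();
            forwardBarrierDownstream(b);
            unblockAllChannels();
        }
    }

    // Unaligned checkpointing (FLIP-76): forward the barrier immediately and
    // persist the in-flight buffers it overtakes as part of the checkpoint.
    void onBarrierUnaligned(Channel ch, Barrier b) {
        if (isFirstBarrierOfCheckpoint(b)) {
            forwardBarrierDownstream(b);   // no waiting on the other channels
            snapshotOperatorState();
            persistInFlightBuffers(b);     // buffered data becomes checkpoint state
        }
    }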



[jira] [Created] (FLINK-14333) YARNSessionFIFOSecuredITCase fails on Travis

2019-10-07 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-14333:


 Summary: YARNSessionFIFOSecuredITCase fails on Travis
 Key: FLINK-14333
 URL: https://issues.apache.org/jira/browse/FLINK-14333
 Project: Flink
  Issue Type: Task
  Components: Deployment / YARN, Tests
Affects Versions: 1.10.0
Reporter: Chesnay Schepler


https://travis-ci.org/apache/flink/builds/593958591

{code}
20:59:33.116 [ERROR] Tests run: 4, Failures: 2, Errors: 0, Skipped: 2, Time 
elapsed: 46.546 s <<< FAILURE! - in 
org.apache.flink.yarn.YARNSessionFIFOSecuredITCase
20:59:33.116 [ERROR] 
testDetachedMode(org.apache.flink.yarn.YARNSessionFIFOSecuredITCase)  Time 
elapsed: 18.732 s  <<< FAILURE!
java.lang.AssertionError: 
Found a file 
/home/travis/build/apache/flink/flink-yarn-tests/target/flink-yarn-tests-fifo-secured/flink-yarn-tests-fifo-secured-logDir-nm-1_0/application_1570309129335_0001/container_1570309129335_0001_01_01/jobmanager.log
 with a prohibited string (one of [Exception, Started 
SelectChannelConnector@0.0.0.0:8081]). Excerpts:
[
2019-10-05 20:59:11,971 INFO  
org.apache.flink.runtime.checkpoint.StandaloneCompletedCheckpointStore  - 
Shutting down
2019-10-05 20:59:11,999 INFO  
org.apache.flink.runtime.dispatcher.StandaloneDispatcher  - Job 
4d21a21eee0bf5af6fc8616debddfcd4 reached globally terminal state FINISHED.
2019-10-05 20:59:12,026 INFO  org.apache.flink.runtime.jobmaster.JobMaster  
- Stopping the JobMaster for job Streaming 
WordCount(4d21a21eee0bf5af6fc8616debddfcd4).
2019-10-05 20:59:12,050 INFO  
org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl  - Suspending 
SlotPool.
2019-10-05 20:59:12,051 INFO  org.apache.flink.runtime.jobmaster.JobMaster  
- Close ResourceManager connection 
8cbddf818ec206854b4ee4d75b08bf98: JobManager is shutting down..
2019-10-05 20:59:12,051 INFO  
org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl  - Stopping 
SlotPool.
2019-10-05 20:59:12,052 INFO  org.apache.flink.yarn.YarnResourceManager 
- Disconnect job manager 
0...@akka.tcp://flink@localhost:41357/user/jobmanager_0
 for job 4d21a21eee0bf5af6fc8616debddfcd4 from the resource manager.
2019-10-05 20:59:12,062 INFO  
org.apache.flink.runtime.jobmaster.JobManagerRunnerImpl   - 
JobManagerRunner already shutdown.
2019-10-05 20:59:12,325 ERROR org.apache.flink.yarn.YarnResourceManager 
- Fatal error occurred in ResourceManager.
org.apache.flink.runtime.resourcemanager.exceptions.ResourceManagerException: 
Received shutdown request from YARN ResourceManager.
org.apache.flink.runtime.resourcemanager.exceptions.ResourceManagerException: 
Received shutdown request from YARN ResourceManager.
at 
org.apache.flink.yarn.YarnResourceManager.onShutdownRequest(YarnResourceManager.java:454)
at 
org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$HeartbeatThread.run(AMRMClientAsyncImpl.java:275)
2019-10-05 20:59:12,330 ERROR 
org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Fatal error 
occurred in the cluster entrypoint.
org.apache.flink.runtime.resourcemanager.exceptions.ResourceManagerException: 
Received shutdown request from YARN ResourceManager.
at 
org.apache.flink.yarn.YarnResourceManager.onShutdownRequest(YarnResourceManager.java:454)
at 
org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$HeartbeatThread.run(AMRMClientAsyncImpl.java:275)
2019-10-05 20:59:12,342 INFO  org.apache.flink.runtime.blob.BlobServer  
- Stopped BLOB server at 0.0.0.0:34373
]
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [CODE STYLE] Parameters of method are always final

2019-10-07 Thread Piotr Nowojski
Hi,

A couple of arguments to counter the proposal of making the `final` keyword 
obligatory. Can we prepare code style/IDE settings to add it automatically? 
If not, I would be strongly against it, since:

- IntelliJ’s automatic refactor actions will not work properly.
- I don’t think it’s a big deal. I don’t remember having any issues with the 
lack or presence of the `final` keyword.
- `final` is pretty much useless in most of the cases (it’s not `const` and it 
works only for the primitive types).
- I don’t like the extra overhead of doing something of very little extra 
value. Especially the extra hassle of going back & forth during the reviews 
(both as a contributor & as a reviewer).
- If missing `final` keyword caused some confusion, because surprisingly a 
parameter was modified somewhere in the function and it wasn’t obviously 
visible, the function is doing probably too many things and it’s too long/too 
complicated…

Generally speaking, I’m against adding minor things to our codestyle that 
cannot be enforced and added automatically.

Piotrek

> On 7 Oct 2019, at 09:13, Arvid Heise  wrote:
> 
> Hi guys,
> 
> I'm a bit torn. In general, +1 for making parameters effectively final.
> 
> The question is how to enforce it. We can make it explicit (like Aljoscha
> said). All IDEs will immediately show warnings/errors for violations. It
> would allow us to migrate the code gradually.
> 
> Another option is to use a checkstyle rule [1]. Then, we could omit the
> final keyword and rely on checkstyle checks as we do for quite a few other
> things. A hard checkstyle rule would probably fail on a good portion of the
> current code base. But we would also remove reassignment to parameters
> (which I consider an anti-pattern).
> 
> If we opt not to enforce it, then -1 for removing final keywords from my
> side.
> 
> Best,
> 
> Arvid
> 
> [1]
> https://checkstyle.sourceforge.io/apidocs/com/puppycrawl/tools/checkstyle/checks/coding/ParameterAssignmentCheck.html
> 
> On Fri, Oct 4, 2019 at 1:22 PM Zili Chen  wrote:
> 
>> Hi Aljoscha,
>> 
>> I totally agree with you on the field topic. Of course it makes a significant
>> difference whether or not a field is final, and the IDE/compiler can help
>> with checking.
>> 
>> Here are several thoughts about the final modifier on parameters and why I
>> propose this:
>> 
>> 1. parameters should be always final
>> 
>> As described above, there is no reason for a parameter to be non-final. This
>> differs from fields: a field can be final or non-final based on whether or
>> not it is immutable. Thus, with such a code style guide in our community,
>> we can work towards a codebase where every parameter is effectively final.
>> 
>> 2. parameter final cannot be inherited
>> 
>> Unfortunately, Java doesn't keep the final modifier of a method parameter
>> under inheritance. So even if you mark a parameter as final in an interface
>> or super class, you have to re-mark it as final in its implementation or
>> subclass. From another perspective, the final modifier of a parameter is a
>> local attribute of the parameter, so we can narrow its possible effect
>> during review.
>> 
>> 3. IDE can lint difference between effective final and mutable parameter
>> 
>> It is true that an IDE such as IntelliJ IDEA can lint the difference between
>> an effectively final and a mutable parameter (with an underline). So with
>> this code style, what we lose is that we cannot get a compile-time error if
>> someone modifies a parameter in the method body. But as mentioned in 1, by
>> no means do we allow a parameter to be modified. If we agree on this
>> statement, then we hopefully converge on a codebase where no parameter is
>> modified.
>> 
>> As for what we gain, I'd like to recall our existing code style of @Nonnull
>> being the default. It does help the compiler report a compile-time warning
>> if we possibly pass a nullable value to a non-null field. We made @Nonnull
>> the default to "reduce code noise", so I think we can reuse that statement
>> here as well.
>> 
>> Best,
>> tison.
>> 
>> 
>> Aljoscha Krettek  于2019年10月4日周五 下午5:58写道:
>> 
>>> I actually think that doing this the other way round would be correct.
>>> Removing final everywhere and relying on humans to assume that everything
>>> is final doesn’t seem maintainable to me. The benefit of having final on
>>> parameters/fields is that the compiler/IDE actually checks that you don’t
>>> modify it.
>>> 
>>> In general, I think that as much as possible should be declared final,
>>> including fields and parameters.
>>> 
>>> Best,
>>> Aljoscha
>>> 
 On 2. Oct 2019, at 13:31, Piotr Nowojski  wrote:
 
 +1 from my side.
 
> On 2 Oct 2019, at 13:07, Zili Chen  wrote:
> 
> Yes exactly.
> 
> 
> Piotr Nowojski  于2019年10月2日周三 下午7:03写道:
> 
>> Hi Tison,
>> 
>> To clarify your proposal. You are proposing to actually drop the
>> `final` keyword from the parameters and we should implicitly assume
>>> that
>> it’s always 

Re: [CODE STYLE] Parameters of method are always final

2019-10-07 Thread Arvid Heise
Hi guys,

I'm a bit torn. In general, +1 for making parameters effectively final.

The question is how to enforce it. We can make it explicit (like Aljoscha
said). All IDEs will immediately show warnings/errors for violations. It
would allow us to migrate the code gradually.

Another option is to use a checkstyle rule [1]. Then, we could omit the
final keyword and rely on checkstyle checks as we do for quite a few other
things. A hard checkstyle rule would probably fail on a good portion of the
current code base. But we would also remove reassignment to parameters
(which I consider an anti-pattern).

If we opt not to enforce it, then -1 for removing final keywords from my
side.

Best,

Arvid

[1]
https://checkstyle.sourceforge.io/apidocs/com/puppycrawl/tools/checkstyle/checks/coding/ParameterAssignmentCheck.html

On Fri, Oct 4, 2019 at 1:22 PM Zili Chen  wrote:

> Hi Aljoscha,
>
> I totally agree with you on the field topic. Of course it makes a significant
> difference whether or not a field is final, and the IDE/compiler can help
> with checking.
>
> Here are several thoughts about the final modifier on parameters and why I
> propose this:
>
> 1. parameters should be always final
>
> As described above, there is no reason for a parameter to be non-final. This
> differs from fields: a field can be final or non-final based on whether or
> not it is immutable. Thus, with such a code style guide in our community, we
> can work towards a codebase where every parameter is effectively final.
>
> 2. parameter final cannot be inherited
>
> Unfortunately, Java doesn't keep the final modifier of a method parameter
> under inheritance. So even if you mark a parameter as final in an interface
> or super class, you have to re-mark it as final in its implementation or
> subclass. From another perspective, the final modifier of a parameter is a
> local attribute of the parameter, so we can narrow its possible effect
> during review.
>
> 3. IDE can lint difference between effective final and mutable parameter
>
> It is true that an IDE such as IntelliJ IDEA can lint the difference between
> an effectively final and a mutable parameter (with an underline). So with
> this code style, what we lose is that we cannot get a compile-time error if
> someone modifies a parameter in the method body. But as mentioned in 1, by
> no means do we allow a parameter to be modified. If we agree on this
> statement, then we hopefully converge on a codebase where no parameter is
> modified.
>
> As for what we gain, I'd like to recall our existing code style of @Nonnull
> being the default. It does help the compiler report a compile-time warning
> if we possibly pass a nullable value to a non-null field. We made @Nonnull
> the default to "reduce code noise", so I think we can reuse that statement
> here as well.
>
> Best,
> tison.
>
>
> Aljoscha Krettek  于2019年10月4日周五 下午5:58写道:
>
> > I actually think that doing this the other way round would be correct.
> > Removing final everywhere and relying on humans to assume that everything
> > is final doesn’t seem maintainable to me. The benefit of having final on
> > parameters/fields is that the compiler/IDE actually checks that you don’t
> > modify it.
> >
> > In general, I think that as much as possible should be declared final,
> > including fields and parameters.
> >
> > Best,
> > Aljoscha
> >
> > > On 2. Oct 2019, at 13:31, Piotr Nowojski  wrote:
> > >
> > > +1 from my side.
> > >
> > >> On 2 Oct 2019, at 13:07, Zili Chen  wrote:
> > >>
> > >> Yes exactly.
> > >>
> > >>
> > >> Piotr Nowojski  于2019年10月2日周三 下午7:03写道:
> > >>
> > >>> Hi Tison,
> > >>>
> > >>> To clarify your proposal. You are proposing to actually drop the
> > >>> `final` keyword from the parameters and we should implicitly assume
> > that
> > >>> it’s always there (in other words, we shouldn’t be modifying the
> > >>> parameters). Did I understand this correctly?
> > >>>
> > >>> Piotrek
> > >>>
> >  On 1 Oct 2019, at 21:44, Zili Chen  wrote:
> > 
> >  Hi devs,
> > 
> >  Coming from this discussion[1] I'd like to propose that in Flink
> > codebase
> >  we suggest a code style
> >  that parameters of method are always final. Almost everywhere
> > parameters
> > >>> of
> >  method are final
> >  already and if we have such consensus we can prevent redundant final
> >  modifier in method
> >  declaration so that we survive from those noise.
> > 
> >  Here are some cases that might require to modify a parameter.
> > 
> >  1. to set default; especially if (param == null) { param = ... }
> >  2. to refine parameter; it is in pattern if ( ... ) { param =
> >  refine(param); }
> > 
> >  Either of the cases we can move the refine/set default logics to the
> > >>> caller
> >  or select another
> >  name for the refined value case by case.
> > 
> >  Looking forward to your feedbacks :-)
> > 
> >  Best,
> >  tison.
> > 
> >  [1] 
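
To illustrate tison's point 2 in the quoted mail above (the `final` modifier
on a parameter is not inherited by overrides), a minimal sketch, not from the
thread:

    interface Parser {
        void parse(final String input); // `final` applies to this declaration only
    }

    class TrimmingParser implements Parser {
        @Override
        public void parse(String input) { // the override may legally drop `final`...
            input = input.trim();         // ...and may even reassign the parameter
        }
    }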

[VOTE] FLIP-57: Rework FunctionCatalog, latest updated

2019-10-07 Thread Bowen Li
Hi all,

I'd like to start a new voting thread for FLIP-57 [1] on its latest status,
superseding the earlier vote in [2], as we've reached consensus in [2] and [3].

This vote will be open for a minimum of 3 days, till 6:45am UTC, Oct 10.

Thanks,
Bowen

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-57%3A+Rework+FunctionCatalog
[2] https://www.mail-archive.com/dev@flink.apache.org/msg30180.html
[3]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-57-Rework-FunctionCatalog-td32291.html#a32613


Re: [VOTE] FLIP-57: Rework FunctionCatalog

2019-10-07 Thread Bowen Li
Hi Aljoscha, Timo

Thanks for the reminder. I've updated the details in the FLIP wiki, and will
kick off a voting thread.

On Fri, Oct 4, 2019 at 1:51 PM Timo Walther  wrote:

> Hi,
>
> I agree with Aljoscha. It is not transparent to me which votes are
> binding to the current status of the FLIP.
>
> Some other minor comments from my side:
>
> - We don't need to deprecate methods in FunctionCatalog. This class is
> internal. We can simply change the method signatures.
> - `String name` is missing in the FunctionIdentifier code example; can
> we call FunctionIdentifier.getSimpleName() just
> FunctionIdentifier.getName()?
> - Add the methods that we discussed to the example:  `of(String)`,
> `of(ObjectIdentifier)`
>
> Other than that, I'm happy to give my +1 to this proposal.
>
> Thanks for the productive discussion,
> Timo
>
>
> On 04.10.19 13:29, Aljoscha Krettek wrote:
> > Hi,
> >
> > I see there was quite some discussion and changes on the FLIP after this
> VOTE was started. I would suggest to start a new voting thread on the
> current state of the FLIP (keeping in mind that a FLIP vote needs at least
> three committer/PMC votes).
> >
> > For the future, we should probably keep discussion to the [DISCUSS]
> thread and use the vote thread only for voting.
> >
> > Best,
> > Aljoscha
> >
> >> On 3. Oct 2019, at 21:17, Bowen Li  wrote:
> >>
> >> I'm glad to announce that the community has accepted the design of
> FLIP-57,
> >> and we are moving forward to implementing it.
> >>
> >> Thanks everyone!
> >>
> >> On Wed, Oct 2, 2019 at 11:01 AM Bowen Li  wrote:
> >>
> >>> Introducing a new term "path" to APIs like
> >>> "getShortPath(Identifier)/getLongPath(Identifier)" would be confusing
> to
> >>> users, thus I feel "getSimpleName/getIdentifier" is fine.
> >>>
> >>> To summarize the discussion result.
> >>>
> >>>- categorize functions into 2 dimensions - system v.s. catalog,
> >>>non-temp v.s. temp - and that gives us 4 combinations
> >>>- definition of FunctionIdentifier
> >>>
> >>>  @PublicEvolving
> >>>
> >>> class FunctionIdentifier {
> >>>
> >>> String name;
> >>>
> >>> ObjectIdentifier oi;
> >>>
> >>> // for temporary/non-temporary system function
> >>> public FunctionIdentifier of(String name) {  }
> >>> // for temporary/non-temporary catalog function
> >>> public FunctionIdentifier of(ObjectIdentifier identifier) {  }
> >>>
> >>>
> >>> Optional<ObjectIdentifier> getIdentifier() {}
> >>>
> >>> Optional<String> getSimpleName() {}
> >>>
> >>> }
> >>>
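> >>> A usage sketch for illustration (hypothetical names, assuming the
> >>> of(...) factories above are static):
> >>>
> >>> // a temporary/non-temporary system function: simple name only
> >>> FunctionIdentifier sysFn = FunctionIdentifier.of("myUpper");
> >>> // a temporary/non-temporary catalog function: fully qualified path
> >>> FunctionIdentifier catFn =
> >>>     FunctionIdentifier.of(ObjectIdentifier.of("myCatalog", "myDb", "myUpper"));
> >>>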
> >>>
> >>> I've updated them to FLIP wiki. Please take a final look. I'll close
> the
> >>> voting if there's no other concern raised within 24 hours.
> >>>
> >>> Cheers
> >>>
> >>> On Wed, Oct 2, 2019 at 4:54 AM Dawid Wysakowicz <
> dwysakow...@apache.org>
> >>> wrote:
> >>>
>  Hi,
> 
>  I very much agree with Xuefu's summary of the two points, especially
> on
>  the "functionIdentifier doesn't need to reflect the categories".
> 
>  For the factory methods I think methods of should be enough:
> 
>    // for temporary/non-temporary system function
>  public FunctionIdentifier of(String name) {  }
>    // for temporary/non-temporary catalog function
>  public FunctionIdentifier of(ObjectIdentifier identifier){  }
> 
>  In case of the getters I did not like the method name `getName` in the
>  original proposal, as in my opinion it could imply that it can return
>  also just the name part of an ObjectIdentifier, which should not be
> the
>  case.
> 
>  I'm fine with getSimpleName/getIdentifier, but want to throw in few
>  other suggestions:
> 
>  * getShortPath(Identifier)/getLongPath(Identifier),
> 
>  * getSystemPath(Identifier)/getCatalogPath(Identifier)
> 
>  +1 to any of the 3 options.
> 
>  One additional thing the FunctionIdentifier should be a PublicEvolving
>  class, as it is part of a PublicEvolving APIs e.g. CallExpression,
> which
>  user might need to access e.g. in a filter pushdown.
> 
>  I also support the Xuefu's suggestion not to support the "ALL" keyword
>  in the "SHOW [TEMPORARY] FUNCTIONS" statement, but as the exact design
>  of it  is not part of the FLIP-57, we do not need to agree on that in
>  this thread.
> 
>  Overall I think after updating the FLIP with the outcome of the
>  discussion I vote +1 for it.
> 
>  Best,
> 
>  Dawid
> 
> 
>  On 02/10/2019 00:21, Xuefu Z wrote:
> > Here are some of my thoughts on the minor debates above:
> >
> > 1. +1 for 4 categories of functions. They are categorized along two
> > dimensions of binary values: X: *temporary* vs non-temporary
>  (persistent);
> > Y: *system* vs non-system (so-called catalog).
> > 2. In my opinion, class functionIdentifier doesn't really need to
>  reflect
> > the categories of the functions. Instead, we should decouple