[GitHub] [flink] lirui-apache commented on a change in pull request #8766: [FLINK-12664][hive] Implement TableSink to write Hive tables

2019-06-19 Thread GitBox
lirui-apache commented on a change in pull request #8766: [FLINK-12664][hive] 
Implement TableSink to write Hive tables
URL: https://github.com/apache/flink/pull/8766#discussion_r295653875
 
 

 ##
 File path: 
flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/batch/connectors/hive/HiveTableSink.java
 ##
 @@ -0,0 +1,155 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.batch.connectors.hive;
+
+import org.apache.flink.api.common.io.OutputFormat;
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.api.java.typeutils.RowTypeInfo;
+import org.apache.flink.table.catalog.exceptions.CatalogException;
+import org.apache.flink.table.catalog.hive.client.HiveMetastoreClientFactory;
+import org.apache.flink.table.catalog.hive.client.HiveMetastoreClientWrapper;
+import org.apache.flink.table.catalog.hive.client.HiveShimLoader;
+import org.apache.flink.table.catalog.hive.descriptors.HiveCatalogValidator;
+import org.apache.flink.table.sinks.OutputFormatTableSink;
+import org.apache.flink.table.sinks.TableSink;
+import org.apache.flink.types.Row;
+import org.apache.flink.util.FlinkRuntimeException;
+import org.apache.flink.util.Preconditions;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.MetaStoreUtils;
+import org.apache.hadoop.hive.metastore.Warehouse;
+import org.apache.hadoop.hive.metastore.api.Partition;
+import org.apache.hadoop.hive.metastore.api.StorageDescriptor;
+import org.apache.hadoop.hive.metastore.api.Table;
+import org.apache.hadoop.mapred.JobConf;
+import org.apache.thrift.TException;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.LinkedHashMap;
+import java.util.List;
+
+/**
+ * Table sink to write to Hive tables.
+ */
+public class HiveTableSink extends OutputFormatTableSink<Row> {
+
+   private final JobConf jobConf;
+   private final RowTypeInfo rowTypeInfo;
+   private final String dbName;
+   private final String tableName;
+   private final List<String> partitionColumns;
+   private final String hiveVersion;
+
+   // TODO: need OverwritableTableSink to configure this
+   private boolean overwrite = false;
+
+   public HiveTableSink(JobConf jobConf, RowTypeInfo rowTypeInfo, String 
dbName, String tableName,
+   List<String> partitionColumns) {
+   this.jobConf = jobConf;
+   this.rowTypeInfo = rowTypeInfo;
+   this.dbName = dbName;
+   this.tableName = tableName;
+   this.partitionColumns = partitionColumns;
+   hiveVersion = 
jobConf.get(HiveCatalogValidator.CATALOG_HIVE_VERSION, 
HiveShimLoader.getHiveVersion());
 
 Review comment:
   Could you explain what's the benefit of doing that?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] lirui-apache commented on a change in pull request #8766: [FLINK-12664][hive] Implement TableSink to write Hive tables

2019-06-19 Thread GitBox
lirui-apache commented on a change in pull request #8766: [FLINK-12664][hive] 
Implement TableSink to write Hive tables
URL: https://github.com/apache/flink/pull/8766#discussion_r295653711
 
 

 ##
 File path: 
flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/batch/connectors/hive/HiveTableSink.java
 ##
 @@ -0,0 +1,155 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.batch.connectors.hive;
+
+import org.apache.flink.api.common.io.OutputFormat;
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.api.java.typeutils.RowTypeInfo;
+import org.apache.flink.table.catalog.exceptions.CatalogException;
+import org.apache.flink.table.catalog.hive.client.HiveMetastoreClientFactory;
+import org.apache.flink.table.catalog.hive.client.HiveMetastoreClientWrapper;
+import org.apache.flink.table.catalog.hive.client.HiveShimLoader;
+import org.apache.flink.table.catalog.hive.descriptors.HiveCatalogValidator;
+import org.apache.flink.table.sinks.OutputFormatTableSink;
+import org.apache.flink.table.sinks.TableSink;
+import org.apache.flink.types.Row;
+import org.apache.flink.util.FlinkRuntimeException;
+import org.apache.flink.util.Preconditions;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.MetaStoreUtils;
+import org.apache.hadoop.hive.metastore.Warehouse;
+import org.apache.hadoop.hive.metastore.api.Partition;
+import org.apache.hadoop.hive.metastore.api.StorageDescriptor;
+import org.apache.hadoop.hive.metastore.api.Table;
+import org.apache.hadoop.mapred.JobConf;
+import org.apache.thrift.TException;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.LinkedHashMap;
+import java.util.List;
+
+/**
+ * Table sink to write to Hive tables.
+ */
+public class HiveTableSink extends OutputFormatTableSink<Row> {
+
+   private final JobConf jobConf;
+   private final RowTypeInfo rowTypeInfo;
+   private final String dbName;
+   private final String tableName;
+   private final List<String> partitionColumns;
+   private final String hiveVersion;
+
+   // TODO: need OverwritableTableSink to configure this
+   private boolean overwrite = false;
 
 Review comment:
   As the comment says, `overwrite` is intended to be configured via the 
`OverwritableTableSink` interface. Therefore it shouldn't be initialized in 
the constructor.
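   For context, a minimal sketch (not the PR code) of how the flag would be set through that 
interface rather than the constructor, assuming the interface exposes a single 
`setOverwrite(boolean)` callback that the planner invokes for INSERT OVERWRITE statements:

   import org.apache.flink.table.sinks.OverwritableTableSink;

   // Sketch only: the sink keeps a plain-INSERT default and lets the planner flip the flag.
   class OverwriteAwareSink implements OverwritableTableSink {

       private boolean overwrite = false; // default: regular INSERT

       @Override
       public void setOverwrite(boolean overwrite) {
           // invoked by the planner, not by user code and not in the constructor
           this.overwrite = overwrite;
       }

       boolean isOverwrite() {
           return overwrite;
       }
   }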


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] synckey commented on a change in pull request #8799: [FLINK-11612][chinese-translation,Documentation] Translate the "Proje…

2019-06-19 Thread GitBox
synckey commented on a change in pull request #8799: 
[FLINK-11612][chinese-translation,Documentation] Translate the "Proje…
URL: https://github.com/apache/flink/pull/8799#discussion_r295653024
 
 

 ##
 File path: docs/dev/projectsetup/java_api_quickstart.zh.md
 ##
 @@ -304,51 +297,46 @@ quickstart/
 └── log4j.properties
 {% endhighlight %}
 
-The sample project is a __Gradle project__, which contains two classes: 
_StreamingJob_ and _BatchJob_ are the basic skeleton programs for a 
*DataStream* and *DataSet* program.
-The _main_ method is the entry point of the program, both for in-IDE 
testing/execution and for proper deployments.
+示例项目是一个 __Gradle 项目__,它包含了两个类:_StreamingJob_ 和 _BatchJob_ 是 *DataStream* 和 
*DataSet* 程序的基础骨架程序。
+_main_ 方法是程序的入口,即可用于IDE测试/执行,也可用于部署。
 
-We recommend you __import this project into your IDE__ to develop and
-test it. IntelliJ IDEA supports Gradle projects after installing the `Gradle` 
plugin.
-Eclipse does so via the [Eclipse 
Buildship](https://projects.eclipse.org/projects/tools.buildship) plugin
-(make sure to specify a Gradle version >= 3.0 in the last step of the import 
wizard; the `shadow` plugin requires it).
-You may also use [Gradle's IDE 
integration](https://docs.gradle.org/current/userguide/userguide.html#ide-integration)
-to create project files from Gradle.
+我们建议你将 __此项目导入你的 IDE__ 来开发和测试它。
+IntelliJ IDEA 在安装 `Gradle` 插件后支持 Gradle 项目。Eclipse 则通过 [Eclipse 
Buildship](https://projects.eclipse
+.org/projects/tools.buildship) 插件支持 Gradle 项目(鉴于 `shadow` 插件对 Gradle 
版本有要求,请确保在导入向导的最后一步指定 Gradle 版本 >= 3.0)。
+你也可以使用 [Gradle’s IDE 
integration](https://docs.gradle.org/current/userguide/userguide.html#ide-integration)
 从 Gradle 
+创建项目文件。
 
 
-*Please note*: The default JVM heapsize for Java may be too
-small for Flink. You have to manually increase it.
-In Eclipse, choose `Run Configurations -> Arguments` and write into the `VM 
Arguments` box: `-Xmx800m`.
-In IntelliJ IDEA recommended way to change JVM options is from the `Help | 
Edit Custom VM Options` menu. See [this 
article](https://intellij-support.jetbrains.com/hc/en-us/articles/206544869-Configuring-JVM-options-and-platform-properties)
 for details.
+*请注意*:对 Flink 来说,默认的 JVM 堆内存可能太小,你应当手动增加堆内存。
+在 Eclipse中,选择 `Run Configurations -> Arguments` 并在 `VM Arguments` 
对应的输入框中写入:`-Xmx800m`。
+在 IntelliJ IDEA 中,推荐从菜单 `Help | Edit Custom VM Options` 来修改 JVM 
选项。有关详细信息,请参阅[此文章](https://intellij-support
+.jetbrains.com/hc/en-us/articles/206544869-Configuring-JVM-options-and-platform-properties)。
 
-### Build Project
+### 构建项目
 
-If you want to __build/package your project__, go to your project directory and
-run the '`gradle clean shadowJar`' command.
-You will __find a JAR file__ that contains your application, plus connectors 
and libraries
-that you may have added as dependencies to the application: 
`build/libs/<project-name>-<version>-all.jar`.
+如果你想要 __构建/打包项目__,请在项目目录下运行 '`gradle clean shadowJar`' 命令。
+命令执行后,你将 __找到一个 JAR 
文件__,里面包含了你的应用程序,以及已作为依赖项添加到应用程序的连接器和库:`build/libs/<project-name>-<version>-all.jar`。
 
 
-__Note:__ If you use a different class than *StreamingJob* as the 
application's main class / entry point,
-we recommend you change the `mainClassName` setting in the `build.gradle` file 
accordingly. That way, Flink
-can run the application from the JAR file without additionally specifying the 
main class.
+__注意:__ 如果你使用其他类而不是 *StreamingJob* 作为应用程序的主类/入口,
+我们建议你相应地修改 `build.gradle` 文件中的 `mainClassName` 配置。
+这样,Flink 可以从 JAR 文件运行应用程序,而无需另外指定主类。
 
-## Next Steps
+## 下一步
 
-Write your application!
+开始编写应用!
 
-If you are writing a streaming application and you are looking for inspiration 
what to write,
-take a look at the [Stream Processing Application Tutorial]({{ site.baseurl 
}}/tutorials/datastream_api.html#writing-a-flink-program).
+如果你准备编写流处理应用,正在寻找灵感来写什么,
+可以看看[流处理应用程序教程]({{ site.baseurl 
}}/zh/tutorials/datastream_api.html#writing-a-flink-program)
 
-If you are writing a batch processing application and you are looking for 
inspiration what to write,
-take a look at the [Batch Application Examples]({{ site.baseurl 
}}/dev/batch/examples.html).
+如果你准备编写批处理应用,正在寻找灵感来写什么,
+可以看看[批处理应用程序示例]({{ site.baseurl }}/zh/dev/batch/examples.html)
 
-For a complete overview over the APIs, have a look at the
-[DataStream API]({{ site.baseurl }}/dev/datastream_api.html) and
-[DataSet API]({{ site.baseurl }}/dev/batch/index.html) sections.
+有关 API 的完整概述,请查看
+[DataStream API]({{ site.baseurl }}/zh/dev/datastream_api.html) 和
+[DataSet API]({{ site.baseurl }}/zh/dev/batch/index.html) 章节。
 
-[Here]({{ site.baseurl }}/tutorials/local_setup.html) you can find out how to 
run an application outside the IDE on a local cluster.
+在[这里]({{ site.baseurl }}/zh/tutorials/local_setup.html),你可以找到如何在 IDE 
之外的本地集群中运行应用程序。
 
 Review comment:
   Oh sorry, my fault.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above 

[jira] [Closed] (FLINK-12856) Introduce planner rule to push projection into TableSource

2019-06-19 Thread Kurt Young (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-12856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kurt Young closed FLINK-12856.
--
   Resolution: Implemented
Fix Version/s: 1.9.0

merged in 1.9.0: 800fe61cb6074eed0311abd4634d71f5569451b5

> Introduce planner rule to push projection into TableSource
> --
>
> Key: FLINK-12856
> URL: https://issues.apache.org/jira/browse/FLINK-12856
> Project: Flink
>  Issue Type: New Feature
>  Components: Table SQL / Planner
>Reporter: godfrey he
>Assignee: godfrey he
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.9.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> This issue aims to support pushing projections into ProjectableTableSource or 
> NestedFieldsProjectableTableSource to reduce the output fields of a TableSource
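To make the feature concrete for readers of this digest, here is a small, self-contained 
sketch in plain Java (deliberately not the real Flink interfaces; class and method names are 
illustrative) of what pushing a projection into a source means: the optimizer hands the 
source the indices of the fields the query actually reads, and the source replaces itself 
with a narrowed copy.

import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative sketch of projection push-down, not Flink's ProjectableTableSource API.
class ProjectableSourceSketch {

    private final List<String> allFields;
    private final int[] projected; // indices into allFields; null means "all fields"

    ProjectableSourceSketch(List<String> allFields, int[] projected) {
        this.allFields = allFields;
        this.projected = projected;
    }

    // A planner rule would call this with the field indices used by the query.
    ProjectableSourceSketch projectFields(int[] fields) {
        return new ProjectableSourceSketch(allFields, fields);
    }

    List<String> producedFields() {
        return projected == null
            ? allFields
            : Arrays.stream(projected).mapToObj(allFields::get).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        ProjectableSourceSketch src =
            new ProjectableSourceSketch(Arrays.asList("id", "name", "address", "score"), null);
        // SELECT name, score FROM src -> the rule pushes the projection down:
        System.out.println(src.projectFields(new int[]{1, 3}).producedFields()); // [name, score]
    }
}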



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] [flink] KurtYoung merged pull request #8747: [FLINK-12856] [table-planner-blink] Introduce planner rule to push projection into TableSource

2019-06-19 Thread GitBox
KurtYoung merged pull request #8747: [FLINK-12856] [table-planner-blink] 
Introduce planner rule to push projection into TableSource
URL: https://github.com/apache/flink/pull/8747
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] yuezhuangshi commented on a change in pull request #8799: [FLINK-11612][chinese-translation,Documentation] Translate the "Proje…

2019-06-19 Thread GitBox
yuezhuangshi commented on a change in pull request #8799: 
[FLINK-11612][chinese-translation,Documentation] Translate the "Proje…
URL: https://github.com/apache/flink/pull/8799#discussion_r295652737
 
 

 ##
 File path: docs/dev/projectsetup/java_api_quickstart.zh.md
 ##
 @@ -304,51 +297,46 @@ quickstart/
 └── log4j.properties
 {% endhighlight %}
 
-The sample project is a __Gradle project__, which contains two classes: 
_StreamingJob_ and _BatchJob_ are the basic skeleton programs for a 
*DataStream* and *DataSet* program.
-The _main_ method is the entry point of the program, both for in-IDE 
testing/execution and for proper deployments.
+示例项目是一个 __Gradle 项目__,它包含了两个类:_StreamingJob_ 和 _BatchJob_ 是 *DataStream* 和 
*DataSet* 程序的基础骨架程序。
+_main_ 方法是程序的入口,即可用于IDE测试/执行,也可用于部署。
 
-We recommend you __import this project into your IDE__ to develop and
-test it. IntelliJ IDEA supports Gradle projects after installing the `Gradle` 
plugin.
-Eclipse does so via the [Eclipse 
Buildship](https://projects.eclipse.org/projects/tools.buildship) plugin
-(make sure to specify a Gradle version >= 3.0 in the last step of the import 
wizard; the `shadow` plugin requires it).
-You may also use [Gradle's IDE 
integration](https://docs.gradle.org/current/userguide/userguide.html#ide-integration)
-to create project files from Gradle.
+我们建议你将 __此项目导入你的 IDE__ 来开发和测试它。
+IntelliJ IDEA 在安装 `Gradle` 插件后支持 Gradle 项目。Eclipse 则通过 [Eclipse 
Buildship](https://projects.eclipse
+.org/projects/tools.buildship) 插件支持 Gradle 项目(鉴于 `shadow` 插件对 Gradle 
版本有要求,请确保在导入向导的最后一步指定 Gradle 版本 >= 3.0)。
+你也可以使用 [Gradle’s IDE 
integration](https://docs.gradle.org/current/userguide/userguide.html#ide-integration)
 从 Gradle 
+创建项目文件。
 
 
-*Please note*: The default JVM heapsize for Java may be too
-small for Flink. You have to manually increase it.
-In Eclipse, choose `Run Configurations -> Arguments` and write into the `VM 
Arguments` box: `-Xmx800m`.
-In IntelliJ IDEA recommended way to change JVM options is from the `Help | 
Edit Custom VM Options` menu. See [this 
article](https://intellij-support.jetbrains.com/hc/en-us/articles/206544869-Configuring-JVM-options-and-platform-properties)
 for details.
+*请注意*:对 Flink 来说,默认的 JVM 堆内存可能太小,你应当手动增加堆内存。
+在 Eclipse中,选择 `Run Configurations -> Arguments` 并在 `VM Arguments` 
对应的输入框中写入:`-Xmx800m`。
+在 IntelliJ IDEA 中,推荐从菜单 `Help | Edit Custom VM Options` 来修改 JVM 
选项。有关详细信息,请参阅[此文章](https://intellij-support
+.jetbrains.com/hc/en-us/articles/206544869-Configuring-JVM-options-and-platform-properties)。
 
-### Build Project
+### 构建项目
 
-If you want to __build/package your project__, go to your project directory and
-run the '`gradle clean shadowJar`' command.
-You will __find a JAR file__ that contains your application, plus connectors 
and libraries
-that you may have added as dependencies to the application: 
`build/libs/<project-name>-<version>-all.jar`.
+如果你想要 __构建/打包项目__,请在项目目录下运行 '`gradle clean shadowJar`' 命令。
+命令执行后,你将 __找到一个 JAR 
文件__,里面包含了你的应用程序,以及已作为依赖项添加到应用程序的连接器和库:`build/libs/<project-name>-<version>-all.jar`。
 
 
-__Note:__ If you use a different class than *StreamingJob* as the 
application's main class / entry point,
-we recommend you change the `mainClassName` setting in the `build.gradle` file 
accordingly. That way, Flink
-can run the application from the JAR file without additionally specifying the 
main class.
+__注意:__ 如果你使用其他类而不是 *StreamingJob* 作为应用程序的主类/入口,
+我们建议你相应地修改 `build.gradle` 文件中的 `mainClassName` 配置。
+这样,Flink 可以从 JAR 文件运行应用程序,而无需另外指定主类。
 
-## Next Steps
+## 下一步
 
-Write your application!
+开始编写应用!
 
-If you are writing a streaming application and you are looking for inspiration 
what to write,
-take a look at the [Stream Processing Application Tutorial]({{ site.baseurl 
}}/tutorials/datastream_api.html#writing-a-flink-program).
+如果你准备编写流处理应用,正在寻找灵感来写什么,
+可以看看[流处理应用程序教程]({{ site.baseurl 
}}/zh/tutorials/datastream_api.html#writing-a-flink-program)
 
-If you are writing a batch processing application and you are looking for 
inspiration what to write,
-take a look at the [Batch Application Examples]({{ site.baseurl 
}}/dev/batch/examples.html).
+如果你准备编写批处理应用,正在寻找灵感来写什么,
+可以看看[批处理应用程序示例]({{ site.baseurl }}/zh/dev/batch/examples.html)
 
-For a complete overview over the APIs, have a look at the
-[DataStream API]({{ site.baseurl }}/dev/datastream_api.html) and
-[DataSet API]({{ site.baseurl }}/dev/batch/index.html) sections.
+有关 API 的完整概述,请查看
+[DataStream API]({{ site.baseurl }}/zh/dev/datastream_api.html) 和
+[DataSet API]({{ site.baseurl }}/zh/dev/batch/index.html) 章节。
 
-[Here]({{ site.baseurl }}/tutorials/local_setup.html) you can find out how to 
run an application outside the IDE on a local cluster.
+在[这里]({{ site.baseurl }}/zh/tutorials/local_setup.html),你可以找到如何在 IDE 
之外的本地集群中运行应用程序。
 
 Review comment:
   I think it can be accessed from this link: 

[jira] [Comment Edited] (FLINK-12849) Add support for build Python Docs in Buildbot

2019-06-19 Thread sunjincheng (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-12849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16868272#comment-16868272
 ] 

sunjincheng edited comment on FLINK-12849 at 6/20/19 5:47 AM:
--

We successfully built PythonDocs:

master: [https://ci.apache.org/builders/flink-docs-master/builds/1509] 

release-1.7: [https://ci.apache.org/builders/flink-docs-release-1.7/builds/204]

Then I'll close this JIRA. 

Thanks again for your help [~aljoscha] [~Zentol] :)

 


was (Author: sunjincheng121):
We successfully built PythonDocs: 
[https://ci.apache.org/builders/flink-docs-master/builds/1509]. Then I'll close 
this JIRA. 

Thanks again for your help [~aljoscha] [~Zentol] :)

 

> Add support for build Python Docs in Buildbot
> -
>
> Key: FLINK-12849
> URL: https://issues.apache.org/jira/browse/FLINK-12849
> Project: Flink
>  Issue Type: Sub-task
>  Components: API / Python, Build System
>Affects Versions: 1.9.0
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
> Fix For: 1.9.0
>
> Attachments: image-2019-06-14-16-14-35-439.png, 
> image-2019-06-20-13-45-40-876.png, python_docs.patch
>
>
> We should add the Python Doc for Python API, and add the link to the web 
> page. i.e.:
> !image-2019-06-14-16-14-35-439.png!
> In FLINK-12720 we will add how to generate the Python Docs, and in this PR we 
> should add support for build Python Docs in Buildbot. We may need to modify 
> the build config:
> [https://svn.apache.org/repos/infra/infrastructure/buildbot/aegis/buildmaster/master1/projects/flink.conf]
>  
> the Wiki of how to change the Buildbot code: 
> [https://cwiki.apache.org/confluence/display/FLINK/Managing+Flink+Documentation]
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (FLINK-12849) Add support for build Python Docs in Buildbot

2019-06-19 Thread sunjincheng (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-12849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sunjincheng closed FLINK-12849.
---
   Resolution: Fixed
Fix Version/s: 1.9.0

Fixed in SVN; the code change is as follows:

!image-2019-06-20-13-45-05-090.png!

> Add support for build Python Docs in Buildbot
> -
>
> Key: FLINK-12849
> URL: https://issues.apache.org/jira/browse/FLINK-12849
> Project: Flink
>  Issue Type: Sub-task
>  Components: API / Python, Build System
>Affects Versions: 1.9.0
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
> Fix For: 1.9.0
>
> Attachments: image-2019-06-14-16-14-35-439.png, 
> image-2019-06-20-13-45-40-876.png, python_docs.patch
>
>
> We should add the Python Doc for Python API, and add the link to the web 
> page. i.e.:
> !image-2019-06-14-16-14-35-439.png!
> In FLINK-12720 we will add how to generate the Python Docs, and in this PR we 
> should add support for build Python Docs in Buildbot. We may need to modify 
> the build config:
> [https://svn.apache.org/repos/infra/infrastructure/buildbot/aegis/buildmaster/master1/projects/flink.conf]
>  
> the Wiki of how to change the Buildbot code: 
> [https://cwiki.apache.org/confluence/display/FLINK/Managing+Flink+Documentation]
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (FLINK-12849) Add support for build Python Docs in Buildbot

2019-06-19 Thread sunjincheng (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-12849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16868273#comment-16868273
 ] 

sunjincheng edited comment on FLINK-12849 at 6/20/19 5:45 AM:
--

Fixed in SVN; the code change is as follows:

!image-2019-06-20-13-45-40-876.png!


was (Author: sunjincheng121):
Fixed in SVN; the code change is as follows:

!image-2019-06-20-13-45-05-090.png!

> Add support for build Python Docs in Buildbot
> -
>
> Key: FLINK-12849
> URL: https://issues.apache.org/jira/browse/FLINK-12849
> Project: Flink
>  Issue Type: Sub-task
>  Components: API / Python, Build System
>Affects Versions: 1.9.0
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
> Fix For: 1.9.0
>
> Attachments: image-2019-06-14-16-14-35-439.png, 
> image-2019-06-20-13-45-40-876.png, python_docs.patch
>
>
> We should add the Python Doc for Python API, and add the link to the web 
> page. i.e.:
> !image-2019-06-14-16-14-35-439.png!
> In FLINK-12720 we will add how to generate the Python Docs, and in this PR we 
> should add support for build Python Docs in Buildbot. We may need to modify 
> the build config:
> [https://svn.apache.org/repos/infra/infrastructure/buildbot/aegis/buildmaster/master1/projects/flink.conf]
>  
> the Wiki of how to change the Buildbot code: 
> [https://cwiki.apache.org/confluence/display/FLINK/Managing+Flink+Documentation]
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] [flink] zhijiangW commented on issue #8761: [FLINK-12842][network] Fix invalid check released state during ResultPartition#createSubpartitionView

2019-06-19 Thread GitBox
zhijiangW commented on issue #8761: [FLINK-12842][network] Fix invalid check 
released state during ResultPartition#createSubpartitionView
URL: https://github.com/apache/flink/pull/8761#issuecomment-503884044
 
 
   Thanks @zentol, it is already green.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (FLINK-12849) Add support for build Python Docs in Buildbot

2019-06-19 Thread sunjincheng (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-12849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16868272#comment-16868272
 ] 

sunjincheng commented on FLINK-12849:
-

We successfully built PythonDocs: 
[https://ci.apache.org/builders/flink-docs-master/builds/1509]. Then I'll close 
this JIRA. 

Thanks again for your help [~aljoscha] [~Zentol] :)

 

> Add support for build Python Docs in Buildbot
> -
>
> Key: FLINK-12849
> URL: https://issues.apache.org/jira/browse/FLINK-12849
> Project: Flink
>  Issue Type: Sub-task
>  Components: API / Python, Build System
>Affects Versions: 1.9.0
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
> Attachments: image-2019-06-14-16-14-35-439.png, python_docs.patch
>
>
> We should add the Python Doc for Python API, and add the link to the web 
> page. i.e.:
> !image-2019-06-14-16-14-35-439.png!
> In FLINK-12720 we will add how to generate the Python Docs, and in this PR we 
> should add support for build Python Docs in Buildbot. We may need to modify 
> the build config:
> [https://svn.apache.org/repos/infra/infrastructure/buildbot/aegis/buildmaster/master1/projects/flink.conf]
>  
> the Wiki of how to change the Buildbot code: 
> [https://cwiki.apache.org/confluence/display/FLINK/Managing+Flink+Documentation]
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] [flink] synckey commented on a change in pull request #8766: [FLINK-12664][hive] Implement TableSink to write Hive tables

2019-06-19 Thread GitBox
synckey commented on a change in pull request #8766: [FLINK-12664][hive] 
Implement TableSink to write Hive tables
URL: https://github.com/apache/flink/pull/8766#discussion_r295649613
 
 

 ##
 File path: 
flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/batch/connectors/hive/HiveTableSink.java
 ##
 @@ -0,0 +1,155 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.batch.connectors.hive;
+
+import org.apache.flink.api.common.io.OutputFormat;
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.api.java.typeutils.RowTypeInfo;
+import org.apache.flink.table.catalog.exceptions.CatalogException;
+import org.apache.flink.table.catalog.hive.client.HiveMetastoreClientFactory;
+import org.apache.flink.table.catalog.hive.client.HiveMetastoreClientWrapper;
+import org.apache.flink.table.catalog.hive.client.HiveShimLoader;
+import org.apache.flink.table.catalog.hive.descriptors.HiveCatalogValidator;
+import org.apache.flink.table.sinks.OutputFormatTableSink;
+import org.apache.flink.table.sinks.TableSink;
+import org.apache.flink.types.Row;
+import org.apache.flink.util.FlinkRuntimeException;
+import org.apache.flink.util.Preconditions;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.MetaStoreUtils;
+import org.apache.hadoop.hive.metastore.Warehouse;
+import org.apache.hadoop.hive.metastore.api.Partition;
+import org.apache.hadoop.hive.metastore.api.StorageDescriptor;
+import org.apache.hadoop.hive.metastore.api.Table;
+import org.apache.hadoop.mapred.JobConf;
+import org.apache.thrift.TException;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.LinkedHashMap;
+import java.util.List;
+
+/**
+ * Table sink to write to Hive tables.
+ */
+public class HiveTableSink extends OutputFormatTableSink<Row> {
+
+   private final JobConf jobConf;
+   private final RowTypeInfo rowTypeInfo;
+   private final String dbName;
+   private final String tableName;
+   private final List<String> partitionColumns;
+   private final String hiveVersion;
+
+   // TODO: need OverwritableTableSink to configure this
+   private boolean overwrite = false;
+
+   public HiveTableSink(JobConf jobConf, RowTypeInfo rowTypeInfo, String 
dbName, String tableName,
+   List<String> partitionColumns) {
+   this.jobConf = jobConf;
+   this.rowTypeInfo = rowTypeInfo;
+   this.dbName = dbName;
+   this.tableName = tableName;
+   this.partitionColumns = partitionColumns;
+   hiveVersion = 
jobConf.get(HiveCatalogValidator.CATALOG_HIVE_VERSION, 
HiveShimLoader.getHiveVersion());
+   }
+
+   @Override
+   public OutputFormat<Row> getOutputFormat() {
+   boolean isPartitioned = partitionColumns != null && 
!partitionColumns.isEmpty();
+   // TODO: need PartitionableTableSink to decide whether it's 
dynamic partitioning
+   boolean isDynamicPartition = isPartitioned;
+   try (HiveMetastoreClientWrapper client = 
HiveMetastoreClientFactory.create(new HiveConf(jobConf, HiveConf.class), 
hiveVersion)) {
+   Table table = client.getTable(dbName, tableName);
+   StorageDescriptor sd = table.getSd();
+   // here we use the sdLocation to store the output path 
of the job, which is always a staging dir
+   String sdLocation = sd.getLocation();
+   HiveTablePartition hiveTablePartition;
+   if (isPartitioned) {
+   // TODO: validate partition spec
+   // TODO: strip quotes in partition values
+   LinkedHashMap<String, String> strippedPartSpec = new LinkedHashMap<>();
 
 Review comment:
   Is there a reason to make strippedPartSpec `LinkedHashMap` not `Map`?
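   For readers following along, the practical difference the declared type hints at: a 
`LinkedHashMap` preserves the insertion order of the partition columns, which matters if the 
spec is later rendered into an ordered partition path, whereas a plain `Map`/`HashMap` gives 
no such guarantee. A tiny self-contained illustration (the column names and values are made up):

   import java.util.HashMap;
   import java.util.LinkedHashMap;
   import java.util.Map;

   public class PartSpecOrderDemo {
       public static void main(String[] args) {
           // LinkedHashMap iterates in insertion order, so "year" stays before "month",
           // which is what you want when building a path like .../year=2019/month=06.
           Map<String, String> ordered = new LinkedHashMap<>();
           ordered.put("year", "2019");
           ordered.put("month", "06");
           System.out.println(ordered); // {year=2019, month=06}

           // HashMap makes no ordering promise at all.
           Map<String, String> unordered = new HashMap<>(ordered);
           System.out.println(unordered); // order is unspecified
       }
   }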


This is an automated message from the Apache Git Service.
To 

[GitHub] [flink] synckey commented on a change in pull request #8766: [FLINK-12664][hive] Implement TableSink to write Hive tables

2019-06-19 Thread GitBox
synckey commented on a change in pull request #8766: [FLINK-12664][hive] 
Implement TableSink to write Hive tables
URL: https://github.com/apache/flink/pull/8766#discussion_r295648808
 
 

 ##
 File path: 
flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/batch/connectors/hive/HiveTableSink.java
 ##
 @@ -0,0 +1,155 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.batch.connectors.hive;
+
+import org.apache.flink.api.common.io.OutputFormat;
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.api.java.typeutils.RowTypeInfo;
+import org.apache.flink.table.catalog.exceptions.CatalogException;
+import org.apache.flink.table.catalog.hive.client.HiveMetastoreClientFactory;
+import org.apache.flink.table.catalog.hive.client.HiveMetastoreClientWrapper;
+import org.apache.flink.table.catalog.hive.client.HiveShimLoader;
+import org.apache.flink.table.catalog.hive.descriptors.HiveCatalogValidator;
+import org.apache.flink.table.sinks.OutputFormatTableSink;
+import org.apache.flink.table.sinks.TableSink;
+import org.apache.flink.types.Row;
+import org.apache.flink.util.FlinkRuntimeException;
+import org.apache.flink.util.Preconditions;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.MetaStoreUtils;
+import org.apache.hadoop.hive.metastore.Warehouse;
+import org.apache.hadoop.hive.metastore.api.Partition;
+import org.apache.hadoop.hive.metastore.api.StorageDescriptor;
+import org.apache.hadoop.hive.metastore.api.Table;
+import org.apache.hadoop.mapred.JobConf;
+import org.apache.thrift.TException;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.LinkedHashMap;
+import java.util.List;
+
+/**
+ * Table sink to write to Hive tables.
+ */
+public class HiveTableSink extends OutputFormatTableSink<Row> {
+
+   private final JobConf jobConf;
+   private final RowTypeInfo rowTypeInfo;
+   private final String dbName;
+   private final String tableName;
+   private final List<String> partitionColumns;
+   private final String hiveVersion;
+
+   // TODO: need OverwritableTableSink to configure this
+   private boolean overwrite = false;
 
 Review comment:
   lint: would you move the initialization of `overwrite` to the constructor?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] synckey commented on a change in pull request #8766: [FLINK-12664][hive] Implement TableSink to write Hive tables

2019-06-19 Thread GitBox
synckey commented on a change in pull request #8766: [FLINK-12664][hive] 
Implement TableSink to write Hive tables
URL: https://github.com/apache/flink/pull/8766#discussion_r295648942
 
 

 ##
 File path: 
flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/batch/connectors/hive/HiveTableSink.java
 ##
 @@ -0,0 +1,155 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.batch.connectors.hive;
+
+import org.apache.flink.api.common.io.OutputFormat;
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.api.java.typeutils.RowTypeInfo;
+import org.apache.flink.table.catalog.exceptions.CatalogException;
+import org.apache.flink.table.catalog.hive.client.HiveMetastoreClientFactory;
+import org.apache.flink.table.catalog.hive.client.HiveMetastoreClientWrapper;
+import org.apache.flink.table.catalog.hive.client.HiveShimLoader;
+import org.apache.flink.table.catalog.hive.descriptors.HiveCatalogValidator;
+import org.apache.flink.table.sinks.OutputFormatTableSink;
+import org.apache.flink.table.sinks.TableSink;
+import org.apache.flink.types.Row;
+import org.apache.flink.util.FlinkRuntimeException;
+import org.apache.flink.util.Preconditions;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.MetaStoreUtils;
+import org.apache.hadoop.hive.metastore.Warehouse;
+import org.apache.hadoop.hive.metastore.api.Partition;
+import org.apache.hadoop.hive.metastore.api.StorageDescriptor;
+import org.apache.hadoop.hive.metastore.api.Table;
+import org.apache.hadoop.mapred.JobConf;
+import org.apache.thrift.TException;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.LinkedHashMap;
+import java.util.List;
+
+/**
+ * Table sink to write to Hive tables.
+ */
+public class HiveTableSink extends OutputFormatTableSink<Row> {
+
+   private final JobConf jobConf;
+   private final RowTypeInfo rowTypeInfo;
+   private final String dbName;
+   private final String tableName;
+   private final List<String> partitionColumns;
+   private final String hiveVersion;
+
+   // TODO: need OverwritableTableSink to configure this
+   private boolean overwrite = false;
+
+   public HiveTableSink(JobConf jobConf, RowTypeInfo rowTypeInfo, String 
dbName, String tableName,
+   List<String> partitionColumns) {
+   this.jobConf = jobConf;
+   this.rowTypeInfo = rowTypeInfo;
+   this.dbName = dbName;
+   this.tableName = tableName;
+   this.partitionColumns = partitionColumns;
+   hiveVersion = 
jobConf.get(HiveCatalogValidator.CATALOG_HIVE_VERSION, 
HiveShimLoader.getHiveVersion());
 
 Review comment:
   lint: would you also add a `this` before `hiveVersion`?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] synckey commented on a change in pull request #8799: [FLINK-11612][chinese-translation,Documentation] Translate the "Proje…

2019-06-19 Thread GitBox
synckey commented on a change in pull request #8799: 
[FLINK-11612][chinese-translation,Documentation] Translate the "Proje…
URL: https://github.com/apache/flink/pull/8799#discussion_r295647061
 
 

 ##
 File path: docs/dev/projectsetup/java_api_quickstart.zh.md
 ##
 @@ -304,51 +297,46 @@ quickstart/
 └── log4j.properties
 {% endhighlight %}
 
-The sample project is a __Gradle project__, which contains two classes: 
_StreamingJob_ and _BatchJob_ are the basic skeleton programs for a 
*DataStream* and *DataSet* program.
-The _main_ method is the entry point of the program, both for in-IDE 
testing/execution and for proper deployments.
+示例项目是一个 __Gradle 项目__,它包含了两个类:_StreamingJob_ 和 _BatchJob_ 是 *DataStream* 和 
*DataSet* 程序的基础骨架程序。
+_main_ 方法是程序的入口,即可用于IDE测试/执行,也可用于部署。
 
-We recommend you __import this project into your IDE__ to develop and
-test it. IntelliJ IDEA supports Gradle projects after installing the `Gradle` 
plugin.
-Eclipse does so via the [Eclipse 
Buildship](https://projects.eclipse.org/projects/tools.buildship) plugin
-(make sure to specify a Gradle version >= 3.0 in the last step of the import 
wizard; the `shadow` plugin requires it).
-You may also use [Gradle's IDE 
integration](https://docs.gradle.org/current/userguide/userguide.html#ide-integration)
-to create project files from Gradle.
+我们建议你将 __此项目导入你的 IDE__ 来开发和测试它。
+IntelliJ IDEA 在安装 `Gradle` 插件后支持 Gradle 项目。Eclipse 则通过 [Eclipse 
Buildship](https://projects.eclipse
+.org/projects/tools.buildship) 插件支持 Gradle 项目(鉴于 `shadow` 插件对 Gradle 
版本有要求,请确保在导入向导的最后一步指定 Gradle 版本 >= 3.0)。
+你也可以使用 [Gradle’s IDE 
integration](https://docs.gradle.org/current/userguide/userguide.html#ide-integration)
 从 Gradle 
+创建项目文件。
 
 
-*Please note*: The default JVM heapsize for Java may be too
-small for Flink. You have to manually increase it.
-In Eclipse, choose `Run Configurations -> Arguments` and write into the `VM 
Arguments` box: `-Xmx800m`.
-In IntelliJ IDEA recommended way to change JVM options is from the `Help | 
Edit Custom VM Options` menu. See [this 
article](https://intellij-support.jetbrains.com/hc/en-us/articles/206544869-Configuring-JVM-options-and-platform-properties)
 for details.
+*请注意*:对 Flink 来说,默认的 JVM 堆内存可能太小,你应当手动增加堆内存。
+在 Eclipse中,选择 `Run Configurations -> Arguments` 并在 `VM Arguments` 
对应的输入框中写入:`-Xmx800m`。
+在 IntelliJ IDEA 中,推荐从菜单 `Help | Edit Custom VM Options` 来修改 JVM 
选项。有关详细信息,请参阅[此文章](https://intellij-support
+.jetbrains.com/hc/en-us/articles/206544869-Configuring-JVM-options-and-platform-properties)。
 
-### Build Project
+### 构建项目
 
-If you want to __build/package your project__, go to your project directory and
-run the '`gradle clean shadowJar`' command.
-You will __find a JAR file__ that contains your application, plus connectors 
and libraries
-that you may have added as dependencies to the application: 
`build/libs/<project-name>-<version>-all.jar`.
+如果你想要 __构建/打包项目__,请在项目目录下运行 '`gradle clean shadowJar`' 命令。
+命令执行后,你将 __找到一个 JAR 
文件__,里面包含了你的应用程序,以及已作为依赖项添加到应用程序的连接器和库:`build/libs/<project-name>-<version>-all.jar`。
 
 
-__Note:__ If you use a different class than *StreamingJob* as the 
application's main class / entry point,
-we recommend you change the `mainClassName` setting in the `build.gradle` file 
accordingly. That way, Flink
-can run the application from the JAR file without additionally specifying the 
main class.
+__注意:__ 如果你使用其他类而不是 *StreamingJob* 作为应用程序的主类/入口,
+我们建议你相应地修改 `build.gradle` 文件中的 `mainClassName` 配置。
+这样,Flink 可以从 JAR 文件运行应用程序,而无需另外指定主类。
 
-## Next Steps
+## 下一步
 
-Write your application!
+开始编写应用!
 
-If you are writing a streaming application and you are looking for inspiration 
what to write,
-take a look at the [Stream Processing Application Tutorial]({{ site.baseurl 
}}/tutorials/datastream_api.html#writing-a-flink-program).
+如果你准备编写流处理应用,正在寻找灵感来写什么,
+可以看看[流处理应用程序教程]({{ site.baseurl 
}}/zh/tutorials/datastream_api.html#writing-a-flink-program)
 
-If you are writing a batch processing application and you are looking for 
inspiration what to write,
-take a look at the [Batch Application Examples]({{ site.baseurl 
}}/dev/batch/examples.html).
+如果你准备编写批处理应用,正在寻找灵感来写什么,
+可以看看[批处理应用程序示例]({{ site.baseurl }}/zh/dev/batch/examples.html)
 
-For a complete overview over the APIs, have a look at the
-[DataStream API]({{ site.baseurl }}/dev/datastream_api.html) and
-[DataSet API]({{ site.baseurl }}/dev/batch/index.html) sections.
+有关 API 的完整概述,请查看
+[DataStream API]({{ site.baseurl }}/zh/dev/datastream_api.html) 和
+[DataSet API]({{ site.baseurl }}/zh/dev/batch/index.html) 章节。
 
-[Here]({{ site.baseurl }}/tutorials/local_setup.html) you can find out how to 
run an application outside the IDE on a local cluster.
+在[这里]({{ site.baseurl }}/zh/tutorials/local_setup.html),你可以找到如何在 IDE 
之外的本地集群中运行应用程序。
 
 Review comment:
   Looks like this link https://flink.apache.org/zh/tutorials/local_setup.html 
is dead.


This is an automated message from the Apache Git Service.
To 

[GitHub] [flink] synckey commented on a change in pull request #8799: [FLINK-11612][chinese-translation,Documentation] Translate the "Proje…

2019-06-19 Thread GitBox
synckey commented on a change in pull request #8799: 
[FLINK-11612][chinese-translation,Documentation] Translate the "Proje…
URL: https://github.com/apache/flink/pull/8799#discussion_r295646328
 
 

 ##
 File path: docs/dev/projectsetup/java_api_quickstart.zh.md
 ##
 @@ -219,15 +215,15 @@ configurations {
 // declare the dependencies for your production and test code
 dependencies {
 // --
-// Compile-time dependencies that should NOT be part of the
-// shadow jar and are provided in the lib folder of Flink
+// 编译时依赖不应该包含在 shadow jar 中,
+// 这些依赖会在 Flink 的 lib 目录中提供。
 // --
 compile "org.apache.flink:flink-java:${flinkVersion}"
 compile 
"org.apache.flink:flink-streaming-java_${scalaBinaryVersion}:${flinkVersion}"
 
 // --
-// Dependencies that should be part of the shadow jar, e.g.
-// connectors. These must be in the flinkShadowJar configuration!
+// 应该包含在 shadow jar 中的依赖,例如:连接器。
+// 这些必须在 flinkShadowJar 的配置中!
 
 Review comment:
   What about `它们` instead of `他们`?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] synckey commented on a change in pull request #8799: [FLINK-11612][chinese-translation,Documentation] Translate the "Proje…

2019-06-19 Thread GitBox
synckey commented on a change in pull request #8799: 
[FLINK-11612][chinese-translation,Documentation] Translate the "Proje…
URL: https://github.com/apache/flink/pull/8799#discussion_r295645437
 
 

 ##
 File path: docs/dev/projectsetup/java_api_quickstart.zh.md
 ##
 @@ -74,16 +74,17 @@ Use one of the following commands to __create a project__:
 
 {% unless site.is_stable %}
 
-Note: For Maven 3.0 or higher, it is no longer possible to 
specify the repository (-DarchetypeCatalog) via the command line. If you wish 
to use the snapshot repository, you need to add a repository entry to your 
settings.xml. For details about this change, please refer to the Maven official 
document: http://maven.apache.org/archetype/maven-archetype-plugin/archetype-repository.html
+注意:对于 Maven 3.0 及更高版本,不再支持通过命令行指定仓库(-DarchetypeCatalog)。
 
 Review comment:
   Would you remove `对于`?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Comment Edited] (FLINK-12886) Support container memory segment

2019-06-19 Thread Jingsong Lee (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-12886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16868265#comment-16868265
 ] 

Jingsong Lee edited comment on FLINK-12886 at 6/20/19 5:16 AM:
---

Going back to the original intention of this JIRA and considering option 2: what kind of 
code do you want to optimize? Maybe you can dig into the detailed code 
instead of introducing something new, big, and abstract. Maybe just a little bit 
of code refactoring is needed to achieve good results.


was (Author: lzljs3620320):
 

Going back to the original intention of this JIRA and considering option 2: what kind of 
code do you want to optimize? Maybe you can dig into the detailed code 
instead of introducing something new, big, and abstract. Maybe just a little bit 
of code refactoring is needed to achieve good results.

> Support container memory segment
> 
>
> Key: FLINK-12886
> URL: https://issues.apache.org/jira/browse/FLINK-12886
> Project: Flink
>  Issue Type: New Feature
>  Components: Table SQL / Runtime
>Reporter: Liya Fan
>Assignee: Liya Fan
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2019-06-18-17-59-42-136.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We observe that in many scenarios, the operations/algorithms are based on an 
> array of MemorySegment. These memory segments form a large, combined, and 
> continuous memory space.
> For example, suppose we have an array of n memory segments. Memory addresses 
> from 0 to segment_size - 1 are served by the first memory segment; memory 
> addresses from segment_size to 2 * segment_size - 1 are served by the second 
> memory segment, and so on.
> Specific algorithms decide the actual MemorySegment to serve the operation 
> requests. For some rare cases, two or more memory segments serve the 
> requests. There are many operations based on such a paradigm, for example, 
> {{BinaryString#matchAt}}, {{SegmentsUtil#copyToBytes}}, 
> {{LongHashPartition#MatchIterator#get}}, etc.
> The problem is that, for memory segment array based operations, a large amount 
> of code is devoted to
> 1. Computing the memory segment index & offset within the memory segment.
>  2. Processing boundary cases. For example, to write an integer, there are 
> only 2 bytes left in the first memory segment, and the remaining 2 bytes must 
> be written to the next memory segment.
>  3. Differentiate processing for short/long data. For example, when copying 
> memory data to a byte array. Different methods are implemented for cases when 
> 1) the data fits in a single segment; 2) the data spans multiple segments.
> Therefore, there is much duplicated code to achieve the above purposes. What is 
> worse, this paradigm significantly increases the amount of code, making the 
> code more difficult to read and maintain. Furthermore, it easily gives rise 
> to bugs which are difficult to find and debug.
> To address these problems, we propose a new type of memory segment: 
> {{ContainerMemorySegment}}. It is based on an array of underlying memory 
> segments with the same size. It extends from the {{MemorySegment}} base 
> class, so it provides all the functionalities provided by {{MemorySegment}}. 
> In addition, it hides all the details for dealing with specific memory 
> segments, and acts as if it were a big continuous memory region.
> A prototype implementation is given below:
>  !image-2019-06-18-17-59-42-136.png|thumbnail! 
> With this new type of memory segment, many operations/algorithms can be 
> greatly simplified, without affecting performance. This is because,
> 1. Many checks, boundary processing are already there. We just move them to 
> the new class.
>  2. We optimize the implementation of the new class, so the special 
> optimizations (e.g. optimizations for short data) are still preserved.
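To make the index/offset bookkeeping and the boundary case described above concrete, a toy 
sketch (segment size and names are made up; this is not Flink's MemorySegment or SegmentsUtil 
code):

/** Toy model of writing an int into an array of fixed-size byte "segments". */
public class SegmentArithmeticDemo {

    static final int SEGMENT_SIZE = 8; // deliberately tiny to force boundary cases

    static void writeInt(byte[][] segments, int position, int value) {
        int segIndex = position / SEGMENT_SIZE; // which segment serves this address
        int offset = position % SEGMENT_SIZE;   // where inside that segment

        if (offset + 4 <= SEGMENT_SIZE) {
            // fast path: the whole int fits into one segment
            for (int i = 0; i < 4; i++) {
                segments[segIndex][offset + i] = (byte) (value >>> (8 * i));
            }
        } else {
            // boundary case: the int spans two segments, write byte by byte
            for (int i = 0; i < 4; i++) {
                int p = position + i;
                segments[p / SEGMENT_SIZE][p % SEGMENT_SIZE] = (byte) (value >>> (8 * i));
            }
        }
    }

    public static void main(String[] args) {
        byte[][] segments = new byte[2][SEGMENT_SIZE];
        writeInt(segments, 6, 0x11223344); // only 2 bytes left in the first segment
    }
}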



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-12886) Support container memory segment

2019-06-19 Thread Jingsong Lee (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-12886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16868265#comment-16868265
 ] 

Jingsong Lee commented on FLINK-12886:
--

 

Going back to the original intention of this JIRA and considering option 2: what kind of 
code do you want to optimize? Maybe you can dig into the detailed code 
instead of introducing something new, big, and abstract. Maybe just a little bit 
of code refactoring is needed to achieve good results.

> Support container memory segment
> 
>
> Key: FLINK-12886
> URL: https://issues.apache.org/jira/browse/FLINK-12886
> Project: Flink
>  Issue Type: New Feature
>  Components: Table SQL / Runtime
>Reporter: Liya Fan
>Assignee: Liya Fan
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2019-06-18-17-59-42-136.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We observe that in many scenarios, the operations/algorithms are based on an 
> array of MemorySegment. These memory segments form a large, combined, and 
> continuous memory space.
> For example, suppose we have an array of n memory segments. Memory addresses 
> from 0 to segment_size - 1 are served by the first memory segment; memory 
> addresses from segment_size to 2 * segment_size - 1 are served by the second 
> memory segment, and so on.
> Specific algorithms decide the actual MemorySegment to serve the operation 
> requests. For some rare cases, two or more memory segments serve the 
> requests. There are many operations based on such a paradigm, for example, 
> {{BinaryString#matchAt}}, {{SegmentsUtil#copyToBytes}}, 
> {{LongHashPartition#MatchIterator#get}}, etc.
> The problem is that, for memory segment array based operations, a large amount 
> of code is devoted to
> 1. Computing the memory segment index & offset within the memory segment.
>  2. Processing boundary cases. For example, to write an integer, there are 
> only 2 bytes left in the first memory segment, and the remaining 2 bytes must 
> be written to the next memory segment.
>  3. Differentiate processing for short/long data. For example, when copying 
> memory data to a byte array. Different methods are implemented for cases when 
> 1) the data fits in a single segment; 2) the data spans multiple segments.
> Therefore, there is much duplicated code to achieve the above purposes. What is 
> worse, this paradigm significantly increases the amount of code, making the 
> code more difficult to read and maintain. Furthermore, it easily gives rise 
> to bugs which are difficult to find and debug.
> To address these problems, we propose a new type of memory segment: 
> {{ContainerMemorySegment}}. It is based on an array of underlying memory 
> segments with the same size. It extends from the {{MemorySegment}} base 
> class, so it provides all the functionalities provided by {{MemorySegment}}. 
> In addition, it hides all the details for dealing with specific memory 
> segments, and acts as if it were a big continuous memory region.
> A prototype implementation is given below:
>  !image-2019-06-18-17-59-42-136.png|thumbnail! 
> With this new type of memory segment, many operations/algorithms can be 
> greatly simplified, without affecting performance. This is because,
> 1. Many checks, boundary processing are already there. We just move them to 
> the new class.
>  2. We optimize the implementation of the new class, so the special 
> optimizations (e.g. optimizations for short data) are still preserved.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-12886) Support container memory segment

2019-06-19 Thread Kurt Young (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-12886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16868254#comment-16868254
 ] 

Kurt Young commented on FLINK-12886:


Generally speaking, I would vote for option 2, but let's first decide whether it's 
worth having a new utility class. 

> Support container memory segment
> 
>
> Key: FLINK-12886
> URL: https://issues.apache.org/jira/browse/FLINK-12886
> Project: Flink
>  Issue Type: New Feature
>  Components: Table SQL / Runtime
>Reporter: Liya Fan
>Assignee: Liya Fan
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2019-06-18-17-59-42-136.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We observe that in many scenarios, the operations/algorithms are based on an 
> array of MemorySegment. These memory segments form a large, combined, and 
> continuous memory space.
> For example, suppose we have an array of n memory segments. Memory addresses 
> from 0 to segment_size - 1 are served by the first memory segment; memory 
> addresses from segment_size to 2 * segment_size - 1 are served by the second 
> memory segment, and so on.
> Specific algorithms decide the actual MemorySegment to serve the operation 
> requests. For some rare cases, two or more memory segments serve the 
> requests. There are many operations based on such a paradigm, for example, 
> {{BinaryString#matchAt}}, {{SegmentsUtil#copyToBytes}}, 
> {{LongHashPartition#MatchIterator#get}}, etc.
> The problem is that, for memory segment array based operations, a large amount 
> of code is devoted to
> 1. Computing the memory segment index & offset within the memory segment.
>  2. Processing boundary cases. For example, to write an integer, there are 
> only 2 bytes left in the first memory segment, and the remaining 2 bytes must 
> be written to the next memory segment.
>  3. Differentiate processing for short/long data. For example, when copying 
> memory data to a byte array. Different methods are implemented for cases when 
> 1) the data fits in a single segment; 2) the data spans multiple segments.
> Therefore, there is much duplicated code to achieve the above purposes. What is 
> worse, this paradigm significantly increases the amount of code, making the 
> code more difficult to read and maintain. Furthermore, it easily gives rise 
> to bugs which are difficult to find and debug.
> To address these problems, we propose a new type of memory segment: 
> {{ContainerMemorySegment}}. It is based on an array of underlying memory 
> segments with the same size. It extends from the {{MemorySegment}} base 
> class, so it provides all the functionalities provided by {{MemorySegment}}. 
> In addition, it hides all the details for dealing with specific memory 
> segments, and acts as if it were a big continuous memory region.
> A prototype implementation is given below:
>  !image-2019-06-18-17-59-42-136.png|thumbnail! 
> With this new type of memory segment, many operations/algorithms can be 
> greatly simplified, without affecting performance. This is because,
> 1. Many checks, boundary processing are already there. We just move them to 
> the new class.
>  2. We optimize the implementation of the new class, so the special 
> optimizations (e.g. optimizations for short data) are still preserved.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-12886) Support container memory segment

2019-06-19 Thread Liya Fan (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-12886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16868251#comment-16868251
 ] 

Liya Fan commented on FLINK-12886:
--

[~ykt836] [~lzljs3620320], here are two ideas to resolve the performance 
degradation. Would you please give some comments?

1. Let ContainerMemorySegment and MemorySegment extend a common super 
interface, which defines the basic operations for accessing data:

{code:java}
public interface MemoryAccessible {

    int getInt(int index);

    void setInt(int index, int value);

    ...
}

public class MemorySegment implements MemoryAccessible ...

public class ContainerMemorySegment implements MemoryAccessible ...
{code}

 

With this approach, the MemorySegment class hierarchy is unaffected, so code that 
depends on MemorySegment does not take a performance hit. In addition, code 
that expects a MemoryAccessible can accept both a MemorySegment and a 
ContainerMemorySegment, as shown in the sketch below.
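
A tiny usage sketch, assuming the MemoryAccessible interface above (the helper 
method name is made up for this example):

{code:java}
// Hypothetical caller that only needs read access; with option 1 it can accept
// either a plain MemorySegment or a ContainerMemorySegment.
public static int readIntAt(MemoryAccessible memory, int index) {
    return memory.getInt(index);
}
{code}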

 

2. ContainerMemorySegment no longer inherits from MemorySegment. In this way, 
ContainerMemorySegment just acts as a wrapper around a set of MemorySegments. So 
wherever a MemorySegment is expected, a ContainerMemorySegment cannot be 
provided. 

Also, ContainerMemorySegment can be moved to the blink-runtime module, because it 
is not a general MemorySegment. A rough sketch of such a wrapper is given below.
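
For illustration only, a minimal sketch of option 2, assuming equally sized 
underlying segments. It shows the segment index & offset computation and one 
boundary case that the proposal wants to centralize; byte-order handling is 
simplified and the class shape is not the actual proposal:

{code:java}
// Illustrative wrapper over an array of equally sized MemorySegments. It centralizes
// the segment index & offset computation and the cross-segment boundary handling
// described in the issue; byte order in the boundary path is simplified.
public class ContainerMemorySegment {

    private final MemorySegment[] segments;
    private final int segmentSize;

    public ContainerMemorySegment(MemorySegment[] segments, int segmentSize) {
        this.segments = segments;
        this.segmentSize = segmentSize;
    }

    public int getInt(int globalOffset) {
        int segIndex = globalOffset / segmentSize;
        int offsetInSeg = globalOffset % segmentSize;
        if (offsetInSeg + 4 <= segmentSize) {
            // Fast path: the int lies entirely within one segment.
            return segments[segIndex].getInt(offsetInSeg);
        }
        // Boundary case: the int spans two adjacent segments, so read it byte by byte.
        int result = 0;
        for (int i = 0; i < 4; i++) {
            int pos = globalOffset + i;
            int b = segments[pos / segmentSize].get(pos % segmentSize) & 0xFF;
            result |= b << (8 * i);
        }
        return result;
    }
}
{code}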

 

 

 

> Support container memory segment
> 
>
> Key: FLINK-12886
> URL: https://issues.apache.org/jira/browse/FLINK-12886
> Project: Flink
>  Issue Type: New Feature
>  Components: Table SQL / Runtime
>Reporter: Liya Fan
>Assignee: Liya Fan
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2019-06-18-17-59-42-136.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We observe that in many scenarios, the operations/algorithms are based on an 
> array of MemorySegment. These memory segments form a large, combined, and 
> continuous memory space.
> For example, suppose we have an array of n memory segments. Memory addresses 
> from 0 to segment_size - 1 are served by the first memory segment; memory 
> addresses from segment_size to 2 * segment_size - 1 are served by the second 
> memory segment, and so on.
> Specific algorithms decide the actual MemorySegment to serve the operation 
> requests. For some rare cases, two or more memory segments serve the 
> requests. There are many operations based on such a paradigm, for example, 
> {{BinaryString#matchAt}}, {{SegmentsUtil#copyToBytes}}, 
> {{LongHashPartition#MatchIterator#get}}, etc.
> The problem is that, for memory segment array based operations, a large amount 
> of code is devoted to
> 1. Computing the memory segment index & offset within the memory segment.
>  2. Processing boundary cases. For example, to write an integer, there may be 
> only 2 bytes left in the first memory segment, and the remaining 2 bytes must 
> be written to the next memory segment.
>  3. Differentiated processing for short/long data. For example, when copying 
> memory data to a byte array, different methods are implemented for cases when 
> 1) the data fits in a single segment; 2) the data spans multiple segments.
> Therefore, there is much duplicated code to achieve the above purposes. What is 
> worse, this paradigm significantly increases the amount of code, making the 
> code more difficult to read and maintain. Furthermore, it easily gives rise 
> to bugs which are difficult to find and debug.
> To address these problems, we propose a new type of memory segment: 
> {{ContainerMemorySegment}}. It is based on an array of underlying memory 
> segments with the same size. It extends from the {{MemorySegment}} base 
> class, so it provides all the functionalities provided by {{MemorySegment}}. 
> In addition, it hides all the details for dealing with specific memory 
> segments, and acts as if it were a big continuous memory region.
> A prototype implementation is given below:
>  !image-2019-06-18-17-59-42-136.png|thumbnail! 
> With this new type of memory segment, many operations/algorithms can be 
> greatly simplified, without affecting performance. This is because,
> 1. Many checks, boundary processing are already there. We just move them to 
> the new class.
>  2. We optimize the implementation of the new class, so the special 
> optimizations (e.g. optimizations for short data) are still preserved.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] [flink] flinkbot commented on issue #8801: [hotfix][FLINK-12896][HistoryServer] modify :jobId key in TaskCheckpointStatisticDetailsHandler

2019-06-19 Thread GitBox
flinkbot commented on issue #8801: [hotfix][FLINK-12896][HistoryServer] modify 
:jobId key in TaskCheckpointStatisticDetailsHandler
URL: https://github.com/apache/flink/pull/8801#issuecomment-503835553
 
 
   Thanks a lot for your contribution to the Apache Flink project. I'm the 
@flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress 
of the review.
   
   
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review 
Guide](https://flink.apache.org/reviewing-prs.html) for a full explanation of 
the review process.
The Bot is tracking the review progress through labels. Labels are applied 
according to the order of the review items. For consensus, approval by a Flink 
committer or PMC member is required.

Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot approve description` to approve one or more aspects (aspects: 
`description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until 
`architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's 
attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Updated] (FLINK-12896) Missing some information in History Server

2019-06-19 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-12896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-12896:
---
Labels: pull-request-available  (was: )

> Missing some information in History Server
> --
>
> Key: FLINK-12896
> URL: https://issues.apache.org/jira/browse/FLINK-12896
> Project: Flink
>  Issue Type: Bug
>Reporter: xymaqingxiang
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2019-06-19-15-32-15-994.png, 
> image-2019-06-19-15-33-12-243.png, image-2019-06-19-15-40-08-481.png, 
> image-2019-06-19-15-41-48-051.png
>
>
> There are three bugs, as follows:
> 1. Could not find the metrics for vertices.
> !image-2019-06-19-15-33-12-243.png!
> 2. Could not find the checkpoint details for subtasks.
> !image-2019-06-19-15-32-15-994.png!
> 3. There is an exception with the jobs directory: in the job directory, the 
> ArchivedJson we get in FsJobArchivist is wrong.
> !image-2019-06-19-15-40-08-481.png!
> !image-2019-06-19-15-41-48-051.png!
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] [flink] maqingxiang opened a new pull request #8801: [hotfix][FLINK-12896][HistoryServer] modify :jobId key in TaskCheckpointStatisticDetailsHandler

2019-06-19 Thread GitBox
maqingxiang opened a new pull request #8801: 
[hotfix][FLINK-12896][HistoryServer] modify :jobId key in 
TaskCheckpointStatisticDetailsHandler
URL: https://github.com/apache/flink/pull/8801
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] bowenli86 commented on issue #8766: [FLINK-12664][hive] Implement TableSink to write Hive tables

2019-06-19 Thread GitBox
bowenli86 commented on issue #8766: [FLINK-12664][hive] Implement TableSink to 
write Hive tables
URL: https://github.com/apache/flink/pull/8766#issuecomment-503830225
 
 
   @lirui-apache Thanks for your contribution.
   
   Will merge once the CI passes


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] flinkbot commented on issue #8800: [FLINK-12627][doc][sql client][hive] Document how to configure and use catalogs in SQL CLI

2019-06-19 Thread GitBox
flinkbot commented on issue #8800: [FLINK-12627][doc][sql client][hive] 
Document how to configure and use catalogs in SQL CLI
URL: https://github.com/apache/flink/pull/8800#issuecomment-503830112
 
 
   Thanks a lot for your contribution to the Apache Flink project. I'm the 
@flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress 
of the review.
   
   
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review 
Guide](https://flink.apache.org/reviewing-prs.html) for a full explanation of 
the review process.
The Bot is tracking the review progress through labels. Labels are applied 
according to the order of the review items. For consensus, approval by a Flink 
committer or PMC member is required.

Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot approve description` to approve one or more aspects (aspects: 
`description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until 
`architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's 
attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] bowenli86 opened a new pull request #8800: [FLINK-12627][doc][sql client][hive] Document how to configure and use catalogs in SQL CLI

2019-06-19 Thread GitBox
bowenli86 opened a new pull request #8800: [FLINK-12627][doc][sql client][hive] 
Document how to configure and use catalogs in SQL CLI
URL: https://github.com/apache/flink/pull/8800
 
 
   ## What is the purpose of the change
   
   This PR adds English doc for configuring catalogs in SQL CLI.
   
   Chinese doc is in 
[FLINK-12894](https://issues.apache.org/jira/browse/FLINK-12894).
   
   ## Brief change log
   
   - adds document for configuring catalogs in SQL CLI.
   
   ## Verifying this change
   
   This change is a trivial rework / code cleanup without any test coverage.
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (no)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (no)
 - The serializers: (no)
 - The runtime per-record code paths (performance sensitive): (no)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
 - The S3 file system connector: (no)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (no)
 - If yes, how is the feature documented? (docs)
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Updated] (FLINK-12627) Document how to configure and use catalogs in SQL CLI

2019-06-19 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-12627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-12627:
---
Labels: pull-request-available  (was: )

> Document how to configure and use catalogs in SQL CLI
> -
>
> Key: FLINK-12627
> URL: https://issues.apache.org/jira/browse/FLINK-12627
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / Hive, Documentation, Table SQL / Client
>Reporter: Bowen Li
>Assignee: Bowen Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.9.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-12847) Update Kinesis Connectors to latest Apache licensed libraries

2019-06-19 Thread Bowen Li (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-12847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16868227#comment-16868227
 ] 

Bowen Li commented on FLINK-12847:
--

[~dyanarose] Thanks for your contribution!

I'm currently busy with other features in release 1.9, and also given that this 
work heavily relies on license updates from the AWS side, I won't be able to 
review the changes before the licenses of all the Kinesis connector's dependencies 
have been updated to Apache 2.0 and their new releases are officially published.

> Update Kinesis Connectors to latest Apache licensed libraries
> -
>
> Key: FLINK-12847
> URL: https://issues.apache.org/jira/browse/FLINK-12847
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Kinesis
>Reporter: Dyana Rose
>Assignee: Dyana Rose
>Priority: Major
>
> Currently the referenced Kinesis Client Library and Kinesis Producer Library 
> code in the flink-connector-kinesis is licensed under the Amazon Software 
> License which is not compatible with the Apache License. This then requires a 
> fair amount of work in the CI pipeline and for users who want to use the 
> flink-connector-kinesis.
> The Kinesis Client Library v2.x and the AWS Java SDK v2.x both are now on the 
> Apache 2.0 license.
> [https://github.com/awslabs/amazon-kinesis-client/blob/master/LICENSE.txt]
> [https://github.com/aws/aws-sdk-java-v2/blob/master/LICENSE.txt]
> There is a PR for the Kinesis Producer Library to update it to the Apache 2.0 
> license ([https://github.com/awslabs/amazon-kinesis-producer/pull/256])
> The task should include, but is not limited to: upgrading KCL/KPL to new 
> versions under the Apache 2.0 license, changing the licenses and NOTICE files in 
> flink-connector-kinesis, adding flink-connector-kinesis to the build, CI and 
> artifact publishing pipeline, updating the build profiles, and updating the 
> documentation that references the license incompatibility.
> The expected outcome of this issue is that the flink-connector-kinesis will 
> be included with the standard build artifacts and will no longer need to be 
> built separately by users.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-12848) Method equals() in RowTypeInfo should consider fieldsNames

2019-06-19 Thread aloyszhang (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-12848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16868224#comment-16868224
 ] 

aloyszhang commented on FLINK-12848:


Hi Enrico, 

Simply adding the fieldNames to the equals method of RowTypeInfo is not safe. It 
will cause test failures in `ExternalCatalogInsertTest`, because some operators 
like `union` use equals in RowTypeInfo to determine whether the two inputs are of 
the same type. So I have not found a way to satisfy both tableEnv.scan() and the 
union operator; a rough sketch of the naive approach is given below.

What's more, this problem does not appear in flink-1.9.
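
For illustration only, a minimal sketch (not the actual Flink implementation) of 
the naive approach of also comparing field names in equals; `fieldNames` and 
`getFieldNames()` are assumed to be RowTypeInfo's existing field-name accessors:

{code:java}
// Inside RowTypeInfo (sketch only): also compare field names.
// As noted above, this breaks operators such as union, which only care
// about whether the field types match.
@Override
public boolean equals(Object obj) {
    if (this == obj) {
        return true;
    }
    if (!(obj instanceof RowTypeInfo)) {
        return false;
    }
    RowTypeInfo other = (RowTypeInfo) obj;
    return super.equals(other)
        && java.util.Arrays.equals(fieldNames, other.getFieldNames());
}
{code}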

 

> Method equals() in RowTypeInfo should consider fieldsNames
> --
>
> Key: FLINK-12848
> URL: https://issues.apache.org/jira/browse/FLINK-12848
> Project: Flink
>  Issue Type: Improvement
>Affects Versions: 1.7.2
>Reporter: aloyszhang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.9.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Since `RowTypeInfo#equals()` does not consider the fieldNames, an error in the 
> field names may occur when processing data with a RowTypeInfo type.  
> {code:java}
> String [] fields = new String []{"first", "second"};
> TypeInformation[] types = new TypeInformation[]{
> Types.ROW_NAMED(new String[]{"first001"}, Types.INT),
> Types.ROW_NAMED(new String[]{"second002"}, Types.INT) }; 
> StreamExecutionEnvironment execEnv = 
> StreamExecutionEnvironment.getExecutionEnvironment();
> StreamTableEnvironment env = 
> StreamTableEnvironment.getTableEnvironment(execEnv);
> SimpleProcessionTimeSource streamTableSource = new 
> SimpleProcessionTimeSource(fields, types);
> env.registerTableSource("testSource", streamTableSource);
> Table sourceTable = env.scan("testSource");
> System.out.println("Source table schema : ");
> sourceTable.printSchema();
> {code}
> The table schema will be 
> {code:java}
> Source table schema : 
> root 
> |-- first: Row(first001: Integer) 
> |-- second: Row(first001: Integer) 
> |-- timestamp: TimeIndicatorTypeInfo(proctime)
> {code}
> the second field has the same name as the first field.
> So, we should consider the fieldNames in RowTypeInfo#equals().
>  
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] [flink] klion26 commented on issue #8797: [Docks] Checkpoints: fix typo

2019-06-19 Thread GitBox
klion26 commented on issue #8797: [Docks] Checkpoints: fix typo
URL: https://github.com/apache/flink/pull/8797#issuecomment-503828052
 
 
   @casidiablo thanks for your contribution, LGTM. Could you please also update 
`stream_checkpointing.zh.md`?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] sunhaibotb commented on issue #8731: [FLINK-11878][runtime] Implement the runtime handling of BoundedOneInput and BoundedMultiInput

2019-06-19 Thread GitBox
sunhaibotb commented on issue #8731: [FLINK-11878][runtime] Implement the 
runtime handling of BoundedOneInput and BoundedMultiInput
URL: https://github.com/apache/flink/pull/8731#issuecomment-503827844
 
 
   The comments were addressed, and the PR has been updated.  @pnowojski 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (FLINK-12849) Add support for build Python Docs in Buildbot

2019-06-19 Thread sunjincheng (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-12849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16868215#comment-16868215
 ] 

sunjincheng commented on FLINK-12849:
-

Yes, I have added the file-exists check in committed revision 1046594. But I 
found that there is a write permissions problem; I am checking it now and will 
add a new commit after fixing the bug.

> Add support for build Python Docs in Buildbot
> -
>
> Key: FLINK-12849
> URL: https://issues.apache.org/jira/browse/FLINK-12849
> Project: Flink
>  Issue Type: Sub-task
>  Components: API / Python, Build System
>Affects Versions: 1.9.0
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
> Attachments: image-2019-06-14-16-14-35-439.png, python_docs.patch
>
>
> We should add the Python Doc for Python API, and add the link to the web 
> page. i.e.:
> !image-2019-06-14-16-14-35-439.png!
> In FLINK-12720 we will add how to generate the Python Docs, and in this PR we 
> should add support for building Python Docs in Buildbot. We may need to modify 
> the build config:
> [https://svn.apache.org/repos/infra/infrastructure/buildbot/aegis/buildmaster/master1/projects/flink.conf]
>  
> the Wiki of how to change the Buildbot code: 
> [https://cwiki.apache.org/confluence/display/FLINK/Managing+Flink+Documentation]
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-11947) Support MapState value schema evolution for RocksDB

2019-06-19 Thread Tzu-Li (Gordon) Tai (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-11947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tzu-Li (Gordon) Tai updated FLINK-11947:

Fix Version/s: 1.9.0

> Support MapState value schema evolution for RocksDB
> ---
>
> Key: FLINK-11947
> URL: https://issues.apache.org/jira/browse/FLINK-11947
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Type Serialization System, Runtime / State Backends
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Congxian Qiu(klion26)
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.9.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently, we do not attempt to perform state schema evolution if the key or 
> value's schema of a user {{MapState}} has changed when using {{RocksDB}}:
> https://github.com/apache/flink/blob/953a5ffcbdae4115f7d525f310723cf8770779df/flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBKeyedStateBackend.java#L542
> This was disallowed in the initial support for state schema evolution because 
> the way we did state evolution in the RocksDB state backend was simply 
> overwriting values.
> For {{MapState}} key evolution, only overwriting RocksDB values does not 
> work, since RocksDB entries for {{MapState}} use a composite key containing 
> the map state key. This means that when evolving {{MapState}} in this case 
> with an evolved key schema, we will end up with new entries.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (FLINK-11869) [checkpoint] Make buffer size in checkpoint stream factory configurable

2019-06-19 Thread Tzu-Li (Gordon) Tai (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-11869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tzu-Li (Gordon) Tai resolved FLINK-11869.
-
Resolution: Fixed

Merged for 1.9.0: bc5e8a77aa3419afc03c0751dd339e5027cf3664

> [checkpoint] Make buffer size in checkpoint stream factory configurable
> ---
>
> Key: FLINK-11869
> URL: https://issues.apache.org/jira/browse/FLINK-11869
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Checkpointing
>Reporter: Yun Tang
>Assignee: Yun Tang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.9.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Currently, the default buffer size for {{FsCheckpointStateOutputStream}} is 
> only 4KB. This would cause a lot of IOPS if the stream is large. Unfortunately, 
> when users want to checkpoint on a totally disaggregated file system which has 
> no data node manager running on the local machine, they might have an IOPS limit 
> or cannot serve too many IOPS at a time. This can make the checkpoint 
> duration very long, and checkpoints might expire often. 
> If we want to increase this buffer size, we have to increase the 
> {{fileStateThreshold}} to indirectly increase the buffer size. However, as we 
> all know, too many not-so-small {{ByteStreamStateHandle}}s returned to the 
> checkpoint coordinator would easily cause job manager OOM and a large 
> checkpoint meta file.
> We should also make the buffer size configurable.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (FLINK-11947) Support MapState value schema evolution for RocksDB

2019-06-19 Thread Tzu-Li (Gordon) Tai (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-11947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tzu-Li (Gordon) Tai resolved FLINK-11947.
-
Resolution: Fixed

> Support MapState value schema evolution for RocksDB
> ---
>
> Key: FLINK-11947
> URL: https://issues.apache.org/jira/browse/FLINK-11947
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Type Serialization System, Runtime / State Backends
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Congxian Qiu(klion26)
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.9.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently, we do not attempt to perform state schema evolution if the key or 
> value's schema of a user {{MapState}} has changed when using {{RocksDB}}:
> https://github.com/apache/flink/blob/953a5ffcbdae4115f7d525f310723cf8770779df/flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBKeyedStateBackend.java#L542
> This was disallowed in the initial support for state schema evolution because 
> the way we did state evolution in the RocksDB state backend was simply 
> overwriting values.
> For {{MapState}} key evolution, only overwriting RocksDB values does not 
> work, since RocksDB entries for {{MapState}} use a composite key containing 
> the map state key. This means that when evolving {{MapState}} in this case 
> with an evolved key schema, we will end up with new entries.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-11947) Support MapState value schema evolution for RocksDB

2019-06-19 Thread Tzu-Li (Gordon) Tai (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-11947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16868193#comment-16868193
 ] 

Tzu-Li (Gordon) Tai commented on FLINK-11947:
-

Merged for 1.9.0: 829146d516751a592c3ab15908baebfd13429e8e

> Support MapState value schema evolution for RocksDB
> ---
>
> Key: FLINK-11947
> URL: https://issues.apache.org/jira/browse/FLINK-11947
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Type Serialization System, Runtime / State Backends
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Congxian Qiu(klion26)
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently, we do not attempt to perform state schema evolution if the key or 
> value's schema of a user {{MapState}} has changed when using {{RocksDB}}:
> https://github.com/apache/flink/blob/953a5ffcbdae4115f7d525f310723cf8770779df/flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBKeyedStateBackend.java#L542
> This was disallowed in the initial support for state schema evolution because 
> the way we did state evolution in the RocksDB state backend was simply 
> overwriting values.
> For {{MapState}} key evolution, only overwriting RocksDB values does not 
> work, since RocksDB entries for {{MapState}} use a composite key containing 
> the map state key. This means that when evolving {{MapState}} in this case 
> with an evolved key schema, we will end up with new entries.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] [flink] lirui-apache commented on a change in pull request #8695: [FLINK-12805][FLINK-12808][FLINK-12809][table-api] Introduce PartitionableTableSource and PartitionableTableSink and Overwrit

2019-06-19 Thread GitBox
lirui-apache commented on a change in pull request #8695: 
[FLINK-12805][FLINK-12808][FLINK-12809][table-api] Introduce 
PartitionableTableSource and PartitionableTableSink and OverwritableTableSink
URL: https://github.com/apache/flink/pull/8695#discussion_r295593755
 
 

 ##
 File path: 
flink-table/flink-table-common/src/main/java/org/apache/flink/table/sinks/PartitionableTableSink.java
 ##
 @@ -0,0 +1,71 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.sinks;
+
+import java.util.List;
+import java.util.Map;
+
+/**
+ * An abstract class with trait about partitionable table sink. This is mainly 
used for
 
 Review comment:
   I guess we can remove that statement, since we'll support dynamic 
partitioning in Flink.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] lirui-apache commented on a change in pull request #8695: [FLINK-12805][FLINK-12808][FLINK-12809][table-api] Introduce PartitionableTableSource and PartitionableTableSink and Overwrit

2019-06-19 Thread GitBox
lirui-apache commented on a change in pull request #8695: 
[FLINK-12805][FLINK-12808][FLINK-12809][table-api] Introduce 
PartitionableTableSource and PartitionableTableSink and OverwritableTableSink
URL: https://github.com/apache/flink/pull/8695#discussion_r295595082
 
 

 ##
 File path: 
flink-table/flink-table-common/src/main/java/org/apache/flink/table/sinks/PartitionableTableSink.java
 ##
 @@ -0,0 +1,71 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.sinks;
+
+import java.util.List;
+import java.util.Map;
+
+/**
+ * An abstract class with trait about partitionable table sink. This is mainly 
used for
+ * static partitions. For sql statement:
+ * 
+ * 
+ * INSERT INTO A PARTITION(a='ab', b='cd') select c, d from B
+ * 
+ * 
+ * We Assume the A has partition columns as a, b, c.
+ * The columns a and b are called static partition columns, 
while c is called
 
 Review comment:
   I think we should give the definition of table `A` if we intend to offer a 
valid example. It's true the dynamic column should appear last, but the column 
names in `SELECT` don't have to be the same as the column names in the 
destination table -- so it's hard to tell w/o a DDL :)


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] flinkbot commented on issue #8799: [FLINK-11612][chinese-translation,Documentation] Translate the "Proje…

2019-06-19 Thread GitBox
flinkbot commented on issue #8799: 
[FLINK-11612][chinese-translation,Documentation] Translate the "Proje…
URL: https://github.com/apache/flink/pull/8799#issuecomment-503816730
 
 
   Thanks a lot for your contribution to the Apache Flink project. I'm the 
@flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress 
of the review.
   
   
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review 
Guide](https://flink.apache.org/reviewing-prs.html) for a full explanation of 
the review process.
The Bot is tracking the review progress through labels. Labels are applied 
according to the order of the review items. For consensus, approval by a Flink 
committer or PMC member is required.

Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot approve description` to approve one or more aspects (aspects: 
`description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until 
`architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's 
attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Updated] (FLINK-11612) Translate the "Project Template for Java" page into Chinese

2019-06-19 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-11612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-11612:
---
Labels: pull-request-available  (was: )

> Translate the "Project Template for Java" page into Chinese
> ---
>
> Key: FLINK-11612
> URL: https://issues.apache.org/jira/browse/FLINK-11612
> Project: Flink
>  Issue Type: Sub-task
>  Components: chinese-translation, Documentation
>Reporter: Jark Wu
>Assignee: Jasper Yue
>Priority: Major
>  Labels: pull-request-available
>
> The page url is 
> https://ci.apache.org/projects/flink/flink-docs-master/dev/projectsetup/java_api_quickstart.html
> The markdown file is located in 
> flink/docs/dev/projectsetup/java_api_quickstart.zh.md
> The markdown file will be created once FLINK-11529 is merged.
> You can reference the translation from : 
> https://github.com/flink-china/1.6.0/blob/master/quickstart/java_api_quickstart.md



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] [flink] yuezhuangshi opened a new pull request #8799: [FLINK-11612][chinese-translation,Documentation] Translate the "Proje…

2019-06-19 Thread GitBox
yuezhuangshi opened a new pull request #8799: 
[FLINK-11612][chinese-translation,Documentation] Translate the "Proje…
URL: https://github.com/apache/flink/pull/8799
 
 
   
   
   
   ## What is the purpose of the change
   
   This pull request completes the Chinese translation of "Project Template for 
Java" page from official document.
   
   ## Brief change log
   
 - *Translate the "Project Template for Java" page into Chinese*
   
   ## Verifying this change
   
   This change is to add a new translated document.
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): no
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: no
 - The serializers: no
 - The runtime per-record code paths (performance sensitive): no
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: no
 - The S3 file system connector: no
   
   ## Documentation
   
 - Does this pull request introduce a new feature? no
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] leesf closed pull request #8159: [hotfix][runtime] fix error log description

2019-06-19 Thread GitBox
leesf closed pull request #8159: [hotfix][runtime] fix error log description
URL: https://github.com/apache/flink/pull/8159
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] asfgit closed pull request #8686: [FLINK-11869] Make buffer size in checkpoint stream factory configurable

2019-06-19 Thread GitBox
asfgit closed pull request #8686: [FLINK-11869] Make buffer size in checkpoint 
stream factory configurable
URL: https://github.com/apache/flink/pull/8686
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] asfgit closed pull request #8565: [FLINK-11947] Support MapState value schema evolution for RocksDB

2019-06-19 Thread GitBox
asfgit closed pull request #8565: [FLINK-11947] Support MapState value schema 
evolution for RocksDB
URL: https://github.com/apache/flink/pull/8565
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (FLINK-12846) Carry primary key and unique key information in TableSchema

2019-06-19 Thread Hequn Cheng (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-12846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16868170#comment-16868170
 ] 

Hequn Cheng commented on FLINK-12846:
-

+1 for adding the key information! Primary key and unique key are part of the 
TableSchema and they are very helpful for optimization.

BTW, how are we going to add the key info? We may also need to consider other 
information, like column nullability and computed columns, so as not to make the 
TableSchema more and more messy. A rough sketch of one possible shape is given 
below.

What do you think?

Best, Hequn
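
Just to make the discussion concrete, a minimal sketch of one possible shape; 
everything here is hypothetical and not an actual Flink API:

{code:java}
import java.util.List;

// Hypothetical holder for key constraints that a TableSchema could expose;
// the class and method names are made up for discussion only.
public final class KeyInfo {

    private final List<String> primaryKey;        // column names forming the primary key
    private final List<List<String>> uniqueKeys;  // each inner list is one unique key

    public KeyInfo(List<String> primaryKey, List<List<String>> uniqueKeys) {
        this.primaryKey = primaryKey;
        this.uniqueKeys = uniqueKeys;
    }

    public List<String> getPrimaryKey() {
        return primaryKey;
    }

    public List<List<String>> getUniqueKeys() {
        return uniqueKeys;
    }
}
{code}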

> Carry primary key and unique key information in TableSchema
> ---
>
> Key: FLINK-12846
> URL: https://issues.apache.org/jira/browse/FLINK-12846
> Project: Flink
>  Issue Type: New Feature
>  Components: Table SQL / API
>Reporter: Jark Wu
>Assignee: Jark Wu
>Priority: Major
> Fix For: 1.9.0
>
>
> The primary key and unique key are standard meta information in SQL, and 
> they are important information for optimization, for example, 
> AggregateRemove, AggregateReduceGrouping and state layout optimization for 
> TopN and Join.
> So in this issue, we want to extend {{TableSchema}} to carry more information 
> about the primary key and unique keys, so that the TableSource can declare this 
> meta information.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (FLINK-11612) Translate the "Project Template for Java" page into Chinese

2019-06-19 Thread Jasper Yue (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-11612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jasper Yue reassigned FLINK-11612:
--

Assignee: Jasper Yue  (was: LakeShen)

> Translate the "Project Template for Java" page into Chinese
> ---
>
> Key: FLINK-11612
> URL: https://issues.apache.org/jira/browse/FLINK-11612
> Project: Flink
>  Issue Type: Sub-task
>  Components: chinese-translation, Documentation
>Reporter: Jark Wu
>Assignee: Jasper Yue
>Priority: Major
>
> The page url is 
> https://ci.apache.org/projects/flink/flink-docs-master/dev/projectsetup/java_api_quickstart.html
> The markdown file is located in 
> flink/docs/dev/projectsetup/java_api_quickstart.zh.md
> The markdown file will be created once FLINK-11529 is merged.
> You can reference the translation from : 
> https://github.com/flink-china/1.6.0/blob/master/quickstart/java_api_quickstart.md



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-11612) Translate the "Project Template for Java" page into Chinese

2019-06-19 Thread Jark Wu (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-11612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16868158#comment-16868158
 ] 

Jark Wu commented on FLINK-11612:
-

[~yuetongshu], sure, I think it's fine.

> Translate the "Project Template for Java" page into Chinese
> ---
>
> Key: FLINK-11612
> URL: https://issues.apache.org/jira/browse/FLINK-11612
> Project: Flink
>  Issue Type: Sub-task
>  Components: chinese-translation, Documentation
>Reporter: Jark Wu
>Assignee: LakeShen
>Priority: Major
>
> The page url is 
> https://ci.apache.org/projects/flink/flink-docs-master/dev/projectsetup/java_api_quickstart.html
> The markdown file is located in 
> flink/docs/dev/projectsetup/java_api_quickstart.zh.md
> The markdown file will be created once FLINK-11529 is merged.
> You can reference the translation from : 
> https://github.com/flink-china/1.6.0/blob/master/quickstart/java_api_quickstart.md



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (FLINK-12907) flink-table-planner-blink fails to compile with scala 2.12

2019-06-19 Thread Jark Wu (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-12907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jark Wu closed FLINK-12907.
---
Resolution: Fixed
  Assignee: Jark Wu

Fixed in 1.9.0: 671ac182e514500c5f2b430877c6ac30b26e6ec7

> flink-table-planner-blink fails to compile with scala 2.12
> --
>
> Key: FLINK-12907
> URL: https://issues.apache.org/jira/browse/FLINK-12907
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 1.9.0
>Reporter: Dawid Wysakowicz
>Assignee: Jark Wu
>Priority: Blocker
> Fix For: 1.9.0
>
>
> {code}
> 14:03:15.204 [ERROR] 
> /home/travis/build/apache/flink/flink-table/flink-table-planner-blink/src/test/scala/org/apache/flink/table/plan/nodes/resource/ExecNodeResourceTest.scala:269:
>  error: overriding method getOutputType in trait TableSink of type 
> ()org.apache.flink.api.common.typeinfo.TypeInformation[org.apache.flink.table.dataformat.BaseRow];
> 14:03:15.204 [ERROR]  method getOutputType needs `override' modifier
> 14:03:15.204 [ERROR]   @deprecated def getOutputType: 
> TypeInformation[BaseRow] = {
> 14:03:15.204 [ERROR]   ^
> 14:03:15.217 [ERROR] 
> /home/travis/build/apache/flink/flink-table/flink-table-planner-blink/src/test/scala/org/apache/flink/table/plan/nodes/resource/ExecNodeResourceTest.scala:275:
>  error: overriding method getFieldNames in trait TableSink of type 
> ()Array[String];
> 14:03:15.217 [ERROR]  method getFieldNames needs `override' modifier
> 14:03:15.217 [ERROR]   @deprecated def getFieldNames: Array[String] = 
> schema.getFieldNames
> 14:03:15.217 [ERROR]   ^
> 14:03:15.219 [ERROR] 
> /home/travis/build/apache/flink/flink-table/flink-table-planner-blink/src/test/scala/org/apache/flink/table/plan/nodes/resource/ExecNodeResourceTest.scala:280:
>  error: overriding method getFieldTypes in trait TableSink of type 
> ()Array[org.apache.flink.api.common.typeinfo.TypeInformation[_]];
> 14:03:15.219 [ERROR]  method getFieldTypes needs `override' modifier
> 14:03:15.219 [ERROR]   @deprecated def getFieldTypes: 
> Array[TypeInformation[_]] = schema.getFieldTypes
> {code}
> https://api.travis-ci.org/v3/job/547655787/log.txt



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (FLINK-10348) Solve data skew when consuming data from kafka

2019-06-19 Thread Jiayi Liao (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-10348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiayi Liao closed FLINK-10348.
--
Resolution: Not A Problem

> Solve data skew when consuming data from kafka
> --
>
> Key: FLINK-10348
> URL: https://issues.apache.org/jira/browse/FLINK-10348
> Project: Flink
>  Issue Type: New Feature
>  Components: Connectors / Kafka
>Affects Versions: 1.6.0
>Reporter: Jiayi Liao
>Assignee: Jiayi Liao
>Priority: Major
>
> By using KafkaConsumer, our strategy is to send fetch requests to brokers with 
> a fixed fetch size. Assume topic x has n partitions and there is data skew 
> between the partitions; now we need to consume data from topic x with the 
> earliest offset, and we can get at most the fetch size of data in every fetch 
> request. The problem is that when a task consumes data from both "big" 
> partitions and "small" partitions, the data in "big" partitions may be late 
> elements because the "small" partitions are consumed faster.
> *Solution: *
> I think we can leverage two parameters to control this.
> 1. data.skew.check // whether to check data skew
> 2. data.skew.check.interval // the interval between checks
> Every data.skew.check.interval, we will check the latest offset of every 
> specific partition, and calculate (latest offset - current offset), then get 
> partitions which need to slow down and redefine their fetch size.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-11612) Translate the "Project Template for Java" page into Chinese

2019-06-19 Thread Jasper Yue (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-11612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16868141#comment-16868141
 ] 

Jasper Yue commented on FLINK-11612:


Hi [~jark], could you assign this issue to me?

> Translate the "Project Template for Java" page into Chinese
> ---
>
> Key: FLINK-11612
> URL: https://issues.apache.org/jira/browse/FLINK-11612
> Project: Flink
>  Issue Type: Sub-task
>  Components: chinese-translation, Documentation
>Reporter: Jark Wu
>Assignee: LakeShen
>Priority: Major
>
> The page url is 
> https://ci.apache.org/projects/flink/flink-docs-master/dev/projectsetup/java_api_quickstart.html
> The markdown file is located in 
> flink/docs/dev/projectsetup/java_api_quickstart.zh.md
> The markdown file will be created once FLINK-11529 is merged.
> You can reference the translation from : 
> https://github.com/flink-china/1.6.0/blob/master/quickstart/java_api_quickstart.md



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] [flink] flinkbot commented on issue #8798: [FLINK-12659][hive] Integrate Flink with Hive GenericUDTF

2019-06-19 Thread GitBox
flinkbot commented on issue #8798: [FLINK-12659][hive] Integrate Flink with 
Hive GenericUDTF
URL: https://github.com/apache/flink/pull/8798#issuecomment-503786750
 
 
   Thanks a lot for your contribution to the Apache Flink project. I'm the 
@flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress 
of the review.
   
   
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review 
Guide](https://flink.apache.org/reviewing-prs.html) for a full explanation of 
the review process.
The Bot is tracking the review progress through labels. Labels are applied 
according to the order of the review items. For consensus, approval by a Flink 
committer or PMC member is required.

Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot approve description` to approve one or more aspects (aspects: 
`description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until 
`architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's 
attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] bowenli86 commented on issue #8798: [FLINK-12659][hive] Integrate Flink with Hive GenericUDTF

2019-06-19 Thread GitBox
bowenli86 commented on issue #8798: [FLINK-12659][hive] Integrate Flink with 
Hive GenericUDTF
URL: https://github.com/apache/flink/pull/8798#issuecomment-503786615
 
 
   cc @xuefuz @JingsongLi @lirui-apache @zjuwangg


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Updated] (FLINK-12659) Integrate Flink with Hive GenericUDTF

2019-06-19 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-12659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-12659:
---
Labels: pull-request-available  (was: )

> Integrate Flink with Hive GenericUDTF
> -
>
> Key: FLINK-12659
> URL: https://issues.apache.org/jira/browse/FLINK-12659
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / Hive
>Reporter: Bowen Li
>Assignee: Bowen Li
>Priority: Major
>  Labels: pull-request-available
>
> https://hive.apache.org/javadocs/r3.1.1/api/org/apache/hadoop/hive/ql/udf/generic/GenericUDTF.html



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] [flink] bowenli86 opened a new pull request #8798: [FLINK-12659][hive] Integrate Flink with Hive GenericUDTF

2019-06-19 Thread GitBox
bowenli86 opened a new pull request #8798: [FLINK-12659][hive] Integrate Flink 
with Hive GenericUDTF
URL: https://github.com/apache/flink/pull/8798
 
 
   ## What is the purpose of the change
   
   This PR integrates Flink with Hive GenericUDTF.
   
   ## Brief change log
   
   - added `HiveGenericUDTF` to delegate function calls to Hive's GenericUDTF
   - extracted a few util methods to `HiveFunctionUtil`
   - added unit tests for `HiveGenericUDTF`
   
   ## Verifying this change
   
   This change added tests and can be verified as `HiveGenericUDTFTest`
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (no)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (no)
 - The serializers: (no)
 - The runtime per-record code paths (performance sensitive): (no)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
 - The S3 file system connector: (no)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (yes)
 - If yes, how is the feature documented? (docs)
   
   Documentation will be added later
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] walterddr commented on a change in pull request #8632: [FLINK-12744][ml] add shared params in ml package

2019-06-19 Thread GitBox
walterddr commented on a change in pull request #8632: [FLINK-12744][ml] add 
shared params in ml package
URL: https://github.com/apache/flink/pull/8632#discussion_r295563061
 
 

 ##
 File path: 
flink-ml-parent/flink-ml-lib/src/main/java/org/apache/flink/ml/params/shared/colname/HasKeepColNames.java
 ##
 @@ -0,0 +1,44 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.flink.ml.params.shared.colname;
+
+import org.apache.flink.ml.api.misc.param.ParamInfo;
+import org.apache.flink.ml.api.misc.param.ParamInfoFactory;
+import org.apache.flink.ml.api.misc.param.WithParams;
+
+/**
+ * An interface for classes with a parameter specifying the names of the 
columns to be retained in the output table.
 
 Review comment:
   My suggestion was to follow the convention of 
[RowTypeInfo](https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/api/java/typeutils/RowTypeInfo.java),
 which IMO most of the table schema is defined against. But either works fine. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] bowenli86 commented on a change in pull request #8785: [FLINK-11480][hive] Create HiveTableFactory that creates TableSource/…

2019-06-19 Thread GitBox
bowenli86 commented on a change in pull request #8785: [FLINK-11480][hive] 
Create HiveTableFactory that creates TableSource/…
URL: https://github.com/apache/flink/pull/8785#discussion_r295561400
 
 

 ##
 File path: 
flink-table/flink-table-planner/src/main/java/org/apache/flink/table/catalog/DatabaseCalciteSchema.java
 ##
 @@ -77,6 +84,23 @@ public Table getTable(String tableName) {
!connectorTable.isBatch(),

FlinkStatistic.of(tableSource.getTableStats().orElse(null
.orElseThrow(() -> new 
TableException("Cannot query a sink only table."));
+   } else if (table instanceof CatalogTable) {
+   Optional tableFactory = 
catalog.getTableFactory();
+   TableSource tableSource = 
tableFactory.map(tf -> ((TableSourceFactory) 
tf).createTableSource((CatalogTable) table))
 
 Review comment:
   nice, I didn't know map() can be applied to Optional objects


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] xuefuz commented on a change in pull request #8785: [FLINK-11480][hive] Create HiveTableFactory that creates TableSource/…

2019-06-19 Thread GitBox
xuefuz commented on a change in pull request #8785: [FLINK-11480][hive] Create 
HiveTableFactory that creates TableSource/…
URL: https://github.com/apache/flink/pull/8785#discussion_r295548810
 
 

 ##
 File path: 
flink-table/flink-table-planner/src/main/java/org/apache/flink/table/catalog/DatabaseCalciteSchema.java
 ##
 @@ -77,6 +84,23 @@ public Table getTable(String tableName) {
!connectorTable.isBatch(),

FlinkStatistic.of(tableSource.getTableStats().orElse(null
.orElseThrow(() -> new 
TableException("Cannot query a sink only table."));
+   } else if (table instanceof CatalogTable) {
+   Optional tableFactory = 
catalog.getTableFactory();
+   TableSource tableSource = 
tableFactory.map(tf -> ((TableSourceFactory) 
tf).createTableSource((CatalogTable) table))
+   
.orElse(TableFactoryUtil.findAndCreateTableSource(((CatalogTable) 
table).toProperties()));
+
+   if (!(tableSource instanceof 
StreamTableSource)) {
 
 Review comment:
   The message was from Dawid's change. I didn't add it.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] xuefuz commented on a change in pull request #8785: [FLINK-11480][hive] Create HiveTableFactory that creates TableSource/…

2019-06-19 Thread GitBox
xuefuz commented on a change in pull request #8785: [FLINK-11480][hive] Create 
HiveTableFactory that creates TableSource/…
URL: https://github.com/apache/flink/pull/8785#discussion_r295537393
 
 

 ##
 File path: 
flink-table/flink-table-planner/src/main/java/org/apache/flink/table/catalog/DatabaseCalciteSchema.java
 ##
 @@ -77,6 +84,23 @@ public Table getTable(String tableName) {
!connectorTable.isBatch(),

FlinkStatistic.of(tableSource.getTableStats().orElse(null
.orElseThrow(() -> new 
TableException("Cannot query a sink only table."));
+   } else if (table instanceof CatalogTable) {
+   Optional tableFactory = 
catalog.getTableFactory();
+   TableSource tableSource = 
tableFactory.map(tf -> ((TableSourceFactory) 
tf).createTableSource((CatalogTable) table))
 
 Review comment:
   map() is applied to an Optional object, so the lambda argument (tf) is a TableFactory 
instance.
   As you can see from the other types of tables, Calcite only seems to need a 
TableSource.
   If tableFactory isn't present, the orElse() clause kicks in.
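
   A minimal sketch of the Optional pattern being described, with simplified stand-in types (TableSource and TableFactory here are placeholders, not the actual Flink interfaces):

```java
import java.util.Optional;

public class OptionalMapOrElseExample {

	interface TableSource { }

	interface TableFactory {
		TableSource createTableSource(String tableName);
	}

	// Stand-in for a default lookup such as TableFactoryUtil.findAndCreateTableSource(...).
	static TableSource findDefaultSource(String tableName) {
		return new TableSource() { };
	}

	static TableSource resolveSource(Optional<TableFactory> tableFactory, String tableName) {
		// map() only runs the lambda when the Optional holds a factory (tf);
		// otherwise orElse() supplies the fallback table source.
		return tableFactory
			.map(tf -> tf.createTableSource(tableName))
			.orElse(findDefaultSource(tableName));
	}

	public static void main(String[] args) {
		TableSource source = resolveSource(Optional.empty(), "my_table");
		System.out.println(source != null);
	}
}
```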


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] xuefuz commented on a change in pull request #8785: [FLINK-11480][hive] Create HiveTableFactory that creates TableSource/…

2019-06-19 Thread GitBox
xuefuz commented on a change in pull request #8785: [FLINK-11480][hive] Create 
HiveTableFactory that creates TableSource/…
URL: https://github.com/apache/flink/pull/8785#discussion_r295537393
 
 

 ##
 File path: 
flink-table/flink-table-planner/src/main/java/org/apache/flink/table/catalog/DatabaseCalciteSchema.java
 ##
 @@ -77,6 +84,23 @@ public Table getTable(String tableName) {
!connectorTable.isBatch(),

FlinkStatistic.of(tableSource.getTableStats().orElse(null
.orElseThrow(() -> new 
TableException("Cannot query a sink only table."));
+   } else if (table instanceof CatalogTable) {
+   Optional tableFactory = 
catalog.getTableFactory();
+   TableSource tableSource = 
tableFactory.map(tf -> ((TableSourceFactory) 
tf).createTableSource((CatalogTable) table))
 
 Review comment:
   map() is applied to an Optional object, so the lambda argument (tf) is a TableFactory 
instance.
   As you can see from the other types of tables, Calcite only seems to need a 
TableSource.
   If tableFactory isn't present, the orElse() clause kicks in. The test 
verifies that.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (FLINK-12848) Method equals() in RowTypeInfo should consider fieldsNames

2019-06-19 Thread Enrico Canzonieri (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-12848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16868029#comment-16868029
 ] 

Enrico Canzonieri commented on FLINK-12848:
---

This is causing issues for one of our queries where the schema has two nested 
records with two fields of the same type but different names, e.g. 
Row(Row(a: Int, b: Int), Row(c: Int, d: Int)), where "a", "b", "c", "d" are the 
field names.

The code in 
[https://github.com/apache/flink/blob/master/flink-table/flink-table-planner/src/main/scala/org/apache/flink/table/calcite/FlinkTypeFactory.scala#L92]
 that caches the Type conversion returns, for Row(c: Int, d: Int), the 
conversion cached for the first nested Row. As a result, the generated table 
schema will have mixed-up (and clashing) field names.

I see that the equals() change to RowTypeInfo was introduced in FLINK-9444. Is 
there any reason why we should never consider the field names in RowTypeInfo's 
equals()? If so, would it make sense to have a method that covers that special (to 
my understanding) case and makes equals() also include the names?

I'm currently planning to fix this locally by extending the equals() method of 
RowTypeInfo, but it'd be great to know whether that's safe to do or not.
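
A rough sketch of the kind of local workaround described above, assuming a subclass is acceptable; the name NamedRowTypeInfo is hypothetical and not part of Flink:

```java
import java.util.Arrays;

import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.typeutils.RowTypeInfo;

// Hypothetical subclass that also compares field names in equals()/hashCode().
public class NamedRowTypeInfo extends RowTypeInfo {

	public NamedRowTypeInfo(TypeInformation<?>[] types, String[] fieldNames) {
		super(types, fieldNames);
	}

	@Override
	public boolean equals(Object obj) {
		if (!(obj instanceof NamedRowTypeInfo)) {
			return false;
		}
		NamedRowTypeInfo other = (NamedRowTypeInfo) obj;
		// Field types are compared by the parent; additionally require equal names.
		return super.equals(other) && Arrays.equals(getFieldNames(), other.getFieldNames());
	}

	@Override
	public int hashCode() {
		return 31 * super.hashCode() + Arrays.hashCode(getFieldNames());
	}
}
```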

> Method equals() in RowTypeInfo should consider fieldsNames
> --
>
> Key: FLINK-12848
> URL: https://issues.apache.org/jira/browse/FLINK-12848
> Project: Flink
>  Issue Type: Improvement
>Affects Versions: 1.7.2
>Reporter: aloyszhang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.9.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Since `RowTypeInfo#equals()` does not consider the fieldNames, processing 
> data with a RowTypeInfo type may produce an error in the field 
> names.  
> {code:java}
> String [] fields = new String []{"first", "second"};
> TypeInformation[] types = new TypeInformation[]{
> Types.ROW_NAMED(new String[]{"first001"}, Types.INT),
> Types.ROW_NAMED(new String[]{"second002"}, Types.INT) }; 
> StreamExecutionEnvironment execEnv = 
> StreamExecutionEnvironment.getExecutionEnvironment();
> StreamTableEnvironment env = 
> StreamTableEnvironment.getTableEnvironment(execEnv);
> SimpleProcessionTimeSource streamTableSource = new 
> SimpleProcessionTimeSource(fields, types);
> env.registerTableSource("testSource", streamTableSource);
> Table sourceTable = env.scan("testSource");
> System.out.println("Source table schema : ");
> sourceTable.printSchema();
> {code}
> The table schema will be 
> {code:java}
> Source table schema : 
> root 
> |-- first: Row(first001: Integer) 
> |-- second: Row(first001: Integer) 
> |-- timestamp: TimeIndicatorTypeInfo(proctime)
> {code}
> The second field has the same name as the first field.
> So, we should consider the field names in RowTypeInfo#equals().
>  
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] [flink] asfgit closed pull request #8770: [FLINK-12658][hive] Integrate Flink with Hive GenericUDF

2019-06-19 Thread GitBox
asfgit closed pull request #8770: [FLINK-12658][hive] Integrate Flink with Hive 
GenericUDF
URL: https://github.com/apache/flink/pull/8770
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Closed] (FLINK-12658) Integrate Flink with Hive GenericUDF

2019-06-19 Thread Bowen Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-12658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bowen Li closed FLINK-12658.

Resolution: Fixed

merged in 1.9.0: 6f89f3d0720823019e3200a4eb572f7281657344

> Integrate Flink with Hive GenericUDF
> 
>
> Key: FLINK-12658
> URL: https://issues.apache.org/jira/browse/FLINK-12658
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / Hive
>Reporter: Bowen Li
>Assignee: Bowen Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.9.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> https://hive.apache.org/javadocs/r3.1.1/api/org/apache/hadoop/hive/ql/udf/generic/GenericUDF.html



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] [flink] bowenli86 commented on issue #8770: [FLINK-12658][hive] Integrate Flink with Hive GenericUDF

2019-06-19 Thread GitBox
bowenli86 commented on issue #8770: [FLINK-12658][hive] Integrate Flink with 
Hive GenericUDF
URL: https://github.com/apache/flink/pull/8770#issuecomment-503735850
 
 
   @xuefuz Thanks for your review!
   
   Merging


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] vim345 commented on issue #8737: [FLINK-12848][core] Consider fieldNames in RowTypeInfo#equals()

2019-06-19 Thread GitBox
vim345 commented on issue #8737: [FLINK-12848][core] Consider fieldNames in 
RowTypeInfo#equals()
URL: https://github.com/apache/flink/pull/8737#issuecomment-503735121
 
 
   @aloyszhang Is there any reason you didn't merge this PR?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] ex00 commented on issue #8632: [FLINK-12744][ml] add shared params in ml package

2019-06-19 Thread GitBox
ex00 commented on issue #8632: [FLINK-12744][ml] add shared params in ml package
URL: https://github.com/apache/flink/pull/8632#issuecomment-503721142
 
 
   Thanks @xuyang1706 
   LGTM


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] ex00 commented on a change in pull request #8776: [FLINK-12881][ml] Add more functionalities for ML Params and ParamInfo class

2019-06-19 Thread GitBox
ex00 commented on a change in pull request #8776: [FLINK-12881][ml] Add more 
functionalities for ML Params and ParamInfo class
URL: https://github.com/apache/flink/pull/8776#discussion_r295493529
 
 

 ##
 File path: 
flink-ml-parent/flink-ml-api/src/main/java/org/apache/flink/ml/api/misc/param/Params.java
 ##
 @@ -44,17 +86,39 @@
 * @param <V> the type of the specific parameter
 * @return the value of the specific parameter, or default value 
defined in the {@code info} if
 * this Params doesn't contain the parameter
-* @throws RuntimeException if the Params doesn't contains the specific 
parameter, while the
-*  param is not optional but has no default 
value in the {@code info}
+* @throws IllegalArgumentException if the Params doesn't contains the 
specific parameter, while the
+*  param is not optional but has no default 
value in the {@code info} or
+*  if the Params contains the specific 
parameter and alias, but has more
+*  than one value or
+*  if the Params doesn't contains the specific 
parameter, while the ParamInfo
+*  is optional but has no default value
 */
-   @SuppressWarnings("unchecked")
	public <V> V get(ParamInfo<V> info) {
-   V value = (V) paramMap.getOrDefault(info.getName(), 
info.getDefaultValue());
-   if (value == null && !info.isOptional() && 
!info.hasDefaultValue()) {
-   throw new RuntimeException(info.getName() +
-   " not exist which is not optional and don't 
have a default value");
+   String value = null;
+   String usedParamName = null;
+   for (String nameOrAlias : getParamNameAndAlias(info)) {
+   if (params.containsKey(nameOrAlias)) {
+   if (usedParamName != null) {
+   throw new 
IllegalArgumentException(String.format("Duplicate parameters of %s and %s",
 
 Review comment:
   Thanks


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] asfgit closed pull request #8786: [FLINK-12877][table][hive] Unify catalog database implementations

2019-06-19 Thread GitBox
asfgit closed pull request #8786: [FLINK-12877][table][hive] Unify catalog 
database implementations
URL: https://github.com/apache/flink/pull/8786
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Closed] (FLINK-12877) Unify catalog database implementations

2019-06-19 Thread Bowen Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-12877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bowen Li closed FLINK-12877.

Resolution: Fixed

merged in 1.9.0: 3cf65b11237a2928273ce5675faf6b2900b0b76a

> Unify catalog database implementations
> --
>
> Key: FLINK-12877
> URL: https://issues.apache.org/jira/browse/FLINK-12877
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / Hive, Table SQL / API
>Reporter: Bowen Li
>Assignee: Bowen Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.9.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> per discussion in https://issues.apache.org/jira/browse/FLINK-12841



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] [flink] bowenli86 commented on issue #8786: [FLINK-12877][table][hive] Unify catalog database implementations

2019-06-19 Thread GitBox
bowenli86 commented on issue #8786: [FLINK-12877][table][hive] Unify catalog 
database implementations
URL: https://github.com/apache/flink/pull/8786#issuecomment-503686014
 
 
   @xuefuz thanks for your review!
   
   Merging


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] bowenli86 commented on issue #8766: [FLINK-12664][hive] Implement TableSink to write Hive tables

2019-06-19 Thread GitBox
bowenli86 commented on issue #8766: [FLINK-12664][hive] Implement TableSink to 
write Hive tables
URL: https://github.com/apache/flink/pull/8766#issuecomment-503682405
 
 
   I don't have any other concerns. @xuefuz do you?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] bowenli86 commented on issue #8786: [FLINK-12877][table][hive] Unify catalog database implementations

2019-06-19 Thread GitBox
bowenli86 commented on issue #8786: [FLINK-12877][table][hive] Unify catalog 
database implementations
URL: https://github.com/apache/flink/pull/8786#issuecomment-503681013
 
 
   @lirui-apache thanks for your review! After a discussion with @xuefuz, we 
felt it's not necessary to distinguish whether a database is generic or not. 
Thus I completely removed that part from the PR.
   
   @xuefuz can you take a look?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] flinkbot commented on issue #8797: [Docks] Checkpoints: fix typo

2019-06-19 Thread GitBox
flinkbot commented on issue #8797: [Docks] Checkpoints: fix typo
URL: https://github.com/apache/flink/pull/8797#issuecomment-503670139
 
 
   Thanks a lot for your contribution to the Apache Flink project. I'm the 
@flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress 
of the review.
   
   
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review 
Guide](https://flink.apache.org/reviewing-prs.html) for a full explanation of 
the review process.
The Bot is tracking the review progress through labels. Labels are applied 
according to the order of the review items. For consensus, approval by a Flink 
committer or PMC member is required.
 Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot approve description` to approve one or more aspects (aspects: 
`description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until 
`architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's 
attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] casidiablo opened a new pull request #8797: [Docks] Checkpoints: fix typo

2019-06-19 Thread GitBox
casidiablo opened a new pull request #8797: [Docks] Checkpoints: fix typo
URL: https://github.com/apache/flink/pull/8797
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] azagrebin commented on a change in pull request #8783: [FLINK-12863][FLINK-12865] Remove concurrency from HeartbeatManager(Sender)Impl

2019-06-19 Thread GitBox
azagrebin commented on a change in pull request #8783: 
[FLINK-12863][FLINK-12865] Remove concurrency from HeartbeatManager(Sender)Impl
URL: https://github.com/apache/flink/pull/8783#discussion_r295407861
 
 

 ##
 File path: 
flink-runtime/src/test/java/org/apache/flink/runtime/heartbeat/TestingHeartbeatServices.java
 ##
 @@ -18,43 +18,18 @@
 
 package org.apache.flink.runtime.heartbeat;
 
-import org.apache.flink.runtime.clusterframework.types.ResourceID;
 import org.apache.flink.runtime.concurrent.ScheduledExecutor;
-import org.apache.flink.runtime.testingUtils.TestingUtils;
-import org.apache.flink.util.Preconditions;
-
-import org.slf4j.Logger;
 
 /**
  * A {@link HeartbeatServices} that allows the injection of a {@link 
ScheduledExecutor}.
  */
 public class TestingHeartbeatServices extends HeartbeatServices {
 
-   private final ScheduledExecutor scheduledExecutorToUse;
-
-   public TestingHeartbeatServices(long heartbeatInterval, long 
heartbeatTimeout, ScheduledExecutor scheduledExecutorToUse) {
+   public TestingHeartbeatServices(long heartbeatInterval, long 
heartbeatTimeout) {
super(heartbeatInterval, heartbeatTimeout);
-
-   this.scheduledExecutorToUse = 
Preconditions.checkNotNull(scheduledExecutorToUse);
}
 
public TestingHeartbeatServices() {
-   this(1000L, 1L, TestingUtils.defaultScheduledExecutor());
-   }
-
-   @Override
-   public  HeartbeatManager createHeartbeatManagerSender(
-   ResourceID resourceId,
-   HeartbeatListener heartbeatListener,
-   ScheduledExecutor mainThreadExecutor,
-   Logger log) {
-
-   return new HeartbeatManagerSenderImpl<>(
-   heartbeatInterval,
-   heartbeatTimeout,
-   resourceId,
-   heartbeatListener,
-   scheduledExecutorToUse,
-   log);
+   this(1000L, 1L);
 
 Review comment:
   It seems this class is not needed any more, or at least its comment is outdated.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] azagrebin commented on a change in pull request #8783: [FLINK-12863][FLINK-12865] Remove concurrency from HeartbeatManager(Sender)Impl

2019-06-19 Thread GitBox
azagrebin commented on a change in pull request #8783: 
[FLINK-12863][FLINK-12865] Remove concurrency from HeartbeatManager(Sender)Impl
URL: https://github.com/apache/flink/pull/8783#discussion_r295395850
 
 

 ##
 File path: 
flink-runtime/src/main/java/org/apache/flink/runtime/heartbeat/HeartbeatManagerSenderImpl.java
 ##
 @@ -36,54 +34,43 @@
  */
 public class HeartbeatManagerSenderImpl extends HeartbeatManagerImpl implements Runnable {
 
-   private final ScheduledFuture triggerFuture;
+   private final long heartbeatPeriod;
 
-   public HeartbeatManagerSenderImpl(
-   long heartbeatPeriod,
-   long heartbeatTimeout,
-   ResourceID ownResourceID,
-   HeartbeatListener heartbeatListener,
-   ScheduledExecutor mainThreadExecutor,
-   Logger log) {
+   HeartbeatManagerSenderImpl(
+   long heartbeatPeriod,
+   long heartbeatTimeout,
+   ResourceID ownResourceID,
+   HeartbeatListener heartbeatListener,
+   ScheduledExecutor mainThreadExecutor,
+   Logger log) {
 
 Review comment:
   formatting


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] azagrebin commented on a change in pull request #8783: [FLINK-12863][FLINK-12865] Remove concurrency from HeartbeatManager(Sender)Impl

2019-06-19 Thread GitBox
azagrebin commented on a change in pull request #8783: 
[FLINK-12863][FLINK-12865] Remove concurrency from HeartbeatManager(Sender)Impl
URL: https://github.com/apache/flink/pull/8783#discussion_r295380352
 
 

 ##
 File path: 
flink-runtime/src/test/java/org/apache/flink/runtime/taskexecutor/TaskExecutorTest.java
 ##
 @@ -1898,4 +1998,22 @@ public SlotReport createSlotReport(ResourceID 
resourceId) {
return slotReports.poll();
}
}
+
+   private static final class AllocateSlotNotifyingTaskSlotTable extends 
TaskSlotTable {
+
+   private final OneShotLatch allocateSlotLatch;
+
+   private 
AllocateSlotNotifyingTaskSlotTable(Collection 
resourceProfiles, TimerService timerService, OneShotLatch 
allocateSlotLatch) {
 
 Review comment:
   nit: I would avoid long lines for readability. Shorter lines are easier to 
read in reviews and when comparing diffs, and are generally easier to take in.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] azagrebin commented on a change in pull request #8646: [FLINK-12735][network] Make shuffle environment implementation independent with IOManager

2019-06-19 Thread GitBox
azagrebin commented on a change in pull request #8646: [FLINK-12735][network] 
Make shuffle environment implementation independent with IOManager
URL: https://github.com/apache/flink/pull/8646#discussion_r295369130
 
 

 ##
 File path: 
flink-runtime/src/main/java/org/apache/flink/runtime/io/disk/iomanager/IOManager.java
 ##
 @@ -120,41 +121,40 @@ public boolean isProperlyShutDown() {
// 

 
/**
-* Creates a new {@link FileIOChannel.ID} in one of the temp 
directories. Multiple
-* invocations of this method spread the channels evenly across the 
different directories.
+* Creates a new {@link ID} in one of the temp directories. Multiple 
invocations of this
+* method spread the channels evenly across the different directories.
 *
 * @return A channel to a temporary directory.
 */
-   public FileIOChannel.ID createChannel() {
+   public ID createChannel() {
final int num = getNextPathNum();
-   return new FileIOChannel.ID(this.paths[num], num, this.random);
+   return new ID(this.paths[num], num, this.random);
}
 
/**
-* Creates a new {@link FileIOChannel.Enumerator}, spreading the 
channels in a round-robin fashion
+* Creates a new {@link Enumerator}, spreading the channels in a 
round-robin fashion
 * across the temporary file directories.
 *
 * @return An enumerator for channels.
 */
-   public FileIOChannel.Enumerator createChannelEnumerator() {
-   return new FileIOChannel.Enumerator(this.paths, this.random);
+   public Enumerator createChannelEnumerator() {
+   return new Enumerator(this.paths, this.random);
}
 
/**
 * Deletes the file underlying the given channel. If the channel is 
still open, this
 * call may fail.
-* 
+*
 * @param channel The channel to be deleted.
-* @throws IOException Thrown if the deletion fails.
 */
-   public void deleteChannel(FileIOChannel.ID channel) throws IOException {
+   public void deleteChannel(ID channel) {
 
 Review comment:
   `static` is more restrictive, like `final`, so it provides more guarantees and 
simplifies future refactoring.
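
   As a generic illustration of that point (not the IOManager code under review): a `static` member cannot touch instance state, so it stays decoupled from the enclosing object and is easier to move around later.

```java
public class StaticExample {

	private int counter = 0;

	// Static helper: cannot read or modify instance state, so it can be
	// moved to another class or called without an instance later on.
	private static int doubleIt(int x) {
		return x * 2;
	}

	// Instance helper: may depend on mutable instance state, which couples
	// it to this class and makes future refactoring harder.
	private int doubleAndCount(int x) {
		counter++;
		return x * 2;
	}

	public static void main(String[] args) {
		System.out.println(doubleIt(21));
		System.out.println(new StaticExample().doubleAndCount(21));
	}
}
```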


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] azagrebin commented on a change in pull request #8646: [FLINK-12735][network] Make shuffle environment implementation independent with IOManager

2019-06-19 Thread GitBox
azagrebin commented on a change in pull request #8646: [FLINK-12735][network] 
Make shuffle environment implementation independent with IOManager
URL: https://github.com/apache/flink/pull/8646#discussion_r295368493
 
 

 ##
 File path: 
flink-runtime/src/main/java/org/apache/flink/runtime/io/disk/FileChannelManager.java
 ##
 @@ -0,0 +1,119 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.io.disk;
+
+import org.apache.flink.runtime.io.disk.iomanager.FileIOChannel.Enumerator;
+import org.apache.flink.runtime.io.disk.iomanager.FileIOChannel.ID;
+import org.apache.flink.util.FileUtils;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.File;
+import java.io.IOException;
+import java.util.Random;
+import java.util.UUID;
+
+/**
+ * The manager used for creating/deleting file channels based on config temp 
dirs.
+ */
+public class FileChannelManager implements AutoCloseable {
+   private static final Logger LOG = 
LoggerFactory.getLogger(FileChannelManager.class);
+
+   /** The temporary directories for files. */
+   private final File[] paths;
 
 Review comment:
   The array returned by the getter can be modified from outside, because immutability 
cannot be enforced with arrays; an alternative is to return a defensive copy.
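
   A minimal sketch of the defensive-copy alternative; the class and getter names here are illustrative, not the actual FileChannelManager API:

```java
import java.io.File;
import java.util.Arrays;

public class DefensiveCopyExample {

	/** The temporary directories for files. */
	private final File[] paths;

	public DefensiveCopyExample(File[] paths) {
		// Copy on the way in so later changes to the caller's array don't leak in.
		this.paths = Arrays.copyOf(paths, paths.length);
	}

	/** Returns a copy, so callers cannot mutate the internal array. */
	public File[] getPaths() {
		return Arrays.copyOf(paths, paths.length);
	}
}
```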


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Updated] (FLINK-12906) Port OperationTreeBuilder to Java

2019-06-19 Thread Dawid Wysakowicz (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-12906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Wysakowicz updated FLINK-12906:
-
Affects Version/s: 1.9.0

> Port OperationTreeBuilder to Java
> -
>
> Key: FLINK-12906
> URL: https://issues.apache.org/jira/browse/FLINK-12906
> Project: Flink
>  Issue Type: Improvement
>  Components: Table SQL / API
>Affects Versions: 1.9.0
>Reporter: Hequn Cheng
>Assignee: Dawid Wysakowicz
>Priority: Major
>
> As discussed with [~dawidwys], even though we can't move it to the API module yet, 
> we could already migrate it to Java. This might be a good idea, because 
> some new features added there use Scala features, and any such additions 
> make the future migration harder.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-12906) Port OperationTreeBuilder to Java

2019-06-19 Thread Dawid Wysakowicz (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-12906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Wysakowicz updated FLINK-12906:
-
Fix Version/s: 1.9.0

> Port OperationTreeBuilder to Java
> -
>
> Key: FLINK-12906
> URL: https://issues.apache.org/jira/browse/FLINK-12906
> Project: Flink
>  Issue Type: Improvement
>  Components: Table SQL / API
>Affects Versions: 1.9.0
>Reporter: Hequn Cheng
>Assignee: Dawid Wysakowicz
>Priority: Major
> Fix For: 1.9.0
>
>
> As discussed with [~dawidwys], even though we can't move it to the API module yet, 
> we could already migrate it to Java. This might be a good idea, because 
> some new features added there use Scala features, and any such additions 
> make the future migration harder.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (FLINK-12898) FlinkSQL job fails when rowTime field encounters dirty data

2019-06-19 Thread Mai (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-12898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mai reassigned FLINK-12898:
---

Assignee: Mai  (was: Jun Zhang)

> FlinkSQL job fails when rowTime field encounters dirty data
> ---
>
> Key: FLINK-12898
> URL: https://issues.apache.org/jira/browse/FLINK-12898
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Runtime
>Affects Versions: 1.7.2
>Reporter: Mai
>Assignee: Mai
>Priority: Minor
>
> I use FlinkSQL to process Kafka data in the following format:
>   
> |  id|  server_time|
> |  1 |2019-05-15 10:00:00|
> |  2 |2019-05-15 10:00:00|
> ...
>   
>  and I define rowtime from the  server_time field:
>  new Schema()
>      .field("rowtime", Types.SQL_TIMESTAMP)
>         .rowtime(new Rowtime().timestampsFromField("server_time"))
>      .field("id", Types.String)
>      .field("server_time", Types.String)
>   
>  when dirty data arrives, such as :
> |  id   |  server_time|
> |  99 |11.22.33.44 |
>  
>  My FlinkSQL job fails with exception:
> {code:java}
> java.lang.NumberFormatException: For input string: "11.22.33.44"
> at 
> java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
> at java.lang.Integer.parseInt(Integer.java:580)
> at java.lang.Integer.parseInt(Integer.java:615)
> at 
> org.apache.calcite.avatica.util.DateTimeUtils.dateStringToUnixDate(DateTimeUtils.java:625)
> at 
> org.apache.calcite.avatica.util.DateTimeUtils.timestampStringToUnixDate(DateTimeUtils.java:715)
> at DataStreamSourceConversion$288.processElement(Unknown Source)
> at 
> org.apache.flink.table.runtime.CRowOutputProcessRunner.processElement(CRowOutputProcessRunner.scala:70)
> at 
> org.apache.flink.streaming.api.operators.ProcessOperator.processElement(ProcessOperator.java:66)
> at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:579)
> at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:554)
> at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:534)
> at 
> org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:718)
> at 
> org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:696)
> at 
> org.apache.flink.streaming.api.operators.StreamSourceContexts$ManualWatermarkContext.processAndCollectWithTimestamp(StreamSourceContexts.java:310)
> at 
> org.apache.flink.streaming.api.operators.StreamSourceContexts$WatermarkContext.collectWithTimestamp(StreamSourceContexts.java:409)
> at 
> org.apache.flink.streaming.connectors.kafka.internals.AbstractFetcher.emitRecordWithTimestamp(AbstractFetcher.java:398)
> at 
> org.apache.flink.streaming.connectors.kafka.internal.KafkaFetcher.emitRecord(KafkaFetcher.java:187)
> at 
> org.apache.flink.streaming.connectors.kafka.internal.KafkaFetcher.runFetchLoop(KafkaFetcher.java:152)
> at 
> org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.run(FlinkKafkaConsumerBase.java:665)
> at 
> org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:94)
> at 
> org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:58)
> at 
> org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:99)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:300)
> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:704)
> at java.lang.Thread.run(Thread.java:748){code}
>   
> Because my Flink job uses EXACTLY_ONCE, it is re-executed from the 
> last checkpoint, consumes the dirty data again, fails again, and keeps looping 
> like this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] [flink] aljoscha commented on a change in pull request #8695: [FLINK-12805][FLINK-12808][FLINK-12809][table-api] Introduce PartitionableTableSource and PartitionableTableSink and Overwritable

2019-06-19 Thread GitBox
aljoscha commented on a change in pull request #8695: 
[FLINK-12805][FLINK-12808][FLINK-12809][table-api] Introduce 
PartitionableTableSource and PartitionableTableSink and OverwritableTableSink
URL: https://github.com/apache/flink/pull/8695#discussion_r295336514
 
 

 ##
 File path: 
flink-table/flink-table-common/src/main/java/org/apache/flink/table/sources/PartitionableTableSource.java
 ##
 @@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.sources;
+
+import java.util.List;
+import java.util.Map;
+
+/**
+ * Adds support for partition pruning to a [[TableSource]].
+ * A {@link TableSource} extending this interface is a partition table and 
able to
+ * prune partitions for reducing reading data and reducing reading partition
+ * metadata (especially there are thousands of partitions).
+ *
+ * Partitions is represented as a {@code Map} which maps 
from partition-key
+ * to value. The partition columns and values are NOT of strict order in the 
Map. The correct
+ * order of partition fields should be provided by {@link #getPartitionKeys()}.
+ */
+public interface PartitionableTableSource {
+
+   /**
+* Gets all partitions as key-value map belong to this table.
 
 Review comment:
   ```suggestion
 * Returns the partitions of this {@link PartitionableTableSource}.
   ```


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] aljoscha commented on a change in pull request #8695: [FLINK-12805][FLINK-12808][FLINK-12809][table-api] Introduce PartitionableTableSource and PartitionableTableSink and Overwritable

2019-06-19 Thread GitBox
aljoscha commented on a change in pull request #8695: 
[FLINK-12805][FLINK-12808][FLINK-12809][table-api] Introduce 
PartitionableTableSource and PartitionableTableSink and OverwritableTableSink
URL: https://github.com/apache/flink/pull/8695#discussion_r295337309
 
 

 ##
 File path: 
flink-table/flink-table-common/src/main/java/org/apache/flink/table/sources/PartitionableTableSource.java
 ##
 @@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.sources;
+
+import java.util.List;
+import java.util.Map;
+
+/**
+ * Adds support for partition pruning to a [[TableSource]].
+ * A {@link TableSource} extending this interface is a partition table and 
able to
+ * prune partitions for reducing reading data and reducing reading partition
+ * metadata (especially there are thousands of partitions).
+ *
+ * Partitions is represented as a {@code Map} which maps 
from partition-key
+ * to value. The partition columns and values are NOT of strict order in the 
Map. The correct
+ * order of partition fields should be provided by {@link #getPartitionKeys()}.
+ */
+public interface PartitionableTableSource {
+
+   /**
+* Gets all partitions as key-value map belong to this table.
+*/
+   List> getAllPartitions();
+
+   /**
+* Get the partition keys of the table. This should be an empty set if 
the table
 
 Review comment:
   This should probably specify that the keys are "sorted", i.e. they have the 
order as specified in the PARTITION statement.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] aljoscha commented on a change in pull request #8695: [FLINK-12805][FLINK-12808][FLINK-12809][table-api] Introduce PartitionableTableSource and PartitionableTableSink and Overwritable

2019-06-19 Thread GitBox
aljoscha commented on a change in pull request #8695: 
[FLINK-12805][FLINK-12808][FLINK-12809][table-api] Introduce 
PartitionableTableSource and PartitionableTableSink and OverwritableTableSink
URL: https://github.com/apache/flink/pull/8695#discussion_r295336825
 
 

 ##
 File path: 
flink-table/flink-table-common/src/main/java/org/apache/flink/table/sources/PartitionableTableSource.java
 ##
 @@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.sources;
+
+import java.util.List;
+import java.util.Map;
+
+/**
+ * Adds support for partition pruning to a [[TableSource]].
+ * A {@link TableSource} extending this interface is a partition table and 
able to
+ * prune partitions for reducing reading data and reducing reading partition
+ * metadata (especially there are thousands of partitions).
+ *
+ * Partitions is represented as a {@code Map} which maps 
from partition-key
+ * to value. The partition columns and values are NOT of strict order in the 
Map. The correct
+ * order of partition fields should be provided by {@link #getPartitionKeys()}.
+ */
+public interface PartitionableTableSource {
+
+   /**
+* Gets all partitions as key-value map belong to this table.
+*/
+   List> getAllPartitions();
 
 Review comment:
   maybe `getPartitions()`


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] aljoscha commented on a change in pull request #8695: [FLINK-12805][FLINK-12808][FLINK-12809][table-api] Introduce PartitionableTableSource and PartitionableTableSink and Overwritable

2019-06-19 Thread GitBox
aljoscha commented on a change in pull request #8695: 
[FLINK-12805][FLINK-12808][FLINK-12809][table-api] Introduce 
PartitionableTableSource and PartitionableTableSink and OverwritableTableSink
URL: https://github.com/apache/flink/pull/8695#discussion_r295323293
 
 

 ##
 File path: 
flink-table/flink-table-common/src/main/java/org/apache/flink/table/sinks/PartitionableTableSink.java
 ##
 @@ -0,0 +1,71 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.sinks;
+
+import java.util.List;
+import java.util.Map;
+
+/**
+ * An abstract class with trait about partitionable table sink. This is mainly 
used for
+ * static partitions. For sql statement:
+ * 
+ * 
+ * INSERT INTO A PARTITION(a='ab', b='cd') select c, d from B
+ * 
+ * 
+ * We Assume the A has partition columns as a, b, c.
+ * The columns a and b are called static partition columns, 
while c is called
+ * dynamic partition column.
+ *
+ * Note: Current class implementation don't support partition pruning which 
means constant
+ * partition columns will still be kept in result row.
+ */
+public interface PartitionableTableSink {
+
+   /**
+* Get the partition keys of the table. This should be an empty set if 
the table is not partitioned.
+*
+* @return partition keys of the table
+*/
+   List getPartitionKeys();
+
+   /**
+* Sets the static partitions into the {@link TableSink}.
+* @param partitions mapping from static partition column names to 
string literal values.
+*  String literals will be quoted using {@code '}, 
for example,
 
 Review comment:
   Why are the strings quoted using `''`?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] aljoscha commented on a change in pull request #8695: [FLINK-12805][FLINK-12808][FLINK-12809][table-api] Introduce PartitionableTableSource and PartitionableTableSink and Overwritable

2019-06-19 Thread GitBox
aljoscha commented on a change in pull request #8695: 
[FLINK-12805][FLINK-12808][FLINK-12809][table-api] Introduce 
PartitionableTableSource and PartitionableTableSink and OverwritableTableSink
URL: https://github.com/apache/flink/pull/8695#discussion_r295324969
 
 

 ##
 File path: 
flink-table/flink-table-common/src/main/java/org/apache/flink/table/sinks/PartitionableTableSink.java
 ##
 @@ -0,0 +1,71 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.sinks;
+
+import java.util.List;
+import java.util.Map;
+
+/**
+ * An abstract class with trait about partitionable table sink. This is mainly 
used for
+ * static partitions. For sql statement:
+ * 
+ * 
+ * INSERT INTO A PARTITION(a='ab', b='cd') select c, d from B
+ * 
+ * 
+ * We Assume the A has partition columns as a, b, c.
+ * The columns a and b are called static partition columns, 
while c is called
 
 Review comment:
   How can `c` be a dynamic column? According to Hive documentation 
(https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-DynamicPartitionInserts)
 dynamic partitions need to be last in the `SELECT` clause.
   
   (This could just be my ignorance, since I am not too familiar with Hive.)


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] aljoscha commented on a change in pull request #8695: [FLINK-12805][FLINK-12808][FLINK-12809][table-api] Introduce PartitionableTableSource and PartitionableTableSink and Overwritable

2019-06-19 Thread GitBox
aljoscha commented on a change in pull request #8695: 
[FLINK-12805][FLINK-12808][FLINK-12809][table-api] Introduce 
PartitionableTableSource and PartitionableTableSink and OverwritableTableSink
URL: https://github.com/apache/flink/pull/8695#discussion_r295324572
 
 

 ##
 File path: 
flink-table/flink-table-common/src/main/java/org/apache/flink/table/sinks/PartitionableTableSink.java
 ##
 @@ -0,0 +1,71 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.sinks;
+
+import java.util.List;
+import java.util.Map;
+
+/**
+ * An abstract class with trait about partitionable table sink. This is mainly 
used for
+ * static partitions. For sql statement:
+ * 
+ * 
+ * INSERT INTO A PARTITION(a='ab', b='cd') select c, d from B
 
 Review comment:
   I think using `'ab'` and `'cd'` as examples here is a bit confusing because 
both `a` and `b` appear multiple times here, both as keys and values. This took 
me some time to understand.  It would be better to have something like 
`a='foo', b='bar'`.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] aljoscha commented on a change in pull request #8695: [FLINK-12805][FLINK-12808][FLINK-12809][table-api] Introduce PartitionableTableSource and PartitionableTableSink and Overwritable

2019-06-19 Thread GitBox
aljoscha commented on a change in pull request #8695: 
[FLINK-12805][FLINK-12808][FLINK-12809][table-api] Introduce 
PartitionableTableSource and PartitionableTableSink and OverwritableTableSink
URL: https://github.com/apache/flink/pull/8695#discussion_r295323022
 
 

 ##
 File path: 
flink-table/flink-table-common/src/main/java/org/apache/flink/table/sinks/PartitionableTableSink.java
 ##
 @@ -0,0 +1,71 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.sinks;
+
+import java.util.List;
+import java.util.Map;
+
+/**
+ * An abstract class with trait about partitionable table sink. This is mainly 
used for
+ * static partitions. For sql statement:
+ * 
+ * 
+ * INSERT INTO A PARTITION(a='ab', b='cd') select c, d from B
+ * 
+ * 
+ * We Assume the A has partition columns as a, b, c.
+ * The columns a and b are called static partition columns, 
while c is called
+ * dynamic partition column.
+ *
+ * Note: Current class implementation don't support partition pruning which 
means constant
+ * partition columns will still be kept in result row.
+ */
+public interface PartitionableTableSink {
+
+   /**
+* Get the partition keys of the table. This should be an empty set if 
the table is not partitioned.
+*
+* @return partition keys of the table
+*/
+   List getPartitionKeys();
+
+   /**
+* Sets the static partitions into the {@link TableSink}.
 
 Review comment:
   nit: missing newline


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] aljoscha commented on a change in pull request #8695: [FLINK-12805][FLINK-12808][FLINK-12809][table-api] Introduce PartitionableTableSource and PartitionableTableSink and Overwritable

2019-06-19 Thread GitBox
aljoscha commented on a change in pull request #8695: 
[FLINK-12805][FLINK-12808][FLINK-12809][table-api] Introduce 
PartitionableTableSource and PartitionableTableSink and OverwritableTableSink
URL: https://github.com/apache/flink/pull/8695#discussion_r295320917
 
 

 ##
 File path: 
flink-table/flink-table-common/src/main/java/org/apache/flink/table/sinks/PartitionableTableSink.java
 ##
 @@ -0,0 +1,71 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.sinks;
+
+import java.util.List;
+import java.util.Map;
+
+/**
+ * An abstract class with trait about partitionable table sink. This is mainly 
used for
 
 Review comment:
   This should be something like `An interface for partitionable {@link 
TableSink TableSinks}.` What does "mainly for static partitions" mean? We 
should clarify that, I think that's also the point of the above comment by 
@lirui-apache.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] aljoscha commented on a change in pull request #8695: [FLINK-12805][FLINK-12808][FLINK-12809][table-api] Introduce PartitionableTableSource and PartitionableTableSink and Overwritable

2019-06-19 Thread GitBox
aljoscha commented on a change in pull request #8695: 
[FLINK-12805][FLINK-12808][FLINK-12809][table-api] Introduce 
PartitionableTableSource and PartitionableTableSink and OverwritableTableSink
URL: https://github.com/apache/flink/pull/8695#discussion_r295338639
 
 

 ##
 File path: 
flink-table/flink-table-common/src/main/java/org/apache/flink/table/sources/PartitionableTableSource.java
 ##
 @@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.sources;
+
+import java.util.List;
+import java.util.Map;
+
+/**
+ * Adds support for partition pruning to a [[TableSource]].
+ * A {@link TableSource} extending this interface is a partition table and 
able to
+ * prune partitions for reducing reading data and reducing reading partition
+ * metadata (especially there are thousands of partitions).
+ *
+ * Partitions is represented as a {@code Map<String, String>} which maps from partition-key
+ * to value. The partition columns and values are NOT of strict order in the 
Map. The correct
+ * order of partition fields should be provided by {@link #getPartitionKeys()}.
+ */
+public interface PartitionableTableSource {
+
+   /**
+* Gets all partitions as key-value map belong to this table.
+*/
+   List<Map<String, String>> getAllPartitions();
+
+   /**
+* Get the partition keys of the table. This should be an empty set if 
the table
+* is not partitioned.
+*
+* @return partition keys of the table
+*/
+   List<String> getPartitionKeys();
+
+   /**
+* Return the flag to indicate whether partition pruning has been 
tried. Must return true on
+* the returned instance of {@link #applyPartitionPruning(List)}.
+*/
+   boolean isPartitionPruned();
+
+   /**
+* Applies the remaining partitions to the table source. The {@code 
remainingPartitions} is
+* the remaining partitions of {@link #getAllPartitions()} after 
partition pruning applied.
+*
+* After trying to apply partition pruning, we should return a new 
{@link TableSource}
+* instance which holds all pruned-partitions. Even if we actually 
pruned nothing,
+* it is recommended that we still return a new {@link TableSource} 
instance since we will
+* mark the returned instance as partition pruned and will never try 
again.
+*
+* @param remainingPartitions Remaining partitions after partition 
pruning applied.
+* @return A new cloned instance of {@link TableSource}.
+*/
+   TableSource applyPartitionPruning(List<Map<String, String>> remainingPartitions);
 
 Review comment:
   I'm not so sure about these last two methods. Is the problem that we could run 
into an infinite loop if we don't have `isPartitionPruned()`? Otherwise 
`applyPartitionPruning()` could just be a no-op on the returned `TableSource`, 
which would be a simpler interface.
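   To make the loop concern concrete, here is a minimal sketch of the planner-side 
pattern the flag would guard. It is not from the PR, the class and helper names are 
invented, and it is written against the interface as quoted in this revision (the 
eventually merged API may differ):
   ```java
   import java.util.List;
   import java.util.Map;

   import org.apache.flink.table.sources.PartitionableTableSource;
   import org.apache.flink.table.sources.TableSource;

   // Sketch only: a rule that matches PartitionableTableSource would also match the
   // source returned by applyPartitionPruning(); isPartitionPruned() is the flag
   // that keeps it from firing again on that copy.
   final class PartitionPruningGuardSketch {

       static TableSource<?> pruneOnce(
               PartitionableTableSource source,
               TableSource<?> unchanged,
               List<Map<String, String>> remainingPartitions) {
           if (source.isPartitionPruned()) {
               // Already pruned on an earlier pass; keep the existing source so the
               // rule does not loop.
               return unchanged;
           }
           // The returned instance must report isPartitionPruned() == true, even if
           // remainingPartitions equals getAllPartitions().
           return source.applyPartitionPruning(remainingPartitions);
       }

       private PartitionPruningGuardSketch() {
       }
   }
   ```
   Without such a flag (or an equivalent marker), the no-op alternative only works if 
the rule can tell from the plan itself that pruning was already attempted.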


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] aljoscha commented on a change in pull request #8695: [FLINK-12805][FLINK-12808][FLINK-12809][table-api] Introduce PartitionableTableSource and PartitionableTableSink and Overwritable

2019-06-19 Thread GitBox
aljoscha commented on a change in pull request #8695: 
[FLINK-12805][FLINK-12808][FLINK-12809][table-api] Introduce 
PartitionableTableSource and PartitionableTableSink and OverwritableTableSink
URL: https://github.com/apache/flink/pull/8695#discussion_r295328327
 
 

 ##
 File path: 
flink-table/flink-table-common/src/main/java/org/apache/flink/table/sinks/PartitionableTableSink.java
 ##
 @@ -0,0 +1,71 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.sinks;
+
+import java.util.List;
+import java.util.Map;
+
+/**
+ * An abstract class with trait about partitionable table sink. This is mainly 
used for
+ * static partitions. For sql statement:
+ * 
+ * 
+ * INSERT INTO A PARTITION(a='ab', b='cd') select c, d from B
+ * 
+ * 
+ * We Assume the A has partition columns as a, b, c.
+ * The columns a and b are called static partition columns, 
while c is called
+ * dynamic partition column.
+ *
+ * Note: Current class implementation don't support partition pruning which 
means constant
+ * partition columns will still be kept in result row.
+ */
+public interface PartitionableTableSink {
+
+   /**
+* Get the partition keys of the table. This should be an empty set if 
the table is not partitioned.
+*
+* @return partition keys of the table
+*/
+   List<String> getPartitionKeys();
+
+   /**
+* Sets the static partitions into the {@link TableSink}.
+* @param partitions mapping from static partition column names to 
string literal values.
+*  String literals will be quoted using {@code '}, 
for example,
+*  value {@code abc} will be stored as {@code 
'abc'} with quotes around.
+*/
+   void setStaticPartitions(Map<String, String> partitions);
+
+   /**
+* If true, all records would be sort with partition fields before 
output, for some sinks, this
+* can be used to reduce the partition writers, that means the sink 
will accept data
+* one partition at a time.
+*
+* A sink should consider whether to override this especially when 
it needs buffer
+* data before writing.
+*
+* Notes:
+* 1. If returns true, the output data will be sorted 
locally after partitioning.
+* 2. Default returns true, if the table is partitioned.
+*/
+   default boolean sortLocalPartition() {
 
 Review comment:
   Is this a _requirement_ or a _request_? I.e. when this returns true, does the 
data have to be sorted by partition (and the sink would otherwise produce 
incorrect output), or is it merely a request and the sink still works if the data 
is not sorted? We should make this clearer in the comment and the method name, 
e.g. `requiresPartitionGrouping()` or `canUsePartitionGrouping()` (not sure about 
the name of the second one).
   
   Also, it's not really sorting by partitions but grouping by partitions, 
right? Which you can achieve using a sort. 

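   As an illustration of the difference (not from the PR; the file layout and names 
are invented), a sink that relies on partition-grouped input needs only one open 
writer at a time:
   ```java
   import java.io.IOException;
   import java.io.PrintWriter;
   import java.nio.file.Files;
   import java.nio.file.Paths;

   // Illustration only: if all records of a partition arrive contiguously, the sink
   // can close a writer as soon as the partition changes; without grouping it would
   // have to keep a writer (and its buffers) open per partition until the input ends.
   final class GroupedPartitionWriterSketch implements AutoCloseable {

       private final String baseDir;
       private String currentPartition;
       private PrintWriter currentWriter;

       GroupedPartitionWriterSketch(String baseDir) {
           this.baseDir = baseDir;
       }

       void write(String partition, String record) throws IOException {
           if (!partition.equals(currentPartition)) {
               close();  // previous partition is complete
               Files.createDirectories(Paths.get(baseDir, partition));
               currentWriter = new PrintWriter(
                   Files.newBufferedWriter(Paths.get(baseDir, partition, "data.txt")));
               currentPartition = partition;
           }
           currentWriter.println(record);
       }

       @Override
       public void close() {
           if (currentWriter != null) {
               currentWriter.close();
               currentWriter = null;
           }
       }
   }
   ```
   Grouping is the property such a sink actually needs; sorting on the partition 
fields is just one way the planner can provide it.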

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] aljoscha commented on a change in pull request #8695: [FLINK-12805][FLINK-12808][FLINK-12809][table-api] Introduce PartitionableTableSource and PartitionableTableSink and Overwritable

2019-06-19 Thread GitBox
aljoscha commented on a change in pull request #8695: 
[FLINK-12805][FLINK-12808][FLINK-12809][table-api] Introduce 
PartitionableTableSource and PartitionableTableSink and OverwritableTableSink
URL: https://github.com/apache/flink/pull/8695#discussion_r295336284
 
 

 ##
 File path: 
flink-table/flink-table-common/src/main/java/org/apache/flink/table/sources/PartitionableTableSource.java
 ##
 @@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.sources;
+
+import java.util.List;
+import java.util.Map;
+
+/**
+ * Adds support for partition pruning to a [[TableSource]].
 
 Review comment:
   This could be improved to make it a bit more concise:
   ```
   /*
* An interface for partitionable {@link TableSink TableSinks}.
*
* A {@link PartitionableTableSource} can exclude partitions from 
reading, which
* includes skipping the metadata. This is especially useful when there are 
thousands
* of partitions in a table.
*
 * Partitions are represented as a {@code Map<String, String>} which maps from partition key
* to partition value. Since the map is NOT ordered, the correct order of 
partition fields should be obtained via 
* {@link #getPartitionKeys()}.
*/
   ```
   (line breaks are probably off in this)


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Updated] (FLINK-12907) flink-table-planner-blink fails to compile with scala 2.12

2019-06-19 Thread Till Rohrmann (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-12907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann updated FLINK-12907:
--
Fix Version/s: 1.9.0

> flink-table-planner-blink fails to compile with scala 2.12
> --
>
> Key: FLINK-12907
> URL: https://issues.apache.org/jira/browse/FLINK-12907
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 1.9.0
>Reporter: Dawid Wysakowicz
>Priority: Blocker
> Fix For: 1.9.0
>
>
> {code}
> 14:03:15.204 [ERROR] 
> /home/travis/build/apache/flink/flink-table/flink-table-planner-blink/src/test/scala/org/apache/flink/table/plan/nodes/resource/ExecNodeResourceTest.scala:269:
>  error: overriding method getOutputType in trait TableSink of type 
> ()org.apache.flink.api.common.typeinfo.TypeInformation[org.apache.flink.table.dataformat.BaseRow];
> 14:03:15.204 [ERROR]  method getOutputType needs `override' modifier
> 14:03:15.204 [ERROR]   @deprecated def getOutputType: 
> TypeInformation[BaseRow] = {
> 14:03:15.204 [ERROR]   ^
> 14:03:15.217 [ERROR] 
> /home/travis/build/apache/flink/flink-table/flink-table-planner-blink/src/test/scala/org/apache/flink/table/plan/nodes/resource/ExecNodeResourceTest.scala:275:
>  error: overriding method getFieldNames in trait TableSink of type 
> ()Array[String];
> 14:03:15.217 [ERROR]  method getFieldNames needs `override' modifier
> 14:03:15.217 [ERROR]   @deprecated def getFieldNames: Array[String] = 
> schema.getFieldNames
> 14:03:15.217 [ERROR]   ^
> 14:03:15.219 [ERROR] 
> /home/travis/build/apache/flink/flink-table/flink-table-planner-blink/src/test/scala/org/apache/flink/table/plan/nodes/resource/ExecNodeResourceTest.scala:280:
>  error: overriding method getFieldTypes in trait TableSink of type 
> ()Array[org.apache.flink.api.common.typeinfo.TypeInformation[_]];
> 14:03:15.219 [ERROR]  method getFieldTypes needs `override' modifier
> 14:03:15.219 [ERROR]   @deprecated def getFieldTypes: 
> Array[TypeInformation[_]] = schema.getFieldTypes
> {code}
> https://api.travis-ci.org/v3/job/547655787/log.txt



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-12907) flink-table-planner-blink fails to compile with scala 2.12

2019-06-19 Thread Till Rohrmann (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-12907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann updated FLINK-12907:
--
Affects Version/s: 1.9.0

> flink-table-planner-blink fails to compile with scala 2.12
> --
>
> Key: FLINK-12907
> URL: https://issues.apache.org/jira/browse/FLINK-12907
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 1.9.0
>Reporter: Dawid Wysakowicz
>Priority: Blocker
>
> {code}
> 14:03:15.204 [ERROR] 
> /home/travis/build/apache/flink/flink-table/flink-table-planner-blink/src/test/scala/org/apache/flink/table/plan/nodes/resource/ExecNodeResourceTest.scala:269:
>  error: overriding method getOutputType in trait TableSink of type 
> ()org.apache.flink.api.common.typeinfo.TypeInformation[org.apache.flink.table.dataformat.BaseRow];
> 14:03:15.204 [ERROR]  method getOutputType needs `override' modifier
> 14:03:15.204 [ERROR]   @deprecated def getOutputType: 
> TypeInformation[BaseRow] = {
> 14:03:15.204 [ERROR]   ^
> 14:03:15.217 [ERROR] 
> /home/travis/build/apache/flink/flink-table/flink-table-planner-blink/src/test/scala/org/apache/flink/table/plan/nodes/resource/ExecNodeResourceTest.scala:275:
>  error: overriding method getFieldNames in trait TableSink of type 
> ()Array[String];
> 14:03:15.217 [ERROR]  method getFieldNames needs `override' modifier
> 14:03:15.217 [ERROR]   @deprecated def getFieldNames: Array[String] = 
> schema.getFieldNames
> 14:03:15.217 [ERROR]   ^
> 14:03:15.219 [ERROR] 
> /home/travis/build/apache/flink/flink-table/flink-table-planner-blink/src/test/scala/org/apache/flink/table/plan/nodes/resource/ExecNodeResourceTest.scala:280:
>  error: overriding method getFieldTypes in trait TableSink of type 
> ()Array[org.apache.flink.api.common.typeinfo.TypeInformation[_]];
> 14:03:15.219 [ERROR]  method getFieldTypes needs `override' modifier
> 14:03:15.219 [ERROR]   @deprecated def getFieldTypes: 
> Array[TypeInformation[_]] = schema.getFieldTypes
> {code}
> https://api.travis-ci.org/v3/job/547655787/log.txt



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] [flink] dawidwys merged pull request #8796: [hotfix] Fix error message in BatchTableSourceScan

2019-06-19 Thread GitBox
dawidwys merged pull request #8796: [hotfix] Fix error message in 
BatchTableSourceScan
URL: https://github.com/apache/flink/pull/8796
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] dawidwys commented on issue #8796: [hotfix] Fix error message in BatchTableSourceScan

2019-06-19 Thread GitBox
dawidwys commented on issue #8796: [hotfix] Fix error message in 
BatchTableSourceScan
URL: https://github.com/apache/flink/pull/8796#issuecomment-503587955
 
 
   Good catch @sjwiesman! Merging


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] flinkbot commented on issue #8796: [hotfix] Fix error message in BatchTableSourceScan

2019-06-19 Thread GitBox
flinkbot commented on issue #8796: [hotfix] Fix error message in 
BatchTableSourceScan
URL: https://github.com/apache/flink/pull/8796#issuecomment-503586478
 
 
   Thanks a lot for your contribution to the Apache Flink project. I'm the 
@flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress 
of the review.
   
   
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review 
Guide](https://flink.apache.org/reviewing-prs.html) for a full explanation of 
the review process.
The Bot is tracking the review progress through labels. Labels are applied 
according to the order of the review items. For consensus, approval by a Flink 
committer or PMC member is required.

Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot approve description` to approve one or more aspects (aspects: 
`description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until 
`architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's 
attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] sjwiesman commented on issue #8796: [hotfix] Fix error message in BatchTableSourceScan

2019-06-19 Thread GitBox
sjwiesman commented on issue #8796: [hotfix] Fix error message in 
BatchTableSourceScan
URL: https://github.com/apache/flink/pull/8796#issuecomment-503584971
 
 
   cc @dawidwys 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] sjwiesman opened a new pull request #8796: [hotfix] Fix error message in BatchTableSourceScan

2019-06-19 Thread GitBox
sjwiesman opened a new pull request #8796: [hotfix] Fix error message in 
BatchTableSourceScan
URL: https://github.com/apache/flink/pull/8796
 
 
   ## What is the purpose of the change
   
   Trivial fix so that the error message outputs the correct types. 
   
   ## Verifying this change
   
   This change is a trivial rework / code cleanup without any test coverage.
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

