date:20190619

[jira] [Updated] (FLINK-12889) Job keeps in FAILING state

2019-06-19 Thread Fan Xinpu (JIRA)



 [ 
https://issues.apache.org/jira/browse/FLINK-12889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fan Xinpu updated FLINK-12889:
--
Priority: Blocker  (was: Major)

> Job keeps in FAILING state
> --
>
> Key: FLINK-12889
> URL: https://issues.apache.org/jira/browse/FLINK-12889
> Project: Flink
>  Issue Type: Bug
>  Components: API / DataStream
>Affects Versions: 1.7.2
>Reporter: Fan Xinpu
>Priority: Blocker
> Attachments: 20190618104945417.jpg, jobmanager.log.2019-06-16.0, 
> taskmanager.log
>
>
> There is a topology of 3 operator, such as, source, parser, and persist. 
> Occasionally, 5 subtasks of the source encounters exception and turns to 
> failed, at the same time, one subtask of the parser runs into exception and 
> turns to failed too. The jobmaster gets a message of the parser's failed. The 
> jobmaster then try to cancel all the subtask, most of the subtasks of the 
> three operator turns to canceled except the 5 subtasks of the source, because 
> the state of the 5 ones is already FAILED before jobmaster try to cancel it. 
> Then the jobmaster can not reach a final state but keeps in  Failing state 
> meanwhile the subtask of the source kees in canceling state. 
>  
> The job run on a flink 1.7 cluster on yarn, and there is only one tm with 10 
> slots.
>  
> The attached files contains a jm log , tm log and the ui picture.
>  
> The exception timestamp is about 2019-06-16 13:42:28.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (FLINK-12896) modify :jobId key in TaskCheckpointStatisticDetailsHandler for History Server

2019-06-19 Thread xymaqingxiang (JIRA)



 [ 
https://issues.apache.org/jira/browse/FLINK-12896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xymaqingxiang updated FLINK-12896:
--
Attachment: (was: image-2019-06-19-15-33-12-243.png)

> modify :jobId key in TaskCheckpointStatisticDetailsHandler for History Server
> -
>
> Key: FLINK-12896
> URL: https://issues.apache.org/jira/browse/FLINK-12896
> Project: Flink
>  Issue Type: Bug
>Reporter: xymaqingxiang
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2019-06-19-15-32-15-994.png, 
> image-2019-06-19-15-40-08-481.png, image-2019-06-19-15-41-48-051.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> There are 2 bugs, as follows:
> 1. could not found the checkpoint details for subtasks.
> !image-2019-06-19-15-32-15-994.png!
> 2. The jobs directory has an exception: job directory, the ArchivedJson we 
> get in FsJobArchivist is wrong.
> !image-2019-06-19-15-40-08-481.png!
> !image-2019-06-19-15-41-48-051.png!
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (FLINK-12896) modify :jobId key in TaskCheckpointStatisticDetailsHandler for History Server

2019-06-19 Thread xymaqingxiang (JIRA)



 [ 
https://issues.apache.org/jira/browse/FLINK-12896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xymaqingxiang updated FLINK-12896:
--
Description: 
There are 2 bugs, as follows:

1. could not found the checkpoint details for subtasks.

!image-2019-06-19-15-32-15-994.png!

2. The jobs directory has an exception: job directory, the ArchivedJson we get 
in FsJobArchivist is wrong.

!image-2019-06-19-15-40-08-481.png!

!image-2019-06-19-15-41-48-051.png!

 

  was:
There are three bugs, as follows:

1. could not found the metrics for vertices.

!image-2019-06-19-15-33-12-243.png!

2. could not found the checkpoint details for subtasks.

!image-2019-06-19-15-32-15-994.png!

3. The jobs directory has an exception: job directory, the ArchivedJson we get 
in FsJobArchivist is wrong.

!image-2019-06-19-15-40-08-481.png!

!image-2019-06-19-15-41-48-051.png!

 


> modify :jobId key in TaskCheckpointStatisticDetailsHandler for History Server
> -
>
> Key: FLINK-12896
> URL: https://issues.apache.org/jira/browse/FLINK-12896
> Project: Flink
>  Issue Type: Bug
>Reporter: xymaqingxiang
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2019-06-19-15-32-15-994.png, 
> image-2019-06-19-15-40-08-481.png, image-2019-06-19-15-41-48-051.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> There are 2 bugs, as follows:
> 1. could not found the checkpoint details for subtasks.
> !image-2019-06-19-15-32-15-994.png!
> 2. The jobs directory has an exception: job directory, the ArchivedJson we 
> get in FsJobArchivist is wrong.
> !image-2019-06-19-15-40-08-481.png!
> !image-2019-06-19-15-41-48-051.png!
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (FLINK-12896) modify :jobId key in TaskCheckpointStatisticDetailsHandler for History Server

2019-06-19 Thread xymaqingxiang (JIRA)



 [ 
https://issues.apache.org/jira/browse/FLINK-12896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xymaqingxiang updated FLINK-12896:
--
Summary: modify :jobId key in TaskCheckpointStatisticDetailsHandler for 
History Server  (was: Missing some information in History Server)

> modify :jobId key in TaskCheckpointStatisticDetailsHandler for History Server
> -
>
> Key: FLINK-12896
> URL: https://issues.apache.org/jira/browse/FLINK-12896
> Project: Flink
>  Issue Type: Bug
>Reporter: xymaqingxiang
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2019-06-19-15-32-15-994.png, 
> image-2019-06-19-15-33-12-243.png, image-2019-06-19-15-40-08-481.png, 
> image-2019-06-19-15-41-48-051.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> There are three bugs, as follows:
> 1. could not found the metrics for vertices.
> !image-2019-06-19-15-33-12-243.png!
> 2. could not found the checkpoint details for subtasks.
> !image-2019-06-19-15-32-15-994.png!
> 3. The jobs directory has an exception: job directory, the ArchivedJson we 
> get in FsJobArchivist is wrong.
> !image-2019-06-19-15-40-08-481.png!
> !image-2019-06-19-15-41-48-051.png!
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[GitHub] [flink] wuchong commented on a change in pull request #8695: [FLINK-12805][FLINK-12808][FLINK-12809][table-api] Introduce PartitionableTableSource and PartitionableTableSink and OverwritableT

2019-06-19 Thread GitBox

wuchong commented on a change in pull request #8695: 
[FLINK-12805][FLINK-12808][FLINK-12809][table-api] Introduce 
PartitionableTableSource and PartitionableTableSink and OverwritableTableSink
URL: https://github.com/apache/flink/pull/8695#discussion_r295666375
 
 

 ##
 File path: 
flink-table/flink-table-common/src/main/java/org/apache/flink/table/sinks/PartitionableTableSink.java
 ##
 @@ -0,0 +1,71 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.sinks;
+
+import java.util.List;
+import java.util.Map;
+
+/**
+ * An abstract class with trait about partitionable table sink. This is mainly 
used for
+ * static partitions. For sql statement:
+ * 
+ * 
+ * INSERT INTO A PARTITION(a='ab', b='cd') select c, d from B
+ * 
+ * 
+ * We Assume the A has partition columns as , , .
+ * The columns  and  are called static partition columns, 
while c is called
+ * dynamic partition column.
+ *
+ * Note: Current class implementation don't support partition pruning which 
means constant
+ * partition columns will still be kept in result row.
+ */
+public interface PartitionableTableSink {
+
+   /**
+* Get the partition keys of the table. This should be an empty set if 
the table is not partitioned.
+*
+* @return partition keys of the table
+*/
+   List getPartitionKeys();
+
+   /**
+* Sets the static partitions into the {@link TableSink}.
+* @param partitions mapping from static partition column names to 
string literal values.
+*  String literals will be quoted using {@code '}, 
for example,
 
 Review comment:
   This is a mistake. I think the key-value meaning should align with 
`CatalogPartitionSpec`. If I understand correctly, all the values are encoded 
as string (i.e. encoded using `String.valueOf(...)`). For example, there is a 
static partition `f0=1024, f1="foo", f2="bar"`.  `f0` is an integer type, `f1` 
and `f2` are string types. They will all be encoded as strings: "1024", "foo", 
"bar". And will be decoded to original type literals based on the field types. 
Do I understand correctly? @bowenli86 
   
   I will update this to the javadoc.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

[GitHub] [flink] aloyszhang commented on issue #8737: [FLINK-12848][core] Consider fieldNames in RowTypeInfo#equals()

2019-06-19 Thread GitBox

aloyszhang commented on issue #8737: [FLINK-12848][core] Consider fieldNames in RowTypeInfo#equals() URL: https://github.com/apache/flink/pull/8737#issuecomment-503900965 @vim345 Adding the fieldNames in equals method of RowTypeInfo will make test in ExternalCatalogInsertTest failed. So, we should not fix this problem this way. BTW , flink-1.9 has no problem described in [FLINK-12834](https://issues.apache.org/jira/browse/FLINK-12848). I think this problem has been fixed in 1.9. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services

[GitHub] [flink] synckey commented on issue #8799: [FLINK-11612][chinese-translation,Documentation] Translate the "Proje…

2019-06-19 Thread GitBox

synckey commented on issue #8799: [FLINK-11612][chinese-translation,Documentation] Translate the "Proje… URL: https://github.com/apache/flink/pull/8799#issuecomment-503891781 > Hi, @synckey , thanks for great help, all comments have been updated. Thank you~. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services

[GitHub] [flink] yuezhuangshi commented on issue #8799: [FLINK-11612][chinese-translation,Documentation] Translate the "Proje…

2019-06-19 Thread GitBox

yuezhuangshi commented on issue #8799: [FLINK-11612][chinese-translation,Documentation] Translate the "Proje… URL: https://github.com/apache/flink/pull/8799#issuecomment-503891361 Hi, @synckey , thanks for great help, all comments have been updated. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services

[GitHub] [flink] synckey commented on a change in pull request #8766: [FLINK-12664][hive] Implement TableSink to write Hive tables

2019-06-19 Thread GitBox

synckey commented on a change in pull request #8766: [FLINK-12664][hive] Implement TableSink to write Hive tables URL: https://github.com/apache/flink/pull/8766#discussion_r295656992 ## File path: flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/batch/connectors/hive/HiveTableSink.java ## @@ -0,0 +1,155 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.batch.connectors.hive; + +import org.apache.flink.api.common.io.OutputFormat; +import org.apache.flink.api.common.typeinfo.TypeInformation; +import org.apache.flink.api.java.typeutils.RowTypeInfo; +import org.apache.flink.table.catalog.exceptions.CatalogException; +import org.apache.flink.table.catalog.hive.client.HiveMetastoreClientFactory; +import org.apache.flink.table.catalog.hive.client.HiveMetastoreClientWrapper; +import org.apache.flink.table.catalog.hive.client.HiveShimLoader; +import org.apache.flink.table.catalog.hive.descriptors.HiveCatalogValidator; +import org.apache.flink.table.sinks.OutputFormatTableSink; +import org.apache.flink.table.sinks.TableSink; +import org.apache.flink.types.Row; +import org.apache.flink.util.FlinkRuntimeException; +import org.apache.flink.util.Preconditions; + +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.FileSystem; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.hive.conf.HiveConf; +import org.apache.hadoop.hive.metastore.MetaStoreUtils; +import org.apache.hadoop.hive.metastore.Warehouse; +import org.apache.hadoop.hive.metastore.api.Partition; +import org.apache.hadoop.hive.metastore.api.StorageDescriptor; +import org.apache.hadoop.hive.metastore.api.Table; +import org.apache.hadoop.mapred.JobConf; +import org.apache.thrift.TException; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.LinkedHashMap; +import java.util.List; + +/** + * Table sink to write to Hive tables. + */ +public class HiveTableSink extends OutputFormatTableSink { + + private final JobConf jobConf; + private final RowTypeInfo rowTypeInfo; + private final String dbName; + private final String tableName; + private final List partitionColumns; + private final String hiveVersion; + + // TODO: need OverwritableTableSink to configure this + private boolean overwrite = false; Review comment: OK. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services

[GitHub] [flink] synckey commented on a change in pull request #8799: [FLINK-11612][chinese-translation,Documentation] Translate the "Proje…

2019-06-19 Thread GitBox

synckey commented on a change in pull request #8799: [FLINK-11612][chinese-translation,Documentation] Translate the "Proje… URL: https://github.com/apache/flink/pull/8799#discussion_r295656424 ## File path: docs/dev/projectsetup/java_api_quickstart.zh.md ## @@ -219,15 +215,15 @@ configurations { // declare the dependencies for your production and test code dependencies { // -- -// Compile-time dependencies that should NOT be part of the -// shadow jar and are provided in the lib folder of Flink +// 编译时依赖不应该包含在 shadow jar 中， +// 这些依赖会在 Flink 的 lib 目录中提供。 // -- compile "org.apache.flink:flink-java:${flinkVersion}" compile "org.apache.flink:flink-streaming-java_${scalaBinaryVersion}:${flinkVersion}" // -- -// Dependencies that should be part of the shadow jar, e.g. -// connectors. These must be in the flinkShadowJar configuration! +// 应该包含在 shadow jar 中的依赖，例如：连接器。 +// 这些必须在 flinkShadowJar 的配置中！ Review comment: Couldn't agree more. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services

[GitHub] [flink] yuezhuangshi commented on a change in pull request #8799: [FLINK-11612][chinese-translation,Documentation] Translate the "Proje…

2019-06-19 Thread GitBox

yuezhuangshi commented on a change in pull request #8799: [FLINK-11612][chinese-translation,Documentation] Translate the "Proje… URL: https://github.com/apache/flink/pull/8799#discussion_r295656203 ## File path: docs/dev/projectsetup/java_api_quickstart.zh.md ## @@ -74,16 +74,17 @@ Use one of the following commands to __create a project__: {% unless site.is_stable %} -Note: For Maven 3.0 or higher, it is no longer possible to specify the repository (-DarchetypeCatalog) via the command line. If you wish to use the snapshot repository, you need to add a repository entry to your settings.xml. For details about this change, please refer to http://maven.apache.org/archetype/maven-archetype-plugin/archetype-repository.html";>Maven official document +注意：对于 Maven 3.0 及更高版本，不再支持通过命令行指定仓库（-DarchetypeCatalog）。 Review comment: ok This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services

[GitHub] [flink] yuezhuangshi commented on a change in pull request #8799: [FLINK-11612][chinese-translation,Documentation] Translate the "Proje…

2019-06-19 Thread GitBox

yuezhuangshi commented on a change in pull request #8799: [FLINK-11612][chinese-translation,Documentation] Translate the "Proje… URL: https://github.com/apache/flink/pull/8799#discussion_r295656113 ## File path: docs/dev/projectsetup/java_api_quickstart.zh.md ## @@ -219,15 +215,15 @@ configurations { // declare the dependencies for your production and test code dependencies { // -- -// Compile-time dependencies that should NOT be part of the -// shadow jar and are provided in the lib folder of Flink +// 编译时依赖不应该包含在 shadow jar 中， +// 这些依赖会在 Flink 的 lib 目录中提供。 // -- compile "org.apache.flink:flink-java:${flinkVersion}" compile "org.apache.flink:flink-streaming-java_${scalaBinaryVersion}:${flinkVersion}" // -- -// Dependencies that should be part of the shadow jar, e.g. -// connectors. These must be in the flinkShadowJar configuration! +// 应该包含在 shadow jar 中的依赖，例如：连接器。 +// 这些必须在 flinkShadowJar 的配置中！ Review comment: Maybe `它们` is better in chinese, do you agree? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services

[GitHub] [flink] synckey commented on a change in pull request #8799: [FLINK-11612][chinese-translation,Documentation] Translate the "Proje…

2019-06-19 Thread GitBox

synckey commented on a change in pull request #8799: [FLINK-11612][chinese-translation,Documentation] Translate the "Proje… URL: https://github.com/apache/flink/pull/8799#discussion_r295646328 ## File path: docs/dev/projectsetup/java_api_quickstart.zh.md ## @@ -219,15 +215,15 @@ configurations { // declare the dependencies for your production and test code dependencies { // -- -// Compile-time dependencies that should NOT be part of the -// shadow jar and are provided in the lib folder of Flink +// 编译时依赖不应该包含在 shadow jar 中， +// 这些依赖会在 Flink 的 lib 目录中提供。 // -- compile "org.apache.flink:flink-java:${flinkVersion}" compile "org.apache.flink:flink-streaming-java_${scalaBinaryVersion}:${flinkVersion}" // -- -// Dependencies that should be part of the shadow jar, e.g. -// connectors. These must be in the flinkShadowJar configuration! +// 应该包含在 shadow jar 中的依赖，例如：连接器。 +// 这些必须在 flinkShadowJar 的配置中！ Review comment: What about `这些` instead of `他们`？ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services

[GitHub] [flink] synckey commented on a change in pull request #8799: [FLINK-11612][chinese-translation,Documentation] Translate the "Proje…

2019-06-19 Thread GitBox

synckey commented on a change in pull request #8799: [FLINK-11612][chinese-translation,Documentation] Translate the "Proje… URL: https://github.com/apache/flink/pull/8799#discussion_r295655540 ## File path: docs/dev/projectsetup/java_api_quickstart.zh.md ## @@ -219,15 +215,15 @@ configurations { // declare the dependencies for your production and test code dependencies { // -- -// Compile-time dependencies that should NOT be part of the -// shadow jar and are provided in the lib folder of Flink +// 编译时依赖不应该包含在 shadow jar 中， +// 这些依赖会在 Flink 的 lib 目录中提供。 // -- compile "org.apache.flink:flink-java:${flinkVersion}" compile "org.apache.flink:flink-streaming-java_${scalaBinaryVersion}:${flinkVersion}" // -- -// Dependencies that should be part of the shadow jar, e.g. -// connectors. These must be in the flinkShadowJar configuration! +// 应该包含在 shadow jar 中的依赖，例如：连接器。 +// 这些必须在 flinkShadowJar 的配置中！ Review comment: Oh sorry, yes `这些`->`他们`. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services

[GitHub] [flink] sunjincheng121 commented on issue #8532: [FLINK-12541][REST] Support to submit Python Table API jobs via REST API

2019-06-19 Thread GitBox

sunjincheng121 commented on issue #8532: [FLINK-12541][REST] Support to submit Python Table API jobs via REST API URL: https://github.com/apache/flink/pull/8532#issuecomment-503888610 @tillrohrmann do you still think we should not merge this change into 1.9? just double check and confirm. :) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services

[GitHub] [flink] synckey commented on a change in pull request #8766: [FLINK-12664][hive] Implement TableSink to write Hive tables

2019-06-19 Thread GitBox

synckey commented on a change in pull request #8766: [FLINK-12664][hive] Implement TableSink to write Hive tables URL: https://github.com/apache/flink/pull/8766#discussion_r295654901 ## File path: flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/batch/connectors/hive/HiveTableSink.java ## @@ -0,0 +1,155 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.batch.connectors.hive; + +import org.apache.flink.api.common.io.OutputFormat; +import org.apache.flink.api.common.typeinfo.TypeInformation; +import org.apache.flink.api.java.typeutils.RowTypeInfo; +import org.apache.flink.table.catalog.exceptions.CatalogException; +import org.apache.flink.table.catalog.hive.client.HiveMetastoreClientFactory; +import org.apache.flink.table.catalog.hive.client.HiveMetastoreClientWrapper; +import org.apache.flink.table.catalog.hive.client.HiveShimLoader; +import org.apache.flink.table.catalog.hive.descriptors.HiveCatalogValidator; +import org.apache.flink.table.sinks.OutputFormatTableSink; +import org.apache.flink.table.sinks.TableSink; +import org.apache.flink.types.Row; +import org.apache.flink.util.FlinkRuntimeException; +import org.apache.flink.util.Preconditions; + +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.FileSystem; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.hive.conf.HiveConf; +import org.apache.hadoop.hive.metastore.MetaStoreUtils; +import org.apache.hadoop.hive.metastore.Warehouse; +import org.apache.hadoop.hive.metastore.api.Partition; +import org.apache.hadoop.hive.metastore.api.StorageDescriptor; +import org.apache.hadoop.hive.metastore.api.Table; +import org.apache.hadoop.mapred.JobConf; +import org.apache.thrift.TException; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.LinkedHashMap; +import java.util.List; + +/** + * Table sink to write to Hive tables. + */ +public class HiveTableSink extends OutputFormatTableSink { + + private final JobConf jobConf; + private final RowTypeInfo rowTypeInfo; + private final String dbName; + private final String tableName; + private final List partitionColumns; + private final String hiveVersion; + + // TODO: need OverwritableTableSink to configure this + private boolean overwrite = false; + + public HiveTableSink(JobConf jobConf, RowTypeInfo rowTypeInfo, String dbName, String tableName, + List partitionColumns) { + this.jobConf = jobConf; + this.rowTypeInfo = rowTypeInfo; + this.dbName = dbName; + this.tableName = tableName; + this.partitionColumns = partitionColumns; + hiveVersion = jobConf.get(HiveCatalogValidator.CATALOG_HIVE_VERSION, HiveShimLoader.getHiveVersion()); Review comment: `this` is used on the initialization of all the other fields except for hiveVersion, just for code lint. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services

[GitHub] [flink] synckey commented on a change in pull request #8766: [FLINK-12664][hive] Implement TableSink to write Hive tables

2019-06-19 Thread GitBox

synckey commented on a change in pull request #8766: [FLINK-12664][hive] Implement TableSink to write Hive tables URL: https://github.com/apache/flink/pull/8766#discussion_r295654935 ## File path: flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/batch/connectors/hive/HiveTableSink.java ## @@ -0,0 +1,155 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.batch.connectors.hive; + +import org.apache.flink.api.common.io.OutputFormat; +import org.apache.flink.api.common.typeinfo.TypeInformation; +import org.apache.flink.api.java.typeutils.RowTypeInfo; +import org.apache.flink.table.catalog.exceptions.CatalogException; +import org.apache.flink.table.catalog.hive.client.HiveMetastoreClientFactory; +import org.apache.flink.table.catalog.hive.client.HiveMetastoreClientWrapper; +import org.apache.flink.table.catalog.hive.client.HiveShimLoader; +import org.apache.flink.table.catalog.hive.descriptors.HiveCatalogValidator; +import org.apache.flink.table.sinks.OutputFormatTableSink; +import org.apache.flink.table.sinks.TableSink; +import org.apache.flink.types.Row; +import org.apache.flink.util.FlinkRuntimeException; +import org.apache.flink.util.Preconditions; + +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.FileSystem; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.hive.conf.HiveConf; +import org.apache.hadoop.hive.metastore.MetaStoreUtils; +import org.apache.hadoop.hive.metastore.Warehouse; +import org.apache.hadoop.hive.metastore.api.Partition; +import org.apache.hadoop.hive.metastore.api.StorageDescriptor; +import org.apache.hadoop.hive.metastore.api.Table; +import org.apache.hadoop.mapred.JobConf; +import org.apache.thrift.TException; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.LinkedHashMap; +import java.util.List; + +/** + * Table sink to write to Hive tables. + */ +public class HiveTableSink extends OutputFormatTableSink { + + private final JobConf jobConf; + private final RowTypeInfo rowTypeInfo; + private final String dbName; + private final String tableName; + private final List partitionColumns; + private final String hiveVersion; + + // TODO: need OverwritableTableSink to configure this + private boolean overwrite = false; + + public HiveTableSink(JobConf jobConf, RowTypeInfo rowTypeInfo, String dbName, String tableName, + List partitionColumns) { + this.jobConf = jobConf; + this.rowTypeInfo = rowTypeInfo; + this.dbName = dbName; + this.tableName = tableName; + this.partitionColumns = partitionColumns; + hiveVersion = jobConf.get(HiveCatalogValidator.CATALOG_HIVE_VERSION, HiveShimLoader.getHiveVersion()); + } + + @Override + public OutputFormat getOutputFormat() { + boolean isPartitioned = partitionColumns != null && !partitionColumns.isEmpty(); + // TODO: need PartitionableTableSink to decide whether it's dynamic partitioning + boolean isDynamicPartition = isPartitioned; + try (HiveMetastoreClientWrapper client = HiveMetastoreClientFactory.create(new HiveConf(jobConf, HiveConf.class), hiveVersion)) { + Table table = client.getTable(dbName, tableName); + StorageDescriptor sd = table.getSd(); + // here we use the sdLocation to store the output path of the job, which is always a staging dir + String sdLocation = sd.getLocation(); + HiveTablePartition hiveTablePartition; + if (isPartitioned) { + // TODO: validate partition spec + // TODO: strip quotes in partition values + LinkedHashMap strippedPartSpec = new LinkedHashMap<>(); Review comment: Makes sense. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

[GitHub] [flink] zhijiangW commented on a change in pull request #8646: [FLINK-12735][network] Make shuffle environment implementation independent with IOManager

2019-06-19 Thread GitBox

zhijiangW commented on a change in pull request #8646: [FLINK-12735][network] Make shuffle environment implementation independent with IOManager URL: https://github.com/apache/flink/pull/8646#discussion_r295654877 ## File path: flink-runtime/src/main/java/org/apache/flink/runtime/io/disk/FileChannelManager.java ## @@ -0,0 +1,119 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.runtime.io.disk; + +import org.apache.flink.runtime.io.disk.iomanager.FileIOChannel.Enumerator; +import org.apache.flink.runtime.io.disk.iomanager.FileIOChannel.ID; +import org.apache.flink.util.FileUtils; + +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.io.File; +import java.io.IOException; +import java.util.Random; +import java.util.UUID; + +/** + * The manager used for creating/deleting file channels based on config temp dirs. + */ +public class FileChannelManager implements AutoCloseable { + private static final Logger LOG = LoggerFactory.getLogger(FileChannelManager.class); + + /** The temporary directories for files. */ + private final File[] paths; Review comment: yes, got it. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services

[GitHub] [flink] zhijiangW commented on a change in pull request #8646: [FLINK-12735][network] Make shuffle environment implementation independent with IOManager

2019-06-19 Thread GitBox

zhijiangW commented on a change in pull request #8646: [FLINK-12735][network] Make shuffle environment implementation independent with IOManager URL: https://github.com/apache/flink/pull/8646#discussion_r295654659 ## File path: flink-runtime/src/main/java/org/apache/flink/runtime/io/disk/iomanager/IOManager.java ## @@ -120,41 +121,40 @@ public boolean isProperlyShutDown() { // /** -* Creates a new {@link FileIOChannel.ID} in one of the temp directories. Multiple -* invocations of this method spread the channels evenly across the different directories. +* Creates a new {@link ID} in one of the temp directories. Multiple invocations of this +* method spread the channels evenly across the different directories. * * @return A channel to a temporary directory. */ - public FileIOChannel.ID createChannel() { + public ID createChannel() { final int num = getNextPathNum(); - return new FileIOChannel.ID(this.paths[num], num, this.random); + return new ID(this.paths[num], num, this.random); } /** -* Creates a new {@link FileIOChannel.Enumerator}, spreading the channels in a round-robin fashion +* Creates a new {@link Enumerator}, spreading the channels in a round-robin fashion * across the temporary file directories. * * @return An enumerator for channels. */ - public FileIOChannel.Enumerator createChannelEnumerator() { - return new FileIOChannel.Enumerator(this.paths, this.random); + public Enumerator createChannelEnumerator() { + return new Enumerator(this.paths, this.random); } /** * Deletes the file underlying the given channel. If the channel is still open, this * call may fail. -* +* * @param channel The channel to be deleted. -* @throws IOException Thrown if the deletion fails. */ - public void deleteChannel(FileIOChannel.ID channel) throws IOException { + public void deleteChannel(ID channel) { Review comment: Yes, I think the other public methods in `IOManager` could also be static if so. But I wonder it might not belong to the scope this PR. How about refactoring it if needed future? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services

[GitHub] [flink] lirui-apache commented on a change in pull request #8766: [FLINK-12664][hive] Implement TableSink to write Hive tables

2019-06-19 Thread GitBox

lirui-apache commented on a change in pull request #8766: [FLINK-12664][hive] Implement TableSink to write Hive tables URL: https://github.com/apache/flink/pull/8766#discussion_r295654359 ## File path: flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/batch/connectors/hive/HiveTableSink.java ## @@ -0,0 +1,155 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.batch.connectors.hive; + +import org.apache.flink.api.common.io.OutputFormat; +import org.apache.flink.api.common.typeinfo.TypeInformation; +import org.apache.flink.api.java.typeutils.RowTypeInfo; +import org.apache.flink.table.catalog.exceptions.CatalogException; +import org.apache.flink.table.catalog.hive.client.HiveMetastoreClientFactory; +import org.apache.flink.table.catalog.hive.client.HiveMetastoreClientWrapper; +import org.apache.flink.table.catalog.hive.client.HiveShimLoader; +import org.apache.flink.table.catalog.hive.descriptors.HiveCatalogValidator; +import org.apache.flink.table.sinks.OutputFormatTableSink; +import org.apache.flink.table.sinks.TableSink; +import org.apache.flink.types.Row; +import org.apache.flink.util.FlinkRuntimeException; +import org.apache.flink.util.Preconditions; + +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.FileSystem; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.hive.conf.HiveConf; +import org.apache.hadoop.hive.metastore.MetaStoreUtils; +import org.apache.hadoop.hive.metastore.Warehouse; +import org.apache.hadoop.hive.metastore.api.Partition; +import org.apache.hadoop.hive.metastore.api.StorageDescriptor; +import org.apache.hadoop.hive.metastore.api.Table; +import org.apache.hadoop.mapred.JobConf; +import org.apache.thrift.TException; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.LinkedHashMap; +import java.util.List; + +/** + * Table sink to write to Hive tables. + */ +public class HiveTableSink extends OutputFormatTableSink { + + private final JobConf jobConf; + private final RowTypeInfo rowTypeInfo; + private final String dbName; + private final String tableName; + private final List partitionColumns; + private final String hiveVersion; + + // TODO: need OverwritableTableSink to configure this + private boolean overwrite = false; + + public HiveTableSink(JobConf jobConf, RowTypeInfo rowTypeInfo, String dbName, String tableName, + List partitionColumns) { + this.jobConf = jobConf; + this.rowTypeInfo = rowTypeInfo; + this.dbName = dbName; + this.tableName = tableName; + this.partitionColumns = partitionColumns; + hiveVersion = jobConf.get(HiveCatalogValidator.CATALOG_HIVE_VERSION, HiveShimLoader.getHiveVersion()); + } + + @Override + public OutputFormat getOutputFormat() { + boolean isPartitioned = partitionColumns != null && !partitionColumns.isEmpty(); + // TODO: need PartitionableTableSink to decide whether it's dynamic partitioning + boolean isDynamicPartition = isPartitioned; + try (HiveMetastoreClientWrapper client = HiveMetastoreClientFactory.create(new HiveConf(jobConf, HiveConf.class), hiveVersion)) { + Table table = client.getTable(dbName, tableName); + StorageDescriptor sd = table.getSd(); + // here we use the sdLocation to store the output path of the job, which is always a staging dir + String sdLocation = sd.getLocation(); + HiveTablePartition hiveTablePartition; + if (isPartitioned) { + // TODO: validate partition spec + // TODO: strip quotes in partition values + LinkedHashMap strippedPartSpec = new LinkedHashMap<>(); Review comment: Partition spec requires the partition columns in a specific order. So it's better to use `LinkedHashMap ` to explicitly indicate we need an ordered map here. -

[GitHub] [flink] yuezhuangshi commented on a change in pull request #8799: [FLINK-11612][chinese-translation,Documentation] Translate the "Proje…

2019-06-19 Thread GitBox

yuezhuangshi commented on a change in pull request #8799: [FLINK-11612][chinese-translation,Documentation] Translate the "Proje… URL: https://github.com/apache/flink/pull/8799#discussion_r295654431 ## File path: docs/dev/projectsetup/java_api_quickstart.zh.md ## @@ -219,15 +215,15 @@ configurations { // declare the dependencies for your production and test code dependencies { // -- -// Compile-time dependencies that should NOT be part of the -// shadow jar and are provided in the lib folder of Flink +// 编译时依赖不应该包含在 shadow jar 中， +// 这些依赖会在 Flink 的 lib 目录中提供。 // -- compile "org.apache.flink:flink-java:${flinkVersion}" compile "org.apache.flink:flink-streaming-java_${scalaBinaryVersion}:${flinkVersion}" // -- -// Dependencies that should be part of the shadow jar, e.g. -// connectors. These must be in the flinkShadowJar configuration! +// 应该包含在 shadow jar 中的依赖，例如：连接器。 +// 这些必须在 flinkShadowJar 的配置中！ Review comment: `这些` instead of `他们`？ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services

[GitHub] [flink] yuezhuangshi commented on a change in pull request #8799: [FLINK-11612][chinese-translation,Documentation] Translate the "Proje…

2019-06-19 Thread GitBox

yuezhuangshi commented on a change in pull request #8799: [FLINK-11612][chinese-translation,Documentation] Translate the "Proje… URL: https://github.com/apache/flink/pull/8799#discussion_r295652737 ## File path: docs/dev/projectsetup/java_api_quickstart.zh.md ## @@ -304,51 +297,46 @@ quickstart/ └── log4j.properties {% endhighlight %} -The sample project is a __Gradle project__, which contains two classes: _StreamingJob_ and _BatchJob_ are the basic skeleton programs for a *DataStream* and *DataSet* program. -The _main_ method is the entry point of the program, both for in-IDE testing/execution and for proper deployments. +示例项目是一个 __Gradle 项目__，它包含了两个类：_StreamingJob_ 和 _BatchJob_ 是 *DataStream* 和 *DataSet* 程序的基础骨架程序。 +_main_ 方法是程序的入口，即可用于IDE测试/执行，也可用于部署。 -We recommend you __import this project into your IDE__ to develop and -test it. IntelliJ IDEA supports Gradle projects after installing the `Gradle` plugin. -Eclipse does so via the [Eclipse Buildship](https://projects.eclipse.org/projects/tools.buildship) plugin -(make sure to specify a Gradle version >= 3.0 in the last step of the import wizard; the `shadow` plugin requires it). -You may also use [Gradle's IDE integration](https://docs.gradle.org/current/userguide/userguide.html#ide-integration) -to create project files from Gradle. +我们建议你将 __此项目导入你的 IDE__ 来开发和测试它。 +IntelliJ IDEA 在安装 `Gradle` 插件后支持 Gradle 项目。Eclipse 则通过 [Eclipse Buildship](https://projects.eclipse +.org/projects/tools.buildship) 插件支持 Gradle 项目（鉴于 `shadow` 插件对 Gradle 版本有要求，请确保在导入向导的最后一步指定 Gradle 版本 >= 3.0）。 +你也可以使用 [Gradle’s IDE integration](https://docs.gradle.org/current/userguide/userguide.html#ide-integration) 从 Gradle +创建项目文件。 -*Please note*: The default JVM heapsize for Java may be too -small for Flink. You have to manually increase it. -In Eclipse, choose `Run Configurations -> Arguments` and write into the `VM Arguments` box: `-Xmx800m`. -In IntelliJ IDEA recommended way to change JVM options is from the `Help | Edit Custom VM Options` menu. See [this article](https://intellij-support.jetbrains.com/hc/en-us/articles/206544869-Configuring-JVM-options-and-platform-properties) for details. +*请注意*：对 Flink 来说，默认的 JVM 堆内存可能太小，你应当手动增加堆内存。 +在 Eclipse中，选择 `Run Configurations -> Arguments` 并在 `VM Arguments` 对应的输入框中写入：`-Xmx800m`。 +在 IntelliJ IDEA 中，推荐从菜单 `Help | Edit Custom VM Options` 来修改 JVM 选项。有关详细信息，请参阅[此文章](https://intellij-support +.jetbrains.com/hc/en-us/articles/206544869-Configuring-JVM-options-and-platform-properties)。 -### Build Project +### 构建项目 -If you want to __build/package your project__, go to your project directory and -run the '`gradle clean shadowJar`' command. -You will __find a JAR file__ that contains your application, plus connectors and libraries -that you may have added as dependencies to the application: `build/libs/--all.jar`. +如果你想要 __构建/打包项目__，请在项目目录下运行 '`gradle clean shadowJar`' 命令。 +命令执行后，你将 __找到一个 JAR 文件__，里面包含了你的应用程序，以及已作为依赖项添加到应用程序的连接器和库：`build/libs/--all.jar`。 -__Note:__ If you use a different class than *StreamingJob* as the application's main class / entry point, -we recommend you change the `mainClassName` setting in the `build.gradle` file accordingly. That way, Flink -can run the application from the JAR file without additionally specifying the main class. +__注意：__ 如果你使用其他类而不是 *StreamingJob* 作为应用程序的主类/入口， +我们建议你相应地修改 `build.gradle` 文件中的 `mainClassName` 配置。 +这样，Flink 可以从 JAR 文件运行应用程序，而无需另外指定主类。 -## Next Steps +## 下一步 -Write your application! +开始编写应用！ -If you are writing a streaming application and you are looking for inspiration what to write, -take a look at the [Stream Processing Application Tutorial]({{ site.baseurl }}/tutorials/datastream_api.html#writing-a-flink-program). +如果你准备编写流处理应用，正在寻找灵感来写什么， +可以看看[流处理应用程序教程]({{ site.baseurl }}/zh/tutorials/datastream_api.html#writing-a-flink-program) -If you are writing a batch processing application and you are looking for inspiration what to write, -take a look at the [Batch Application Examples]({{ site.baseurl }}/dev/batch/examples.html). +如果你准备编写批处理应用，正在寻找灵感来写什么， +可以看看[批处理应用程序示例]({{ site.baseurl }}/zh/dev/batch/examples.html) -For a complete overview over the APIs, have a look at the -[DataStream API]({{ site.baseurl }}/dev/datastream_api.html) and -[DataSet API]({{ site.baseurl }}/dev/batch/index.html) sections. +有关 API 的完整概述，请查看 +[DataStream API]({{ site.baseurl }}/zh/dev/datastream_api.html) 和 +[DataSet API]({{ site.baseurl }}/zh/dev/batch/index.html) 章节。 -[Here]({{ site.baseurl }}/tutorials/local_setup.html) you can find out how to run an application outside the IDE on a local cluster. +在[这里]({{ site.baseurl }}/zh/tutorials/local_setup.html)，你可以找到如何在 IDE 之外的本地集群中运行应用程序。 Review comment: I think it can be assessed from this link : [https://ci.apache.org/projects/flink/flink-docs-master/zh/tutorials/local_setup.html](https://ci.apache.org/projects/flink/flink-docs-master/zh/tutorials/local_setup.

[GitHub] [flink] yuezhuangshi commented on a change in pull request #8799: [FLINK-11612][chinese-translation,Documentation] Translate the "Proje…

2019-06-19 Thread GitBox

yuezhuangshi commented on a change in pull request #8799: [FLINK-11612][chinese-translation,Documentation] Translate the "Proje… URL: https://github.com/apache/flink/pull/8799#discussion_r295652737 ## File path: docs/dev/projectsetup/java_api_quickstart.zh.md ## @@ -304,51 +297,46 @@ quickstart/ └── log4j.properties {% endhighlight %} -The sample project is a __Gradle project__, which contains two classes: _StreamingJob_ and _BatchJob_ are the basic skeleton programs for a *DataStream* and *DataSet* program. -The _main_ method is the entry point of the program, both for in-IDE testing/execution and for proper deployments. +示例项目是一个 __Gradle 项目__，它包含了两个类：_StreamingJob_ 和 _BatchJob_ 是 *DataStream* 和 *DataSet* 程序的基础骨架程序。 +_main_ 方法是程序的入口，即可用于IDE测试/执行，也可用于部署。 -We recommend you __import this project into your IDE__ to develop and -test it. IntelliJ IDEA supports Gradle projects after installing the `Gradle` plugin. -Eclipse does so via the [Eclipse Buildship](https://projects.eclipse.org/projects/tools.buildship) plugin -(make sure to specify a Gradle version >= 3.0 in the last step of the import wizard; the `shadow` plugin requires it). -You may also use [Gradle's IDE integration](https://docs.gradle.org/current/userguide/userguide.html#ide-integration) -to create project files from Gradle. +我们建议你将 __此项目导入你的 IDE__ 来开发和测试它。 +IntelliJ IDEA 在安装 `Gradle` 插件后支持 Gradle 项目。Eclipse 则通过 [Eclipse Buildship](https://projects.eclipse +.org/projects/tools.buildship) 插件支持 Gradle 项目（鉴于 `shadow` 插件对 Gradle 版本有要求，请确保在导入向导的最后一步指定 Gradle 版本 >= 3.0）。 +你也可以使用 [Gradle’s IDE integration](https://docs.gradle.org/current/userguide/userguide.html#ide-integration) 从 Gradle +创建项目文件。 -*Please note*: The default JVM heapsize for Java may be too -small for Flink. You have to manually increase it. -In Eclipse, choose `Run Configurations -> Arguments` and write into the `VM Arguments` box: `-Xmx800m`. -In IntelliJ IDEA recommended way to change JVM options is from the `Help | Edit Custom VM Options` menu. See [this article](https://intellij-support.jetbrains.com/hc/en-us/articles/206544869-Configuring-JVM-options-and-platform-properties) for details. +*请注意*：对 Flink 来说，默认的 JVM 堆内存可能太小，你应当手动增加堆内存。 +在 Eclipse中，选择 `Run Configurations -> Arguments` 并在 `VM Arguments` 对应的输入框中写入：`-Xmx800m`。 +在 IntelliJ IDEA 中，推荐从菜单 `Help | Edit Custom VM Options` 来修改 JVM 选项。有关详细信息，请参阅[此文章](https://intellij-support +.jetbrains.com/hc/en-us/articles/206544869-Configuring-JVM-options-and-platform-properties)。 -### Build Project +### 构建项目 -If you want to __build/package your project__, go to your project directory and -run the '`gradle clean shadowJar`' command. -You will __find a JAR file__ that contains your application, plus connectors and libraries -that you may have added as dependencies to the application: `build/libs/--all.jar`. +如果你想要 __构建/打包项目__，请在项目目录下运行 '`gradle clean shadowJar`' 命令。 +命令执行后，你将 __找到一个 JAR 文件__，里面包含了你的应用程序，以及已作为依赖项添加到应用程序的连接器和库：`build/libs/--all.jar`。 -__Note:__ If you use a different class than *StreamingJob* as the application's main class / entry point, -we recommend you change the `mainClassName` setting in the `build.gradle` file accordingly. That way, Flink -can run the application from the JAR file without additionally specifying the main class. +__注意：__ 如果你使用其他类而不是 *StreamingJob* 作为应用程序的主类/入口， +我们建议你相应地修改 `build.gradle` 文件中的 `mainClassName` 配置。 +这样，Flink 可以从 JAR 文件运行应用程序，而无需另外指定主类。 -## Next Steps +## 下一步 -Write your application! +开始编写应用！ -If you are writing a streaming application and you are looking for inspiration what to write, -take a look at the [Stream Processing Application Tutorial]({{ site.baseurl }}/tutorials/datastream_api.html#writing-a-flink-program). +如果你准备编写流处理应用，正在寻找灵感来写什么， +可以看看[流处理应用程序教程]({{ site.baseurl }}/zh/tutorials/datastream_api.html#writing-a-flink-program) -If you are writing a batch processing application and you are looking for inspiration what to write, -take a look at the [Batch Application Examples]({{ site.baseurl }}/dev/batch/examples.html). +如果你准备编写批处理应用，正在寻找灵感来写什么， +可以看看[批处理应用程序示例]({{ site.baseurl }}/zh/dev/batch/examples.html) -For a complete overview over the APIs, have a look at the -[DataStream API]({{ site.baseurl }}/dev/datastream_api.html) and -[DataSet API]({{ site.baseurl }}/dev/batch/index.html) sections. +有关 API 的完整概述，请查看 +[DataStream API]({{ site.baseurl }}/zh/dev/datastream_api.html) 和 +[DataSet API]({{ site.baseurl }}/zh/dev/batch/index.html) 章节。 -[Here]({{ site.baseurl }}/tutorials/local_setup.html) you can find out how to run an application outside the IDE on a local cluster. +在[这里]({{ site.baseurl }}/zh/tutorials/local_setup.html)，你可以找到如何在 IDE 之外的本地集群中运行应用程序。 Review comment: I think it can be assessed from this link : [https://ci.apache.org/projects/flink/flink-docs-master/zh/tutorials/local_setup.html](https://ci.apache.org/projects/flink/flink-docs-master/zh/tutorials/local_setup.

[GitHub] [flink] lirui-apache commented on a change in pull request #8766: [FLINK-12664][hive] Implement TableSink to write Hive tables

2019-06-19 Thread GitBox

lirui-apache commented on a change in pull request #8766: [FLINK-12664][hive] Implement TableSink to write Hive tables URL: https://github.com/apache/flink/pull/8766#discussion_r295653875 ## File path: flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/batch/connectors/hive/HiveTableSink.java ## @@ -0,0 +1,155 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.batch.connectors.hive; + +import org.apache.flink.api.common.io.OutputFormat; +import org.apache.flink.api.common.typeinfo.TypeInformation; +import org.apache.flink.api.java.typeutils.RowTypeInfo; +import org.apache.flink.table.catalog.exceptions.CatalogException; +import org.apache.flink.table.catalog.hive.client.HiveMetastoreClientFactory; +import org.apache.flink.table.catalog.hive.client.HiveMetastoreClientWrapper; +import org.apache.flink.table.catalog.hive.client.HiveShimLoader; +import org.apache.flink.table.catalog.hive.descriptors.HiveCatalogValidator; +import org.apache.flink.table.sinks.OutputFormatTableSink; +import org.apache.flink.table.sinks.TableSink; +import org.apache.flink.types.Row; +import org.apache.flink.util.FlinkRuntimeException; +import org.apache.flink.util.Preconditions; + +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.FileSystem; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.hive.conf.HiveConf; +import org.apache.hadoop.hive.metastore.MetaStoreUtils; +import org.apache.hadoop.hive.metastore.Warehouse; +import org.apache.hadoop.hive.metastore.api.Partition; +import org.apache.hadoop.hive.metastore.api.StorageDescriptor; +import org.apache.hadoop.hive.metastore.api.Table; +import org.apache.hadoop.mapred.JobConf; +import org.apache.thrift.TException; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.LinkedHashMap; +import java.util.List; + +/** + * Table sink to write to Hive tables. + */ +public class HiveTableSink extends OutputFormatTableSink { + + private final JobConf jobConf; + private final RowTypeInfo rowTypeInfo; + private final String dbName; + private final String tableName; + private final List partitionColumns; + private final String hiveVersion; + + // TODO: need OverwritableTableSink to configure this + private boolean overwrite = false; + + public HiveTableSink(JobConf jobConf, RowTypeInfo rowTypeInfo, String dbName, String tableName, + List partitionColumns) { + this.jobConf = jobConf; + this.rowTypeInfo = rowTypeInfo; + this.dbName = dbName; + this.tableName = tableName; + this.partitionColumns = partitionColumns; + hiveVersion = jobConf.get(HiveCatalogValidator.CATALOG_HIVE_VERSION, HiveShimLoader.getHiveVersion()); Review comment: Could you explain what's the benefit of doing that? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services

[GitHub] [flink] lirui-apache commented on a change in pull request #8766: [FLINK-12664][hive] Implement TableSink to write Hive tables

2019-06-19 Thread GitBox

lirui-apache commented on a change in pull request #8766: [FLINK-12664][hive] Implement TableSink to write Hive tables URL: https://github.com/apache/flink/pull/8766#discussion_r295653711 ## File path: flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/batch/connectors/hive/HiveTableSink.java ## @@ -0,0 +1,155 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.batch.connectors.hive; + +import org.apache.flink.api.common.io.OutputFormat; +import org.apache.flink.api.common.typeinfo.TypeInformation; +import org.apache.flink.api.java.typeutils.RowTypeInfo; +import org.apache.flink.table.catalog.exceptions.CatalogException; +import org.apache.flink.table.catalog.hive.client.HiveMetastoreClientFactory; +import org.apache.flink.table.catalog.hive.client.HiveMetastoreClientWrapper; +import org.apache.flink.table.catalog.hive.client.HiveShimLoader; +import org.apache.flink.table.catalog.hive.descriptors.HiveCatalogValidator; +import org.apache.flink.table.sinks.OutputFormatTableSink; +import org.apache.flink.table.sinks.TableSink; +import org.apache.flink.types.Row; +import org.apache.flink.util.FlinkRuntimeException; +import org.apache.flink.util.Preconditions; + +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.FileSystem; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.hive.conf.HiveConf; +import org.apache.hadoop.hive.metastore.MetaStoreUtils; +import org.apache.hadoop.hive.metastore.Warehouse; +import org.apache.hadoop.hive.metastore.api.Partition; +import org.apache.hadoop.hive.metastore.api.StorageDescriptor; +import org.apache.hadoop.hive.metastore.api.Table; +import org.apache.hadoop.mapred.JobConf; +import org.apache.thrift.TException; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.LinkedHashMap; +import java.util.List; + +/** + * Table sink to write to Hive tables. + */ +public class HiveTableSink extends OutputFormatTableSink { + + private final JobConf jobConf; + private final RowTypeInfo rowTypeInfo; + private final String dbName; + private final String tableName; + private final List partitionColumns; + private final String hiveVersion; + + // TODO: need OverwritableTableSink to configure this + private boolean overwrite = false; Review comment: As the comment said, `overwrite ` is intended to be configured via the `OverwritableTableSink ` interface. Therefore it shouldn't be initialized in the constructor. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services

[GitHub] [flink] synckey commented on a change in pull request #8799: [FLINK-11612][chinese-translation,Documentation] Translate the "Proje…

2019-06-19 Thread GitBox

synckey commented on a change in pull request #8799: [FLINK-11612][chinese-translation,Documentation] Translate the "Proje… URL: https://github.com/apache/flink/pull/8799#discussion_r295653024 ## File path: docs/dev/projectsetup/java_api_quickstart.zh.md ## @@ -304,51 +297,46 @@ quickstart/ └── log4j.properties {% endhighlight %} -The sample project is a __Gradle project__, which contains two classes: _StreamingJob_ and _BatchJob_ are the basic skeleton programs for a *DataStream* and *DataSet* program. -The _main_ method is the entry point of the program, both for in-IDE testing/execution and for proper deployments. +示例项目是一个 __Gradle 项目__，它包含了两个类：_StreamingJob_ 和 _BatchJob_ 是 *DataStream* 和 *DataSet* 程序的基础骨架程序。 +_main_ 方法是程序的入口，即可用于IDE测试/执行，也可用于部署。 -We recommend you __import this project into your IDE__ to develop and -test it. IntelliJ IDEA supports Gradle projects after installing the `Gradle` plugin. -Eclipse does so via the [Eclipse Buildship](https://projects.eclipse.org/projects/tools.buildship) plugin -(make sure to specify a Gradle version >= 3.0 in the last step of the import wizard; the `shadow` plugin requires it). -You may also use [Gradle's IDE integration](https://docs.gradle.org/current/userguide/userguide.html#ide-integration) -to create project files from Gradle. +我们建议你将 __此项目导入你的 IDE__ 来开发和测试它。 +IntelliJ IDEA 在安装 `Gradle` 插件后支持 Gradle 项目。Eclipse 则通过 [Eclipse Buildship](https://projects.eclipse +.org/projects/tools.buildship) 插件支持 Gradle 项目（鉴于 `shadow` 插件对 Gradle 版本有要求，请确保在导入向导的最后一步指定 Gradle 版本 >= 3.0）。 +你也可以使用 [Gradle’s IDE integration](https://docs.gradle.org/current/userguide/userguide.html#ide-integration) 从 Gradle +创建项目文件。 -*Please note*: The default JVM heapsize for Java may be too -small for Flink. You have to manually increase it. -In Eclipse, choose `Run Configurations -> Arguments` and write into the `VM Arguments` box: `-Xmx800m`. -In IntelliJ IDEA recommended way to change JVM options is from the `Help | Edit Custom VM Options` menu. See [this article](https://intellij-support.jetbrains.com/hc/en-us/articles/206544869-Configuring-JVM-options-and-platform-properties) for details. +*请注意*：对 Flink 来说，默认的 JVM 堆内存可能太小，你应当手动增加堆内存。 +在 Eclipse中，选择 `Run Configurations -> Arguments` 并在 `VM Arguments` 对应的输入框中写入：`-Xmx800m`。 +在 IntelliJ IDEA 中，推荐从菜单 `Help | Edit Custom VM Options` 来修改 JVM 选项。有关详细信息，请参阅[此文章](https://intellij-support +.jetbrains.com/hc/en-us/articles/206544869-Configuring-JVM-options-and-platform-properties)。 -### Build Project +### 构建项目 -If you want to __build/package your project__, go to your project directory and -run the '`gradle clean shadowJar`' command. -You will __find a JAR file__ that contains your application, plus connectors and libraries -that you may have added as dependencies to the application: `build/libs/--all.jar`. +如果你想要 __构建/打包项目__，请在项目目录下运行 '`gradle clean shadowJar`' 命令。 +命令执行后，你将 __找到一个 JAR 文件__，里面包含了你的应用程序，以及已作为依赖项添加到应用程序的连接器和库：`build/libs/--all.jar`。 -__Note:__ If you use a different class than *StreamingJob* as the application's main class / entry point, -we recommend you change the `mainClassName` setting in the `build.gradle` file accordingly. That way, Flink -can run the application from the JAR file without additionally specifying the main class. +__注意：__ 如果你使用其他类而不是 *StreamingJob* 作为应用程序的主类/入口， +我们建议你相应地修改 `build.gradle` 文件中的 `mainClassName` 配置。 +这样，Flink 可以从 JAR 文件运行应用程序，而无需另外指定主类。 -## Next Steps +## 下一步 -Write your application! +开始编写应用！ -If you are writing a streaming application and you are looking for inspiration what to write, -take a look at the [Stream Processing Application Tutorial]({{ site.baseurl }}/tutorials/datastream_api.html#writing-a-flink-program). +如果你准备编写流处理应用，正在寻找灵感来写什么， +可以看看[流处理应用程序教程]({{ site.baseurl }}/zh/tutorials/datastream_api.html#writing-a-flink-program) -If you are writing a batch processing application and you are looking for inspiration what to write, -take a look at the [Batch Application Examples]({{ site.baseurl }}/dev/batch/examples.html). +如果你准备编写批处理应用，正在寻找灵感来写什么， +可以看看[批处理应用程序示例]({{ site.baseurl }}/zh/dev/batch/examples.html) -For a complete overview over the APIs, have a look at the -[DataStream API]({{ site.baseurl }}/dev/datastream_api.html) and -[DataSet API]({{ site.baseurl }}/dev/batch/index.html) sections. +有关 API 的完整概述，请查看 +[DataStream API]({{ site.baseurl }}/zh/dev/datastream_api.html) 和 +[DataSet API]({{ site.baseurl }}/zh/dev/batch/index.html) 章节。 -[Here]({{ site.baseurl }}/tutorials/local_setup.html) you can find out how to run an application outside the IDE on a local cluster. +在[这里]({{ site.baseurl }}/zh/tutorials/local_setup.html)，你可以找到如何在 IDE 之外的本地集群中运行应用程序。 Review comment: Oh sorry, my fault. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

[jira] [Closed] (FLINK-12856) Introduce planner rule to push projection into TableSource

2019-06-19 Thread Kurt Young (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-12856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kurt Young closed FLINK-12856. -- Resolution: Implemented Fix Version/s: 1.9.0 merged in 1.9.0: 800fe61cb6074eed0311abd4634d71f5569451b5 > Introduce planner rule to push projection into TableSource > -- > > Key: FLINK-12856 > URL: https://issues.apache.org/jira/browse/FLINK-12856 > Project: Flink > Issue Type: New Feature > Components: Table SQL / Planner >Reporter: godfrey he >Assignee: godfrey he >Priority: Major > Labels: pull-request-available > Fix For: 1.9.0 > > Time Spent: 20m > Remaining Estimate: 0h > > This issue aims to support push projection into ProjectableTableSource or > NestedFieldsProjectableTableSource to reduce output fields of a TableSource -- This message was sent by Atlassian JIRA (v7.6.3#76005)

[GitHub] [flink] KurtYoung merged pull request #8747: [FLINK-12856] [table-planner-blink] Introduce planner rule to push projection into TableSource

2019-06-19 Thread GitBox

KurtYoung merged pull request #8747: [FLINK-12856] [table-planner-blink] Introduce planner rule to push projection into TableSource URL: https://github.com/apache/flink/pull/8747 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services

[GitHub] [flink] yuezhuangshi commented on a change in pull request #8799: [FLINK-11612][chinese-translation,Documentation] Translate the "Proje…

2019-06-19 Thread GitBox

yuezhuangshi commented on a change in pull request #8799: [FLINK-11612][chinese-translation,Documentation] Translate the "Proje… URL: https://github.com/apache/flink/pull/8799#discussion_r295652737 ## File path: docs/dev/projectsetup/java_api_quickstart.zh.md ## @@ -304,51 +297,46 @@ quickstart/ └── log4j.properties {% endhighlight %} -The sample project is a __Gradle project__, which contains two classes: _StreamingJob_ and _BatchJob_ are the basic skeleton programs for a *DataStream* and *DataSet* program. -The _main_ method is the entry point of the program, both for in-IDE testing/execution and for proper deployments. +示例项目是一个 __Gradle 项目__，它包含了两个类：_StreamingJob_ 和 _BatchJob_ 是 *DataStream* 和 *DataSet* 程序的基础骨架程序。 +_main_ 方法是程序的入口，即可用于IDE测试/执行，也可用于部署。 -We recommend you __import this project into your IDE__ to develop and -test it. IntelliJ IDEA supports Gradle projects after installing the `Gradle` plugin. -Eclipse does so via the [Eclipse Buildship](https://projects.eclipse.org/projects/tools.buildship) plugin -(make sure to specify a Gradle version >= 3.0 in the last step of the import wizard; the `shadow` plugin requires it). -You may also use [Gradle's IDE integration](https://docs.gradle.org/current/userguide/userguide.html#ide-integration) -to create project files from Gradle. +我们建议你将 __此项目导入你的 IDE__ 来开发和测试它。 +IntelliJ IDEA 在安装 `Gradle` 插件后支持 Gradle 项目。Eclipse 则通过 [Eclipse Buildship](https://projects.eclipse +.org/projects/tools.buildship) 插件支持 Gradle 项目（鉴于 `shadow` 插件对 Gradle 版本有要求，请确保在导入向导的最后一步指定 Gradle 版本 >= 3.0）。 +你也可以使用 [Gradle’s IDE integration](https://docs.gradle.org/current/userguide/userguide.html#ide-integration) 从 Gradle +创建项目文件。 -*Please note*: The default JVM heapsize for Java may be too -small for Flink. You have to manually increase it. -In Eclipse, choose `Run Configurations -> Arguments` and write into the `VM Arguments` box: `-Xmx800m`. -In IntelliJ IDEA recommended way to change JVM options is from the `Help | Edit Custom VM Options` menu. See [this article](https://intellij-support.jetbrains.com/hc/en-us/articles/206544869-Configuring-JVM-options-and-platform-properties) for details. +*请注意*：对 Flink 来说，默认的 JVM 堆内存可能太小，你应当手动增加堆内存。 +在 Eclipse中，选择 `Run Configurations -> Arguments` 并在 `VM Arguments` 对应的输入框中写入：`-Xmx800m`。 +在 IntelliJ IDEA 中，推荐从菜单 `Help | Edit Custom VM Options` 来修改 JVM 选项。有关详细信息，请参阅[此文章](https://intellij-support +.jetbrains.com/hc/en-us/articles/206544869-Configuring-JVM-options-and-platform-properties)。 -### Build Project +### 构建项目 -If you want to __build/package your project__, go to your project directory and -run the '`gradle clean shadowJar`' command. -You will __find a JAR file__ that contains your application, plus connectors and libraries -that you may have added as dependencies to the application: `build/libs/--all.jar`. +如果你想要 __构建/打包项目__，请在项目目录下运行 '`gradle clean shadowJar`' 命令。 +命令执行后，你将 __找到一个 JAR 文件__，里面包含了你的应用程序，以及已作为依赖项添加到应用程序的连接器和库：`build/libs/--all.jar`。 -__Note:__ If you use a different class than *StreamingJob* as the application's main class / entry point, -we recommend you change the `mainClassName` setting in the `build.gradle` file accordingly. That way, Flink -can run the application from the JAR file without additionally specifying the main class. +__注意：__ 如果你使用其他类而不是 *StreamingJob* 作为应用程序的主类/入口， +我们建议你相应地修改 `build.gradle` 文件中的 `mainClassName` 配置。 +这样，Flink 可以从 JAR 文件运行应用程序，而无需另外指定主类。 -## Next Steps +## 下一步 -Write your application! +开始编写应用！ -If you are writing a streaming application and you are looking for inspiration what to write, -take a look at the [Stream Processing Application Tutorial]({{ site.baseurl }}/tutorials/datastream_api.html#writing-a-flink-program). +如果你准备编写流处理应用，正在寻找灵感来写什么， +可以看看[流处理应用程序教程]({{ site.baseurl }}/zh/tutorials/datastream_api.html#writing-a-flink-program) -If you are writing a batch processing application and you are looking for inspiration what to write, -take a look at the [Batch Application Examples]({{ site.baseurl }}/dev/batch/examples.html). +如果你准备编写批处理应用，正在寻找灵感来写什么， +可以看看[批处理应用程序示例]({{ site.baseurl }}/zh/dev/batch/examples.html) -For a complete overview over the APIs, have a look at the -[DataStream API]({{ site.baseurl }}/dev/datastream_api.html) and -[DataSet API]({{ site.baseurl }}/dev/batch/index.html) sections. +有关 API 的完整概述，请查看 +[DataStream API]({{ site.baseurl }}/zh/dev/datastream_api.html) 和 +[DataSet API]({{ site.baseurl }}/zh/dev/batch/index.html) 章节。 -[Here]({{ site.baseurl }}/tutorials/local_setup.html) you can find out how to run an application outside the IDE on a local cluster. +在[这里]({{ site.baseurl }}/zh/tutorials/local_setup.html)，你可以找到如何在 IDE 之外的本地集群中运行应用程序。 Review comment: I think it can be assessed from this link : [https://ci.apache.org/projects/flink/flink-docs-master/zh/tutorials/local_setup.html](https://ci.apache.org/projects/flink/flink-docs-master/zh/tutorials/local_setup.

[jira] [Comment Edited] (FLINK-12849) Add support for build Python Docs in Buildbot

2019-06-19 Thread sunjincheng (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-12849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16868272#comment-16868272 ] sunjincheng edited comment on FLINK-12849 at 6/20/19 5:47 AM: -- We successfully built PythonDocs: master: [https://ci.apache.org/builders/flink-docs-master/builds/1509] release-1.7: [https://ci.apache.org/builders/flink-docs-release-1.7/builds/204] then I'll close this JIRA. Thanks again for your help [~aljoscha] [~Zentol] :) was (Author: sunjincheng121): We successfully built PythonDocs， [https://ci.apache.org/builders/flink-docs-master/builds/1509] ，then I'll close this JIRA. Thanks again for your help [~aljoscha] [~Zentol] :) > Add support for build Python Docs in Buildbot > - > > Key: FLINK-12849 > URL: https://issues.apache.org/jira/browse/FLINK-12849 > Project: Flink > Issue Type: Sub-task > Components: API / Python, Build System >Affects Versions: 1.9.0 >Reporter: sunjincheng >Assignee: sunjincheng >Priority: Major > Fix For: 1.9.0 > > Attachments: image-2019-06-14-16-14-35-439.png, > image-2019-06-20-13-45-40-876.png, python_docs.patch > > > We should add the Python Doc for Python API, and add the link to the web > page. i.e.: > !image-2019-06-14-16-14-35-439.png! > In FLINK-12720 we will add how to generate the Python Docs, and in this PR we > should add support for build Python Docs in Buildbot. We may need to modify > the build config: > [https://svn.apache.org/repos/infra/infrastructure/buildbot/aegis/buildmaster/master1/projects/flink.conf] > > the Wiki of how to change the Buildbot code: > [https://cwiki.apache.org/confluence/display/FLINK/Managing+Flink+Documentation] > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)

[jira] [Comment Edited] (FLINK-12849) Add support for build Python Docs in Buildbot

2019-06-19 Thread sunjincheng (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-12849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16868273#comment-16868273 ] sunjincheng edited comment on FLINK-12849 at 6/20/19 5:45 AM: -- Fixed in the SVN, the change code is as follows: !image-2019-06-20-13-45-40-876.png! was (Author: sunjincheng121): Fixed in the SVN, the change code is as follows: !image-2019-06-20-13-45-05-090.png! > Add support for build Python Docs in Buildbot > - > > Key: FLINK-12849 > URL: https://issues.apache.org/jira/browse/FLINK-12849 > Project: Flink > Issue Type: Sub-task > Components: API / Python, Build System >Affects Versions: 1.9.0 >Reporter: sunjincheng >Assignee: sunjincheng >Priority: Major > Fix For: 1.9.0 > > Attachments: image-2019-06-14-16-14-35-439.png, > image-2019-06-20-13-45-40-876.png, python_docs.patch > > > We should add the Python Doc for Python API, and add the link to the web > page. i.e.: > !image-2019-06-14-16-14-35-439.png! > In FLINK-12720 we will add how to generate the Python Docs, and in this PR we > should add support for build Python Docs in Buildbot. We may need to modify > the build config: > [https://svn.apache.org/repos/infra/infrastructure/buildbot/aegis/buildmaster/master1/projects/flink.conf] > > the Wiki of how to change the Buildbot code: > [https://cwiki.apache.org/confluence/display/FLINK/Managing+Flink+Documentation] > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)

[jira] [Closed] (FLINK-12849) Add support for build Python Docs in Buildbot

2019-06-19 Thread sunjincheng (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-12849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sunjincheng closed FLINK-12849. --- Resolution: Fixed Fix Version/s: 1.9.0 Fixed in the SVN, the change code is as follows: !image-2019-06-20-13-45-05-090.png! > Add support for build Python Docs in Buildbot > - > > Key: FLINK-12849 > URL: https://issues.apache.org/jira/browse/FLINK-12849 > Project: Flink > Issue Type: Sub-task > Components: API / Python, Build System >Affects Versions: 1.9.0 >Reporter: sunjincheng >Assignee: sunjincheng >Priority: Major > Fix For: 1.9.0 > > Attachments: image-2019-06-14-16-14-35-439.png, > image-2019-06-20-13-45-40-876.png, python_docs.patch > > > We should add the Python Doc for Python API, and add the link to the web > page. i.e.: > !image-2019-06-14-16-14-35-439.png! > In FLINK-12720 we will add how to generate the Python Docs, and in this PR we > should add support for build Python Docs in Buildbot. We may need to modify > the build config: > [https://svn.apache.org/repos/infra/infrastructure/buildbot/aegis/buildmaster/master1/projects/flink.conf] > > the Wiki of how to change the Buildbot code: > [https://cwiki.apache.org/confluence/display/FLINK/Managing+Flink+Documentation] > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)

[GitHub] [flink] zhijiangW commented on issue #8761: [FLINK-12842][network] Fix invalid check released state during ResultPartition#createSubpartitionView

2019-06-19 Thread GitBox

zhijiangW commented on issue #8761: [FLINK-12842][network] Fix invalid check released state during ResultPartition#createSubpartitionView URL: https://github.com/apache/flink/pull/8761#issuecomment-503884044 Thanks @zentol , it has already green. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services

[jira] [Commented] (FLINK-12849) Add support for build Python Docs in Buildbot

2019-06-19 Thread sunjincheng (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-12849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16868272#comment-16868272 ] sunjincheng commented on FLINK-12849: - We successfully built PythonDocs， [https://ci.apache.org/builders/flink-docs-master/builds/1509] ，then I'll close this JIRA. Thanks again for your help [~aljoscha] [~Zentol] :) > Add support for build Python Docs in Buildbot > - > > Key: FLINK-12849 > URL: https://issues.apache.org/jira/browse/FLINK-12849 > Project: Flink > Issue Type: Sub-task > Components: API / Python, Build System >Affects Versions: 1.9.0 >Reporter: sunjincheng >Assignee: sunjincheng >Priority: Major > Attachments: image-2019-06-14-16-14-35-439.png, python_docs.patch > > > We should add the Python Doc for Python API, and add the link to the web > page. i.e.: > !image-2019-06-14-16-14-35-439.png! > In FLINK-12720 we will add how to generate the Python Docs, and in this PR we > should add support for build Python Docs in Buildbot. We may need to modify > the build config: > [https://svn.apache.org/repos/infra/infrastructure/buildbot/aegis/buildmaster/master1/projects/flink.conf] > > the Wiki of how to change the Buildbot code: > [https://cwiki.apache.org/confluence/display/FLINK/Managing+Flink+Documentation] > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)

[GitHub] [flink] synckey commented on a change in pull request #8766: [FLINK-12664][hive] Implement TableSink to write Hive tables

2019-06-19 Thread GitBox

synckey commented on a change in pull request #8766: [FLINK-12664][hive] Implement TableSink to write Hive tables URL: https://github.com/apache/flink/pull/8766#discussion_r295649613 ## File path: flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/batch/connectors/hive/HiveTableSink.java ## @@ -0,0 +1,155 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.batch.connectors.hive; + +import org.apache.flink.api.common.io.OutputFormat; +import org.apache.flink.api.common.typeinfo.TypeInformation; +import org.apache.flink.api.java.typeutils.RowTypeInfo; +import org.apache.flink.table.catalog.exceptions.CatalogException; +import org.apache.flink.table.catalog.hive.client.HiveMetastoreClientFactory; +import org.apache.flink.table.catalog.hive.client.HiveMetastoreClientWrapper; +import org.apache.flink.table.catalog.hive.client.HiveShimLoader; +import org.apache.flink.table.catalog.hive.descriptors.HiveCatalogValidator; +import org.apache.flink.table.sinks.OutputFormatTableSink; +import org.apache.flink.table.sinks.TableSink; +import org.apache.flink.types.Row; +import org.apache.flink.util.FlinkRuntimeException; +import org.apache.flink.util.Preconditions; + +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.FileSystem; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.hive.conf.HiveConf; +import org.apache.hadoop.hive.metastore.MetaStoreUtils; +import org.apache.hadoop.hive.metastore.Warehouse; +import org.apache.hadoop.hive.metastore.api.Partition; +import org.apache.hadoop.hive.metastore.api.StorageDescriptor; +import org.apache.hadoop.hive.metastore.api.Table; +import org.apache.hadoop.mapred.JobConf; +import org.apache.thrift.TException; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.LinkedHashMap; +import java.util.List; + +/** + * Table sink to write to Hive tables. + */ +public class HiveTableSink extends OutputFormatTableSink { + + private final JobConf jobConf; + private final RowTypeInfo rowTypeInfo; + private final String dbName; + private final String tableName; + private final List partitionColumns; + private final String hiveVersion; + + // TODO: need OverwritableTableSink to configure this + private boolean overwrite = false; + + public HiveTableSink(JobConf jobConf, RowTypeInfo rowTypeInfo, String dbName, String tableName, + List partitionColumns) { + this.jobConf = jobConf; + this.rowTypeInfo = rowTypeInfo; + this.dbName = dbName; + this.tableName = tableName; + this.partitionColumns = partitionColumns; + hiveVersion = jobConf.get(HiveCatalogValidator.CATALOG_HIVE_VERSION, HiveShimLoader.getHiveVersion()); + } + + @Override + public OutputFormat getOutputFormat() { + boolean isPartitioned = partitionColumns != null && !partitionColumns.isEmpty(); + // TODO: need PartitionableTableSink to decide whether it's dynamic partitioning + boolean isDynamicPartition = isPartitioned; + try (HiveMetastoreClientWrapper client = HiveMetastoreClientFactory.create(new HiveConf(jobConf, HiveConf.class), hiveVersion)) { + Table table = client.getTable(dbName, tableName); + StorageDescriptor sd = table.getSd(); + // here we use the sdLocation to store the output path of the job, which is always a staging dir + String sdLocation = sd.getLocation(); + HiveTablePartition hiveTablePartition; + if (isPartitioned) { + // TODO: validate partition spec + // TODO: strip quotes in partition values + LinkedHashMap strippedPartSpec = new LinkedHashMap<>(); Review comment: Is there a reason to make strippedPartSpec `LinkedHashMap` not `Map`? This is an automated message from the Apache Git Service. To re

[GitHub] [flink] synckey commented on a change in pull request #8766: [FLINK-12664][hive] Implement TableSink to write Hive tables

2019-06-19 Thread GitBox

synckey commented on a change in pull request #8766: [FLINK-12664][hive] Implement TableSink to write Hive tables URL: https://github.com/apache/flink/pull/8766#discussion_r295648808 ## File path: flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/batch/connectors/hive/HiveTableSink.java ## @@ -0,0 +1,155 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.batch.connectors.hive; + +import org.apache.flink.api.common.io.OutputFormat; +import org.apache.flink.api.common.typeinfo.TypeInformation; +import org.apache.flink.api.java.typeutils.RowTypeInfo; +import org.apache.flink.table.catalog.exceptions.CatalogException; +import org.apache.flink.table.catalog.hive.client.HiveMetastoreClientFactory; +import org.apache.flink.table.catalog.hive.client.HiveMetastoreClientWrapper; +import org.apache.flink.table.catalog.hive.client.HiveShimLoader; +import org.apache.flink.table.catalog.hive.descriptors.HiveCatalogValidator; +import org.apache.flink.table.sinks.OutputFormatTableSink; +import org.apache.flink.table.sinks.TableSink; +import org.apache.flink.types.Row; +import org.apache.flink.util.FlinkRuntimeException; +import org.apache.flink.util.Preconditions; + +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.FileSystem; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.hive.conf.HiveConf; +import org.apache.hadoop.hive.metastore.MetaStoreUtils; +import org.apache.hadoop.hive.metastore.Warehouse; +import org.apache.hadoop.hive.metastore.api.Partition; +import org.apache.hadoop.hive.metastore.api.StorageDescriptor; +import org.apache.hadoop.hive.metastore.api.Table; +import org.apache.hadoop.mapred.JobConf; +import org.apache.thrift.TException; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.LinkedHashMap; +import java.util.List; + +/** + * Table sink to write to Hive tables. + */ +public class HiveTableSink extends OutputFormatTableSink { + + private final JobConf jobConf; + private final RowTypeInfo rowTypeInfo; + private final String dbName; + private final String tableName; + private final List partitionColumns; + private final String hiveVersion; + + // TODO: need OverwritableTableSink to configure this + private boolean overwrite = false; Review comment: lint: would you move the initiation of `overwrite` to constructor? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services

[GitHub] [flink] synckey commented on a change in pull request #8766: [FLINK-12664][hive] Implement TableSink to write Hive tables

2019-06-19 Thread GitBox

synckey commented on a change in pull request #8766: [FLINK-12664][hive] Implement TableSink to write Hive tables URL: https://github.com/apache/flink/pull/8766#discussion_r295648942 ## File path: flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/batch/connectors/hive/HiveTableSink.java ## @@ -0,0 +1,155 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.batch.connectors.hive; + +import org.apache.flink.api.common.io.OutputFormat; +import org.apache.flink.api.common.typeinfo.TypeInformation; +import org.apache.flink.api.java.typeutils.RowTypeInfo; +import org.apache.flink.table.catalog.exceptions.CatalogException; +import org.apache.flink.table.catalog.hive.client.HiveMetastoreClientFactory; +import org.apache.flink.table.catalog.hive.client.HiveMetastoreClientWrapper; +import org.apache.flink.table.catalog.hive.client.HiveShimLoader; +import org.apache.flink.table.catalog.hive.descriptors.HiveCatalogValidator; +import org.apache.flink.table.sinks.OutputFormatTableSink; +import org.apache.flink.table.sinks.TableSink; +import org.apache.flink.types.Row; +import org.apache.flink.util.FlinkRuntimeException; +import org.apache.flink.util.Preconditions; + +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.FileSystem; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.hive.conf.HiveConf; +import org.apache.hadoop.hive.metastore.MetaStoreUtils; +import org.apache.hadoop.hive.metastore.Warehouse; +import org.apache.hadoop.hive.metastore.api.Partition; +import org.apache.hadoop.hive.metastore.api.StorageDescriptor; +import org.apache.hadoop.hive.metastore.api.Table; +import org.apache.hadoop.mapred.JobConf; +import org.apache.thrift.TException; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.LinkedHashMap; +import java.util.List; + +/** + * Table sink to write to Hive tables. + */ +public class HiveTableSink extends OutputFormatTableSink { + + private final JobConf jobConf; + private final RowTypeInfo rowTypeInfo; + private final String dbName; + private final String tableName; + private final List partitionColumns; + private final String hiveVersion; + + // TODO: need OverwritableTableSink to configure this + private boolean overwrite = false; + + public HiveTableSink(JobConf jobConf, RowTypeInfo rowTypeInfo, String dbName, String tableName, + List partitionColumns) { + this.jobConf = jobConf; + this.rowTypeInfo = rowTypeInfo; + this.dbName = dbName; + this.tableName = tableName; + this.partitionColumns = partitionColumns; + hiveVersion = jobConf.get(HiveCatalogValidator.CATALOG_HIVE_VERSION, HiveShimLoader.getHiveVersion()); Review comment: lint: would you also add a `this` before `hiveVersion`? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services

[GitHub] [flink] synckey commented on a change in pull request #8799: [FLINK-11612][chinese-translation,Documentation] Translate the "Proje…

2019-06-19 Thread GitBox

synckey commented on a change in pull request #8799: [FLINK-11612][chinese-translation,Documentation] Translate the "Proje… URL: https://github.com/apache/flink/pull/8799#discussion_r295647061 ## File path: docs/dev/projectsetup/java_api_quickstart.zh.md ## @@ -304,51 +297,46 @@ quickstart/ └── log4j.properties {% endhighlight %} -The sample project is a __Gradle project__, which contains two classes: _StreamingJob_ and _BatchJob_ are the basic skeleton programs for a *DataStream* and *DataSet* program. -The _main_ method is the entry point of the program, both for in-IDE testing/execution and for proper deployments. +示例项目是一个 __Gradle 项目__，它包含了两个类：_StreamingJob_ 和 _BatchJob_ 是 *DataStream* 和 *DataSet* 程序的基础骨架程序。 +_main_ 方法是程序的入口，即可用于IDE测试/执行，也可用于部署。 -We recommend you __import this project into your IDE__ to develop and -test it. IntelliJ IDEA supports Gradle projects after installing the `Gradle` plugin. -Eclipse does so via the [Eclipse Buildship](https://projects.eclipse.org/projects/tools.buildship) plugin -(make sure to specify a Gradle version >= 3.0 in the last step of the import wizard; the `shadow` plugin requires it). -You may also use [Gradle's IDE integration](https://docs.gradle.org/current/userguide/userguide.html#ide-integration) -to create project files from Gradle. +我们建议你将 __此项目导入你的 IDE__ 来开发和测试它。 +IntelliJ IDEA 在安装 `Gradle` 插件后支持 Gradle 项目。Eclipse 则通过 [Eclipse Buildship](https://projects.eclipse +.org/projects/tools.buildship) 插件支持 Gradle 项目（鉴于 `shadow` 插件对 Gradle 版本有要求，请确保在导入向导的最后一步指定 Gradle 版本 >= 3.0）。 +你也可以使用 [Gradle’s IDE integration](https://docs.gradle.org/current/userguide/userguide.html#ide-integration) 从 Gradle +创建项目文件。 -*Please note*: The default JVM heapsize for Java may be too -small for Flink. You have to manually increase it. -In Eclipse, choose `Run Configurations -> Arguments` and write into the `VM Arguments` box: `-Xmx800m`. -In IntelliJ IDEA recommended way to change JVM options is from the `Help | Edit Custom VM Options` menu. See [this article](https://intellij-support.jetbrains.com/hc/en-us/articles/206544869-Configuring-JVM-options-and-platform-properties) for details. +*请注意*：对 Flink 来说，默认的 JVM 堆内存可能太小，你应当手动增加堆内存。 +在 Eclipse中，选择 `Run Configurations -> Arguments` 并在 `VM Arguments` 对应的输入框中写入：`-Xmx800m`。 +在 IntelliJ IDEA 中，推荐从菜单 `Help | Edit Custom VM Options` 来修改 JVM 选项。有关详细信息，请参阅[此文章](https://intellij-support +.jetbrains.com/hc/en-us/articles/206544869-Configuring-JVM-options-and-platform-properties)。 -### Build Project +### 构建项目 -If you want to __build/package your project__, go to your project directory and -run the '`gradle clean shadowJar`' command. -You will __find a JAR file__ that contains your application, plus connectors and libraries -that you may have added as dependencies to the application: `build/libs/--all.jar`. +如果你想要 __构建/打包项目__，请在项目目录下运行 '`gradle clean shadowJar`' 命令。 +命令执行后，你将 __找到一个 JAR 文件__，里面包含了你的应用程序，以及已作为依赖项添加到应用程序的连接器和库：`build/libs/--all.jar`。 -__Note:__ If you use a different class than *StreamingJob* as the application's main class / entry point, -we recommend you change the `mainClassName` setting in the `build.gradle` file accordingly. That way, Flink -can run the application from the JAR file without additionally specifying the main class. +__注意：__ 如果你使用其他类而不是 *StreamingJob* 作为应用程序的主类/入口， +我们建议你相应地修改 `build.gradle` 文件中的 `mainClassName` 配置。 +这样，Flink 可以从 JAR 文件运行应用程序，而无需另外指定主类。 -## Next Steps +## 下一步 -Write your application! +开始编写应用！ -If you are writing a streaming application and you are looking for inspiration what to write, -take a look at the [Stream Processing Application Tutorial]({{ site.baseurl }}/tutorials/datastream_api.html#writing-a-flink-program). +如果你准备编写流处理应用，正在寻找灵感来写什么， +可以看看[流处理应用程序教程]({{ site.baseurl }}/zh/tutorials/datastream_api.html#writing-a-flink-program) -If you are writing a batch processing application and you are looking for inspiration what to write, -take a look at the [Batch Application Examples]({{ site.baseurl }}/dev/batch/examples.html). +如果你准备编写批处理应用，正在寻找灵感来写什么， +可以看看[批处理应用程序示例]({{ site.baseurl }}/zh/dev/batch/examples.html) -For a complete overview over the APIs, have a look at the -[DataStream API]({{ site.baseurl }}/dev/datastream_api.html) and -[DataSet API]({{ site.baseurl }}/dev/batch/index.html) sections. +有关 API 的完整概述，请查看 +[DataStream API]({{ site.baseurl }}/zh/dev/datastream_api.html) 和 +[DataSet API]({{ site.baseurl }}/zh/dev/batch/index.html) 章节。 -[Here]({{ site.baseurl }}/tutorials/local_setup.html) you can find out how to run an application outside the IDE on a local cluster. +在[这里]({{ site.baseurl }}/zh/tutorials/local_setup.html)，你可以找到如何在 IDE 之外的本地集群中运行应用程序。 Review comment: Looks like this link https://flink.apache.org/zh/tutorials/local_setup.html is dead. This is an automated message from the Apache Git Service. To resp

[GitHub] [flink] synckey commented on a change in pull request #8799: [FLINK-11612][chinese-translation,Documentation] Translate the "Proje…

2019-06-19 Thread GitBox

synckey commented on a change in pull request #8799: [FLINK-11612][chinese-translation,Documentation] Translate the "Proje… URL: https://github.com/apache/flink/pull/8799#discussion_r295646328 ## File path: docs/dev/projectsetup/java_api_quickstart.zh.md ## @@ -219,15 +215,15 @@ configurations { // declare the dependencies for your production and test code dependencies { // -- -// Compile-time dependencies that should NOT be part of the -// shadow jar and are provided in the lib folder of Flink +// 编译时依赖不应该包含在 shadow jar 中， +// 这些依赖会在 Flink 的 lib 目录中提供。 // -- compile "org.apache.flink:flink-java:${flinkVersion}" compile "org.apache.flink:flink-streaming-java_${scalaBinaryVersion}:${flinkVersion}" // -- -// Dependencies that should be part of the shadow jar, e.g. -// connectors. These must be in the flinkShadowJar configuration! +// 应该包含在 shadow jar 中的依赖，例如：连接器。 +// 这些必须在 flinkShadowJar 的配置中！ Review comment: What about `他们` instead of `他们`？ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services

[GitHub] [flink] synckey commented on a change in pull request #8799: [FLINK-11612][chinese-translation,Documentation] Translate the "Proje…

2019-06-19 Thread GitBox

synckey commented on a change in pull request #8799: [FLINK-11612][chinese-translation,Documentation] Translate the "Proje… URL: https://github.com/apache/flink/pull/8799#discussion_r295645437 ## File path: docs/dev/projectsetup/java_api_quickstart.zh.md ## @@ -74,16 +74,17 @@ Use one of the following commands to __create a project__: {% unless site.is_stable %} -Note: For Maven 3.0 or higher, it is no longer possible to specify the repository (-DarchetypeCatalog) via the command line. If you wish to use the snapshot repository, you need to add a repository entry to your settings.xml. For details about this change, please refer to http://maven.apache.org/archetype/maven-archetype-plugin/archetype-repository.html";>Maven official document +注意：对于 Maven 3.0 及更高版本，不再支持通过命令行指定仓库（-DarchetypeCatalog）。 Review comment: Would you remove `对于`？ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services

[jira] [Comment Edited] (FLINK-12886) Support container memory segment

2019-06-19 Thread Jingsong Lee (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-12886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16868265#comment-16868265 ] Jingsong Lee edited comment on FLINK-12886 at 6/20/19 5:16 AM: --- Back to the original intention of Jira and consider option 2, what kind of code do you want to optimize? Maybe you can talk deeply to the detail code instead of introduce a new big and abstract something. Maybe just a little bit of code refactoring is needed to achieve good results. was (Author: lzljs3620320): Back to the original intention of Jira and consider option 2, what kind of code do you want to optimize? Maybe you can talk deeply to the detail code instead of introduce a new big and abstract something. Maybe just a little bit of code refactoring is needed to achieve good results. > Support container memory segment > > > Key: FLINK-12886 > URL: https://issues.apache.org/jira/browse/FLINK-12886 > Project: Flink > Issue Type: New Feature > Components: Table SQL / Runtime >Reporter: Liya Fan >Assignee: Liya Fan >Priority: Major > Labels: pull-request-available > Attachments: image-2019-06-18-17-59-42-136.png > > Time Spent: 10m > Remaining Estimate: 0h > > We observe that in many scenarios, the operations/algorithms are based on an > array of MemorySegment. These memory segments form a large, combined, and > continuous memory space. > For example, suppose we have an array of n memory segments. Memory addresses > from 0 to segment_size - 1 are served by the first memory segment; memory > addresses from segment_size to 2 * segment_size - 1 are served by the second > memory segment, and so on. > Specific algorithms decide the actual MemorySegment to serve the operation > requests. For some rare cases, two or more memory segments serve the > requests. There are many operations based on such a paradigm, for example, > {{BinaryString#matchAt}}, {{SegmentsUtil#copyToBytes}}, > {{LongHashPartition#MatchIterator#get}}, etc. > The problem is that, for memory segment array based operations, large amounts > of code is devoted to > 1. Computing the memory segment index & offset within the memory segment. > 2. Processing boundary cases. For example, to write an integer, there are > only 2 bytes left in the first memory segment, and the remaining 2 bytes must > be written to the next memory segment. > 3. Differentiate processing for short/long data. For example, when copying > memory data to a byte array. Different methods are implemented for cases when > 1) the data fits in a single segment; 2) the data spans multiple segments. > Therefore, there are much duplicated code to achieve above purposes. What is > worse, this paradigm significantly increases the amount of code, making the > code more difficult to read and maintain. Furthermore, it easily gives rise > to bugs which difficult to find and debug. > To address these problems, we propose a new type of memory segment: > {{ContainerMemorySegment}}. It is based on an array of underlying memory > segments with the same size. It extends from the {{MemorySegment}} base > class, so it provides all the functionalities provided by {{MemorySegment}}. > In addition, it hides all the details for dealing with specific memory > segments, and acts as if it were a big continuous memory region. > A prototype implementation is given below: > !image-2019-06-18-17-59-42-136.png|thumbnail! > With this new type of memory segment, many operations/algorithms can be > greatly simplified, without affecting performance. This is because, > 1. Many checks, boundary processing are already there. We just move them to > the new class. > 2. We optimize the implementation of the new class, so the special > optimizations (e.g. optimizations for short data) are still preserved. -- This message was sent by Atlassian JIRA (v7.6.3#76005)

[jira] [Commented] (FLINK-12886) Support container memory segment

2019-06-19 Thread Jingsong Lee (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-12886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16868265#comment-16868265 ] Jingsong Lee commented on FLINK-12886: -- Back to the original intention of Jira and consider option 2, what kind of code do you want to optimize? Maybe you can talk deeply to the detail code instead of introduce a new big and abstract something. Maybe just a little bit of code refactoring is needed to achieve good results. > Support container memory segment > > > Key: FLINK-12886 > URL: https://issues.apache.org/jira/browse/FLINK-12886 > Project: Flink > Issue Type: New Feature > Components: Table SQL / Runtime >Reporter: Liya Fan >Assignee: Liya Fan >Priority: Major > Labels: pull-request-available > Attachments: image-2019-06-18-17-59-42-136.png > > Time Spent: 10m > Remaining Estimate: 0h > > We observe that in many scenarios, the operations/algorithms are based on an > array of MemorySegment. These memory segments form a large, combined, and > continuous memory space. > For example, suppose we have an array of n memory segments. Memory addresses > from 0 to segment_size - 1 are served by the first memory segment; memory > addresses from segment_size to 2 * segment_size - 1 are served by the second > memory segment, and so on. > Specific algorithms decide the actual MemorySegment to serve the operation > requests. For some rare cases, two or more memory segments serve the > requests. There are many operations based on such a paradigm, for example, > {{BinaryString#matchAt}}, {{SegmentsUtil#copyToBytes}}, > {{LongHashPartition#MatchIterator#get}}, etc. > The problem is that, for memory segment array based operations, large amounts > of code is devoted to > 1. Computing the memory segment index & offset within the memory segment. > 2. Processing boundary cases. For example, to write an integer, there are > only 2 bytes left in the first memory segment, and the remaining 2 bytes must > be written to the next memory segment. > 3. Differentiate processing for short/long data. For example, when copying > memory data to a byte array. Different methods are implemented for cases when > 1) the data fits in a single segment; 2) the data spans multiple segments. > Therefore, there are much duplicated code to achieve above purposes. What is > worse, this paradigm significantly increases the amount of code, making the > code more difficult to read and maintain. Furthermore, it easily gives rise > to bugs which difficult to find and debug. > To address these problems, we propose a new type of memory segment: > {{ContainerMemorySegment}}. It is based on an array of underlying memory > segments with the same size. It extends from the {{MemorySegment}} base > class, so it provides all the functionalities provided by {{MemorySegment}}. > In addition, it hides all the details for dealing with specific memory > segments, and acts as if it were a big continuous memory region. > A prototype implementation is given below: > !image-2019-06-18-17-59-42-136.png|thumbnail! > With this new type of memory segment, many operations/algorithms can be > greatly simplified, without affecting performance. This is because, > 1. Many checks, boundary processing are already there. We just move them to > the new class. > 2. We optimize the implementation of the new class, so the special > optimizations (e.g. optimizations for short data) are still preserved. -- This message was sent by Atlassian JIRA (v7.6.3#76005)

[jira] [Commented] (FLINK-12886) Support container memory segment

2019-06-19 Thread Kurt Young (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-12886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16868254#comment-16868254 ] Kurt Young commented on FLINK-12886: Generally speak I would vote for option 2, but let's first decide whether it's worthy to have a new utility class. > Support container memory segment > > > Key: FLINK-12886 > URL: https://issues.apache.org/jira/browse/FLINK-12886 > Project: Flink > Issue Type: New Feature > Components: Table SQL / Runtime >Reporter: Liya Fan >Assignee: Liya Fan >Priority: Major > Labels: pull-request-available > Attachments: image-2019-06-18-17-59-42-136.png > > Time Spent: 10m > Remaining Estimate: 0h > > We observe that in many scenarios, the operations/algorithms are based on an > array of MemorySegment. These memory segments form a large, combined, and > continuous memory space. > For example, suppose we have an array of n memory segments. Memory addresses > from 0 to segment_size - 1 are served by the first memory segment; memory > addresses from segment_size to 2 * segment_size - 1 are served by the second > memory segment, and so on. > Specific algorithms decide the actual MemorySegment to serve the operation > requests. For some rare cases, two or more memory segments serve the > requests. There are many operations based on such a paradigm, for example, > {{BinaryString#matchAt}}, {{SegmentsUtil#copyToBytes}}, > {{LongHashPartition#MatchIterator#get}}, etc. > The problem is that, for memory segment array based operations, large amounts > of code is devoted to > 1. Computing the memory segment index & offset within the memory segment. > 2. Processing boundary cases. For example, to write an integer, there are > only 2 bytes left in the first memory segment, and the remaining 2 bytes must > be written to the next memory segment. > 3. Differentiate processing for short/long data. For example, when copying > memory data to a byte array. Different methods are implemented for cases when > 1) the data fits in a single segment; 2) the data spans multiple segments. > Therefore, there are much duplicated code to achieve above purposes. What is > worse, this paradigm significantly increases the amount of code, making the > code more difficult to read and maintain. Furthermore, it easily gives rise > to bugs which difficult to find and debug. > To address these problems, we propose a new type of memory segment: > {{ContainerMemorySegment}}. It is based on an array of underlying memory > segments with the same size. It extends from the {{MemorySegment}} base > class, so it provides all the functionalities provided by {{MemorySegment}}. > In addition, it hides all the details for dealing with specific memory > segments, and acts as if it were a big continuous memory region. > A prototype implementation is given below: > !image-2019-06-18-17-59-42-136.png|thumbnail! > With this new type of memory segment, many operations/algorithms can be > greatly simplified, without affecting performance. This is because, > 1. Many checks, boundary processing are already there. We just move them to > the new class. > 2. We optimize the implementation of the new class, so the special > optimizations (e.g. optimizations for short data) are still preserved. -- This message was sent by Atlassian JIRA (v7.6.3#76005)

[jira] [Commented] (FLINK-12886) Support container memory segment

2019-06-19 Thread Liya Fan (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-12886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16868251#comment-16868251 ] Liya Fan commented on FLINK-12886: -- [~ykt836] [~lzljs3620320], two ideas to resolve the performance degradation. Would you please give some comments? 1. Let ContainerMemorySegment and MemorySegment extends a common super interface, which defines the basic operations for accessing data: public interface MemoryAccessible { public int getInt(int index); public void setInt(int index, int value); ... } public class MemorySegment implements MemoryAccessible ... public class ContainerMemorySegment implements MemoryAccessible ... For this method, the MemorySegment class hierarchy is unaffected, so code that depends on MemorySegment does not have performance affected. In addition, the code that expects a MemoryAccessible can accept both a MemorySegment and a ContainerMemorySegment. 2. ContainerMemorySegment no longer inherits from MemorySegment. In this way, ContainerMemorySegment just acts a wrapper for a set of MemorySegment. So wherever a MemorySegment is expected, a ContainerMemorySegment cannot be provided. Also, ContainerMemorySegment can be moved to module blink-runtime, because it is not a general MemorySegment. > Support container memory segment > > > Key: FLINK-12886 > URL: https://issues.apache.org/jira/browse/FLINK-12886 > Project: Flink > Issue Type: New Feature > Components: Table SQL / Runtime >Reporter: Liya Fan >Assignee: Liya Fan >Priority: Major > Labels: pull-request-available > Attachments: image-2019-06-18-17-59-42-136.png > > Time Spent: 10m > Remaining Estimate: 0h > > We observe that in many scenarios, the operations/algorithms are based on an > array of MemorySegment. These memory segments form a large, combined, and > continuous memory space. > For example, suppose we have an array of n memory segments. Memory addresses > from 0 to segment_size - 1 are served by the first memory segment; memory > addresses from segment_size to 2 * segment_size - 1 are served by the second > memory segment, and so on. > Specific algorithms decide the actual MemorySegment to serve the operation > requests. For some rare cases, two or more memory segments serve the > requests. There are many operations based on such a paradigm, for example, > {{BinaryString#matchAt}}, {{SegmentsUtil#copyToBytes}}, > {{LongHashPartition#MatchIterator#get}}, etc. > The problem is that, for memory segment array based operations, large amounts > of code is devoted to > 1. Computing the memory segment index & offset within the memory segment. > 2. Processing boundary cases. For example, to write an integer, there are > only 2 bytes left in the first memory segment, and the remaining 2 bytes must > be written to the next memory segment. > 3. Differentiate processing for short/long data. For example, when copying > memory data to a byte array. Different methods are implemented for cases when > 1) the data fits in a single segment; 2) the data spans multiple segments. > Therefore, there are much duplicated code to achieve above purposes. What is > worse, this paradigm significantly increases the amount of code, making the > code more difficult to read and maintain. Furthermore, it easily gives rise > to bugs which difficult to find and debug. > To address these problems, we propose a new type of memory segment: > {{ContainerMemorySegment}}. It is based on an array of underlying memory > segments with the same size. It extends from the {{MemorySegment}} base > class, so it provides all the functionalities provided by {{MemorySegment}}. > In addition, it hides all the details for dealing with specific memory > segments, and acts as if it were a big continuous memory region. > A prototype implementation is given below: > !image-2019-06-18-17-59-42-136.png|thumbnail! > With this new type of memory segment, many operations/algorithms can be > greatly simplified, without affecting performance. This is because, > 1. Many checks, boundary processing are already there. We just move them to > the new class. > 2. We optimize the implementation of the new class, so the special > optimizations (e.g. optimizations for short data) are still preserved. -- This message was sent by Atlassian JIRA (v7.6.3#76005)

[GitHub] [flink] flinkbot commented on issue #8801: [hotfix][FLINK-12896][HistoryServer] modify :jobId key in TaskCheckpointStatisticDetailsHandler

2019-06-19 Thread GitBox

flinkbot commented on issue #8801: [hotfix][FLINK-12896][HistoryServer] modify :jobId key in TaskCheckpointStatisticDetailsHandler URL: https://github.com/apache/flink/pull/8801#issuecomment-503835553 Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community to review your pull request. We will use this comment to track the progress of the review. ## Review Progress * ❓ 1. The [description] looks good. * ❓ 2. There is [consensus] that the contribution should go into to Flink. * ❓ 3. Needs [attention] from. * ❓ 4. The change fits into the overall [architecture]. * ❓ 5. Overall code [quality] is good. Please see the [Pull Request Review Guide](https://flink.apache.org/reviewing-prs.html) for a full explanation of the review process. The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer of PMC member is required Bot commands The @flinkbot bot supports the following commands: - `@flinkbot approve description` to approve one or more aspects (aspects: `description`, `consensus`, `architecture` and `quality`) - `@flinkbot approve all` to approve all aspects - `@flinkbot approve-until architecture` to approve everything until `architecture` - `@flinkbot attention @username1 [@username2 ..]` to require somebody's attention - `@flinkbot disapprove architecture` to remove an approval you gave earlier This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services

[jira] [Updated] (FLINK-12896) Missing some information in History Server

2019-06-19 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-12896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-12896: --- Labels: pull-request-available (was: ) > Missing some information in History Server > -- > > Key: FLINK-12896 > URL: https://issues.apache.org/jira/browse/FLINK-12896 > Project: Flink > Issue Type: Bug >Reporter: xymaqingxiang >Priority: Major > Labels: pull-request-available > Attachments: image-2019-06-19-15-32-15-994.png, > image-2019-06-19-15-33-12-243.png, image-2019-06-19-15-40-08-481.png, > image-2019-06-19-15-41-48-051.png > > > There are three bugs, as follows: > 1. could not found the metrics for vertices. > !image-2019-06-19-15-33-12-243.png! > 2. could not found the checkpoint details for subtasks. > !image-2019-06-19-15-32-15-994.png! > 3. The jobs directory has an exception: job directory, the ArchivedJson we > get in FsJobArchivist is wrong. > !image-2019-06-19-15-40-08-481.png! > !image-2019-06-19-15-41-48-051.png! > -- This message was sent by Atlassian JIRA (v7.6.3#76005)

[GitHub] [flink] maqingxiang opened a new pull request #8801: [hotfix][FLINK-12896][HistoryServer] modify :jobId key in TaskCheckpointStatisticDetailsHandler

2019-06-19 Thread GitBox

maqingxiang opened a new pull request #8801: [hotfix][FLINK-12896][HistoryServer] modify :jobId key in TaskCheckpointStatisticDetailsHandler URL: https://github.com/apache/flink/pull/8801 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services

[GitHub] [flink] bowenli86 commented on issue #8766: [FLINK-12664][hive] Implement TableSink to write Hive tables

2019-06-19 Thread GitBox

bowenli86 commented on issue #8766: [FLINK-12664][hive] Implement TableSink to write Hive tables URL: https://github.com/apache/flink/pull/8766#issuecomment-503830225 @lirui-apache Thanks for your contribution. Will merge once the CI passes This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services

[GitHub] [flink] flinkbot commented on issue #8800: [FLINK-12627][doc][sql client][hive] Document how to configure and use catalogs in SQL CLI

2019-06-19 Thread GitBox

flinkbot commented on issue #8800: [FLINK-12627][doc][sql client][hive] Document how to configure and use catalogs in SQL CLI URL: https://github.com/apache/flink/pull/8800#issuecomment-503830112 Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community to review your pull request. We will use this comment to track the progress of the review. ## Review Progress * ❓ 1. The [description] looks good. * ❓ 2. There is [consensus] that the contribution should go into to Flink. * ❓ 3. Needs [attention] from. * ❓ 4. The change fits into the overall [architecture]. * ❓ 5. Overall code [quality] is good. Please see the [Pull Request Review Guide](https://flink.apache.org/reviewing-prs.html) for a full explanation of the review process. The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer of PMC member is required Bot commands The @flinkbot bot supports the following commands: - `@flinkbot approve description` to approve one or more aspects (aspects: `description`, `consensus`, `architecture` and `quality`) - `@flinkbot approve all` to approve all aspects - `@flinkbot approve-until architecture` to approve everything until `architecture` - `@flinkbot attention @username1 [@username2 ..]` to require somebody's attention - `@flinkbot disapprove architecture` to remove an approval you gave earlier This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services

[GitHub] [flink] bowenli86 opened a new pull request #8800: [FLINK-12627][doc][sql client][hive] Document how to configure and use catalogs in SQL CLI

2019-06-19 Thread GitBox

bowenli86 opened a new pull request #8800: [FLINK-12627][doc][sql client][hive] Document how to configure and use catalogs in SQL CLI URL: https://github.com/apache/flink/pull/8800 ## What is the purpose of the change This PR adds English doc for configuring catalogs in SQL CLI. Chinese doc is in [FLINK-12894](https://issues.apache.org/jira/browse/FLINK-12894). ## Brief change log - adds document for configuring catalogs in SQL CLI. ## Verifying this change This change is a trivial rework / code cleanup without any test coverage. ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): (no) - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no) - The serializers: (no) - The runtime per-record code paths (performance sensitive): (no) - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no) - The S3 file system connector: (no) ## Documentation - Does this pull request introduce a new feature? (no) - If yes, how is the feature documented? (docs) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services

[jira] [Updated] (FLINK-12627) Document how to configure and use catalogs in SQL CLI

2019-06-19 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-12627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-12627: --- Labels: pull-request-available (was: ) > Document how to configure and use catalogs in SQL CLI > - > > Key: FLINK-12627 > URL: https://issues.apache.org/jira/browse/FLINK-12627 > Project: Flink > Issue Type: Sub-task > Components: Connectors / Hive, Documentation, Table SQL / Client >Reporter: Bowen Li >Assignee: Bowen Li >Priority: Major > Labels: pull-request-available > Fix For: 1.9.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)

[jira] [Commented] (FLINK-12847) Update Kinesis Connectors to latest Apache licensed libraries

2019-06-19 Thread Bowen Li (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-12847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16868227#comment-16868227 ] Bowen Li commented on FLINK-12847: -- [~dyanarose] Thanks for your contribution! I'm currently busy with other features in release 1.9, and also given that this work heavily relies on licenses updates from AWS side, I won't be able to review the changes before licenses of all kinesis connector's dependencies have been updated to Apache 2.0 and their new releases are officially published. > Update Kinesis Connectors to latest Apache licensed libraries > - > > Key: FLINK-12847 > URL: https://issues.apache.org/jira/browse/FLINK-12847 > Project: Flink > Issue Type: Improvement > Components: Connectors / Kinesis >Reporter: Dyana Rose >Assignee: Dyana Rose >Priority: Major > > Currently the referenced Kinesis Client Library and Kinesis Producer Library > code in the flink-connector-kinesis is licensed under the Amazon Software > License which is not compatible with the Apache License. This then requires a > fair amount of work in the CI pipeline and for users who want to use the > flink-connector-kinesis. > The Kinesis Client Library v2.x and the AWS Java SDK v2.x both are now on the > Apache 2.0 license. > [https://github.com/awslabs/amazon-kinesis-client/blob/master/LICENSE.txt] > [https://github.com/aws/aws-sdk-java-v2/blob/master/LICENSE.txt] > There is a PR for the Kinesis Producer Library to update it to the Apache 2.0 > license ([https://github.com/awslabs/amazon-kinesis-producer/pull/256]) > The task should include, but not limited to, upgrading KCL/KPL to new > versions of Apache 2.0 license, changing licenses and NOTICE files in > flink-connector-kinesis, and adding flink-connector-kinesis to build, CI and > artifact publishing pipeline, updating the build profiles, updating > documentation that references the license incompatibility > The expected outcome of this issue is that the flink-connector-kinesis will > be included with the standard build artifacts and will no longer need to be > built separately by users. -- This message was sent by Atlassian JIRA (v7.6.3#76005)

[jira] [Commented] (FLINK-12848) Method equals() in RowTypeInfo should consider fieldsNames

2019-06-19 Thread aloyszhang (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-12848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16868224#comment-16868224 ] aloyszhang commented on FLINK-12848: Hi Enrico, Simple add the fieldNames to the equals method of RowTypeInfo is not safe. It will cause test failed in `ExternalCatalogInsertTest` because of some operator like `union` use equals in RowTypeInfo to determine whether the two input are the of same type. So I did not find a way to meet both tableEnv.scan() and union operator. And more , this problem does not appear in flink-1.9. > Method equals() in RowTypeInfo should consider fieldsNames > -- > > Key: FLINK-12848 > URL: https://issues.apache.org/jira/browse/FLINK-12848 > Project: Flink > Issue Type: Improvement >Affects Versions: 1.7.2 >Reporter: aloyszhang >Priority: Major > Labels: pull-request-available > Fix For: 1.9.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Since the `RowTypeInfo#equals()` does not consider the fieldNames , when > process data with RowTypeInfo type there may comes an error of the field > name.   > {code:java} > String [] fields = new String []{"first", "second"}; > TypeInformation[] types = new TypeInformation[]{ > Types.ROW_NAMED(new String[]{"first001"}, Types.INT), > Types.ROW_NAMED(new String[]{"second002"}, Types.INT) }; > StreamExecutionEnvironment execEnv = > StreamExecutionEnvironment.getExecutionEnvironment(); > StreamTableEnvironment env = > StreamTableEnvironment.getTableEnvironment(execEnv); > SimpleProcessionTimeSource streamTableSource = new > SimpleProcessionTimeSource(fields, types); > env.registerTableSource("testSource", streamTableSource); > Table sourceTable = env.scan("testSource"); > System.out.println("Source table schema : "); > sourceTable.printSchema(); > {code} > The table shcema will be > {code:java} > Source table schema : > root > |-- first: Row(first001: Integer) > |-- second: Row(first001: Integer) > |-- timestamp: TimeIndicatorTypeInfo(proctime) > {code} > the second field has the same name with the first field. > So, we should consider the fieldnames in RowTypeInfo#equals() > > > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)

[GitHub] [flink] klion26 commented on issue #8797: [Docks] Checkpoints: fix typo

2019-06-19 Thread GitBox

klion26 commented on issue #8797: [Docks] Checkpoints: fix typo URL: https://github.com/apache/flink/pull/8797#issuecomment-503828052 @casidiablo thanks for your contribution, LGTM, could you please also update the `stream_checkpointing.zh.md` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services

[GitHub] [flink] sunhaibotb commented on issue #8731: [FLINK-11878][runtime] Implement the runtime handling of BoundedOneInput and BoundedMultiInput

2019-06-19 Thread GitBox

sunhaibotb commented on issue #8731: [FLINK-11878][runtime] Implement the runtime handling of BoundedOneInput and BoundedMultiInput URL: https://github.com/apache/flink/pull/8731#issuecomment-503827844 The comments were addressed, and the PR has been updated. @pnowojski This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services

[jira] [Commented] (FLINK-12849) Add support for build Python Docs in Buildbot

2019-06-19 Thread sunjincheng (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-12849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16868215#comment-16868215 ] sunjincheng commented on FLINK-12849: - Yes, I had added the file exists check, in Committed revision 1046594. But I found that there a  write permissions problem, I am checking it now, and will add new commit after fixed the bug. > Add support for build Python Docs in Buildbot > - > > Key: FLINK-12849 > URL: https://issues.apache.org/jira/browse/FLINK-12849 > Project: Flink > Issue Type: Sub-task > Components: API / Python, Build System >Affects Versions: 1.9.0 >Reporter: sunjincheng >Assignee: sunjincheng >Priority: Major > Attachments: image-2019-06-14-16-14-35-439.png, python_docs.patch > > > We should add the Python Doc for Python API, and add the link to the web > page. i.e.: > !image-2019-06-14-16-14-35-439.png! > In FLINK-12720 we will add how to generate the Python Docs, and in this PR we > should add support for build Python Docs in Buildbot. We may need to modify > the build config: > [https://svn.apache.org/repos/infra/infrastructure/buildbot/aegis/buildmaster/master1/projects/flink.conf] > > the Wiki of how to change the Buildbot code: > [https://cwiki.apache.org/confluence/display/FLINK/Managing+Flink+Documentation] > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)

[jira] [Updated] (FLINK-11947) Support MapState value schema evolution for RocksDB

2019-06-19 Thread Tzu-Li (Gordon) Tai (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-11947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tzu-Li (Gordon) Tai updated FLINK-11947: Fix Version/s: 1.9.0 > Support MapState value schema evolution for RocksDB > --- > > Key: FLINK-11947 > URL: https://issues.apache.org/jira/browse/FLINK-11947 > Project: Flink > Issue Type: Improvement > Components: API / Type Serialization System, Runtime / State Backends >Reporter: Tzu-Li (Gordon) Tai >Assignee: Congxian Qiu(klion26) >Priority: Critical > Labels: pull-request-available > Fix For: 1.9.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Currently, we do not attempt to perform state schema evolution if the key or > value's schema of a user {{MapState}} has changed when using {{RocksDB}}: > https://github.com/apache/flink/blob/953a5ffcbdae4115f7d525f310723cf8770779df/flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBKeyedStateBackend.java#L542 > This was disallowed in the initial support for state schema evolution because > the way we did state evolution in the RocksDB state backend was simply > overwriting values. > For {{MapState}} key evolution, only overwriting RocksDB values does not > work, since RocksDB entries for {{MapState}} uses a composite key containing > the map state key. This means that when evolving {{MapState}} in this case > with an evolved key schema, we will have new entries. -- This message was sent by Atlassian JIRA (v7.6.3#76005)

[jira] [Resolved] (FLINK-11869) [checkpoint] Make buffer size in checkpoint stream factory configurable

2019-06-19 Thread Tzu-Li (Gordon) Tai (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-11869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tzu-Li (Gordon) Tai resolved FLINK-11869. - Resolution: Fixed Merged for 1.9.0: bc5e8a77aa3419afc03c0751dd339e5027cf3664 > [checkpoint] Make buffer size in checkpoint stream factory configurable > --- > > Key: FLINK-11869 > URL: https://issues.apache.org/jira/browse/FLINK-11869 > Project: Flink > Issue Type: Improvement > Components: Runtime / Checkpointing >Reporter: Yun Tang >Assignee: Yun Tang >Priority: Major > Labels: pull-request-available > Fix For: 1.9.0 > > Time Spent: 40m > Remaining Estimate: 0h > > Currently, the default buffer size for {{FsCheckpointStateOutputStream}} is > only 4KB. This would case a lot of IOPS if stream is large. Unfortunately, > when user want to checkpoint on a totally disaggregated file system which has > no data node manager running in local machine, they might have a IOPS limit > or cannot serve too many IOPS at a time. This would cause the checkpoint > duration really large and might expire often. > If we want to increase this buffer size, we have to increase the > {{fileStateThreshold}} to indirectly increase the buffer size. However, as we > all know, too many not-so-small {{ByteStreamStateHandle}} returned to > checkpoint coordinator would easily cause job manager OOM and checkpoint meta > file large. > We should also make the buffer size configurable. -- This message was sent by Atlassian JIRA (v7.6.3#76005)

[jira] [Resolved] (FLINK-11947) Support MapState value schema evolution for RocksDB

2019-06-19 Thread Tzu-Li (Gordon) Tai (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-11947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tzu-Li (Gordon) Tai resolved FLINK-11947. - Resolution: Fixed > Support MapState value schema evolution for RocksDB > --- > > Key: FLINK-11947 > URL: https://issues.apache.org/jira/browse/FLINK-11947 > Project: Flink > Issue Type: Improvement > Components: API / Type Serialization System, Runtime / State Backends >Reporter: Tzu-Li (Gordon) Tai >Assignee: Congxian Qiu(klion26) >Priority: Critical > Labels: pull-request-available > Fix For: 1.9.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Currently, we do not attempt to perform state schema evolution if the key or > value's schema of a user {{MapState}} has changed when using {{RocksDB}}: > https://github.com/apache/flink/blob/953a5ffcbdae4115f7d525f310723cf8770779df/flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBKeyedStateBackend.java#L542 > This was disallowed in the initial support for state schema evolution because > the way we did state evolution in the RocksDB state backend was simply > overwriting values. > For {{MapState}} key evolution, only overwriting RocksDB values does not > work, since RocksDB entries for {{MapState}} uses a composite key containing > the map state key. This means that when evolving {{MapState}} in this case > with an evolved key schema, we will have new entries. -- This message was sent by Atlassian JIRA (v7.6.3#76005)

[jira] [Commented] (FLINK-11947) Support MapState value schema evolution for RocksDB

2019-06-19 Thread Tzu-Li (Gordon) Tai (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-11947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16868193#comment-16868193 ] Tzu-Li (Gordon) Tai commented on FLINK-11947: - Merged for 1.9.0: 829146d516751a592c3ab15908baebfd13429e8e > Support MapState value schema evolution for RocksDB > --- > > Key: FLINK-11947 > URL: https://issues.apache.org/jira/browse/FLINK-11947 > Project: Flink > Issue Type: Improvement > Components: API / Type Serialization System, Runtime / State Backends >Reporter: Tzu-Li (Gordon) Tai >Assignee: Congxian Qiu(klion26) >Priority: Critical > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Currently, we do not attempt to perform state schema evolution if the key or > value's schema of a user {{MapState}} has changed when using {{RocksDB}}: > https://github.com/apache/flink/blob/953a5ffcbdae4115f7d525f310723cf8770779df/flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBKeyedStateBackend.java#L542 > This was disallowed in the initial support for state schema evolution because > the way we did state evolution in the RocksDB state backend was simply > overwriting values. > For {{MapState}} key evolution, only overwriting RocksDB values does not > work, since RocksDB entries for {{MapState}} uses a composite key containing > the map state key. This means that when evolving {{MapState}} in this case > with an evolved key schema, we will have new entries. -- This message was sent by Atlassian JIRA (v7.6.3#76005)

[GitHub] [flink] lirui-apache commented on a change in pull request #8695: [FLINK-12805][FLINK-12808][FLINK-12809][table-api] Introduce PartitionableTableSource and PartitionableTableSink and Overwrit

2019-06-19 Thread GitBox

lirui-apache commented on a change in pull request #8695: [FLINK-12805][FLINK-12808][FLINK-12809][table-api] Introduce PartitionableTableSource and PartitionableTableSink and OverwritableTableSink URL: https://github.com/apache/flink/pull/8695#discussion_r295593755 ## File path: flink-table/flink-table-common/src/main/java/org/apache/flink/table/sinks/PartitionableTableSink.java ## @@ -0,0 +1,71 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.table.sinks; + +import java.util.List; +import java.util.Map; + +/** + * An abstract class with trait about partitionable table sink. This is mainly used for Review comment: I guess we can remove that statement, since we'll support dynamic partitioning in Flink. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services

[GitHub] [flink] lirui-apache commented on a change in pull request #8695: [FLINK-12805][FLINK-12808][FLINK-12809][table-api] Introduce PartitionableTableSource and PartitionableTableSink and Overwrit

2019-06-19 Thread GitBox

lirui-apache commented on a change in pull request #8695: [FLINK-12805][FLINK-12808][FLINK-12809][table-api] Introduce PartitionableTableSource and PartitionableTableSink and OverwritableTableSink URL: https://github.com/apache/flink/pull/8695#discussion_r295595082 ## File path: flink-table/flink-table-common/src/main/java/org/apache/flink/table/sinks/PartitionableTableSink.java ## @@ -0,0 +1,71 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.table.sinks; + +import java.util.List; +import java.util.Map; + +/** + * An abstract class with trait about partitionable table sink. This is mainly used for + * static partitions. For sql statement: + * + * + * INSERT INTO A PARTITION(a='ab', b='cd') select c, d from B + * + * + * We Assume the A has partition columns as , , . + * The columns and are called static partition columns, while c is called Review comment: I think we should give the definition of table `A` if we intend to offer a valid example. It's true the dynamic column should appear last, but the column names in `SELECT` don't have to be the same as the column names in the destination table -- so it's hard to tell w/o a DDL :) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services

[GitHub] [flink] flinkbot commented on issue #8799: [FLINK-11612][chinese-translation,Documentation] Translate the "Proje…

2019-06-19 Thread GitBox

flinkbot commented on issue #8799: [FLINK-11612][chinese-translation,Documentation] Translate the "Proje… URL: https://github.com/apache/flink/pull/8799#issuecomment-503816730 Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community to review your pull request. We will use this comment to track the progress of the review. ## Review Progress * ❓ 1. The [description] looks good. * ❓ 2. There is [consensus] that the contribution should go into to Flink. * ❓ 3. Needs [attention] from. * ❓ 4. The change fits into the overall [architecture]. * ❓ 5. Overall code [quality] is good. Please see the [Pull Request Review Guide](https://flink.apache.org/reviewing-prs.html) for a full explanation of the review process. The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer of PMC member is required Bot commands The @flinkbot bot supports the following commands: - `@flinkbot approve description` to approve one or more aspects (aspects: `description`, `consensus`, `architecture` and `quality`) - `@flinkbot approve all` to approve all aspects - `@flinkbot approve-until architecture` to approve everything until `architecture` - `@flinkbot attention @username1 [@username2 ..]` to require somebody's attention - `@flinkbot disapprove architecture` to remove an approval you gave earlier This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services

[jira] [Updated] (FLINK-11612) Translate the "Project Template for Java" page into Chinese

2019-06-19 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-11612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-11612: --- Labels: pull-request-available (was: ) > Translate the "Project Template for Java" page into Chinese > --- > > Key: FLINK-11612 > URL: https://issues.apache.org/jira/browse/FLINK-11612 > Project: Flink > Issue Type: Sub-task > Components: chinese-translation, Documentation >Reporter: Jark Wu >Assignee: Jasper Yue >Priority: Major > Labels: pull-request-available > > The page url is > https://ci.apache.org/projects/flink/flink-docs-master/dev/projectsetup/java_api_quickstart.html > The markdown file is located in > flink/docs/dev/projectsetup/java_api_quickstart.zh.md > The markdown file will be created once FLINK-11529 is merged. > You can reference the translation from : > https://github.com/flink-china/1.6.0/blob/master/quickstart/java_api_quickstart.md -- This message was sent by Atlassian JIRA (v7.6.3#76005)

[GitHub] [flink] yuezhuangshi opened a new pull request #8799: [FLINK-11612][chinese-translation,Documentation] Translate the "Proje…

2019-06-19 Thread GitBox

yuezhuangshi opened a new pull request #8799: [FLINK-11612][chinese-translation,Documentation] Translate the "Proje… URL: https://github.com/apache/flink/pull/8799 ## What is the purpose of the change This pull request completes the Chinese translation of "Project Template for Java" page from official document. ## Brief change log - *Translate the "Project Template for Java" page into Chinese* ## Verifying this change This change is to add a new translated document. ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): no - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: no - The serializers: no - The runtime per-record code paths (performance sensitive): no - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: no - The S3 file system connector: no ## Documentation - Does this pull request introduce a new feature? no This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services

[GitHub] [flink] leesf closed pull request #8159: [hotfix][runtime] fix error log description

2019-06-19 Thread GitBox

leesf closed pull request #8159: [hotfix][runtime] fix error log description URL: https://github.com/apache/flink/pull/8159 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services

[GitHub] [flink] asfgit closed pull request #8686: [FLINK-11869] Make buffer size in checkpoint stream factory configurable

2019-06-19 Thread GitBox

asfgit closed pull request #8686: [FLINK-11869] Make buffer size in checkpoint stream factory configurable URL: https://github.com/apache/flink/pull/8686 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services

[GitHub] [flink] asfgit closed pull request #8565: [FLINK-11947] Support MapState value schema evolution for RocksDB

2019-06-19 Thread GitBox

asfgit closed pull request #8565: [FLINK-11947] Support MapState value schema evolution for RocksDB URL: https://github.com/apache/flink/pull/8565 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services

[jira] [Commented] (FLINK-12846) Carry primary key and unique key information in TableSchema

2019-06-19 Thread Hequn Cheng (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-12846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16868170#comment-16868170 ] Hequn Cheng commented on FLINK-12846: - +1 for adding the key information! Primary key and unique key are part of TableSchema and they are very helpful for the optimization. BTW, how are we going to add the key infos? We may also need to consider other information, like column nullable and computed column in order not to make the TableSchema becoming more and more mussy. What do you think? Best, Hequn > Carry primary key and unique key information in TableSchema > --- > > Key: FLINK-12846 > URL: https://issues.apache.org/jira/browse/FLINK-12846 > Project: Flink > Issue Type: New Feature > Components: Table SQL / API >Reporter: Jark Wu >Assignee: Jark Wu >Priority: Major > Fix For: 1.9.0 > > > The primary key and unique key is a standard meta information in SQL. And > they are important information for optimization, for example, > AggregateRemove, AggregateReduceGrouping and state layout optimization for > TopN and Join. > So in this issue, we want to extend {{TableSchema}} to carry more information > about primary key and unique keys. So that the TableSource can declare this > meta information. -- This message was sent by Atlassian JIRA (v7.6.3#76005)

[jira] [Assigned] (FLINK-11612) Translate the "Project Template for Java" page into Chinese

2019-06-19 Thread Jasper Yue (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-11612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jasper Yue reassigned FLINK-11612: -- Assignee: Jasper Yue (was: LakeShen) > Translate the "Project Template for Java" page into Chinese > --- > > Key: FLINK-11612 > URL: https://issues.apache.org/jira/browse/FLINK-11612 > Project: Flink > Issue Type: Sub-task > Components: chinese-translation, Documentation >Reporter: Jark Wu >Assignee: Jasper Yue >Priority: Major > > The page url is > https://ci.apache.org/projects/flink/flink-docs-master/dev/projectsetup/java_api_quickstart.html > The markdown file is located in > flink/docs/dev/projectsetup/java_api_quickstart.zh.md > The markdown file will be created once FLINK-11529 is merged. > You can reference the translation from : > https://github.com/flink-china/1.6.0/blob/master/quickstart/java_api_quickstart.md -- This message was sent by Atlassian JIRA (v7.6.3#76005)

[jira] [Commented] (FLINK-11612) Translate the "Project Template for Java" page into Chinese

2019-06-19 Thread Jark Wu (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-11612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16868158#comment-16868158 ] Jark Wu commented on FLINK-11612: - [~yuetongshu], sure, I think it's fine. > Translate the "Project Template for Java" page into Chinese > --- > > Key: FLINK-11612 > URL: https://issues.apache.org/jira/browse/FLINK-11612 > Project: Flink > Issue Type: Sub-task > Components: chinese-translation, Documentation >Reporter: Jark Wu >Assignee: LakeShen >Priority: Major > > The page url is > https://ci.apache.org/projects/flink/flink-docs-master/dev/projectsetup/java_api_quickstart.html > The markdown file is located in > flink/docs/dev/projectsetup/java_api_quickstart.zh.md > The markdown file will be created once FLINK-11529 is merged. > You can reference the translation from : > https://github.com/flink-china/1.6.0/blob/master/quickstart/java_api_quickstart.md -- This message was sent by Atlassian JIRA (v7.6.3#76005)

[jira] [Closed] (FLINK-12907) flink-table-planner-blink fails to compile with scala 2.12

2019-06-19 Thread Jark Wu (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-12907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jark Wu closed FLINK-12907. --- Resolution: Fixed Assignee: Jark Wu Fixed in 1.9.0: 671ac182e514500c5f2b430877c6ac30b26e6ec7 > flink-table-planner-blink fails to compile with scala 2.12 > -- > > Key: FLINK-12907 > URL: https://issues.apache.org/jira/browse/FLINK-12907 > Project: Flink > Issue Type: Bug > Components: Table SQL / Planner >Affects Versions: 1.9.0 >Reporter: Dawid Wysakowicz >Assignee: Jark Wu >Priority: Blocker > Fix For: 1.9.0 > > > {code} > 14:03:15.204 [ERROR] > /home/travis/build/apache/flink/flink-table/flink-table-planner-blink/src/test/scala/org/apache/flink/table/plan/nodes/resource/ExecNodeResourceTest.scala:269: > error: overriding method getOutputType in trait TableSink of type > ()org.apache.flink.api.common.typeinfo.TypeInformation[org.apache.flink.table.dataformat.BaseRow]; > 14:03:15.204 [ERROR] method getOutputType needs `override' modifier > 14:03:15.204 [ERROR] @deprecated def getOutputType: > TypeInformation[BaseRow] = { > 14:03:15.204 [ERROR] ^ > 14:03:15.217 [ERROR] > /home/travis/build/apache/flink/flink-table/flink-table-planner-blink/src/test/scala/org/apache/flink/table/plan/nodes/resource/ExecNodeResourceTest.scala:275: > error: overriding method getFieldNames in trait TableSink of type > ()Array[String]; > 14:03:15.217 [ERROR] method getFieldNames needs `override' modifier > 14:03:15.217 [ERROR] @deprecated def getFieldNames: Array[String] = > schema.getFieldNames > 14:03:15.217 [ERROR] ^ > 14:03:15.219 [ERROR] > /home/travis/build/apache/flink/flink-table/flink-table-planner-blink/src/test/scala/org/apache/flink/table/plan/nodes/resource/ExecNodeResourceTest.scala:280: > error: overriding method getFieldTypes in trait TableSink of type > ()Array[org.apache.flink.api.common.typeinfo.TypeInformation[_]]; > 14:03:15.219 [ERROR] method getFieldTypes needs `override' modifier > 14:03:15.219 [ERROR] @deprecated def getFieldTypes: > Array[TypeInformation[_]] = schema.getFieldTypes > {code} > https://api.travis-ci.org/v3/job/547655787/log.txt -- This message was sent by Atlassian JIRA (v7.6.3#76005)

[jira] [Closed] (FLINK-10348) Solve data skew when consuming data from kafka

2019-06-19 Thread Jiayi Liao (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-10348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiayi Liao closed FLINK-10348. -- Resolution: Not A Problem > Solve data skew when consuming data from kafka > -- > > Key: FLINK-10348 > URL: https://issues.apache.org/jira/browse/FLINK-10348 > Project: Flink > Issue Type: New Feature > Components: Connectors / Kafka >Affects Versions: 1.6.0 >Reporter: Jiayi Liao >Assignee: Jiayi Liao >Priority: Major > > By using KafkaConsumer, our strategy is to send fetch request to brokers with > a fixed fetch size. Assume x topic has n partition and there exists data skew > between partitions, now we need to consume data from x topic with earliest > offset, and we can get max fetch size data in every fetch request. The > problem is that when an task consumes data from both "big" partitions and > "small" partitions, the data in "big" partitions may be late elements because > "small" partitions are consumed faster. > *Solution: * > I think we can leverage two parameters to control this. > 1. data.skew.check // whether to check data skew > 2. data.skew.check.interval // the interval between checks > Every data.skew.check.interval, we will check the latest offset of every > specific partition, and calculate (latest offset - current offset), then get > partitions which need to slow down and redefine their fetch size. -- This message was sent by Atlassian JIRA (v7.6.3#76005)

[jira] [Commented] (FLINK-11612) Translate the "Project Template for Java" page into Chinese

2019-06-19 Thread Jasper Yue (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-11612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16868141#comment-16868141 ] Jasper Yue commented on FLINK-11612: Hi [~jark], can I assign this issue to me? > Translate the "Project Template for Java" page into Chinese > --- > > Key: FLINK-11612 > URL: https://issues.apache.org/jira/browse/FLINK-11612 > Project: Flink > Issue Type: Sub-task > Components: chinese-translation, Documentation >Reporter: Jark Wu >Assignee: LakeShen >Priority: Major > > The page url is > https://ci.apache.org/projects/flink/flink-docs-master/dev/projectsetup/java_api_quickstart.html > The markdown file is located in > flink/docs/dev/projectsetup/java_api_quickstart.zh.md > The markdown file will be created once FLINK-11529 is merged. > You can reference the translation from : > https://github.com/flink-china/1.6.0/blob/master/quickstart/java_api_quickstart.md -- This message was sent by Atlassian JIRA (v7.6.3#76005)

[GitHub] [flink] flinkbot commented on issue #8798: [FLINK-12659][hive] Integrate Flink with Hive GenericUDTF

2019-06-19 Thread GitBox

flinkbot commented on issue #8798: [FLINK-12659][hive] Integrate Flink with Hive GenericUDTF URL: https://github.com/apache/flink/pull/8798#issuecomment-503786750 Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community to review your pull request. We will use this comment to track the progress of the review. ## Review Progress * ❓ 1. The [description] looks good. * ❓ 2. There is [consensus] that the contribution should go into to Flink. * ❓ 3. Needs [attention] from. * ❓ 4. The change fits into the overall [architecture]. * ❓ 5. Overall code [quality] is good. Please see the [Pull Request Review Guide](https://flink.apache.org/reviewing-prs.html) for a full explanation of the review process. The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer of PMC member is required Bot commands The @flinkbot bot supports the following commands: - `@flinkbot approve description` to approve one or more aspects (aspects: `description`, `consensus`, `architecture` and `quality`) - `@flinkbot approve all` to approve all aspects - `@flinkbot approve-until architecture` to approve everything until `architecture` - `@flinkbot attention @username1 [@username2 ..]` to require somebody's attention - `@flinkbot disapprove architecture` to remove an approval you gave earlier This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services

[GitHub] [flink] bowenli86 commented on issue #8798: [FLINK-12659][hive] Integrate Flink with Hive GenericUDTF

2019-06-19 Thread GitBox

bowenli86 commented on issue #8798: [FLINK-12659][hive] Integrate Flink with Hive GenericUDTF URL: https://github.com/apache/flink/pull/8798#issuecomment-503786615 cc @xuefuz @JingsongLi @lirui-apache @zjuwangg This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services

[jira] [Updated] (FLINK-12659) Integrate Flink with Hive GenericUDTF

2019-06-19 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-12659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-12659: --- Labels: pull-request-available (was: ) > Integrate Flink with Hive GenericUDTF > - > > Key: FLINK-12659 > URL: https://issues.apache.org/jira/browse/FLINK-12659 > Project: Flink > Issue Type: Sub-task > Components: Connectors / Hive >Reporter: Bowen Li >Assignee: Bowen Li >Priority: Major > Labels: pull-request-available > > https://hive.apache.org/javadocs/r3.1.1/api/org/apache/hadoop/hive/ql/udf/generic/GenericUDTF.html -- This message was sent by Atlassian JIRA (v7.6.3#76005)

[GitHub] [flink] bowenli86 opened a new pull request #8798: [FLINK-12659][hive] Integrate Flink with Hive GenericUDTF

2019-06-19 Thread GitBox

bowenli86 opened a new pull request #8798: [FLINK-12659][hive] Integrate Flink with Hive GenericUDTF URL: https://github.com/apache/flink/pull/8798 ## What is the purpose of the change This PR integrates Flink with Hive GenericUDTF. ## Brief change log - added `HiveGenericUDTF` to delegate function calls to Hive's GenericUDTF - extracted a few util methods to `HiveFunctionUtil` - added unit tests for `HiveGenericUDTF` ## Verifying this change This change added tests and can be verified as `HiveGenericUDTFTest` ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): (no) - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no) - The serializers: (no) - The runtime per-record code paths (performance sensitive): (no) - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no) - The S3 file system connector: (no) ## Documentation - Does this pull request introduce a new feature? (yes) - If yes, how is the feature documented? (docs) Documentation will be added later This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services

[GitHub] [flink] walterddr commented on a change in pull request #8632: [FLINK-12744][ml] add shared params in ml package

2019-06-19 Thread GitBox

walterddr commented on a change in pull request #8632: [FLINK-12744][ml] add shared params in ml package URL: https://github.com/apache/flink/pull/8632#discussion_r295563061 ## File path: flink-ml-parent/flink-ml-lib/src/main/java/org/apache/flink/ml/params/shared/colname/HasKeepColNames.java ## @@ -0,0 +1,44 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.flink.ml.params.shared.colname; + +import org.apache.flink.ml.api.misc.param.ParamInfo; +import org.apache.flink.ml.api.misc.param.ParamInfoFactory; +import org.apache.flink.ml.api.misc.param.WithParams; + +/** + * An interface for classes with a parameter specifying the names of the columns to be retained in the output table. Review comment: my suggestion was following the convention of [RowTypeInfo](https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/api/java/typeutils/RowTypeInfo.java). which IMO most of the table schema is defined against. but either works fine. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services

[GitHub] [flink] bowenli86 commented on a change in pull request #8785: [FLINK-11480][hive] Create HiveTableFactory that creates TableSource/…

2019-06-19 Thread GitBox

bowenli86 commented on a change in pull request #8785: [FLINK-11480][hive] Create HiveTableFactory that creates TableSource/… URL: https://github.com/apache/flink/pull/8785#discussion_r295561400 ## File path: flink-table/flink-table-planner/src/main/java/org/apache/flink/table/catalog/DatabaseCalciteSchema.java ## @@ -77,6 +84,23 @@ public Table getTable(String tableName) { !connectorTable.isBatch(), FlinkStatistic.of(tableSource.getTableStats().orElse(null .orElseThrow(() -> new TableException("Cannot query a sink only table.")); + } else if (table instanceof CatalogTable) { + Optional tableFactory = catalog.getTableFactory(); + TableSource tableSource = tableFactory.map(tf -> ((TableSourceFactory) tf).createTableSource((CatalogTable) table)) Review comment: nice, I didn't know map() can be applied to Optional objects This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services

[GitHub] [flink] xuefuz commented on a change in pull request #8785: [FLINK-11480][hive] Create HiveTableFactory that creates TableSource/…

2019-06-19 Thread GitBox

xuefuz commented on a change in pull request #8785: [FLINK-11480][hive] Create HiveTableFactory that creates TableSource/… URL: https://github.com/apache/flink/pull/8785#discussion_r295548810 ## File path: flink-table/flink-table-planner/src/main/java/org/apache/flink/table/catalog/DatabaseCalciteSchema.java ## @@ -77,6 +84,23 @@ public Table getTable(String tableName) { !connectorTable.isBatch(), FlinkStatistic.of(tableSource.getTableStats().orElse(null .orElseThrow(() -> new TableException("Cannot query a sink only table.")); + } else if (table instanceof CatalogTable) { + Optional tableFactory = catalog.getTableFactory(); + TableSource tableSource = tableFactory.map(tf -> ((TableSourceFactory) tf).createTableSource((CatalogTable) table)) + .orElse(TableFactoryUtil.findAndCreateTableSource(((CatalogTable) table).toProperties())); + + if (!(tableSource instanceof StreamTableSource)) { Review comment: The message was from Dawid's change. I didn't add it. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services

[GitHub] [flink] xuefuz commented on a change in pull request #8785: [FLINK-11480][hive] Create HiveTableFactory that creates TableSource/…

2019-06-19 Thread GitBox

xuefuz commented on a change in pull request #8785: [FLINK-11480][hive] Create HiveTableFactory that creates TableSource/… URL: https://github.com/apache/flink/pull/8785#discussion_r295537393 ## File path: flink-table/flink-table-planner/src/main/java/org/apache/flink/table/catalog/DatabaseCalciteSchema.java ## @@ -77,6 +84,23 @@ public Table getTable(String tableName) { !connectorTable.isBatch(), FlinkStatistic.of(tableSource.getTableStats().orElse(null .orElseThrow(() -> new TableException("Cannot query a sink only table.")); + } else if (table instanceof CatalogTable) { + Optional tableFactory = catalog.getTableFactory(); + TableSource tableSource = tableFactory.map(tf -> ((TableSourceFactory) tf).createTableSource((CatalogTable) table)) Review comment: map() is applied to an Optional object, so the key (tf) is a TableFactory instance. As you can see from other types of tables, calcite seems only need TableSource. If tableFactory isn't present, then orElse() clause kicks in. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services

[GitHub] [flink] xuefuz commented on a change in pull request #8785: [FLINK-11480][hive] Create HiveTableFactory that creates TableSource/…

2019-06-19 Thread GitBox

xuefuz commented on a change in pull request #8785: [FLINK-11480][hive] Create HiveTableFactory that creates TableSource/… URL: https://github.com/apache/flink/pull/8785#discussion_r295537393 ## File path: flink-table/flink-table-planner/src/main/java/org/apache/flink/table/catalog/DatabaseCalciteSchema.java ## @@ -77,6 +84,23 @@ public Table getTable(String tableName) { !connectorTable.isBatch(), FlinkStatistic.of(tableSource.getTableStats().orElse(null .orElseThrow(() -> new TableException("Cannot query a sink only table.")); + } else if (table instanceof CatalogTable) { + Optional tableFactory = catalog.getTableFactory(); + TableSource tableSource = tableFactory.map(tf -> ((TableSourceFactory) tf).createTableSource((CatalogTable) table)) Review comment: map() is applied to an Optional object, so the key (tf) is a TableFactory instance. As you can see from other types of tables, calcite seems only need TableSource. If tableFactory isn't present, then orElse() clause kicks in. The test verifies that. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services

[jira] [Commented] (FLINK-12848) Method equals() in RowTypeInfo should consider fieldsNames

2019-06-19 Thread Enrico Canzonieri (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-12848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16868029#comment-16868029 ] Enrico Canzonieri commented on FLINK-12848: --- This is causing issues to one of our queries where the schema has two nested records that have two fields of the same type but different name, e.g. Row(Row(a: Int, b: Int), Row(c: Int, d:Int)) where "a", "b", "c", "d" are the field names. The code in [https://github.com/apache/flink/blob/master/flink-table/flink-table-planner/src/main/scala/org/apache/flink/table/calcite/FlinkTypeFactory.scala#L92|https://github.com/apache/flink/blob/master/flink-table/flink-table-planner/src/main/scala/org/apache/flink/table/calcite/FlinkTypeFactory.scala#L92)] that is caching the Type conversion is returning the for Row(c: Int, d: Int) the conversion cached for the first nested Row. As result the generated table schema will have mixed up (and clashing) field names. I see that the equals() change to RowTypeInfo was introduced in FLINK-9444. Is there any reason why we should never consider the field names for RowTypeInfo equals? If so would it make sense to have a method that covers that special (to my understanding) case and make equals also include names? I'm currently planning to fix this locally by extending the equals method of RowTypeInfo, but it'd be great to know whether that's safe to do or not. > Method equals() in RowTypeInfo should consider fieldsNames > -- > > Key: FLINK-12848 > URL: https://issues.apache.org/jira/browse/FLINK-12848 > Project: Flink > Issue Type: Improvement >Affects Versions: 1.7.2 >Reporter: aloyszhang >Priority: Major > Labels: pull-request-available > Fix For: 1.9.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Since the `RowTypeInfo#equals()` does not consider the fieldNames , when > process data with RowTypeInfo type there may comes an error of the field > name.   > {code:java} > String [] fields = new String []{"first", "second"}; > TypeInformation[] types = new TypeInformation[]{ > Types.ROW_NAMED(new String[]{"first001"}, Types.INT), > Types.ROW_NAMED(new String[]{"second002"}, Types.INT) }; > StreamExecutionEnvironment execEnv = > StreamExecutionEnvironment.getExecutionEnvironment(); > StreamTableEnvironment env = > StreamTableEnvironment.getTableEnvironment(execEnv); > SimpleProcessionTimeSource streamTableSource = new > SimpleProcessionTimeSource(fields, types); > env.registerTableSource("testSource", streamTableSource); > Table sourceTable = env.scan("testSource"); > System.out.println("Source table schema : "); > sourceTable.printSchema(); > {code} > The table shcema will be > {code:java} > Source table schema : > root > |-- first: Row(first001: Integer) > |-- second: Row(first001: Integer) > |-- timestamp: TimeIndicatorTypeInfo(proctime) > {code} > the second field has the same name with the first field. > So, we should consider the fieldnames in RowTypeInfo#equals() > > > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)

[GitHub] [flink] asfgit closed pull request #8770: [FLINK-12658][hive] Integrate Flink with Hive GenericUDF

2019-06-19 Thread GitBox

asfgit closed pull request #8770: [FLINK-12658][hive] Integrate Flink with Hive GenericUDF URL: https://github.com/apache/flink/pull/8770 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services

[jira] [Closed] (FLINK-12658) Integrate Flink with Hive GenericUDF

2019-06-19 Thread Bowen Li (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-12658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bowen Li closed FLINK-12658. Resolution: Fixed merged in 1.9.0: 6f89f3d0720823019e3200a4eb572f7281657344 > Integrate Flink with Hive GenericUDF > > > Key: FLINK-12658 > URL: https://issues.apache.org/jira/browse/FLINK-12658 > Project: Flink > Issue Type: Sub-task > Components: Connectors / Hive >Reporter: Bowen Li >Assignee: Bowen Li >Priority: Major > Labels: pull-request-available > Fix For: 1.9.0 > > Time Spent: 10m > Remaining Estimate: 0h > > https://hive.apache.org/javadocs/r3.1.1/api/org/apache/hadoop/hive/ql/udf/generic/GenericUDF.html -- This message was sent by Atlassian JIRA (v7.6.3#76005)

[GitHub] [flink] bowenli86 commented on issue #8770: [FLINK-12658][hive] Integrate Flink with Hive GenericUDF

2019-06-19 Thread GitBox

bowenli86 commented on issue #8770: [FLINK-12658][hive] Integrate Flink with Hive GenericUDF URL: https://github.com/apache/flink/pull/8770#issuecomment-503735850 @xuefuz Thanks for your review! Merging This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services

[GitHub] [flink] vim345 commented on issue #8737: [FLINK-12848][core] Consider fieldNames in RowTypeInfo#equals()

2019-06-19 Thread GitBox

vim345 commented on issue #8737: [FLINK-12848][core] Consider fieldNames in RowTypeInfo#equals() URL: https://github.com/apache/flink/pull/8737#issuecomment-503735121 @aloyszhang Is there any reasons you didn't merge this PR? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services

[GitHub] [flink] ex00 commented on issue #8632: [FLINK-12744][ml] add shared params in ml package

2019-06-19 Thread GitBox

ex00 commented on issue #8632: [FLINK-12744][ml] add shared params in ml package URL: https://github.com/apache/flink/pull/8632#issuecomment-503721142 Thanks @xuyang1706 LGFM This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services

[GitHub] [flink] ex00 commented on a change in pull request #8776: [FLINK-12881][ml] Add more functionalities for ML Params and ParamInfo class

2019-06-19 Thread GitBox

ex00 commented on a change in pull request #8776: [FLINK-12881][ml] Add more functionalities for ML Params and ParamInfo class URL: https://github.com/apache/flink/pull/8776#discussion_r295493529 ## File path: flink-ml-parent/flink-ml-api/src/main/java/org/apache/flink/ml/api/misc/param/Params.java ## @@ -44,17 +86,39 @@ * @param the type of the specific parameter * @return the value of the specific parameter, or default value defined in the {@code info} if * this Params doesn't contain the parameter -* @throws RuntimeException if the Params doesn't contains the specific parameter, while the -* param is not optional but has no default value in the {@code info} +* @throws IllegalArgumentException if the Params doesn't contains the specific parameter, while the +* param is not optional but has no default value in the {@code info} or +* if the Params contains the specific parameter and alias, but has more +* than one value or +* if the Params doesn't contains the specific parameter, while the ParamInfo +* is optional but has no default value */ - @SuppressWarnings("unchecked") public V get(ParamInfo info) { - V value = (V) paramMap.getOrDefault(info.getName(), info.getDefaultValue()); - if (value == null && !info.isOptional() && !info.hasDefaultValue()) { - throw new RuntimeException(info.getName() + - " not exist which is not optional and don't have a default value"); + String value = null; + String usedParamName = null; + for (String nameOrAlias : getParamNameAndAlias(info)) { + if (params.containsKey(nameOrAlias)) { + if (usedParamName != null) { + throw new IllegalArgumentException(String.format("Duplicate parameters of %s and %s", Review comment: Thanks This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services

[GitHub] [flink] asfgit closed pull request #8786: [FLINK-12877][table][hive] Unify catalog database implementations

2019-06-19 Thread GitBox

asfgit closed pull request #8786: [FLINK-12877][table][hive] Unify catalog database implementations URL: https://github.com/apache/flink/pull/8786 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services

[jira] [Closed] (FLINK-12877) Unify catalog database implementations

2019-06-19 Thread Bowen Li (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-12877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bowen Li closed FLINK-12877. Resolution: Fixed merged in 1.9.0: 3cf65b11237a2928273ce5675faf6b2900b0b76a > Unify catalog database implementations > -- > > Key: FLINK-12877 > URL: https://issues.apache.org/jira/browse/FLINK-12877 > Project: Flink > Issue Type: Sub-task > Components: Connectors / Hive, Table SQL / API >Reporter: Bowen Li >Assignee: Bowen Li >Priority: Major > Labels: pull-request-available > Fix For: 1.9.0 > > Time Spent: 10m > Remaining Estimate: 0h > > per discussion in https://issues.apache.org/jira/browse/FLINK-12841 -- This message was sent by Atlassian JIRA (v7.6.3#76005)

[GitHub] [flink] bowenli86 commented on issue #8786: [FLINK-12877][table][hive] Unify catalog database implementations

2019-06-19 Thread GitBox

bowenli86 commented on issue #8786: [FLINK-12877][table][hive] Unify catalog database implementations URL: https://github.com/apache/flink/pull/8786#issuecomment-503686014 @xuefuz thanks for your review! Merging This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services

[GitHub] [flink] bowenli86 commented on issue #8766: [FLINK-12664][hive] Implement TableSink to write Hive tables

2019-06-19 Thread GitBox

bowenli86 commented on issue #8766: [FLINK-12664][hive] Implement TableSink to write Hive tables URL: https://github.com/apache/flink/pull/8766#issuecomment-503682405 I don't have any other concerns. @xuefuz do you? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services

[GitHub] [flink] bowenli86 commented on issue #8786: [FLINK-12877][table][hive] Unify catalog database implementations

2019-06-19 Thread GitBox

bowenli86 commented on issue #8786: [FLINK-12877][table][hive] Unify catalog database implementations URL: https://github.com/apache/flink/pull/8786#issuecomment-503681013 @lirui-apache thanks for your review! After a discussion with @xuefuz, we felt it's not necessary to distinguish whether a database is generic or not. Thus I completely removed that part from the PR. @xuefuz can you take a look? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services

[GitHub] [flink] flinkbot commented on issue #8797: [Docks] Checkpoints: fix typo

2019-06-19 Thread GitBox

flinkbot commented on issue #8797: [Docks] Checkpoints: fix typo URL: https://github.com/apache/flink/pull/8797#issuecomment-503670139 Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community to review your pull request. We will use this comment to track the progress of the review. ## Review Progress * ❓ 1. The [description] looks good. * ❓ 2. There is [consensus] that the contribution should go into to Flink. * ❓ 3. Needs [attention] from. * ❓ 4. The change fits into the overall [architecture]. * ❓ 5. Overall code [quality] is good. Please see the [Pull Request Review Guide](https://flink.apache.org/reviewing-prs.html) for a full explanation of the review process. The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer of PMC member is required Bot commands The @flinkbot bot supports the following commands: - `@flinkbot approve description` to approve one or more aspects (aspects: `description`, `consensus`, `architecture` and `quality`) - `@flinkbot approve all` to approve all aspects - `@flinkbot approve-until architecture` to approve everything until `architecture` - `@flinkbot attention @username1 [@username2 ..]` to require somebody's attention - `@flinkbot disapprove architecture` to remove an approval you gave earlier This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services

[GitHub] [flink] casidiablo opened a new pull request #8797: [Docks] Checkpoints: fix typo

2019-06-19 Thread GitBox

casidiablo opened a new pull request #8797: [Docks] Checkpoints: fix typo URL: https://github.com/apache/flink/pull/8797 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services

[GitHub] [flink] azagrebin commented on a change in pull request #8783: [FLINK-12863][FLINK-12865] Remove concurrency from HeartbeatManager(Sender)Impl

2019-06-19 Thread GitBox

azagrebin commented on a change in pull request #8783: [FLINK-12863][FLINK-12865] Remove concurrency from HeartbeatManager(Sender)Impl URL: https://github.com/apache/flink/pull/8783#discussion_r295407861 ## File path: flink-runtime/src/test/java/org/apache/flink/runtime/heartbeat/TestingHeartbeatServices.java ## @@ -18,43 +18,18 @@ package org.apache.flink.runtime.heartbeat; -import org.apache.flink.runtime.clusterframework.types.ResourceID; import org.apache.flink.runtime.concurrent.ScheduledExecutor; -import org.apache.flink.runtime.testingUtils.TestingUtils; -import org.apache.flink.util.Preconditions; - -import org.slf4j.Logger; /** * A {@link HeartbeatServices} that allows the injection of a {@link ScheduledExecutor}. */ public class TestingHeartbeatServices extends HeartbeatServices { - private final ScheduledExecutor scheduledExecutorToUse; - - public TestingHeartbeatServices(long heartbeatInterval, long heartbeatTimeout, ScheduledExecutor scheduledExecutorToUse) { + public TestingHeartbeatServices(long heartbeatInterval, long heartbeatTimeout) { super(heartbeatInterval, heartbeatTimeout); - - this.scheduledExecutorToUse = Preconditions.checkNotNull(scheduledExecutorToUse); } public TestingHeartbeatServices() { - this(1000L, 1L, TestingUtils.defaultScheduledExecutor()); - } - - @Override - public HeartbeatManager createHeartbeatManagerSender( - ResourceID resourceId, - HeartbeatListener heartbeatListener, - ScheduledExecutor mainThreadExecutor, - Logger log) { - - return new HeartbeatManagerSenderImpl<>( - heartbeatInterval, - heartbeatTimeout, - resourceId, - heartbeatListener, - scheduledExecutorToUse, - log); + this(1000L, 1L); Review comment: seems this class is not needed any more or at least the comment is outdated This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services

[GitHub] [flink] azagrebin commented on a change in pull request #8783: [FLINK-12863][FLINK-12865] Remove concurrency from HeartbeatManager(Sender)Impl

2019-06-19 Thread GitBox

azagrebin commented on a change in pull request #8783: [FLINK-12863][FLINK-12865] Remove concurrency from HeartbeatManager(Sender)Impl URL: https://github.com/apache/flink/pull/8783#discussion_r295395850 ## File path: flink-runtime/src/main/java/org/apache/flink/runtime/heartbeat/HeartbeatManagerSenderImpl.java ## @@ -36,54 +34,43 @@ */ public class HeartbeatManagerSenderImpl extends HeartbeatManagerImpl implements Runnable { - private final ScheduledFuture triggerFuture; + private final long heartbeatPeriod; - public HeartbeatManagerSenderImpl( - long heartbeatPeriod, - long heartbeatTimeout, - ResourceID ownResourceID, - HeartbeatListener heartbeatListener, - ScheduledExecutor mainThreadExecutor, - Logger log) { + HeartbeatManagerSenderImpl( + long heartbeatPeriod, + long heartbeatTimeout, + ResourceID ownResourceID, + HeartbeatListener heartbeatListener, + ScheduledExecutor mainThreadExecutor, + Logger log) { Review comment: formatting This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services

[GitHub] [flink] azagrebin commented on a change in pull request #8783: [FLINK-12863][FLINK-12865] Remove concurrency from HeartbeatManager(Sender)Impl

2019-06-19 Thread GitBox

azagrebin commented on a change in pull request #8783: [FLINK-12863][FLINK-12865] Remove concurrency from HeartbeatManager(Sender)Impl URL: https://github.com/apache/flink/pull/8783#discussion_r295380352 ## File path: flink-runtime/src/test/java/org/apache/flink/runtime/taskexecutor/TaskExecutorTest.java ## @@ -1898,4 +1998,22 @@ public SlotReport createSlotReport(ResourceID resourceId) { return slotReports.poll(); } } + + private static final class AllocateSlotNotifyingTaskSlotTable extends TaskSlotTable { + + private final OneShotLatch allocateSlotLatch; + + private AllocateSlotNotifyingTaskSlotTable(Collection resourceProfiles, TimerService timerService, OneShotLatch allocateSlotLatch) { Review comment: nit: I would avoid long lines for readability. Shorter lines are easier to read in reviews and comparing diffs and easier perceived in general. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services

1 2 3 4 >

1 2 3 4 >

1 - 100 of 363 matches

Mail list logo