[jira] [Updated] (FLINK-12715) Hive-1.2.1 build is broken
[ https://issues.apache.org/jira/browse/FLINK-12715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Li updated FLINK-12715: --- Description: Some Hive util methods used are not compatible between Hive-2.3.4 and Hive-1.2.1. (was: Some Hive util methods used are not compatible between Hive-2.3.4 and Hive-1.2.1. Seems we have to implement these method ourselves.) > Hive-1.2.1 build is broken > -- > > Key: FLINK-12715 > URL: https://issues.apache.org/jira/browse/FLINK-12715 > Project: Flink > Issue Type: Sub-task > Components: Connectors / Hive >Reporter: Rui Li >Assignee: Rui Li >Priority: Major > > Some Hive util methods used are not compatible between Hive-2.3.4 and > Hive-1.2.1. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-10855) CheckpointCoordinator does not delete checkpoint directory of late/failed checkpoints
[ https://issues.apache.org/jira/browse/FLINK-10855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16855394#comment-16855394 ] vinoyang commented on FLINK-10855: -- [~till.rohrmann] I have submitted a design draft, if you have time you can have a look, thanks. > CheckpointCoordinator does not delete checkpoint directory of late/failed > checkpoints > - > > Key: FLINK-10855 > URL: https://issues.apache.org/jira/browse/FLINK-10855 > Project: Flink > Issue Type: Bug > Components: Runtime / Checkpointing >Affects Versions: 1.5.5, 1.6.2, 1.7.0 >Reporter: Till Rohrmann >Assignee: vinoyang >Priority: Major > > In case an acknowledge checkpoint message is late or a checkpoint cannot > be acknowledged, we discard the subtask state in the > {{CheckpointCoordinator}}. What does not happen in this case is deleting the > parent directory of the checkpoint. This only happens when we dispose a > {{PendingCheckpoint}} ({{PendingCheckpoint#dispose}}). > Due to this behaviour it can happen that a checkpoint fails (e.g. a task not > being ready) and we delete the checkpoint directory. Next, another task writes > its checkpoint data to the checkpoint directory (thereby creating it again) > and sends an acknowledge message back to the {{CheckpointCoordinator}}. The > {{CheckpointCoordinator}} will realize that there is no longer a > {{PendingCheckpoint}} and will discard the subtask state. This will remove > the state files from the checkpoint directory but will leave the checkpoint > directory itself untouched.
[GitHub] [flink] dawidwys commented on issue #8585: [FLINK-12690][table-api] Introduce a Planner interface
dawidwys commented on issue #8585: [FLINK-12690][table-api] Introduce a Planner interface URL: https://github.com/apache/flink/pull/8585#issuecomment-498543605 You are right, the `Planner` will need the `CatalogManager` (not only for parsing, but also for Table scans). I don't want to add an `init` method though, as it is rather bad design to have objects that you can construct but that are unusable until some method call. Therefore we should just add the `CatalogManager` to the ctor of the `Planner`. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
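The constructor-injection approach discussed above can be sketched as follows. This is a simplified stand-in, not the actual Flink API: the class shapes and the `parse` return value are hypothetical, only the design point (mandatory dependency in the ctor, no `setCatalogManager`/`init`) mirrors the discussion.

```java
// Sketch: the Planner receives its CatalogManager up front, so it is never
// in a half-initialized, unusable state. Names are illustrative stand-ins.
public class PlannerSketch {

    static class CatalogManager { }

    static class Planner {
        private final CatalogManager catalogManager;

        // No init()/setCatalogManager(): the dependency is mandatory at construction.
        Planner(CatalogManager catalogManager) {
            if (catalogManager == null) {
                throw new NullPointerException("catalogManager");
            }
            this.catalogManager = catalogManager;
        }

        // Would consult catalogManager to validate the statement.
        String parse(String statement) {
            return "validated: " + statement;
        }
    }

    public static void main(String[] args) {
        Planner planner = new Planner(new CatalogManager());
        System.out.println(planner.parse("SELECT 1")); // prints "validated: SELECT 1"
    }
}
```

The alternative (`new Planner()` followed by a required `init(catalogManager)` call) would allow constructing an object on which every method call fails until initialization, which is the pitfall the comment warns against.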
[GitHub] [flink] twalthr commented on issue #8521: [FLINK-12601][table] Make the DataStream & DataSet conversion to a Table independent from Calcite
twalthr commented on issue #8521: [FLINK-12601][table] Make the DataStream & DataSet conversion to a Table independent from Calcite URL: https://github.com/apache/flink/pull/8521#issuecomment-498543106 @flinkbot approve all
[GitHub] [flink] flinkbot commented on issue #8607: [FLINK-12652] [documentation] add first version of a glossary
flinkbot commented on issue #8607: [FLINK-12652] [documentation] add first version of a glossary URL: https://github.com/apache/flink/pull/8607#issuecomment-498542685 Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community to review your pull request. We will use this comment to track the progress of the review. ## Review Progress * ❓ 1. The [description] looks good. * ❓ 2. There is [consensus] that the contribution should go into Flink. * ❓ 3. Needs [attention] from. * ❓ 4. The change fits into the overall [architecture]. * ❓ 5. Overall code [quality] is good. Please see the [Pull Request Review Guide](https://flink.apache.org/reviewing-prs.html) for a full explanation of the review process. The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required. Bot commands The @flinkbot bot supports the following commands: - `@flinkbot approve description` to approve one or more aspects (aspects: `description`, `consensus`, `architecture` and `quality`) - `@flinkbot approve all` to approve all aspects - `@flinkbot approve-until architecture` to approve everything until `architecture` - `@flinkbot attention @username1 [@username2 ..]` to require somebody's attention - `@flinkbot disapprove architecture` to remove an approval you gave earlier
[jira] [Updated] (FLINK-12652) Add Glossary to Concepts Section of Documentation
[ https://issues.apache.org/jira/browse/FLINK-12652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-12652: --- Labels: pull-request-available (was: ) > Add Glossary to Concepts Section of Documentation > - > > Key: FLINK-12652 > URL: https://issues.apache.org/jira/browse/FLINK-12652 > Project: Flink > Issue Type: Sub-task > Components: Documentation >Reporter: Konstantin Knauf >Assignee: Konstantin Knauf >Priority: Major > Labels: pull-request-available >
[jira] [Created] (FLINK-12724) Add Links to new Concepts Section to Glossary
Konstantin Knauf created FLINK-12724: Summary: Add Links to new Concepts Section to Glossary Key: FLINK-12724 URL: https://issues.apache.org/jira/browse/FLINK-12724 Project: Flink Issue Type: Sub-task Components: Documentation Reporter: Konstantin Knauf Once we have reworked the Concepts section, we should add references to it to the glossary.
[GitHub] [flink] knaufk opened a new pull request #8607: [FLINK-12652] [documentation] add first version of a glossary
knaufk opened a new pull request #8607: [FLINK-12652] [documentation] add first version of a glossary URL: https://github.com/apache/flink/pull/8607 ## What is the purpose of the change * Part of FLIP-42 * Add a glossary section to the documentation for further reference and to establish a common terminology ## Brief change log * add first version of glossary ## Verifying this change * build docs and proof-read ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): no - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: no - The serializers: no - The runtime per-record code paths (performance sensitive): no - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: no - The S3 file system connector: no ## Documentation - Does this pull request introduce a new feature? no - If yes, how is the feature documented? not applicable
[jira] [Comment Edited] (FLINK-10455) Potential Kafka producer leak in case of failures
[ https://issues.apache.org/jira/browse/FLINK-10455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16855384#comment-16855384 ] sunjincheng edited comment on FLINK-10455 at 6/4/19 6:31 AM: - Currently, we have added an exception capture for the `FlinkKafkaProducer011#commit` method to ensure that the `recycleTransactionalProducer` will be executed, so do we also need to add an exception capture for `FlinkKafkaProducer011#abort`? !image-2019-06-04-14-30-55-985.png! was (Author: sunjincheng121): Currently, we have added an exception capture for the `FlinkKafkaProducer011#commit` method to ensure that the `recycleTransactionalProducer` will be executed, so do we also need to add an exception capture for `FlinkKafkaProducer011#abort`? !image-2019-06-04-14-25-16-916.png! > Potential Kafka producer leak in case of failures > - > > Key: FLINK-10455 > URL: https://issues.apache.org/jira/browse/FLINK-10455 > Project: Flink > Issue Type: Bug > Components: Connectors / Kafka >Affects Versions: 1.5.2 >Reporter: Nico Kruber >Assignee: Jiangjie Qin >Priority: Critical > Labels: pull-request-available > Fix For: 1.5.6, 1.7.0, 1.7.3, 1.9.0, 1.8.1 > > Attachments: image-2019-06-04-14-25-16-916.png, > image-2019-06-04-14-30-55-985.png > > > If the Kafka brokers' timeout is too low for our checkpoint interval [1], we > may get an {{ProducerFencedException}}. Documentation around > {{ProducerFencedException}} explicitly states that we should close the > producer after encountering it. > By looking at the code, it doesn't seem like this is actually done in > {{FlinkKafkaProducer011}}. Also, in case one transaction's commit in > {{TwoPhaseCommitSinkFunction#notifyCheckpointComplete}} fails with an > exception, we don't clean up (nor try to commit) any other transaction. 
> -> from what I see, {{TwoPhaseCommitSinkFunction#notifyCheckpointComplete}} > simply iterates over the {{pendingCommitTransactions}} which is not touched > during {{close()}} > Now if we restart the failing job on the same Flink cluster, any resources > from the previous attempt will still linger around. > [1] > https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/connectors/kafka.html#kafka-011
[jira] [Commented] (FLINK-10455) Potential Kafka producer leak in case of failures
[ https://issues.apache.org/jira/browse/FLINK-10455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16855384#comment-16855384 ] sunjincheng commented on FLINK-10455: - Currently, we have added an exception capture for the `FlinkKafkaProducer011#commit` method to ensure that the `recycleTransactionalProducer` will be executed, so do we also need to add an exception capture for `FlinkKafkaProducer011#abort`? !image-2019-06-04-14-25-16-916.png! > Potential Kafka producer leak in case of failures > - > > Key: FLINK-10455 > URL: https://issues.apache.org/jira/browse/FLINK-10455 > Project: Flink > Issue Type: Bug > Components: Connectors / Kafka >Affects Versions: 1.5.2 >Reporter: Nico Kruber >Assignee: Jiangjie Qin >Priority: Critical > Labels: pull-request-available > Fix For: 1.5.6, 1.7.0, 1.7.3, 1.9.0, 1.8.1 > > Attachments: image-2019-06-04-14-25-16-916.png > > > If the Kafka brokers' timeout is too low for our checkpoint interval [1], we > may get an {{ProducerFencedException}}. Documentation around > {{ProducerFencedException}} explicitly states that we should close the > producer after encountering it. > By looking at the code, it doesn't seem like this is actually done in > {{FlinkKafkaProducer011}}. Also, in case one transaction's commit in > {{TwoPhaseCommitSinkFunction#notifyCheckpointComplete}} fails with an > exception, we don't clean up (nor try to commit) any other transaction. > -> from what I see, {{TwoPhaseCommitSinkFunction#notifyCheckpointComplete}} > simply iterates over the {{pendingCommitTransactions}} which is not touched > during {{close()}} > Now if we restart the failing job on the same Flink cluster, any resources > from the previous attempt will still linger around. > [1] > https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/connectors/kafka.html#kafka-011 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-10455) Potential Kafka producer leak in case of failures
[ https://issues.apache.org/jira/browse/FLINK-10455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sunjincheng updated FLINK-10455: Attachment: image-2019-06-04-14-25-16-916.png > Potential Kafka producer leak in case of failures > - > > Key: FLINK-10455 > URL: https://issues.apache.org/jira/browse/FLINK-10455 > Project: Flink > Issue Type: Bug > Components: Connectors / Kafka >Affects Versions: 1.5.2 >Reporter: Nico Kruber >Assignee: Jiangjie Qin >Priority: Critical > Labels: pull-request-available > Fix For: 1.5.6, 1.7.0, 1.7.3, 1.9.0, 1.8.1 > > Attachments: image-2019-06-04-14-25-16-916.png > > > If the Kafka brokers' timeout is too low for our checkpoint interval [1], we > may get an {{ProducerFencedException}}. Documentation around > {{ProducerFencedException}} explicitly states that we should close the > producer after encountering it. > By looking at the code, it doesn't seem like this is actually done in > {{FlinkKafkaProducer011}}. Also, in case one transaction's commit in > {{TwoPhaseCommitSinkFunction#notifyCheckpointComplete}} fails with an > exception, we don't clean up (nor try to commit) any other transaction. > -> from what I see, {{TwoPhaseCommitSinkFunction#notifyCheckpointComplete}} > simply iterates over the {{pendingCommitTransactions}} which is not touched > during {{close()}} > Now if we restart the failing job on the same Flink cluster, any resources > from the previous attempt will still linger around. > [1] > https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/connectors/kafka.html#kafka-011 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
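The pattern sunjincheng proposes for `abort` (mirroring the existing exception capture around `commit`) can be sketched as a try/catch/finally that guarantees the producer is recycled. This is a hypothetical illustration, not the real `FlinkKafkaProducer011`: the method names `abort` and `recycleTransactionalProducer` echo the discussion above, and a plain `IllegalStateException` stands in for a broker-side `ProducerFencedException`.

```java
// Sketch: recycle the transactional producer even when aborting the
// transaction throws, so failed aborts cannot leak producer instances.
public class AbortRecycleSketch {

    static boolean recycled = false;

    // Stand-in for returning the producer to the pool / closing it.
    static void recycleTransactionalProducer() {
        recycled = true;
    }

    static void abort(boolean brokerThrows) {
        try {
            if (brokerThrows) {
                // Simulates the broker fencing this producer during abort.
                throw new IllegalStateException("simulated ProducerFencedException");
            }
        } catch (IllegalStateException e) {
            // Log and continue: the cleanup below must still run.
        } finally {
            recycleTransactionalProducer();
        }
    }

    public static void main(String[] args) {
        abort(true);
        System.out.println(recycled); // prints "true": recycled despite the failed abort
    }
}
```

Without the `finally`, an exception thrown during abort would skip the recycle step, which is exactly the leak scenario described in the issue.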
[jira] [Commented] (FLINK-6755) Allow triggering Checkpoints through command line client
[ https://issues.apache.org/jira/browse/FLINK-6755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16855369#comment-16855369 ] vinoyang commented on FLINK-6755: - Hi [~till.rohrmann] [~aljoscha] [~gyfora] It seems there is another issue trying to support triggering checkpoints manually (FLINK-12619). I think we need to agree on this idea before we start these issues, so that we can reduce unnecessary work. cc [~klion26] > Allow triggering Checkpoints through command line client > > > Key: FLINK-6755 > URL: https://issues.apache.org/jira/browse/FLINK-6755 > Project: Flink > Issue Type: New Feature > Components: Command Line Client, Runtime / Checkpointing >Affects Versions: 1.3.0 >Reporter: Gyula Fora >Assignee: vinoyang >Priority: Major > > The command line client currently only allows triggering (and canceling with) > Savepoints. > While this is good if we want to fork or modify the pipelines in a > non-checkpoint-compatible way, now with incremental checkpoints this becomes > wasteful for simple job restarts/pipeline updates. > I suggest we add a new command: > ./bin/flink checkpoint [checkpointDirectory] > and a new flag -c for the cancel command to indicate we want to trigger a > checkpoint: > ./bin/flink cancel -c [targetDirectory] > Otherwise this can work similarly to the current savepoint-taking logic; we > could probably even piggyback on the current messages by adding a boolean flag > indicating whether it should be a savepoint or a checkpoint.
[jira] [Commented] (FLINK-12713) deprecate descriptor, validator, and factory of ExternalCatalog
[ https://issues.apache.org/jira/browse/FLINK-12713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16855362#comment-16855362 ] Timo Walther commented on FLINK-12713: -- [~phoenixjiangnan] there is no need to deprecate those classes. They have been merged recently in this release. I thought we just update them once the new interfaces are ready. > deprecate descriptor, validator, and factory of ExternalCatalog > --- > > Key: FLINK-12713 > URL: https://issues.apache.org/jira/browse/FLINK-12713 > Project: Flink > Issue Type: Sub-task > Components: Table SQL / Client >Reporter: Bowen Li >Assignee: Bowen Li >Priority: Major > Labels: pull-request-available > Fix For: 1.9.0 > > Time Spent: 20m > Remaining Estimate: 0h >
[jira] [Created] (FLINK-12723) Adds a wiki page about setting up a Python Table API development environment
Dian Fu created FLINK-12723: --- Summary: Adds a wiki page about setting up a Python Table API development environment Key: FLINK-12723 URL: https://issues.apache.org/jira/browse/FLINK-12723 Project: Flink Issue Type: Sub-task Reporter: Dian Fu We should add a wiki page showing how to set up a Python Table API development environment to help contributors who are interested in the Python Table API to join in easily.
[jira] [Updated] (FLINK-12719) Add the Python catalog API
[ https://issues.apache.org/jira/browse/FLINK-12719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dian Fu updated FLINK-12719: Summary: Add the Python catalog API (was: Add the catalog API for the Python Table API) > Add the Python catalog API > -- > > Key: FLINK-12719 > URL: https://issues.apache.org/jira/browse/FLINK-12719 > Project: Flink > Issue Type: Sub-task > Components: API / Python >Reporter: Dian Fu >Assignee: Dian Fu >Priority: Major > > The new catalog API is almost ready. We should add the corresponding Python > catalog API.
[GitHub] [flink] dawidwys commented on a change in pull request #8549: [FLINK-12604][table-api][table-planner] Register TableSource/Sink as CatalogTables
dawidwys commented on a change in pull request #8549: [FLINK-12604][table-api][table-planner] Register TableSource/Sink as CatalogTables URL: https://github.com/apache/flink/pull/8549#discussion_r290133398 ## File path: flink-table/flink-table-api-java/src/main/java/org/apache/flink/table/catalog/ConnectorCatalogTable.java ## @@ -0,0 +1,165 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.flink.table.catalog; + +import org.apache.flink.annotation.Internal; +import org.apache.flink.annotation.VisibleForTesting; +import org.apache.flink.api.common.typeinfo.TypeInformation; +import org.apache.flink.table.api.TableSchema; +import org.apache.flink.table.sinks.TableSink; +import org.apache.flink.table.sources.DefinedProctimeAttribute; +import org.apache.flink.table.sources.DefinedRowtimeAttributes; +import org.apache.flink.table.sources.RowtimeAttributeDescriptor; +import org.apache.flink.table.sources.TableSource; +import org.apache.flink.table.typeutils.TimeIndicatorTypeInfo; + +import java.util.Arrays; +import java.util.Collections; +import java.util.List; +import java.util.Map; +import java.util.Optional; +import java.util.stream.Collectors; + +/** + * A {@link CatalogTable} that wraps a {@link TableSource} and/or {@link TableSink}. + * This allows registering those in a {@link Catalog}. It can not be persisted as the + * source and/or sink might be inline implementations and not be representable in a Review comment: It says the `TableSource` might be an inline implementation. If you use `TableEnvironment#connect`, then theoretically the table source is properly serializable. The other possibility is to use a `TableSource` explicitly: `TableEnvironment#fromTableSource`. The `ConnectorCatalogTable` therefore is always inline, as we don't know which case we are handling.
[jira] [Created] (FLINK-12722) Adds Python Table API tutorial
Dian Fu created FLINK-12722: --- Summary: Adds Python Table API tutorial Key: FLINK-12722 URL: https://issues.apache.org/jira/browse/FLINK-12722 Project: Flink Issue Type: Sub-task Components: Documentation Reporter: Dian Fu Assignee: Dian Fu We should add a tutorial for the Python Table API in the docs to help beginners of the Python Table API to get a basic knowledge of how to create a simple Python Table API job.
[GitHub] [flink] flinkbot commented on issue #8606: [FLINK-12721] Process integer type in json format
flinkbot commented on issue #8606: [FLINK-12721] Process integer type in json format URL: https://github.com/apache/flink/pull/8606#issuecomment-498529338 Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community to review your pull request. We will use this comment to track the progress of the review.
[GitHub] [flink] aloyszhang opened a new pull request #8606: Process integer type in json format
aloyszhang opened a new pull request #8606: Process integer type in json format URL: https://github.com/apache/flink/pull/8606 ## What is the purpose of the change This PR aims at providing a more precise way to process the `integer` type in JSON format. ## Brief change log - convert `integer` type in JSON Schema to `Types.INT` ## Verifying this change This change is already covered by existing tests, such as the tests in the flink-json module. ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): (yes / **no**) - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (yes / **no**) - The serializers: (**yes** / no / don't know) - The runtime per-record code paths (performance sensitive): (yes / **no** / don't know) - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / **no** / don't know) - The S3 file system connector: (yes / **no** / don't know) ## Documentation - Does this pull request introduce a new feature? (yes / **no**)
[jira] [Updated] (FLINK-12721) make flink-json more precisely when handle integer type
[ https://issues.apache.org/jira/browse/FLINK-12721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] aloyszhang updated FLINK-12721: --- Description: At present, flink-json converts the integer type to `Types.BIG_DEC`, which can cause mismatch errors when sinking to external storage systems like MySQL; we can make it more precise by processing integer as `Types.INT`. was: At present, flink-json converts the integer type to `Types.BIG_DEC`, which can cause mismatch errors when sinking to some external storage systems like MySQL; we can make it more precise by processing integer as `Types.INT`. > make flink-json more precisely when handle integer type > --- > > Key: FLINK-12721 > URL: https://issues.apache.org/jira/browse/FLINK-12721 > Project: Flink > Issue Type: Improvement > Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile) >Affects Versions: 1.8.0 >Reporter: aloyszhang >Assignee: aloyszhang >Priority: Major > > At present, flink-json converts the integer type to `Types.BIG_DEC`, which > can cause mismatch errors when sinking to external storage systems like MySQL; > we can make it more precise by processing integer as `Types.INT`. >
[jira] [Created] (FLINK-12721) make flink-json more precisely when handle integer type
aloyszhang created FLINK-12721: -- Summary: make flink-json more precisely when handle integer type Key: FLINK-12721 URL: https://issues.apache.org/jira/browse/FLINK-12721 Project: Flink Issue Type: Improvement Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile) Affects Versions: 1.8.0 Reporter: aloyszhang Assignee: aloyszhang At present, flink-json converts the integer type to `Types.BIG_DEC`, which can cause mismatch errors when sinking to some external storage systems like MySQL; we can make it more precise by processing integer as `Types.INT`.
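The proposed change maps JSON Schema's `integer` to `Types.INT` instead of `Types.BIG_DEC`. A minimal, hypothetical sketch of such a type dispatch follows; it is not the actual flink-json converter, and plain strings stand in for Flink's `TypeInformation` constants so the snippet stays self-contained.

```java
// Illustrative mapping from JSON Schema type names to (stringified) Flink types.
// "integer" now maps to INT (the proposed behavior); unbounded JSON "number"
// values stay BIG_DEC, since they may exceed any fixed-width numeric type.
public class JsonTypeMappingSketch {

    static String convert(String jsonSchemaType) {
        switch (jsonSchemaType) {
            case "integer": return "INT";      // proposed: was BIG_DEC before
            case "number":  return "BIG_DEC";
            case "string":  return "STRING";
            case "boolean": return "BOOLEAN";
            default:
                throw new IllegalArgumentException("Unsupported type: " + jsonSchemaType);
        }
    }

    public static void main(String[] args) {
        System.out.println(convert("integer")); // prints "INT"
    }
}
```

With this mapping, a JSON `integer` field lines up with an `INT` column in a sink such as a MySQL table, avoiding the `BIG_DEC`-vs-`INT` mismatch described in the issue.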
[jira] [Created] (FLINK-12720) Add the Python Table API Sphinx docs
Dian Fu created FLINK-12720: --- Summary: Add the Python Table API Sphinx docs Key: FLINK-12720 URL: https://issues.apache.org/jira/browse/FLINK-12720 Project: Flink Issue Type: Sub-task Components: API / Python, Documentation Reporter: Dian Fu As the Python Table API is added, we should add the Python Table API Sphinx docs. This includes the following work: 1) Add scripts to build the Sphinx docs 2) Add a link in the main page to the generated doc
[GitHub] [flink] godfreyhe commented on issue #8585: [FLINK-12690][table-api] Introduce a Planner interface
godfreyhe commented on issue #8585: [FLINK-12690][table-api] Introduce a Planner interface URL: https://github.com/apache/flink/pull/8585#issuecomment-498524833 hi @dawidwys, there is a minor question: the `parse` method in the `Planner` class returns a validated `TableOperation`, so the `Planner` instance should hold a `CatalogManager` instance, I think. My question is: does `Planner` need methods like `setCatalogManager` or `registerXXX` to connect `TableEnvironment` and `Planner` by service discovery? thanks~
[GitHub] [flink] flinkbot commented on issue #8605: [FLINK-9363] bump up flink-shaded-jackson version to 2.9.8-7.0
flinkbot commented on issue #8605: [FLINK-9363] bump up flink-shaded-jackson version to 2.9.8-7.0 URL: https://github.com/apache/flink/pull/8605#issuecomment-498522976 Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community to review your pull request. We will use this comment to track the progress of the review.
[jira] [Created] (FLINK-12719) Add the catalog API for the Python Table API
Dian Fu created FLINK-12719: --- Summary: Add the catalog API for the Python Table API Key: FLINK-12719 URL: https://issues.apache.org/jira/browse/FLINK-12719 Project: Flink Issue Type: Sub-task Components: API / Python Reporter: Dian Fu Assignee: Dian Fu The new catalog API is almost ready. We should add the corresponding Python catalog API.
[GitHub] [flink] aloyszhang opened a new pull request #8605: [FLINK-9363] bump up flink-shaded-jackson version to 2.9.8-7.0
aloyszhang opened a new pull request #8605: [FLINK-9363] bump up flink-shaded-jackson version to 2.9.8-7.0 URL: https://github.com/apache/flink/pull/8605 ## What is the purpose of the change This PR bumps the flink-shaded-jackson version up to 2.9.8-7.0, since the flink-shaded project has released 7.0. ## Brief change log - bump up flink-shaded-jackson version to 2.9.8-7.0 ## Verifying this change This change is a trivial rework / code cleanup without any test coverage. ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): (**yes** / no) - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (yes / **no**) - The serializers: (**yes** / no / don't know) - The runtime per-record code paths (performance sensitive): (yes / **no** / don't know) - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / **no** / don't know) - The S3 file system connector: (yes / **no** / don't know) ## Documentation - Does this pull request introduce a new feature? (yes / **no**)
[jira] [Commented] (FLINK-12426) TM occasionally hang in deploying state
[ https://issues.apache.org/jira/browse/FLINK-12426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16855306#comment-16855306 ] Qi commented on FLINK-12426: [~sunhaibotb] Sure, closed. > TM occasionally hang in deploying state > --- > > Key: FLINK-12426 > URL: https://issues.apache.org/jira/browse/FLINK-12426 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination > Reporter: Qi > Priority: Major > > Hi all, > > We use Flink batch and start thousands of jobs per day. Occasionally we > observed some stuck jobs, due to TMs hanging in the “DEPLOYING” state. > > It seems that the TM is calling BlobClient to download jars from > JM/BlobServer. Under the hood it's calling Socket.connect() and then > Socket.read() to retrieve results. > > These jobs usually have many TM slots (1~2k). We checked the TM log and > dumped the TM thread. It had indeed hung on a socket read while downloading a > jar from the blob server. > > We're using Flink 1.5, but this may also affect later versions since the > related code has not changed much. We've tried adding a socket timeout in > BlobClient, but still no luck. > > > TM log > > ... > INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Received task > DataSource (at createInput(ExecutionEnvironment.java:548) (our.code)) > (184/2000). > INFO org.apache.flink.runtime.taskmanager.Task - DataSource (at > createInput(ExecutionEnvironment.java:548) (our.code)) (184/2000) switched > from CREATED to DEPLOYING. > INFO org.apache.flink.runtime.taskmanager.Task - Creating FileSystem stream > leak safety net for task DataSource (at > createInput(ExecutionEnvironment.java:548) (our.code)) (184/2000) [DEPLOYING] > INFO org.apache.flink.runtime.taskmanager.Task - Loading JAR files for task > DataSource (at createInput(ExecutionEnvironment.java:548) (our.code)) > (184/2000) [DEPLOYING]. 
> INFO org.apache.flink.runtime.blob.BlobClient - Downloading > 19e65c0caa41f264f9ffe4ca2a48a434/p-3ecd6341bf97d5512b14c93f6c9f51f682b6db26-37d5e69d156ee00a924c1ebff0c0d280 > from some-host-ip-port > {color:#22}no more logs...{color} > > > TM thread dump: > > _"DataSource (at createInput(ExecutionEnvironment.java:548) (our.code)) > (1999/2000)" #72 prio=5 os_prio=0 tid=0x7fb9a1521000 nid=0xa0994 runnable > [0x7fb97cfbf000]_ > _java.lang.Thread.State: RUNNABLE_ > _at java.net.SocketInputStream.socketRead0(Native Method)_ > _at > java.net.SocketInputStream.socketRead(SocketInputStream.java:116)_ > _at java.net.SocketInputStream.read(SocketInputStream.java:171)_ > _at java.net.SocketInputStream.read(SocketInputStream.java:141)_ > _at > org.apache.flink.runtime.blob.BlobInputStream.read(BlobInputStream.java:152)_ > _at > org.apache.flink.runtime.blob.BlobInputStream.read(BlobInputStream.java:140)_ > _at > org.apache.flink.runtime.blob.BlobClient.downloadFromBlobServer(BlobClient.java:170)_ > _at > org.apache.flink.runtime.blob.AbstractBlobCache.getFileInternal(AbstractBlobCache.java:181)_ > _at > org.apache.flink.runtime.blob.PermanentBlobCache.getFile(PermanentBlobCache.java:206)_ > _at > org.apache.flink.runtime.execution.librarycache.BlobLibraryCacheManager.registerTask(BlobLibraryCacheManager.java:120)_ > _- locked <0x00078ab60ba8> (a java.lang.Object)_ > _at > org.apache.flink.runtime.taskmanager.Task.createUserCodeClassloader(Task.java:893)_ > _at org.apache.flink.runtime.taskmanager.Task.run(Task.java:585)_ > _at java.lang.Thread.run(Thread.java:748)_ > __ > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
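The thread dump above shows the task blocked in `SocketInputStream.socketRead0` with no deadline. As a rough sketch (not Flink's actual `BlobClient` code; the class and method names below are hypothetical), a socket-level connect/read timeout turns such an indefinite hang into a retryable `SocketTimeoutException`:

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetSocketAddress;
import java.net.Socket;

public class TimedBlobDownload {
    // Hypothetical helper: connect and read with explicit deadlines, so a
    // stalled server surfaces as SocketTimeoutException instead of a hang.
    public static int readFirstByte(String host, int port, int timeoutMillis) throws IOException {
        try (Socket socket = new Socket()) {
            // Bound the connect attempt.
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            // Bound every subsequent read() call on this socket.
            socket.setSoTimeout(timeoutMillis);
            InputStream in = socket.getInputStream();
            // Throws java.net.SocketTimeoutException if no data arrives in time.
            return in.read();
        }
    }
}
```

The report above notes that a timeout alone did not resolve the issue, so this only bounds a single read; a caller would still need retry logic around the download.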
[jira] [Closed] (FLINK-12426) TM occasionally hang in deploying state
[ https://issues.apache.org/jira/browse/FLINK-12426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qi closed FLINK-12426. -- Resolution: Duplicate > TM occasionally hang in deploying state > --- > > Key: FLINK-12426 > URL: https://issues.apache.org/jira/browse/FLINK-12426 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination > Reporter: Qi > Priority: Major > > (Issue description, TM log, and thread dump are quoted in full in the comment notification above.)
[GitHub] [flink] ifndef-SleePy commented on issue #8582: [FLINK-12681][metrics] Introduce a built-in thread-safe counter
ifndef-SleePy commented on issue #8582: [FLINK-12681][metrics] Introduce a built-in thread-safe counter URL: https://github.com/apache/flink/pull/8582#issuecomment-498512073 The Travis check has passed. @flinkbot attention @zentol Would you mind taking a look?
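The actual implementation of FLINK-12681 lives in the PR above. As an illustration of the general idea only, assuming a simplified stand-in for Flink's `Counter` interface (the real metric types live in `org.apache.flink.metrics`), a thread-safe counter can be backed by `AtomicLong`:

```java
import java.util.concurrent.atomic.AtomicLong;

// Simplified stand-in for a counter metric interface; not Flink's actual API.
interface Counter {
    void inc();
    void inc(long n);
    long getCount();
}

// Thread-safe counter: concurrent increments from multiple task or reporter
// threads never lose updates, unlike a plain (non-volatile) long field.
class ThreadSafeCounter implements Counter {
    private final AtomicLong count = new AtomicLong();

    @Override
    public void inc() {
        count.incrementAndGet();
    }

    @Override
    public void inc(long n) {
        count.addAndGet(n);
    }

    @Override
    public long getCount() {
        return count.get();
    }
}
```

The trade-off is a compare-and-swap per increment, which is why a built-in thread-safe variant is offered alongside the cheaper single-threaded counter rather than replacing it.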
[jira] [Created] (FLINK-12718) allow users to specify hive-site.xml location to configure hive metastore client in HiveCatalog
Bowen Li created FLINK-12718: Summary: allow users to specify hive-site.xml location to configure hive metastore client in HiveCatalog Key: FLINK-12718 URL: https://issues.apache.org/jira/browse/FLINK-12718 Project: Flink Issue Type: Sub-task Components: Connectors / Hive Reporter: Bowen Li Assignee: Bowen Li Fix For: 1.9.0 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (FLINK-12649) Add a shim layer to support multiple versions of HMS
[ https://issues.apache.org/jira/browse/FLINK-12649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bowen Li closed FLINK-12649. Resolution: Fixed merged in 1.9.0: 038ab385c6f9af129b5eda7fe05d8b39d6122077 > Add a shim layer to support multiple versions of HMS > > > Key: FLINK-12649 > URL: https://issues.apache.org/jira/browse/FLINK-12649 > Project: Flink > Issue Type: Sub-task > Components: Connectors / Hive >Reporter: Rui Li >Assignee: Rui Li >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > We need a shim layer of HMS client to talk to different versions of HMS -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (FLINK-12310) shade user facing common libraries in flink-connector-hive
[ https://issues.apache.org/jira/browse/FLINK-12310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bowen Li closed FLINK-12310. Resolution: Fixed closed since we've changed hive jars' dependencies from "compile" to "provided" > shade user facing common libraries in flink-connector-hive > -- > > Key: FLINK-12310 > URL: https://issues.apache.org/jira/browse/FLINK-12310 > Project: Flink > Issue Type: Sub-task > Components: Connectors / Hive >Reporter: Bowen Li >Assignee: Bowen Li >Priority: Major > Labels: pull-request-available > Fix For: 1.9.0 > > Time Spent: 20m > Remaining Estimate: 0h > > As discussed with Stephan and Timo, we will need to shade user facing common > libraries in flink-connector-hive but can do it later in a separate PR than > FLINK-12238 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-12686) Disentangle StreamGraph/StreamNode and StreamGraphGenerator from StreamExecutionEnvironment
[ https://issues.apache.org/jira/browse/FLINK-12686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16855282#comment-16855282 ] Biao Liu commented on FLINK-12686: -- Hi [~aljoscha], yes, there is no setting for slot sharing currently. Let's work on the disentangling first. I think it's clear enough. > Disentangle StreamGraph/StreamNode and StreamGraphGenerator from > StreamExecutionEnvironment > --- > > Key: FLINK-12686 > URL: https://issues.apache.org/jira/browse/FLINK-12686 > Project: Flink > Issue Type: Improvement > Components: API / DataStream > Reporter: Biao Liu > Assignee: Biao Liu > Priority: Major > Fix For: 1.9.0 > > > This is part of merging the Blink batch runner. I would like to improve the > StreamGraph to support both streaming and batch jobs. There is a Google doc > describing this proposal, here is the > [link|https://docs.google.com/document/d/17X6csUWKUdVn55c47YbOTEipHDWFt_1upyTPrBD6Fjc/edit?usp=sharing]. > This proposal is under discussion, any feedback is welcome! > The new Table API runner wants to translate directly to {{StreamGraphs}} and > therefore doesn't have a {{StreamExecutionEnvironment}}; we therefore need to > disentangle them.
[jira] [Assigned] (FLINK-12686) Disentangle StreamGraph/StreamNode and StreamGraphGenerator from StreamExecutionEnvironment
[ https://issues.apache.org/jira/browse/FLINK-12686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Biao Liu reassigned FLINK-12686: Assignee: Biao Liu > Disentangle StreamGraph/StreamNode and StreamGraphGenerator from > StreamExecutionEnvironment > --- > > Key: FLINK-12686 > URL: https://issues.apache.org/jira/browse/FLINK-12686 > Project: Flink > Issue Type: Improvement > Components: API / DataStream > Reporter: Biao Liu > Assignee: Biao Liu > Priority: Major > Fix For: 1.9.0 > > (Issue description quoted in full in the comment notification above.)
[jira] [Created] (FLINK-12717) Add windows support for the Python shell script
Dian Fu created FLINK-12717: --- Summary: Add windows support for the Python shell script Key: FLINK-12717 URL: https://issues.apache.org/jira/browse/FLINK-12717 Project: Flink Issue Type: Sub-task Components: API / Python Reporter: Dian Fu We should add a windows shell script for pyflink-gateway-server.sh. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] [flink] danny0405 edited a comment on issue #8548: [FLINK-6962] [table] Add create(drop) table SQL DDL
danny0405 edited a comment on issue #8548: [FLINK-6962] [table] Add create(drop) table SQL DDL URL: https://github.com/apache/flink/pull/8548#issuecomment-498509086 @twalthr 1. I agree that removing the colon between the ROW type field name and its type is more in line with the SQL standard 2. I agree that using left/right parens is more SQL compliant 3. I agree to add comments for row type items 4. Parameterized VARCHARs are supported as a builtin type in Calcite, and our tests already exercise them 5. I don't think we should support type keywords like `STRING` or `BYTES`, for the same SQL-compliance reason 6. REAL is supported by Calcite, so it's hard to say we do not support it; excluding the REAL type would bring a lot of extra work For the other data types that are listed in FLIP-37 but not in this PR, I would rather file a new JIRA issue for tracking them; after all, supporting the types Flink implements in query execution has higher priority than those that are not fully supported.
[GitHub] [flink] danny0405 commented on issue #8548: [FLINK-6962] [table] Add create(drop) table SQL DDL
danny0405 commented on issue #8548: [FLINK-6962] [table] Add create(drop) table SQL DDL URL: https://github.com/apache/flink/pull/8548#issuecomment-498509086 > @danny0405 Maybe the merging happened too quickly for this PR. It is still not 100% aligned with FLIP-37. We should make sure to do that before the release. In particular: > > * no colon between ROW field name and type > * alternative syntax `ROW(fieldName fieldType)` to be SQL compliant > * support for comments in rows like `ROW` > * no tests for parameterized VARCHARs like `VARCHAR(100)` > * alternative syntax for `VARCHAR(INT_MAX)` as `STRING` > * alternative syntax for `VARBINARY(INT_MAX)` as `BYTES` > * REAL is not supported > > Furthermore it would be great if the parser would already support all data types defined in FLIP-37 otherwise we need to check for compatibility in every corner of the stack again. > > Missing are (among others): CHAR, BINARY, user-defined type identifiers, MULTISET > Please ensure consistency here. 1. I agree that removing the colon between the ROW type field name and its type is more in line with the SQL standard 2. I agree that using left/right parens is more SQL compliant 3. I agree to add comments for row type items 4. Parameterized VARCHARs are supported as a builtin type in Calcite, and our tests already exercise them 5. I don't think we should support type keywords like `STRING` or `BYTES`, for the same SQL-compliance reason 6. REAL is supported by Calcite, so it's hard to say we do not support it; excluding the REAL type would bring a lot of extra work For the other data types that are listed in FLIP-37 but not in this PR, I would rather file a new JIRA issue for tracking them; after all, supporting the types Flink implements in query execution has higher priority than those that are not fully supported.
[jira] [Created] (FLINK-12716) Add an interactive shell for Python Table API
Dian Fu created FLINK-12716: --- Summary: Add an interactive shell for Python Table API Key: FLINK-12716 URL: https://issues.apache.org/jira/browse/FLINK-12716 Project: Flink Issue Type: Sub-task Components: API / Python Reporter: Dian Fu We should add an interactive shell for the Python Table API. It will provide functionality similar to the Scala Shell.
[jira] [Updated] (FLINK-12715) Hive-1.2.1 build is broken
[ https://issues.apache.org/jira/browse/FLINK-12715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Li updated FLINK-12715: --- Component/s: Connectors / Hive > Hive-1.2.1 build is broken > -- > > Key: FLINK-12715 > URL: https://issues.apache.org/jira/browse/FLINK-12715 > Project: Flink > Issue Type: Sub-task > Components: Connectors / Hive > Reporter: Rui Li > Assignee: Rui Li > Priority: Major > > Some of the Hive util methods used are not compatible between Hive-2.3.4 and > Hive-1.2.1. It seems we have to implement these methods ourselves.
[GitHub] [flink] lirui-apache commented on a change in pull request #8522: [FLINK-12572][hive]Implement HiveInputFormat to read Hive tables
lirui-apache commented on a change in pull request #8522: [FLINK-12572][hive]Implement HiveInputFormat to read Hive tables URL: https://github.com/apache/flink/pull/8522#discussion_r290112999 ## File path: flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/batch/connectors/hive/HiveTableInputFormat.java ## @@ -0,0 +1,275 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.flink.batch.connectors.hive; + +import org.apache.flink.api.common.io.LocatableInputSplitAssigner; +import org.apache.flink.api.common.io.statistics.BaseStatistics; +import org.apache.flink.api.common.typeinfo.TypeInformation; +import org.apache.flink.api.java.hadoop.common.HadoopInputFormatCommonBase; +import org.apache.flink.api.java.hadoop.mapred.wrapper.HadoopDummyReporter; +import org.apache.flink.api.java.typeutils.ResultTypeQueryable; +import org.apache.flink.api.java.typeutils.RowTypeInfo; +import org.apache.flink.core.io.InputSplitAssigner; +import org.apache.flink.table.catalog.hive.util.HiveTableUtil; +import org.apache.flink.types.Row; + +import org.apache.hadoop.conf.Configurable; +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.hive.metastore.api.StorageDescriptor; +import org.apache.hadoop.hive.serde2.Deserializer; +import org.apache.hadoop.hive.serde2.SerDeUtils; +import org.apache.hadoop.hive.serde2.objectinspector.StructField; +import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector; +import org.apache.hadoop.io.Writable; +import org.apache.hadoop.mapred.InputFormat; +import org.apache.hadoop.mapred.JobConf; +import org.apache.hadoop.mapred.JobConfigurable; +import org.apache.hadoop.mapred.RecordReader; +import org.apache.hadoop.security.Credentials; +import org.apache.hadoop.security.UserGroupInformation; +import org.apache.hadoop.util.ReflectionUtils; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.io.IOException; +import java.io.ObjectInputStream; +import java.io.ObjectOutputStream; +import java.util.ArrayList; +import java.util.List; +import java.util.Properties; + +import static org.apache.flink.util.Preconditions.checkNotNull; +import static org.apache.hadoop.mapreduce.lib.input.FileInputFormat.INPUT_DIR; + +/** + * The HiveTableInputFormat are inspired by the HCatInputFormat and HadoopInputFormatBase. 
+ * It's used to read from hive partition/non-partition table. + */ +public class HiveTableInputFormat extends HadoopInputFormatCommonBase + implements ResultTypeQueryable { + private static final long serialVersionUID = 6351448428766433164L; + private static Logger logger = LoggerFactory.getLogger(HiveTableInputFormat.class); + + private JobConf jobConf; + + protected transient Writable key; + protected transient Writable value; + + private transient RecordReader recordReader; + protected transient boolean fetched = false; + protected transient boolean hasNext; + + private Boolean isPartitioned; + private RowTypeInfo rowTypeInfo; + + //Necessary info to init deserializer + private String[] partitionColNames; + //For non-partition hive table, partitions only contains one partition which partitionValues is empty. + private List partitions; + private transient Deserializer deserializer; + //Hive StructField list contain all related info for specific serde. + private transient List structFields; + //StructObjectInspector in hive helps us to look into the internal structure of a struct object. + private transient StructObjectInspector structObjectInspector; + private transient InputFormat mapredInputFormat; + private transient HiveTablePartition hiveTablePartition; + + public HiveTableInputFormat( + JobConf jobConf, + Boolean isPartitioned, + String[] partitionColNames, + List partitions, + RowTypeInfo rowTypeInfo) { + super(jobConf.getCredentials()); + this.rowTypeInfo = checkNotNull(rowTypeInfo, "rowTypeInfo can not be null."); +
[jira] [Commented] (FLINK-12070) Make blocking result partitions consumable multiple times
[ https://issues.apache.org/jira/browse/FLINK-12070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16855260#comment-16855260 ] Yingjie Cao commented on FLINK-12070: - [~StephanEwen] There is no swap partition on my test OS. The data of the mmapped region was flushed to disk at the very beginning, that is, there was no lazy swapping. However, the disk write speed is far lower than the memory write speed, so memory will be exhausted sooner or later as long as the data volume is large enough. As for why the machine froze, I guess it is because flushing the mmapped region to disk also needs memory while not enough pages are left. I'd like to perform some more tests following your suggestion and will post the results later. BTW, we made Blink spill to a file directly and read from the file directly, and there is no significant performance regression. [~srichter] The max heap memory in my test setting is fairly large, but most of it is never used (not allocated), so it should not have much impact. Maybe reducing the max heap memory can improve the performance, but the problem will not be solved. > Make blocking result partitions consumable multiple times > - > > Key: FLINK-12070 > URL: https://issues.apache.org/jira/browse/FLINK-12070 > Project: Flink > Issue Type: Improvement > Components: Runtime / Network > Affects Versions: 1.9.0 > Reporter: Till Rohrmann > Assignee: Stephan Ewen > Priority: Blocker > Labels: pull-request-available > Fix For: 1.9.0 > > Attachments: image-2019-04-18-17-38-24-949.png > > Time Spent: 20m > Remaining Estimate: 0h > > In order to avoid writing produced results multiple times for multiple > consumers and in order to speed up batch recoveries, we should make the > blocking result partitions consumable multiple times. At the moment a > blocking result partition will be released once the consumers have processed > all data. 
Instead the result partition should be released once the next > blocking result has been produced and all consumers of a blocking result > partition have terminated. Moreover, blocking results should not hold on slot > resources like network buffers or memory as it is currently the case with > {{SpillableSubpartitions}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
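The discussion above concerns writes through memory-mapped regions dirtying page-cache pages faster than the disk can drain them. A minimal sketch, unrelated to Flink's internal partition classes, of how a `MappedByteBuffer` write goes through the page cache and is flushed explicitly with `force()`:

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MmapWriteDemo {
    // Writes 'data' through a memory-mapped region. The put() only dirties
    // page-cache pages in memory; force() asks the kernel to flush them to
    // disk. Without force(), write-back timing is up to the OS, which is the
    // memory-pressure behavior the discussion above is about.
    public static void writeMapped(Path file, byte[] data) throws IOException {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_WRITE, 0, data.length);
            buffer.put(data);
            buffer.force();
        }
    }
}
```

This also illustrates the alternative mentioned in the comment: a plain `FileChannel.write` plus `force` spills directly to the file and gives the writer back-pressure, instead of letting dirty mapped pages accumulate.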
[jira] [Created] (FLINK-12715) Hive-1.2.1 build is broken
Rui Li created FLINK-12715: -- Summary: Hive-1.2.1 build is broken Key: FLINK-12715 URL: https://issues.apache.org/jira/browse/FLINK-12715 Project: Flink Issue Type: Sub-task Reporter: Rui Li Assignee: Rui Li Some of the Hive util methods used are not compatible between Hive-2.3.4 and Hive-1.2.1. It seems we have to implement these methods ourselves.
[GitHub] [flink] zjuwangg commented on a change in pull request #8522: [FLINK-12572][hive]Implement HiveInputFormat to read Hive tables
zjuwangg commented on a change in pull request #8522: [FLINK-12572][hive]Implement HiveInputFormat to read Hive tables URL: https://github.com/apache/flink/pull/8522#discussion_r290110383 ## File path: flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/batch/connectors/hive/HiveTableInputFormat.java ## (Quoted hunk omitted; it repeats the license header, imports, and class fields shown in the previous review comment, ending at the HiveTableInputFormat constructor.)
[GitHub] [flink] zjuwangg commented on a change in pull request #8522: [FLINK-12572][hive]Implement HiveInputFormat to read Hive tables
zjuwangg commented on a change in pull request #8522: [FLINK-12572][hive]Implement HiveInputFormat to read Hive tables URL: https://github.com/apache/flink/pull/8522#discussion_r290110408 ## File path: flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/batch/connectors/hive/HiveTableInputFormat.java ## @@ -0,0 +1,275 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.flink.batch.connectors.hive; + +import org.apache.flink.api.common.io.LocatableInputSplitAssigner; +import org.apache.flink.api.common.io.statistics.BaseStatistics; +import org.apache.flink.api.common.typeinfo.TypeInformation; +import org.apache.flink.api.java.hadoop.common.HadoopInputFormatCommonBase; +import org.apache.flink.api.java.hadoop.mapred.wrapper.HadoopDummyReporter; +import org.apache.flink.api.java.typeutils.ResultTypeQueryable; +import org.apache.flink.api.java.typeutils.RowTypeInfo; +import org.apache.flink.core.io.InputSplitAssigner; +import org.apache.flink.table.catalog.hive.util.HiveTableUtil; +import org.apache.flink.types.Row; + +import org.apache.hadoop.conf.Configurable; +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.hive.metastore.api.StorageDescriptor; +import org.apache.hadoop.hive.serde2.Deserializer; +import org.apache.hadoop.hive.serde2.SerDeUtils; +import org.apache.hadoop.hive.serde2.objectinspector.StructField; +import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector; +import org.apache.hadoop.io.Writable; +import org.apache.hadoop.mapred.InputFormat; +import org.apache.hadoop.mapred.JobConf; +import org.apache.hadoop.mapred.JobConfigurable; +import org.apache.hadoop.mapred.RecordReader; +import org.apache.hadoop.security.Credentials; +import org.apache.hadoop.security.UserGroupInformation; +import org.apache.hadoop.util.ReflectionUtils; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.io.IOException; +import java.io.ObjectInputStream; +import java.io.ObjectOutputStream; +import java.util.ArrayList; +import java.util.List; +import java.util.Properties; + +import static org.apache.flink.util.Preconditions.checkNotNull; +import static org.apache.hadoop.mapreduce.lib.input.FileInputFormat.INPUT_DIR; + +/** + * The HiveTableInputFormat are inspired by the HCatInputFormat and HadoopInputFormatBase. Review comment: yes. 
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
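The `fetched`/`hasNext` fields in the quoted HiveTableInputFormat implement a fetch-ahead pattern: `reachedEnd()` eagerly pulls the next record so that `nextRecord()` can simply hand it over. A minimal, self-contained sketch of that pattern (the Hadoop RecordReader is replaced by a plain iterator; all names here are illustrative, not Flink's API):

```java
import java.util.Iterator;
import java.util.List;

public class FetchAheadReader {
    private final Iterator<String> source;   // stands in for the Hadoop RecordReader
    private String value;                     // buffered record
    private boolean fetched = false;          // has a record been buffered yet?
    private boolean hasNext = false;          // did the last fetch succeed?

    public FetchAheadReader(List<String> records) {
        this.source = records.iterator();
    }

    // Eagerly fetches the next record the first time it is called after a consume.
    public boolean reachedEnd() {
        if (!fetched) {
            hasNext = source.hasNext();
            if (hasNext) {
                value = source.next();
            }
            fetched = true;
        }
        return !hasNext;
    }

    // Hands over the buffered record and forces a new fetch on the next call.
    public String nextRecord() {
        if (reachedEnd()) {
            throw new IllegalStateException("no more records");
        }
        fetched = false;
        return value;
    }
}
```

Calling `reachedEnd()` repeatedly is safe because the buffered record is only consumed when `nextRecord()` resets `fetched`.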
[GitHub] [flink] zjuwangg commented on a change in pull request #8522: [FLINK-12572][hive]Implement HiveInputFormat to read Hive tables
zjuwangg commented on a change in pull request #8522: [FLINK-12572][hive]Implement HiveInputFormat to read Hive tables URL: https://github.com/apache/flink/pull/8522#discussion_r290110147 ## File path: flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/batch/connectors/hive/HiveTableInputFormat.java ## @@ -0,0 +1,275 @@ +/** + * The HiveTableInputFormat are inspired by the HCatInputFormat and HadoopInputFormatBase.
+ * It's used to read from hive partition/non-partition table. + */ +public class HiveTableInputFormat extends HadoopInputFormatCommonBase + implements ResultTypeQueryable { + private static final long serialVersionUID = 6351448428766433164L; + private static Logger logger = LoggerFactory.getLogger(HiveTableInputFormat.class); + + private JobConf jobConf; + + protected transient Writable key; + protected transient Writable value; + + private transient RecordReader recordReader; + protected transient boolean fetched = false; + protected transient boolean hasNext; + + private Boolean isPartitioned; + private RowTypeInfo rowTypeInfo; + + //Necessary info to init deserializer + private String[] partitionColNames; + //For non-partition hive table, partitions only contains one partition which partitionValues is empty. + private List partitions; + private transient Deserializer deserializer; + //Hive StructField list contain all related info for specific serde. + private transient List structFields; + //StructObjectInspector in hive helps us to look into the internal structure of a struct object. + private transient StructObjectInspector structObjectInspector; + private transient InputFormat mapredInputFormat; + private transient HiveTablePartition hiveTablePartition; + + public HiveTableInputFormat( + JobConf jobConf, + Boolean isPartitioned, + String[] partitionColNames, + List partitions, + RowTypeInfo rowTypeInfo) { + super(jobConf.getCredentials()); + this.rowTypeInfo = checkNotNull(rowTypeInfo, "rowTypeInfo can not be null."); + this
[jira] [Comment Edited] (FLINK-12070) Make blocking result partitions consumable multiple times
[ https://issues.apache.org/jira/browse/FLINK-12070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16855252#comment-16855252 ] zhijiang edited comment on FLINK-12070 at 6/4/19 2:59 AM: -- Agree with Stephan and Stefan's opinions. We could add the batch blocking case for micro benchmark, then it could verify the performance changes for every merged commit for flink-1.9. I also think it is no need to write to mapped region as now. We actually do not confirm the mmap regions would be consumed immediately by downstream side after producer finishes, because it is up to scheduler decision and whether it has enough resource to schedule consumers. It is necessary to maintain different ways for reading files. Based on my previous lucene index experience, it also provides three ways for reading index files. * Files.newByteChannel for simple way. * Java nio FileChannel way. * Mmap way for large files in 64bit system, and with more free physical memory for mmap as Stefan mentioned.
Then we could compare the behaviors for different ways and also provide more choices for users. > Make blocking result partitions consumable multiple times > - > > Key: FLINK-12070 > URL: https://issues.apache.org/jira/browse/FLINK-12070 > Project: Flink > Issue Type: Improvement > Components: Runtime / Network >Affects Versions: 1.9.0 >Reporter: Till Rohrmann >Assignee: Stephan Ewen >Priority: Blocker > Labels: pull-request-available > Fix For: 1.9.0 > > Attachments: image-2019-04-18-17-38-24-949.png > > Time Spent: 20m > Remaining Estimate: 0h > > In order to avoid writing produced results multiple times for multiple > consumers and in order to speed up batch recoveries, we should make the > blocking result partitions to be consumable multiple times. At the moment a > blocking result partition will be released once the consumers has processed > all data. Instead the result partition should be released once the next > blocking result has been produced and all consumers of a blocking result > partition have terminated. Moreover, blocking results should not hold on slot > resources like network buffers or memory as it is currently the case with > {{SpillableSubpartitions}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
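The three file-reading approaches zhijiang lists map directly onto standard Java NIO calls. A hedged sketch comparing them (plain JDK APIs, not Flink code; the class and method names are illustrative):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.SeekableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class FileReadStrategies {

    // 1. Simple way: Files.newByteChannel
    static byte[] viaByteChannel(Path p) throws IOException {
        try (SeekableByteChannel ch = Files.newByteChannel(p, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate((int) ch.size());
            while (ch.read(buf) > 0) { }   // read until the buffer is full or EOF
            return buf.array();
        }
    }

    // 2. Explicit java.nio FileChannel
    static byte[] viaFileChannel(Path p) throws IOException {
        try (FileChannel ch = FileChannel.open(p, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate((int) ch.size());
            while (ch.read(buf) > 0) { }
            return buf.array();
        }
    }

    // 3. Memory-mapped read, suited to large files on 64-bit JVMs with
    //    enough free physical memory for the mapping.
    static byte[] viaMmap(Path p) throws IOException {
        try (FileChannel ch = FileChannel.open(p, StandardOpenOption.READ)) {
            MappedByteBuffer mapped = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            byte[] out = new byte[mapped.remaining()];
            mapped.get(out);
            return out;
        }
    }
}
```

All three return the same bytes; they differ in syscall overhead, copy behavior, and page-cache interaction, which is exactly what a benchmark comparison would measure.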
[GitHub] [flink] zjuwangg commented on a change in pull request #8522: [FLINK-12572][hive]Implement HiveInputFormat to read Hive tables
zjuwangg commented on a change in pull request #8522: [FLINK-12572][hive]Implement HiveInputFormat to read Hive tables URL: https://github.com/apache/flink/pull/8522#discussion_r290109929 ## File path: flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/batch/connectors/hive/HiveTableInputFormat.java ## @@ -0,0 +1,275 @@
[jira] [Comment Edited] (FLINK-12070) Make blocking result partitions consumable multiple times
[ https://issues.apache.org/jira/browse/FLINK-12070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16855252#comment-16855252 ] zhijiang edited comment on FLINK-12070 at 6/4/19 2:58 AM: -- Agree with Stephan and Stefan's opinions. We could add the batch blocking case for micro benchmark, then it could verify the performance changes for every merged commit for flink-1.9. I also think it is no need to write to mapped region as now. We actually do not confirm the mmap regions would be consumed immediately by downstream side after producer finishes, because it is up to scheduler decision and whether it has enough resource to schedule consumers. It is necessary to maintain different ways for reading files. Based on my previous lucene index experience, it also provides three ways for reading index files. * Files.newByteChannel for simple way. * Java nio FileChannel way. * Mmap way for large files in 64bit system, and with more free physical memory for mmap as Stefan mentioned.
Then we could compare the behaviors for different ways and also provide more choices for users. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-12070) Make blocking result partitions consumable multiple times
[ https://issues.apache.org/jira/browse/FLINK-12070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16855252#comment-16855252 ] zhijiang commented on FLINK-12070: -- Agree with Stephan and Stefan's opinions. We could add the batch blocking case for micro benchmark, then it could verify the performance changes for every merged commit for flink-1.9. I also think it is no need to write to mapped region as now, because we actually do not confirm the mmap regions would be consumed immediately by downstream side after producer finishes, and it is up to scheduler decision and whether it has enough resource to schedule consumers. It is necessary to maintain different ways for reading files. Based on my previous lucene index experience, it also provides three ways for reading index files. * Files.newByteChannel for simple way. * Java nio FileChannel way. * Mmap way for large files in 64bit system, and with more free physical memory for mmap as Stefan mentioned. Then we could compare the behaviors for different ways and also provide more choices for users. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] [flink] lirui-apache commented on a change in pull request #8589: [FLINK-12677][hive][sql-client] Add descriptor, validator, and factory for HiveCatalog
lirui-apache commented on a change in pull request #8589: [FLINK-12677][hive][sql-client] Add descriptor, validator, and factory for HiveCatalog URL: https://github.com/apache/flink/pull/8589#discussion_r290109158 ## File path: flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/table/catalog/hive/descriptors/HiveCatalogDescriptor.java ## @@ -0,0 +1,55 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.table.catalog.hive.descriptors; + +import org.apache.flink.table.catalog.hive.HiveCatalog; +import org.apache.flink.table.descriptors.CatalogDescriptor; +import org.apache.flink.table.descriptors.DescriptorProperties; +import org.apache.flink.util.Preconditions; +import org.apache.flink.util.StringUtils; + +import java.util.Map; + +import static org.apache.flink.table.catalog.hive.descriptors.HiveCatalogValidator.CATALOG_PROPERTOES_HIVE_METASTORE_URIS; +import static org.apache.flink.table.catalog.hive.descriptors.HiveCatalogValidator.CATALOG_TYPE_VALUE_HIVE; + +/** + * Catalog descriptor for {@link HiveCatalog}. 
+ */ +public class HiveCatalogDescriptor extends CatalogDescriptor { + + private String hiveMetastoreUris; + + // TODO : set default database + public HiveCatalogDescriptor(String hiveMetastoreUris) { + super(CATALOG_TYPE_VALUE_HIVE, 1); + + Preconditions.checkArgument(!StringUtils.isNullOrWhitespaceOnly(hiveMetastoreUris)); + this.hiveMetastoreUris = hiveMetastoreUris; Review comment: I'd prefer to let user explicitly specify the URIs, especially if we allow multiple hive catalogs talking to different HMS instances in the yaml file. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
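To illustrate the reviewer's point that the metastore URIs should be specified explicitly, here is a minimal stand-alone sketch of a descriptor that rejects a blank URI and emits a property map. The class name and property keys are assumptions modeled on the quoted code, not Flink's actual API:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SimpleHiveCatalogDescriptor {
    // Keys are illustrative, loosely modeled on the quoted validator constants.
    static final String CATALOG_TYPE = "hive";
    static final String KEY_TYPE = "type";
    static final String KEY_METASTORE_URIS = "hive.metastore.uris";

    private final String metastoreUris;

    public SimpleHiveCatalogDescriptor(String metastoreUris) {
        if (metastoreUris == null || metastoreUris.trim().isEmpty()) {
            // Require the user to specify the URIs explicitly, as the reviewer suggests.
            throw new IllegalArgumentException("hive.metastore.uris must be set explicitly");
        }
        this.metastoreUris = metastoreUris;
    }

    // Flattens the descriptor into string properties, the way a yaml-defined
    // catalog would be handed to a factory.
    public Map<String, String> toProperties() {
        Map<String, String> props = new LinkedHashMap<>();
        props.put(KEY_TYPE, CATALOG_TYPE);
        props.put(KEY_METASTORE_URIS, metastoreUris);
        return props;
    }
}
```

Requiring the URIs up front keeps multiple Hive catalogs in one yaml file unambiguous, since each can point at a different HMS instance.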
[GitHub] [flink] sunjincheng121 commented on issue #8550: [FLINK-12401][table] Support incremental emit under AccRetract mode for non-window streaming FlatAggregate on Table API
sunjincheng121 commented on issue #8550: [FLINK-12401][table] Support incremental emit under AccRetract mode for non-window streaming FlatAggregate on Table API URL: https://github.com/apache/flink/pull/8550#issuecomment-498498965 Hi @hequn8128, thanks for the PR. Overall, I need to think about the interface methods again, because this PR makes some changes (which I think are good) relative to the FLIP-29 design doc; I will review the PR after that.
[jira] [Commented] (FLINK-12426) TM occasionally hang in deploying state
[ https://issues.apache.org/jira/browse/FLINK-12426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16855247#comment-16855247 ] Haibo Sun commented on FLINK-12426: --- [~QiLuo], the pull request of FLINK-12547 has been merged into the master branch. If this JIRA is the same problem as it, can you close this issue as a duplicate? > TM occasionally hang in deploying state > --- > > Key: FLINK-12426 > URL: https://issues.apache.org/jira/browse/FLINK-12426 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Reporter: Qi >Priority: Major > > Hi all, > > We use Flink batch and start thousands of jobs per day. Occasionally we > observed some stuck jobs, due to some TM hang in “DEPLOYING” state. > > It seems that the TM is calling BlobClient to download jars from > JM/BlobServer. Under hood it’s calling Socket.connect() and then > Socket.read() to retrieve results. > > These jobs usually have many TM slots (1~2k). We checked the TM log and > dumped the TM thread. It indeed hung on socket read to download jar from Blob > server. > > We're using Flink 1.5 but this may also affect later versions since related > code are not changed much. We've tried to add socket timeout in BlobClient, > but still no luck. > > > TM log > > ... > INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Received task > DataSource (at createInput(ExecutionEnvironment.java:548) (our.code)) > (184/2000). > INFO org.apache.flink.runtime.taskmanager.Task - DataSource (at > createInput(ExecutionEnvironment.java:548) (our.code)) (184/2000) switched > from CREATED to DEPLOYING. 
> INFO org.apache.flink.runtime.taskmanager.Task - Creating FileSystem stream > leak safety net for task DataSource (at > createInput(ExecutionEnvironment.java:548) (our.code)) (184/2000) [DEPLOYING] > INFO org.apache.flink.runtime.taskmanager.Task - Loading JAR files for task > DataSource (at createInput(ExecutionEnvironment.java:548) (our.code)) > (184/2000) [DEPLOYING]. > INFO org.apache.flink.runtime.blob.BlobClient - Downloading > 19e65c0caa41f264f9ffe4ca2a48a434/p-3ecd6341bf97d5512b14c93f6c9f51f682b6db26-37d5e69d156ee00a924c1ebff0c0d280 > from some-host-ip-port > no more logs... > > > TM thread dump: > > "DataSource (at createInput(ExecutionEnvironment.java:548) (our.code)) > (1999/2000)" #72 prio=5 os_prio=0 tid=0x7fb9a1521000 nid=0xa0994 runnable > [0x7fb97cfbf000] > java.lang.Thread.State: RUNNABLE > at java.net.SocketInputStream.socketRead0(Native Method) > at java.net.SocketInputStream.socketRead(SocketInputStream.java:116) > at java.net.SocketInputStream.read(SocketInputStream.java:171) > at java.net.SocketInputStream.read(SocketInputStream.java:141) > at org.apache.flink.runtime.blob.BlobInputStream.read(BlobInputStream.java:152) > at org.apache.flink.runtime.blob.BlobInputStream.read(BlobInputStream.java:140) > at org.apache.flink.runtime.blob.BlobClient.downloadFromBlobServer(BlobClient.java:170) > at org.apache.flink.runtime.blob.AbstractBlobCache.getFileInternal(AbstractBlobCache.java:181) > at org.apache.flink.runtime.blob.PermanentBlobCache.getFile(PermanentBlobCache.java:206) > at org.apache.flink.runtime.execution.librarycache.BlobLibraryCacheManager.registerTask(BlobLibraryCacheManager.java:120) > - locked <0x00078ab60ba8> (a java.lang.Object) > at org.apache.flink.runtime.taskmanager.Task.createUserCodeClassloader(Task.java:893) > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:585) > at java.lang.Thread.run(Thread.java:748) > -- This message was sent by
Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (FLINK-12547) Deadlock when the task thread downloads jars using BlobClient
[ https://issues.apache.org/jira/browse/FLINK-12547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Sun closed FLINK-12547. - > Deadlock when the task thread downloads jars using BlobClient > - > > Key: FLINK-12547 > URL: https://issues.apache.org/jira/browse/FLINK-12547 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.8.0 >Reporter: Haibo Sun >Assignee: Haibo Sun >Priority: Major > Labels: pull-request-available > Fix For: 1.9.0, 1.8.1 > > Time Spent: 20m > Remaining Estimate: 0h > > The jstack is as follows (this jstack is from an old Flink version, but the > master branch has the same problem). > {code:java} > "Source: Custom Source (76/400)" #68 prio=5 os_prio=0 tid=0x7f8139cd3000 > nid=0xe2 runnable [0x7f80da5fd000] > java.lang.Thread.State: RUNNABLE > at java.net.SocketInputStream.socketRead0(Native Method) > at java.net.SocketInputStream.socketRead(SocketInputStream.java:116) > at java.net.SocketInputStream.read(SocketInputStream.java:170) > at java.net.SocketInputStream.read(SocketInputStream.java:141) > at > org.apache.flink.runtime.blob.BlobInputStream.read(BlobInputStream.java:152) > at > org.apache.flink.runtime.blob.BlobInputStream.read(BlobInputStream.java:140) > at > org.apache.flink.runtime.blob.BlobClient.downloadFromBlobServer(BlobClient.java:164) > at > org.apache.flink.runtime.blob.AbstractBlobCache.getFileInternal(AbstractBlobCache.java:181) > at > org.apache.flink.runtime.blob.PermanentBlobCache.getFile(PermanentBlobCache.java:206) > at > org.apache.flink.runtime.execution.librarycache.BlobLibraryCacheManager.registerTask(BlobLibraryCacheManager.java:120) > - locked <0x00062cf2a188> (a java.lang.Object) > at > org.apache.flink.runtime.taskmanager.Task.createUserCodeClassloader(Task.java:968) > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:604) > at java.lang.Thread.run(Thread.java:834) > Locked ownable synchronizers: > - None > {code} > > The reason is that 
SO_TIMEOUT is not set in the socket connection of the blob > client. When the network packet loss seriously due to the high CPU load of > the machine, the blob client connection fails to perceive that the server has > been disconnected, which results in blocking in the native method. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
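The fix described above amounts to bounding blocking reads with SO_TIMEOUT so a dead connection surfaces as a `SocketTimeoutException` instead of hanging in the native read. A self-contained sketch using plain `java.net` APIs (the helper name is illustrative, not Flink's BlobClient code):

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class TimeoutDemo {
    // Returns true if the read gave up with SocketTimeoutException
    // instead of blocking forever on a server that never responds.
    static boolean readTimesOut(int port, int timeoutMillis) throws IOException {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress("localhost", port), timeoutMillis);
            socket.setSoTimeout(timeoutMillis);   // bound every subsequent blocking read
            InputStream in = socket.getInputStream();
            try {
                in.read();                        // the server never writes anything
                return false;
            } catch (SocketTimeoutException expected) {
                return true;
            }
        }
    }
}
```

With a timeout set, a download loop like BlobClient's can retry or fail fast rather than leave a task thread stuck in `socketRead0` as in the jstack above.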
[GitHub] [flink] lirui-apache commented on issue #8536: [FLINK-12568][hive] Implement OutputFormat to write Hive tables
lirui-apache commented on issue #8536: [FLINK-12568][hive] Implement OutputFormat to write Hive tables URL: https://github.com/apache/flink/pull/8536#issuecomment-498497073 > moreover, please have correct commit message in the future. E.g. the first commit's msg should be "[FLINK-12568][hive] Implement OutputFormat to write Hive tables" Got it. Thanks.
[GitHub] [flink] flinkbot commented on issue #8604: [DOC] add ')' for the doc checkpoints.md
flinkbot commented on issue #8604: [DOC] add ')' for the doc checkpoints.md URL: https://github.com/apache/flink/pull/8604#issuecomment-498496138 Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community to review your pull request. We will use this comment to track the progress of the review. ## Review Progress * ❓ 1. The [description] looks good. * ❓ 2. There is [consensus] that the contribution should go into Flink. * ❓ 3. Needs [attention] from. * ❓ 4. The change fits into the overall [architecture]. * ❓ 5. Overall code [quality] is good. Please see the [Pull Request Review Guide](https://flink.apache.org/reviewing-prs.html) for a full explanation of the review process. The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required. Bot commands The @flinkbot bot supports the following commands: - `@flinkbot approve description` to approve one or more aspects (aspects: `description`, `consensus`, `architecture` and `quality`) - `@flinkbot approve all` to approve all aspects - `@flinkbot approve-until architecture` to approve everything until `architecture` - `@flinkbot attention @username1 [@username2 ..]` to require somebody's attention - `@flinkbot disapprove architecture` to remove an approval you gave earlier
[GitHub] [flink] hehuiyuan opened a new pull request #8604: [DOC] add ')' for the doc checkpoints.md
hehuiyuan opened a new pull request #8604: [DOC] add ')' for the doc checkpoints.md URL: https://github.com/apache/flink/pull/8604 For `Configure for per job when constructing the state backend`, add ')'
[jira] [Commented] (FLINK-10455) Potential Kafka producer leak in case of failures
[ https://issues.apache.org/jira/browse/FLINK-10455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16855241#comment-16855241 ] sunjincheng commented on FLINK-10455: - [~becket_qin] is there any update from your side? > Potential Kafka producer leak in case of failures > - > > Key: FLINK-10455 > URL: https://issues.apache.org/jira/browse/FLINK-10455 > Project: Flink > Issue Type: Bug > Components: Connectors / Kafka >Affects Versions: 1.5.2 >Reporter: Nico Kruber >Assignee: Jiangjie Qin >Priority: Critical > Labels: pull-request-available > Fix For: 1.5.6, 1.7.0, 1.7.3, 1.9.0, 1.8.1 > > > If the Kafka brokers' timeout is too low for our checkpoint interval [1], we > may get an {{ProducerFencedException}}. Documentation around > {{ProducerFencedException}} explicitly states that we should close the > producer after encountering it. > By looking at the code, it doesn't seem like this is actually done in > {{FlinkKafkaProducer011}}. Also, in case one transaction's commit in > {{TwoPhaseCommitSinkFunction#notifyCheckpointComplete}} fails with an > exception, we don't clean up (nor try to commit) any other transaction. > -> from what I see, {{TwoPhaseCommitSinkFunction#notifyCheckpointComplete}} > simply iterates over the {{pendingCommitTransactions}} which is not touched > during {{close()}} > Now if we restart the failing job on the same Flink cluster, any resources > from the previous attempt will still linger around. > [1] > https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/connectors/kafka.html#kafka-011 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
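The cleanup the issue asks for — close a fenced producer immediately and still handle the remaining pending transactions — can be sketched with a stand-in producer interface (the `TxnProducer`/`FencedException` names are illustrative; they model Kafka's transactional producer and `ProducerFencedException`, since a runnable example against a real broker is out of scope here):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class TransactionCommitter {
    // Stand-in for a transactional producer handle.
    public interface TxnProducer {
        void commit() throws FencedException;
        void close();
    }

    // Models ProducerFencedException, after which the producer must be closed.
    public static class FencedException extends Exception {}

    /**
     * Commits all pending transactions. A fenced producer is closed immediately,
     * and the remaining transactions are still attempted instead of being
     * silently abandoned. Returns the producers that failed and were closed.
     */
    public static List<TxnProducer> commitPending(List<TxnProducer> pending) {
        List<TxnProducer> fenced = new ArrayList<>();
        for (Iterator<TxnProducer> it = pending.iterator(); it.hasNext(); ) {
            TxnProducer producer = it.next();
            try {
                producer.commit();
            } catch (FencedException e) {
                producer.close();     // release resources right away, don't leak the producer
                fenced.add(producer);
            } finally {
                it.remove();          // never leave a handled transaction in the pending set
            }
        }
        return fenced;
    }
}
```

The key design point matches the issue: iterating the whole pending set with per-element error handling means one failed commit can no longer leave the other transactions (and their producers) lingering after a restart.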
[GitHub] [flink] sunjincheng121 commented on issue #8550: [FLINK-12401][table] Support incremental emit under AccRetract mode for non-window streaming FlatAggregate on Table API
sunjincheng121 commented on issue #8550: [FLINK-12401][table] Support incremental emit under AccRetract mode for non-window streaming FlatAggregate on Table API URL: https://github.com/apache/flink/pull/8550#issuecomment-498492312 @flinkbot approve description This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] flinkbot commented on issue #8603: [FLINK-12608][runtime] Add getVertexOrThrow and
flinkbot commented on issue #8603: [FLINK-12608][runtime] Add getVertexOrThrow and URL: https://github.com/apache/flink/pull/8603#issuecomment-498492126 Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community to review your pull request. We will use this comment to track the progress of the review. ## Review Progress * ❓ 1. The [description] looks good. * ❓ 2. There is [consensus] that the contribution should go into Flink. * ❓ 3. Needs [attention] from. * ❓ 4. The change fits into the overall [architecture]. * ❓ 5. Overall code [quality] is good. Please see the [Pull Request Review Guide](https://flink.apache.org/reviewing-prs.html) for a full explanation of the review process. The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required Bot commands The @flinkbot bot supports the following commands: - `@flinkbot approve description` to approve one or more aspects (aspects: `description`, `consensus`, `architecture` and `quality`) - `@flinkbot approve all` to approve all aspects - `@flinkbot approve-until architecture` to approve everything until `architecture` - `@flinkbot attention @username1 [@username2 ..]` to require somebody's attention - `@flinkbot disapprove architecture` to remove an approval you gave earlier This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] eaglewatcherwb opened a new pull request #8603: [FLINK-12608][runtime] Add getVertexOrThrow and
eaglewatcherwb opened a new pull request #8603: [FLINK-12608][runtime] Add getVertexOrThrow and URL: https://github.com/apache/flink/pull/8603 getResultPartitionOrThrow in SchedulingTopology to simplify code structure Change-Id: I33be923c3279755d360e95336199fb1bc3257f4f ## What is the purpose of the change *(For example: This pull request makes task deployment go through the blob server, rather than through RPC. That way we avoid re-transferring them on each deployment (during recovery).)* ## Brief change log - *Add getVertexOrThrow and getResultPartitionOrThrow in SchedulingTopology.* ## Verifying this change This change added tests and can be verified as follows: - *Added unit tests in ExecutionGraphToSchedulingTopologyAdapterTest.* ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): (no) - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no) - The serializers: (no) - The runtime per-record code paths (performance sensitive): (no) - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no) - The S3 file system connector: (no) ## Documentation - Does this pull request introduce a new feature? (no) - If yes, how is the feature documented? (not documented) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Updated] (FLINK-12608) Add getVertex/ResultPartitionOrThrow(ExecutionVertexID/IntermediateResultPartitionID) to SchedulingTopology
[ https://issues.apache.org/jira/browse/FLINK-12608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-12608: --- Labels: pull-request-available (was: ) > Add > getVertex/ResultPartitionOrThrow(ExecutionVertexID/IntermediateResultPartitionID) > to SchedulingTopology > --- > > Key: FLINK-12608 > URL: https://issues.apache.org/jira/browse/FLINK-12608 > Project: Flink > Issue Type: Sub-task >Reporter: BoWang >Assignee: BoWang >Priority: Minor > Labels: pull-request-available > > As discussed in > [PR#8309|https://github.com/apache/flink/pull/8309#discussion_r287190944], > need to add getVertexOrThrow getResultPartitionOrThrow in SchedulingTopology. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
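[Editorial note] As a rough illustration of the requested helpers: the "OrThrow" variants wrap an Optional-returning lookup so call sites don't repeat the unwrapping boilerplate. The sketch below uses plain String stand-ins for ExecutionVertexID and the vertex type; it is not the actual SchedulingTopology interface:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class SchedulingTopologySketch {

    // Simplified topology: maps vertex IDs to vertices (both plain
    // Strings here; the real code uses ExecutionVertexID etc.).
    static class Topology {
        private final Map<String, String> vertices = new HashMap<>();

        void addVertex(String id, String vertex) {
            vertices.put(id, vertex);
        }

        Optional<String> getVertex(String id) {
            return Optional.ofNullable(vertices.get(id));
        }

        // The "OrThrow" variant saves callers the repeated
        // Optional-unwrapping and produces a uniform error message.
        String getVertexOrThrow(String id) {
            return getVertex(id).orElseThrow(
                () -> new IllegalArgumentException("unknown vertex: " + id));
        }
    }

    public static void main(String[] args) {
        Topology topology = new Topology();
        topology.addVertex("v1", "source");
        System.out.println(topology.getVertexOrThrow("v1")); // prints source
    }
}
```

A getResultPartitionOrThrow would follow the same pattern for partition IDs.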
[jira] [Commented] (FLINK-12619) Support TERMINATE/SUSPEND Job with Checkpoint
[ https://issues.apache.org/jira/browse/FLINK-12619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16855236#comment-16855236 ] vinoyang commented on FLINK-12619: -- Hi [~klion26] and [~aljoscha] I think this issue needs to be considered together with FLINK-6755, which tried to introduce triggering checkpoints via the CLI a long time ago. I think we can move further discussion to that issue. > Support TERMINATE/SUSPEND Job with Checkpoint > - > > Key: FLINK-12619 > URL: https://issues.apache.org/jira/browse/FLINK-12619 > Project: Flink > Issue Type: New Feature > Components: Runtime / State Backends >Reporter: Congxian Qiu(klion26) >Assignee: Congxian Qiu(klion26) >Priority: Major > > Inspired by the idea of FLINK-11458, we propose to support terminating/suspending > a job with a checkpoint. This improvement cooperates with the incremental and > external checkpoint features: if checkpoints are retained and this feature > is configured, we will trigger a checkpoint before the job stops. It could > accelerate job recovery a lot since: > 1. No source rewinding is required any more. > 2. It's much faster than taking a savepoint when incremental checkpointing is > enabled. > Please note that conceptually a savepoint is different from a checkpoint in a > similar way that backups are different from recovery logs in traditional > database systems. So we suggest using this feature only for job recovery, > while sticking with FLINK-11458 for the > upgrading/cross-cluster-job-migration/state-backend-switch cases. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-1722) Streaming not respecting FinalizeOnMaster for output formats
[ https://issues.apache.org/jira/browse/FLINK-1722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16855222#comment-16855222 ] Haibo Sun commented on FLINK-1722: -- OK, I will. > Streaming not respecting FinalizeOnMaster for output formats > > > Key: FLINK-1722 > URL: https://issues.apache.org/jira/browse/FLINK-1722 > Project: Flink > Issue Type: Bug > Components: API / DataStream >Reporter: Robert Metzger >Priority: Major > > The Hadoop output formats execute a process in the end to move the produced > files from a temp directory to the final location. > The batch API is annotating output formats that execute something in the end > with the {{FinalizeOnMaster}} interface. > The streaming component is not respecting this interface. Hence, > HadoopOutputFormats aren't writing their final data into the desired > destination. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (FLINK-1722) Streaming not respecting FinalizeOnMaster for output formats
[ https://issues.apache.org/jira/browse/FLINK-1722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Sun reassigned FLINK-1722: Assignee: Haibo Sun > Streaming not respecting FinalizeOnMaster for output formats > > > Key: FLINK-1722 > URL: https://issues.apache.org/jira/browse/FLINK-1722 > Project: Flink > Issue Type: Bug > Components: API / DataStream >Reporter: Robert Metzger >Assignee: Haibo Sun >Priority: Major > > The Hadoop output formats execute a process in the end to move the produced > files from a temp directory to the final location. > The batch API is annotating output formats that execute something in the end > with the {{FinalizeOnMaster}} interface. > The streaming component is not respecting this interface. Hence, > HadoopOutputFormats aren't writing their final data into the desired > destination. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
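[Editorial note] For context, the batch-side contract works roughly as sketched below: a format that needs a final cleanup step (moving files from a temp directory to the final location) implements a marker interface, and the master invokes its hook once after all subtasks finish. The types here are simplified stand-ins, not Flink's actual org.apache.flink.api.common.io classes:

```java
public class FinalizeOnMasterSketch {

    // Stand-in for the batch API's marker interface.
    interface FinalizeOnMaster {
        void finalizeGlobal(int parallelism);
    }

    static class TempDirOutputFormat implements FinalizeOnMaster {
        boolean finalized = false;

        @Override
        public void finalizeGlobal(int parallelism) {
            // In a real format: move the files produced by all
            // 'parallelism' subtasks into the destination directory.
            finalized = true;
        }
    }

    // What the issue asks of the streaming runtime: after all subtasks
    // finish, check for the marker interface and run the hook once.
    static void finishJob(Object format, int parallelism) {
        if (format instanceof FinalizeOnMaster) {
            ((FinalizeOnMaster) format).finalizeGlobal(parallelism);
        }
    }

    public static void main(String[] args) {
        TempDirOutputFormat format = new TempDirOutputFormat();
        finishJob(format, 4);
        System.out.println(format.finalized); // prints true
    }
}
```

The bug is that the streaming runtime never performs the `instanceof` check and call, so the temp files are never moved.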
[GitHub] [flink] Armstrongya commented on issue #8597: [FLINK-12438][doc-zh]Translate Task Lifecycle into Chinese
Armstrongya commented on issue #8597: [FLINK-12438][doc-zh]Translate Task Lifecycle into Chinese URL: https://github.com/apache/flink/pull/8597#issuecomment-498487430 Hi @klion26, I'm confused by this Travis CI build check. I have already rebased and squashed my commit history; it now looks like a single commit, as shown in the picture below. But why does the check still fail? ![image](https://user-images.githubusercontent.com/5070956/58845324-49f88a00-86ad-11e9-8ed0-4a65cdd67a6c.png) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Updated] (FLINK-12714) confusion about flink time window TimeWindow#getWindowStartWithOffset
[ https://issues.apache.org/jira/browse/FLINK-12714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenshuai Hou updated FLINK-12714: - Description: hi flink team, i think the flink doc on how time windows are created and the javadoc of related method is a little confusing. They give me the impression that the window-start equals to the timestamp of the first event assigned to that window, however the window-start is actually quantized by the windowSize, see links below. Is this the intention or is this a mistake? If it's intended, can we please leave a comment in flink doc somewhere to make this more clear? I spent 5 hours wondering why my flink tests fail until i find that method. Thanks Wen links: The flink doc from here : [https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/windows.html#window-lifecycle] > In a nutshell, a window is *created* as soon as the first element that should > belong to this window arrives the java doc of this method: org.apache.flink.streaming.api.windowing.windows.TimeWindow#getWindowStartWithOffset {code:java} /** * Method to get the window start for a timestamp. * * @param timestamp epoch millisecond to get the window start. * @param offset The offset which window start would be shifted by. * @param windowSize The size of the generated windows. * @return window start */ public static long getWindowStartWithOffset(long timestamp, long offset, long windowSize) { {code} was: hi flink team, i think the flink doc on how time windows are created an the javadoc of related method is a little confusing. They give me the impression that the window-start equals to the timestamp of the first event assigned to that window, however the window-start is actually quantized by the windowSize, see links below. Is this the intention or is this a mistake? can we please leave a comment in flink doc somewhere to make this more clear? I spent 5 hours wondering why my flink tests fail until i find that method. 
Thanks Wen links: The flink doc from here : [https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/windows.html#window-lifecycle] > In a nutshell, a window is *created* as soon as the first element that should > belong to this window arrives the java doc of this method: org.apache.flink.streaming.api.windowing.windows.TimeWindow#getWindowStartWithOffset {code:java} /** * Method to get the window start for a timestamp. * * @param timestamp epoch millisecond to get the window start. * @param offset The offset which window start would be shifted by. * @param windowSize The size of the generated windows. * @return window start */ public static long getWindowStartWithOffset(long timestamp, long offset, long windowSize) { {code} > confusion about flink time window TimeWindow#getWindowStartWithOffset > - > > Key: FLINK-12714 > URL: https://issues.apache.org/jira/browse/FLINK-12714 > Project: Flink > Issue Type: Improvement > Components: flink-contrib >Affects Versions: 1.8.0 >Reporter: Wenshuai Hou >Priority: Minor > > > hi flink team, i think the flink doc on how time windows are created and the > javadoc of related method is a little confusing. They give me the impression > that the window-start equals to the timestamp of the first event assigned to > that window, however the window-start is actually quantized by the > windowSize, see links below. Is this the intention or is this a mistake? If > it's intended, can we please leave a comment in flink doc somewhere to make > this more clear? I spent 5 hours wondering why my flink tests fail until i > find that method. 
> > Thanks > Wen > > links: > > > The flink doc from here : > [https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/windows.html#window-lifecycle] > > > In a nutshell, a window is *created* as soon as the first element that > > should belong to this window arrives > > the java doc of this method: > org.apache.flink.streaming.api.windowing.windows.TimeWindow#getWindowStartWithOffset > > > {code:java} > /** > * Method to get the window start for a timestamp. > * > * @param timestamp epoch millisecond to get the window start. > * @param offset The offset which window start would be shifted by. > * @param windowSize The size of the generated windows. > * @return window start > */ > public static long getWindowStartWithOffset(long timestamp, long offset, long > windowSize) { > {code} > > > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
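[Editorial note] The quantization the reporter describes can be demonstrated with the alignment arithmetic behind the quoted method (this matches my reading of TimeWindow#getWindowStartWithOffset; treat it as an illustrative sketch rather than the authoritative implementation):

```java
public class WindowStartDemo {

    // The returned start is the largest offset-shifted multiple of
    // windowSize that is <= timestamp -- i.e. the start is aligned,
    // not set to the first element's timestamp.
    static long getWindowStartWithOffset(long timestamp, long offset, long windowSize) {
        return timestamp - (timestamp - offset + windowSize) % windowSize;
    }

    public static void main(String[] args) {
        // An element at t=7 with a size-5 tumbling window lands in
        // window [5, 10), not [7, 12).
        System.out.println(getWindowStartWithOffset(7, 0, 5));  // 5
        System.out.println(getWindowStartWithOffset(12, 0, 5)); // 10
        // With offset=1 the windows are [1, 6), [6, 11), ...
        System.out.println(getWindowStartWithOffset(7, 1, 5));  // 6
    }
}
```

So "a window is created as soon as the first element arrives" refers to when the window object materializes, while its boundaries are determined purely by the timestamp, offset, and window size.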
[jira] [Closed] (FLINK-12712) deprecate ExternalCatalog and its subclasses and impls
[ https://issues.apache.org/jira/browse/FLINK-12712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bowen Li closed FLINK-12712. Resolution: Fixed merged in 1.9.0: 7682248f6dca971eae1d076dbf0282358331a0e7 > deprecate ExternalCatalog and its subclasses and impls > -- > > Key: FLINK-12712 > URL: https://issues.apache.org/jira/browse/FLINK-12712 > Project: Flink > Issue Type: Sub-task > Components: Table SQL / API >Reporter: Bowen Li >Assignee: Bowen Li >Priority: Major > Labels: pull-request-available > Fix For: 1.9.0 > > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] [flink] asfgit closed pull request #8601: [FLINK-12713][table] deprecate descriptor, validator, and factory of ExternalCatalog
asfgit closed pull request #8601: [FLINK-12713][table] deprecate descriptor, validator, and factory of ExternalCatalog URL: https://github.com/apache/flink/pull/8601 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Closed] (FLINK-12713) deprecate descriptor, validator, and factory of ExternalCatalog
[ https://issues.apache.org/jira/browse/FLINK-12713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bowen Li closed FLINK-12713. Resolution: Fixed merged in 1.9.0: 61d8916a35470d6c122d9b78c1f3ae7aa9996949 > deprecate descriptor, validator, and factory of ExternalCatalog > --- > > Key: FLINK-12713 > URL: https://issues.apache.org/jira/browse/FLINK-12713 > Project: Flink > Issue Type: Sub-task > Components: Table SQL / Client >Reporter: Bowen Li >Assignee: Bowen Li >Priority: Major > Labels: pull-request-available > Fix For: 1.9.0 > > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-12714) confusion about flink time window TimeWindow#getWindowStartWithOffset
Wenshuai Hou created FLINK-12714: Summary: confusion about flink time window TimeWindow#getWindowStartWithOffset Key: FLINK-12714 URL: https://issues.apache.org/jira/browse/FLINK-12714 Project: Flink Issue Type: Improvement Components: flink-contrib Affects Versions: 1.8.0 Reporter: Wenshuai Hou hi flink team, i think the flink doc on how time windows are created and the javadoc of the related method is a little confusing. They give me the impression that the window-start equals the timestamp of the first event assigned to that window; however, the window-start is actually quantized by the windowSize, see links below. Is this the intention or is this a mistake? Can we please leave a comment in the flink doc somewhere to make this more clear? I spent 5 hours wondering why my flink tests fail until I found that method. Thanks Wen links: The flink doc from here : [https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/windows.html#window-lifecycle] > In a nutshell, a window is *created* as soon as the first element that should > belong to this window arrives the java doc of this method: org.apache.flink.streaming.api.windowing.windows.TimeWindow#getWindowStartWithOffset {code:java} /** * Method to get the window start for a timestamp. * * @param timestamp epoch millisecond to get the window start. * @param offset The offset which window start would be shifted by. * @param windowSize The size of the generated windows. * @return window start */ public static long getWindowStartWithOffset(long timestamp, long offset, long windowSize) { {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] [flink] asfgit closed pull request #8600: [FLINK-12712][table] deprecate ExternalCatalog and its subclasses, implementations, and related util classes
asfgit closed pull request #8600: [FLINK-12712][table] deprecate ExternalCatalog and its subclasses, implementations, and related util classes URL: https://github.com/apache/flink/pull/8600 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] bowenli86 commented on issue #8600: [FLINK-12712][table] deprecate ExternalCatalog and its subclasses, implementations, related util classes and tests
bowenli86 commented on issue #8600: [FLINK-12712][table] deprecate ExternalCatalog and its subclasses, implementations, related util classes and tests URL: https://github.com/apache/flink/pull/8600#issuecomment-498459861 @xuefuz thanks for your review. Merging This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] bowenli86 commented on issue #8601: [FLINK-12713][table] deprecate descriptor, validator, and factory of ExternalCatalog
bowenli86 commented on issue #8601: [FLINK-12713][table] deprecate descriptor, validator, and factory of ExternalCatalog URL: https://github.com/apache/flink/pull/8601#issuecomment-498459824 @xuefuz thanks for your review. Addressed all comments. Merging This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] xuefuz commented on a change in pull request #8549: [FLINK-12604][table-api][table-planner] Register TableSource/Sink as CatalogTables
xuefuz commented on a change in pull request #8549: [FLINK-12604][table-api][table-planner] Register TableSource/Sink as CatalogTables URL: https://github.com/apache/flink/pull/8549#discussion_r290072835 ## File path: flink-table/flink-table-api-java-bridge/src/main/java/org/apache/flink/table/operations/DataStreamTableOperation.java ## @@ -0,0 +1,85 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.table.operations; + +import org.apache.flink.annotation.Internal; +import org.apache.flink.streaming.api.datastream.DataStream; +import org.apache.flink.table.api.TableSchema; + +import java.util.Collections; +import java.util.LinkedHashMap; +import java.util.List; +import java.util.Map; + +/** + * Describes a relational operation that reads from a {@link DataStream}. + * + * This operation may expose only part, or change the order of the fields available in a + * {@link org.apache.flink.api.common.typeutils.CompositeType} of the underlying {@link DataStream}. + * The {@link DataStreamTableOperation#getFieldIndices()} describes the mapping between fields of the + * {@link TableSchema} to the {@link org.apache.flink.api.common.typeutils.CompositeType}. 
+ */ +@Internal +public class DataStreamTableOperation extends TableOperation { Review comment: javadoc for ? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] xuefuz commented on a change in pull request #8549: [FLINK-12604][table-api][table-planner] Register TableSource/Sink as CatalogTables
xuefuz commented on a change in pull request #8549: [FLINK-12604][table-api][table-planner] Register TableSource/Sink as CatalogTables URL: https://github.com/apache/flink/pull/8549#discussion_r290072478 ## File path: flink-table/flink-table-api-java-bridge/src/main/java/org/apache/flink/table/operations/DataSetTableOperation.java ## @@ -0,0 +1,84 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.table.operations; + +import org.apache.flink.annotation.Internal; +import org.apache.flink.api.java.DataSet; +import org.apache.flink.table.api.TableSchema; + +import java.util.Collections; +import java.util.LinkedHashMap; +import java.util.List; +import java.util.Map; + +/** + * Describes a relational operation that reads from a {@link DataSet}. + * + * This operation may expose only part, or change the order of the fields available in a + * {@link org.apache.flink.api.common.typeutils.CompositeType} of the underlying {@link DataSet}. + * The {@link DataSetTableOperation#getFieldIndices()} describes the mapping between fields of the + * {@link TableSchema} to the {@link org.apache.flink.api.common.typeutils.CompositeType}. 
+ */ +@Internal +public class DataSetTableOperation extends TableOperation { + + private final DataSet dataStream; Review comment: Is there a special reason that we name the variable as dataStream instead of dataSet? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] xuefuz commented on a change in pull request #8549: [FLINK-12604][table-api][table-planner] Register TableSource/Sink as CatalogTables
xuefuz commented on a change in pull request #8549: [FLINK-12604][table-api][table-planner] Register TableSource/Sink as CatalogTables URL: https://github.com/apache/flink/pull/8549#discussion_r290072031 ## File path: flink-table/flink-table-api-java-bridge/src/main/java/org/apache/flink/table/operations/DataSetTableOperation.java ## @@ -0,0 +1,84 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.table.operations; + +import org.apache.flink.annotation.Internal; +import org.apache.flink.api.java.DataSet; +import org.apache.flink.table.api.TableSchema; + +import java.util.Collections; +import java.util.LinkedHashMap; +import java.util.List; +import java.util.Map; + +/** + * Describes a relational operation that reads from a {@link DataSet}. + * + * This operation may expose only part, or change the order of the fields available in a + * {@link org.apache.flink.api.common.typeutils.CompositeType} of the underlying {@link DataSet}. + * The {@link DataSetTableOperation#getFieldIndices()} describes the mapping between fields of the + * {@link TableSchema} to the {@link org.apache.flink.api.common.typeutils.CompositeType}. 
+ */ +@Internal +public class DataSetTableOperation extends TableOperation { Review comment: javadoc for ? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] xuefuz commented on a change in pull request #8601: [FLINK-12713][table] deprecate descriptor, validator, and factory of ExternalCatalog
xuefuz commented on a change in pull request #8601: [FLINK-12713][table] deprecate descriptor, validator, and factory of ExternalCatalog URL: https://github.com/apache/flink/pull/8601#discussion_r290068591 ## File path: flink-table/flink-table-common/src/main/java/org/apache/flink/table/descriptors/ExternalCatalogDescriptorValidator.java ## @@ -18,12 +18,12 @@ package org.apache.flink.table.descriptors; -import org.apache.flink.annotation.Internal; - /** * Validator for {@link ExternalCatalogDescriptor}. + * + * @deprecated use {@link CatalogDescriptorValidator} instead. */ -@Internal +@Deprecated Review comment: Could we keep @Internal? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Commented] (FLINK-12626) Client should not register table-source-sink twice in TableEnvironment
[ https://issues.apache.org/jira/browse/FLINK-12626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16855095#comment-16855095 ] Bowen Li commented on FLINK-12626: -- [~twalthr] by taking a look at the {{TableEnvImpl.registerTableSourceInternal()}}, I think it should solve this JIRA. I will do a followup testing and add unit tests after FLINK-12604 is merged. > Client should not register table-source-sink twice in TableEnvironment > -- > > Key: FLINK-12626 > URL: https://issues.apache.org/jira/browse/FLINK-12626 > Project: Flink > Issue Type: Sub-task > Components: Table SQL / Client >Reporter: Bowen Li >Priority: Major > Fix For: 1.9.0 > > > Currently for a table specified in SQL CLI yaml file, if it's with type: > source-sink-table, it will be registered twice (as source and as sink) to > TableEnv. > As we've moved table management in TableEnv to Catalogs which doesn't allow > registering dup named table, we need to come up with a solution to fix this > problem. > cc [~xuefuz] [~tiwalter] [~dawidwys] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] [flink] bowenli86 commented on issue #8601: [FLINK-12713][table] deprecate descriptor, validator, and factory of ExternalCatalog
bowenli86 commented on issue #8601: [FLINK-12713][table] deprecate descriptor, validator, and factory of ExternalCatalog URL: https://github.com/apache/flink/pull/8601#issuecomment-498431690 @xuefuz as we discussed offline, I removed annotations from tests and leave them for production code. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] bowenli86 commented on issue #8600: [FLINK-12712][table] deprecate ExternalCatalog and its subclasses, implementations, related util classes and tests
bowenli86 commented on issue #8600: [FLINK-12712][table] deprecate ExternalCatalog and its subclasses, implementations, related util classes and tests URL: https://github.com/apache/flink/pull/8600#issuecomment-498431177 @xuefuz as we discussed offline, I removed annotations from tests and leave them for production code. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] flinkbot commented on issue #8602: [FLINK-12313] Add workaround to avoid race condition in SynchronousCheckpointITCase test
flinkbot commented on issue #8602: [FLINK-12313] Add workaround to avoid race condition in SynchronousCheckpointITCase test URL: https://github.com/apache/flink/pull/8602#issuecomment-498423773 Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community to review your pull request. We will use this comment to track the progress of the review. ## Review Progress * ❓ 1. The [description] looks good. * ❓ 2. There is [consensus] that the contribution should go into Flink. * ❓ 3. Needs [attention] from. * ❓ 4. The change fits into the overall [architecture]. * ❓ 5. Overall code [quality] is good. Please see the [Pull Request Review Guide](https://flink.apache.org/reviewing-prs.html) for a full explanation of the review process. The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required Bot commands The @flinkbot bot supports the following commands: - `@flinkbot approve description` to approve one or more aspects (aspects: `description`, `consensus`, `architecture` and `quality`) - `@flinkbot approve all` to approve all aspects - `@flinkbot approve-until architecture` to approve everything until `architecture` - `@flinkbot attention @username1 [@username2 ..]` to require somebody's attention - `@flinkbot disapprove architecture` to remove an approval you gave earlier This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Updated] (FLINK-12313) SynchronousCheckpointITCase.taskCachedThreadPoolAllowsForSynchronousCheckpoints is unstable
[ https://issues.apache.org/jira/browse/FLINK-12313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-12313: --- Labels: pull-request-available test-stability (was: test-stability) > SynchronousCheckpointITCase.taskCachedThreadPoolAllowsForSynchronousCheckpoints > is unstable > --- > > Key: FLINK-12313 > URL: https://issues.apache.org/jira/browse/FLINK-12313 > Project: Flink > Issue Type: Bug > Components: Runtime / Checkpointing >Reporter: Ufuk Celebi >Assignee: Alex >Priority: Critical > Labels: pull-request-available, test-stability > > {{SynchronousCheckpointITCase.taskCachedThreadPoolAllowsForSynchronousCheckpoints}} > fails and prints the Thread stack traces due to no output on Travis > occasionally. > {code} > == > Printing stack trace of Java process 10071 > == > 2019-04-24 07:55:29 > Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.151-b12 mixed mode): > "Attach Listener" #17 daemon prio=9 os_prio=0 tid=0x7f294892 > nid=0x2cf5 waiting on condition [0x] >java.lang.Thread.State: RUNNABLE > "Async calls on Test Task (1/1)" #15 daemon prio=5 os_prio=0 > tid=0x7f2948dd1800 nid=0x27a9 waiting on condition [0x7f292cea9000] >java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x8bb5e558> (a > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039) > at > java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) > at > java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > "Async calls on Test Task 
(1/1)" #14 daemon prio=5 os_prio=0 > tid=0x7f2948dce800 nid=0x27a8 in Object.wait() [0x7f292cfaa000] >java.lang.Thread.State: WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > - waiting on <0x8bac58f8> (a java.lang.Object) > at java.lang.Object.wait(Object.java:502) > at > org.apache.flink.streaming.runtime.tasks.SynchronousSavepointLatch.blockUntilCheckpointIsAcknowledged(SynchronousSavepointLatch.java:66) > - locked <0x8bac58f8> (a java.lang.Object) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(StreamTask.java:726) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpoint(StreamTask.java:604) > at > org.apache.flink.streaming.runtime.tasks.SynchronousCheckpointITCase$SynchronousCheckpointTestingTask.triggerCheckpoint(SynchronousCheckpointITCase.java:174) > at org.apache.flink.runtime.taskmanager.Task$1.run(Task.java:1182) > at > java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1626) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > "CloseableReaperThread" #13 daemon prio=5 os_prio=0 tid=0x7f2948d9b800 > nid=0x27a7 in Object.wait() [0x7f292d0ab000] >java.lang.Thread.State: WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > - waiting on <0x8bbe3990> (a java.lang.ref.ReferenceQueue$Lock) > at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:143) > - locked <0x8bbe3990> (a java.lang.ref.ReferenceQueue$Lock) > at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:164) > at > org.apache.flink.core.fs.SafetyNetCloseableRegistry$CloseableReaperThread.run(SafetyNetCloseableRegistry.java:193) > "Test Task (1/1)" #12 prio=5 os_prio=0 tid=0x7f2948d97000 nid=0x27a6 in > Object.wait() [0x7f292d1ac000] >java.lang.Thread.State: WAITING (on object monitor) > at 
java.lang.Object.wait(Native Method) > - waiting on <0x8e63f7d8> (a java.lang.Object) > at java.lang.Object.wait(Object.java:502) > at > org.apache.flink.core.testutils.OneShotLatch.await(OneShotLatch.java:63) > - locked <0x8e63f7d8> (a java.lang.Object) > at > org.
[GitHub] [flink] 1u0 opened a new pull request #8602: [FLINK-12313] Add workaround to avoid race condition in SynchronousCheckpointITCase test
1u0 opened a new pull request #8602: [FLINK-12313] Add workaround to avoid race condition in SynchronousCheckpointITCase test URL: https://github.com/apache/flink/pull/8602 ## What is the purpose of the change The `SynchronousCheckpointITCase` has a race condition and may occasionally fail on CI. This PR adds an additional synchronization point as a workaround to avoid the issue. Additionally, the PR simplifies (refactors) the test, although the race condition is present in both the refactored and non-refactored versions. ## Brief change log - The test is simplified to not use dedicated locks for general flow checks. The test run is limited by a timeout. - Added a hacky synchronization point in the `advanceToEndOfEventTime` method, to make sure that the `triggerCheckpoint` thread has progressed far enough before `notifyCheckpointComplete` is triggered. ## Verifying this change This change is a trivial rework / code cleanup without any test coverage. ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): (yes / **no**) - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (yes / **no**) - The serializers: (yes / **no** / don't know) - The runtime per-record code paths (performance sensitive): (yes / **no** / don't know) - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / **no** / don't know) - The S3 file system connector: (yes / **no** / don't know) ## Documentation - Does this pull request introduce a new feature? (yes / **no**) - If yes, how is the feature documented? (**not applicable** / docs / JavaDocs / not documented)
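The synchronization point the PR describes — blocking the completion notification until the trigger thread has progressed past a known point — can be sketched with a plain `java.util.concurrent.CountDownLatch`. The class and method names below are illustrative only; the actual test uses Flink's internal latches, not this code.

```java
import java.util.concurrent.CountDownLatch;

public class SyncPointSketch {
    // Counted down once triggerCheckpoint() has progressed far enough.
    static final CountDownLatch checkpointTriggered = new CountDownLatch(1);
    static volatile boolean notified = false;

    static void triggerCheckpoint() {
        // ... checkpoint work would happen here ...
        checkpointTriggered.countDown(); // sync point: trigger thread got this far
    }

    static void notifyCheckpointComplete() throws InterruptedException {
        checkpointTriggered.await(); // block until the trigger thread reached the sync point
        notified = true;
    }

    public static void main(String[] args) throws Exception {
        Thread notifier = new Thread(() -> {
            try {
                notifyCheckpointComplete();
            } catch (InterruptedException ignored) {
            }
        });
        Thread trigger = new Thread(SyncPointSketch::triggerCheckpoint);
        notifier.start(); // blocks on the latch until the trigger thread counts it down
        trigger.start();
        trigger.join();
        notifier.join();
        System.out.println(notified); // prints "true"
    }
}
```

Without the latch, the notifier thread could run before the trigger thread is ready, which is the shape of the race the test exhibited.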
[jira] [Assigned] (FLINK-12313) SynchronousCheckpointITCase.taskCachedThreadPoolAllowsForSynchronousCheckpoints is unstable
[ https://issues.apache.org/jira/browse/FLINK-12313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex reassigned FLINK-12313: Assignee: Alex > SynchronousCheckpointITCase.taskCachedThreadPoolAllowsForSynchronousCheckpoints > is unstable > --- > > Key: FLINK-12313 > URL: https://issues.apache.org/jira/browse/FLINK-12313 > Project: Flink > Issue Type: Bug > Components: Runtime / Checkpointing >Reporter: Ufuk Celebi >Assignee: Alex >Priority: Critical > Labels: test-stability > > {{SynchronousCheckpointITCase.taskCachedThreadPoolAllowsForSynchronousCheckpoints}} > fails and prints the Thread stack traces due to no output on Travis > occasionally. > {code} > == > Printing stack trace of Java process 10071 > == > 2019-04-24 07:55:29 > Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.151-b12 mixed mode): > "Attach Listener" #17 daemon prio=9 os_prio=0 tid=0x7f294892 > nid=0x2cf5 waiting on condition [0x] >java.lang.Thread.State: RUNNABLE > "Async calls on Test Task (1/1)" #15 daemon prio=5 os_prio=0 > tid=0x7f2948dd1800 nid=0x27a9 waiting on condition [0x7f292cea9000] >java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x8bb5e558> (a > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039) > at > java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) > at > java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > "Async calls on Test Task (1/1)" #14 daemon prio=5 os_prio=0 > tid=0x7f2948dce800 nid=0x27a8 in Object.wait() 
[0x7f292cfaa000] >java.lang.Thread.State: WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > - waiting on <0x8bac58f8> (a java.lang.Object) > at java.lang.Object.wait(Object.java:502) > at > org.apache.flink.streaming.runtime.tasks.SynchronousSavepointLatch.blockUntilCheckpointIsAcknowledged(SynchronousSavepointLatch.java:66) > - locked <0x8bac58f8> (a java.lang.Object) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(StreamTask.java:726) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpoint(StreamTask.java:604) > at > org.apache.flink.streaming.runtime.tasks.SynchronousCheckpointITCase$SynchronousCheckpointTestingTask.triggerCheckpoint(SynchronousCheckpointITCase.java:174) > at org.apache.flink.runtime.taskmanager.Task$1.run(Task.java:1182) > at > java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1626) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > "CloseableReaperThread" #13 daemon prio=5 os_prio=0 tid=0x7f2948d9b800 > nid=0x27a7 in Object.wait() [0x7f292d0ab000] >java.lang.Thread.State: WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > - waiting on <0x8bbe3990> (a java.lang.ref.ReferenceQueue$Lock) > at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:143) > - locked <0x8bbe3990> (a java.lang.ref.ReferenceQueue$Lock) > at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:164) > at > org.apache.flink.core.fs.SafetyNetCloseableRegistry$CloseableReaperThread.run(SafetyNetCloseableRegistry.java:193) > "Test Task (1/1)" #12 prio=5 os_prio=0 tid=0x7f2948d97000 nid=0x27a6 in > Object.wait() [0x7f292d1ac000] >java.lang.Thread.State: WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > - waiting on <0x8e63f7d8> (a java.lang.Object) > at 
java.lang.Object.wait(Object.java:502) > at > org.apache.flink.core.testutils.OneShotLatch.await(OneShotLatch.java:63) > - locked <0x8e63f7d8> (a java.lang.Object) > at > org.apache.flink.streaming.runtime.tasks.SynchronousCheckpointITCase$SynchronousCheckpointTesti
[GitHub] [flink] asfgit closed pull request #8536: [FLINK-12568][hive] Implement OutputFormat to write Hive tables
asfgit closed pull request #8536: [FLINK-12568][hive] Implement OutputFormat to write Hive tables URL: https://github.com/apache/flink/pull/8536
[GitHub] [flink] bowenli86 commented on issue #8536: [FLINK-12568][hive] Implement OutputFormat to write Hive tables
bowenli86 commented on issue #8536: [FLINK-12568][hive] Implement OutputFormat to write Hive tables URL: https://github.com/apache/flink/pull/8536#issuecomment-498416544 Moreover, please use a correct commit message in the future. E.g., the first commit's message should be "[FLINK-12568][hive] Implement OutputFormat to write Hive tables"
[jira] [Commented] (FLINK-12070) Make blocking result partitions consumable multiple times
[ https://issues.apache.org/jira/browse/FLINK-12070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16855013#comment-16855013 ] Stefan Richter commented on FLINK-12070: I have one additional question: can you share the heap size of your JVM and also the ratio of JVM heap to total physical memory? It's possible that the new implementation could perform better with different memory settings, i.e. only the required size for JVM heap and keeping more physical memory available for mmap - please note that mmap does not account for heap size, but for process memory. > Make blocking result partitions consumable multiple times > - > > Key: FLINK-12070 > URL: https://issues.apache.org/jira/browse/FLINK-12070 > Project: Flink > Issue Type: Improvement > Components: Runtime / Network >Affects Versions: 1.9.0 >Reporter: Till Rohrmann >Assignee: Stephan Ewen >Priority: Blocker > Labels: pull-request-available > Fix For: 1.9.0 > > Attachments: image-2019-04-18-17-38-24-949.png > > Time Spent: 20m > Remaining Estimate: 0h > > In order to avoid writing produced results multiple times for multiple > consumers and in order to speed up batch recoveries, we should make the > blocking result partitions to be consumable multiple times. At the moment a > blocking result partition will be released once the consumers has processed > all data. Instead the result partition should be released once the next > blocking result has been produced and all consumers of a blocking result > partition have terminated. Moreover, blocking results should not hold on slot > resources like network buffers or memory as it is currently the case with > {{SpillableSubpartitions}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] [flink] bowenli86 commented on a change in pull request #8549: [FLINK-12604][table-api][table-planner] Register TableSource/Sink as CatalogTables
bowenli86 commented on a change in pull request #8549: [FLINK-12604][table-api][table-planner] Register TableSource/Sink as CatalogTables URL: https://github.com/apache/flink/pull/8549#discussion_r289974528 ## File path: flink-table/flink-table-api-java/src/main/java/org/apache/flink/table/catalog/ConnectorCatalogTable.java ## @@ -0,0 +1,165 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.flink.table.catalog; + +import org.apache.flink.annotation.Internal; +import org.apache.flink.annotation.VisibleForTesting; +import org.apache.flink.api.common.typeinfo.TypeInformation; +import org.apache.flink.table.api.TableSchema; +import org.apache.flink.table.sinks.TableSink; +import org.apache.flink.table.sources.DefinedProctimeAttribute; +import org.apache.flink.table.sources.DefinedRowtimeAttributes; +import org.apache.flink.table.sources.RowtimeAttributeDescriptor; +import org.apache.flink.table.sources.TableSource; +import org.apache.flink.table.typeutils.TimeIndicatorTypeInfo; + +import java.util.Arrays; +import java.util.Collections; +import java.util.List; +import java.util.Map; +import java.util.Optional; +import java.util.stream.Collectors; + +/** + * A {@link CatalogTable} that wraps a {@link TableSource} and/or {@link TableSink}. + * This allows registering those in a {@link Catalog}. It can not be persisted as the + * source and/or sink might be inline implementations and not be representable in a + * property based form. + * + * @param <T1> type of the produced elements by the {@link TableSource} + * @param <T2> type of the expected elements by the {@link TableSink} + */ +@Internal +public class ConnectorCatalogTable<T1, T2> extends AbstractCatalogTable { + private final TableSource<T1> tableSource; + private final TableSink<T2> tableSink; + private final boolean isBatch; Review comment: just want to double check: unlike a persistent catalog table that can be both batch and streaming, `ConnectorCatalogTable` can only be either batch or streaming, right?
[GitHub] [flink] bowenli86 commented on a change in pull request #8549: [FLINK-12604][table-api][table-planner] Register TableSource/Sink as CatalogTables
bowenli86 commented on a change in pull request #8549: [FLINK-12604][table-api][table-planner] Register TableSource/Sink as CatalogTables URL: https://github.com/apache/flink/pull/8549#discussion_r289972929 ## File path: flink-table/flink-table-api-java/src/main/java/org/apache/flink/table/catalog/ConnectorCatalogTable.java ## @@ -0,0 +1,165 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.flink.table.catalog; + +import org.apache.flink.annotation.Internal; +import org.apache.flink.annotation.VisibleForTesting; +import org.apache.flink.api.common.typeinfo.TypeInformation; +import org.apache.flink.table.api.TableSchema; +import org.apache.flink.table.sinks.TableSink; +import org.apache.flink.table.sources.DefinedProctimeAttribute; +import org.apache.flink.table.sources.DefinedRowtimeAttributes; +import org.apache.flink.table.sources.RowtimeAttributeDescriptor; +import org.apache.flink.table.sources.TableSource; +import org.apache.flink.table.typeutils.TimeIndicatorTypeInfo; + +import java.util.Arrays; +import java.util.Collections; +import java.util.List; +import java.util.Map; +import java.util.Optional; +import java.util.stream.Collectors; + +/** + * A {@link CatalogTable} that wraps a {@link TableSource} and/or {@link TableSink}. + * This allows registering those in a {@link Catalog}. It can not be persisted as the + * source and/or sink might be inline implementations and not be representable in a Review comment: a bit confused by the "might be" here. Are some `ConnectorCatalogTable`s inline impls, and some are not and can be converted to properties? The exception thrown in `toProperties()` indicates that no `ConnectorCatalogTable` can be converted to properties
[GitHub] [flink] xuefuz commented on a change in pull request #8601: [FLINK-12713][table] deprecate descriptor, validator, and factory of ExternalCatalog
xuefuz commented on a change in pull request #8601: [FLINK-12713][table] deprecate descriptor, validator, and factory of ExternalCatalog URL: https://github.com/apache/flink/pull/8601#discussion_r289984057 ## File path: flink-table/flink-table-common/src/test/java/org/apache/flink/table/descriptors/ExternalCatalogDescriptorTest.java ## @@ -36,6 +36,7 @@ * Tests for the {@link ExternalCatalogDescriptor} descriptor and * {@link ExternalCatalogDescriptorValidator} validator. */ +@Deprecated Review comment: This is not user facing, so I don't think we need to deprecate it.
[GitHub] [flink] xuefuz commented on a change in pull request #8601: [FLINK-12713][table] deprecate descriptor, validator, and factory of ExternalCatalog
xuefuz commented on a change in pull request #8601: [FLINK-12713][table] deprecate descriptor, validator, and factory of ExternalCatalog URL: https://github.com/apache/flink/pull/8601#discussion_r289984145 ## File path: flink-table/flink-table-planner/src/test/scala/org/apache/flink/table/factories/ExternalCatalogFactoryServiceTest.scala ## @@ -31,6 +31,7 @@ import org.junit.Test * Tests for testing external catalog discovery using [[TableFactoryService]]. The tests assume the * external catalog factory [[TestExternalCatalogFactory]] is registered. */ +@Deprecated Review comment: Same as above.
[GitHub] [flink] xuefuz commented on a change in pull request #8601: [FLINK-12713][table] deprecate descriptor, validator, and factory of ExternalCatalog
xuefuz commented on a change in pull request #8601: [FLINK-12713][table] deprecate descriptor, validator, and factory of ExternalCatalog URL: https://github.com/apache/flink/pull/8601#discussion_r289984202 ## File path: flink-table/flink-table-planner/src/test/scala/org/apache/flink/table/factories/utils/TestExternalCatalogFactory.scala ## @@ -37,6 +37,7 @@ import org.apache.flink.table.runtime.utils.CommonTestData * The catalog produces tables intended for either a streaming or batch environment, * based on the descriptor property {{{ is-streaming }}}. */ +@Deprecated Review comment: Same...
[GitHub] [flink] xuefuz commented on a change in pull request #8601: [FLINK-12713][table] deprecate descriptor, validator, and factory of ExternalCatalog
xuefuz commented on a change in pull request #8601: [FLINK-12713][table] deprecate descriptor, validator, and factory of ExternalCatalog URL: https://github.com/apache/flink/pull/8601#discussion_r289983564 ## File path: flink-table/flink-table-common/src/main/java/org/apache/flink/table/descriptors/ExternalCatalogDescriptorValidator.java ## @@ -18,12 +18,12 @@ package org.apache.flink.table.descriptors; -import org.apache.flink.annotation.Internal; - /** * Validator for {@link ExternalCatalogDescriptor}. + * + * @deprecated use {@link CatalogDescriptorValidator} instead. */ -@Internal +@Deprecated Review comment: We can probably keep @internal while adding @Deprecated
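The suggestion above — keeping `@Internal` while also adding `@Deprecated` — can be sketched as follows. The `Internal` annotation below is a locally defined stand-in (the real one is `org.apache.flink.annotation.Internal`), and the class name is illustrative, not the actual Flink source:

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

public class DeprecationSketch {
    // Stand-in for org.apache.flink.annotation.Internal (illustrative only).
    @Retention(RetentionPolicy.RUNTIME)
    @interface Internal {}

    // The suggested pattern: the type stays marked as internal API *and* is
    // flagged deprecated, so the audience annotation is preserved while
    // compilers and IDEs still warn on use.
    @Internal
    @Deprecated
    static class ExternalCatalogDescriptorValidatorLike {}

    public static void main(String[] args) {
        Class<?> c = ExternalCatalogDescriptorValidatorLike.class;
        // Both annotations have runtime retention, so both are visible here.
        System.out.println(c.isAnnotationPresent(Internal.class)
                && c.isAnnotationPresent(Deprecated.class)); // prints "true"
    }
}
```

The two annotations are orthogonal: one states who the API is for, the other states that it is going away.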
[GitHub] [flink] xuefuz commented on a change in pull request #8600: [FLINK-12712][table] deprecate ExternalCatalog and its subclasses, implementations, related util classes and tests
xuefuz commented on a change in pull request #8600: [FLINK-12712][table] deprecate ExternalCatalog and its subclasses, implementations, related util classes and tests URL: https://github.com/apache/flink/pull/8600#discussion_r289982158 ## File path: flink-table/flink-table-planner/src/test/scala/org/apache/flink/table/catalog/ExternalCatalogSchemaTest.scala ## @@ -36,6 +36,7 @@ import org.junit.{Before, Test} import scala.collection.JavaConverters._ +@deprecated Review comment: Same
[GitHub] [flink] bowenli86 commented on a change in pull request #8589: [FLINK-12677][hive][sql-client] Add descriptor, validator, and factory for HiveCatalog
bowenli86 commented on a change in pull request #8589: [FLINK-12677][hive][sql-client] Add descriptor, validator, and factory for HiveCatalog URL: https://github.com/apache/flink/pull/8589#discussion_r289982183 ## File path: flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/table/catalog/hive/factories/HiveCatalogFactory.java ## @@ -0,0 +1,94 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.flink.table.catalog.hive.factories; + +import org.apache.flink.annotation.VisibleForTesting; +import org.apache.flink.table.catalog.Catalog; +import org.apache.flink.table.catalog.hive.HiveCatalog; +import org.apache.flink.table.catalog.hive.descriptors.HiveCatalogValidator; +import org.apache.flink.table.descriptors.DescriptorProperties; +import org.apache.flink.table.factories.CatalogFactory; + +import org.apache.hadoop.hive.conf.HiveConf; + +import java.util.ArrayList; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import java.util.Optional; + +import static org.apache.flink.table.catalog.hive.descriptors.HiveCatalogValidator.CATALOG_PROPERTOES_HIVE_METASTORE_URIS; +import static org.apache.flink.table.catalog.hive.descriptors.HiveCatalogValidator.CATALOG_TYPE_VALUE_HIVE; +import static org.apache.flink.table.descriptors.CatalogDescriptorValidator.CATALOG_DEFAULT_DATABASE; +import static org.apache.flink.table.descriptors.CatalogDescriptorValidator.CATALOG_PROPERTIES; +import static org.apache.flink.table.descriptors.CatalogDescriptorValidator.CATALOG_PROPERTY_VERSION; +import static org.apache.flink.table.descriptors.CatalogDescriptorValidator.CATALOG_TYPE; + +/** + * Catalog factory for {@link HiveCatalog}. + */ +public class HiveCatalogFactory implements CatalogFactory { + + @Override + public Map<String, String> requiredContext() { + Map<String, String> context = new HashMap<>(); + context.put(CATALOG_TYPE, CATALOG_TYPE_VALUE_HIVE); // hive + context.put(CATALOG_PROPERTY_VERSION, "1"); // backwards compatibility + return context; + } + + @Override + public List<String> supportedProperties() { + List<String> properties = new ArrayList<>(); + + // default database + properties.add(CATALOG_DEFAULT_DATABASE); + + properties.add(CATALOG_PROPERTIES); Review comment: removed
[GitHub] [flink] xuefuz commented on a change in pull request #8600: [FLINK-12712][table] deprecate ExternalCatalog and its subclasses, implementations, related util classes and tests
xuefuz commented on a change in pull request #8600: [FLINK-12712][table] deprecate ExternalCatalog and its subclasses, implementations, related util classes and tests URL: https://github.com/apache/flink/pull/8600#discussion_r289982243 ## File path: flink-table/flink-table-planner/src/test/scala/org/apache/flink/table/catalog/InMemoryExternalCatalogTest.scala ## @@ -24,6 +24,7 @@ import org.apache.flink.table.descriptors.{ConnectorDescriptor, Schema} import org.junit.Assert._ import org.junit.{Before, Test} +@deprecated Review comment: Same.
[GitHub] [flink] xuefuz commented on a change in pull request #8600: [FLINK-12712][table] deprecate ExternalCatalog and its subclasses, implementations, related util classes and tests
xuefuz commented on a change in pull request #8600: [FLINK-12712][table] deprecate ExternalCatalog and its subclasses, implementations, related util classes and tests URL: https://github.com/apache/flink/pull/8600#discussion_r289982051 ## File path: flink-table/flink-table-planner/src/test/scala/org/apache/flink/table/api/ExternalCatalogTest.scala ## @@ -27,6 +27,7 @@ import org.junit.Test /** * Test for external catalog query plan. */ +@deprecated Review comment: Same as above.
[GitHub] [flink] xuefuz commented on a change in pull request #8600: [FLINK-12712][table] deprecate ExternalCatalog and its subclasses, implementations, related util classes and tests
xuefuz commented on a change in pull request #8600: [FLINK-12712][table] deprecate ExternalCatalog and its subclasses, implementations, related util classes and tests URL: https://github.com/apache/flink/pull/8600#discussion_r289981952 ## File path: flink-table/flink-table-planner/src/test/scala/org/apache/flink/table/api/ExternalCatalogInsertTest.scala ## @@ -29,6 +29,7 @@ import org.junit.Test /** * Test for inserting into tables from external catalog. */ +@deprecated Review comment: Same as above.
[GitHub] [flink] xuefuz commented on a change in pull request #8600: [FLINK-12712][table] deprecate ExternalCatalog and its subclasses, implementations, related util classes and tests
xuefuz commented on a change in pull request #8600: [FLINK-12712][table] deprecate ExternalCatalog and its subclasses, implementations, related util classes and tests URL: https://github.com/apache/flink/pull/8600#discussion_r289981786 ## File path: flink-table/flink-table-planner/src/main/scala/org/apache/flink/table/catalog/ExternalCatalogSchema.scala ## @@ -37,7 +37,10 @@ import scala.collection.JavaConverters._ * * @param catalogIdentifier external catalog name * @param catalog external catalog + * + * @deprecated use [[CatalogCalciteSchema]] instead. */ +@deprecated Review comment: Do we need this? I think we don't have to unless this is user facing.