[jira] [Updated] (FLINK-12715) Hive-1.2.1 build is broken
[ https://issues.apache.org/jira/browse/FLINK-12715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Li updated FLINK-12715: --- Description: Some Hive util methods used are not compatible between Hive-2.3.4 and Hive-1.2.1. (was: Some Hive util methods used are not compatible between Hive-2.3.4 and Hive-1.2.1. Seems we have to implement these method ourselves.) > Hive-1.2.1 build is broken > -- > > Key: FLINK-12715 > URL: https://issues.apache.org/jira/browse/FLINK-12715 > Project: Flink > Issue Type: Sub-task > Components: Connectors / Hive >Reporter: Rui Li >Assignee: Rui Li >Priority: Major > > Some Hive util methods used are not compatible between Hive-2.3.4 and > Hive-1.2.1. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-10855) CheckpointCoordinator does not delete checkpoint directory of late/failed checkpoints
[ https://issues.apache.org/jira/browse/FLINK-10855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16855394#comment-16855394 ] vinoyang commented on FLINK-10855: -- [~till.rohrmann] I have submitted a design draft, if you have time you can have a look, thanks. > CheckpointCoordinator does not delete checkpoint directory of late/failed > checkpoints > - > > Key: FLINK-10855 > URL: https://issues.apache.org/jira/browse/FLINK-10855 > Project: Flink > Issue Type: Bug > Components: Runtime / Checkpointing >Affects Versions: 1.5.5, 1.6.2, 1.7.0 >Reporter: Till Rohrmann >Assignee: vinoyang >Priority: Major > > In case an acknowledge checkpoint message is late or a checkpoint cannot > be acknowledged, we discard the subtask state in the > {{CheckpointCoordinator}}. What does not happen in this case is deleting the > parent directory of the checkpoint. This only happens when we dispose a > {{PendingCheckpoint}} ({{PendingCheckpoint#dispose}}). > Due to this behaviour it can happen that a checkpoint fails (e.g. a task not > being ready) and we delete the checkpoint directory. Next, another task writes > its checkpoint data to the checkpoint directory (thereby creating it again) > and sends an acknowledge message back to the {{CheckpointCoordinator}}. The > {{CheckpointCoordinator}} will realize that there is no longer a > {{PendingCheckpoint}} and will discard the subtask state. This will remove > the state files from the checkpoint directory but will leave the checkpoint > directory itself untouched.
[GitHub] [flink] dawidwys commented on issue #8585: [FLINK-12690][table-api] Introduce a Planner interface
dawidwys commented on issue #8585: [FLINK-12690][table-api] Introduce a Planner interface URL: https://github.com/apache/flink/pull/8585#issuecomment-498543605 You are right, the `Planner` will need the `CatalogManager` (not only for parsing, but also for Table scans). I don't want to add an `init` method though, as it is rather bad design to have objects that you can construct but that are unusable until some method call. Therefore we should just add the `CatalogManager` to the ctor of the `Planner`. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
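The constructor-injection approach discussed above can be sketched as follows. This is a simplified stand-in, not the actual Flink API: the class shapes and the `parse` return value are hypothetical, only the design point (mandatory dependency in the ctor, no `setCatalogManager`/`init`) mirrors the discussion.

```java
// Sketch: the Planner receives its CatalogManager up front, so it is never
// in a half-initialized, unusable state. Names are illustrative stand-ins.
public class PlannerSketch {

    static class CatalogManager { }

    static class Planner {
        private final CatalogManager catalogManager;

        // No init()/setCatalogManager(): the dependency is mandatory at construction.
        Planner(CatalogManager catalogManager) {
            if (catalogManager == null) {
                throw new NullPointerException("catalogManager");
            }
            this.catalogManager = catalogManager;
        }

        // Would consult catalogManager to validate the statement.
        String parse(String statement) {
            return "validated: " + statement;
        }
    }

    public static void main(String[] args) {
        Planner planner = new Planner(new CatalogManager());
        System.out.println(planner.parse("SELECT 1")); // prints "validated: SELECT 1"
    }
}
```

The alternative (`new Planner()` followed by a required `init(catalogManager)` call) would allow constructing an object on which every method call fails until initialization, which is the pitfall the comment warns against.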
[GitHub] [flink] twalthr commented on issue #8521: [FLINK-12601][table] Make the DataStream & DataSet conversion to a Table independent from Calcite
twalthr commented on issue #8521: [FLINK-12601][table] Make the DataStream & DataSet conversion to a Table independent from Calcite URL: https://github.com/apache/flink/pull/8521#issuecomment-498543106 @flinkbot approve all
[GitHub] [flink] flinkbot commented on issue #8607: [FLINK-12652] [documentation] add first version of a glossary
flinkbot commented on issue #8607: [FLINK-12652] [documentation] add first version of a glossary URL: https://github.com/apache/flink/pull/8607#issuecomment-498542685 Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community to review your pull request. We will use this comment to track the progress of the review. ## Review Progress * ❓ 1. The [description] looks good. * ❓ 2. There is [consensus] that the contribution should go into Flink. * ❓ 3. Needs [attention] from. * ❓ 4. The change fits into the overall [architecture]. * ❓ 5. Overall code [quality] is good. Please see the [Pull Request Review Guide](https://flink.apache.org/reviewing-prs.html) for a full explanation of the review process. The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required. Bot commands The @flinkbot bot supports the following commands: - `@flinkbot approve description` to approve one or more aspects (aspects: `description`, `consensus`, `architecture` and `quality`) - `@flinkbot approve all` to approve all aspects - `@flinkbot approve-until architecture` to approve everything until `architecture` - `@flinkbot attention @username1 [@username2 ..]` to require somebody's attention - `@flinkbot disapprove architecture` to remove an approval you gave earlier
[jira] [Updated] (FLINK-12652) Add Glossary to Concepts Section of Documentation
[ https://issues.apache.org/jira/browse/FLINK-12652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-12652: --- Labels: pull-request-available (was: ) > Add Glossary to Concepts Section of Documentation > - > > Key: FLINK-12652 > URL: https://issues.apache.org/jira/browse/FLINK-12652 > Project: Flink > Issue Type: Sub-task > Components: Documentation >Reporter: Konstantin Knauf >Assignee: Konstantin Knauf >Priority: Major > Labels: pull-request-available >
[jira] [Created] (FLINK-12724) Add Links to new Concepts Section to Glossary
Konstantin Knauf created FLINK-12724: Summary: Add Links to new Concepts Section to Glossary Key: FLINK-12724 URL: https://issues.apache.org/jira/browse/FLINK-12724 Project: Flink Issue Type: Sub-task Components: Documentation Reporter: Konstantin Knauf Once we have reworked the Concepts section, we should add references to it to the glossary.
[GitHub] [flink] knaufk opened a new pull request #8607: [FLINK-12652] [documentation] add first version of a glossary
knaufk opened a new pull request #8607: [FLINK-12652] [documentation] add first version of a glossary URL: https://github.com/apache/flink/pull/8607 ## What is the purpose of the change * Part of FLIP-42 * Add a glossary section to the documentation for further reference and to establish a common terminology ## Brief change log * add first version of glossary ## Verifying this change * build docs and proof-read ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): no - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: no - The serializers: no - The runtime per-record code paths (performance sensitive): no - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: no - The S3 file system connector: no ## Documentation - Does this pull request introduce a new feature? no - If yes, how is the feature documented? not applicable
[jira] [Comment Edited] (FLINK-10455) Potential Kafka producer leak in case of failures
[ https://issues.apache.org/jira/browse/FLINK-10455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16855384#comment-16855384 ] sunjincheng edited comment on FLINK-10455 at 6/4/19 6:31 AM: - Currently, we have added an exception capture for the `FlinkKafkaProducer011#commit` method to ensure that the `recycleTransactionalProducer` will be executed, so do we also need to add an exception capture for `FlinkKafkaProducer011#abort`? !image-2019-06-04-14-30-55-985.png! was (Author: sunjincheng121): Currently, we have added an exception capture for the `FlinkKafkaProducer011#commit` method to ensure that the `recycleTransactionalProducer` will be executed, so do we also need to add an exception capture for `FlinkKafkaProducer011#abort`? !image-2019-06-04-14-25-16-916.png! > Potential Kafka producer leak in case of failures > - > > Key: FLINK-10455 > URL: https://issues.apache.org/jira/browse/FLINK-10455 > Project: Flink > Issue Type: Bug > Components: Connectors / Kafka >Affects Versions: 1.5.2 >Reporter: Nico Kruber >Assignee: Jiangjie Qin >Priority: Critical > Labels: pull-request-available > Fix For: 1.5.6, 1.7.0, 1.7.3, 1.9.0, 1.8.1 > > Attachments: image-2019-06-04-14-25-16-916.png, > image-2019-06-04-14-30-55-985.png > > > If the Kafka brokers' timeout is too low for our checkpoint interval [1], we > may get an {{ProducerFencedException}}. Documentation around > {{ProducerFencedException}} explicitly states that we should close the > producer after encountering it. > By looking at the code, it doesn't seem like this is actually done in > {{FlinkKafkaProducer011}}. Also, in case one transaction's commit in > {{TwoPhaseCommitSinkFunction#notifyCheckpointComplete}} fails with an > exception, we don't clean up (nor try to commit) any other transaction. 
> -> from what I see, {{TwoPhaseCommitSinkFunction#notifyCheckpointComplete}} > simply iterates over the {{pendingCommitTransactions}} which is not touched > during {{close()}} > Now if we restart the failing job on the same Flink cluster, any resources > from the previous attempt will still linger around. > [1] > https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/connectors/kafka.html#kafka-011
[jira] [Commented] (FLINK-10455) Potential Kafka producer leak in case of failures
[ https://issues.apache.org/jira/browse/FLINK-10455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16855384#comment-16855384 ] sunjincheng commented on FLINK-10455: - Currently, we have added an exception capture for the `FlinkKafkaProducer011#commit` method to ensure that the `recycleTransactionalProducer` will be executed, so do we also need to add an exception capture for `FlinkKafkaProducer011#abort`? !image-2019-06-04-14-25-16-916.png! > Potential Kafka producer leak in case of failures > - > > Key: FLINK-10455 > URL: https://issues.apache.org/jira/browse/FLINK-10455 > Project: Flink > Issue Type: Bug > Components: Connectors / Kafka >Affects Versions: 1.5.2 >Reporter: Nico Kruber >Assignee: Jiangjie Qin >Priority: Critical > Labels: pull-request-available > Fix For: 1.5.6, 1.7.0, 1.7.3, 1.9.0, 1.8.1 > > Attachments: image-2019-06-04-14-25-16-916.png > > > If the Kafka brokers' timeout is too low for our checkpoint interval [1], we > may get an {{ProducerFencedException}}. Documentation around > {{ProducerFencedException}} explicitly states that we should close the > producer after encountering it. > By looking at the code, it doesn't seem like this is actually done in > {{FlinkKafkaProducer011}}. Also, in case one transaction's commit in > {{TwoPhaseCommitSinkFunction#notifyCheckpointComplete}} fails with an > exception, we don't clean up (nor try to commit) any other transaction. > -> from what I see, {{TwoPhaseCommitSinkFunction#notifyCheckpointComplete}} > simply iterates over the {{pendingCommitTransactions}} which is not touched > during {{close()}} > Now if we restart the failing job on the same Flink cluster, any resources > from the previous attempt will still linger around. > [1] > https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/connectors/kafka.html#kafka-011 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-10455) Potential Kafka producer leak in case of failures
[ https://issues.apache.org/jira/browse/FLINK-10455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sunjincheng updated FLINK-10455: Attachment: image-2019-06-04-14-25-16-916.png > Potential Kafka producer leak in case of failures > - > > Key: FLINK-10455 > URL: https://issues.apache.org/jira/browse/FLINK-10455 > Project: Flink > Issue Type: Bug > Components: Connectors / Kafka >Affects Versions: 1.5.2 >Reporter: Nico Kruber >Assignee: Jiangjie Qin >Priority: Critical > Labels: pull-request-available > Fix For: 1.5.6, 1.7.0, 1.7.3, 1.9.0, 1.8.1 > > Attachments: image-2019-06-04-14-25-16-916.png > > > If the Kafka brokers' timeout is too low for our checkpoint interval [1], we > may get an {{ProducerFencedException}}. Documentation around > {{ProducerFencedException}} explicitly states that we should close the > producer after encountering it. > By looking at the code, it doesn't seem like this is actually done in > {{FlinkKafkaProducer011}}. Also, in case one transaction's commit in > {{TwoPhaseCommitSinkFunction#notifyCheckpointComplete}} fails with an > exception, we don't clean up (nor try to commit) any other transaction. > -> from what I see, {{TwoPhaseCommitSinkFunction#notifyCheckpointComplete}} > simply iterates over the {{pendingCommitTransactions}} which is not touched > during {{close()}} > Now if we restart the failing job on the same Flink cluster, any resources > from the previous attempt will still linger around. > [1] > https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/connectors/kafka.html#kafka-011 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
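The pattern sunjincheng proposes for `abort` (mirroring the existing exception capture around `commit`) can be sketched as a try/catch/finally that guarantees the producer is recycled. This is a hypothetical illustration, not the real `FlinkKafkaProducer011`: the method names `abort` and `recycleTransactionalProducer` echo the discussion above, and a plain `IllegalStateException` stands in for a broker-side `ProducerFencedException`.

```java
// Sketch: recycle the transactional producer even when aborting the
// transaction throws, so failed aborts cannot leak producer instances.
public class AbortRecycleSketch {

    static boolean recycled = false;

    // Stand-in for returning the producer to the pool / closing it.
    static void recycleTransactionalProducer() {
        recycled = true;
    }

    static void abort(boolean brokerThrows) {
        try {
            if (brokerThrows) {
                // Simulates the broker fencing this producer during abort.
                throw new IllegalStateException("simulated ProducerFencedException");
            }
        } catch (IllegalStateException e) {
            // Log and continue: the cleanup below must still run.
        } finally {
            recycleTransactionalProducer();
        }
    }

    public static void main(String[] args) {
        abort(true);
        System.out.println(recycled); // prints "true": recycled despite the failed abort
    }
}
```

Without the `finally`, an exception thrown during abort would skip the recycle step, which is exactly the leak scenario described in the issue.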
[jira] [Commented] (FLINK-6755) Allow triggering Checkpoints through command line client
[ https://issues.apache.org/jira/browse/FLINK-6755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16855369#comment-16855369 ] vinoyang commented on FLINK-6755: - Hi [~till.rohrmann] [~aljoscha] [~gyfora] It seems there is another issue trying to support triggering checkpoints manually (FLINK-12619). I think we need to agree on this idea before we start these issues, so that we can reduce unnecessary work. cc [~klion26] > Allow triggering Checkpoints through command line client > > > Key: FLINK-6755 > URL: https://issues.apache.org/jira/browse/FLINK-6755 > Project: Flink > Issue Type: New Feature > Components: Command Line Client, Runtime / Checkpointing >Affects Versions: 1.3.0 >Reporter: Gyula Fora >Assignee: vinoyang >Priority: Major > > The command line client currently only allows triggering (and canceling with) > Savepoints. > While this is good if we want to fork or modify the pipelines in a > non-checkpoint-compatible way, now with incremental checkpoints this becomes > wasteful for simple job restarts/pipeline updates. > I suggest we add a new command: > ./bin/flink checkpoint [checkpointDirectory] > and a new flag -c for the cancel command to indicate we want to trigger a > checkpoint: > ./bin/flink cancel -c [targetDirectory] > Otherwise this can work similarly to the current savepoint-taking logic; we > could probably even piggyback on the current messages by adding a boolean flag > indicating whether it should be a savepoint or a checkpoint.
[jira] [Commented] (FLINK-12713) deprecate descriptor, validator, and factory of ExternalCatalog
[ https://issues.apache.org/jira/browse/FLINK-12713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16855362#comment-16855362 ] Timo Walther commented on FLINK-12713: -- [~phoenixjiangnan] there is no need to deprecate those classes. They have been merged recently in this release. I thought we just update them once the new interfaces are ready. > deprecate descriptor, validator, and factory of ExternalCatalog > --- > > Key: FLINK-12713 > URL: https://issues.apache.org/jira/browse/FLINK-12713 > Project: Flink > Issue Type: Sub-task > Components: Table SQL / Client >Reporter: Bowen Li >Assignee: Bowen Li >Priority: Major > Labels: pull-request-available > Fix For: 1.9.0 > > Time Spent: 20m > Remaining Estimate: 0h >
[jira] [Created] (FLINK-12723) Adds a wiki page about setting up a Python Table API development environment
Dian Fu created FLINK-12723: --- Summary: Adds a wiki page about setting up a Python Table API development environment Key: FLINK-12723 URL: https://issues.apache.org/jira/browse/FLINK-12723 Project: Flink Issue Type: Sub-task Reporter: Dian Fu We should add a wiki page showing how to set up a Python Table API development environment to help contributors who are interested in the Python Table API to join in easily.
[jira] [Updated] (FLINK-12719) Add the Python catalog API
[ https://issues.apache.org/jira/browse/FLINK-12719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dian Fu updated FLINK-12719: Summary: Add the Python catalog API (was: Add the catalog API for the Python Table API) > Add the Python catalog API > -- > > Key: FLINK-12719 > URL: https://issues.apache.org/jira/browse/FLINK-12719 > Project: Flink > Issue Type: Sub-task > Components: API / Python >Reporter: Dian Fu >Assignee: Dian Fu >Priority: Major > > The new catalog API is almost ready. We should add the corresponding Python > catalog API.
[GitHub] [flink] dawidwys commented on a change in pull request #8549: [FLINK-12604][table-api][table-planner] Register TableSource/Sink as CatalogTables
dawidwys commented on a change in pull request #8549: [FLINK-12604][table-api][table-planner] Register TableSource/Sink as CatalogTables URL: https://github.com/apache/flink/pull/8549#discussion_r290133398 ## File path: flink-table/flink-table-api-java/src/main/java/org/apache/flink/table/catalog/ConnectorCatalogTable.java ## @@ -0,0 +1,165 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.flink.table.catalog; + +import org.apache.flink.annotation.Internal; +import org.apache.flink.annotation.VisibleForTesting; +import org.apache.flink.api.common.typeinfo.TypeInformation; +import org.apache.flink.table.api.TableSchema; +import org.apache.flink.table.sinks.TableSink; +import org.apache.flink.table.sources.DefinedProctimeAttribute; +import org.apache.flink.table.sources.DefinedRowtimeAttributes; +import org.apache.flink.table.sources.RowtimeAttributeDescriptor; +import org.apache.flink.table.sources.TableSource; +import org.apache.flink.table.typeutils.TimeIndicatorTypeInfo; + +import java.util.Arrays; +import java.util.Collections; +import java.util.List; +import java.util.Map; +import java.util.Optional; +import java.util.stream.Collectors; + +/** + * A {@link CatalogTable} that wraps a {@link TableSource} and/or {@link TableSink}. + * This allows registering those in a {@link Catalog}. It can not be persisted as the + * source and/or sink might be inline implementations and not be representable in a Review comment: It says the `TableSource` might be an inline implementation. If you use `TableEnvironment#connect`, then theoretically the table source is properly serializable. The other possibility is to use a `TableSource` explicitly: `TableEnvironment#fromTableSource`. The `ConnectorCatalogTable` therefore is always inline, as we don't know which case we are handling.
[jira] [Created] (FLINK-12722) Adds Python Table API tutorial
Dian Fu created FLINK-12722: --- Summary: Adds Python Table API tutorial Key: FLINK-12722 URL: https://issues.apache.org/jira/browse/FLINK-12722 Project: Flink Issue Type: Sub-task Components: Documentation Reporter: Dian Fu Assignee: Dian Fu We should add a tutorial for the Python Table API in the docs to help beginners of the Python Table API to get a basic knowledge of how to create a simple Python Table API job.
[GitHub] [flink] flinkbot commented on issue #8606: [FLINK-12721] Process integer type in json format
flinkbot commented on issue #8606: [FLINK-12721] Process integer type in json format URL: https://github.com/apache/flink/pull/8606#issuecomment-498529338 Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community to review your pull request. We will use this comment to track the progress of the review.
[GitHub] [flink] aloyszhang opened a new pull request #8606: Process integer type in json format
aloyszhang opened a new pull request #8606: Process integer type in json format URL: https://github.com/apache/flink/pull/8606 ## What is the purpose of the change This PR aims at providing a more precise way to process the `integer` type in JSON format. ## Brief change log - convert `integer` type in JSON Schema to `Types.INT` ## Verifying this change This change is already covered by existing tests, such as the tests in the flink-json module. ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): (yes / **no**) - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (yes / **no**) - The serializers: (**yes** / no / don't know) - The runtime per-record code paths (performance sensitive): (yes / **no** / don't know) - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / **no** / don't know) - The S3 file system connector: (yes / **no** / don't know) ## Documentation - Does this pull request introduce a new feature? (yes / **no**)
[jira] [Updated] (FLINK-12721) make flink-json more precisely when handle integer type
[ https://issues.apache.org/jira/browse/FLINK-12721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] aloyszhang updated FLINK-12721: --- Description: At present, flink-json converts the integer type to `Types.BIG_DEC`, which can cause mismatch errors when sinking to external storage systems like MySQL; we can make it more precise by processing integer as `Types.INT`. was: At present, flink-json converts the integer type to `Types.BIG_DEC`, which can cause mismatch errors when sinking to some external storage systems like MySQL; we can make it more precise by processing integer as `Types.INT`. > make flink-json more precisely when handle integer type > --- > > Key: FLINK-12721 > URL: https://issues.apache.org/jira/browse/FLINK-12721 > Project: Flink > Issue Type: Improvement > Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile) >Affects Versions: 1.8.0 >Reporter: aloyszhang >Assignee: aloyszhang >Priority: Major > > At present, flink-json converts the integer type to `Types.BIG_DEC`, which > can cause mismatch errors when sinking to external storage systems like MySQL; > we can make it more precise by processing integer as `Types.INT`. >
[jira] [Created] (FLINK-12721) make flink-json more precisely when handle integer type
aloyszhang created FLINK-12721: -- Summary: make flink-json more precisely when handle integer type Key: FLINK-12721 URL: https://issues.apache.org/jira/browse/FLINK-12721 Project: Flink Issue Type: Improvement Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile) Affects Versions: 1.8.0 Reporter: aloyszhang Assignee: aloyszhang At present, flink-json converts the integer type to `Types.BIG_DEC`, which can cause mismatch errors when sinking to some external storage systems like MySQL; we can make it more precise by processing integer as `Types.INT`.
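The proposed change maps JSON Schema's `integer` to `Types.INT` instead of `Types.BIG_DEC`. A minimal, hypothetical sketch of such a type dispatch follows; it is not the actual flink-json converter, and plain strings stand in for Flink's `TypeInformation` constants so the snippet stays self-contained.

```java
// Illustrative mapping from JSON Schema type names to (stringified) Flink types.
// "integer" now maps to INT (the proposed behavior); unbounded JSON "number"
// values stay BIG_DEC, since they may exceed any fixed-width numeric type.
public class JsonTypeMappingSketch {

    static String convert(String jsonSchemaType) {
        switch (jsonSchemaType) {
            case "integer": return "INT";      // proposed: was BIG_DEC before
            case "number":  return "BIG_DEC";
            case "string":  return "STRING";
            case "boolean": return "BOOLEAN";
            default:
                throw new IllegalArgumentException("Unsupported type: " + jsonSchemaType);
        }
    }

    public static void main(String[] args) {
        System.out.println(convert("integer")); // prints "INT"
    }
}
```

With this mapping, a JSON `integer` field lines up with an `INT` column in a sink such as a MySQL table, avoiding the `BIG_DEC`-vs-`INT` mismatch described in the issue.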
[jira] [Created] (FLINK-12720) Add the Python Table API Sphinx docs
Dian Fu created FLINK-12720: --- Summary: Add the Python Table API Sphinx docs Key: FLINK-12720 URL: https://issues.apache.org/jira/browse/FLINK-12720 Project: Flink Issue Type: Sub-task Components: API / Python, Documentation Reporter: Dian Fu As the Python Table API is added, we should add the Python Table API Sphinx docs. This includes the following work: 1) Add scripts to build the Sphinx docs 2) Add a link in the main page to the generated doc
[GitHub] [flink] godfreyhe commented on issue #8585: [FLINK-12690][table-api] Introduce a Planner interface
godfreyhe commented on issue #8585: [FLINK-12690][table-api] Introduce a Planner interface URL: https://github.com/apache/flink/pull/8585#issuecomment-498524833 hi @dawidwys, there is a minor question: the `parse` method in the `Planner` class returns a validated `TableOperation`, so the `Planner` instance should hold a `CatalogManager` instance, I think. My question is: does `Planner` need methods like `setCatalogManager` or `registerXXX` to connect `TableEnvironment` and `Planner` by service discovery? thanks~
[GitHub] [flink] flinkbot commented on issue #8605: [FLINK-9363] bump up flink-shaded-jackson version to 2.9.8-7.0
flinkbot commented on issue #8605: [FLINK-9363] bump up flink-shaded-jackson version to 2.9.8-7.0 URL: https://github.com/apache/flink/pull/8605#issuecomment-498522976 Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community to review your pull request. We will use this comment to track the progress of the review.
[jira] [Created] (FLINK-12719) Add the catalog API for the Python Table API
Dian Fu created FLINK-12719: --- Summary: Add the catalog API for the Python Table API Key: FLINK-12719 URL: https://issues.apache.org/jira/browse/FLINK-12719 Project: Flink Issue Type: Sub-task Components: API / Python Reporter: Dian Fu Assignee: Dian Fu The new catalog API is almost ready. We should add the corresponding Python catalog API.
[GitHub] [flink] aloyszhang opened a new pull request #8605: [FLINK-9363] bump up flink-shaded-jackson version to 2.9.8-7.0
aloyszhang opened a new pull request #8605: [FLINK-9363] bump up flink-shaded-jackson version to 2.9.8-7.0 URL: https://github.com/apache/flink/pull/8605 ## What is the purpose of the change This PR bumps the flink-shaded-jackson version up to 2.9.8-7.0, since the flink-shaded project has released 7.0. ## Brief change log - bump up flink-shaded-jackson version to 2.9.8-7.0 ## Verifying this change This change is a trivial rework / code cleanup without any test coverage. ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): (**yes** / no) - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (yes / **no**) - The serializers: (**yes** / no / don't know) - The runtime per-record code paths (performance sensitive): (yes / **no** / don't know) - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / **no** / don't know) - The S3 file system connector: (yes / **no** / don't know) ## Documentation - Does this pull request introduce a new feature? (yes / **no**)
[jira] [Commented] (FLINK-12426) TM occasionally hang in deploying state
[ https://issues.apache.org/jira/browse/FLINK-12426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16855306#comment-16855306 ] Qi commented on FLINK-12426: [~sunhaibotb] Sure, closed. > TM occasionally hang in deploying state > --- > > Key: FLINK-12426 > URL: https://issues.apache.org/jira/browse/FLINK-12426 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination > Reporter: Qi > Priority: Major > > Hi all, > > We use Flink batch and start thousands of jobs per day. Occasionally we > observed some stuck jobs, due to TMs hanging in the “DEPLOYING” state. > > It seems that the TM is calling BlobClient to download jars from > JM/BlobServer. Under the hood it's calling Socket.connect() and then > Socket.read() to retrieve results. > > These jobs usually have many TM slots (1~2k). We checked the TM log and > dumped the TM thread. It had indeed hung on a socket read while downloading a > jar from the blob server. > > We're using Flink 1.5, but this may also affect later versions since the > related code has not changed much. We've tried adding a socket timeout in > BlobClient, but still no luck. > > > TM log > > ... > INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Received task > DataSource (at createInput(ExecutionEnvironment.java:548) (our.code)) > (184/2000). > INFO org.apache.flink.runtime.taskmanager.Task - DataSource (at > createInput(ExecutionEnvironment.java:548) (our.code)) (184/2000) switched > from CREATED to DEPLOYING. > INFO org.apache.flink.runtime.taskmanager.Task - Creating FileSystem stream > leak safety net for task DataSource (at > createInput(ExecutionEnvironment.java:548) (our.code)) (184/2000) [DEPLOYING] > INFO org.apache.flink.runtime.taskmanager.Task - Loading JAR files for task > DataSource (at createInput(ExecutionEnvironment.java:548) (our.code)) > (184/2000) [DEPLOYING]. 
> INFO org.apache.flink.runtime.blob.BlobClient - Downloading > 19e65c0caa41f264f9ffe4ca2a48a434/p-3ecd6341bf97d5512b14c93f6c9f51f682b6db26-37d5e69d156ee00a924c1ebff0c0d280 > from some-host-ip-port > {color:#22}no more logs...{color} > > > TM thread dump: > > _"DataSource (at createInput(ExecutionEnvironment.java:548) (our.code)) > (1999/2000)" #72 prio=5 os_prio=0 tid=0x7fb9a1521000 nid=0xa0994 runnable > [0x7fb97cfbf000]_ > _java.lang.Thread.State: RUNNABLE_ > _at java.net.SocketInputStream.socketRead0(Native Method)_ > _at > java.net.SocketInputStream.socketRead(SocketInputStream.java:116)_ > _at java.net.SocketInputStream.read(SocketInputStream.java:171)_ > _at java.net.SocketInputStream.read(SocketInputStream.java:141)_ > _at > org.apache.flink.runtime.blob.BlobInputStream.read(BlobInputStream.java:152)_ > _at > org.apache.flink.runtime.blob.BlobInputStream.read(BlobInputStream.java:140)_ > _at > org.apache.flink.runtime.blob.BlobClient.downloadFromBlobServer(BlobClient.java:170)_ > _at > org.apache.flink.runtime.blob.AbstractBlobCache.getFileInternal(AbstractBlobCache.java:181)_ > _at > org.apache.flink.runtime.blob.PermanentBlobCache.getFile(PermanentBlobCache.java:206)_ > _at > org.apache.flink.runtime.execution.librarycache.BlobLibraryCacheManager.registerTask(BlobLibraryCacheManager.java:120)_ > _- locked <0x00078ab60ba8> (a java.lang.Object)_ > _at > org.apache.flink.runtime.taskmanager.Task.createUserCodeClassloader(Task.java:893)_ > _at org.apache.flink.runtime.taskmanager.Task.run(Task.java:585)_ > _at java.lang.Thread.run(Thread.java:748)_ > __ > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
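The thread dump above shows the task blocked in `SocketInputStream.socketRead0` with no deadline. As a rough sketch (not Flink's actual `BlobClient` code; the class and method names below are hypothetical), a socket-level connect/read timeout turns such an indefinite hang into a retryable `SocketTimeoutException`:

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetSocketAddress;
import java.net.Socket;

public class TimedBlobDownload {
    // Hypothetical helper: connect and read with explicit deadlines, so a
    // stalled server surfaces as SocketTimeoutException instead of a hang.
    public static int readFirstByte(String host, int port, int timeoutMillis) throws IOException {
        try (Socket socket = new Socket()) {
            // Bound the connect attempt.
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            // Bound every subsequent read() call on this socket.
            socket.setSoTimeout(timeoutMillis);
            InputStream in = socket.getInputStream();
            // Throws java.net.SocketTimeoutException if no data arrives in time.
            return in.read();
        }
    }
}
```

The report above notes that a timeout alone did not resolve the issue, so this only bounds a single read; a caller would still need retry logic around the download.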
[jira] [Closed] (FLINK-12426) TM occasionally hang in deploying state
[ https://issues.apache.org/jira/browse/FLINK-12426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qi closed FLINK-12426. -- Resolution: Duplicate > TM occasionally hang in deploying state > --- > > Key: FLINK-12426 > URL: https://issues.apache.org/jira/browse/FLINK-12426 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination > Reporter: Qi > Priority: Major > > (Issue description, TM log, and thread dump are quoted in full in the comment notification above.)
[GitHub] [flink] ifndef-SleePy commented on issue #8582: [FLINK-12681][metrics] Introduce a built-in thread-safe counter
ifndef-SleePy commented on issue #8582: [FLINK-12681][metrics] Introduce a built-in thread-safe counter URL: https://github.com/apache/flink/pull/8582#issuecomment-498512073 The Travis check has passed. @flinkbot attention @zentol Would you mind taking a look?
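The actual implementation of FLINK-12681 lives in the PR above. As an illustration of the general idea only, assuming a simplified stand-in for Flink's `Counter` interface (the real metric types live in `org.apache.flink.metrics`), a thread-safe counter can be backed by `AtomicLong`:

```java
import java.util.concurrent.atomic.AtomicLong;

// Simplified stand-in for a counter metric interface; not Flink's actual API.
interface Counter {
    void inc();
    void inc(long n);
    long getCount();
}

// Thread-safe counter: concurrent increments from multiple task or reporter
// threads never lose updates, unlike a plain (non-volatile) long field.
class ThreadSafeCounter implements Counter {
    private final AtomicLong count = new AtomicLong();

    @Override
    public void inc() {
        count.incrementAndGet();
    }

    @Override
    public void inc(long n) {
        count.addAndGet(n);
    }

    @Override
    public long getCount() {
        return count.get();
    }
}
```

The trade-off is a compare-and-swap per increment, which is why a built-in thread-safe variant is offered alongside the cheaper single-threaded counter rather than replacing it.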
[jira] [Created] (FLINK-12718) allow users to specify hive-site.xml location to configure hive metastore client in HiveCatalog
Bowen Li created FLINK-12718: Summary: allow users to specify hive-site.xml location to configure hive metastore client in HiveCatalog Key: FLINK-12718 URL: https://issues.apache.org/jira/browse/FLINK-12718 Project: Flink Issue Type: Sub-task Components: Connectors / Hive Reporter: Bowen Li Assignee: Bowen Li Fix For: 1.9.0 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (FLINK-12649) Add a shim layer to support multiple versions of HMS
[ https://issues.apache.org/jira/browse/FLINK-12649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bowen Li closed FLINK-12649. Resolution: Fixed merged in 1.9.0: 038ab385c6f9af129b5eda7fe05d8b39d6122077 > Add a shim layer to support multiple versions of HMS > > > Key: FLINK-12649 > URL: https://issues.apache.org/jira/browse/FLINK-12649 > Project: Flink > Issue Type: Sub-task > Components: Connectors / Hive >Reporter: Rui Li >Assignee: Rui Li >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > We need a shim layer of HMS client to talk to different versions of HMS -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (FLINK-12310) shade user facing common libraries in flink-connector-hive
[ https://issues.apache.org/jira/browse/FLINK-12310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bowen Li closed FLINK-12310. Resolution: Fixed closed since we've changed hive jars' dependencies from "compile" to "provided" > shade user facing common libraries in flink-connector-hive > -- > > Key: FLINK-12310 > URL: https://issues.apache.org/jira/browse/FLINK-12310 > Project: Flink > Issue Type: Sub-task > Components: Connectors / Hive >Reporter: Bowen Li >Assignee: Bowen Li >Priority: Major > Labels: pull-request-available > Fix For: 1.9.0 > > Time Spent: 20m > Remaining Estimate: 0h > > As discussed with Stephan and Timo, we will need to shade user facing common > libraries in flink-connector-hive but can do it later in a separate PR than > FLINK-12238 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-12686) Disentangle StreamGraph/StreamNode and StreamGraphGenerator from StreamExecutionEnvironment
[ https://issues.apache.org/jira/browse/FLINK-12686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16855282#comment-16855282 ] Biao Liu commented on FLINK-12686: -- Hi [~aljoscha], yes, there is no setting for slot sharing currently. Let's work on the disentangling first. I think it's clear enough. > Disentangle StreamGraph/StreamNode and StreamGraphGenerator from > StreamExecutionEnvironment > --- > > Key: FLINK-12686 > URL: https://issues.apache.org/jira/browse/FLINK-12686 > Project: Flink > Issue Type: Improvement > Components: API / DataStream > Reporter: Biao Liu > Assignee: Biao Liu > Priority: Major > Fix For: 1.9.0 > > > This is part of merging the Blink batch runner. I would like to improve the > StreamGraph to support both streaming and batch jobs. There is a Google doc > describing this proposal, here is the > [link|https://docs.google.com/document/d/17X6csUWKUdVn55c47YbOTEipHDWFt_1upyTPrBD6Fjc/edit?usp=sharing]. > This proposal is under discussion, any feedback is welcome! > The new Table API runner wants to translate directly to {{StreamGraphs}} and > therefore doesn't have a {{StreamExecutionEnvironment}}; we therefore need to > disentangle them.
[jira] [Assigned] (FLINK-12686) Disentangle StreamGraph/StreamNode and StreamGraphGenerator from StreamExecutionEnvironment
[ https://issues.apache.org/jira/browse/FLINK-12686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Biao Liu reassigned FLINK-12686: Assignee: Biao Liu > Disentangle StreamGraph/StreamNode and StreamGraphGenerator from > StreamExecutionEnvironment > --- > > Key: FLINK-12686 > URL: https://issues.apache.org/jira/browse/FLINK-12686 > Project: Flink > Issue Type: Improvement > Components: API / DataStream > Reporter: Biao Liu > Assignee: Biao Liu > Priority: Major > Fix For: 1.9.0 > > (Issue description quoted in full in the comment notification above.)
[jira] [Created] (FLINK-12717) Add windows support for the Python shell script
Dian Fu created FLINK-12717: --- Summary: Add windows support for the Python shell script Key: FLINK-12717 URL: https://issues.apache.org/jira/browse/FLINK-12717 Project: Flink Issue Type: Sub-task Components: API / Python Reporter: Dian Fu We should add a windows shell script for pyflink-gateway-server.sh. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] [flink] danny0405 edited a comment on issue #8548: [FLINK-6962] [table] Add create(drop) table SQL DDL
danny0405 edited a comment on issue #8548: [FLINK-6962] [table] Add create(drop) table SQL DDL URL: https://github.com/apache/flink/pull/8548#issuecomment-498509086 @twalthr 1. I agree that removing the colon between the ROW type field name and its type is more in line with the SQL standard 2. I agree that using left/right parens is more SQL compliant 3. I agree to add comments for row type items 4. Parameterized VARCHARs are supported as a builtin type in Calcite, and our tests already exercise them 5. I don't think we should support type keywords like `STRING` or `BYTES`, for the same SQL-compliance reason 6. REAL is supported by Calcite, so it's hard to say we do not support it; excluding the REAL type would bring a lot of extra work For the other data types that are listed in FLIP-37 but not in this PR, I would rather file a new JIRA issue for tracking them; after all, supporting the types Flink implements in query execution has higher priority than those that are not fully supported.
[GitHub] [flink] danny0405 commented on issue #8548: [FLINK-6962] [table] Add create(drop) table SQL DDL
danny0405 commented on issue #8548: [FLINK-6962] [table] Add create(drop) table SQL DDL URL: https://github.com/apache/flink/pull/8548#issuecomment-498509086 > @danny0405 Maybe the merging happened too quickly for this PR. It is still not 100% aligned with FLIP-37. We should make sure to do that before the release. In particular: > > * no colon between ROW field name and type > * alternative syntax `ROW(fieldName fieldType)` to be SQL compliant > * support for comments in rows like `ROW` > * no tests for parameterized VARCHARs like `VARCHAR(100)` > * alternative syntax for `VARCHAR(INT_MAX)` as `STRING` > * alternative syntax for `VARBINARY(INT_MAX)` as `BYTES` > * REAL is not supported > > Furthermore it would be great if the parser would already support all data types defined in FLIP-37 otherwise we need to check for compatibility in every corner of the stack again. > > Missing are (among others): CHAR, BINARY, user-defined type identifiers, MULTISET > Please ensure consistency here. 1. I agree that removing the colon between the ROW type field name and its type is more in line with the SQL standard 2. I agree that using left/right parens is more SQL compliant 3. I agree to add comments for row type items 4. Parameterized VARCHARs are supported as a builtin type in Calcite, and our tests already exercise them 5. I don't think we should support type keywords like `STRING` or `BYTES`, for the same SQL-compliance reason 6. REAL is supported by Calcite, so it's hard to say we do not support it; excluding the REAL type would bring a lot of extra work For the other data types that are listed in FLIP-37 but not in this PR, I would rather file a new JIRA issue for tracking them; after all, supporting the types Flink implements in query execution has higher priority than those that are not fully supported.
[jira] [Created] (FLINK-12716) Add an interactive shell for Python Table API
Dian Fu created FLINK-12716: --- Summary: Add an interactive shell for Python Table API Key: FLINK-12716 URL: https://issues.apache.org/jira/browse/FLINK-12716 Project: Flink Issue Type: Sub-task Components: API / Python Reporter: Dian Fu We should add an interactive shell for the Python Table API. It will provide functionality similar to the Scala Shell.
[jira] [Updated] (FLINK-12715) Hive-1.2.1 build is broken
[ https://issues.apache.org/jira/browse/FLINK-12715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Li updated FLINK-12715: --- Component/s: Connectors / Hive > Hive-1.2.1 build is broken > -- > > Key: FLINK-12715 > URL: https://issues.apache.org/jira/browse/FLINK-12715 > Project: Flink > Issue Type: Sub-task > Components: Connectors / Hive > Reporter: Rui Li > Assignee: Rui Li > Priority: Major > > Some of the Hive util methods used are not compatible between Hive-2.3.4 and > Hive-1.2.1. It seems we have to implement these methods ourselves.
[GitHub] [flink] lirui-apache commented on a change in pull request #8522: [FLINK-12572][hive]Implement HiveInputFormat to read Hive tables
lirui-apache commented on a change in pull request #8522: [FLINK-12572][hive]Implement HiveInputFormat to read Hive tables URL: https://github.com/apache/flink/pull/8522#discussion_r290112999 ## File path: flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/batch/connectors/hive/HiveTableInputFormat.java ## @@ -0,0 +1,275 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.flink.batch.connectors.hive; + +import org.apache.flink.api.common.io.LocatableInputSplitAssigner; +import org.apache.flink.api.common.io.statistics.BaseStatistics; +import org.apache.flink.api.common.typeinfo.TypeInformation; +import org.apache.flink.api.java.hadoop.common.HadoopInputFormatCommonBase; +import org.apache.flink.api.java.hadoop.mapred.wrapper.HadoopDummyReporter; +import org.apache.flink.api.java.typeutils.ResultTypeQueryable; +import org.apache.flink.api.java.typeutils.RowTypeInfo; +import org.apache.flink.core.io.InputSplitAssigner; +import org.apache.flink.table.catalog.hive.util.HiveTableUtil; +import org.apache.flink.types.Row; + +import org.apache.hadoop.conf.Configurable; +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.hive.metastore.api.StorageDescriptor; +import org.apache.hadoop.hive.serde2.Deserializer; +import org.apache.hadoop.hive.serde2.SerDeUtils; +import org.apache.hadoop.hive.serde2.objectinspector.StructField; +import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector; +import org.apache.hadoop.io.Writable; +import org.apache.hadoop.mapred.InputFormat; +import org.apache.hadoop.mapred.JobConf; +import org.apache.hadoop.mapred.JobConfigurable; +import org.apache.hadoop.mapred.RecordReader; +import org.apache.hadoop.security.Credentials; +import org.apache.hadoop.security.UserGroupInformation; +import org.apache.hadoop.util.ReflectionUtils; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.io.IOException; +import java.io.ObjectInputStream; +import java.io.ObjectOutputStream; +import java.util.ArrayList; +import java.util.List; +import java.util.Properties; + +import static org.apache.flink.util.Preconditions.checkNotNull; +import static org.apache.hadoop.mapreduce.lib.input.FileInputFormat.INPUT_DIR; + +/** + * The HiveTableInputFormat are inspired by the HCatInputFormat and HadoopInputFormatBase. 
+ * It's used to read from hive partition/non-partition table. + */ +public class HiveTableInputFormat extends HadoopInputFormatCommonBase + implements ResultTypeQueryable { + private static final long serialVersionUID = 6351448428766433164L; + private static Logger logger = LoggerFactory.getLogger(HiveTableInputFormat.class); + + private JobConf jobConf; + + protected transient Writable key; + protected transient Writable value; + + private transient RecordReader recordReader; + protected transient boolean fetched = false; + protected transient boolean hasNext; + + private Boolean isPartitioned; + private RowTypeInfo rowTypeInfo; + + //Necessary info to init deserializer + private String[] partitionColNames; + //For non-partition hive table, partitions only contains one partition which partitionValues is empty. + private List partitions; + private transient Deserializer deserializer; + //Hive StructField list contain all related info for specific serde. + private transient List structFields; + //StructObjectInspector in hive helps us to look into the internal structure of a struct object. + private transient StructObjectInspector structObjectInspector; + private transient InputFormat mapredInputFormat; + private transient HiveTablePartition hiveTablePartition; + + public HiveTableInputFormat( + JobConf jobConf, + Boolean isPartitioned, + String[] partitionColNames, + List partitions, + RowTypeInfo rowTypeInfo) { + super(jobConf.getCredentials()); + this.rowTypeInfo = checkNotNull(rowTypeInfo, "rowTypeInfo can not be null."); +
[jira] [Commented] (FLINK-12070) Make blocking result partitions consumable multiple times
[ https://issues.apache.org/jira/browse/FLINK-12070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16855260#comment-16855260 ] Yingjie Cao commented on FLINK-12070: - [~StephanEwen] There is no swap partition on my test OS. The data of the mmapped region was flushed to disk at the very beginning, that is, there was no lazy swapping. However, the disk write speed is far lower than the memory write speed, so memory will be exhausted sooner or later as long as the data volume is large enough. As for why the machine froze, I guess it is because flushing the mmapped region to disk also needs memory while not enough pages are left. I'd like to perform some more tests following your suggestion and will post the results later. BTW, we made Blink spill to a file directly and read from the file directly, and there is no significant performance regression. [~srichter] The max heap memory in my test setting is fairly large, but most of it is never used (not allocated), so it should not have much impact. Maybe reducing the max heap memory can improve the performance, but the problem will not be solved. > Make blocking result partitions consumable multiple times > - > > Key: FLINK-12070 > URL: https://issues.apache.org/jira/browse/FLINK-12070 > Project: Flink > Issue Type: Improvement > Components: Runtime / Network > Affects Versions: 1.9.0 > Reporter: Till Rohrmann > Assignee: Stephan Ewen > Priority: Blocker > Labels: pull-request-available > Fix For: 1.9.0 > > Attachments: image-2019-04-18-17-38-24-949.png > > Time Spent: 20m > Remaining Estimate: 0h > > In order to avoid writing produced results multiple times for multiple > consumers and in order to speed up batch recoveries, we should make the > blocking result partitions consumable multiple times. At the moment a > blocking result partition will be released once the consumers have processed > all data. 
Instead the result partition should be released once the next > blocking result has been produced and all consumers of a blocking result > partition have terminated. Moreover, blocking results should not hold on slot > resources like network buffers or memory as it is currently the case with > {{SpillableSubpartitions}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
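The discussion above concerns writes through memory-mapped regions dirtying page-cache pages faster than the disk can drain them. A minimal sketch, unrelated to Flink's internal partition classes, of how a `MappedByteBuffer` write goes through the page cache and is flushed explicitly with `force()`:

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MmapWriteDemo {
    // Writes 'data' through a memory-mapped region. The put() only dirties
    // page-cache pages in memory; force() asks the kernel to flush them to
    // disk. Without force(), write-back timing is up to the OS, which is the
    // memory-pressure behavior the discussion above is about.
    public static void writeMapped(Path file, byte[] data) throws IOException {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_WRITE, 0, data.length);
            buffer.put(data);
            buffer.force();
        }
    }
}
```

This also illustrates the alternative mentioned in the comment: a plain `FileChannel.write` plus `force` spills directly to the file and gives the writer back-pressure, instead of letting dirty mapped pages accumulate.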
[jira] [Created] (FLINK-12715) Hive-1.2.1 build is broken
Rui Li created FLINK-12715: -- Summary: Hive-1.2.1 build is broken Key: FLINK-12715 URL: https://issues.apache.org/jira/browse/FLINK-12715 Project: Flink Issue Type: Sub-task Reporter: Rui Li Assignee: Rui Li Some of the Hive util methods used are not compatible between Hive-2.3.4 and Hive-1.2.1. It seems we have to implement these methods ourselves.
[GitHub] [flink] zjuwangg commented on a change in pull request #8522: [FLINK-12572][hive]Implement HiveInputFormat to read Hive tables
zjuwangg commented on a change in pull request #8522: [FLINK-12572][hive]Implement HiveInputFormat to read Hive tables URL: https://github.com/apache/flink/pull/8522#discussion_r290110383 ## File path: flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/batch/connectors/hive/HiveTableInputFormat.java ## (Quoted hunk omitted; it repeats the license header, imports, and class fields shown in the previous review comment, ending at the HiveTableInputFormat constructor.)
[GitHub] [flink] zjuwangg commented on a change in pull request #8522: [FLINK-12572][hive]Implement HiveInputFormat to read Hive tables
zjuwangg commented on a change in pull request #8522: [FLINK-12572][hive]Implement HiveInputFormat to read Hive tables URL: https://github.com/apache/flink/pull/8522#discussion_r290110408 ## File path: flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/batch/connectors/hive/HiveTableInputFormat.java ## @@ -0,0 +1,275 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.flink.batch.connectors.hive; + +import org.apache.flink.api.common.io.LocatableInputSplitAssigner; +import org.apache.flink.api.common.io.statistics.BaseStatistics; +import org.apache.flink.api.common.typeinfo.TypeInformation; +import org.apache.flink.api.java.hadoop.common.HadoopInputFormatCommonBase; +import org.apache.flink.api.java.hadoop.mapred.wrapper.HadoopDummyReporter; +import org.apache.flink.api.java.typeutils.ResultTypeQueryable; +import org.apache.flink.api.java.typeutils.RowTypeInfo; +import org.apache.flink.core.io.InputSplitAssigner; +import org.apache.flink.table.catalog.hive.util.HiveTableUtil; +import org.apache.flink.types.Row; + +import org.apache.hadoop.conf.Configurable; +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.hive.metastore.api.StorageDescriptor; +import org.apache.hadoop.hive.serde2.Deserializer; +import org.apache.hadoop.hive.serde2.SerDeUtils; +import org.apache.hadoop.hive.serde2.objectinspector.StructField; +import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector; +import org.apache.hadoop.io.Writable; +import org.apache.hadoop.mapred.InputFormat; +import org.apache.hadoop.mapred.JobConf; +import org.apache.hadoop.mapred.JobConfigurable; +import org.apache.hadoop.mapred.RecordReader; +import org.apache.hadoop.security.Credentials; +import org.apache.hadoop.security.UserGroupInformation; +import org.apache.hadoop.util.ReflectionUtils; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.io.IOException; +import java.io.ObjectInputStream; +import java.io.ObjectOutputStream; +import java.util.ArrayList; +import java.util.List; +import java.util.Properties; + +import static org.apache.flink.util.Preconditions.checkNotNull; +import static org.apache.hadoop.mapreduce.lib.input.FileInputFormat.INPUT_DIR; + +/** + * The HiveTableInputFormat are inspired by the HCatInputFormat and HadoopInputFormatBase. Review comment: yes. 
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
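The `fetched`/`hasNext` fields in the quoted HiveTableInputFormat implement a fetch-ahead pattern: `reachedEnd()` eagerly pulls the next record so that `nextRecord()` can simply hand it over. A minimal, self-contained sketch of that pattern (the Hadoop RecordReader is replaced by a plain iterator; all names here are illustrative, not Flink's API):

```java
import java.util.Iterator;
import java.util.List;

public class FetchAheadReader {
    private final Iterator<String> source;   // stands in for the Hadoop RecordReader
    private String value;                     // buffered record
    private boolean fetched = false;          // has a record been buffered yet?
    private boolean hasNext = false;          // did the last fetch succeed?

    public FetchAheadReader(List<String> records) {
        this.source = records.iterator();
    }

    // Eagerly fetches the next record the first time it is called after a consume.
    public boolean reachedEnd() {
        if (!fetched) {
            hasNext = source.hasNext();
            if (hasNext) {
                value = source.next();
            }
            fetched = true;
        }
        return !hasNext;
    }

    // Hands over the buffered record and forces a new fetch on the next call.
    public String nextRecord() {
        if (reachedEnd()) {
            throw new IllegalStateException("no more records");
        }
        fetched = false;
        return value;
    }
}
```

Calling `reachedEnd()` repeatedly is safe because the buffered record is only consumed when `nextRecord()` resets `fetched`.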
[GitHub] [flink] zjuwangg commented on a change in pull request #8522: [FLINK-12572][hive]Implement HiveInputFormat to read Hive tables
zjuwangg commented on a change in pull request #8522: [FLINK-12572][hive]Implement HiveInputFormat to read Hive tables URL: https://github.com/apache/flink/pull/8522#discussion_r290110147 ## File path: flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/batch/connectors/hive/HiveTableInputFormat.java ## @@ -0,0 +1,275 @@ +/** + * The HiveTableInputFormat are inspired by the HCatInputFormat and HadoopInputFormatBase.
+ * It's used to read from hive partition/non-partition table. + */ +public class HiveTableInputFormat extends HadoopInputFormatCommonBase + implements ResultTypeQueryable { + private static final long serialVersionUID = 6351448428766433164L; + private static Logger logger = LoggerFactory.getLogger(HiveTableInputFormat.class); + + private JobConf jobConf; + + protected transient Writable key; + protected transient Writable value; + + private transient RecordReader recordReader; + protected transient boolean fetched = false; + protected transient boolean hasNext; + + private Boolean isPartitioned; + private RowTypeInfo rowTypeInfo; + + //Necessary info to init deserializer + private String[] partitionColNames; + //For non-partition hive table, partitions only contains one partition which partitionValues is empty. + private List partitions; + private transient Deserializer deserializer; + //Hive StructField list contain all related info for specific serde. + private transient List structFields; + //StructObjectInspector in hive helps us to look into the internal structure of a struct object. + private transient StructObjectInspector structObjectInspector; + private transient InputFormat mapredInputFormat; + private transient HiveTablePartition hiveTablePartition; + + public HiveTableInputFormat( + JobConf jobConf, + Boolean isPartitioned, + String[] partitionColNames, + List partitions, + RowTypeInfo rowTypeInfo) { + super(jobConf.getCredentials()); + this.rowTypeInfo = checkNotNull(rowTypeInfo, "rowTypeInfo can not be null."); + this
[jira] [Comment Edited] (FLINK-12070) Make blocking result partitions consumable multiple times
[ https://issues.apache.org/jira/browse/FLINK-12070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16855252#comment-16855252 ] zhijiang edited comment on FLINK-12070 at 6/4/19 2:59 AM: -- Agree with Stephan and Stefan's opinions. We could add the batch blocking case for micro benchmark, then it could verify the performance changes for every merged commit for flink-1.9. I also think it is no need to write to mapped region as now. We actually do not confirm the mmap regions would be consumed immediately by downstream side after producer finishes, because it is up to scheduler decision and whether it has enough resource to schedule consumers. It is necessary to maintain different ways for reading files. Based on my previous lucene index experience, it also provides three ways for reading index files. * Files.newByteChannel for simple way. * Java nio FileChannel way. * Mmap way for large files in 64bit system, and with more free physical memory for mmap as Stefan mentioned.
Then we could compare the behaviors for different ways and also provide more choices for users. > Make blocking result partitions consumable multiple times > - > > Key: FLINK-12070 > URL: https://issues.apache.org/jira/browse/FLINK-12070 > Project: Flink > Issue Type: Improvement > Components: Runtime / Network >Affects Versions: 1.9.0 >Reporter: Till Rohrmann >Assignee: Stephan Ewen >Priority: Blocker > Labels: pull-request-available > Fix For: 1.9.0 > > Attachments: image-2019-04-18-17-38-24-949.png > > Time Spent: 20m > Remaining Estimate: 0h > > In order to avoid writing produced results multiple times for multiple > consumers and in order to speed up batch recoveries, we should make the > blocking result partitions to be consumable multiple times. At the moment a > blocking result partition will be released once the consumers has processed > all data. Instead the result partition should be released once the next > blocking result has been produced and all consumers of a blocking result > partition have terminated. Moreover, blocking results should not hold on slot > resources like network buffers or memory as it is currently the case with > {{SpillableSubpartitions}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
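The three file-reading approaches zhijiang lists map directly onto standard Java NIO calls. A hedged sketch comparing them (plain JDK APIs, not Flink code; the class and method names are illustrative):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.SeekableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class FileReadStrategies {

    // 1. Simple way: Files.newByteChannel
    static byte[] viaByteChannel(Path p) throws IOException {
        try (SeekableByteChannel ch = Files.newByteChannel(p, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate((int) ch.size());
            while (ch.read(buf) > 0) { }   // read until the buffer is full or EOF
            return buf.array();
        }
    }

    // 2. Explicit java.nio FileChannel
    static byte[] viaFileChannel(Path p) throws IOException {
        try (FileChannel ch = FileChannel.open(p, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate((int) ch.size());
            while (ch.read(buf) > 0) { }
            return buf.array();
        }
    }

    // 3. Memory-mapped read, suited to large files on 64-bit JVMs with
    //    enough free physical memory for the mapping.
    static byte[] viaMmap(Path p) throws IOException {
        try (FileChannel ch = FileChannel.open(p, StandardOpenOption.READ)) {
            MappedByteBuffer mapped = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            byte[] out = new byte[mapped.remaining()];
            mapped.get(out);
            return out;
        }
    }
}
```

All three return the same bytes; they differ in syscall overhead, copy behavior, and page-cache interaction, which is exactly what a benchmark comparison would measure.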
[GitHub] [flink] zjuwangg commented on a change in pull request #8522: [FLINK-12572][hive]Implement HiveInputFormat to read Hive tables
zjuwangg commented on a change in pull request #8522: [FLINK-12572][hive]Implement HiveInputFormat to read Hive tables URL: https://github.com/apache/flink/pull/8522#discussion_r290109929 ## File path: flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/batch/connectors/hive/HiveTableInputFormat.java ## @@ -0,0 +1,275 @@
[jira] [Comment Edited] (FLINK-12070) Make blocking result partitions consumable multiple times
[ https://issues.apache.org/jira/browse/FLINK-12070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16855252#comment-16855252 ] zhijiang edited comment on FLINK-12070 at 6/4/19 2:58 AM: -- Agree with Stephan and Stefan's opinions. We could add the batch blocking case for micro benchmark, then it could verify the performance changes for every merged commit for flink-1.9. I also think it is no need to write to mapped region as now. We actually do not confirm the mmap regions would be consumed immediately by downstream side after producer finishes, because it is up to scheduler decision and whether it has enough resource to schedule consumers. It is necessary to maintain different ways for reading files. Based on my previous lucene index experience, it also provides three ways for reading index files. * Files.newByteChannel for simple way. * Java nio FileChannel way. * Mmap way for large files in 64bit system, and with more free physical memory for mmap as Stefan mentioned.
Then we could compare the behaviors for different ways and also provide more choices for users. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-12070) Make blocking result partitions consumable multiple times
[ https://issues.apache.org/jira/browse/FLINK-12070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16855252#comment-16855252 ] zhijiang commented on FLINK-12070: -- Agree with Stephan and Stefan's opinions. We could add the batch blocking case for micro benchmark, then it could verify the performance changes for every merged commit for flink-1.9. I also think it is no need to write to mapped region as now, because we actually do not confirm the mmap regions would be consumed immediately by downstream side after producer finishes, and it is up to scheduler decision and whether it has enough resource to schedule consumers. It is necessary to maintain different ways for reading files. Based on my previous lucene index experience, it also provides three ways for reading index files. * Files.newByteChannel for simple way. * Java nio FileChannel way. * Mmap way for large files in 64bit system, and with more free physical memory for mmap as Stefan mentioned. Then we could compare the behaviors for different ways and also provide more choices for users. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] [flink] lirui-apache commented on a change in pull request #8589: [FLINK-12677][hive][sql-client] Add descriptor, validator, and factory for HiveCatalog
lirui-apache commented on a change in pull request #8589: [FLINK-12677][hive][sql-client] Add descriptor, validator, and factory for HiveCatalog URL: https://github.com/apache/flink/pull/8589#discussion_r290109158 ## File path: flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/table/catalog/hive/descriptors/HiveCatalogDescriptor.java ## @@ -0,0 +1,55 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.table.catalog.hive.descriptors; + +import org.apache.flink.table.catalog.hive.HiveCatalog; +import org.apache.flink.table.descriptors.CatalogDescriptor; +import org.apache.flink.table.descriptors.DescriptorProperties; +import org.apache.flink.util.Preconditions; +import org.apache.flink.util.StringUtils; + +import java.util.Map; + +import static org.apache.flink.table.catalog.hive.descriptors.HiveCatalogValidator.CATALOG_PROPERTOES_HIVE_METASTORE_URIS; +import static org.apache.flink.table.catalog.hive.descriptors.HiveCatalogValidator.CATALOG_TYPE_VALUE_HIVE; + +/** + * Catalog descriptor for {@link HiveCatalog}. 
+ */ +public class HiveCatalogDescriptor extends CatalogDescriptor { + + private String hiveMetastoreUris; + + // TODO : set default database + public HiveCatalogDescriptor(String hiveMetastoreUris) { + super(CATALOG_TYPE_VALUE_HIVE, 1); + + Preconditions.checkArgument(!StringUtils.isNullOrWhitespaceOnly(hiveMetastoreUris)); + this.hiveMetastoreUris = hiveMetastoreUris; Review comment: I'd prefer to let user explicitly specify the URIs, especially if we allow multiple hive catalogs talking to different HMS instances in the yaml file. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
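To illustrate the reviewer's point that the metastore URIs should be specified explicitly, here is a minimal stand-alone sketch of a descriptor that rejects a blank URI and emits a property map. The class name and property keys are assumptions modeled on the quoted code, not Flink's actual API:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SimpleHiveCatalogDescriptor {
    // Keys are illustrative, loosely modeled on the quoted validator constants.
    static final String CATALOG_TYPE = "hive";
    static final String KEY_TYPE = "type";
    static final String KEY_METASTORE_URIS = "hive.metastore.uris";

    private final String metastoreUris;

    public SimpleHiveCatalogDescriptor(String metastoreUris) {
        if (metastoreUris == null || metastoreUris.trim().isEmpty()) {
            // Require the user to specify the URIs explicitly, as the reviewer suggests.
            throw new IllegalArgumentException("hive.metastore.uris must be set explicitly");
        }
        this.metastoreUris = metastoreUris;
    }

    // Flattens the descriptor into string properties, the way a yaml-defined
    // catalog would be handed to a factory.
    public Map<String, String> toProperties() {
        Map<String, String> props = new LinkedHashMap<>();
        props.put(KEY_TYPE, CATALOG_TYPE);
        props.put(KEY_METASTORE_URIS, metastoreUris);
        return props;
    }
}
```

Requiring the URIs up front keeps multiple Hive catalogs in one yaml file unambiguous, since each can point at a different HMS instance.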
[GitHub] [flink] sunjincheng121 commented on issue #8550: [FLINK-12401][table] Support incremental emit under AccRetract mode for non-window streaming FlatAggregate on Table API
sunjincheng121 commented on issue #8550: [FLINK-12401][table] Support incremental emit under AccRetract mode for non-window streaming FlatAggregate on Table API URL: https://github.com/apache/flink/pull/8550#issuecomment-498498965 Hi @hequn8128, thanks for the PR. Overall, I need to think about the interface methods again, because this PR makes some changes (which I think are good) relative to the FLIP-29 design doc; I will review the PR after that.
[jira] [Commented] (FLINK-12426) TM occasionally hang in deploying state
[ https://issues.apache.org/jira/browse/FLINK-12426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16855247#comment-16855247 ] Haibo Sun commented on FLINK-12426: --- [~QiLuo], the pull request of FLINK-12547 has been merged into the master branch. If this JIRA is the same problem as it, can you close this issue as a duplicate? > TM occasionally hang in deploying state > --- > > Key: FLINK-12426 > URL: https://issues.apache.org/jira/browse/FLINK-12426 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Reporter: Qi >Priority: Major > > Hi all, > > We use Flink batch and start thousands of jobs per day. Occasionally we > observed some stuck jobs, due to some TM hang in “DEPLOYING” state. > > It seems that the TM is calling BlobClient to download jars from > JM/BlobServer. Under hood it’s calling Socket.connect() and then > Socket.read() to retrieve results. > > These jobs usually have many TM slots (1~2k). We checked the TM log and > dumped the TM thread. It indeed hung on socket read to download jar from Blob > server. > > We're using Flink 1.5 but this may also affect later versions since related > code are not changed much. We've tried to add socket timeout in BlobClient, > but still no luck. > > > TM log > > ... > INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Received task > DataSource (at createInput(ExecutionEnvironment.java:548) (our.code)) > (184/2000). > INFO org.apache.flink.runtime.taskmanager.Task - DataSource (at > createInput(ExecutionEnvironment.java:548) (our.code)) (184/2000) switched > from CREATED to DEPLOYING. 
> INFO org.apache.flink.runtime.taskmanager.Task - Creating FileSystem stream > leak safety net for task DataSource (at > createInput(ExecutionEnvironment.java:548) (our.code)) (184/2000) [DEPLOYING] > INFO org.apache.flink.runtime.taskmanager.Task - Loading JAR files for task > DataSource (at createInput(ExecutionEnvironment.java:548) (our.code)) > (184/2000) [DEPLOYING]. > INFO org.apache.flink.runtime.blob.BlobClient - Downloading > 19e65c0caa41f264f9ffe4ca2a48a434/p-3ecd6341bf97d5512b14c93f6c9f51f682b6db26-37d5e69d156ee00a924c1ebff0c0d280 > from some-host-ip-port > no more logs... > > > TM thread dump: > > "DataSource (at createInput(ExecutionEnvironment.java:548) (our.code)) > (1999/2000)" #72 prio=5 os_prio=0 tid=0x7fb9a1521000 nid=0xa0994 runnable > [0x7fb97cfbf000] > java.lang.Thread.State: RUNNABLE > at java.net.SocketInputStream.socketRead0(Native Method) > at java.net.SocketInputStream.socketRead(SocketInputStream.java:116) > at java.net.SocketInputStream.read(SocketInputStream.java:171) > at java.net.SocketInputStream.read(SocketInputStream.java:141) > at org.apache.flink.runtime.blob.BlobInputStream.read(BlobInputStream.java:152) > at org.apache.flink.runtime.blob.BlobInputStream.read(BlobInputStream.java:140) > at org.apache.flink.runtime.blob.BlobClient.downloadFromBlobServer(BlobClient.java:170) > at org.apache.flink.runtime.blob.AbstractBlobCache.getFileInternal(AbstractBlobCache.java:181) > at org.apache.flink.runtime.blob.PermanentBlobCache.getFile(PermanentBlobCache.java:206) > at org.apache.flink.runtime.execution.librarycache.BlobLibraryCacheManager.registerTask(BlobLibraryCacheManager.java:120) > - locked <0x00078ab60ba8> (a java.lang.Object) > at org.apache.flink.runtime.taskmanager.Task.createUserCodeClassloader(Task.java:893) > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:585) > at java.lang.Thread.run(Thread.java:748) > -- This message was sent by
Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (FLINK-12547) Deadlock when the task thread downloads jars using BlobClient
[ https://issues.apache.org/jira/browse/FLINK-12547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Sun closed FLINK-12547. - > Deadlock when the task thread downloads jars using BlobClient > - > > Key: FLINK-12547 > URL: https://issues.apache.org/jira/browse/FLINK-12547 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.8.0 >Reporter: Haibo Sun >Assignee: Haibo Sun >Priority: Major > Labels: pull-request-available > Fix For: 1.9.0, 1.8.1 > > Time Spent: 20m > Remaining Estimate: 0h > > The jstack is as follows (this jstack is from an old Flink version, but the > master branch has the same problem). > {code:java} > "Source: Custom Source (76/400)" #68 prio=5 os_prio=0 tid=0x7f8139cd3000 > nid=0xe2 runnable [0x7f80da5fd000] > java.lang.Thread.State: RUNNABLE > at java.net.SocketInputStream.socketRead0(Native Method) > at java.net.SocketInputStream.socketRead(SocketInputStream.java:116) > at java.net.SocketInputStream.read(SocketInputStream.java:170) > at java.net.SocketInputStream.read(SocketInputStream.java:141) > at > org.apache.flink.runtime.blob.BlobInputStream.read(BlobInputStream.java:152) > at > org.apache.flink.runtime.blob.BlobInputStream.read(BlobInputStream.java:140) > at > org.apache.flink.runtime.blob.BlobClient.downloadFromBlobServer(BlobClient.java:164) > at > org.apache.flink.runtime.blob.AbstractBlobCache.getFileInternal(AbstractBlobCache.java:181) > at > org.apache.flink.runtime.blob.PermanentBlobCache.getFile(PermanentBlobCache.java:206) > at > org.apache.flink.runtime.execution.librarycache.BlobLibraryCacheManager.registerTask(BlobLibraryCacheManager.java:120) > - locked <0x00062cf2a188> (a java.lang.Object) > at > org.apache.flink.runtime.taskmanager.Task.createUserCodeClassloader(Task.java:968) > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:604) > at java.lang.Thread.run(Thread.java:834) > Locked ownable synchronizers: > - None > {code} > > The reason is that 
SO_TIMEOUT is not set in the socket connection of the blob > client. When the network packet loss seriously due to the high CPU load of > the machine, the blob client connection fails to perceive that the server has > been disconnected, which results in blocking in the native method. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
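The fix described above amounts to bounding blocking reads with SO_TIMEOUT so a dead connection surfaces as a `SocketTimeoutException` instead of hanging in the native read. A self-contained sketch using plain `java.net` APIs (the helper name is illustrative, not Flink's BlobClient code):

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class TimeoutDemo {
    // Returns true if the read gave up with SocketTimeoutException
    // instead of blocking forever on a server that never responds.
    static boolean readTimesOut(int port, int timeoutMillis) throws IOException {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress("localhost", port), timeoutMillis);
            socket.setSoTimeout(timeoutMillis);   // bound every subsequent blocking read
            InputStream in = socket.getInputStream();
            try {
                in.read();                        // the server never writes anything
                return false;
            } catch (SocketTimeoutException expected) {
                return true;
            }
        }
    }
}
```

With a timeout set, a download loop like BlobClient's can retry or fail fast rather than leave a task thread stuck in `socketRead0` as in the jstack above.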
[GitHub] [flink] lirui-apache commented on issue #8536: [FLINK-12568][hive] Implement OutputFormat to write Hive tables
lirui-apache commented on issue #8536: [FLINK-12568][hive] Implement OutputFormat to write Hive tables URL: https://github.com/apache/flink/pull/8536#issuecomment-498497073 > moreover, please have correct commit message in the future. E.g. the first commit's msg should be "[FLINK-12568][hive] Implement OutputFormat to write Hive tables" Got it. Thanks.
[GitHub] [flink] flinkbot commented on issue #8604: [DOC] add ')' for the doc checkpoints.md
flinkbot commented on issue #8604: [DOC] add ')' for the doc checkpoints.md URL: https://github.com/apache/flink/pull/8604#issuecomment-498496138 Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community to review your pull request. We will use this comment to track the progress of the review. ## Review Progress * ❓ 1. The [description] looks good. * ❓ 2. There is [consensus] that the contribution should go into Flink. * ❓ 3. Needs [attention] from. * ❓ 4. The change fits into the overall [architecture]. * ❓ 5. Overall code [quality] is good. Please see the [Pull Request Review Guide](https://flink.apache.org/reviewing-prs.html) for a full explanation of the review process. The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required. Bot commands The @flinkbot bot supports the following commands: - `@flinkbot approve description` to approve one or more aspects (aspects: `description`, `consensus`, `architecture` and `quality`) - `@flinkbot approve all` to approve all aspects - `@flinkbot approve-until architecture` to approve everything until `architecture` - `@flinkbot attention @username1 [@username2 ..]` to require somebody's attention - `@flinkbot disapprove architecture` to remove an approval you gave earlier
[GitHub] [flink] hehuiyuan opened a new pull request #8604: [DOC] add ')' for the doc checkpoints.md
hehuiyuan opened a new pull request #8604: [DOC] add ')' for the doc checkpoints.md URL: https://github.com/apache/flink/pull/8604 For `Configure for per job when constructing the state backend`, add ')'
[jira] [Commented] (FLINK-10455) Potential Kafka producer leak in case of failures
[ https://issues.apache.org/jira/browse/FLINK-10455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16855241#comment-16855241 ] sunjincheng commented on FLINK-10455: - [~becket_qin] is there any update from your side? > Potential Kafka producer leak in case of failures > - > > Key: FLINK-10455 > URL: https://issues.apache.org/jira/browse/FLINK-10455 > Project: Flink > Issue Type: Bug > Components: Connectors / Kafka >Affects Versions: 1.5.2 >Reporter: Nico Kruber >Assignee: Jiangjie Qin >Priority: Critical > Labels: pull-request-available > Fix For: 1.5.6, 1.7.0, 1.7.3, 1.9.0, 1.8.1 > > > If the Kafka brokers' timeout is too low for our checkpoint interval [1], we > may get an {{ProducerFencedException}}. Documentation around > {{ProducerFencedException}} explicitly states that we should close the > producer after encountering it. > By looking at the code, it doesn't seem like this is actually done in > {{FlinkKafkaProducer011}}. Also, in case one transaction's commit in > {{TwoPhaseCommitSinkFunction#notifyCheckpointComplete}} fails with an > exception, we don't clean up (nor try to commit) any other transaction. > -> from what I see, {{TwoPhaseCommitSinkFunction#notifyCheckpointComplete}} > simply iterates over the {{pendingCommitTransactions}} which is not touched > during {{close()}} > Now if we restart the failing job on the same Flink cluster, any resources > from the previous attempt will still linger around. > [1] > https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/connectors/kafka.html#kafka-011 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
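The cleanup the issue asks for — close a fenced producer immediately and still handle the remaining pending transactions — can be sketched with a stand-in producer interface (the `TxnProducer`/`FencedException` names are illustrative; they model Kafka's transactional producer and `ProducerFencedException`, since a runnable example against a real broker is out of scope here):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class TransactionCommitter {
    // Stand-in for a transactional producer handle.
    public interface TxnProducer {
        void commit() throws FencedException;
        void close();
    }

    // Models ProducerFencedException, after which the producer must be closed.
    public static class FencedException extends Exception {}

    /**
     * Commits all pending transactions. A fenced producer is closed immediately,
     * and the remaining transactions are still attempted instead of being
     * silently abandoned. Returns the producers that failed and were closed.
     */
    public static List<TxnProducer> commitPending(List<TxnProducer> pending) {
        List<TxnProducer> fenced = new ArrayList<>();
        for (Iterator<TxnProducer> it = pending.iterator(); it.hasNext(); ) {
            TxnProducer producer = it.next();
            try {
                producer.commit();
            } catch (FencedException e) {
                producer.close();     // release resources right away, don't leak the producer
                fenced.add(producer);
            } finally {
                it.remove();          // never leave a handled transaction in the pending set
            }
        }
        return fenced;
    }
}
```

The key design point matches the issue: iterating the whole pending set with per-element error handling means one failed commit can no longer leave the other transactions (and their producers) lingering after a restart.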
[GitHub] [flink] sunjincheng121 commented on issue #8550: [FLINK-12401][table] Support incremental emit under AccRetract mode for non-window streaming FlatAggregate on Table API
sunjincheng121 commented on issue #8550: [FLINK-12401][table] Support incremental emit under AccRetract mode for non-window streaming FlatAggregate on Table API URL: https://github.com/apache/flink/pull/8550#issuecomment-498492312 @flinkbot approve description This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] flinkbot commented on issue #8603: [FLINK-12608][runtime] Add getVertexOrThrow and
flinkbot commented on issue #8603: [FLINK-12608][runtime] Add getVertexOrThrow and URL: https://github.com/apache/flink/pull/8603#issuecomment-498492126 Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community to review your pull request. We will use this comment to track the progress of the review. ## Review Progress * ❓ 1. The [description] looks good. * ❓ 2. There is [consensus] that the contribution should go into Flink. * ❓ 3. Needs [attention] from. * ❓ 4. The change fits into the overall [architecture]. * ❓ 5. Overall code [quality] is good. Please see the [Pull Request Review Guide](https://flink.apache.org/reviewing-prs.html) for a full explanation of the review process. The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required Bot commands The @flinkbot bot supports the following commands: - `@flinkbot approve description` to approve one or more aspects (aspects: `description`, `consensus`, `architecture` and `quality`) - `@flinkbot approve all` to approve all aspects - `@flinkbot approve-until architecture` to approve everything until `architecture` - `@flinkbot attention @username1 [@username2 ..]` to require somebody's attention - `@flinkbot disapprove architecture` to remove an approval you gave earlier This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] eaglewatcherwb opened a new pull request #8603: [FLINK-12608][runtime] Add getVertexOrThrow and
eaglewatcherwb opened a new pull request #8603: [FLINK-12608][runtime] Add getVertexOrThrow and URL: https://github.com/apache/flink/pull/8603 getResultPartitionOrThrow in SchedulingTopology to simplify code structure Change-Id: I33be923c3279755d360e95336199fb1bc3257f4f ## What is the purpose of the change *(For example: This pull request makes task deployment go through the blob server, rather than through RPC. That way we avoid re-transferring them on each deployment (during recovery).)* ## Brief change log - *Add getVertexOrThrow and getResultPartitionOrThrow in SchedulingTopology.* ## Verifying this change This change added tests and can be verified as follows: - *Added unit tests in ExecutionGraphToSchedulingTopologyAdapterTest.* ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): (no) - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no) - The serializers: (no) - The runtime per-record code paths (performance sensitive): (no) - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no) - The S3 file system connector: (no) ## Documentation - Does this pull request introduce a new feature? (no) - If yes, how is the feature documented? (not documented) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Updated] (FLINK-12608) Add getVertex/ResultPartitionOrThrow(ExecutionVertexID/IntermediateResultPartitionID) to SchedulingTopology
[ https://issues.apache.org/jira/browse/FLINK-12608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-12608: --- Labels: pull-request-available (was: ) > Add > getVertex/ResultPartitionOrThrow(ExecutionVertexID/IntermediateResultPartitionID) > to SchedulingTopology > --- > > Key: FLINK-12608 > URL: https://issues.apache.org/jira/browse/FLINK-12608 > Project: Flink > Issue Type: Sub-task >Reporter: BoWang >Assignee: BoWang >Priority: Minor > Labels: pull-request-available > > As discussed in > [PR#8309|https://github.com/apache/flink/pull/8309#discussion_r287190944], > need to add getVertexOrThrow getResultPartitionOrThrow in SchedulingTopology. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
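[Editorial note] As a rough illustration of the requested helpers: the "OrThrow" variants wrap an Optional-returning lookup so call sites don't repeat the unwrapping boilerplate. The sketch below uses plain String stand-ins for ExecutionVertexID and the vertex type; it is not the actual SchedulingTopology interface:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class SchedulingTopologySketch {

    // Simplified topology: maps vertex IDs to vertices (both plain
    // Strings here; the real code uses ExecutionVertexID etc.).
    static class Topology {
        private final Map<String, String> vertices = new HashMap<>();

        void addVertex(String id, String vertex) {
            vertices.put(id, vertex);
        }

        Optional<String> getVertex(String id) {
            return Optional.ofNullable(vertices.get(id));
        }

        // The "OrThrow" variant saves callers the repeated
        // Optional-unwrapping and produces a uniform error message.
        String getVertexOrThrow(String id) {
            return getVertex(id).orElseThrow(
                () -> new IllegalArgumentException("unknown vertex: " + id));
        }
    }

    public static void main(String[] args) {
        Topology topology = new Topology();
        topology.addVertex("v1", "source");
        System.out.println(topology.getVertexOrThrow("v1")); // prints source
    }
}
```

A getResultPartitionOrThrow would follow the same pattern for partition IDs.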
[jira] [Commented] (FLINK-12619) Support TERMINATE/SUSPEND Job with Checkpoint
[ https://issues.apache.org/jira/browse/FLINK-12619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16855236#comment-16855236 ] vinoyang commented on FLINK-12619: -- Hi [~klion26] and [~aljoscha] I think this issue needs to be considered together with FLINK-6755, which tried to introduce triggering checkpoints via the CLI a long time ago. I think we can move further discussion to that issue. > Support TERMINATE/SUSPEND Job with Checkpoint > - > > Key: FLINK-12619 > URL: https://issues.apache.org/jira/browse/FLINK-12619 > Project: Flink > Issue Type: New Feature > Components: Runtime / State Backends >Reporter: Congxian Qiu(klion26) >Assignee: Congxian Qiu(klion26) >Priority: Major > > Inspired by the idea of FLINK-11458, we propose to support terminating/suspending > a job with a checkpoint. This improvement cooperates with the incremental and > external checkpoint features: if checkpoints are retained and this feature > is configured, we will trigger a checkpoint before the job stops. It could > accelerate job recovery a lot since: > 1. No source rewinding is required any more. > 2. It's much faster than taking a savepoint when incremental checkpointing is > enabled. > Please note that conceptually a savepoint is different from a checkpoint in a > similar way that backups are different from recovery logs in traditional > database systems. So we suggest using this feature only for job recovery, > while sticking with FLINK-11458 for the > upgrading/cross-cluster-job-migration/state-backend-switch cases. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-1722) Streaming not respecting FinalizeOnMaster for output formats
[ https://issues.apache.org/jira/browse/FLINK-1722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16855222#comment-16855222 ] Haibo Sun commented on FLINK-1722: -- OK, I will. > Streaming not respecting FinalizeOnMaster for output formats > > > Key: FLINK-1722 > URL: https://issues.apache.org/jira/browse/FLINK-1722 > Project: Flink > Issue Type: Bug > Components: API / DataStream >Reporter: Robert Metzger >Priority: Major > > The Hadoop output formats execute a process in the end to move the produced > files from a temp directory to the final location. > The batch API is annotating output formats that execute something in the end > with the {{FinalizeOnMaster}} interface. > The streaming component is not respecting this interface. Hence, > HadoopOutputFormats aren't writing their final data into the desired > destination. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (FLINK-1722) Streaming not respecting FinalizeOnMaster for output formats
[ https://issues.apache.org/jira/browse/FLINK-1722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Sun reassigned FLINK-1722: Assignee: Haibo Sun > Streaming not respecting FinalizeOnMaster for output formats > > > Key: FLINK-1722 > URL: https://issues.apache.org/jira/browse/FLINK-1722 > Project: Flink > Issue Type: Bug > Components: API / DataStream >Reporter: Robert Metzger >Assignee: Haibo Sun >Priority: Major > > The Hadoop output formats execute a process in the end to move the produced > files from a temp directory to the final location. > The batch API is annotating output formats that execute something in the end > with the {{FinalizeOnMaster}} interface. > The streaming component is not respecting this interface. Hence, > HadoopOutputFormats aren't writing their final data into the desired > destination. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
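[Editorial note] For context, the batch-side contract works roughly as sketched below: a format that needs a final cleanup step (moving files from a temp directory to the final location) implements a marker interface, and the master invokes its hook once after all subtasks finish. The types here are simplified stand-ins, not Flink's actual org.apache.flink.api.common.io classes:

```java
public class FinalizeOnMasterSketch {

    // Stand-in for the batch API's marker interface.
    interface FinalizeOnMaster {
        void finalizeGlobal(int parallelism);
    }

    static class TempDirOutputFormat implements FinalizeOnMaster {
        boolean finalized = false;

        @Override
        public void finalizeGlobal(int parallelism) {
            // In a real format: move the files produced by all
            // 'parallelism' subtasks into the destination directory.
            finalized = true;
        }
    }

    // What the issue asks of the streaming runtime: after all subtasks
    // finish, check for the marker interface and run the hook once.
    static void finishJob(Object format, int parallelism) {
        if (format instanceof FinalizeOnMaster) {
            ((FinalizeOnMaster) format).finalizeGlobal(parallelism);
        }
    }

    public static void main(String[] args) {
        TempDirOutputFormat format = new TempDirOutputFormat();
        finishJob(format, 4);
        System.out.println(format.finalized); // prints true
    }
}
```

The bug is that the streaming runtime never performs the `instanceof` check and call, so the temp files are never moved.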
[GitHub] [flink] Armstrongya commented on issue #8597: [FLINK-12438][doc-zh]Translate Task Lifecycle into Chinese
Armstrongya commented on issue #8597: [FLINK-12438][doc-zh]Translate Task Lifecycle into Chinese URL: https://github.com/apache/flink/pull/8597#issuecomment-498487430 Hi @klion26, I'm confused by this Travis CI build check. I have already rebased and squashed my commit history; it now looks like a single commit, as shown in the picture below. But why does the check still fail? ![image](https://user-images.githubusercontent.com/5070956/58845324-49f88a00-86ad-11e9-8ed0-4a65cdd67a6c.png) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Updated] (FLINK-12714) confusion about flink time window TimeWindow#getWindowStartWithOffset
[ https://issues.apache.org/jira/browse/FLINK-12714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenshuai Hou updated FLINK-12714: - Description: hi flink team, i think the flink doc on how time windows are created and the javadoc of related method is a little confusing. They give me the impression that the window-start equals to the timestamp of the first event assigned to that window, however the window-start is actually quantized by the windowSize, see links below. Is this the intention or is this a mistake? If it's intended, can we please leave a comment in flink doc somewhere to make this more clear? I spent 5 hours wondering why my flink tests fail until i find that method. Thanks Wen links: The flink doc from here : [https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/windows.html#window-lifecycle] > In a nutshell, a window is *created* as soon as the first element that should > belong to this window arrives the java doc of this method: org.apache.flink.streaming.api.windowing.windows.TimeWindow#getWindowStartWithOffset {code:java} /** * Method to get the window start for a timestamp. * * @param timestamp epoch millisecond to get the window start. * @param offset The offset which window start would be shifted by. * @param windowSize The size of the generated windows. * @return window start */ public static long getWindowStartWithOffset(long timestamp, long offset, long windowSize) { {code} was: hi flink team, i think the flink doc on how time windows are created an the javadoc of related method is a little confusing. They give me the impression that the window-start equals to the timestamp of the first event assigned to that window, however the window-start is actually quantized by the windowSize, see links below. Is this the intention or is this a mistake? can we please leave a comment in flink doc somewhere to make this more clear? I spent 5 hours wondering why my flink tests fail until i find that method. 
Thanks Wen links: The flink doc from here : [https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/windows.html#window-lifecycle] > In a nutshell, a window is *created* as soon as the first element that should > belong to this window arrives the java doc of this method: org.apache.flink.streaming.api.windowing.windows.TimeWindow#getWindowStartWithOffset {code:java} /** * Method to get the window start for a timestamp. * * @param timestamp epoch millisecond to get the window start. * @param offset The offset which window start would be shifted by. * @param windowSize The size of the generated windows. * @return window start */ public static long getWindowStartWithOffset(long timestamp, long offset, long windowSize) { {code} > confusion about flink time window TimeWindow#getWindowStartWithOffset > - > > Key: FLINK-12714 > URL: https://issues.apache.org/jira/browse/FLINK-12714 > Project: Flink > Issue Type: Improvement > Components: flink-contrib >Affects Versions: 1.8.0 >Reporter: Wenshuai Hou >Priority: Minor > > > hi flink team, i think the flink doc on how time windows are created and the > javadoc of related method is a little confusing. They give me the impression > that the window-start equals to the timestamp of the first event assigned to > that window, however the window-start is actually quantized by the > windowSize, see links below. Is this the intention or is this a mistake? If > it's intended, can we please leave a comment in flink doc somewhere to make > this more clear? I spent 5 hours wondering why my flink tests fail until i > find that method. 
> > Thanks > Wen > > links: > > > The flink doc from here : > [https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/windows.html#window-lifecycle] > > > In a nutshell, a window is *created* as soon as the first element that > > should belong to this window arrives > > the java doc of this method: > org.apache.flink.streaming.api.windowing.windows.TimeWindow#getWindowStartWithOffset > > > {code:java} > /** > * Method to get the window start for a timestamp. > * > * @param timestamp epoch millisecond to get the window start. > * @param offset The offset which window start would be shifted by. > * @param windowSize The size of the generated windows. > * @return window start > */ > public static long getWindowStartWithOffset(long timestamp, long offset, long > windowSize) { > {code} > > > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
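[Editorial note] The quantization the reporter describes can be demonstrated with the alignment arithmetic behind the quoted method (this matches my reading of TimeWindow#getWindowStartWithOffset; treat it as an illustrative sketch rather than the authoritative implementation):

```java
public class WindowStartDemo {

    // The returned start is the largest offset-shifted multiple of
    // windowSize that is <= timestamp -- i.e. the start is aligned,
    // not set to the first element's timestamp.
    static long getWindowStartWithOffset(long timestamp, long offset, long windowSize) {
        return timestamp - (timestamp - offset + windowSize) % windowSize;
    }

    public static void main(String[] args) {
        // An element at t=7 with a size-5 tumbling window lands in
        // window [5, 10), not [7, 12).
        System.out.println(getWindowStartWithOffset(7, 0, 5));  // 5
        System.out.println(getWindowStartWithOffset(12, 0, 5)); // 10
        // With offset=1 the windows are [1, 6), [6, 11), ...
        System.out.println(getWindowStartWithOffset(7, 1, 5));  // 6
    }
}
```

So "a window is created as soon as the first element arrives" refers to when the window object materializes, while its boundaries are determined purely by the timestamp, offset, and window size.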
[jira] [Closed] (FLINK-12712) deprecate ExternalCatalog and its subclasses and impls
[ https://issues.apache.org/jira/browse/FLINK-12712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bowen Li closed FLINK-12712. Resolution: Fixed merged in 1.9.0: 7682248f6dca971eae1d076dbf0282358331a0e7 > deprecate ExternalCatalog and its subclasses and impls > -- > > Key: FLINK-12712 > URL: https://issues.apache.org/jira/browse/FLINK-12712 > Project: Flink > Issue Type: Sub-task > Components: Table SQL / API >Reporter: Bowen Li >Assignee: Bowen Li >Priority: Major > Labels: pull-request-available > Fix For: 1.9.0 > > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] [flink] asfgit closed pull request #8601: [FLINK-12713][table] deprecate descriptor, validator, and factory of ExternalCatalog
asfgit closed pull request #8601: [FLINK-12713][table] deprecate descriptor, validator, and factory of ExternalCatalog URL: https://github.com/apache/flink/pull/8601 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Closed] (FLINK-12713) deprecate descriptor, validator, and factory of ExternalCatalog
[ https://issues.apache.org/jira/browse/FLINK-12713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bowen Li closed FLINK-12713. Resolution: Fixed merged in 1.9.0: 61d8916a35470d6c122d9b78c1f3ae7aa9996949 > deprecate descriptor, validator, and factory of ExternalCatalog > --- > > Key: FLINK-12713 > URL: https://issues.apache.org/jira/browse/FLINK-12713 > Project: Flink > Issue Type: Sub-task > Components: Table SQL / Client >Reporter: Bowen Li >Assignee: Bowen Li >Priority: Major > Labels: pull-request-available > Fix For: 1.9.0 > > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-12714) confusion about flink time window TimeWindow#getWindowStartWithOffset
Wenshuai Hou created FLINK-12714: Summary: confusion about flink time window TimeWindow#getWindowStartWithOffset Key: FLINK-12714 URL: https://issues.apache.org/jira/browse/FLINK-12714 Project: Flink Issue Type: Improvement Components: flink-contrib Affects Versions: 1.8.0 Reporter: Wenshuai Hou hi flink team, i think the flink doc on how time windows are created and the javadoc of the related method is a little confusing. They give me the impression that the window-start equals the timestamp of the first event assigned to that window; however, the window-start is actually quantized by the windowSize, see links below. Is this the intention or is this a mistake? Can we please leave a comment in the flink doc somewhere to make this more clear? I spent 5 hours wondering why my flink tests fail until I found that method. Thanks Wen links: The flink doc from here : [https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/windows.html#window-lifecycle] > In a nutshell, a window is *created* as soon as the first element that should > belong to this window arrives the java doc of this method: org.apache.flink.streaming.api.windowing.windows.TimeWindow#getWindowStartWithOffset {code:java} /** * Method to get the window start for a timestamp. * * @param timestamp epoch millisecond to get the window start. * @param offset The offset which window start would be shifted by. * @param windowSize The size of the generated windows. * @return window start */ public static long getWindowStartWithOffset(long timestamp, long offset, long windowSize) { {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] [flink] asfgit closed pull request #8600: [FLINK-12712][table] deprecate ExternalCatalog and its subclasses, implementations, and related util classes
asfgit closed pull request #8600: [FLINK-12712][table] deprecate ExternalCatalog and its subclasses, implementations, and related util classes URL: https://github.com/apache/flink/pull/8600 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] bowenli86 commented on issue #8600: [FLINK-12712][table] deprecate ExternalCatalog and its subclasses, implementations, related util classes and tests
bowenli86 commented on issue #8600: [FLINK-12712][table] deprecate ExternalCatalog and its subclasses, implementations, related util classes and tests URL: https://github.com/apache/flink/pull/8600#issuecomment-498459861 @xuefuz thanks for your review. Merging This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] bowenli86 commented on issue #8601: [FLINK-12713][table] deprecate descriptor, validator, and factory of ExternalCatalog
bowenli86 commented on issue #8601: [FLINK-12713][table] deprecate descriptor, validator, and factory of ExternalCatalog URL: https://github.com/apache/flink/pull/8601#issuecomment-498459824 @xuefuz thanks for your review. Addressed all comments. Merging This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] xuefuz commented on a change in pull request #8549: [FLINK-12604][table-api][table-planner] Register TableSource/Sink as CatalogTables
xuefuz commented on a change in pull request #8549: [FLINK-12604][table-api][table-planner] Register TableSource/Sink as CatalogTables URL: https://github.com/apache/flink/pull/8549#discussion_r290072835 ## File path: flink-table/flink-table-api-java-bridge/src/main/java/org/apache/flink/table/operations/DataStreamTableOperation.java ## @@ -0,0 +1,85 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.table.operations; + +import org.apache.flink.annotation.Internal; +import org.apache.flink.streaming.api.datastream.DataStream; +import org.apache.flink.table.api.TableSchema; + +import java.util.Collections; +import java.util.LinkedHashMap; +import java.util.List; +import java.util.Map; + +/** + * Describes a relational operation that reads from a {@link DataStream}. + * + * This operation may expose only part, or change the order of the fields available in a + * {@link org.apache.flink.api.common.typeutils.CompositeType} of the underlying {@link DataStream}. + * The {@link DataStreamTableOperation#getFieldIndices()} describes the mapping between fields of the + * {@link TableSchema} to the {@link org.apache.flink.api.common.typeutils.CompositeType}. 
+ */ +@Internal +public class DataStreamTableOperation extends TableOperation { Review comment: javadoc for ? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] xuefuz commented on a change in pull request #8549: [FLINK-12604][table-api][table-planner] Register TableSource/Sink as CatalogTables
xuefuz commented on a change in pull request #8549: [FLINK-12604][table-api][table-planner] Register TableSource/Sink as CatalogTables URL: https://github.com/apache/flink/pull/8549#discussion_r290072478 ## File path: flink-table/flink-table-api-java-bridge/src/main/java/org/apache/flink/table/operations/DataSetTableOperation.java ## @@ -0,0 +1,84 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.table.operations; + +import org.apache.flink.annotation.Internal; +import org.apache.flink.api.java.DataSet; +import org.apache.flink.table.api.TableSchema; + +import java.util.Collections; +import java.util.LinkedHashMap; +import java.util.List; +import java.util.Map; + +/** + * Describes a relational operation that reads from a {@link DataSet}. + * + * This operation may expose only part, or change the order of the fields available in a + * {@link org.apache.flink.api.common.typeutils.CompositeType} of the underlying {@link DataSet}. + * The {@link DataSetTableOperation#getFieldIndices()} describes the mapping between fields of the + * {@link TableSchema} to the {@link org.apache.flink.api.common.typeutils.CompositeType}. 
+ */ +@Internal +public class DataSetTableOperation extends TableOperation { + + private final DataSet dataStream; Review comment: Is there a special reason that we name the variable as dataStream instead of dataSet? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] xuefuz commented on a change in pull request #8549: [FLINK-12604][table-api][table-planner] Register TableSource/Sink as CatalogTables
xuefuz commented on a change in pull request #8549: [FLINK-12604][table-api][table-planner] Register TableSource/Sink as CatalogTables URL: https://github.com/apache/flink/pull/8549#discussion_r290072031 ## File path: flink-table/flink-table-api-java-bridge/src/main/java/org/apache/flink/table/operations/DataSetTableOperation.java ## @@ -0,0 +1,84 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.table.operations; + +import org.apache.flink.annotation.Internal; +import org.apache.flink.api.java.DataSet; +import org.apache.flink.table.api.TableSchema; + +import java.util.Collections; +import java.util.LinkedHashMap; +import java.util.List; +import java.util.Map; + +/** + * Describes a relational operation that reads from a {@link DataSet}. + * + * This operation may expose only part, or change the order of the fields available in a + * {@link org.apache.flink.api.common.typeutils.CompositeType} of the underlying {@link DataSet}. + * The {@link DataSetTableOperation#getFieldIndices()} describes the mapping between fields of the + * {@link TableSchema} to the {@link org.apache.flink.api.common.typeutils.CompositeType}. 
+ */ +@Internal +public class DataSetTableOperation extends TableOperation { Review comment: javadoc for ? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] xuefuz commented on a change in pull request #8601: [FLINK-12713][table] deprecate descriptor, validator, and factory of ExternalCatalog
xuefuz commented on a change in pull request #8601: [FLINK-12713][table] deprecate descriptor, validator, and factory of ExternalCatalog URL: https://github.com/apache/flink/pull/8601#discussion_r290068591 ## File path: flink-table/flink-table-common/src/main/java/org/apache/flink/table/descriptors/ExternalCatalogDescriptorValidator.java ## @@ -18,12 +18,12 @@ package org.apache.flink.table.descriptors; -import org.apache.flink.annotation.Internal; - /** * Validator for {@link ExternalCatalogDescriptor}. + * + * @deprecated use {@link CatalogDescriptorValidator} instead. */ -@Internal +@Deprecated Review comment: Could we keep @Internal? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Commented] (FLINK-12626) Client should not register table-source-sink twice in TableEnvironment
[ https://issues.apache.org/jira/browse/FLINK-12626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16855095#comment-16855095 ] Bowen Li commented on FLINK-12626: -- [~twalthr] by taking a look at the {{TableEnvImpl.registerTableSourceInternal()}}, I think it should solve this JIRA. I will do a followup testing and add unit tests after FLINK-12604 is merged. > Client should not register table-source-sink twice in TableEnvironment > -- > > Key: FLINK-12626 > URL: https://issues.apache.org/jira/browse/FLINK-12626 > Project: Flink > Issue Type: Sub-task > Components: Table SQL / Client >Reporter: Bowen Li >Priority: Major > Fix For: 1.9.0 > > > Currently for a table specified in SQL CLI yaml file, if it's with type: > source-sink-table, it will be registered twice (as source and as sink) to > TableEnv. > As we've moved table management in TableEnv to Catalogs which doesn't allow > registering dup named table, we need to come up with a solution to fix this > problem. > cc [~xuefuz] [~tiwalter] [~dawidwys] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] [flink] bowenli86 commented on issue #8601: [FLINK-12713][table] deprecate descriptor, validator, and factory of ExternalCatalog
bowenli86 commented on issue #8601: [FLINK-12713][table] deprecate descriptor, validator, and factory of ExternalCatalog URL: https://github.com/apache/flink/pull/8601#issuecomment-498431690 @xuefuz as we discussed offline, I removed annotations from tests and leave them for production code. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] bowenli86 commented on issue #8600: [FLINK-12712][table] deprecate ExternalCatalog and its subclasses, implementations, related util classes and tests
bowenli86 commented on issue #8600: [FLINK-12712][table] deprecate ExternalCatalog and its subclasses, implementations, related util classes and tests URL: https://github.com/apache/flink/pull/8600#issuecomment-498431177 @xuefuz as we discussed offline, I removed annotations from tests and leave them for production code. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] flinkbot commented on issue #8602: [FLINK-12313] Add workaround to avoid race condition in SynchronousCheckpointITCase test
flinkbot commented on issue #8602: [FLINK-12313] Add workaround to avoid race condition in SynchronousCheckpointITCase test URL: https://github.com/apache/flink/pull/8602#issuecomment-498423773 Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community to review your pull request. We will use this comment to track the progress of the review. ## Review Progress * ❓ 1. The [description] looks good. * ❓ 2. There is [consensus] that the contribution should go into Flink. * ❓ 3. Needs [attention] from. * ❓ 4. The change fits into the overall [architecture]. * ❓ 5. Overall code [quality] is good. Please see the [Pull Request Review Guide](https://flink.apache.org/reviewing-prs.html) for a full explanation of the review process. The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required Bot commands The @flinkbot bot supports the following commands: - `@flinkbot approve description` to approve one or more aspects (aspects: `description`, `consensus`, `architecture` and `quality`) - `@flinkbot approve all` to approve all aspects - `@flinkbot approve-until architecture` to approve everything until `architecture` - `@flinkbot attention @username1 [@username2 ..]` to require somebody's attention - `@flinkbot disapprove architecture` to remove an approval you gave earlier This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Updated] (FLINK-12313) SynchronousCheckpointITCase.taskCachedThreadPoolAllowsForSynchronousCheckpoints is unstable
[ https://issues.apache.org/jira/browse/FLINK-12313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-12313: --- Labels: pull-request-available test-stability (was: test-stability) > SynchronousCheckpointITCase.taskCachedThreadPoolAllowsForSynchronousCheckpoints > is unstable > --- > > Key: FLINK-12313 > URL: https://issues.apache.org/jira/browse/FLINK-12313 > Project: Flink > Issue Type: Bug > Components: Runtime / Checkpointing >Reporter: Ufuk Celebi >Assignee: Alex >Priority: Critical > Labels: pull-request-available, test-stability > > {{SynchronousCheckpointITCase.taskCachedThreadPoolAllowsForSynchronousCheckpoints}} > fails and prints the Thread stack traces due to no output on Travis > occasionally. > {code} > == > Printing stack trace of Java process 10071 > == > 2019-04-24 07:55:29 > Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.151-b12 mixed mode): > "Attach Listener" #17 daemon prio=9 os_prio=0 tid=0x7f294892 > nid=0x2cf5 waiting on condition [0x] >java.lang.Thread.State: RUNNABLE > "Async calls on Test Task (1/1)" #15 daemon prio=5 os_prio=0 > tid=0x7f2948dd1800 nid=0x27a9 waiting on condition [0x7f292cea9000] >java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x8bb5e558> (a > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039) > at > java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) > at > java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > "Async calls on Test Task 
(1/1)" #14 daemon prio=5 os_prio=0 > tid=0x7f2948dce800 nid=0x27a8 in Object.wait() [0x7f292cfaa000] >java.lang.Thread.State: WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > - waiting on <0x8bac58f8> (a java.lang.Object) > at java.lang.Object.wait(Object.java:502) > at > org.apache.flink.streaming.runtime.tasks.SynchronousSavepointLatch.blockUntilCheckpointIsAcknowledged(SynchronousSavepointLatch.java:66) > - locked <0x8bac58f8> (a java.lang.Object) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(StreamTask.java:726) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpoint(StreamTask.java:604) > at > org.apache.flink.streaming.runtime.tasks.SynchronousCheckpointITCase$SynchronousCheckpointTestingTask.triggerCheckpoint(SynchronousCheckpointITCase.java:174) > at org.apache.flink.runtime.taskmanager.Task$1.run(Task.java:1182) > at > java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1626) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > "CloseableReaperThread" #13 daemon prio=5 os_prio=0 tid=0x7f2948d9b800 > nid=0x27a7 in Object.wait() [0x7f292d0ab000] >java.lang.Thread.State: WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > - waiting on <0x8bbe3990> (a java.lang.ref.ReferenceQueue$Lock) > at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:143) > - locked <0x8bbe3990> (a java.lang.ref.ReferenceQueue$Lock) > at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:164) > at > org.apache.flink.core.fs.SafetyNetCloseableRegistry$CloseableReaperThread.run(SafetyNetCloseableRegistry.java:193) > "Test Task (1/1)" #12 prio=5 os_prio=0 tid=0x7f2948d97000 nid=0x27a6 in > Object.wait() [0x7f292d1ac000] >java.lang.Thread.State: WAITING (on object monitor) > at 
java.lang.Object.wait(Native Method) > - waiting on <0x8e63f7d8> (a java.lang.Object) > at java.lang.Object.wait(Object.java:502) > at > org.apache.flink.core.testutils.OneShotLatch.await(OneShotLatch.java:63) > - locked <0x8e63f7d8> (a java.lang.Object) > at > org.
[GitHub] [flink] 1u0 opened a new pull request #8602: [FLINK-12313] Add workaround to avoid race condition in SynchronousCheckpointITCase test
1u0 opened a new pull request #8602: [FLINK-12313] Add workaround to avoid race condition in SynchronousCheckpointITCase test URL: https://github.com/apache/flink/pull/8602 ## What is the purpose of the change The `SynchronousCheckpointITCase` has a race condition and may occasionally fail on CI. This PR adds an additional synchronization point as a workaround to avoid the issue. Additionally, the PR simplifies (refactors) the test, although the race condition is present in both the refactored and non-refactored versions. ## Brief change log - The test is simplified to not use dedicated locks for general flow checks. The test run is limited by a timeout. - Added a hacky synchronization point in the `advanceToEndOfEventTime` method, to make sure that the `triggerCheckpoint` thread has progressed far enough before `notifyCheckpointComplete` is triggered. ## Verifying this change This change is a trivial rework / code cleanup without any test coverage. ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): (yes / **no**) - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (yes / **no**) - The serializers: (yes / **no** / don't know) - The runtime per-record code paths (performance sensitive): (yes / **no** / don't know) - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / **no** / don't know) - The S3 file system connector: (yes / **no** / don't know) ## Documentation - Does this pull request introduce a new feature? (yes / **no**) - If yes, how is the feature documented? (**not applicable** / docs / JavaDocs / not documented)
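The synchronization point the PR describes — blocking the completion notification until the trigger thread has progressed past a known point — can be sketched with a plain `java.util.concurrent.CountDownLatch`. The class and method names below are illustrative only; the actual test uses Flink's internal latches, not this code.

```java
import java.util.concurrent.CountDownLatch;

public class SyncPointSketch {
    // Counted down once triggerCheckpoint() has progressed far enough.
    static final CountDownLatch checkpointTriggered = new CountDownLatch(1);
    static volatile boolean notified = false;

    static void triggerCheckpoint() {
        // ... checkpoint work would happen here ...
        checkpointTriggered.countDown(); // sync point: trigger thread got this far
    }

    static void notifyCheckpointComplete() throws InterruptedException {
        checkpointTriggered.await(); // block until the trigger thread reached the sync point
        notified = true;
    }

    public static void main(String[] args) throws Exception {
        Thread notifier = new Thread(() -> {
            try {
                notifyCheckpointComplete();
            } catch (InterruptedException ignored) {
            }
        });
        Thread trigger = new Thread(SyncPointSketch::triggerCheckpoint);
        notifier.start(); // blocks on the latch until the trigger thread counts it down
        trigger.start();
        trigger.join();
        notifier.join();
        System.out.println(notified); // prints "true"
    }
}
```

Without the latch, the notifier thread could run before the trigger thread is ready, which is the shape of the race the test exhibited.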
[jira] [Assigned] (FLINK-12313) SynchronousCheckpointITCase.taskCachedThreadPoolAllowsForSynchronousCheckpoints is unstable
[ https://issues.apache.org/jira/browse/FLINK-12313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex reassigned FLINK-12313: Assignee: Alex > SynchronousCheckpointITCase.taskCachedThreadPoolAllowsForSynchronousCheckpoints > is unstable > --- > > Key: FLINK-12313 > URL: https://issues.apache.org/jira/browse/FLINK-12313 > Project: Flink > Issue Type: Bug > Components: Runtime / Checkpointing >Reporter: Ufuk Celebi >Assignee: Alex >Priority: Critical > Labels: test-stability > > {{SynchronousCheckpointITCase.taskCachedThreadPoolAllowsForSynchronousCheckpoints}} > fails and prints the Thread stack traces due to no output on Travis > occasionally. > {code} > == > Printing stack trace of Java process 10071 > == > 2019-04-24 07:55:29 > Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.151-b12 mixed mode): > "Attach Listener" #17 daemon prio=9 os_prio=0 tid=0x7f294892 > nid=0x2cf5 waiting on condition [0x] >java.lang.Thread.State: RUNNABLE > "Async calls on Test Task (1/1)" #15 daemon prio=5 os_prio=0 > tid=0x7f2948dd1800 nid=0x27a9 waiting on condition [0x7f292cea9000] >java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x8bb5e558> (a > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039) > at > java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) > at > java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > "Async calls on Test Task (1/1)" #14 daemon prio=5 os_prio=0 > tid=0x7f2948dce800 nid=0x27a8 in Object.wait() 
[0x7f292cfaa000] >java.lang.Thread.State: WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > - waiting on <0x8bac58f8> (a java.lang.Object) > at java.lang.Object.wait(Object.java:502) > at > org.apache.flink.streaming.runtime.tasks.SynchronousSavepointLatch.blockUntilCheckpointIsAcknowledged(SynchronousSavepointLatch.java:66) > - locked <0x8bac58f8> (a java.lang.Object) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(StreamTask.java:726) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpoint(StreamTask.java:604) > at > org.apache.flink.streaming.runtime.tasks.SynchronousCheckpointITCase$SynchronousCheckpointTestingTask.triggerCheckpoint(SynchronousCheckpointITCase.java:174) > at org.apache.flink.runtime.taskmanager.Task$1.run(Task.java:1182) > at > java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1626) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > "CloseableReaperThread" #13 daemon prio=5 os_prio=0 tid=0x7f2948d9b800 > nid=0x27a7 in Object.wait() [0x7f292d0ab000] >java.lang.Thread.State: WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > - waiting on <0x8bbe3990> (a java.lang.ref.ReferenceQueue$Lock) > at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:143) > - locked <0x8bbe3990> (a java.lang.ref.ReferenceQueue$Lock) > at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:164) > at > org.apache.flink.core.fs.SafetyNetCloseableRegistry$CloseableReaperThread.run(SafetyNetCloseableRegistry.java:193) > "Test Task (1/1)" #12 prio=5 os_prio=0 tid=0x7f2948d97000 nid=0x27a6 in > Object.wait() [0x7f292d1ac000] >java.lang.Thread.State: WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > - waiting on <0x8e63f7d8> (a java.lang.Object) > at 
java.lang.Object.wait(Object.java:502) > at > org.apache.flink.core.testutils.OneShotLatch.await(OneShotLatch.java:63) > - locked <0x8e63f7d8> (a java.lang.Object) > at > org.apache.flink.streaming.runtime.tasks.SynchronousCheckpointITCase$SynchronousCheckpointTesti
[GitHub] [flink] asfgit closed pull request #8536: [FLINK-12568][hive] Implement OutputFormat to write Hive tables
asfgit closed pull request #8536: [FLINK-12568][hive] Implement OutputFormat to write Hive tables URL: https://github.com/apache/flink/pull/8536
[GitHub] [flink] bowenli86 commented on issue #8536: [FLINK-12568][hive] Implement OutputFormat to write Hive tables
bowenli86 commented on issue #8536: [FLINK-12568][hive] Implement OutputFormat to write Hive tables URL: https://github.com/apache/flink/pull/8536#issuecomment-498416544 Moreover, please use a correct commit message in the future. E.g., the first commit's message should be "[FLINK-12568][hive] Implement OutputFormat to write Hive tables"
[jira] [Commented] (FLINK-12070) Make blocking result partitions consumable multiple times
[ https://issues.apache.org/jira/browse/FLINK-12070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16855013#comment-16855013 ] Stefan Richter commented on FLINK-12070: I have one additional question: can you share the heap size of your JVM and also the ratio of JVM heap to total physical memory? It's possible that the new implementation could perform better with different memory settings, i.e. only the required size for JVM heap and keeping more physical memory available for mmap - please note that mmap does not account for heap size, but for process memory. > Make blocking result partitions consumable multiple times > - > > Key: FLINK-12070 > URL: https://issues.apache.org/jira/browse/FLINK-12070 > Project: Flink > Issue Type: Improvement > Components: Runtime / Network >Affects Versions: 1.9.0 >Reporter: Till Rohrmann >Assignee: Stephan Ewen >Priority: Blocker > Labels: pull-request-available > Fix For: 1.9.0 > > Attachments: image-2019-04-18-17-38-24-949.png > > Time Spent: 20m > Remaining Estimate: 0h > > In order to avoid writing produced results multiple times for multiple > consumers and in order to speed up batch recoveries, we should make the > blocking result partitions to be consumable multiple times. At the moment a > blocking result partition will be released once the consumers has processed > all data. Instead the result partition should be released once the next > blocking result has been produced and all consumers of a blocking result > partition have terminated. Moreover, blocking results should not hold on slot > resources like network buffers or memory as it is currently the case with > {{SpillableSubpartitions}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] [flink] bowenli86 commented on a change in pull request #8549: [FLINK-12604][table-api][table-planner] Register TableSource/Sink as CatalogTables
bowenli86 commented on a change in pull request #8549: [FLINK-12604][table-api][table-planner] Register TableSource/Sink as CatalogTables URL: https://github.com/apache/flink/pull/8549#discussion_r289974528 ## File path: flink-table/flink-table-api-java/src/main/java/org/apache/flink/table/catalog/ConnectorCatalogTable.java ## @@ -0,0 +1,165 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.flink.table.catalog; + +import org.apache.flink.annotation.Internal; +import org.apache.flink.annotation.VisibleForTesting; +import org.apache.flink.api.common.typeinfo.TypeInformation; +import org.apache.flink.table.api.TableSchema; +import org.apache.flink.table.sinks.TableSink; +import org.apache.flink.table.sources.DefinedProctimeAttribute; +import org.apache.flink.table.sources.DefinedRowtimeAttributes; +import org.apache.flink.table.sources.RowtimeAttributeDescriptor; +import org.apache.flink.table.sources.TableSource; +import org.apache.flink.table.typeutils.TimeIndicatorTypeInfo; + +import java.util.Arrays; +import java.util.Collections; +import java.util.List; +import java.util.Map; +import java.util.Optional; +import java.util.stream.Collectors; + +/** + * A {@link CatalogTable} that wraps a {@link TableSource} and/or {@link TableSink}. + * This allows registering those in a {@link Catalog}. It can not be persisted as the + * source and/or sink might be inline implementations and not be representable in a + * property based form. + * + * @param <T1> type of the produced elements by the {@link TableSource} + * @param <T2> type of the expected elements by the {@link TableSink} + */ +@Internal +public class ConnectorCatalogTable<T1, T2> extends AbstractCatalogTable { + private final TableSource<T1> tableSource; + private final TableSink<T2> tableSink; + private final boolean isBatch; Review comment: just want to double check: unlike a persistent catalog table that can be both batch and streaming, `ConnectorCatalogTable` can only be either batch or streaming, right?
[GitHub] [flink] bowenli86 commented on a change in pull request #8549: [FLINK-12604][table-api][table-planner] Register TableSource/Sink as CatalogTables
bowenli86 commented on a change in pull request #8549: [FLINK-12604][table-api][table-planner] Register TableSource/Sink as CatalogTables URL: https://github.com/apache/flink/pull/8549#discussion_r289972929 ## File path: flink-table/flink-table-api-java/src/main/java/org/apache/flink/table/catalog/ConnectorCatalogTable.java ## @@ -0,0 +1,165 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.flink.table.catalog; + +import org.apache.flink.annotation.Internal; +import org.apache.flink.annotation.VisibleForTesting; +import org.apache.flink.api.common.typeinfo.TypeInformation; +import org.apache.flink.table.api.TableSchema; +import org.apache.flink.table.sinks.TableSink; +import org.apache.flink.table.sources.DefinedProctimeAttribute; +import org.apache.flink.table.sources.DefinedRowtimeAttributes; +import org.apache.flink.table.sources.RowtimeAttributeDescriptor; +import org.apache.flink.table.sources.TableSource; +import org.apache.flink.table.typeutils.TimeIndicatorTypeInfo; + +import java.util.Arrays; +import java.util.Collections; +import java.util.List; +import java.util.Map; +import java.util.Optional; +import java.util.stream.Collectors; + +/** + * A {@link CatalogTable} that wraps a {@link TableSource} and/or {@link TableSink}. + * This allows registering those in a {@link Catalog}. It can not be persisted as the + * source and/or sink might be inline implementations and not be representable in a Review comment: a bit confused by the "might be" here. Are some `ConnectorCatalogTable`s inline impls, and some are not and can be converted to properties? The exception thrown in `toProperties()` indicates that no `ConnectorCatalogTable` can be converted to properties
[GitHub] [flink] xuefuz commented on a change in pull request #8601: [FLINK-12713][table] deprecate descriptor, validator, and factory of ExternalCatalog
xuefuz commented on a change in pull request #8601: [FLINK-12713][table] deprecate descriptor, validator, and factory of ExternalCatalog URL: https://github.com/apache/flink/pull/8601#discussion_r289984057 ## File path: flink-table/flink-table-common/src/test/java/org/apache/flink/table/descriptors/ExternalCatalogDescriptorTest.java ## @@ -36,6 +36,7 @@ * Tests for the {@link ExternalCatalogDescriptor} descriptor and * {@link ExternalCatalogDescriptorValidator} validator. */ +@Deprecated Review comment: This is not user facing, so I don't think we need to deprecate it.
[GitHub] [flink] xuefuz commented on a change in pull request #8601: [FLINK-12713][table] deprecate descriptor, validator, and factory of ExternalCatalog
xuefuz commented on a change in pull request #8601: [FLINK-12713][table] deprecate descriptor, validator, and factory of ExternalCatalog URL: https://github.com/apache/flink/pull/8601#discussion_r289984145 ## File path: flink-table/flink-table-planner/src/test/scala/org/apache/flink/table/factories/ExternalCatalogFactoryServiceTest.scala ## @@ -31,6 +31,7 @@ import org.junit.Test * Tests for testing external catalog discovery using [[TableFactoryService]]. The tests assume the * external catalog factory [[TestExternalCatalogFactory]] is registered. */ +@Deprecated Review comment: Same as above.
[GitHub] [flink] xuefuz commented on a change in pull request #8601: [FLINK-12713][table] deprecate descriptor, validator, and factory of ExternalCatalog
xuefuz commented on a change in pull request #8601: [FLINK-12713][table] deprecate descriptor, validator, and factory of ExternalCatalog URL: https://github.com/apache/flink/pull/8601#discussion_r289984202 ## File path: flink-table/flink-table-planner/src/test/scala/org/apache/flink/table/factories/utils/TestExternalCatalogFactory.scala ## @@ -37,6 +37,7 @@ import org.apache.flink.table.runtime.utils.CommonTestData * The catalog produces tables intended for either a streaming or batch environment, * based on the descriptor property {{{ is-streaming }}}. */ +@Deprecated Review comment: Same...
[GitHub] [flink] xuefuz commented on a change in pull request #8601: [FLINK-12713][table] deprecate descriptor, validator, and factory of ExternalCatalog
xuefuz commented on a change in pull request #8601: [FLINK-12713][table] deprecate descriptor, validator, and factory of ExternalCatalog URL: https://github.com/apache/flink/pull/8601#discussion_r289983564 ## File path: flink-table/flink-table-common/src/main/java/org/apache/flink/table/descriptors/ExternalCatalogDescriptorValidator.java ## @@ -18,12 +18,12 @@ package org.apache.flink.table.descriptors; -import org.apache.flink.annotation.Internal; - /** * Validator for {@link ExternalCatalogDescriptor}. + * + * @deprecated use {@link CatalogDescriptorValidator} instead. */ -@Internal +@Deprecated Review comment: We can probably keep @internal while adding @Deprecated
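The suggestion above — keeping `@Internal` while also adding `@Deprecated` — can be sketched as follows. The `Internal` annotation below is a locally defined stand-in (the real one is `org.apache.flink.annotation.Internal`), and the class name is illustrative, not the actual Flink source:

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

public class DeprecationSketch {
    // Stand-in for org.apache.flink.annotation.Internal (illustrative only).
    @Retention(RetentionPolicy.RUNTIME)
    @interface Internal {}

    // The suggested pattern: the type stays marked as internal API *and* is
    // flagged deprecated, so the audience annotation is preserved while
    // compilers and IDEs still warn on use.
    @Internal
    @Deprecated
    static class ExternalCatalogDescriptorValidatorLike {}

    public static void main(String[] args) {
        Class<?> c = ExternalCatalogDescriptorValidatorLike.class;
        // Both annotations have runtime retention, so both are visible here.
        System.out.println(c.isAnnotationPresent(Internal.class)
                && c.isAnnotationPresent(Deprecated.class)); // prints "true"
    }
}
```

The two annotations are orthogonal: one states who the API is for, the other states that it is going away.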
[GitHub] [flink] xuefuz commented on a change in pull request #8600: [FLINK-12712][table] deprecate ExternalCatalog and its subclasses, implementations, related util classes and tests
xuefuz commented on a change in pull request #8600: [FLINK-12712][table] deprecate ExternalCatalog and its subclasses, implementations, related util classes and tests URL: https://github.com/apache/flink/pull/8600#discussion_r289982158 ## File path: flink-table/flink-table-planner/src/test/scala/org/apache/flink/table/catalog/ExternalCatalogSchemaTest.scala ## @@ -36,6 +36,7 @@ import org.junit.{Before, Test} import scala.collection.JavaConverters._ +@deprecated Review comment: Same
[GitHub] [flink] bowenli86 commented on a change in pull request #8589: [FLINK-12677][hive][sql-client] Add descriptor, validator, and factory for HiveCatalog
bowenli86 commented on a change in pull request #8589: [FLINK-12677][hive][sql-client] Add descriptor, validator, and factory for HiveCatalog URL: https://github.com/apache/flink/pull/8589#discussion_r289982183 ## File path: flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/table/catalog/hive/factories/HiveCatalogFactory.java ## @@ -0,0 +1,94 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.flink.table.catalog.hive.factories; + +import org.apache.flink.annotation.VisibleForTesting; +import org.apache.flink.table.catalog.Catalog; +import org.apache.flink.table.catalog.hive.HiveCatalog; +import org.apache.flink.table.catalog.hive.descriptors.HiveCatalogValidator; +import org.apache.flink.table.descriptors.DescriptorProperties; +import org.apache.flink.table.factories.CatalogFactory; + +import org.apache.hadoop.hive.conf.HiveConf; + +import java.util.ArrayList; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import java.util.Optional; + +import static org.apache.flink.table.catalog.hive.descriptors.HiveCatalogValidator.CATALOG_PROPERTOES_HIVE_METASTORE_URIS; +import static org.apache.flink.table.catalog.hive.descriptors.HiveCatalogValidator.CATALOG_TYPE_VALUE_HIVE; +import static org.apache.flink.table.descriptors.CatalogDescriptorValidator.CATALOG_DEFAULT_DATABASE; +import static org.apache.flink.table.descriptors.CatalogDescriptorValidator.CATALOG_PROPERTIES; +import static org.apache.flink.table.descriptors.CatalogDescriptorValidator.CATALOG_PROPERTY_VERSION; +import static org.apache.flink.table.descriptors.CatalogDescriptorValidator.CATALOG_TYPE; + +/** + * Catalog factory for {@link HiveCatalog}. + */ +public class HiveCatalogFactory implements CatalogFactory { + + @Override + public Map<String, String> requiredContext() { + Map<String, String> context = new HashMap<>(); + context.put(CATALOG_TYPE, CATALOG_TYPE_VALUE_HIVE); // hive + context.put(CATALOG_PROPERTY_VERSION, "1"); // backwards compatibility + return context; + } + + @Override + public List<String> supportedProperties() { + List<String> properties = new ArrayList<>(); + + // default database + properties.add(CATALOG_DEFAULT_DATABASE); + + properties.add(CATALOG_PROPERTIES); Review comment: removed
[GitHub] [flink] xuefuz commented on a change in pull request #8600: [FLINK-12712][table] deprecate ExternalCatalog and its subclasses, implementations, related util classes and tests
xuefuz commented on a change in pull request #8600: [FLINK-12712][table] deprecate ExternalCatalog and its subclasses, implementations, related util classes and tests URL: https://github.com/apache/flink/pull/8600#discussion_r289982243 ## File path: flink-table/flink-table-planner/src/test/scala/org/apache/flink/table/catalog/InMemoryExternalCatalogTest.scala ## @@ -24,6 +24,7 @@ import org.apache.flink.table.descriptors.{ConnectorDescriptor, Schema} import org.junit.Assert._ import org.junit.{Before, Test} +@deprecated Review comment: Same.
[GitHub] [flink] xuefuz commented on a change in pull request #8600: [FLINK-12712][table] deprecate ExternalCatalog and its subclasses, implementations, related util classes and tests
xuefuz commented on a change in pull request #8600: [FLINK-12712][table] deprecate ExternalCatalog and its subclasses, implementations, related util classes and tests URL: https://github.com/apache/flink/pull/8600#discussion_r289982051 ## File path: flink-table/flink-table-planner/src/test/scala/org/apache/flink/table/api/ExternalCatalogTest.scala ## @@ -27,6 +27,7 @@ import org.junit.Test /** * Test for external catalog query plan. */ +@deprecated Review comment: Same as above.
[GitHub] [flink] xuefuz commented on a change in pull request #8600: [FLINK-12712][table] deprecate ExternalCatalog and its subclasses, implementations, related util classes and tests
xuefuz commented on a change in pull request #8600: [FLINK-12712][table] deprecate ExternalCatalog and its subclasses, implementations, related util classes and tests URL: https://github.com/apache/flink/pull/8600#discussion_r289981952 ## File path: flink-table/flink-table-planner/src/test/scala/org/apache/flink/table/api/ExternalCatalogInsertTest.scala ## @@ -29,6 +29,7 @@ import org.junit.Test /** * Test for inserting into tables from external catalog. */ +@deprecated Review comment: Same as above.
[GitHub] [flink] xuefuz commented on a change in pull request #8600: [FLINK-12712][table] deprecate ExternalCatalog and its subclasses, implementations, related util classes and tests
xuefuz commented on a change in pull request #8600: [FLINK-12712][table] deprecate ExternalCatalog and its subclasses, implementations, related util classes and tests URL: https://github.com/apache/flink/pull/8600#discussion_r289981786 ## File path: flink-table/flink-table-planner/src/main/scala/org/apache/flink/table/catalog/ExternalCatalogSchema.scala ## @@ -37,7 +37,10 @@ import scala.collection.JavaConverters._ * * @param catalogIdentifier external catalog name * @param catalog external catalog + * + * @deprecated use [[CatalogCalciteSchema]] instead. */ +@deprecated Review comment: Do we need this? I think we don't have to unless this is user facing.