[jira] [Commented] (FLINK-5517) Upgrade hbase version to 1.3.0
[ https://issues.apache.org/jira/browse/FLINK-5517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15849595#comment-15849595 ] ASF GitHub Bot commented on FLINK-5517: --- Github user tzulitai commented on the issue: https://github.com/apache/flink/pull/3235 Thank you for looking into this @tedyu. In the ML (http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Help-using-HBase-with-Flink-1-1-4-td11094.html), Giuliano has also reported that the upgrade fixes the problem. +1 to merge. > Upgrade hbase version to 1.3.0 > -- > > Key: FLINK-5517 > URL: https://issues.apache.org/jira/browse/FLINK-5517 > Project: Flink > Issue Type: Improvement > Components: Streaming Connectors > Reporter: Ted Yu > > In the thread 'Help using HBase with Flink 1.1.4', Giuliano reported seeing: > {code} > java.lang.IllegalAccessError: tried to access method > com.google.common.base.Stopwatch.<init>()V from class > org.apache.hadoop.hbase.zookeeper.MetaTableLocator > {code} > The above has been solved by HBASE-14963 > hbase 1.3.0 is being released. > We should upgrade hbase dependency to 1.3.0 -- This message was sent by Atlassian JIRA (v6.3.15#6346)
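The upgrade the issue asks for amounts to bumping the HBase dependency version in the build. A sketch of what such a bump could look like in a Maven pom; the `hbase.version` property name and the exact module receiving the dependency are assumptions for illustration, not taken from the PR:

```xml
<!-- Sketch only: property and module placement are assumptions, not from PR #3235. -->
<properties>
  <!-- 1.3.0 contains the HBASE-14963 fix for the Guava Stopwatch IllegalAccessError -->
  <hbase.version>1.3.0</hbase.version>
</properties>

<dependency>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase-client</artifactId>
  <version>${hbase.version}</version>
</dependency>
```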
[GitHub] flink pull request #3237: [FLINK-5474] [docs] Extend DC/OS documentation
Github user tzulitai commented on a diff in the pull request: https://github.com/apache/flink/pull/3237#discussion_r99064341 --- Diff: docs/setup/mesos.md --- @@ -92,18 +92,25 @@ Universe. In the search prompt, just search for Flink. **Note**: At the time of this writing, Flink was not yet available in the Unvierse. Please use the following workaround in the meantime: --- End diff -- This isn't part of your change, but just spotted this typo :-) : "Universe" --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-5474) Extend DC/OS documentation
[ https://issues.apache.org/jira/browse/FLINK-5474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15849587#comment-15849587 ] ASF GitHub Bot commented on FLINK-5474: --- Github user tzulitai commented on a diff in the pull request: https://github.com/apache/flink/pull/3237#discussion_r99064919 --- Diff: docs/setup/mesos.md --- @@ -92,18 +92,25 @@ Universe. In the search prompt, just search for Flink. **Note**: At the time of this writing, Flink was not yet available in the Unvierse. Please use the following workaround in the meantime: -1. Add the Development Universe +1. [Install the DC/OS CLI](https://dcos.io/docs/1.8/usage/cli/install/) -`dcos marathon app add https://raw.githubusercontent.com/mesosphere/dcos-flink-service/Makman2/quickstart/universe-server.json` +2. Add the Development Universe + +`./dcos marathon app add https://raw.githubusercontent.com/mesosphere/dcos-flink-service/Makman2/quickstart/universe-server.json` -2. Add the local Universe repository: +3. Add the local Universe repository: - `dcos package repo add --index=0 dev-universe http://universe.marathon.mesos:8085/repo` + `./dcos package repo add --index=0 dev-universe http://universe.marathon.mesos:8085/repo` -3. Install Flink through the Universe page or using the `dcos` command: +4. Install Flink through the Universe page or using the `dcos` command: - `dcos package install flink` + `./dcos package install flink` + +In order to execute a Flink job on a DC/OS hosted Flink cluster, you first have to find out the address of the launched job manager. +The job manager address can be found out by opening the Flink service, going to *Job Manager* and then using the address specified under `jobmanager.rpc.address` and `jobmanager.rpc.port`. 
--- End diff -- "JobManager" > Extend DC/OS documentation > -- > > Key: FLINK-5474 > URL: https://issues.apache.org/jira/browse/FLINK-5474 > Project: Flink > Issue Type: Sub-task > Components: Mesos >Reporter: Till Rohrmann >Assignee: Till Rohrmann >Priority: Minor > Fix For: 1.2.0, 1.3.0 > > > We could extend the DC/OS documentation a little bit to include information > about how to submit a job (where to find the connection information) and that > one has to install the DC/OS cli in order to add the development universe.
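The lookup the added paragraph describes (reading `jobmanager.rpc.address` and `jobmanager.rpc.port` from the Flink service configuration) can be sketched as a tiny parser over flink-conf.yaml-style `key: value` lines. The helper function and the default-port fallback are illustrative only, not part of Flink's API:

```python
# Sketch: find the JobManager endpoint by parsing flink-conf.yaml-style
# "key: value" lines, as the docs change suggests. Hypothetical helper,
# not a Flink API; 6123 is the default RPC port used as a fallback.

def jobmanager_endpoint(conf_text):
    """Return (address, port) parsed from flink-conf.yaml-style text."""
    conf = {}
    for line in conf_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition(":")
        conf[key.strip()] = value.strip()
    address = conf.get("jobmanager.rpc.address")
    port = int(conf.get("jobmanager.rpc.port", "6123"))
    return address, port
```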
[GitHub] flink pull request #3237: [FLINK-5474] [docs] Extend DC/OS documentation
Github user tzulitai commented on a diff in the pull request: https://github.com/apache/flink/pull/3237#discussion_r99064510 --- Diff: docs/setup/mesos.md --- @@ -92,18 +92,25 @@ Universe. In the search prompt, just search for Flink. **Note**: At the time of this writing, Flink was not yet available in the Unvierse. Please use the following workaround in the meantime: -1. Add the Development Universe +1. [Install the DC/OS CLI](https://dcos.io/docs/1.8/usage/cli/install/) -`dcos marathon app add https://raw.githubusercontent.com/mesosphere/dcos-flink-service/Makman2/quickstart/universe-server.json` +2. Add the Development Universe + +`./dcos marathon app add https://raw.githubusercontent.com/mesosphere/dcos-flink-service/Makman2/quickstart/universe-server.json` -2. Add the local Universe repository: +3. Add the local Universe repository: - `dcos package repo add --index=0 dev-universe http://universe.marathon.mesos:8085/repo` + `./dcos package repo add --index=0 dev-universe http://universe.marathon.mesos:8085/repo` -3. Install Flink through the Universe page or using the `dcos` command: +4. Install Flink through the Universe page or using the `dcos` command: - `dcos package install flink` + `./dcos package install flink` + +In order to execute a Flink job on a DC/OS hosted Flink cluster, you first have to find out the address of the launched job manager. --- End diff -- job manager --> I think we mostly use "JobManager" in docs
[jira] [Commented] (FLINK-5494) Improve Mesos documentation
[ https://issues.apache.org/jira/browse/FLINK-5494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15849574#comment-15849574 ] ASF GitHub Bot commented on FLINK-5494: --- Github user tzulitai commented on a diff in the pull request: https://github.com/apache/flink/pull/3236#discussion_r99063368 --- Diff: docs/setup/mesos.md --- @@ -107,29 +107,58 @@ Unvierse. Please use the following workaround in the meantime: ## Mesos without DC/OS -Let's take a look at how to setup Flink on Mesos without DC/OS. +You can also run Mesos without DC/OS. -### Prerequisites +### Installing Mesos -Please follow the -[instructions on how to setup Mesos on the official website](http://mesos.apache.org/documentation/latest/). +Please follow the [instructions on how to setup Mesos on the official website](http://mesos.apache.org/documentation/latest/getting-started/). + +After installation you have to configure the set of master and agent nodes by creating the files `MESOS_HOME/etc/mesos/masters` and `MESOS_HOME/etc/mesos/slaves`. +These files contain in each row a single hostname on which the respective component will be started (assuming SSH access to these nodes). + +Next you have to create `MESOS_HOME/etc/mesos/mesos-master-env.sh` or use the template found in the same directory. +In this file, you have to define + +export MESOS_work_dir=WORK_DIRECTORY + +and it is recommended to uncommment + +export MESOS_log_dir=LOGGING_DIRECTORY + + +In order to configure the Mesos agents, you have to create `MESOS_HOME/etc/mesos/mesos-agent-env.sh` or use the template found in the same directory. +You have to configure + +export MESOS_master=MASTER_HOSTNAME:MASTER_PORT + +and uncomment + +export MESOS_log_dir=LOGGING_DIRECTORY +export MESOS_work_dir=WORK_DIRECTORY + + Mesos Library + +In order to run Java applications with Mesos you have to export `MESOS_NATIVE_JAVA_LIBRARY=MESOS_HOME/lib/libmesos.so`. 
+ + Deploying Mesos + +In order to start your mesos cluster, use the deployment script `MESOS_HOME/sbin/mesos-start-cluster.sh`. +In order to stop your mesos cluster, use the deployment script `MESOS_HOME/sbin/mesos-stop-cluster.sh`. +More information about the deployment scripts can be found [here](http://mesos.apache.org/documentation/latest/deploy-scripts/). + +### Installing Marathon + +Optionally, you may also [install Marathon](https://mesosphere.github.io/marathon/docs/) which will be necessary to run Flink in high availability (HA) mode. ### Optional dependencies --- End diff -- This title is a bit out of sync with the other title changes: "Installing Marathon" was singled out of "Optional dependencies", even though Marathon is still optional. Perhaps rename it to "Installing Marathon (Optional)" and change this one to "Installing a Distributed Filesystem (Optional)". > Improve Mesos documentation > --- > > Key: FLINK-5494 > URL: https://issues.apache.org/jira/browse/FLINK-5494 > Project: Flink > Issue Type: Sub-task > Components: Mesos >Affects Versions: 1.2.0 >Reporter: Till Rohrmann >Assignee: Till Rohrmann > Fix For: 1.2.0 > > > Flink's Mesos documentation could benefit from more details how to set things > up and which parameters to use.
[jira] [Commented] (FLINK-5494) Improve Mesos documentation
[ https://issues.apache.org/jira/browse/FLINK-5494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15849573#comment-15849573 ] ASF GitHub Bot commented on FLINK-5494: --- Github user tzulitai commented on a diff in the pull request: https://github.com/apache/flink/pull/3236#discussion_r99062197 --- Diff: docs/setup/mesos.md --- @@ -107,29 +107,58 @@ Unvierse. Please use the following workaround in the meantime: ## Mesos without DC/OS -Let's take a look at how to setup Flink on Mesos without DC/OS. +You can also run Mesos without DC/OS. -### Prerequisites +### Installing Mesos -Please follow the -[instructions on how to setup Mesos on the official website](http://mesos.apache.org/documentation/latest/). +Please follow the [instructions on how to setup Mesos on the official website](http://mesos.apache.org/documentation/latest/getting-started/). + +After installation you have to configure the set of master and agent nodes by creating the files `MESOS_HOME/etc/mesos/masters` and `MESOS_HOME/etc/mesos/slaves`. +These files contain in each row a single hostname on which the respective component will be started (assuming SSH access to these nodes). + +Next you have to create `MESOS_HOME/etc/mesos/mesos-master-env.sh` or use the template found in the same directory. +In this file, you have to define + +export MESOS_work_dir=WORK_DIRECTORY + +and it is recommended to uncommment + +export MESOS_log_dir=LOGGING_DIRECTORY + + +In order to configure the Mesos agents, you have to create `MESOS_HOME/etc/mesos/mesos-agent-env.sh` or use the template found in the same directory. +You have to configure + +export MESOS_master=MASTER_HOSTNAME:MASTER_PORT + +and uncomment + +export MESOS_log_dir=LOGGING_DIRECTORY +export MESOS_work_dir=WORK_DIRECTORY + + Mesos Library + +In order to run Java applications with Mesos you have to export `MESOS_NATIVE_JAVA_LIBRARY=MESOS_HOME/lib/libmesos.so`. 
--- End diff -- On Mac, the native library name seems to be `libmesos.dylib`. Might be good to inform in the docs what OS the instructions are assuming, or just add a small note about macOS here.
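The platform difference the comment points out can be sketched as a small helper that picks the Mesos native library name by operating system. The function and its `mesos_home` argument are hypothetical; only the `.so` (Linux) vs `.dylib` (macOS) naming comes from the review comment:

```python
import platform

# Sketch: choose the value to export as MESOS_NATIVE_JAVA_LIBRARY depending
# on the OS, per the review comment. Hypothetical helper, not a real API.

def mesos_native_library(mesos_home, system=None):
    """Return the native library path for MESOS_NATIVE_JAVA_LIBRARY."""
    system = system or platform.system()
    # macOS reports "Darwin" and uses .dylib; Linux uses .so
    suffix = "dylib" if system == "Darwin" else "so"
    return f"{mesos_home}/lib/libmesos.{suffix}"
```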
[jira] [Commented] (FLINK-5494) Improve Mesos documentation
[ https://issues.apache.org/jira/browse/FLINK-5494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15849576#comment-15849576 ] ASF GitHub Bot commented on FLINK-5494: --- Github user tzulitai commented on a diff in the pull request: https://github.com/apache/flink/pull/3236#discussion_r99061454 --- Diff: docs/setup/mesos.md --- @@ -145,60 +174,75 @@ If set to 'docker', specify the image name: In the `/bin` directory of the Flink distribution, you find two startup scripts which manage the Flink processes in a Mesos cluster: -1. mesos-appmaster.sh - This starts the Mesos application master which will register the Mesos - scheduler. It is also responsible for starting up the worker nodes. +1. `mesos-appmaster.sh` + This starts the Mesos application master which will register the Mesos scheduler. + It is also responsible for starting up the worker nodes. -2. mesos-taskmanager.sh - The entry point for the Mesos worker processes. You don't need to explicitly - execute this script. It is automatically launched by the Mesos worker node to - bring up a new TaskManager. +2. `mesos-taskmanager.sh` + The entry point for the Mesos worker processes. + You don't need to explicitly execute this script. + It is automatically launched by the Mesos worker node to bring up a new TaskManager. + +In order to run the `mesos-appmaster.sh` script you have to define `mesos.master` in the `flink-conf.yaml` or pass it via `-Dmesos.master=...` to the Java process. +Additionally, you should define the number of task managers which are started by Mesos via `mesos.initial-tasks`. +This value can also be defined in the `flink-conf.yaml` or passed as a Java property. + +When executing `mesos-appmaster.sh`, it will create a job manager on the machine where you executed the script. +In contrast to that, the task managers will be run as Mesos tasks in the Mesos cluster. 
+ + General configuration + +It is possible to completely parameterize a Mesos application through Java properties passed to the Mesos application master. +This also allows to specify general Flink configuration parameters. +For example: + +bin/mesos-appmaster.sh \ +-Dmesos.master=master.foobar.org:5050 +-Djobmanager.heap.mb=1024 \ +-Djobmanager.rpc.port=6123 \ +-Djobmanager.web.port=8081 \ +-Dmesos.initial-tasks=10 \ +-Dmesos.resourcemanager.tasks.mem=4096 \ +-Dtaskmanager.heap.mb=3500 \ +-Dtaskmanager.numberOfTaskSlots=2 \ +-Dparallelism.default=10 ### High Availability -You will need to run a service like Marathon or Apache Aurora which takes care -of restarting the Flink master process in case of node or process failures. In -addition, Zookeeper needs to be configured like described in the -[High Availability section of the Flink docs]({{ site.baseurl }}/setup/jobmanager_high_availability.html) +You will need to run a service like Marathon or Apache Aurora which takes care of restarting the Flink master process in case of node or process failures. +In addition, Zookeeper needs to be configured like described in the [High Availability section of the Flink docs]({{ site.baseurl }}/setup/jobmanager_high_availability.html) -For the reconciliation of tasks to work correctly, please also set -`recovery.zookeeper.path.mesos-workers` to a valid Zookeeper path. +For the reconciliation of tasks to work correctly, please also set `recovery.zookeeper.path.mesos-workers` to a valid Zookeeper path. Marathon -Marathon needs to be set up to launch the `bin/mesos-appmaster.sh` script. In -particular, it should also adjust any configuration parameters for the Flink -cluster. +Marathon needs to be set up to launch the `bin/mesos-appmaster.sh` script. +In particular, it should also adjust any configuration parameters for the Flink cluster. 
Here is an example configuration for Marathon: { -"id": "basic-0", -"cmd": "$FLINK_HOME/bin/mesos-appmaster.sh -DconfigEntry=configValue -DanotherEntry=anotherValue ...", +"id": "flink", +"cmd": "$FLINK_HOME/bin/mesos-appmaster.sh -Djobmanager.heap.mb=1024 -Djobmanager.rpc.port=6123 -Djobmanager.web.port=8081 -Dmesos.initial-tasks=1 -Dmesos.resourcemanager.tasks.mem=1024 -Dtaskmanager.heap.mb=1024 -Dtaskmanager.numberOfTaskSlots=2 -Dparallelism.default=2 -Dmesos.resourcemanager.tasks.cpus=1", "cpus": 1.0, -"mem": 2048, +"mem": 1024 --- End diff -- The keys in this configuration example are missing indentation.
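The indentation fix the comment asks for would make the Marathon app definition look like this (values copied verbatim from the quoted example):

```json
{
  "id": "flink",
  "cmd": "$FLINK_HOME/bin/mesos-appmaster.sh -Djobmanager.heap.mb=1024 -Djobmanager.rpc.port=6123 -Djobmanager.web.port=8081 -Dmesos.initial-tasks=1 -Dmesos.resourcemanager.tasks.mem=1024 -Dtaskmanager.heap.mb=1024 -Dtaskmanager.numberOfTaskSlots=2 -Dparallelism.default=2 -Dmesos.resourcemanager.tasks.cpus=1",
  "cpus": 1.0,
  "mem": 1024
}
```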
[jira] [Commented] (FLINK-5494) Improve Mesos documentation
[ https://issues.apache.org/jira/browse/FLINK-5494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15849577#comment-15849577 ] ASF GitHub Bot commented on FLINK-5494: --- Github user tzulitai commented on a diff in the pull request: https://github.com/apache/flink/pull/3236#discussion_r99063127 --- Diff: docs/setup/mesos.md --- @@ -107,29 +107,58 @@ Unvierse. Please use the following workaround in the meantime: ## Mesos without DC/OS -Let's take a look at how to setup Flink on Mesos without DC/OS. +You can also run Mesos without DC/OS. -### Prerequisites +### Installing Mesos -Please follow the -[instructions on how to setup Mesos on the official website](http://mesos.apache.org/documentation/latest/). +Please follow the [instructions on how to setup Mesos on the official website](http://mesos.apache.org/documentation/latest/getting-started/). + +After installation you have to configure the set of master and agent nodes by creating the files `MESOS_HOME/etc/mesos/masters` and `MESOS_HOME/etc/mesos/slaves`. +These files contain in each row a single hostname on which the respective component will be started (assuming SSH access to these nodes). + +Next you have to create `MESOS_HOME/etc/mesos/mesos-master-env.sh` or use the template found in the same directory. +In this file, you have to define + +export MESOS_work_dir=WORK_DIRECTORY + +and it is recommended to uncommment + +export MESOS_log_dir=LOGGING_DIRECTORY + + +In order to configure the Mesos agents, you have to create `MESOS_HOME/etc/mesos/mesos-agent-env.sh` or use the template found in the same directory. +You have to configure + +export MESOS_master=MASTER_HOSTNAME:MASTER_PORT + +and uncomment + +export MESOS_log_dir=LOGGING_DIRECTORY +export MESOS_work_dir=WORK_DIRECTORY + + Mesos Library + +In order to run Java applications with Mesos you have to export `MESOS_NATIVE_JAVA_LIBRARY=MESOS_HOME/lib/libmesos.so`. 
+ + Deploying Mesos + +In order to start your mesos cluster, use the deployment script `MESOS_HOME/sbin/mesos-start-cluster.sh`. +In order to stop your mesos cluster, use the deployment script `MESOS_HOME/sbin/mesos-stop-cluster.sh`. +More information about the deployment scripts can be found [here](http://mesos.apache.org/documentation/latest/deploy-scripts/). + +### Installing Marathon + +Optionally, you may also [install Marathon](https://mesosphere.github.io/marathon/docs/) which will be necessary to run Flink in high availability (HA) mode. ### Optional dependencies -Optionally, -you may also install [Marathon](https://mesosphere.github.io/marathon/) which -will be necessary if you want your Flink cluster to be highly available in the -presence of master node failures. Additionally, you probably want to install a -distributed file system to share data across nodes and make use of Flink's -checkpointing mechanism. +Additionally, you probably want to install a distributed file system to share data across nodes and make use of Flink's checkpointing mechanism. --- End diff -- I stumbled a bit over this sentence. The relationship between a distributed file system and Flink's checkpointing is a bit unclear here. Perhaps change it to something like "use a filesystem backend for Flink's checkpointing", and also link to the related documents.
[jira] [Commented] (FLINK-5494) Improve Mesos documentation
[ https://issues.apache.org/jira/browse/FLINK-5494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15849575#comment-15849575 ] ASF GitHub Bot commented on FLINK-5494: --- Github user tzulitai commented on a diff in the pull request: https://github.com/apache/flink/pull/3236#discussion_r99063747 --- Diff: docs/setup/mesos.md --- @@ -145,60 +174,75 @@ If set to 'docker', specify the image name: In the `/bin` directory of the Flink distribution, you find two startup scripts which manage the Flink processes in a Mesos cluster: -1. mesos-appmaster.sh - This starts the Mesos application master which will register the Mesos - scheduler. It is also responsible for starting up the worker nodes. +1. `mesos-appmaster.sh` + This starts the Mesos application master which will register the Mesos scheduler. + It is also responsible for starting up the worker nodes. -2. mesos-taskmanager.sh - The entry point for the Mesos worker processes. You don't need to explicitly - execute this script. It is automatically launched by the Mesos worker node to - bring up a new TaskManager. +2. `mesos-taskmanager.sh` + The entry point for the Mesos worker processes. + You don't need to explicitly execute this script. + It is automatically launched by the Mesos worker node to bring up a new TaskManager. + +In order to run the `mesos-appmaster.sh` script you have to define `mesos.master` in the `flink-conf.yaml` or pass it via `-Dmesos.master=...` to the Java process. +Additionally, you should define the number of task managers which are started by Mesos via `mesos.initial-tasks`. +This value can also be defined in the `flink-conf.yaml` or passed as a Java property. + +When executing `mesos-appmaster.sh`, it will create a job manager on the machine where you executed the script. +In contrast to that, the task managers will be run as Mesos tasks in the Mesos cluster. 
+ + General configuration + +It is possible to completely parameterize a Mesos application through Java properties passed to the Mesos application master. +This also allows to specify general Flink configuration parameters. +For example: + +bin/mesos-appmaster.sh \ +-Dmesos.master=master.foobar.org:5050 +-Djobmanager.heap.mb=1024 \ +-Djobmanager.rpc.port=6123 \ +-Djobmanager.web.port=8081 \ +-Dmesos.initial-tasks=10 \ +-Dmesos.resourcemanager.tasks.mem=4096 \ +-Dtaskmanager.heap.mb=3500 \ +-Dtaskmanager.numberOfTaskSlots=2 \ +-Dparallelism.default=10 ### High Availability -You will need to run a service like Marathon or Apache Aurora which takes care -of restarting the Flink master process in case of node or process failures. In -addition, Zookeeper needs to be configured like described in the -[High Availability section of the Flink docs]({{ site.baseurl }}/setup/jobmanager_high_availability.html) +You will need to run a service like Marathon or Apache Aurora which takes care of restarting the Flink master process in case of node or process failures. +In addition, Zookeeper needs to be configured like described in the [High Availability section of the Flink docs]({{ site.baseurl }}/setup/jobmanager_high_availability.html) -For the reconciliation of tasks to work correctly, please also set -`recovery.zookeeper.path.mesos-workers` to a valid Zookeeper path. +For the reconciliation of tasks to work correctly, please also set `recovery.zookeeper.path.mesos-workers` to a valid Zookeeper path. Marathon -Marathon needs to be set up to launch the `bin/mesos-appmaster.sh` script. In -particular, it should also adjust any configuration parameters for the Flink -cluster. +Marathon needs to be set up to launch the `bin/mesos-appmaster.sh` script. +In particular, it should also adjust any configuration parameters for the Flink cluster. 
Here is an example configuration for Marathon: { -"id": "basic-0", -"cmd": "$FLINK_HOME/bin/mesos-appmaster.sh -DconfigEntry=configValue -DanotherEntry=anotherValue ...", +"id": "flink", +"cmd": "$FLINK_HOME/bin/mesos-appmaster.sh -Djobmanager.heap.mb=1024 -Djobmanager.rpc.port=6123 -Djobmanager.web.port=8081 -Dmesos.initial-tasks=1 -Dmesos.resourcemanager.tasks.mem=1024 -Dtaskmanager.heap.mb=1024 -Dtaskmanager.numberOfTaskSlots=2 -Dparallelism.default=2 -Dmesos.resourcemanager.tasks.cpus=1", "cpus": 1.0, -"mem": 2048, +"mem": 1024 } +When running Flink with Marathon, the whole Flink cluster including the job manager will be run as Mesos tasks in the Mesos cluster.
[GitHub] flink pull request #3236: [FLINK-5494] [docs] Add more details to the Mesos ...
Github user tzulitai commented on a diff in the pull request: https://github.com/apache/flink/pull/3236#discussion_r99063127 --- Diff: docs/setup/mesos.md --- @@ -107,29 +107,58 @@ Unvierse. Please use the following workaround in the meantime: ## Mesos without DC/OS -Let's take a look at how to setup Flink on Mesos without DC/OS. +You can also run Mesos without DC/OS. -### Prerequisites +### Installing Mesos -Please follow the -[instructions on how to setup Mesos on the official website](http://mesos.apache.org/documentation/latest/). +Please follow the [instructions on how to setup Mesos on the official website](http://mesos.apache.org/documentation/latest/getting-started/). + +After installation you have to configure the set of master and agent nodes by creating the files `MESOS_HOME/etc/mesos/masters` and `MESOS_HOME/etc/mesos/slaves`. +These files contain in each row a single hostname on which the respective component will be started (assuming SSH access to these nodes). + +Next you have to create `MESOS_HOME/etc/mesos/mesos-master-env.sh` or use the template found in the same directory. +In this file, you have to define + +export MESOS_work_dir=WORK_DIRECTORY + +and it is recommended to uncommment + +export MESOS_log_dir=LOGGING_DIRECTORY + + +In order to configure the Mesos agents, you have to create `MESOS_HOME/etc/mesos/mesos-agent-env.sh` or use the template found in the same directory. +You have to configure + +export MESOS_master=MASTER_HOSTNAME:MASTER_PORT + +and uncomment + +export MESOS_log_dir=LOGGING_DIRECTORY +export MESOS_work_dir=WORK_DIRECTORY + + Mesos Library + +In order to run Java applications with Mesos you have to export `MESOS_NATIVE_JAVA_LIBRARY=MESOS_HOME/lib/libmesos.so`. + + Deploying Mesos + +In order to start your mesos cluster, use the deployment script `MESOS_HOME/sbin/mesos-start-cluster.sh`. +In order to stop your mesos cluster, use the deployment script `MESOS_HOME/sbin/mesos-stop-cluster.sh`. 
+More information about the deployment scripts can be found [here](http://mesos.apache.org/documentation/latest/deploy-scripts/). + +### Installing Marathon + +Optionally, you may also [install Marathon](https://mesosphere.github.io/marathon/docs/) which will be necessary to run Flink in high availability (HA) mode. ### Optional dependencies -Optionally, -you may also install [Marathon](https://mesosphere.github.io/marathon/) which -will be necessary if you want your Flink cluster to be highly available in the -presence of master node failures. Additionally, you probably want to install a -distributed file system to share data across nodes and make use of Flink's -checkpointing mechanism. +Additionally, you probably want to install a distributed file system to share data across nodes and make use of Flink's checkpointing mechanism. --- End diff -- I bumped a bit on this sentence. The relationship between a distributed file system and Flink's checkpointing is a bit unclear here. Perhaps change to something like "use filesystem backend for Flink's checkpointing", and also link to the related documents. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-5494) Improve Mesos documentation
[ https://issues.apache.org/jira/browse/FLINK-5494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15849572#comment-15849572 ] ASF GitHub Bot commented on FLINK-5494: --- Github user tzulitai commented on a diff in the pull request: https://github.com/apache/flink/pull/3236#discussion_r99061517 --- Diff: docs/setup/mesos.md --- @@ -145,60 +174,75 @@ If set to 'docker', specify the image name: In the `/bin` directory of the Flink distribution, you find two startup scripts which manage the Flink processes in a Mesos cluster: -1. mesos-appmaster.sh - This starts the Mesos application master which will register the Mesos - scheduler. It is also responsible for starting up the worker nodes. +1. `mesos-appmaster.sh` + This starts the Mesos application master which will register the Mesos scheduler. + It is also responsible for starting up the worker nodes. -2. mesos-taskmanager.sh - The entry point for the Mesos worker processes. You don't need to explicitly - execute this script. It is automatically launched by the Mesos worker node to - bring up a new TaskManager. +2. `mesos-taskmanager.sh` + The entry point for the Mesos worker processes. + You don't need to explicitly execute this script. + It is automatically launched by the Mesos worker node to bring up a new TaskManager. + +In order to run the `mesos-appmaster.sh` script you have to define `mesos.master` in the `flink-conf.yaml` or pass it via `-Dmesos.master=...` to the Java process. +Additionally, you should define the number of task managers which are started by Mesos via `mesos.initial-tasks`. +This value can also be defined in the `flink-conf.yaml` or passed as a Java property. + +When executing `mesos-appmaster.sh`, it will create a job manager on the machine where you executed the script. +In contrast to that, the task managers will be run as Mesos tasks in the Mesos cluster. 
+ + General configuration + +It is possible to completely parameterize a Mesos application through Java properties passed to the Mesos application master. +This also allows to specify general Flink configuration parameters. +For example: + +bin/mesos-appmaster.sh \ +-Dmesos.master=master.foobar.org:5050 +-Djobmanager.heap.mb=1024 \ +-Djobmanager.rpc.port=6123 \ +-Djobmanager.web.port=8081 \ +-Dmesos.initial-tasks=10 \ +-Dmesos.resourcemanager.tasks.mem=4096 \ +-Dtaskmanager.heap.mb=3500 \ +-Dtaskmanager.numberOfTaskSlots=2 \ +-Dparallelism.default=10 ### High Availability -You will need to run a service like Marathon or Apache Aurora which takes care -of restarting the Flink master process in case of node or process failures. In -addition, Zookeeper needs to be configured like described in the -[High Availability section of the Flink docs]({{ site.baseurl }}/setup/jobmanager_high_availability.html) +You will need to run a service like Marathon or Apache Aurora which takes care of restarting the Flink master process in case of node or process failures. +In addition, Zookeeper needs to be configured like described in the [High Availability section of the Flink docs]({{ site.baseurl }}/setup/jobmanager_high_availability.html) -For the reconciliation of tasks to work correctly, please also set -`recovery.zookeeper.path.mesos-workers` to a valid Zookeeper path. +For the reconciliation of tasks to work correctly, please also set `recovery.zookeeper.path.mesos-workers` to a valid Zookeeper path. Marathon -Marathon needs to be set up to launch the `bin/mesos-appmaster.sh` script. In -particular, it should also adjust any configuration parameters for the Flink -cluster. +Marathon needs to be set up to launch the `bin/mesos-appmaster.sh` script. +In particular, it should also adjust any configuration parameters for the Flink cluster. 
Here is an example configuration for Marathon: { -"id": "basic-0", -"cmd": "$FLINK_HOME/bin/mesos-appmaster.sh -DconfigEntry=configValue -DanotherEntry=anotherValue ...", +"id": "flink", +"cmd": "$FLINK_HOME/bin/mesos-appmaster.sh -Djobmanager.heap.mb=1024 -Djobmanager.rpc.port=6123 -Djobmanager.web.port=8081 -Dmesos.initial-tasks=1 -Dmesos.resourcemanager.tasks.mem=1024 -Dtaskmanager.heap.mb=1024 -Dtaskmanager.numberOfTaskSlots=2 -Dparallelism.default=2 -Dmesos.resourcemanager.tasks.cpus=1", "cpus": 1.0, -"mem": 2048, +"mem": 1024 } +When running Flink with Marathon, the whole Flink cluster including the job manager will be run as Mesos tasks in the Mesos cluster.
[GitHub] flink pull request #3236: [FLINK-5494] [docs] Add more details to the Mesos ...
Github user tzulitai commented on a diff in the pull request: https://github.com/apache/flink/pull/3236#discussion_r99062197 --- Diff: docs/setup/mesos.md --- @@ -107,29 +107,58 @@ Unvierse. Please use the following workaround in the meantime: ## Mesos without DC/OS -Let's take a look at how to setup Flink on Mesos without DC/OS. +You can also run Mesos without DC/OS. -### Prerequisites +### Installing Mesos -Please follow the -[instructions on how to setup Mesos on the official website](http://mesos.apache.org/documentation/latest/). +Please follow the [instructions on how to setup Mesos on the official website](http://mesos.apache.org/documentation/latest/getting-started/). + +After installation you have to configure the set of master and agent nodes by creating the files `MESOS_HOME/etc/mesos/masters` and `MESOS_HOME/etc/mesos/slaves`. +These files contain in each row a single hostname on which the respective component will be started (assuming SSH access to these nodes). + +Next you have to create `MESOS_HOME/etc/mesos/mesos-master-env.sh` or use the template found in the same directory. +In this file, you have to define + +export MESOS_work_dir=WORK_DIRECTORY + +and it is recommended to uncommment + +export MESOS_log_dir=LOGGING_DIRECTORY + + +In order to configure the Mesos agents, you have to create `MESOS_HOME/etc/mesos/mesos-agent-env.sh` or use the template found in the same directory. +You have to configure + +export MESOS_master=MASTER_HOSTNAME:MASTER_PORT + +and uncomment + +export MESOS_log_dir=LOGGING_DIRECTORY +export MESOS_work_dir=WORK_DIRECTORY + + Mesos Library + +In order to run Java applications with Mesos you have to export `MESOS_NATIVE_JAVA_LIBRARY=MESOS_HOME/lib/libmesos.so`. --- End diff -- On Mac, the native library name seems to be `libmesos.dylib`. Might be good to inform in the docs what OS the instructions is assuming, or just add a small comment about MacOS here. 
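Following up on the macOS remark above: the native library name does differ by OS (`libmesos.dylib` on macOS, `libmesos.so` on Linux). A small sketch of handling this in a setup script — the `/opt/mesos` default and the OS switch are assumptions for illustration, not something the docs prescribe:

```shell
# Select the Mesos native library for the JVM; the file name differs by OS
# (libmesos.dylib on macOS, libmesos.so on Linux).
MESOS_HOME="${MESOS_HOME:-/opt/mesos}"   # assumed install location
case "$(uname -s)" in
  Darwin) export MESOS_NATIVE_JAVA_LIBRARY="$MESOS_HOME/lib/libmesos.dylib" ;;
  *)      export MESOS_NATIVE_JAVA_LIBRARY="$MESOS_HOME/lib/libmesos.so" ;;
esac
```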
[GitHub] flink pull request #3236: [FLINK-5494] [docs] Add more details to the Mesos ...
Github user tzulitai commented on a diff in the pull request: https://github.com/apache/flink/pull/3236#discussion_r99063368 --- Diff: docs/setup/mesos.md --- @@ -107,29 +107,58 @@ Unvierse. Please use the following workaround in the meantime: ## Mesos without DC/OS -Let's take a look at how to setup Flink on Mesos without DC/OS. +You can also run Mesos without DC/OS. -### Prerequisites +### Installing Mesos -Please follow the -[instructions on how to setup Mesos on the official website](http://mesos.apache.org/documentation/latest/). +Please follow the [instructions on how to setup Mesos on the official website](http://mesos.apache.org/documentation/latest/getting-started/). + +After installation you have to configure the set of master and agent nodes by creating the files `MESOS_HOME/etc/mesos/masters` and `MESOS_HOME/etc/mesos/slaves`. +These files contain in each row a single hostname on which the respective component will be started (assuming SSH access to these nodes). + +Next you have to create `MESOS_HOME/etc/mesos/mesos-master-env.sh` or use the template found in the same directory. +In this file, you have to define + +export MESOS_work_dir=WORK_DIRECTORY + +and it is recommended to uncommment + +export MESOS_log_dir=LOGGING_DIRECTORY + + +In order to configure the Mesos agents, you have to create `MESOS_HOME/etc/mesos/mesos-agent-env.sh` or use the template found in the same directory. +You have to configure + +export MESOS_master=MASTER_HOSTNAME:MASTER_PORT + +and uncomment + +export MESOS_log_dir=LOGGING_DIRECTORY +export MESOS_work_dir=WORK_DIRECTORY + + Mesos Library + +In order to run Java applications with Mesos you have to export `MESOS_NATIVE_JAVA_LIBRARY=MESOS_HOME/lib/libmesos.so`. + + Deploying Mesos + +In order to start your mesos cluster, use the deployment script `MESOS_HOME/sbin/mesos-start-cluster.sh`. +In order to stop your mesos cluster, use the deployment script `MESOS_HOME/sbin/mesos-stop-cluster.sh`. 
+More information about the deployment scripts can be found [here](http://mesos.apache.org/documentation/latest/deploy-scripts/).
+
+### Installing Marathon
+
+Optionally, you may also [install Marathon](https://mesosphere.github.io/marathon/docs/) which will be necessary to run Flink in high availability (HA) mode.

### Optional dependencies

--- End diff --

This title is a bit out of sync with the other title changes: "Installing Marathon" was singled out of "Optional dependencies", even though Marathon is still optional. Perhaps "Installing Marathon (Optional)", and change this also to "Installing a Distributed Filesystem (Optional)".
[GitHub] flink pull request #3236: [FLINK-5494] [docs] Add more details to the Mesos ...
Github user tzulitai commented on a diff in the pull request: https://github.com/apache/flink/pull/3236#discussion_r99063747 --- Diff: docs/setup/mesos.md --- @@ -145,60 +174,75 @@ If set to 'docker', specify the image name: In the `/bin` directory of the Flink distribution, you find two startup scripts which manage the Flink processes in a Mesos cluster: -1. mesos-appmaster.sh - This starts the Mesos application master which will register the Mesos - scheduler. It is also responsible for starting up the worker nodes. +1. `mesos-appmaster.sh` + This starts the Mesos application master which will register the Mesos scheduler. + It is also responsible for starting up the worker nodes. -2. mesos-taskmanager.sh - The entry point for the Mesos worker processes. You don't need to explicitly - execute this script. It is automatically launched by the Mesos worker node to - bring up a new TaskManager. +2. `mesos-taskmanager.sh` + The entry point for the Mesos worker processes. + You don't need to explicitly execute this script. + It is automatically launched by the Mesos worker node to bring up a new TaskManager. + +In order to run the `mesos-appmaster.sh` script you have to define `mesos.master` in the `flink-conf.yaml` or pass it via `-Dmesos.master=...` to the Java process. +Additionally, you should define the number of task managers which are started by Mesos via `mesos.initial-tasks`. +This value can also be defined in the `flink-conf.yaml` or passed as a Java property. + +When executing `mesos-appmaster.sh`, it will create a job manager on the machine where you executed the script. +In contrast to that, the task managers will be run as Mesos tasks in the Mesos cluster. + + General configuration + +It is possible to completely parameterize a Mesos application through Java properties passed to the Mesos application master. +This also allows to specify general Flink configuration parameters. 
+For example: + +bin/mesos-appmaster.sh \ +-Dmesos.master=master.foobar.org:5050 +-Djobmanager.heap.mb=1024 \ +-Djobmanager.rpc.port=6123 \ +-Djobmanager.web.port=8081 \ +-Dmesos.initial-tasks=10 \ +-Dmesos.resourcemanager.tasks.mem=4096 \ +-Dtaskmanager.heap.mb=3500 \ +-Dtaskmanager.numberOfTaskSlots=2 \ +-Dparallelism.default=10 ### High Availability -You will need to run a service like Marathon or Apache Aurora which takes care -of restarting the Flink master process in case of node or process failures. In -addition, Zookeeper needs to be configured like described in the -[High Availability section of the Flink docs]({{ site.baseurl }}/setup/jobmanager_high_availability.html) +You will need to run a service like Marathon or Apache Aurora which takes care of restarting the Flink master process in case of node or process failures. +In addition, Zookeeper needs to be configured like described in the [High Availability section of the Flink docs]({{ site.baseurl }}/setup/jobmanager_high_availability.html) -For the reconciliation of tasks to work correctly, please also set -`recovery.zookeeper.path.mesos-workers` to a valid Zookeeper path. +For the reconciliation of tasks to work correctly, please also set `recovery.zookeeper.path.mesos-workers` to a valid Zookeeper path. Marathon -Marathon needs to be set up to launch the `bin/mesos-appmaster.sh` script. In -particular, it should also adjust any configuration parameters for the Flink -cluster. +Marathon needs to be set up to launch the `bin/mesos-appmaster.sh` script. +In particular, it should also adjust any configuration parameters for the Flink cluster. 
Here is an example configuration for Marathon: { -"id": "basic-0", -"cmd": "$FLINK_HOME/bin/mesos-appmaster.sh -DconfigEntry=configValue -DanotherEntry=anotherValue ...", +"id": "flink", +"cmd": "$FLINK_HOME/bin/mesos-appmaster.sh -Djobmanager.heap.mb=1024 -Djobmanager.rpc.port=6123 -Djobmanager.web.port=8081 -Dmesos.initial-tasks=1 -Dmesos.resourcemanager.tasks.mem=1024 -Dtaskmanager.heap.mb=1024 -Dtaskmanager.numberOfTaskSlots=2 -Dparallelism.default=2 -Dmesos.resourcemanager.tasks.cpus=1", "cpus": 1.0, -"mem": 2048, +"mem": 1024 } +When running Flink with Marathon, the whole Flink cluster including the job manager will be run as Mesos tasks in the Mesos cluster. + ### Configuration parameters Mesos configuration entries +`mesos.initial-tasks`: The initial workers to bring up when the master starts (**DEFAULT**: The number of workers specified at cluster startup).
[GitHub] flink pull request #3236: [FLINK-5494] [docs] Add more details to the Mesos ...
Github user tzulitai commented on a diff in the pull request: https://github.com/apache/flink/pull/3236#discussion_r99061517 --- Diff: docs/setup/mesos.md --- @@ -145,60 +174,75 @@ If set to 'docker', specify the image name: In the `/bin` directory of the Flink distribution, you find two startup scripts which manage the Flink processes in a Mesos cluster: -1. mesos-appmaster.sh - This starts the Mesos application master which will register the Mesos - scheduler. It is also responsible for starting up the worker nodes. +1. `mesos-appmaster.sh` + This starts the Mesos application master which will register the Mesos scheduler. + It is also responsible for starting up the worker nodes. -2. mesos-taskmanager.sh - The entry point for the Mesos worker processes. You don't need to explicitly - execute this script. It is automatically launched by the Mesos worker node to - bring up a new TaskManager. +2. `mesos-taskmanager.sh` + The entry point for the Mesos worker processes. + You don't need to explicitly execute this script. + It is automatically launched by the Mesos worker node to bring up a new TaskManager. + +In order to run the `mesos-appmaster.sh` script you have to define `mesos.master` in the `flink-conf.yaml` or pass it via `-Dmesos.master=...` to the Java process. +Additionally, you should define the number of task managers which are started by Mesos via `mesos.initial-tasks`. +This value can also be defined in the `flink-conf.yaml` or passed as a Java property. + +When executing `mesos-appmaster.sh`, it will create a job manager on the machine where you executed the script. +In contrast to that, the task managers will be run as Mesos tasks in the Mesos cluster. + + General configuration + +It is possible to completely parameterize a Mesos application through Java properties passed to the Mesos application master. +This also allows to specify general Flink configuration parameters. 
+For example: + +bin/mesos-appmaster.sh \ +-Dmesos.master=master.foobar.org:5050 +-Djobmanager.heap.mb=1024 \ +-Djobmanager.rpc.port=6123 \ +-Djobmanager.web.port=8081 \ +-Dmesos.initial-tasks=10 \ +-Dmesos.resourcemanager.tasks.mem=4096 \ +-Dtaskmanager.heap.mb=3500 \ +-Dtaskmanager.numberOfTaskSlots=2 \ +-Dparallelism.default=10 ### High Availability -You will need to run a service like Marathon or Apache Aurora which takes care -of restarting the Flink master process in case of node or process failures. In -addition, Zookeeper needs to be configured like described in the -[High Availability section of the Flink docs]({{ site.baseurl }}/setup/jobmanager_high_availability.html) +You will need to run a service like Marathon or Apache Aurora which takes care of restarting the Flink master process in case of node or process failures. +In addition, Zookeeper needs to be configured like described in the [High Availability section of the Flink docs]({{ site.baseurl }}/setup/jobmanager_high_availability.html) -For the reconciliation of tasks to work correctly, please also set -`recovery.zookeeper.path.mesos-workers` to a valid Zookeeper path. +For the reconciliation of tasks to work correctly, please also set `recovery.zookeeper.path.mesos-workers` to a valid Zookeeper path. Marathon -Marathon needs to be set up to launch the `bin/mesos-appmaster.sh` script. In -particular, it should also adjust any configuration parameters for the Flink -cluster. +Marathon needs to be set up to launch the `bin/mesos-appmaster.sh` script. +In particular, it should also adjust any configuration parameters for the Flink cluster. 
Here is an example configuration for Marathon: { -"id": "basic-0", -"cmd": "$FLINK_HOME/bin/mesos-appmaster.sh -DconfigEntry=configValue -DanotherEntry=anotherValue ...", +"id": "flink", +"cmd": "$FLINK_HOME/bin/mesos-appmaster.sh -Djobmanager.heap.mb=1024 -Djobmanager.rpc.port=6123 -Djobmanager.web.port=8081 -Dmesos.initial-tasks=1 -Dmesos.resourcemanager.tasks.mem=1024 -Dtaskmanager.heap.mb=1024 -Dtaskmanager.numberOfTaskSlots=2 -Dparallelism.default=2 -Dmesos.resourcemanager.tasks.cpus=1", "cpus": 1.0, -"mem": 2048, +"mem": 1024 } +When running Flink with Marathon, the whole Flink cluster including the job manager will be run as Mesos tasks in the Mesos cluster. + ### Configuration parameters Mesos configuration entries --- End diff -- Should we add the Mesos config to `setup/config.md` as well? --- If your project is set up for it, you can reply to this email
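The discussion above notes that the same settings passed as `-D` Java properties to `mesos-appmaster.sh` can instead be defined in `flink-conf.yaml`. A minimal fragment (keys and values taken from the example invocation in the diff, shown here purely for illustration) might look like:

```yaml
mesos.master: master.foobar.org:5050
mesos.initial-tasks: 10
mesos.resourcemanager.tasks.mem: 4096
taskmanager.heap.mb: 3500
taskmanager.numberOfTaskSlots: 2
parallelism.default: 10
```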
[GitHub] flink issue #3149: FLINK-2168 Add HBaseTableSource
Github user ramkrish86 commented on the issue: https://github.com/apache/flink/pull/3149 @fhueske - Please have a look at the javadoc.
[jira] [Commented] (FLINK-2168) Add HBaseTableSource
[ https://issues.apache.org/jira/browse/FLINK-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15849460#comment-15849460 ] ASF GitHub Bot commented on FLINK-2168: --- Github user ramkrish86 commented on the issue: https://github.com/apache/flink/pull/3149 @fhueske - Please have a look at the javadoc.

> Add HBaseTableSource
> --------------------
>
>                 Key: FLINK-2168
>                 URL: https://issues.apache.org/jira/browse/FLINK-2168
>             Project: Flink
>          Issue Type: New Feature
>          Components: Table API & SQL
>    Affects Versions: 0.9
>            Reporter: Fabian Hueske
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Minor
>
> Add a {{HBaseTableSource}} to read data from a HBase table. The {{HBaseTableSource}} should implement the {{ProjectableTableSource}} (FLINK-3848) and {{FilterableTableSource}} (FLINK-3849) interfaces.
> The implementation can be based on Flink's {{TableInputFormat}}.

-- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (FLINK-2168) Add HBaseTableSource
[ https://issues.apache.org/jira/browse/FLINK-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15849443#comment-15849443 ] ASF GitHub Bot commented on FLINK-2168: --- Github user ramkrish86 commented on the issue: https://github.com/apache/flink/pull/3149

> For now I'd suggest to keep the scope of the PR as it is right now. A bit more Java documentation on HBaseTableSource to explain how it is used would be great. We can implement the NestedFieldsProjectableTableSource and the changes to HBaseTableSource in a follow-up issue.

+1 for this. I can add some more javadoc to it. BTW, I am trying to check out these projections and the use of the ProjectableTableSource. Will be back on it. Thanks to everyone for all the comments and feedback.

> Add HBaseTableSource
> --------------------
>
>                 Key: FLINK-2168
>                 URL: https://issues.apache.org/jira/browse/FLINK-2168
>             Project: Flink
>          Issue Type: New Feature
>          Components: Table API & SQL
>    Affects Versions: 0.9
>            Reporter: Fabian Hueske
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Minor
>
> Add a {{HBaseTableSource}} to read data from a HBase table. The {{HBaseTableSource}} should implement the {{ProjectableTableSource}} (FLINK-3848) and {{FilterableTableSource}} (FLINK-3849) interfaces.
> The implementation can be based on Flink's {{TableInputFormat}}.

-- This message was sent by Atlassian JIRA (v6.3.15#6346)
[GitHub] flink issue #3149: FLINK-2168 Add HBaseTableSource
Github user ramkrish86 commented on the issue: https://github.com/apache/flink/pull/3149

> For now I'd suggest to keep the scope of the PR as it is right now. A bit more Java documentation on HBaseTableSource to explain how it is used would be great. We can implement the NestedFieldsProjectableTableSource and the changes to HBaseTableSource in a follow-up issue.

+1 for this. I can add some more javadoc to it. BTW, I am trying to check out these projections and the use of the ProjectableTableSource. Will be back on it. Thanks to everyone for all the comments and feedback.
[jira] [Comment Edited] (FLINK-5668) Reduce dependency on HDFS at job startup time
[ https://issues.apache.org/jira/browse/FLINK-5668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15849138#comment-15849138 ] Bill Liu edited comment on FLINK-5668 at 2/1/17 11:56 PM:
--
Thanks [~wheat9] for filling in the full context. YARN's own fault tolerance and high availability rely on HDFS, but that doesn't mean Flink-on-YARN has to depend on HDFS. Not to mention, some of the HDFS dependency is not necessary at all.

For the taskmanager configuration file, I took a deep look at the code. The taskmanager config is cloned from baseConfig, which is then changed only slightly:

```
final Configuration taskManagerConfig = BootstrapTools.generateTaskManagerConfiguration(
    config, akkaHostname, akkaPort, slotsPerTaskManager, TASKMANAGER_REGISTRATION_TIMEOUT);

public static Configuration generateTaskManagerConfiguration(
        Configuration baseConfig,
        String jobManagerHostname,
        int jobManagerPort,
        int numSlots,
        FiniteDuration registrationTimeout) {
    Configuration cfg = baseConfig.clone();
    cfg.setString(ConfigConstants.JOB_MANAGER_IPC_ADDRESS_KEY, jobManagerHostname);
    cfg.setInteger(ConfigConstants.JOB_MANAGER_IPC_PORT_KEY, jobManagerPort);
    cfg.setString(ConfigConstants.TASK_MANAGER_MAX_REGISTRATION_DURATION, registrationTimeout.toString());
    if (numSlots != -1) {
        cfg.setInteger(ConfigConstants.TASK_MANAGER_NUM_TASK_SLOTS, numSlots);
    }
    return cfg;
}
```

[~StephanEwen], if the JobManager web server is not a good place to share files, the jobmanager doesn't need to create a local taskmanager-config.yaml at all; it could just pass the base config file plus some dynamic properties that override values in the base config.

was (Author: bill.liu8904):
Thanks [~wheat9] for filling in the full context. YARN's own fault tolerance and high availability rely on HDFS, but that doesn't mean Flink-on-YARN has to depend on HDFS. Especially, some of the HDFS dependency is not necessary at all.

For the taskmanager configuration file, I took a deep look at the code. The taskmanager config is cloned from baseConfig, which is then changed only slightly:

```
final Configuration taskManagerConfig = BootstrapTools.generateTaskManagerConfiguration(
    config, akkaHostname, akkaPort, slotsPerTaskManager, TASKMANAGER_REGISTRATION_TIMEOUT);

public static Configuration generateTaskManagerConfiguration(
        Configuration baseConfig,
        String jobManagerHostname,
        int jobManagerPort,
        int numSlots,
        FiniteDuration registrationTimeout) {
    Configuration cfg = baseConfig.clone();
    cfg.setString(ConfigConstants.JOB_MANAGER_IPC_ADDRESS_KEY, jobManagerHostname);
    cfg.setInteger(ConfigConstants.JOB_MANAGER_IPC_PORT_KEY, jobManagerPort);
    cfg.setString(ConfigConstants.TASK_MANAGER_MAX_REGISTRATION_DURATION, registrationTimeout.toString());
    if (numSlots != -1) {
        cfg.setInteger(ConfigConstants.TASK_MANAGER_NUM_TASK_SLOTS, numSlots);
    }
    return cfg;
}
```

[~StephanEwen], if the JobManager web server is not a good place to share files, the jobmanager doesn't need to create a local taskmanager-config.yaml at all; it could just pass the base config file plus some dynamic properties that override values in the base config.

> Reduce dependency on HDFS at job startup time
> ---------------------------------------------
>
>                 Key: FLINK-5668
>                 URL: https://issues.apache.org/jira/browse/FLINK-5668
>             Project: Flink
>          Issue Type: Improvement
>          Components: YARN
>            Reporter: Bill Liu
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> When creating a Flink cluster on YARN, the JobManager depends on HDFS to share taskmanager-conf.yaml with the TaskManagers.
> It's better to share the taskmanager-conf.yaml via the JobManager web server instead of HDFS, which could reduce the HDFS dependency at job startup.

-- This message was sent by Atlassian JIRA (v6.3.15#6346)
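Bill's proposal — shipping only the base config plus dynamic overriding properties instead of a generated taskmanager-conf.yaml — amounts to a last-writer-wins merge. A minimal, hypothetical sketch (plain string maps stand in for Flink's `Configuration`; the class and method names here are invented for illustration):

```java
import java.util.HashMap;
import java.util.Map;

public class DynamicPropertyOverride {

    // Hypothetical sketch: instead of materializing a separate
    // taskmanager-conf.yaml, ship the base config together with a set of
    // dynamic properties, and let the dynamic properties win on conflict.
    public static Map<String, String> effectiveConfig(Map<String, String> base,
                                                      Map<String, String> dynamic) {
        Map<String, String> merged = new HashMap<>(base);
        merged.putAll(dynamic); // last writer wins: dynamic overrides base
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> base = new HashMap<>();
        base.put("taskmanager.heap.mb", "1024");
        base.put("taskmanager.numberOfTaskSlots", "1");

        Map<String, String> dynamic = new HashMap<>();
        dynamic.put("taskmanager.numberOfTaskSlots", "2");

        Map<String, String> cfg = effectiveConfig(base, dynamic);
        System.out.println(cfg.get("taskmanager.numberOfTaskSlots")); // prints 2
        System.out.println(cfg.get("taskmanager.heap.mb"));           // prints 1024
    }
}
```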
[jira] [Commented] (FLINK-5668) Reduce dependency on HDFS at job startup time
[ https://issues.apache.org/jira/browse/FLINK-5668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15849138#comment-15849138 ] Bill Liu commented on FLINK-5668: - Thanks [~wheat9] for filling in the full context. YARN's own fault tolerance and high availability rely on HDFS, but that doesn't mean Flink-on-YARN has to depend on HDFS. Especially since some of the HDFS dependencies are not necessary at all. As for the TaskManager configuration file, I took a close look at the code: the TaskManager config is cloned from the base config, with only a few slight changes applied.

```
final Configuration taskManagerConfig = BootstrapTools.generateTaskManagerConfiguration(
        config, akkaHostname, akkaPort, slotsPerTaskManager, TASKMANAGER_REGISTRATION_TIMEOUT);

public static Configuration generateTaskManagerConfiguration(
        Configuration baseConfig,
        String jobManagerHostname,
        int jobManagerPort,
        int numSlots,
        FiniteDuration registrationTimeout) {

    Configuration cfg = baseConfig.clone();
    cfg.setString(ConfigConstants.JOB_MANAGER_IPC_ADDRESS_KEY, jobManagerHostname);
    cfg.setInteger(ConfigConstants.JOB_MANAGER_IPC_PORT_KEY, jobManagerPort);
    cfg.setString(ConfigConstants.TASK_MANAGER_MAX_REGISTRATION_DURATION, registrationTimeout.toString());
    if (numSlots != -1) {
        cfg.setInteger(ConfigConstants.TASK_MANAGER_NUM_TASK_SLOTS, numSlots);
    }
    return cfg;
}
```

If the JobManager web server is not a good place to share files, the JobManager doesn't need to create a local taskmanager-config.yaml at all; it could just pass the base config file plus some dynamic properties that override values in the base config.

> Reduce dependency on HDFS at job startup time
> --
>
> Key: FLINK-5668
> URL: https://issues.apache.org/jira/browse/FLINK-5668
> Project: Flink
> Issue Type: Improvement
> Components: YARN
> Reporter: Bill Liu
> Original Estimate: 48h
> Remaining Estimate: 48h
>
> When creating a Flink cluster on YARN, the JobManager depends on HDFS to share taskmanager-conf.yaml with the TaskManagers.
> It would be better to share taskmanager-conf.yaml via the JobManager web server instead of HDFS, which would reduce the HDFS dependency at job startup.
-- This message was sent by Atlassian JIRA (v6.3.15#6346)
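The "base config plus dynamic overrides" idea proposed in this thread can be sketched with plain maps. This is only an illustration of the cloning-and-overriding pattern that `generateTaskManagerConfiguration` performs; it does not use Flink's actual `Configuration` class, and the key names are examples:

```java
import java.util.HashMap;
import java.util.Map;

public class ConfigOverrideSketch {
    // Clone the base config and apply a few dynamic overrides, mirroring
    // what BootstrapTools.generateTaskManagerConfiguration does: the base
    // config stays untouched, and the dynamic properties win on conflict.
    static Map<String, String> withOverrides(Map<String, String> base,
                                             Map<String, String> overrides) {
        Map<String, String> cfg = new HashMap<>(base); // clone
        cfg.putAll(overrides);                         // dynamic properties win
        return cfg;
    }

    public static void main(String[] args) {
        Map<String, String> base = new HashMap<>();
        base.put("jobmanager.rpc.address", "localhost");
        base.put("taskmanager.numberOfTaskSlots", "1");

        // The handful of values the JobManager would pass per TaskManager
        Map<String, String> dynamic = new HashMap<>();
        dynamic.put("jobmanager.rpc.address", "jm-host");

        Map<String, String> tmConfig = withOverrides(base, dynamic);
        System.out.println(tmConfig.get("jobmanager.rpc.address"));      // jm-host
        System.out.println(tmConfig.get("taskmanager.numberOfTaskSlots")); // 1
    }
}
```

With this shape, no taskmanager-config.yaml ever needs to be materialized on HDFS: the TaskManager reconstructs its config from the shared base file plus the small override set.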
[jira] [Commented] (FLINK-5668) Reduce dependency on HDFS at job startup time
[ https://issues.apache.org/jira/browse/FLINK-5668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15849048#comment-15849048 ] Haohui Mai commented on FLINK-5668: --- Please allow me to fill in some of the context here. The request is to have Flink support alternative filesystems (e.g., S3) in Flink-on-YARN so that our mission-critical jobs can survive unavailability of HDFS. Flink-on-YARN still depends on the underlying distributed filesystem to implement its high availability and reliability requirements; this jira has no intention of changing the current mechanisms in Flink. You are right that YARN itself depends on a distributed filesystem to function correctly. It works well with HDFS, but in general it also works with any filesystem that implements the `FileSystem` API in Hadoop; there are multiple deployments in production that run YARN on S3. Essentially we would like to take the approach of FLINK-5631 in a more comprehensive way: in many places the Flink-on-YARN implementation simply takes the default filesystem from YARN. In fact, the {{Path}} objects already specify the filesystem; it would be great to teach Flink to recognize the {{Path}} objects properly, just as FLINK-5631 has done, so that it becomes possible to run Flink-on-YARN on alternative filesystems such as S3. Does that make sense to you, [~StephanEwen]?

> Reduce dependency on HDFS at job startup time
> --
>
> Key: FLINK-5668
> URL: https://issues.apache.org/jira/browse/FLINK-5668
> Project: Flink
> Issue Type: Improvement
> Components: YARN
> Reporter: Bill Liu
> Original Estimate: 48h
> Remaining Estimate: 48h
>
> When creating a Flink cluster on YARN, the JobManager depends on HDFS to share taskmanager-conf.yaml with the TaskManagers.
> It would be better to share taskmanager-conf.yaml via the JobManager web server instead of HDFS, which would reduce the HDFS dependency at job startup.
-- This message was sent by Atlassian JIRA (v6.3.15#6346)
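The "recognize the {{Path}} objects properly" point boils down to resolving the filesystem from the path's own URI scheme instead of assuming the cluster default. A minimal sketch of that resolution step, using only `java.net.URI` (not Hadoop's or Flink's actual classes):

```java
import java.net.URI;

public class SchemeSketch {
    // Pick the filesystem from the path's own scheme, falling back to a
    // configured default only when the path gives none — generalizing the
    // behavior FLINK-5631 introduced for a single call site.
    static String fileSystemScheme(String path, String defaultScheme) {
        String scheme = URI.create(path).getScheme();
        return scheme != null ? scheme : defaultScheme;
    }

    public static void main(String[] args) {
        // A fully qualified path carries its own filesystem
        System.out.println(fileSystemScheme("s3://bucket/flink/conf.yaml", "hdfs"));
        // A scheme-less path falls back to the configured default
        System.out.println(fileSystemScheme("/tmp/conf.yaml", "hdfs"));
    }
}
```

Once every call site resolves filesystems this way, running Flink-on-YARN against S3 (or any Hadoop-compatible filesystem) stops requiring HDFS to be the default.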
[GitHub] flink issue #3250: Eratosthenes Sieve, a helpful and interesting example of ...
Github user greghogan commented on the issue: https://github.com/apache/flink/pull/3250 Thank you for this contribution. Please open a [ticket](https://issues.apache.org/jira/browse/FLINK) describing your algorithm and its appropriateness as an example running on a distributed stream processor. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-5624) Support tumbling window on streaming tables in the SQL API
[ https://issues.apache.org/jira/browse/FLINK-5624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15849010#comment-15849010 ] ASF GitHub Bot commented on FLINK-5624: --- GitHub user haohui opened a pull request: https://github.com/apache/flink/pull/3252 [FLINK-5624] Support tumbling window on streaming tables in the SQL API. This is a POC to add tumbling window support for streaming tables in SQL. Essentially it recognizes the `LogicalWindow` construct in Calcite and transforms it into the `LogicalWindowAggregate` in Flink. Feedback is highly appreciated. You can merge this pull request into a Git repository by running: $ git pull https://github.com/haohui/flink FLINK-5624 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/3252.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3252 commit a8d4b5042e8bcd1b149f8915118c116e419690e0 Author: Haohui Mai Date: 2017-02-01T22:03:44Z [FLINK-5624] Support tumbling window on streaming tables in the SQL API.

> Support tumbling window on streaming tables in the SQL API
> --
>
> Key: FLINK-5624
> URL: https://issues.apache.org/jira/browse/FLINK-5624
> Project: Flink
> Issue Type: Bug
> Components: Table API & SQL
> Reporter: Haohui Mai
> Assignee: Haohui Mai
>
> This is a follow-up of FLINK-4691.
> FLINK-4691 adds support for group-windows for streaming tables. This jira proposes to expose the functionality in the SQL layer via the {{GROUP BY}} clauses, as described in http://calcite.apache.org/docs/stream.html#tumbling-windows.
-- This message was sent by Atlassian JIRA (v6.3.15#6346)
[GitHub] flink pull request #3252: [FLINK-5624] Support tumbling window on streaming ...
GitHub user haohui opened a pull request: https://github.com/apache/flink/pull/3252 [FLINK-5624] Support tumbling window on streaming tables in the SQL API. This is a POC to add tumbling window support for streaming tables in SQL. Essentially it recognizes the `LogicalWindow` construct in Calcite and transforms it into the `LogicalWindowAggregate` in Flink. Feedback is highly appreciated. You can merge this pull request into a Git repository by running: $ git pull https://github.com/haohui/flink FLINK-5624 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/3252.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3252 commit a8d4b5042e8bcd1b149f8915118c116e419690e0 Author: Haohui Mai Date: 2017-02-01T22:03:44Z [FLINK-5624] Support tumbling window on streaming tables in the SQL API.
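Following the Calcite streaming-SQL syntax the linked jira references, a tumbling-window aggregation of the kind this PR targets would look roughly like the following. The table and column names (`Orders`, `rowtime`, `productId`, `units`) are illustrative, not part of the PR:

```sql
SELECT
  TUMBLE_START(rowtime, INTERVAL '1' HOUR) AS window_start,
  productId,
  SUM(units) AS total_units
FROM Orders
GROUP BY
  TUMBLE(rowtime, INTERVAL '1' HOUR),
  productId
```

The `TUMBLE(...)` expression in the `GROUP BY` clause is what Calcite parses into the `LogicalWindow` construct that the PR then rewrites into Flink's `LogicalWindowAggregate`.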
[jira] [Commented] (FLINK-2908) Web interface redraw web plan when browser resized
[ https://issues.apache.org/jira/browse/FLINK-2908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15848990#comment-15848990 ] ASF GitHub Bot commented on FLINK-2908: --- GitHub user greghogan opened a pull request: https://github.com/apache/flink/pull/3251 [FLINK-2908] [web frontend] Redraw web plan when browser resized Redraw the job plan visual graph when the browser width is increased. You can merge this pull request into a Git repository by running: $ git pull https://github.com/greghogan/flink 2908_web_interface_redraw_web_plan_when_resized Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/3251.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3251 commit 085fcac0414f536806e7d9e48454fe457cfee4fd Author: Greg Hogan Date: 2017-02-01T21:56:03Z [FLINK-2908] [web frontend] Redraw web plan when browser resized Redraw the job plan visual graph when the browser width is increased. > Web interface redraw web plan when browser resized > -- > > Key: FLINK-2908 > URL: https://issues.apache.org/jira/browse/FLINK-2908 > Project: Flink > Issue Type: Improvement > Components: Webfrontend >Affects Versions: 0.10.0 >Reporter: Greg Hogan >Assignee: Greg Hogan >Priority: Trivial > > The job plan graph does not resize when the user expands the browser window > (only a change in width matters). > To reproduce: 1) open the plan tab of a running or completed job in a > non-maximized browser window (not full width), 2) maximize the browser window. > Workaround: refresh the web page. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[GitHub] flink pull request #3251: [FLINK-2908] [web frontend] Redraw web plan when b...
GitHub user greghogan opened a pull request: https://github.com/apache/flink/pull/3251 [FLINK-2908] [web frontend] Redraw web plan when browser resized Redraw the job plan visual graph when the browser width is increased. You can merge this pull request into a Git repository by running: $ git pull https://github.com/greghogan/flink 2908_web_interface_redraw_web_plan_when_resized Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/3251.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3251 commit 085fcac0414f536806e7d9e48454fe457cfee4fd Author: Greg Hogan Date: 2017-02-01T21:56:03Z [FLINK-2908] [web frontend] Redraw web plan when browser resized Redraw the job plan visual graph when the browser width is increased.
[jira] [Assigned] (FLINK-2908) Web interface redraw web plan when browser resized
[ https://issues.apache.org/jira/browse/FLINK-2908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan reassigned FLINK-2908: - Assignee: Greg Hogan > Web interface redraw web plan when browser resized > -- > > Key: FLINK-2908 > URL: https://issues.apache.org/jira/browse/FLINK-2908 > Project: Flink > Issue Type: Improvement > Components: Webfrontend >Affects Versions: 0.10.0 >Reporter: Greg Hogan >Assignee: Greg Hogan >Priority: Trivial > > The job plan graph does not resize when the user expands the browser window > (only a change in width matters). > To reproduce: 1) open the plan tab of a running or completed job in a > non-maximized browser window (not full width), 2) maximize the browser window. > Workaround: refresh the web page. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[GitHub] flink pull request #3250: Eratosthenes Sieve, a helpful and interesting exam...
GitHub user dsnz opened a pull request: https://github.com/apache/flink/pull/3250 Eratosthenes Sieve, a helpful and interesting example of using DeltaIteration You can merge this pull request into a Git repository by running: $ git pull https://github.com/dsnz/flink master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/3250.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3250
[jira] [Commented] (FLINK-5510) Replace Scala Future with FlinkFuture in QueryableStateClient
[ https://issues.apache.org/jira/browse/FLINK-5510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15848778#comment-15848778 ] Dawid Wysakowicz commented on FLINK-5510: - I just had a look at the code and was wondering how many of the futures we want to replace: just the return values in {{QueryableStateClient}}, or every usage in QueryableStateClient/KvStateLookupService? If the former, {{FlinkFuture}} is missing an alternative to {{Future#recoverWith}}.

> Replace Scala Future with FlinkFuture in QueryableStateClient
> -
>
> Key: FLINK-5510
> URL: https://issues.apache.org/jira/browse/FLINK-5510
> Project: Flink
> Issue Type: Improvement
> Components: Queryable State
> Reporter: Ufuk Celebi
> Assignee: Dawid Wysakowicz
> Priority: Minor
>
> The entry point for queryable state users is the {{QueryableStateClient}} which returns query results via Scala Futures. Since merging the initial version of QueryableState we have introduced the FlinkFuture wrapper type in order to not expose our Scala dependency via the API.
> Since APIs tend to stick around longer than expected, it might be worthwhile to port the exposed QueryableStateClient interface to use the FlinkFuture. Early users can still get the Scala Future via FlinkFuture#getScalaFuture().
-- This message was sent by Atlassian JIRA (v6.3.15#6346)
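The missing {{Future#recoverWith}} semantics (on failure, continue with another future produced from the error) can be emulated on the Java side. The sketch below uses `java.util.concurrent.CompletableFuture` for self-containedness rather than Flink's own future type, but the composition pattern is the same:

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

public class RecoverWithSketch {
    // recoverWith: if the original future fails, continue with a fallback
    // future built from the failure; otherwise keep the original result.
    // Implemented by lifting the value into a nested future, mapping the
    // failure to the fallback future, then flattening.
    static <T> CompletableFuture<T> recoverWith(
            CompletableFuture<T> original,
            Function<Throwable, CompletableFuture<T>> fallback) {
        return original
                .thenApply(CompletableFuture::completedFuture)
                .exceptionally(fallback)
                .thenCompose(Function.identity());
    }

    public static void main(String[] args) {
        // Failure path: the fallback future supplies the value
        CompletableFuture<String> failed = new CompletableFuture<>();
        failed.completeExceptionally(new RuntimeException("state not ready"));
        System.out.println(recoverWith(failed,
                t -> CompletableFuture.completedFuture("retried")).join());

        // Success path: the fallback is never consulted
        System.out.println(recoverWith(CompletableFuture.completedFuture("direct"),
                t -> CompletableFuture.completedFuture("unused")).join());
    }
}
```

A queryable-state client could use exactly this shape to retry a lookup against a freshly resolved KvState location when the first query fails.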
[jira] [Assigned] (FLINK-5510) Replace Scala Future with FlinkFuture in QueryableStateClient
[ https://issues.apache.org/jira/browse/FLINK-5510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Wysakowicz reassigned FLINK-5510: --- Assignee: Dawid Wysakowicz > Replace Scala Future with FlinkFuture in QueryableStateClient > - > > Key: FLINK-5510 > URL: https://issues.apache.org/jira/browse/FLINK-5510 > Project: Flink > Issue Type: Improvement > Components: Queryable State >Reporter: Ufuk Celebi >Assignee: Dawid Wysakowicz >Priority: Minor > > The entry point for queryable state users is the {{QueryableStateClient}} > which returns query results via Scala Futures. Since merging the initial > version of QueryableState we have introduced the FlinkFuture wrapper type in > order to not expose our Scala dependency via the API. > Since APIs tend to stick around longer than expected, it might be worthwhile > to port the exposed QueryableStateClient interface to use the FlinkFuture. > Early users can still get the Scala Future via FlinkFuture#getScalaFuture(). -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (FLINK-5510) Replace Scala Future with FlinkFuture in QueryableStateClient
[ https://issues.apache.org/jira/browse/FLINK-5510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15848733#comment-15848733 ] Dawid Wysakowicz commented on FLINK-5510: - Hi, I would be glad to work on that.

> Replace Scala Future with FlinkFuture in QueryableStateClient
> -
>
> Key: FLINK-5510
> URL: https://issues.apache.org/jira/browse/FLINK-5510
> Project: Flink
> Issue Type: Improvement
> Components: Queryable State
> Reporter: Ufuk Celebi
> Priority: Minor
>
> The entry point for queryable state users is the {{QueryableStateClient}} which returns query results via Scala Futures. Since merging the initial version of QueryableState we have introduced the FlinkFuture wrapper type in order to not expose our Scala dependency via the API.
> Since APIs tend to stick around longer than expected, it might be worthwhile to port the exposed QueryableStateClient interface to use the FlinkFuture. Early users can still get the Scala Future via FlinkFuture#getScalaFuture().
-- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (FLINK-5603) Use Flink's futures in QueryableStateClient
[ https://issues.apache.org/jira/browse/FLINK-5603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15848732#comment-15848732 ] Dawid Wysakowicz commented on FLINK-5603: - I think this is a duplicate of [FLINK-5510] > Use Flink's futures in QueryableStateClient > --- > > Key: FLINK-5603 > URL: https://issues.apache.org/jira/browse/FLINK-5603 > Project: Flink > Issue Type: Sub-task > Components: Queryable State >Reporter: Ufuk Celebi >Priority: Minor > > The current {{QueryableStateClient}} exposes Scala's Futures as the return > type for queries. Since we are trying to get away from hard Scala > dependencies in the current master, we should proactively replace this with > Flink's Future interface. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Closed] (FLINK-3162) Configure number of TaskManager slots as ratio of available processors
[ https://issues.apache.org/jira/browse/FLINK-3162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan closed FLINK-3162. - Resolution: Won't Fix Closing this issue since this feature is only useful for standalone clusters and implementation requires a discussion for potential refactoring of the startup scripts to also support clusters with heterogeneous memory. > Configure number of TaskManager slots as ratio of available processors > -- > > Key: FLINK-3162 > URL: https://issues.apache.org/jira/browse/FLINK-3162 > Project: Flink > Issue Type: Improvement > Components: Distributed Coordination >Affects Versions: 1.0.0 >Reporter: Greg Hogan >Priority: Minor > > The number of TaskManager slots is currently only configurable by explicitly > setting {{taskmanager.numberOfTaskSlots}}. Make this configurable by a ratio > of the number of available processors (for example, "2", for hyperthreading). > This can work in the same way as {{taskmanager.memory.size}} and > {{taskmanager.memory.fraction}}. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[GitHub] flink pull request #3249: [FLINK-3163] [scripts] Configure Flink for NUMA sy...
GitHub user greghogan opened a pull request: https://github.com/apache/flink/pull/3249 [FLINK-3163] [scripts] Configure Flink for NUMA systems Start a TaskManager on each NUMA node on each worker when the new configuration option 'taskmanager.compute.numa' is enabled. This does not affect the runtime process for the JobManager (or future ResourceManager) as the startup scripts do not provide a simple means of disambiguating masters and slaves. I expect most large clusters to run these master processes on separate machines, and for small clusters the JobManager can run alongside a TaskManager. You can merge this pull request into a Git repository by running: $ git pull https://github.com/greghogan/flink 3163_configure_flink_for_numa_systems Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/3249.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3249 commit 57767e67dc7306d18df07d5224c81a8d359df620 Author: Greg Hogan Date: 2017-02-01T17:13:49Z [FLINK-3163] [scripts] Configure Flink for NUMA systems Start a TaskManager on each NUMA node on each worker when the new configuration option 'taskmanager.compute.numa' is enabled.
[jira] [Created] (FLINK-5696) Access to REAPER_THREAD lacks proper synchronization in SafetyNetCloseableRegistry#doRegister
Ted Yu created FLINK-5696: - Summary: Access to REAPER_THREAD lacks proper synchronization in SafetyNetCloseableRegistry#doRegister Key: FLINK-5696 URL: https://issues.apache.org/jira/browse/FLINK-5696 Project: Flink Issue Type: Bug Reporter: Ted Yu Here is the related code:
{code}
PhantomDelegatingCloseableRef phantomRef = new PhantomDelegatingCloseableRef(
    wrappingProxyCloseable, this, REAPER_THREAD.referenceQueue);
{code}
Access to REAPER_THREAD should be guarded by REAPER_THREAD_LOCK. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
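The race being described is the classic unguarded read of a lazily initialized static field. A minimal, self-contained illustration of the guarded pattern the report asks for — class and field names are simplified stand-ins, not Flink's actual code:

```java
public class GuardedReaperSketch {
    private static final Object LOCK = new Object();
    private static Thread reaperThread; // lazily created; guarded by LOCK

    // Every read and write of reaperThread goes through LOCK, so no caller
    // can observe a half-initialized or stale reference, and at most one
    // reaper thread is ever created.
    static Thread getOrCreateReaper() {
        synchronized (LOCK) {
            if (reaperThread == null) {
                reaperThread = new Thread(() -> {}, "CloseableReaper");
            }
            return reaperThread;
        }
    }

    public static void main(String[] args) {
        Thread a = getOrCreateReaper();
        Thread b = getOrCreateReaper();
        System.out.println(a == b);      // same instance on repeated calls
        System.out.println(a.getName());
    }
}
```

The fix suggested by the issue is exactly this: move the `REAPER_THREAD.referenceQueue` access inside the existing `REAPER_THREAD_LOCK` block rather than reading the field bare.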
[jira] [Commented] (FLINK-3163) Configure Flink for NUMA systems
[ https://issues.apache.org/jira/browse/FLINK-3163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15848661#comment-15848661 ] ASF GitHub Bot commented on FLINK-3163: --- GitHub user greghogan opened a pull request: https://github.com/apache/flink/pull/3249 [FLINK-3163] [scripts] Configure Flink for NUMA systems Start a TaskManager on each NUMA node on each worker when the new configuration option 'taskmanager.compute.numa' is enabled. This does not affect the runtime process for the JobManager (or future ResourceManager) as the startup scripts do not provide a simple means of disambiguating masters and slaves. I expect most large clusters to run these master processes on separate machines, and for small clusters the JobManager can run alongside a TaskManager. You can merge this pull request into a Git repository by running: $ git pull https://github.com/greghogan/flink 3163_configure_flink_for_numa_systems Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/3249.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3249 commit 57767e67dc7306d18df07d5224c81a8d359df620 Author: Greg Hogan Date: 2017-02-01T17:13:49Z [FLINK-3163] [scripts] Configure Flink for NUMA systems Start a TaskManager on each NUMA node on each worker when the new configuration option 'taskmanager.compute.numa' is enabled. > Configure Flink for NUMA systems > > > Key: FLINK-3163 > URL: https://issues.apache.org/jira/browse/FLINK-3163 > Project: Flink > Issue Type: Improvement > Components: Startup Shell Scripts >Affects Versions: 1.0.0 >Reporter: Greg Hogan >Assignee: Greg Hogan > > On NUMA systems Flink can be pinned to a single physical processor ("node") > using {{numactl --membind=$node --cpunodebind=$node }}. Commonly > available NUMA systems include the largest AWS and Google Compute instances.
> For example, on an AWS c4.8xlarge system with 36 hyperthreads the user could > configure a single TaskManager with 36 slots or have Flink create two > TaskManagers bound to each of the NUMA nodes, each with 18 slots. > There may be some extra overhead in transferring network buffers between > TaskManagers on the same system, though the fraction of data shuffled in this > manner decreases with the size of the cluster. The performance improvement > from only accessing local memory looks to be significant though difficult to > benchmark. > The JobManagers may fit into NUMA nodes rather than requiring full systems. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
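The slot arithmetic quoted in the issue above is simple division of hyperthreads across NUMA nodes. A tiny sketch, assuming the c4.8xlarge figures the issue cites (36 hyperthreads, 2 NUMA nodes):

```java
public class NumaSlotsSketch {
    // Split a machine's hyperthreads evenly across its NUMA nodes, with one
    // TaskManager per node — the layout the pull request proposes.
    static int slotsPerTaskManager(int hyperthreads, int numaNodes) {
        return hyperthreads / numaNodes;
    }

    public static void main(String[] args) {
        // Two TaskManagers, one bound to each NUMA node
        System.out.println(slotsPerTaskManager(36, 2));
        // Single-TaskManager alternative mentioned in the issue
        System.out.println(slotsPerTaskManager(36, 1));
    }
}
```

So the choice on that instance type is one 36-slot TaskManager versus two NUMA-pinned 18-slot TaskManagers, trading some cross-TaskManager network-buffer copies for purely local memory access.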
[jira] [Commented] (FLINK-5695) Optimize table type systems based on database semantics
[ https://issues.apache.org/jira/browse/FLINK-5695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15848638#comment-15848638 ] Fabian Hueske commented on FLINK-5695: -- Hi [~sunjincheng121], I am not sure how this change would improve the Table API. IMO, it only merges the functionality of two classes and adds a condition to decide which translation path to take. I do not see the {{GroupWindowedTable}} as a problem. Instead it allows to present the user more accurate documentation (via Java / Scala Docs), for example to explain how the window alias can be used to access the window {{start}} and {{end}} properties. By merging the functionality of {{GroupedTable.select()}} and {{GroupWindowedTable.select()}} we lose that. Can you explain your motivation for this change? Thanks, Fabian > Optimize table type systems based on database semantics > --- > > Key: FLINK-5695 > URL: https://issues.apache.org/jira/browse/FLINK-5695 > Project: Flink > Issue Type: Sub-task > Components: Table API & SQL >Reporter: sunjincheng >Assignee: sunjincheng > > Optimize table type systems based on database semantics.As follows: > {code} > groupBy > > > Table GroupedTable > ∧ < ∧ >| select| >| | >| where| >| select| groupBy >| agg | >| ... | >| window | > ∨ -> > TableWindowedTable > <- > select > {code} > What do you think? [~fhueske] -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (FLINK-2168) Add HBaseTableSource
[ https://issues.apache.org/jira/browse/FLINK-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15848626#comment-15848626 ] ASF GitHub Bot commented on FLINK-2168: --- Github user fhueske commented on the issue: https://github.com/apache/flink/pull/3149 Hi all, thanks for the feedback. Let's stick to the nested schema then. I think the best approach to support projections on nested fields is to implement a second interface (i.e., a trait without default implementation) called `NestedFieldsProjectableTableSource` as @tonycox suggested. Adding a method with a default implementation to `ProjectableTableSource` would not work, because this would turn the class into a Java abstract class while it is an interface now. Using flat indices is not a very nice solution either, IMO, because it is not easy to parse. For now I'd suggest to keep the scope of the PR as it is right now. A bit more Java documentation on `HBaseTableSource` to explain how it is used would be great. We can implement the `NestedFieldsProjectableTableSource` and the changes to `HBaseTableSource` in a follow-up issue. What do you think? > Add HBaseTableSource > > > Key: FLINK-2168 > URL: https://issues.apache.org/jira/browse/FLINK-2168 > Project: Flink > Issue Type: New Feature > Components: Table API & SQL >Affects Versions: 0.9 >Reporter: Fabian Hueske >Assignee: ramkrishna.s.vasudevan >Priority: Minor > > Add a {{HBaseTableSource}} to read data from a HBase table. The > {{HBaseTableSource}} should implement the {{ProjectableTableSource}} > (FLINK-3848) and {{FilterableTableSource}} (FLINK-3849) interfaces. > The implementation can be based on Flink's {{TableInputFormat}}. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[GitHub] flink issue #3149: FLINK-2168 Add HBaseTableSource
Github user fhueske commented on the issue: https://github.com/apache/flink/pull/3149 Hi all, thanks for the feedback. Let's stick to the nested schema then. I think the best approach to support projections on nested fields is to implement a second interface (i.e., a trait without default implementation) called `NestedFieldsProjectableTableSource` as @tonycox suggested. Adding a method with a default implementation to `ProjectableTableSource` would not work, because this would turn the class into a Java abstract class while it is an interface now. Using flat indices is not a very nice solution either, IMO, because it is not easy to parse. For now I'd suggest to keep the scope of the PR as it is right now. A bit more Java documentation on `HBaseTableSource` to explain how it is used would be great. We can implement the `NestedFieldsProjectableTableSource` and the changes to `HBaseTableSource` in a follow-up issue. What do you think?
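The proposed second interface could take roughly the following shape. The name comes from the discussion above, but the method signature and everything else here is a guess for illustration only — not Flink's actual API:

```java
import java.util.Arrays;

public class NestedProjectionSketch {
    // Hypothetical shape of the interface discussed above: a table source
    // that can push down projections on nested fields, addressed by a
    // top-level field index plus a path of nested field names, instead of
    // the hard-to-parse flat indices the comment argues against.
    interface NestedFieldsProjectableTableSource<T> {
        T projectNestedFields(int[] fields, String[][] nestedFields);
    }

    public static void main(String[] args) {
        // Dummy implementation that just echoes the requested projection.
        NestedFieldsProjectableTableSource<String> source =
                (fields, nested) -> Arrays.toString(fields) + " / "
                        + Arrays.deepToString(nested);

        // Project top-level field 0, keeping only the nested column path f1 -> q1
        // (mimicking an HBase column family / qualifier pair).
        System.out.println(source.projectNestedFields(
                new int[]{0}, new String[][]{{"f1", "q1"}}));
    }
}
```

Keeping this as a separate interface sidesteps the problem noted above: `ProjectableTableSource` stays a plain interface, and only sources that can actually push nested projections opt in.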
[jira] [Updated] (FLINK-5695) Optimize table type systems based on database semantics
[ https://issues.apache.org/jira/browse/FLINK-5695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sunjincheng updated FLINK-5695: --- Description: Optimize table type systems based on database semantics.As follows: {code} groupBy > Table GroupedTable ∧ < ∧ | select| | | | where| | select| groupBy | agg | | ... | | window | ∨ -> TableWindowedTable <- select {code} What do you think? [~fhueske] was: Optimize table type systems based on database semantics.As follows: {code} groupBy > Table GroupedTable ∧ < ∧ | select| | | | where| | select| groupBy | agg | | ... | | window | ∨ -> TableWindowedTable <- select {code} So, I want remove table Type "WindowGroupedTable". What do you think? [~fhueske] > Optimize table type systems based on database semantics > --- > > Key: FLINK-5695 > URL: https://issues.apache.org/jira/browse/FLINK-5695 > Project: Flink > Issue Type: Sub-task > Components: Table API & SQL >Reporter: sunjincheng >Assignee: sunjincheng > > Optimize table type systems based on database semantics.As follows: > {code} > groupBy > > > Table GroupedTable > ∧ < ∧ >| select| >| | >| where| >| select| groupBy >| agg | >| ... | >| window | > ∨ -> > TableWindowedTable > <- > select > {code} > What do you think? [~fhueske] -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (FLINK-5695) Optimize table type systems based on database semantics
[ https://issues.apache.org/jira/browse/FLINK-5695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sunjincheng updated FLINK-5695: --- Description: Optimize table type systems based on database semantics.As follows: {code} groupBy > Table GroupedTable ∧ < ∧ | select| | | | where| | select| groupBy | agg | | ... | | window | ∨ -> TableWindowedTable <- select {code} So, I want remove table Type "WindowGroupedTable". What do you think? [~fhueske] was: Optimize table type systems based on database semantics.As follows: {code} groupBy > Table GroupedTable ∧ < ∧ | select| | | | where| | select| groupBy | agg | | ... | | window | ∨ -> TableWindowedTable <- select {code} So, I want remove table Type "WindowGroupedTable". > Optimize table type systems based on database semantics > --- > > Key: FLINK-5695 > URL: https://issues.apache.org/jira/browse/FLINK-5695 > Project: Flink > Issue Type: Sub-task > Components: Table API & SQL >Reporter: sunjincheng >Assignee: sunjincheng > > Optimize table type systems based on database semantics.As follows: > {code} > groupBy > > > Table GroupedTable > ∧ < ∧ >| select| >| | >| where| >| select| groupBy >| agg | >| ... | >| window | > ∨ -> > TableWindowedTable > <- > select > {code} > So, I want remove table Type "WindowGroupedTable". > What do you think? [~fhueske] -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[GitHub] flink pull request #3248: Optimize table type systems based on database sema...
GitHub user sunjincheng121 opened a pull request: https://github.com/apache/flink/pull/3248 Optimize table type systems based on database semantics Thanks for contributing to Apache Flink. Before you open your pull request, please take the following check list into consideration. If your changes take all of the items into account, feel free to open your pull request. For more information and/or questions please refer to the [How To Contribute guide](http://flink.apache.org/how-to-contribute.html). In addition to going through the list, please provide a meaningful description of your changes. - [×] General - The pull request references the related JIRA issue ("[FLINK-5695] Optimize table type systems based on database semantics") - The pull request addresses only one issue - Each commit in the PR has a meaningful commit message (including the JIRA id) - [ ] Documentation - Documentation has been added for new functionality - Old documentation affected by the pull request has been updated - JavaDoc for public methods has been added - [×] Tests & Build - Functionality added by the pull request is covered by tests - `mvn clean verify` has been executed successfully locally or a Travis build has passed You can merge this pull request into a Git repository by running: $ git pull https://github.com/sunjincheng121/flink FLINK-5695-PR Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/3248.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3248 commit 40917c6433a473f283614d9a928b60fd0a4836a6 Author: Jincheng Sun Date: 2017-02-01T15:07:22Z Optimize table type systems based on database semantics
[jira] [Commented] (FLINK-5695) Optimize table type systems based on database semantics
[ https://issues.apache.org/jira/browse/FLINK-5695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15848540#comment-15848540 ] ASF GitHub Bot commented on FLINK-5695: --- GitHub user sunjincheng121 opened a pull request: https://github.com/apache/flink/pull/3248 Optimize table type systems based on database semantics Thanks for contributing to Apache Flink. Before you open your pull request, please take the following check list into consideration. If your changes take all of the items into account, feel free to open your pull request. For more information and/or questions please refer to the [How To Contribute guide](http://flink.apache.org/how-to-contribute.html). In addition to going through the list, please provide a meaningful description of your changes. - [×] General - The pull request references the related JIRA issue ("[FLINK-5695] Optimize table type systems based on database semantics") - The pull request addresses only one issue - Each commit in the PR has a meaningful commit message (including the JIRA id) - [ ] Documentation - Documentation has been added for new functionality - Old documentation affected by the pull request has been updated - JavaDoc for public methods has been added - [×] Tests & Build - Functionality added by the pull request is covered by tests - `mvn clean verify` has been executed successfully locally or a Travis build has passed You can merge this pull request into a Git repository by running: $ git pull https://github.com/sunjincheng121/flink FLINK-5695-PR Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/3248.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3248 commit 40917c6433a473f283614d9a928b60fd0a4836a6 Author: Jincheng SunDate: 2017-02-01T15:07:22Z Optimize table type systems based on database semantics > Optimize table type systems based on database 
semantics > --- > > Key: FLINK-5695 > URL: https://issues.apache.org/jira/browse/FLINK-5695 > Project: Flink > Issue Type: Sub-task > Components: Table API & SQL >Reporter: sunjincheng >Assignee: sunjincheng > > Optimize table type systems based on database semantics.As follows: > {code} > groupBy > > > Table GroupedTable > ∧ < ∧ >| select| >| | >| where| >| select| groupBy >| agg | >| ... | >| window | > ∨ -> > TableWindowedTable > <- > select > {code} > So, I want remove table Type "WindowGroupedTable". -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (FLINK-5695) Optimize table type systems based on database semantics
[ https://issues.apache.org/jira/browse/FLINK-5695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sunjincheng updated FLINK-5695: --- Description: Optimize table type systems based on database semantics.As follows: {code} groupBy > Table GroupedTable ∧ < ∧ | select| | | | where| | select| groupBy | agg | | ... | | window | ∨ -> TableWindowedTable <- select {code} So, I want remove table Type "WindowGroupedTable". was: Optimize table type systems based on database semantics.As follows: {code} groupBy > Table GroupedTable ∧ < ∧ | select| | | | where| | select| groupBy | agg| | ... | | window | ∨ -> TableWindowedTable <- select So, I want remove table Type "WindowGroupedTable". {code} > Optimize table type systems based on database semantics > --- > > Key: FLINK-5695 > URL: https://issues.apache.org/jira/browse/FLINK-5695 > Project: Flink > Issue Type: Sub-task > Components: Table API & SQL >Reporter: sunjincheng >Assignee: sunjincheng > > Optimize table type systems based on database semantics.As follows: > {code} > groupBy > > > Table GroupedTable > ∧ < ∧ >| select| >| | >| where| >| select| groupBy >| agg | >| ... | >| window | > ∨ -> > TableWindowedTable > <- > select > {code} > So, I want remove table Type "WindowGroupedTable". -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (FLINK-5695) Optimize table type systems based on database semantics
[ https://issues.apache.org/jira/browse/FLINK-5695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sunjincheng updated FLINK-5695: --- Description: Optimize table type systems based on database semantics.As follows: {code} groupBy > Table GroupedTable ∧ < ∧ | select| | | | where| | select| groupBy | agg| | ... | | window | ∨ -> TableWindowedTable <- select So, I want remove table Type "WindowGroupedTable". {code} was: Optimize table type systems based on database semantics.As follows: groupBy > Table GroupedTable ∧ < ∧ | select| | | | where| | select| groupBy | agg| | ... | | window | ∨ -> TableWindowedTable <- select So, I want remove table Type "WindowGroupedTable". > Optimize table type systems based on database semantics > --- > > Key: FLINK-5695 > URL: https://issues.apache.org/jira/browse/FLINK-5695 > Project: Flink > Issue Type: Sub-task > Components: Table API & SQL >Reporter: sunjincheng >Assignee: sunjincheng > > Optimize table type systems based on database semantics.As follows: > {code} > groupBy > > > Table GroupedTable > ∧ < ∧ >| select| >| | >| where| >| select| groupBy >| agg| >| ... | >| window | > ∨ -> > TableWindowedTable > <- > select > So, I want remove table Type "WindowGroupedTable". > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (FLINK-5695) Optimize table type systems based on database semantics
sunjincheng created FLINK-5695: -- Summary: Optimize table type systems based on database semantics Key: FLINK-5695 URL: https://issues.apache.org/jira/browse/FLINK-5695 Project: Flink Issue Type: Sub-task Components: Table API & SQL Reporter: sunjincheng Assignee: sunjincheng Optimize table type systems based on database semantics. As follows: groupBy > Table GroupedTable ∧ < ∧ | select| | | | where| | select| groupBy | agg| | ... | | window | ∨ -> TableWindowedTable <- select So, I want to remove the table type "WindowGroupedTable".
[jira] [Commented] (FLINK-2168) Add HBaseTableSource
[ https://issues.apache.org/jira/browse/FLINK-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15848530#comment-15848530 ] ASF GitHub Bot commented on FLINK-2168: --- Github user wuchong commented on the issue: https://github.com/apache/flink/pull/3149 Sorry for the late response. Regarding the `HBaseTableSchema`, I agree with moving the `addColumn(...)` method into `HBaseTableSource`. Regarding the nested vs. flat schema, I prefer the nested schema. It is more intuitive to use. Since the nested schema doesn't support pushing projections down, I think we should extend `ProjectableTableSource` to support pushing projections down into a composite type. We can keep the interface unchanged, i.e. `def projectFields(fields: Array[Int]): ProjectableTableSource[T]`, but the index of `fields` should be the flat index. We can use the flat field indexes to do projection pushdown even if it is a nested schema. For example, for a table source with schema `a: Int, b: Row, c: Boolean`, the flat indexes of `a, b.b1, b.b2, c` are `0, 1, 2, 3`. So a projection `SELECT b.b1, c FROM T` will result in a `fields` of `Array(1,3)`. What do you think? For me the biggest drawback of a nested schema is the lack of support for pushing projections down. > Add HBaseTableSource > > > Key: FLINK-2168 > URL: https://issues.apache.org/jira/browse/FLINK-2168 > Project: Flink > Issue Type: New Feature > Components: Table API & SQL >Affects Versions: 0.9 >Reporter: Fabian Hueske >Assignee: ramkrishna.s.vasudevan >Priority: Minor > > Add a {{HBaseTableSource}} to read data from an HBase table. The > {{HBaseTableSource}} should implement the {{ProjectableTableSource}} > (FLINK-3848) and {{FilterableTableSource}} (FLINK-3849) interfaces. > The implementation can be based on Flink's {{TableInputFormat}}.
[GitHub] flink issue #3149: FLINK-2168 Add HBaseTableSource
Github user wuchong commented on the issue: https://github.com/apache/flink/pull/3149 Sorry for the late response. Regarding the `HBaseTableSchema`, I agree with moving the `addColumn(...)` method into `HBaseTableSource`. Regarding the nested vs. flat schema, I prefer the nested schema. It is more intuitive to use. Since the nested schema doesn't support pushing projections down, I think we should extend `ProjectableTableSource` to support pushing projections down into a composite type. We can keep the interface unchanged, i.e. `def projectFields(fields: Array[Int]): ProjectableTableSource[T]`, but the index of `fields` should be the flat index. We can use the flat field indexes to do projection pushdown even if it is a nested schema. For example, for a table source with schema `a: Int, b: Row, c: Boolean`, the flat indexes of `a, b.b1, b.b2, c` are `0, 1, 2, 3`. So a projection `SELECT b.b1, c FROM T` will result in a `fields` of `Array(1,3)`. What do you think? For me the biggest drawback of a nested schema is the lack of support for pushing projections down.
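The flat-index convention in the comment above can be illustrated with a small helper. This is not Flink code: the nested-map schema representation and the `FlatIndexer` class are made up for the example. It only shows how the leaves of a nested schema get depth-first flat indexes, so that for the schema `a: Int, b: Row(b1, b2), c: Boolean` the projection `SELECT b.b1, c FROM T` maps to `fields = Array(1, 3)`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative helper (not Flink API): assigns flat indexes to the leaves
// of a nested schema by depth-first traversal, as proposed in the discussion.
class FlatIndexer {

    /** Maps each leaf path like "b.b1" to its flat index. */
    static Map<String, Integer> flatten(Map<String, Object> schema) {
        Map<String, Integer> out = new LinkedHashMap<>();
        walk(schema, "", out);
        return out;
    }

    @SuppressWarnings("unchecked")
    private static void walk(Map<String, Object> schema, String prefix,
                             Map<String, Integer> out) {
        for (Map.Entry<String, Object> e : schema.entrySet()) {
            String path = prefix.isEmpty() ? e.getKey() : prefix + "." + e.getKey();
            if (e.getValue() instanceof Map) {
                // composite type (e.g. a Row): recurse into its fields
                walk((Map<String, Object>) e.getValue(), path, out);
            } else {
                // leaf field: gets the next flat index
                out.put(path, out.size());
            }
        }
    }
}
```

This is why the interface `def projectFields(fields: Array[Int])` could stay unchanged: a flat index alone identifies a leaf, at the cost of the source having to map it back to a nested path.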
[GitHub] flink pull request #3247: [FLINK-5680] [docs] Document env.ssh.opts
GitHub user greghogan opened a pull request: https://github.com/apache/flink/pull/3247 [FLINK-5680] [docs] Document env.ssh.opts Document env.ssh.opts in setup/config.html. You can merge this pull request into a Git repository by running: $ git pull https://github.com/greghogan/flink 5680_document_env_ssh_opts Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/3247.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3247 commit 255d2e3feb772c1d6f822af0020e78b702937ab9 Author: Greg Hogan Date: 2017-02-01T14:48:31Z [FLINK-5680] [docs] Document env.ssh.opts Document env.ssh.opts in setup/config.html.
[jira] [Commented] (FLINK-5680) Document env.ssh.opts
[ https://issues.apache.org/jira/browse/FLINK-5680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15848447#comment-15848447 ] ASF GitHub Bot commented on FLINK-5680: --- GitHub user greghogan opened a pull request: https://github.com/apache/flink/pull/3247 [FLINK-5680] [docs] Document env.ssh.opts Document env.ssh.opts in setup/config.html. You can merge this pull request into a Git repository by running: $ git pull https://github.com/greghogan/flink 5680_document_env_ssh_opts Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/3247.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3247 commit 255d2e3feb772c1d6f822af0020e78b702937ab9 Author: Greg Hogan Date: 2017-02-01T14:48:31Z [FLINK-5680] [docs] Document env.ssh.opts Document env.ssh.opts in setup/config.html. > Document env.ssh.opts > - > > Key: FLINK-5680 > URL: https://issues.apache.org/jira/browse/FLINK-5680 > Project: Flink > Issue Type: Improvement > Components: Documentation >Affects Versions: 1.2.0, 1.3.0 >Reporter: Greg Hogan >Assignee: Greg Hogan >Priority: Trivial > Fix For: 1.3.0, 1.2.1 > > > Document {{env.ssh.opts}} in {{setup/config.html}}.
[jira] [Commented] (FLINK-5441) Directly allow SQL queries on a Table
[ https://issues.apache.org/jira/browse/FLINK-5441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15848421#comment-15848421 ] ASF GitHub Bot commented on FLINK-5441: --- Github user wuchong commented on the issue: https://github.com/apache/flink/pull/3107 What about using slf4j syntax `env.sql("SELECT * FROM {} JOIN {}", table1, table2)` ? > Directly allow SQL queries on a Table > - > > Key: FLINK-5441 > URL: https://issues.apache.org/jira/browse/FLINK-5441 > Project: Flink > Issue Type: New Feature > Components: Table API & SQL >Reporter: Timo Walther >Assignee: Jark Wu > > Right now a user has to register a table before it can be used in SQL > queries. In order to allow more fluent programming we propose calling SQL > directly on a table. An underscore can be used to reference the current table: > {code} > myTable.sql("SELECT a, b, c FROM _ WHERE d = 12") > {code}
[GitHub] flink issue #3107: [FLINK-5441] [table] Directly allow SQL queries on a Tabl...
Github user wuchong commented on the issue: https://github.com/apache/flink/pull/3107 What about using slf4j syntax `env.sql("SELECT * FROM {} JOIN {}", table1, table2)` ?
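The substitution behind such an slf4j-style `sql(...)` could work as sketched below. This is an illustration only, using plain strings in place of `Table` objects and pre-generated registration names; it is not an actual Flink implementation.

```java
// Illustration of the suggested slf4j-style placeholder syntax:
// each "{}" in the query is replaced by a generated name under which
// the corresponding table argument would be registered.
class SqlPlaceholders {

    static String expand(String query, String... tableNames) {
        StringBuilder out = new StringBuilder();
        int arg = 0;
        for (int i = 0; i < query.length(); i++) {
            if (i + 1 < query.length()
                    && query.charAt(i) == '{' && query.charAt(i + 1) == '}') {
                out.append(tableNames[arg++]); // substitute next table's name
                i++;                           // skip the closing '}'
            } else {
                out.append(query.charAt(i));
            }
        }
        return out.toString();
    }
}
```

With positional `{}` placeholders the table arguments stay in the order they appear in the query, mirroring how slf4j binds logger arguments.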
[jira] [Commented] (FLINK-2168) Add HBaseTableSource
[ https://issues.apache.org/jira/browse/FLINK-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15848324#comment-15848324 ] ASF GitHub Bot commented on FLINK-2168: --- Github user ramkrish86 commented on the issue: https://github.com/apache/flink/pull/3149 bq.ProjectableTableSource works in scan process. Ya got it. I was just trying to relate with this HBase thing and could find that we try to read all cols and then do a flatMap and then return the required cols alone. Just read that PR to understand better. > Add HBaseTableSource > > > Key: FLINK-2168 > URL: https://issues.apache.org/jira/browse/FLINK-2168 > Project: Flink > Issue Type: New Feature > Components: Table API & SQL >Affects Versions: 0.9 >Reporter: Fabian Hueske >Assignee: ramkrishna.s.vasudevan >Priority: Minor > > Add a {{HBaseTableSource}} to read data from a HBase table. The > {{HBaseTableSource}} should implement the {{ProjectableTableSource}} > (FLINK-3848) and {{FilterableTableSource}} (FLINK-3849) interfaces. > The implementation can be based on Flink's {{TableInputFormat}}.
[GitHub] flink issue #3149: FLINK-2168 Add HBaseTableSource
Github user ramkrish86 commented on the issue: https://github.com/apache/flink/pull/3149 bq.ProjectableTableSource works in scan process. Ya got it. I was just trying to relate with this HBase thing and could find that we try to read all cols and then do a flatMap and then return the required cols alone. Just read that PR to understand better.
[GitHub] flink issue #3149: FLINK-2168 Add HBaseTableSource
Github user ramkrish86 commented on the issue: https://github.com/apache/flink/pull/3149 I think going through the PR for https://issues.apache.org/jira/browse/FLINK-3848 - I think we try to project only the required columns. Similarly we could do here also. So my and @tonycox's suggestion of having a new way of ProjectableTableSource could help here.
[jira] [Commented] (FLINK-2168) Add HBaseTableSource
[ https://issues.apache.org/jira/browse/FLINK-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15848322#comment-15848322 ] ASF GitHub Bot commented on FLINK-2168: --- Github user ramkrish86 commented on the issue: https://github.com/apache/flink/pull/3149 I think going through the PR for https://issues.apache.org/jira/browse/FLINK-3848 - I think we try to project only the required columns. Similarly we could do here also. So my and @tonycox's suggestion of having a new way of ProjectableTableSource could help here. > Add HBaseTableSource > > > Key: FLINK-2168 > URL: https://issues.apache.org/jira/browse/FLINK-2168 > Project: Flink > Issue Type: New Feature > Components: Table API & SQL >Affects Versions: 0.9 >Reporter: Fabian Hueske >Assignee: ramkrishna.s.vasudevan >Priority: Minor > > Add a {{HBaseTableSource}} to read data from a HBase table. The > {{HBaseTableSource}} should implement the {{ProjectableTableSource}} > (FLINK-3848) and {{FilterableTableSource}} (FLINK-3849) interfaces. > The implementation can be based on Flink's {{TableInputFormat}}.
[jira] [Commented] (FLINK-2168) Add HBaseTableSource
[ https://issues.apache.org/jira/browse/FLINK-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15848319#comment-15848319 ] ASF GitHub Bot commented on FLINK-2168: --- Github user tonycox commented on the issue: https://github.com/apache/flink/pull/3149 @ramkrish86 ProjectableTableSource works in the scan process. Without it, TableScan scans all columns and applies a flatMap function on the whole data to project and filter. That is inefficient compared with pushing the projection and filtering right into the scan process. > Add HBaseTableSource > > > Key: FLINK-2168 > URL: https://issues.apache.org/jira/browse/FLINK-2168 > Project: Flink > Issue Type: New Feature > Components: Table API & SQL >Affects Versions: 0.9 >Reporter: Fabian Hueske >Assignee: ramkrishna.s.vasudevan >Priority: Minor > > Add a {{HBaseTableSource}} to read data from an HBase table. The > {{HBaseTableSource}} should implement the {{ProjectableTableSource}} > (FLINK-3848) and {{FilterableTableSource}} (FLINK-3849) interfaces. > The implementation can be based on Flink's {{TableInputFormat}}.
[GitHub] flink issue #3149: FLINK-2168 Add HBaseTableSource
Github user tonycox commented on the issue: https://github.com/apache/flink/pull/3149 @ramkrish86 ProjectableTableSource works in the scan process. Without it, TableScan scans all columns and applies a flatMap function on the whole data to project and filter. That is inefficient compared with pushing the projection and filtering right into the scan process.
[GitHub] flink issue #3149: FLINK-2168 Add HBaseTableSource
Github user ramkrish86 commented on the issue: https://github.com/apache/flink/pull/3149 ` Table result = tableEnv .sql("SELECT test1.f1.q1, test1.f2.q2 FROM test1 where test1.f1.q1 < 103");` I just tried this query and it works with or without ProjectableTableSource. So just wanted to know when the projection comes into play. I thought without projection, things of this sort may not work.
[jira] [Commented] (FLINK-2168) Add HBaseTableSource
[ https://issues.apache.org/jira/browse/FLINK-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15848311#comment-15848311 ] ASF GitHub Bot commented on FLINK-2168: --- Github user ramkrish86 commented on the issue: https://github.com/apache/flink/pull/3149 ` Table result = tableEnv .sql("SELECT test1.f1.q1, test1.f2.q2 FROM test1 where test1.f1.q1 < 103");` I just tried this query and it works with or without ProjectableTableSource. So just wanted to know when the projection comes into play. I thought without projection, things of this sort may not work. > Add HBaseTableSource > > > Key: FLINK-2168 > URL: https://issues.apache.org/jira/browse/FLINK-2168 > Project: Flink > Issue Type: New Feature > Components: Table API & SQL >Affects Versions: 0.9 >Reporter: Fabian Hueske >Assignee: ramkrishna.s.vasudevan >Priority: Minor > > Add a {{HBaseTableSource}} to read data from an HBase table. The > {{HBaseTableSource}} should implement the {{ProjectableTableSource}} > (FLINK-3848) and {{FilterableTableSource}} (FLINK-3849) interfaces. > The implementation can be based on Flink's {{TableInputFormat}}.
[jira] [Commented] (FLINK-2821) Change Akka configuration to allow accessing actors from different URLs
[ https://issues.apache.org/jira/browse/FLINK-2821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15848287#comment-15848287 ] Philipp von dem Bussche commented on FLINK-2821: Hello [~mxm], after being quiet for a while I wanted to feed back on the setup I am running at the moment. To recap (I had to think about my setup myself again after not spending much time on it lately ;) ): - job manager and task manager run in Docker containers - I am using an orchestration engine called Rancher on top of Docker which also introduces another set of IP addresses / network on top of Docker. Since I am communicating with the JobManager from within the Docker / Rancher network as well as from outside (from my local buildserver) I had to have the JobManager register to a hostname that is resolvable on the Internet. Both the task manager (coming from within the Docker / Rancher network) as well as the build server connect via the internet host name now. Obviously since the task manager lives right next to the job manager, the preferred solution would be for the task manager to connect locally (meaning through the Docker / Rancher network), but since one can only specify one listener address it has to go through the internet host name. However this does not solve the problem completely yet, because if I just tell the JobManager to bind to the internet host name I am getting the following exception while the JobManager starts up: 2017-02-01 11:13:51,997 INFO org.apache.flink.util.NetUtils - Unable to allocate on port 6123, due to error: Address not available (Bind failed) 2017-02-01 11:13:51,999 ERROR org.apache.flink.runtime.jobmanager.JobManager - Failed to run JobManager. 
java.lang.RuntimeException: Unable to do further retries starting the actor system at org.apache.flink.runtime.jobmanager.JobManager$.retryOnBindException(JobManager.scala:2136) at org.apache.flink.runtime.jobmanager.JobManager$.runJobManager(JobManager.scala:2076) at org.apache.flink.runtime.jobmanager.JobManager$$anon$12.call(JobManager.scala:1971) at org.apache.flink.runtime.jobmanager.JobManager$$anon$12.call(JobManager.scala:1969) at org.apache.flink.runtime.security.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:29) at org.apache.flink.runtime.jobmanager.JobManager$.main(JobManager.scala:1969) at org.apache.flink.runtime.jobmanager.JobManager.main(JobManager.scala) So additionally I had to put the Docker IP address of the JobManager container into /etc/hosts resolving to the internet host name so that it tries to bind on the Docker IP address rather than the Amazon AWS IP address (which is the IP that the internet host name resolves to). This works for me now, I would not call it ideal though. I have to admit I have not tested this with the latest RC, will do that later in the week. Thanks > Change Akka configuration to allow accessing actors from different URLs > --- > > Key: FLINK-2821 > URL: https://issues.apache.org/jira/browse/FLINK-2821 > Project: Flink > Issue Type: Bug > Components: Distributed Coordination >Reporter: Robert Metzger >Assignee: Maximilian Michels > Fix For: 1.2.0 > > > Akka expects the actor's URL to be exactly matching. > As pointed out here, cases where users were complaining about this: > http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Error-trying-to-access-JM-through-proxy-td3018.html > - Proxy routing (as described here, send to the proxy URL, receiver > recognizes only original URL) > - Using hostname / IP interchangeably does not work (we solved this by > always putting IP addresses into URLs, never hostnames) > - Binding to multiple interfaces (any local 0.0.0.0) does not work. 
Still > no solution to that (but seems not too much of a restriction) > I am aware that this is not possible due to Akka, so it is actually not a > Flink bug. But I think we should track the resolution of the issue here > anyways because it's affecting our users' satisfaction.
[jira] [Commented] (FLINK-5353) Elasticsearch Sink loses well-formed documents when there are malformed documents
[ https://issues.apache.org/jira/browse/FLINK-5353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15848275#comment-15848275 ] ASF GitHub Bot commented on FLINK-5353: --- Github user fpompermaier commented on the issue: https://github.com/apache/flink/pull/3246 I think that's great! > Elasticsearch Sink loses well-formed documents when there are malformed > documents > - > > Key: FLINK-5353 > URL: https://issues.apache.org/jira/browse/FLINK-5353 > Project: Flink > Issue Type: Bug > Components: Streaming Connectors >Affects Versions: 1.1.3 >Reporter: Flavio Pompermaier >Assignee: Tzu-Li (Gordon) Tai >
[GitHub] flink issue #3246: [FLINK-5353] [elasticsearch] User-provided failure handle...
Github user fpompermaier commented on the issue: https://github.com/apache/flink/pull/3246 I think that's great!
[jira] [Commented] (FLINK-5353) Elasticsearch Sink loses well-formed documents when there are malformed documents
[ https://issues.apache.org/jira/browse/FLINK-5353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15848250#comment-15848250 ] ASF GitHub Bot commented on FLINK-5353: --- Github user tzulitai commented on the issue: https://github.com/apache/flink/pull/3246 @fpompermaier @static-max tagging you so that you're aware of this PR. Will be great to hear feedback from you! > Elasticsearch Sink loses well-formed documents when there are malformed > documents > - > > Key: FLINK-5353 > URL: https://issues.apache.org/jira/browse/FLINK-5353 > Project: Flink > Issue Type: Bug > Components: Streaming Connectors >Affects Versions: 1.1.3 >Reporter: Flavio Pompermaier >Assignee: Tzu-Li (Gordon) Tai >
[GitHub] flink issue #3246: [FLINK-5353] [elasticsearch] User-provided failure handle...
Github user tzulitai commented on the issue: https://github.com/apache/flink/pull/3246 @fpompermaier @static-max tagging you so that you're aware of this PR. Will be great to hear feedback from you!
[GitHub] flink pull request #3246: [FLINK-5353] [elasticsearch] User-provided failure...
GitHub user tzulitai opened a pull request: https://github.com/apache/flink/pull/3246 [FLINK-5353] [elasticsearch] User-provided failure handler for ElasticsearchSink

Only the last commit is relevant. This PR is based on #3112 so that the functionality is added for all Elasticsearch versions. It is also based on the work of @static-max in #2861, but with improvements for a more general approach to solve both [FLINK-5353](https://issues.apache.org/jira/browse/FLINK-5353) and [FLINK-5122](https://issues.apache.org/jira/browse/FLINK-5122). The PR is more of a preview of the functionality for our Elasticsearch users, as proper testing of the expected behaviours and Javadoc updates are still pending.

With this PR, users can now provide an `ActionRequestFailureHandler` that controls how to deal with a failed Elasticsearch request. Example:

```java
private static class ExampleActionRequestFailureHandler implements ActionRequestFailureHandler {

    @Override
    public boolean onFailure(ActionRequest action, Throwable failure, RequestIndexer indexer) {
        if (failure instanceof EsRejectedExecutionException) {
            // node queue was saturated; re-add the request so it is retried
            indexer.add(action);
            return false;
        } else if (failure instanceof ElasticsearchParseException) {
            // simply drop request without failing sink
            return false;
        } else {
            // for all other failures, fail the sink
            return true;
        }
    }
}
```

The above example will let the sink re-add requests that failed due to queue capacity saturation and drop requests with malformed documents, without failing the sink. For all other failures, the sink will fail. The handler is provided to the constructor of `ElasticsearchSink`. Note that the `onFailure` method is called only after the internal `BulkProcessor` finishes all backoff retry attempts for temporary `EsRejectedExecutionException`s (saturated ES node queue capacity).

### Alternatives:
1. Currently, all failures reported in the `afterBulk` callback will be used to invoke `onFailure` of the handler. We can perhaps pass only some specific exceptions for the user to decide on how to handle them.
2. The original `ElasticsearchSinkFunction` and the new `ActionRequestFailureHandler` interface could perhaps be integrated into one.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tzulitai/flink FLINK-5353

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/3246.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3246

commit bf84c0aa91924aca779189b628a656d9b54e36db
Author: Mike Dias
Date: 2016-11-07T20:09:48Z
[FLINK-4988] Elasticsearch 5.x support

commit 4efb2d497759b3688fe80261df19bb1e1c3f1c21
Author: Tzu-Li (Gordon) Tai
Date: 2017-01-12T13:21:56Z
[FLINK-4988] [elasticsearch] Restructure Elasticsearch connectors

commit be35862383b69c0d65fefd2c48c772a81fceb8d5
Author: Max Kuklinski
Date: 2016-11-23T16:54:11Z
[FLINK-5122] [elasticsearch] Retry temporary Elasticsearch request errors. Covered exceptions are: timeouts, no master, UnavailableShardsException, bulk queue on node full

commit fa67e8be5ca8e90d47ad12e947eac7b695e8fcca
Author: Tzu-Li (Gordon) Tai
Date: 2017-01-30T05:55:26Z
[FLINK-5353] [elasticsearch] User-provided failure handler for ElasticsearchSink
This commit fixes both FLINK-5353 and FLINK-5122. It allows users to implement a failure handler to control how failed action requests are dealt with. The commit also includes general improvements to FLINK-5122:
1. Use the built-in backoff functionality in the Elasticsearch BulkProcessor (not available for Elasticsearch 1.x)
2. Migrate the `checkErrorAndRetryBulk` functionality to the new failure handler
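Since the PR's snippet depends on Flink and Elasticsearch classes, here is a self-contained sketch of the same decision logic using plain JDK exceptions as stand-ins — the class name, method shape, and exception choices are illustrative, not the actual Flink API:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.RejectedExecutionException;

public class FailureHandlerSketch {

    // Returns true if the sink should fail, false if the failure was handled.
    // RejectedExecutionException stands in for EsRejectedExecutionException,
    // IllegalArgumentException for ElasticsearchParseException.
    static boolean onFailure(Exception failure, String action, Deque<String> requeue) {
        if (failure instanceof RejectedExecutionException) {
            requeue.add(action);   // queue saturated: re-add the request for a later retry
            return false;
        } else if (failure instanceof IllegalArgumentException) {
            return false;          // malformed document: drop without failing the sink
        } else {
            return true;           // any other failure: fail the sink
        }
    }

    public static void main(String[] args) {
        Deque<String> requeue = new ArrayDeque<>();
        System.out.println(onFailure(new RejectedExecutionException(), "doc-1", requeue)); // false
        System.out.println(onFailure(new IllegalArgumentException(), "doc-2", requeue));   // false
        System.out.println(onFailure(new RuntimeException("io error"), "doc-3", requeue)); // true
        System.out.println(requeue); // [doc-1]
    }
}
```

The `instanceof` chain is checked in order, so the more specific retryable and droppable cases must come before the catch-all "fail the sink" branch.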
[jira] [Commented] (FLINK-5353) Elasticsearch Sink loses well-formed documents when there are malformed documents
[ https://issues.apache.org/jira/browse/FLINK-5353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15848248#comment-15848248 ] ASF GitHub Bot commented on FLINK-5353: --- GitHub user tzulitai opened a pull request: https://github.com/apache/flink/pull/3246 [FLINK-5353] [elasticsearch] User-provided failure handler for ElasticsearchSink

Only the last commit is relevant. This PR is based on #3112 so that the functionality is added for all Elasticsearch versions. It is also based on the work of @static-max in #2861, but with improvements for a more general approach to solve both [FLINK-5353](https://issues.apache.org/jira/browse/FLINK-5353) and [FLINK-5122](https://issues.apache.org/jira/browse/FLINK-5122). The PR is more of a preview of the functionality for our Elasticsearch users, as proper testing of the expected behaviours and Javadoc updates are still pending.

With this PR, users can now provide an `ActionRequestFailureHandler` that controls how to deal with a failed Elasticsearch request. Example:

```java
private static class ExampleActionRequestFailureHandler implements ActionRequestFailureHandler {

    @Override
    public boolean onFailure(ActionRequest action, Throwable failure, RequestIndexer indexer) {
        if (failure instanceof EsRejectedExecutionException) {
            // node queue was saturated; re-add the request so it is retried
            indexer.add(action);
            return false;
        } else if (failure instanceof ElasticsearchParseException) {
            // simply drop request without failing sink
            return false;
        } else {
            // for all other failures, fail the sink
            return true;
        }
    }
}
```

The above example will let the sink re-add requests that failed due to queue capacity saturation and drop requests with malformed documents, without failing the sink. For all other failures, the sink will fail. The handler is provided to the constructor of `ElasticsearchSink`. Note that the `onFailure` method is called only after the internal `BulkProcessor` finishes all backoff retry attempts for temporary `EsRejectedExecutionException`s (saturated ES node queue capacity).

### Alternatives:
1. Currently, all failures reported in the `afterBulk` callback will be used to invoke `onFailure` of the handler. We can perhaps pass only some specific exceptions for the user to decide on how to handle them.
2. The original `ElasticsearchSinkFunction` and the new `ActionRequestFailureHandler` interface could perhaps be integrated into one.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tzulitai/flink FLINK-5353

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/3246.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3246

commit bf84c0aa91924aca779189b628a656d9b54e36db
Author: Mike Dias
Date: 2016-11-07T20:09:48Z
[FLINK-4988] Elasticsearch 5.x support

commit 4efb2d497759b3688fe80261df19bb1e1c3f1c21
Author: Tzu-Li (Gordon) Tai
Date: 2017-01-12T13:21:56Z
[FLINK-4988] [elasticsearch] Restructure Elasticsearch connectors

commit be35862383b69c0d65fefd2c48c772a81fceb8d5
Author: Max Kuklinski
Date: 2016-11-23T16:54:11Z
[FLINK-5122] [elasticsearch] Retry temporary Elasticsearch request errors. Covered exceptions are: timeouts, no master, UnavailableShardsException, bulk queue on node full

commit fa67e8be5ca8e90d47ad12e947eac7b695e8fcca
Author: Tzu-Li (Gordon) Tai
Date: 2017-01-30T05:55:26Z
[FLINK-5353] [elasticsearch] User-provided failure handler for ElasticsearchSink
This commit fixes both FLINK-5353 and FLINK-5122. It allows users to implement a failure handler to control how failed action requests are dealt with. The commit also includes general improvements to FLINK-5122:
1. Use the built-in backoff functionality in the Elasticsearch BulkProcessor (not available for Elasticsearch 1.x)
2. Migrate the `checkErrorAndRetryBulk` functionality to the new failure handler

> Elasticsearch Sink loses well-formed documents when there are malformed documents
> -
>
> Key: FLINK-5353
> URL: https://issues.apache.org/jira/browse/FLINK-5353
> Project: Flink
> Issue Type: Bug
> Components: Streaming Connectors
> Affects Versions: 1.1.3
> Reporter: Flavio Pompermaier
> Assignee: Tzu-Li (Gordon) Tai
[jira] [Updated] (FLINK-2168) Add HBaseTableSource
[ https://issues.apache.org/jira/browse/FLINK-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anton Solovev updated FLINK-2168: - Labels: (was: starter) > Add HBaseTableSource > > > Key: FLINK-2168 > URL: https://issues.apache.org/jira/browse/FLINK-2168 > Project: Flink > Issue Type: New Feature > Components: Table API & SQL >Affects Versions: 0.9 >Reporter: Fabian Hueske >Assignee: ramkrishna.s.vasudevan >Priority: Minor > > Add a {{HBaseTableSource}} to read data from a HBase table. The > {{HBaseTableSource}} should implement the {{ProjectableTableSource}} > (FLINK-3848) and {{FilterableTableSource}} > (FLINK-3849) interfaces. > The implementation can be based on Flink's {{TableInputFormat}}.
[jira] [Comment Edited] (FLINK-5353) Elasticsearch Sink loses well-formed documents when there are malformed documents
[ https://issues.apache.org/jira/browse/FLINK-5353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15848109#comment-15848109 ] Tzu-Li (Gordon) Tai edited comment on FLINK-5353 at 2/1/17 8:18 AM: Ok, then I think the actual cause for the well-formed documents to be missing also isn't because the {{BulkProcessor}} is dropping them due to the malformed documents, but simply because at-least-once isn't properly supported in the ES connector yet (see FLINK-5487). Any documents that were buffered in the {{BulkProcessor}} at the time of failure simply could not be recovered. Thank you for the feedback. With the suggested {{FailedActionRequestHandler}}, you can simply just drop the malformed document or re-process it if you want to. was (Author: tzulitai): Ok, then I think the actual cause for the well-formed documents to be missing also isn't because the {{BulkProcessor}} is dropping them due to the malformed documents, but simply because at-least-once isn't properly supported in the ES connector yet (see FLINK-5487). Any documents that were buffered in the {{BulkProcessor}} at the time of failure simply could not be recovered. With the suggested {{FailedActionRequestHandler}}, you can simply just drop the malformed document or re-process it if you want to. > Elasticsearch Sink loses well-formed documents when there are malformed > documents > - > > Key: FLINK-5353 > URL: https://issues.apache.org/jira/browse/FLINK-5353 > Project: Flink > Issue Type: Bug > Components: Streaming Connectors >Affects Versions: 1.1.3 >Reporter: Flavio Pompermaier >
[jira] [Assigned] (FLINK-5353) Elasticsearch Sink loses well-formed documents when there are malformed documents
[ https://issues.apache.org/jira/browse/FLINK-5353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tzu-Li (Gordon) Tai reassigned FLINK-5353: -- Assignee: Tzu-Li (Gordon) Tai > Elasticsearch Sink loses well-formed documents when there are malformed > documents > - > > Key: FLINK-5353 > URL: https://issues.apache.org/jira/browse/FLINK-5353 > Project: Flink > Issue Type: Bug > Components: Streaming Connectors >Affects Versions: 1.1.3 >Reporter: Flavio Pompermaier >Assignee: Tzu-Li (Gordon) Tai >
[jira] [Updated] (FLINK-5353) Elasticsearch Sink loses well-formed documents when there are malformed documents
[ https://issues.apache.org/jira/browse/FLINK-5353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tzu-Li (Gordon) Tai updated FLINK-5353: --- Component/s: (was: Batch Connectors and Input/Output Formats) Streaming Connectors > Elasticsearch Sink loses well-formed documents when there are malformed > documents > - > > Key: FLINK-5353 > URL: https://issues.apache.org/jira/browse/FLINK-5353 > Project: Flink > Issue Type: Bug > Components: Streaming Connectors >Affects Versions: 1.1.3 >Reporter: Flavio Pompermaier >
[jira] [Commented] (FLINK-2168) Add HBaseTableSource
[ https://issues.apache.org/jira/browse/FLINK-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15848113#comment-15848113 ] ASF GitHub Bot commented on FLINK-2168: --- Github user tonycox commented on the issue: https://github.com/apache/flink/pull/3149 @fhueske I prefer a better API with nesting. We can extend `ProjectableTableSource` with a method like this

```scala
def projectNestedFields(fields: Array[String]): ProjectableTableSource[T]
```

with a default realisation. Override the method in the HBase table source and, while pushing the projection down, extract the `fieldAccessor`s from `inputRex` and push them in as well. Or create another `ProjectableNestedFieldsTableSource` with the same method.

> Add HBaseTableSource > > > Key: FLINK-2168 > URL: https://issues.apache.org/jira/browse/FLINK-2168 > Project: Flink > Issue Type: New Feature > Components: Table API & SQL >Affects Versions: 0.9 >Reporter: Fabian Hueske >Assignee: ramkrishna.s.vasudevan >Priority: Minor > Labels: starter > > Add a {{HBaseTableSource}} to read data from a HBase table. The > {{HBaseTableSource}} should implement the {{ProjectableTableSource}} > (FLINK-3848) and {{FilterableTableSource}} > (FLINK-3849) interfaces. > The implementation can be based on Flink's {{TableInputFormat}}.
[GitHub] flink issue #3149: FLINK-2168 Add HBaseTableSource
Github user tonycox commented on the issue: https://github.com/apache/flink/pull/3149 @fhueske I prefer a better API with nesting. We can extend `ProjectableTableSource` with a method like this

```scala
def projectNestedFields(fields: Array[String]): ProjectableTableSource[T]
```

with a default realisation. Override the method in the HBase table source and, while pushing the projection down, extract the `fieldAccessor`s from `inputRex` and push them in as well. Or create another `ProjectableNestedFieldsTableSource` with the same method.
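The nested-projection idea above can be sketched independently of Flink. The following is a hypothetical, self-contained illustration of selecting `family.qualifier` paths from a nested row — the class, method, and field names are made up, not the actual TableSource API:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class NestedProjectionSketch {

    // Keep only the requested "family.qualifier" paths from a nested row,
    // where the row is modeled as family -> (qualifier -> value).
    static Map<String, Object> project(Map<String, Map<String, Object>> row, String[] fields) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (String field : fields) {
            String[] parts = field.split("\\.", 2);      // split "family.qualifier" once
            Map<String, Object> family = row.get(parts[0]);
            if (family != null && family.containsKey(parts[1])) {
                out.put(field, family.get(parts[1]));    // flatten the selected cell
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Object>> row = new LinkedHashMap<>();
        Map<String, Object> info = new LinkedHashMap<>();
        info.put("name", "flink");
        info.put("version", 1);
        row.put("info", info);

        System.out.println(project(row, new String[]{"info.name"})); // {info.name=flink}
    }
}
```

Pushing such a projection down means only the listed column families and qualifiers need to be fetched from the store, instead of whole rows.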
[jira] [Comment Edited] (FLINK-5353) Elasticsearch Sink loses well-formed documents when there are malformed documents
[ https://issues.apache.org/jira/browse/FLINK-5353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15848109#comment-15848109 ] Tzu-Li (Gordon) Tai edited comment on FLINK-5353 at 2/1/17 8:12 AM: Ok, then I think the actual cause for the well-formed documents to be missing also isn't because the {{BulkProcessor}} is dropping them due to the malformed documents, but simply because at-least-once isn't properly supported in the ES connector yet (see FLINK-5487). Any documents that were buffered in the {{BulkProcessor}} at the time of failure simply could not be recovered. With the suggested {{FailedActionRequestHandler}}, you can simply just drop the malformed document or re-process it if you want to. was (Author: tzulitai): Ok, then I think the actual cause for the well-formed documents to be missing also isn't because the {{BulkProcessor}} is dropping them just because of the malformed documents, but simply because at-least-once isn't properly supported in the ES connector yet (see FLINK-5487). With the suggested {{FailedActionRequestHandler}}, you can simply just drop the malformed document or re-process it if you want to. > Elasticsearch Sink loses well-formed documents when there are malformed > documents > - > > Key: FLINK-5353 > URL: https://issues.apache.org/jira/browse/FLINK-5353 > Project: Flink > Issue Type: Bug > Components: Batch Connectors and Input/Output Formats >Affects Versions: 1.1.3 >Reporter: Flavio Pompermaier >
[jira] [Commented] (FLINK-5353) Elasticsearch Sink loses well-formed documents when there are malformed documents
[ https://issues.apache.org/jira/browse/FLINK-5353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15848109#comment-15848109 ] Tzu-Li (Gordon) Tai commented on FLINK-5353: Ok, then I think the actual cause for the well-formed documents to be missing also isn't because the {{BulkProcessor}} is dropping them just because of the malformed documents, but simply because at-least-once isn't properly supported in the ES connector yet (see FLINK-5487). With the suggested {{FailedActionRequestHandler}}, you can simply just drop the malformed document or re-process it if you want to. > Elasticsearch Sink loses well-formed documents when there are malformed > documents > - > > Key: FLINK-5353 > URL: https://issues.apache.org/jira/browse/FLINK-5353 > Project: Flink > Issue Type: Bug > Components: Batch Connectors and Input/Output Formats >Affects Versions: 1.1.3 >Reporter: Flavio Pompermaier >
[jira] [Commented] (FLINK-5353) Elasticsearch Sink loses well-formed documents when there are malformed documents
[ https://issues.apache.org/jira/browse/FLINK-5353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15848105#comment-15848105 ] Flavio Pompermaier commented on FLINK-5353: --- My previous PR was https://github.com/apache/flink/pull/2790. The error was thrown in afterBulk and hasFailure was set to true. Then close() threw a RuntimeException, so the entire job used to fail even for a single malformed document. Maybe that PR could be useful to test the error. Unfortunately right now I don't have much time to help on this :( I'd like to avoid setting {{ignore_malformed}} in my indices. Maybe it could also be useful to add an accumulator that keeps track of documents that weren't indexed because they were malformed. I like the idea of a {{FailedActionRequestHandler}} as suggested in #5122, as long as it addresses this issue properly. > Elasticsearch Sink loses well-formed documents when there are malformed > documents > - > > Key: FLINK-5353 > URL: https://issues.apache.org/jira/browse/FLINK-5353 > Project: Flink > Issue Type: Bug > Components: Batch Connectors and Input/Output Formats >Affects Versions: 1.1.3 >Reporter: Flavio Pompermaier >
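The accumulator idea above — counting malformed documents that were dropped rather than indexed — can be sketched with a plain counter. This is a hypothetical, self-contained illustration; the class name and the use of `IllegalArgumentException` as a stand-in for a parse error are assumptions, not Flink or Elasticsearch API:

```java
import java.util.concurrent.atomic.LongAdder;

public class MalformedCounterSketch {

    // Thread-safe counter standing in for a Flink accumulator.
    static final LongAdder malformedDropped = new LongAdder();

    // Drop malformed documents (IllegalArgumentException stands in for a
    // parse error) while counting them; any other failure fails the sink.
    static boolean handle(Exception failure) {
        if (failure instanceof IllegalArgumentException) {
            malformedDropped.increment();
            return false;   // handled: document dropped, sink keeps running
        }
        return true;        // not handled: fail the sink
    }

    public static void main(String[] args) {
        handle(new IllegalArgumentException("malformed json"));
        handle(new RuntimeException("network error"));
        System.out.println(malformedDropped.sum()); // 1
    }
}
```

Exposing such a counter as a metric would make silently dropped documents visible instead of losing them without a trace.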