[jira] [Commented] (FLINK-8981) Add end-to-end test for running on YARN with Kerberos

2018-07-26 Thread ASF GitHub Bot (JIRA)


[ https://issues.apache.org/jira/browse/FLINK-8981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16558199#comment-16558199 ]

ASF GitHub Bot commented on FLINK-8981:
---

aljoscha closed pull request #6377: [FLINK-8981] Add end-to-end test for running on YARN with Kerberos
URL: https://github.com/apache/flink/pull/6377
 
 
   

This is a PR merged from a forked repository. As GitHub hides the original
diff on merge, it is displayed below for the sake of provenance:

diff --git a/flink-end-to-end-tests/run-nightly-tests.sh 
b/flink-end-to-end-tests/run-nightly-tests.sh
index dc8424f25ec..db75ae89324 100755
--- a/flink-end-to-end-tests/run-nightly-tests.sh
+++ b/flink-end-to-end-tests/run-nightly-tests.sh
@@ -18,7 +18,7 @@
 

 
 END_TO_END_DIR="`dirname \"$0\"`" # relative
-END_TO_END_DIR="`( cd \"$END_TO_END_DIR\" && pwd )`" # absolutized and normalized
+END_TO_END_DIR="`( cd \"$END_TO_END_DIR\" && pwd -P)`" # absolutized and normalized
 if [ -z "$END_TO_END_DIR" ] ; then
 # error; for some reason, the path is not accessible
 # to the script (e.g. permissions re-evaled after suid)
@@ -34,7 +34,7 @@ fi
 
 source "${END_TO_END_DIR}/test-scripts/test-runner-common.sh"
 
-FLINK_DIR="`( cd \"$FLINK_DIR\" && pwd )`" # absolutized and normalized
+FLINK_DIR="`( cd \"$FLINK_DIR\" && pwd -P)`" # absolutized and normalized
 
 echo "flink-end-to-end-test directory: $END_TO_END_DIR"
 echo "Flink distribution directory: $FLINK_DIR"
@@ -105,5 +105,7 @@ run_test "Avro Confluent Schema Registry nightly end-to-end test" "$END_TO_END_D
 run_test "State TTL Heap backend end-to-end test" "$END_TO_END_DIR/test-scripts/test_stream_state_ttl.sh file"
 run_test "State TTL RocksDb backend end-to-end test" "$END_TO_END_DIR/test-scripts/test_stream_state_ttl.sh rocks"
 
+run_test "Running Kerberized YARN on Docker test" "$END_TO_END_DIR/test-scripts/test_yarn_kerberos_docker.sh"
+
 printf "\n[PASS] All tests passed\n"
 exit 0
diff --git a/flink-end-to-end-tests/run-pre-commit-tests.sh b/flink-end-to-end-tests/run-pre-commit-tests.sh
index 6355fd0f2aa..7b9777c43f3 100755
--- a/flink-end-to-end-tests/run-pre-commit-tests.sh
+++ b/flink-end-to-end-tests/run-pre-commit-tests.sh
@@ -18,7 +18,7 @@
 

 
 END_TO_END_DIR="`dirname \"$0\"`" # relative
-END_TO_END_DIR="`( cd \"$END_TO_END_DIR\" && pwd )`" # absolutized and normalized
+END_TO_END_DIR="`( cd \"$END_TO_END_DIR\" && pwd -P )`" # absolutized and normalized
 if [ -z "$END_TO_END_DIR" ] ; then
 # error; for some reason, the path is not accessible
 # to the script (e.g. permissions re-evaled after suid)
@@ -34,7 +34,7 @@ fi
 
 source ${END_TO_END_DIR}/test-scripts/test-runner-common.sh
 
-FLINK_DIR="`( cd \"$FLINK_DIR\" && pwd )`" # absolutized and normalized
+FLINK_DIR="`( cd \"$FLINK_DIR\" && pwd -P)`" # absolutized and normalized
 
 echo "flink-end-to-end-test directory: $END_TO_END_DIR"
 echo "Flink distribution directory: $FLINK_DIR"
diff --git a/flink-end-to-end-tests/run-single-test.sh b/flink-end-to-end-tests/run-single-test.sh
index 86b313d757f..833a78a5af3 100755
--- a/flink-end-to-end-tests/run-single-test.sh
+++ b/flink-end-to-end-tests/run-single-test.sh
@@ -26,7 +26,7 @@ if [ $# -eq 0 ]; then
 fi
 
 END_TO_END_DIR="`dirname \"$0\"`" # relative
-END_TO_END_DIR="`( cd \"$END_TO_END_DIR\" && pwd )`" # absolutized and normalized
+END_TO_END_DIR="`( cd \"$END_TO_END_DIR\" && pwd -P)`" # absolutized and normalized
 if [ -z "$END_TO_END_DIR" ] ; then
 # error; for some reason, the path is not accessible
 # to the script (e.g. permissions re-evaled after suid)
@@ -42,7 +42,7 @@ fi
 
 source "${END_TO_END_DIR}/test-scripts/test-runner-common.sh"
 
-FLINK_DIR="`( cd \"$FLINK_DIR\" && pwd )`" # absolutized and normalized
+FLINK_DIR="`( cd \"$FLINK_DIR\" && pwd -P )`" # absolutized and normalized
 
 echo "flink-end-to-end-test directory: $END_TO_END_DIR"
 echo "Flink distribution directory: $FLINK_DIR"
diff --git a/flink-end-to-end-tests/test-scripts/common.sh b/flink-end-to-end-tests/test-scripts/common.sh
index f4563cc3ea4..621db11e824 100644
--- a/flink-end-to-end-tests/test-scripts/common.sh
+++ b/flink-end-to-end-tests/test-scripts/common.sh
@@ -37,10 +37,10 @@ export EXIT_CODE=0
 echo "Flink dist directory: $FLINK_DIR"
 
 USE_SSL=OFF # set via set_conf_ssl(), reset via revert_default_config()
-TEST_ROOT=`pwd`
+TEST_ROOT=`pwd -P`
 TEST_INFRA_DIR="$END_TO_END_DIR/test-scripts/"
 cd $TEST_INFRA_DIR
-TEST_INFRA_DIR=`pwd`
+TEST_INFRA_DIR=`pwd -P`
 cd $TEST_ROOT
 
 function print_mem_use_osx {
diff --git a/flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/Dockerfil

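The `pwd` → `pwd -P` changes in the diff above matter when the test directories are reached through a symlink; a minimal sketch of the difference (temporary paths are illustrative, not from the PR):

```shell
# `pwd` reports the logical path (symlink components preserved by the shell),
# while `pwd -P` resolves it to the physical path on disk.
tmp=$(mktemp -d)
mkdir "$tmp/real"
ln -s "$tmp/real" "$tmp/link"

logical=$(cd "$tmp/link" && pwd)
physical=$(cd "$tmp/link" && pwd -P)

echo "logical:  $logical"   # ends in .../link
echo "physical: $physical"  # ends in .../real

rm -rf "$tmp"
```

The scripts use the physical path so that later `cd`/`tar` invocations operate on the real directory regardless of how the checkout is linked.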
[jira] [Commented] (FLINK-8981) Add end-to-end test for running on YARN with Kerberos

2018-07-26 Thread ASF GitHub Bot (JIRA)


[ https://issues.apache.org/jira/browse/FLINK-8981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16558052#comment-16558052 ]

ASF GitHub Bot commented on FLINK-8981:
---

aljoscha commented on issue #6377: [FLINK-8981] Add end-to-end test for running on YARN with Kerberos
URL: https://github.com/apache/flink/pull/6377#issuecomment-408019351
 
 
I found the last cause of test flakiness and successfully ran this about 30 times on `flink-ci`, will merge now.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add end-to-end test for running on YARN with Kerberos
> -
>
> Key: FLINK-8981
> URL: https://issues.apache.org/jira/browse/FLINK-8981
> Project: Flink
>  Issue Type: Sub-task
>  Components: Security, Tests
>Affects Versions: 1.5.0
>Reporter: Till Rohrmann
>Assignee: Aljoscha Krettek
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> We should add an end-to-end test which verifies Flink's integration with 
> Kerberos security. In order to do this, we should start a Kerberos-secured 
> Hadoop, ZooKeeper and Kafka cluster. Then we should start a Flink cluster 
> with HA enabled and run a job which reads from and writes to Kafka. We could 
> use a simple pipe job for that purpose which has some state for checkpointing 
> to HDFS.
> See [security docs|https://ci.apache.org/projects/flink/flink-docs-master/ops/security-kerberos.html] for more information about Flink's Kerberos integration.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-8981) Add end-to-end test for running on YARN with Kerberos

2018-07-23 Thread ASF GitHub Bot (JIRA)


[ https://issues.apache.org/jira/browse/FLINK-8981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16552526#comment-16552526 ]

ASF GitHub Bot commented on FLINK-8981:
---

Github user aljoscha commented on the issue:

https://github.com/apache/flink/pull/6377
  
@zentol & @dawidwys I think I addressed all of your comments




[jira] [Commented] (FLINK-8981) Add end-to-end test for running on YARN with Kerberos

2018-07-23 Thread ASF GitHub Bot (JIRA)


[ https://issues.apache.org/jira/browse/FLINK-8981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16552522#comment-16552522 ]

ASF GitHub Bot commented on FLINK-8981:
---

Github user aljoscha commented on a diff in the pull request:

https://github.com/apache/flink/pull/6377#discussion_r204330950
  
--- Diff: flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/docker-compose.yml ---
@@ -0,0 +1,87 @@

+
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.

+
+version: '3.5'
+
+networks:
+  docker-hadoop-cluster-network:
+    driver: bridge
+    name: docker-hadoop-cluster-network
+
+services:
+  kdc:
+    container_name: "kdc"
+    hostname: kdc.kerberos.com
+    image: sequenceiq/kerberos
+    networks:
+      - docker-hadoop-cluster-network
+    environment:
+      REALM: EXAMPLE.COM
+      DOMAIN_REALM: kdc.kerberos.com
+
+  master:
+    image: ${DOCKER_HADOOP_IMAGE_NAME:-flink/docker-hadoop-secure-cluster:latest}
+    command: master
+    depends_on:
+      - kdc
+    ports:
+      - "50070:50070"
--- End diff --

This was because the setup was meant to be accessible for more generic use and access from outside. I'm removing it.




[jira] [Commented] (FLINK-8981) Add end-to-end test for running on YARN with Kerberos

2018-07-23 Thread ASF GitHub Bot (JIRA)


[ https://issues.apache.org/jira/browse/FLINK-8981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16552521#comment-16552521 ]

ASF GitHub Bot commented on FLINK-8981:
---

Github user aljoscha commented on a diff in the pull request:

https://github.com/apache/flink/pull/6377#discussion_r204329978
  
--- Diff: flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/docker-compose.yml ---
@@ -0,0 +1,87 @@

+
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.

+
+version: '3.5'
+
+networks:
+  docker-hadoop-cluster-network:
+    driver: bridge
+    name: docker-hadoop-cluster-network
+
+services:
+  kdc:
+    container_name: "kdc"
+    hostname: kdc.kerberos.com
+    image: sequenceiq/kerberos
+    networks:
+      - docker-hadoop-cluster-network
+    environment:
+      REALM: EXAMPLE.COM
+      DOMAIN_REALM: kdc.kerberos.com
+
+  master:
+    image: ${DOCKER_HADOOP_IMAGE_NAME:-flink/docker-hadoop-secure-cluster:latest}
+    command: master
+    depends_on:
+      - kdc
+    ports:
+      - "50070:50070"
+      - "50470:50470"
+      - "8088:8088"
+      - "19888:19888"
+      - "8188:8188"
+    container_name: "master"
+    hostname: master.docker-hadoop-cluster-network
+    networks:
+      - docker-hadoop-cluster-network
+    environment:
+      KRB_REALM: EXAMPLE.COM
+      DOMAIN_REALM: kdc.kerberos.com
+
+  slave1:
--- End diff --

I tried this at the very beginning, but it doesn't work: the slaves need well-formed hostnames for the Kerberos setup to work (it's tricky with the Kerberos principal names). That's why I did it like this. I also don't like it.
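For context on why the hostnames matter: Kerberos host-based service principals embed the fully qualified hostname, so the bootstrap script can only create keytabs that match if every node has a fixed, well-formed FQDN. An illustrative sketch of the principal shape (the `master` hostname is from the compose file above; the worker hostname is assumed to follow the same pattern):

```
service/fully.qualified.hostname@REALM

yarn/master.docker-hadoop-cluster-network@EXAMPLE.COM
yarn/slave1.docker-hadoop-cluster-network@EXAMPLE.COM
```

If hostnames were auto-generated (e.g. by compose scaling), the principals created at KDC bootstrap time would not match the hosts the daemons actually run on.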




[jira] [Commented] (FLINK-8981) Add end-to-end test for running on YARN with Kerberos

2018-07-23 Thread ASF GitHub Bot (JIRA)


[ https://issues.apache.org/jira/browse/FLINK-8981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16552509#comment-16552509 ]

ASF GitHub Bot commented on FLINK-8981:
---

Github user dawidwys commented on a diff in the pull request:

https://github.com/apache/flink/pull/6377#discussion_r204328745
  
--- Diff: flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/README.md ---
@@ -0,0 +1,118 @@
+# Apache Hadoop Docker image with Kerberos enabled
+
+This image is a modified version of Knappek/docker-hadoop-secure
+ * Knappek/docker-hadoop-secure
+
+With bits and pieces added from Lewuathe/docker-hadoop-cluster to extend it to start a proper kerberized Hadoop cluster:
+ * Lewuathe/docker-hadoop-cluster
+
+And a lot of added stuff to make this an actual, properly configured, kerberized cluster with a proper user/permissions structure.
+
+Versions
+
+
+* JDK8
+* Hadoop 2.8.3
+
+Default Environment Variables
+-
+
+| Name | Value | Description |
+| ---- | ----- | ----------- |
+| `KRB_REALM` | `EXAMPLE.COM` | The Kerberos Realm, more information [here](https://web.mit.edu/kerberos/krb5-1.12/doc/admin/conf_files/krb5_conf.html#) |
+| `DOMAIN_REALM` | `example.com` | The Kerberos Domain Realm, more information [here](https://web.mit.edu/kerberos/krb5-1.12/doc/admin/conf_files/krb5_conf.html#) |
+| `KERBEROS_ADMIN` | `admin/admin` | The KDC admin user |
+| `KERBEROS_ADMIN_PASSWORD` | `admin` | The KDC admin password |
+
+You can simply define these variables in the `docker-compose.yml`.
+
+Run image
+-
+
+Clone the [Github project](https://github.com/aljoscha/docker-hadoop-secure-cluster) and run
+
+```
+docker-compose up
+```
+
+Usage
+-
+
+Get the container name with `docker ps` and login to the container with
+
+```
+docker exec -it  /bin/bash
+```
+
+
+To obtain a Kerberos ticket, execute
+
+```
+kinit -kt /home/hadoop-user/hadoop-user.keytab hadoop-user
+```
+
+Afterwards you can use `hdfs` CLI like
+
+```
+hdfs dfs -ls /
+```
+
+
+Known issues
+
+
+### Unable to obtain Kerberos password
+
+#### Error
+docker-compose up fails on the first run with the error
+
+```
+Login failure for nn/hadoop.docker@example.com from keytab /etc/security/keytabs/nn.service.keytab: javax.security.auth.login.LoginException: Unable to obtain password from user
+```
+
+#### Solution
+
+Stop the containers with `docker-compose down` and start again with `docker-compose up -d`.
+
+
+### JDK 8
+
+Make sure you download a JDK version that is still available. Old versions can be deprecated by Oracle, and then the download link won't be available anymore.
+
+Get the latest JDK8 Download URL with
+
+```
+curl -s https://lv.binarybabel.org/catalog-api/java/jdk8.json
+```
+
+### Java Keystore
+
+If the keystore has expired, create a new `keystore.jks`:
--- End diff --

Yes, I rather meant whether keystore expiry might be a problem. Could we create the keystore in the test?

What is the expiry time for the keystore you use? Maybe setting it to some big number will be enough, but I think the default (365 days) might cause some trouble.




[jira] [Commented] (FLINK-8981) Add end-to-end test for running on YARN with Kerberos

2018-07-23 Thread ASF GitHub Bot (JIRA)


[ https://issues.apache.org/jira/browse/FLINK-8981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16552506#comment-16552506 ]

ASF GitHub Bot commented on FLINK-8981:
---

Github user dawidwys commented on a diff in the pull request:

https://github.com/apache/flink/pull/6377#discussion_r204327765
  
--- Diff: flink-end-to-end-tests/test-scripts/test_yarn_kerberos_docker.sh ---
@@ -0,0 +1,104 @@
+#!/usr/bin/env bash

+
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.

+
+set -o pipefail
+
+source "$(dirname "$0")"/common.sh
+
+FLINK_TARBALL_DIR=$TEST_DATA_DIR
+FLINK_TARBALL=flink.tar.gz
+FLINK_DIRNAME=$(basename $FLINK_DIR)
+
+echo "Flink Tarball directory $FLINK_TARBALL_DIR"
+echo "Flink tarball filename $FLINK_TARBALL"
+echo "Flink distribution directory name $FLINK_DIRNAME"
+echo "End-to-end directory $END_TO_END_DIR"
+docker --version
+docker-compose --version
+
+mkdir -p $FLINK_TARBALL_DIR
+tar czf $FLINK_TARBALL_DIR/$FLINK_TARBALL -C $(dirname $FLINK_DIR) .
+
+echo "Building Hadoop Docker container"
+until docker build -f $END_TO_END_DIR/test-scripts/docker-hadoop-secure-cluster/Dockerfile -t flink/docker-hadoop-secure-cluster:latest $END_TO_END_DIR/test-scripts/docker-hadoop-secure-cluster/; do
+    # with all the downloading and ubuntu updating a lot of flakiness can happen, make sure
+    # we don't immediately fail
+    echo "Something went wrong while building the Docker image, retrying ..."
+    sleep 2
+done
+
+echo "Starting Hadoop cluster"
+docker-compose -f $END_TO_END_DIR/test-scripts/docker-hadoop-secure-cluster/docker-compose.yml up -d
+
+# make sure we stop our cluster at the end
+function cluster_shutdown {
+  # don't call ourselves again for another signal interruption
+  trap "exit -1" INT
+  # don't call ourselves again for normal exit
+  trap "" EXIT
+
+  docker-compose -f $END_TO_END_DIR/test-scripts/docker-hadoop-secure-cluster/docker-compose.yml down
+  rm $FLINK_TARBALL_DIR/$FLINK_TARBALL
+}
+trap cluster_shutdown INT
+trap cluster_shutdown EXIT
+
+until docker cp $FLINK_TARBALL_DIR/$FLINK_TARBALL master:/home/hadoop-user/; do
--- End diff --

I think if we add it as one of the last steps of the Dockerfile, it wouldn't make a difference in build time, since all previous layers would be cached anyway. At the same time, if we move it into the Dockerfile, we will no longer need the loop.




[jira] [Commented] (FLINK-8981) Add end-to-end test for running on YARN with Kerberos

2018-07-23 Thread ASF GitHub Bot (JIRA)


[ https://issues.apache.org/jira/browse/FLINK-8981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16552504#comment-16552504 ]

ASF GitHub Bot commented on FLINK-8981:
---

Github user aljoscha commented on a diff in the pull request:

https://github.com/apache/flink/pull/6377#discussion_r204327419
  
--- Diff: flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/bootstrap.sh ---
@@ -0,0 +1,121 @@
+#!/bin/bash

+
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.

+
+
+: ${HADOOP_PREFIX:=/usr/local/hadoop}
+
+$HADOOP_PREFIX/etc/hadoop/hadoop-env.sh
+
+rm /tmp/*.pid
+
+# installing libraries if any - (resource urls added comma separated to the ACP system variable)
+cd $HADOOP_PREFIX/share/hadoop/common ; for cp in ${ACP//,/ }; do  echo == $cp; curl -LO $cp ; done; cd -
+
+# kerberos client
+sed -i "s/EXAMPLE.COM/${KRB_REALM}/g" /etc/krb5.conf
+sed -i "s/example.com/${DOMAIN_REALM}/g" /etc/krb5.conf
+
+# update config files
+sed -i "s/HOSTNAME/$(hostname -f)/g" $HADOOP_PREFIX/etc/hadoop/core-site.xml
+sed -i "s/EXAMPLE.COM/${KRB_REALM}/g" $HADOOP_PREFIX/etc/hadoop/core-site.xml
+sed -i "s#/etc/security/keytabs#${KEYTAB_DIR}#g" $HADOOP_PREFIX/etc/hadoop/core-site.xml
+
+sed -i "s/EXAMPLE.COM/${KRB_REALM}/g" $HADOOP_PREFIX/etc/hadoop/hdfs-site.xml
+sed -i "s/HOSTNAME/$(hostname -f)/g" $HADOOP_PREFIX/etc/hadoop/hdfs-site.xml
+sed -i "s#/etc/security/keytabs#${KEYTAB_DIR}#g" $HADOOP_PREFIX/etc/hadoop/hdfs-site.xml
+
+sed -i "s/EXAMPLE.COM/${KRB_REALM}/g" $HADOOP_PREFIX/etc/hadoop/yarn-site.xml
+sed -i "s/HOSTNAME/$(hostname -f)/g" $HADOOP_PREFIX/etc/hadoop/yarn-site.xml
+sed -i "s#/etc/security/keytabs#${KEYTAB_DIR}#g" $HADOOP_PREFIX/etc/hadoop/yarn-site.xml
+
+sed -i "s/EXAMPLE.COM/${KRB_REALM}/g" $HADOOP_PREFIX/etc/hadoop/mapred-site.xml
+sed -i "s/HOSTNAME/$(hostname -f)/g" $HADOOP_PREFIX/etc/hadoop/mapred-site.xml
+sed -i "s#/etc/security/keytabs#${KEYTAB_DIR}#g" $HADOOP_PREFIX/etc/hadoop/mapred-site.xml
+
+sed -i "s#/usr/local/hadoop/bin/container-executor#${NM_CONTAINER_EXECUTOR_PATH}#g" $HADOOP_PREFIX/etc/hadoop/yarn-site.xml
+
+# create namenode kerberos principal and keytab
+kadmin -p ${KERBEROS_ADMIN} -w ${KERBEROS_ADMIN_PASSWORD} -q "addprinc -randkey hdfs/$(hostname -f)@${KRB_REALM}"
+kadmin -p ${KERBEROS_ADMIN} -w ${KERBEROS_ADMIN_PASSWORD} -q "addprinc -randkey mapred/$(hostname -f)@${KRB_REALM}"
+kadmin -p ${KERBEROS_ADMIN} -w ${KERBEROS_ADMIN_PASSWORD} -q "addprinc -randkey yarn/$(hostname -f)@${KRB_REALM}"
+kadmin -p ${KERBEROS_ADMIN} -w ${KERBEROS_ADMIN_PASSWORD} -q "addprinc -randkey HTTP/$(hostname -f)@${KRB_REALM}"
+
+kadmin -p ${KERBEROS_ADMIN} -w ${KERBEROS_ADMIN_PASSWORD} -q "xst -k hdfs.keytab hdfs/$(hostname -f) HTTP/$(hostname -f)"
+kadmin -p ${KERBEROS_ADMIN} -w ${KERBEROS_ADMIN_PASSWORD} -q "xst -k mapred.keytab mapred/$(hostname -f) HTTP/$(hostname -f)"
+kadmin -p ${KERBEROS_ADMIN} -w ${KERBEROS_ADMIN_PASSWORD} -q "xst -k yarn.keytab yarn/$(hostname -f) HTTP/$(hostname -f)"
+
+mkdir -p ${KEYTAB_DIR}
+mv hdfs.keytab ${KEYTAB_DIR}
+mv mapred.keytab ${KEYTAB_DIR}
+mv yarn.keytab ${KEYTAB_DIR}
+chmod 400 ${KEYTAB_DIR}/hdfs.keytab
+chmod 400 ${KEYTAB_DIR}/mapred.keytab
+chmod 400 ${KEYTAB_DIR}/yarn.keytab
+chown hdfs:hadoop ${KEYTAB_DIR}/hdfs.keytab
+chown mapred:hadoop ${KEYTAB_DIR}/mapred.keytab
+chown yarn:hadoop ${KEYTAB_DIR}/yarn.keytab
+
+service ssh start
+
+if [ "$1" == "--help" -o "$1" == "-h" ]; then
+echo "Usage: $(basename $0) (master|worker)"
+exit 0
+elif [ "$1" == "master" ]; then
+yes| sudo -E -u hdfs $HADOOP_PREFIX/bin/hdfs namenode -format
+
+nohup sudo -E -u hdfs $HADOOP_PREFIX/bin/hdfs namenode 2>> /var/log/hadoo

[jira] [Commented] (FLINK-8981) Add end-to-end test for running on YARN with Kerberos

2018-07-23 Thread ASF GitHub Bot (JIRA)


[ https://issues.apache.org/jira/browse/FLINK-8981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16552502#comment-16552502 ]

ASF GitHub Bot commented on FLINK-8981:
---

Github user aljoscha commented on a diff in the pull request:

https://github.com/apache/flink/pull/6377#discussion_r204327123
  
--- Diff: flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/README.md ---
@@ -0,0 +1,118 @@
+# Apache Hadoop Docker image with Kerberos enabled
+
+This image is a modified version of Knappek/docker-hadoop-secure
+ * Knappek/docker-hadoop-secure
+
+With bits and pieces added from Lewuathe/docker-hadoop-cluster to extend it to start a proper kerberized Hadoop cluster:
+ * Lewuathe/docker-hadoop-cluster
+
+And a lot of added stuff to make this an actual, properly configured, kerberized cluster with a proper user/permissions structure.
+
+Versions
+
+
+* JDK8
+* Hadoop 2.8.3
+
+Default Environment Variables
+-
+
+| Name | Value | Description |
+| ---- | ----- | ----------- |
+| `KRB_REALM` | `EXAMPLE.COM` | The Kerberos Realm, more information [here](https://web.mit.edu/kerberos/krb5-1.12/doc/admin/conf_files/krb5_conf.html#) |
+| `DOMAIN_REALM` | `example.com` | The Kerberos Domain Realm, more information [here](https://web.mit.edu/kerberos/krb5-1.12/doc/admin/conf_files/krb5_conf.html#) |
+| `KERBEROS_ADMIN` | `admin/admin` | The KDC admin user |
+| `KERBEROS_ADMIN_PASSWORD` | `admin` | The KDC admin password |
+
+You can simply define these variables in the `docker-compose.yml`.
+
+Run image
+-
+
+Clone the [Github project](https://github.com/aljoscha/docker-hadoop-secure-cluster) and run
+
+```
+docker-compose up
+```
+
+Usage
+-
+
+Get the container name with `docker ps` and login to the container with
+
+```
+docker exec -it  /bin/bash
+```
+
+
+To obtain a Kerberos ticket, execute
+
+```
+kinit -kt /home/hadoop-user/hadoop-user.keytab hadoop-user
+```
+
+Afterwards you can use `hdfs` CLI like
+
+```
+hdfs dfs -ls /
+```
+
+
+Known issues
+
+
+### Unable to obtain Kerberos password
+
+#### Error
+docker-compose up fails on the first run with the error
+
+```
+Login failure for nn/hadoop.docker@example.com from keytab /etc/security/keytabs/nn.service.keytab: javax.security.auth.login.LoginException: Unable to obtain password from user
+```
+
+#### Solution
+
+Stop the containers with `docker-compose down` and start again with `docker-compose up -d`.
+
+
+### JDK 8
+
+Make sure you download a JDK version that is still available. Old versions can be deprecated by Oracle, and then the download link won't be available anymore.
+
+Get the latest JDK8 Download URL with
+
+```
+curl -s https://lv.binarybabel.org/catalog-api/java/jdk8.json
+```
+
+### Java Keystore
+
+If the keystore has expired, create a new `keystore.jks`:
--- End diff --

Fixing the typo, but we need the keystore for the SSL setup, which we seem to need for the Kerberos setup.





[jira] [Commented] (FLINK-8981) Add end-to-end test for running on YARN with Kerberos

2018-07-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-8981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16552489#comment-16552489
 ] 

ASF GitHub Bot commented on FLINK-8981:
---

Github user aljoscha commented on a diff in the pull request:

https://github.com/apache/flink/pull/6377#discussion_r20431
  
--- Diff: 
flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/bootstrap.sh 
---
@@ -0,0 +1,121 @@
+#!/bin/bash

+
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.

+
+
+: ${HADOOP_PREFIX:=/usr/local/hadoop}
+
+$HADOOP_PREFIX/etc/hadoop/hadoop-env.sh
+
+rm /tmp/*.pid
+
+# installing libraries if any - (resource urls added comma separated to 
the ACP system variable)
+cd $HADOOP_PREFIX/share/hadoop/common ; for cp in ${ACP//,/ }; do  echo == 
$cp; curl -LO $cp ; done; cd -
+
+# kerberos client
+sed -i "s/EXAMPLE.COM/${KRB_REALM}/g" /etc/krb5.conf
+sed -i "s/example.com/${DOMAIN_REALM}/g" /etc/krb5.conf
+
+# update config files
+sed -i "s/HOSTNAME/$(hostname -f)/g" 
$HADOOP_PREFIX/etc/hadoop/core-site.xml
+sed -i "s/EXAMPLE.COM/${KRB_REALM}/g" 
$HADOOP_PREFIX/etc/hadoop/core-site.xml
+sed -i "s#/etc/security/keytabs#${KEYTAB_DIR}#g" 
$HADOOP_PREFIX/etc/hadoop/core-site.xml
+
+sed -i "s/EXAMPLE.COM/${KRB_REALM}/g" 
$HADOOP_PREFIX/etc/hadoop/hdfs-site.xml
+sed -i "s/HOSTNAME/$(hostname -f)/g" 
$HADOOP_PREFIX/etc/hadoop/hdfs-site.xml
+sed -i "s#/etc/security/keytabs#${KEYTAB_DIR}#g" 
$HADOOP_PREFIX/etc/hadoop/hdfs-site.xml
+
+sed -i "s/EXAMPLE.COM/${KRB_REALM}/g" 
$HADOOP_PREFIX/etc/hadoop/yarn-site.xml
+sed -i "s/HOSTNAME/$(hostname -f)/g" 
$HADOOP_PREFIX/etc/hadoop/yarn-site.xml
+sed -i "s#/etc/security/keytabs#${KEYTAB_DIR}#g" 
$HADOOP_PREFIX/etc/hadoop/yarn-site.xml
+
+sed -i "s/EXAMPLE.COM/${KRB_REALM}/g" 
$HADOOP_PREFIX/etc/hadoop/mapred-site.xml
+sed -i "s/HOSTNAME/$(hostname -f)/g" 
$HADOOP_PREFIX/etc/hadoop/mapred-site.xml
+sed -i "s#/etc/security/keytabs#${KEYTAB_DIR}#g" 
$HADOOP_PREFIX/etc/hadoop/mapred-site.xml
+
+sed -i 
"s#/usr/local/hadoop/bin/container-executor#${NM_CONTAINER_EXECUTOR_PATH}#g" 
$HADOOP_PREFIX/etc/hadoop/yarn-site.xml
+
+# create namenode kerberos principal and keytab
+kadmin -p ${KERBEROS_ADMIN} -w ${KERBEROS_ADMIN_PASSWORD} -q "addprinc 
-randkey hdfs/$(hostname -f)@${KRB_REALM}"
+kadmin -p ${KERBEROS_ADMIN} -w ${KERBEROS_ADMIN_PASSWORD} -q "addprinc 
-randkey mapred/$(hostname -f)@${KRB_REALM}"
+kadmin -p ${KERBEROS_ADMIN} -w ${KERBEROS_ADMIN_PASSWORD} -q "addprinc 
-randkey yarn/$(hostname -f)@${KRB_REALM}"
+kadmin -p ${KERBEROS_ADMIN} -w ${KERBEROS_ADMIN_PASSWORD} -q "addprinc 
-randkey HTTP/$(hostname -f)@${KRB_REALM}"
+
+kadmin -p ${KERBEROS_ADMIN} -w ${KERBEROS_ADMIN_PASSWORD} -q "xst -k 
hdfs.keytab hdfs/$(hostname -f) HTTP/$(hostname -f)"
+kadmin -p ${KERBEROS_ADMIN} -w ${KERBEROS_ADMIN_PASSWORD} -q "xst -k 
mapred.keytab mapred/$(hostname -f) HTTP/$(hostname -f)"
+kadmin -p ${KERBEROS_ADMIN} -w ${KERBEROS_ADMIN_PASSWORD} -q "xst -k 
yarn.keytab yarn/$(hostname -f) HTTP/$(hostname -f)"
+
+mkdir -p ${KEYTAB_DIR}
+mv hdfs.keytab ${KEYTAB_DIR}
+mv mapred.keytab ${KEYTAB_DIR}
+mv yarn.keytab ${KEYTAB_DIR}
+chmod 400 ${KEYTAB_DIR}/hdfs.keytab
+chmod 400 ${KEYTAB_DIR}/mapred.keytab
+chmod 400 ${KEYTAB_DIR}/yarn.keytab
+chown hdfs:hadoop ${KEYTAB_DIR}/hdfs.keytab
+chown mapred:hadoop ${KEYTAB_DIR}/mapred.keytab
+chown yarn:hadoop ${KEYTAB_DIR}/yarn.keytab
+
+service ssh start
--- End diff --

from a quick search it's not easily possible: 
https://stackoverflow.com/questions/22886470/start-sshd-automatically-with-docker-container
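The `sed` templating in the bootstrap script above can be illustrated in isolation; the file path and realm below are stand-ins, not the values the cluster uses:

```shell
# Stand-in for the placeholder substitution bootstrap.sh applies to
# krb5.conf and the Hadoop XML configs: rewrite the EXAMPLE.COM
# placeholder to the realm taken from the environment.
KRB_REALM="TEST.LOCAL"
printf 'default_realm = EXAMPLE.COM\n' > /tmp/krb5-demo.conf
sed -i "s/EXAMPLE.COM/${KRB_REALM}/g" /tmp/krb5-demo.conf
cat /tmp/krb5-demo.conf   # prints: default_realm = TEST.LOCAL
```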



[jira] [Commented] (FLINK-8981) Add end-to-end test for running on YARN with Kerberos

2018-07-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-8981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16552474#comment-16552474
 ] 

ASF GitHub Bot commented on FLINK-8981:
---

Github user aljoscha commented on a diff in the pull request:

https://github.com/apache/flink/pull/6377#discussion_r204315760
  
--- Diff: 
flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/Dockerfile ---
@@ -0,0 +1,159 @@

+
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.

+
+#
+# This image is a modified version of Knappek/docker-hadoop-secure
+#   * Knappek/docker-hadoop-secure 

+#
+# With bits and pieces added from Lewuathe/docker-hadoop-cluster to extend 
it to start a proper kerberized Hadoop cluster:
+#   * Lewuathe/docker-hadoop-cluster 

+#
+# Author: Aljoscha Krettek
+# Date:   2018 May, 15
+#
+# Creates a multi-node, kerberized Hadoop cluster on Docker
+
+FROM sequenceiq/pam:ubuntu-14.04
+MAINTAINER aljoscha
+
+USER root
--- End diff --

removing




[jira] [Commented] (FLINK-8981) Add end-to-end test for running on YARN with Kerberos

2018-07-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-8981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16552473#comment-16552473
 ] 

ASF GitHub Bot commented on FLINK-8981:
---

Github user aljoscha commented on a diff in the pull request:

https://github.com/apache/flink/pull/6377#discussion_r204315672
  
--- Diff: 
flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/Dockerfile ---
@@ -0,0 +1,159 @@

+
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.

+
+#
+# This image is modified version of Knappek/docker-hadoop-secure
+#   * Knappek/docker-hadoop-secure 

+#
+# With bits and pieces added from Lewuathe/docker-hadoop-cluster to extend 
it to start a proper kerberized Hadoop cluster:
+#   * Lewuathe/docker-hadoop-cluster 

+#
+# Author: Aljoscha Krettek
+# Date:   2018 May, 15
+#
+# Creates multi-node, kerberized Hadoop cluster on Docker
+
+FROM sequenceiq/pam:ubuntu-14.04
+MAINTAINER aljoscha
+
+USER root
+
+RUN addgroup hadoop
+RUN useradd -d /home/hdfs -ms /bin/bash -G hadoop -p hdfs hdfs
+RUN useradd -d /home/yarn -ms /bin/bash -G hadoop -p yarn yarn
+RUN useradd -d /home/mapred -ms /bin/bash -G hadoop -p mapred mapred
+
+RUN useradd -d /home/hadoop-user -ms /bin/bash -p hadoop-user hadoop-user
+
+# install dev tools
+RUN apt-get update
+RUN apt-get install -y curl tar sudo openssh-server openssh-client rsync 
unzip
+
+# Kerberos client
+RUN apt-get install krb5-user -y
+RUN mkdir -p /var/log/kerberos
+RUN touch /var/log/kerberos/kadmind.log
+
+# passwordless ssh
+RUN rm -f /etc/ssh/ssh_host_dsa_key /etc/ssh/ssh_host_rsa_key 
/root/.ssh/id_rsa
+RUN ssh-keygen -q -N "" -t dsa -f /etc/ssh/ssh_host_dsa_key
+RUN ssh-keygen -q -N "" -t rsa -f /etc/ssh/ssh_host_rsa_key
+RUN ssh-keygen -q -N "" -t rsa -f /root/.ssh/id_rsa
+RUN cp /root/.ssh/id_rsa.pub /root/.ssh/authorized_keys
+
+# java
+RUN mkdir -p /usr/java/default && \
+ curl -Ls 
'http://download.oracle.com/otn-pub/java/jdk/8u131-b11/d54c1d3a095b4ff2b6607d096fa80163/jdk-8u131-linux-x64.tar.gz'
 -H 'Cookie: oraclelicense=accept-securebackup-cookie' | \
+ tar --strip-components=1 -xz -C /usr/java/default/
+
+ENV JAVA_HOME /usr/java/default
+ENV PATH $PATH:$JAVA_HOME/bin
+
+RUN curl -LOH 'Cookie: oraclelicense=accept-securebackup-cookie' 
'http://download.oracle.com/otn-pub/java/jce/8/jce_policy-8.zip'
+RUN unzip jce_policy-8.zip
+RUN cp /UnlimitedJCEPolicyJDK8/local_policy.jar 
/UnlimitedJCEPolicyJDK8/US_export_policy.jar $JAVA_HOME/jre/lib/security
+
+ENV HADOOP_VERSION=2.8.4
+
+# ENV HADOOP_URL 
https://www.apache.org/dist/hadoop/common/hadoop-$HADOOP_VERSION/hadoop-$HADOOP_VERSION.tar.gz
+ENV HADOOP_URL 
http://archive.apache.org/dist/hadoop/common/hadoop-$HADOOP_VERSION/hadoop-$HADOOP_VERSION.tar.gz
+RUN set -x \
+&& curl -fSL "$HADOOP_URL" -o /tmp/hadoop.tar.gz \
+&& tar -xf /tmp/hadoop.tar.gz -C /usr/local/ \
+&& rm /tmp/hadoop.tar.gz*
+
+WORKDIR /usr/local
+RUN ln -s /usr/local/hadoop-${HADOOP_VERSION} /usr/local/hadoop
+RUN chown root:root -R /usr/local/hadoop-${HADOOP_VERSION}/
+RUN chown root:root -R /usr/local/hadoop/
+RUN chown root:yarn /usr/local/hadoop/bin/container-executor
+RUN chmod 6050 /usr/local/hadoop/bin/container-executor
+RUN mkdir -p /hadoop-data/nm-local-dirs
+RUN mkdir -p /hadoop-data/nm-log-dirs
+RUN chown yarn:yarn /hadoop-data
+RUN chown yarn:yarn /hadoop-data/nm-local-dirs
+RUN chown yarn:yarn /hadoop-data/nm-log-dirs
+RUN chmod 755 /hadoop-data
+RUN chmod 755 /hadoop-data/nm-local-dirs
+RUN chmod 755 /hadoop-data/nm-log-dirs
+
+
+ENV HAD

[jira] [Commented] (FLINK-8981) Add end-to-end test for running on YARN with Kerberos

2018-07-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-8981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16552470#comment-16552470
 ] 

ASF GitHub Bot commented on FLINK-8981:
---

Github user aljoscha commented on a diff in the pull request:

https://github.com/apache/flink/pull/6377#discussion_r204314392
  
--- Diff: 
flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/Dockerfile ---
@@ -0,0 +1,159 @@

+
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.

+
+#
+# This image is modified version of Knappek/docker-hadoop-secure
+#   * Knappek/docker-hadoop-secure 

+#
+# With bits and pieces added from Lewuathe/docker-hadoop-cluster to extend 
it to start a proper kerberized Hadoop cluster:
+#   * Lewuathe/docker-hadoop-cluster 

+#
+# Author: Aljoscha Krettek
+# Date:   2018 May, 15
+#
+# Creates multi-node, kerberized Hadoop cluster on Docker
+
+FROM sequenceiq/pam:ubuntu-14.04
+MAINTAINER aljoscha
+
+USER root
+
+RUN addgroup hadoop
+RUN useradd -d /home/hdfs -ms /bin/bash -G hadoop -p hdfs hdfs
+RUN useradd -d /home/yarn -ms /bin/bash -G hadoop -p yarn yarn
+RUN useradd -d /home/mapred -ms /bin/bash -G hadoop -p mapred mapred
+
+RUN useradd -d /home/hadoop-user -ms /bin/bash -p hadoop-user hadoop-user
+
+# install dev tools
+RUN apt-get update
+RUN apt-get install -y curl tar sudo openssh-server openssh-client rsync 
unzip
+
+# Kerberos client
+RUN apt-get install krb5-user -y
+RUN mkdir -p /var/log/kerberos
+RUN touch /var/log/kerberos/kadmind.log
+
+# passwordless ssh
+RUN rm -f /etc/ssh/ssh_host_dsa_key /etc/ssh/ssh_host_rsa_key 
/root/.ssh/id_rsa
+RUN ssh-keygen -q -N "" -t dsa -f /etc/ssh/ssh_host_dsa_key
+RUN ssh-keygen -q -N "" -t rsa -f /etc/ssh/ssh_host_rsa_key
+RUN ssh-keygen -q -N "" -t rsa -f /root/.ssh/id_rsa
+RUN cp /root/.ssh/id_rsa.pub /root/.ssh/authorized_keys
+
+# java
+RUN mkdir -p /usr/java/default && \
--- End diff --

I think I possibly could but I don't know exactly what else I then would 
need to setup to make the whole Hadoop thing work




[jira] [Commented] (FLINK-8981) Add end-to-end test for running on YARN with Kerberos

2018-07-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-8981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16552468#comment-16552468
 ] 

ASF GitHub Bot commented on FLINK-8981:
---

Github user aljoscha commented on a diff in the pull request:

https://github.com/apache/flink/pull/6377#discussion_r204314197
  
--- Diff: 
flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/Dockerfile ---
@@ -0,0 +1,159 @@

+
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.

+
+#
+# This image is modified version of Knappek/docker-hadoop-secure
+#   * Knappek/docker-hadoop-secure 

+#
+# With bits and pieces added from Lewuathe/docker-hadoop-cluster to extend 
it to start a proper kerberized Hadoop cluster:
+#   * Lewuathe/docker-hadoop-cluster 

+#
+# Author: Aljoscha Krettek
+# Date:   2018 May, 15
+#
+# Creates multi-node, kerberized Hadoop cluster on Docker
+
+FROM sequenceiq/pam:ubuntu-14.04
+MAINTAINER aljoscha
+
+USER root
+
+RUN addgroup hadoop
+RUN useradd -d /home/hdfs -ms /bin/bash -G hadoop -p hdfs hdfs
+RUN useradd -d /home/yarn -ms /bin/bash -G hadoop -p yarn yarn
+RUN useradd -d /home/mapred -ms /bin/bash -G hadoop -p mapred mapred
+
+RUN useradd -d /home/hadoop-user -ms /bin/bash -p hadoop-user hadoop-user
+
+# install dev tools
+RUN apt-get update
+RUN apt-get install -y curl tar sudo openssh-server openssh-client rsync 
unzip
+
+# Kerberos client
+RUN apt-get install krb5-user -y
+RUN mkdir -p /var/log/kerberos
+RUN touch /var/log/kerberos/kadmind.log
+
+# passwordless ssh
+RUN rm -f /etc/ssh/ssh_host_dsa_key /etc/ssh/ssh_host_rsa_key 
/root/.ssh/id_rsa
+RUN ssh-keygen -q -N "" -t dsa -f /etc/ssh/ssh_host_dsa_key
+RUN ssh-keygen -q -N "" -t rsa -f /etc/ssh/ssh_host_rsa_key
+RUN ssh-keygen -q -N "" -t rsa -f /root/.ssh/id_rsa
+RUN cp /root/.ssh/id_rsa.pub /root/.ssh/authorized_keys
+
+# java
+RUN mkdir -p /usr/java/default && \
+ curl -Ls 
'http://download.oracle.com/otn-pub/java/jdk/8u131-b11/d54c1d3a095b4ff2b6607d096fa80163/jdk-8u131-linux-x64.tar.gz'
 -H 'Cookie: oraclelicense=accept-securebackup-cookie' | \
+ tar --strip-components=1 -xz -C /usr/java/default/
+
+ENV JAVA_HOME /usr/java/default
+ENV PATH $PATH:$JAVA_HOME/bin
+
+RUN curl -LOH 'Cookie: oraclelicense=accept-securebackup-cookie' 
'http://download.oracle.com/otn-pub/java/jce/8/jce_policy-8.zip'
+RUN unzip jce_policy-8.zip
+RUN cp /UnlimitedJCEPolicyJDK8/local_policy.jar 
/UnlimitedJCEPolicyJDK8/US_export_policy.jar $JAVA_HOME/jre/lib/security
+
+ENV HADOOP_VERSION=2.8.4
+
+# ENV HADOOP_URL 
https://www.apache.org/dist/hadoop/common/hadoop-$HADOOP_VERSION/hadoop-$HADOOP_VERSION.tar.gz
+ENV HADOOP_URL 
http://archive.apache.org/dist/hadoop/common/hadoop-$HADOOP_VERSION/hadoop-$HADOOP_VERSION.tar.gz
+RUN set -x \
+&& curl -fSL "$HADOOP_URL" -o /tmp/hadoop.tar.gz \
+&& tar -xf /tmp/hadoop.tar.gz -C /usr/local/ \
+&& rm /tmp/hadoop.tar.gz*
+
+WORKDIR /usr/local
+RUN ln -s /usr/local/hadoop-${HADOOP_VERSION} /usr/local/hadoop
+RUN chown root:root -R /usr/local/hadoop-${HADOOP_VERSION}/
+RUN chown root:root -R /usr/local/hadoop/
+RUN chown root:yarn /usr/local/hadoop/bin/container-executor
+RUN chmod 6050 /usr/local/hadoop/bin/container-executor
+RUN mkdir -p /hadoop-data/nm-local-dirs
+RUN mkdir -p /hadoop-data/nm-log-dirs
+RUN chown yarn:yarn /hadoop-data
+RUN chown yarn:yarn /hadoop-data/nm-local-dirs
+RUN chown yarn:yarn /hadoop-data/nm-log-dirs
+RUN chmod 755 /hadoop-data
+RUN chmod 755 /hadoop-data/nm-local-dirs
+RUN chmod 755 /hadoop-data/nm-log-dirs
+
+
+ENV HAD

[jira] [Commented] (FLINK-8981) Add end-to-end test for running on YARN with Kerberos

2018-07-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-8981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16552458#comment-16552458
 ] 

ASF GitHub Bot commented on FLINK-8981:
---

Github user aljoscha commented on a diff in the pull request:

https://github.com/apache/flink/pull/6377#discussion_r204312035
  
--- Diff: 
flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/Dockerfile ---
@@ -0,0 +1,159 @@

+
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.

+
+#
+# This image is modified version of Knappek/docker-hadoop-secure
+#   * Knappek/docker-hadoop-secure 

+#
+# With bits and pieces added from Lewuathe/docker-hadoop-cluster to extend 
it to start a proper kerberized Hadoop cluster:
+#   * Lewuathe/docker-hadoop-cluster 

+#
+# Author: Aljoscha Krettek
+# Date:   2018 May, 15
+#
+# Creates multi-node, kerberized Hadoop cluster on Docker
+
+FROM sequenceiq/pam:ubuntu-14.04
+MAINTAINER aljoscha
+
+USER root
+
+RUN addgroup hadoop
+RUN useradd -d /home/hdfs -ms /bin/bash -G hadoop -p hdfs hdfs
+RUN useradd -d /home/yarn -ms /bin/bash -G hadoop -p yarn yarn
+RUN useradd -d /home/mapred -ms /bin/bash -G hadoop -p mapred mapred
+
+RUN useradd -d /home/hadoop-user -ms /bin/bash -p hadoop-user hadoop-user
+
+# install dev tools
+RUN apt-get update
--- End diff --

fixing




[jira] [Commented] (FLINK-8981) Add end-to-end test for running on YARN with Kerberos

2018-07-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-8981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16552447#comment-16552447
 ] 

ASF GitHub Bot commented on FLINK-8981:
---

Github user aljoscha commented on a diff in the pull request:

https://github.com/apache/flink/pull/6377#discussion_r204308017
  
--- Diff: flink-end-to-end-tests/test-scripts/test_yarn_kerberos_docker.sh 
---
@@ -0,0 +1,104 @@
+#!/usr/bin/env bash

+
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.

+
+set -o pipefail
+
+source "$(dirname "$0")"/common.sh
+
+FLINK_TARBALL_DIR=$TEST_DATA_DIR
+FLINK_TARBALL=flink.tar.gz
+FLINK_DIRNAME=$(basename $FLINK_DIR)
+
+echo "Flink Tarball directory $FLINK_TARBALL_DIR"
+echo "Flink tarball filename $FLINK_TARBALL"
+echo "Flink distribution directory name $FLINK_DIRNAME"
+echo "End-to-end directory $END_TO_END_DIR"
+docker --version
+docker-compose --version
+
+mkdir -p $FLINK_TARBALL_DIR
+tar czf $FLINK_TARBALL_DIR/$FLINK_TARBALL -C $(dirname $FLINK_DIR) .
+
+echo "Building Hadoop Docker container"
+until docker build -f 
$END_TO_END_DIR/test-scripts/docker-hadoop-secure-cluster/Dockerfile -t 
flink/docker-hadoop-secure-cluster:latest 
$END_TO_END_DIR/test-scripts/docker-hadoop-secure-cluster/; do
+# with all the downloading and ubuntu updating a lot of flakiness can 
happen, make sure
+# we don't immediately fail
+echo "Something went wrong while building the Docker image, retrying 
..."
+sleep 2
+done
+
+echo "Starting Hadoop cluster"
+docker-compose -f 
$END_TO_END_DIR/test-scripts/docker-hadoop-secure-cluster/docker-compose.yml up 
-d
+
+# make sure we stop our cluster at the end
+function cluster_shutdown {
+  # don't call ourselves again for another signal interruption
+  trap "exit -1" INT
+  # don't call ourselves again for normal exit
+  trap "" EXIT
+
+  docker-compose -f 
$END_TO_END_DIR/test-scripts/docker-hadoop-secure-cluster/docker-compose.yml 
down
+  rm $FLINK_TARBALL_DIR/$FLINK_TARBALL
+}
+trap cluster_shutdown INT
+trap cluster_shutdown EXIT
+
+until docker cp $FLINK_TARBALL_DIR/$FLINK_TARBALL 
master:/home/hadoop-user/; do
--- End diff --

I did it like this so that rebuilding Flink does not require building the 
docker image. I know I could do it as one of the last steps but with repeatedly 
running the test locally I think it's still easier this way. WDYT?
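The retry idiom the script wraps around `docker build` and `docker cp` can be sketched in isolation; the flaky step below is simulated, succeeding on the second attempt:

```shell
# Illustrative sketch of the retry-until-success loop from the test
# script. flaky_step stands in for docker build / docker cp.
attempts=0
flaky_step() {
  attempts=$((attempts + 1))
  [ "$attempts" -ge 2 ]   # fails once, then succeeds
}
until flaky_step; do
  echo "attempt $attempts failed, retrying ..."
  sleep 0   # the real script sleeps 2 seconds between attempts
done
echo "succeeded after $attempts attempts"
```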




[jira] [Commented] (FLINK-8981) Add end-to-end test for running on YARN with Kerberos

2018-07-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-8981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16552444#comment-16552444
 ] 

ASF GitHub Bot commented on FLINK-8981:
---

Github user aljoscha commented on a diff in the pull request:

https://github.com/apache/flink/pull/6377#discussion_r20430
  
--- Diff: 
flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/docker-compose.yml
 ---
@@ -0,0 +1,87 @@

+
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.

+
+version: '3.5'
+
+networks:
+  docker-hadoop-cluster-network:
--- End diff --

apparently we don't need it, removing




[jira] [Commented] (FLINK-8981) Add end-to-end test for running on YARN with Kerberos

2018-07-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-8981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16552445#comment-16552445
 ] 

ASF GitHub Bot commented on FLINK-8981:
---

Github user aljoscha commented on a diff in the pull request:

https://github.com/apache/flink/pull/6377#discussion_r204307793
  
--- Diff: 
flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/Dockerfile ---
@@ -0,0 +1,159 @@

+
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.

+
+#
+# This image is modified version of Knappek/docker-hadoop-secure
+#   * Knappek/docker-hadoop-secure 

+#
+# With bits and pieces added from Lewuathe/docker-hadoop-cluster to extend 
it to start a proper kerberized Hadoop cluster:
+#   * Lewuathe/docker-hadoop-cluster 

+#
+# Author: Aljoscha Krettek
+# Date:   2018 May, 15
+#
+# Creates multi-node, kerberized Hadoop cluster on Docker
+
+FROM sequenceiq/pam:ubuntu-14.04
+MAINTAINER aljoscha
+
+USER root
+
+RUN addgroup hadoop
+RUN useradd -d /home/hdfs -ms /bin/bash -G hadoop -p hdfs hdfs
--- End diff --

will do


> Add end-to-end test for running on YARN with Kerberos
> -
>
> Key: FLINK-8981
> URL: https://issues.apache.org/jira/browse/FLINK-8981
> Project: Flink
>  Issue Type: Sub-task
>  Components: Security, Tests
>Affects Versions: 1.5.0
>Reporter: Till Rohrmann
>Assignee: Aljoscha Krettek
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> We should add an end-to-end test which verifies Flink's integration with 
> Kerberos security. In order to do this, we should start a Kerberos secured 
> Hadoop, ZooKeeper and Kafka cluster. Then we should start a Flink cluster 
> with HA enabled and run a job which reads from and writes to Kafka. We could 
> use a simple pipe job for that purpose which has some state for checkpointing 
> to HDFS.
> See [security docs| 
> https://ci.apache.org/projects/flink/flink-docs-master/ops/security-kerberos.html]
>  for more information about Flink's Kerberos integration.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-8981) Add end-to-end test for running on YARN with Kerberos

2018-07-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-8981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16552438#comment-16552438
 ] 

ASF GitHub Bot commented on FLINK-8981:
---

Github user aljoscha commented on a diff in the pull request:

https://github.com/apache/flink/pull/6377#discussion_r204305783
  
--- Diff: 
flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/Dockerfile ---
@@ -0,0 +1,159 @@

+
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.

+
+#
+# This image is a modified version of Knappek/docker-hadoop-secure
+#   * Knappek/docker-hadoop-secure 

+#
+# With bits and pieces added from Lewuathe/docker-hadoop-cluster to extend it to start a proper kerberized Hadoop cluster:
+#   * Lewuathe/docker-hadoop-cluster 

+#
+# Author: Aljoscha Krettek
+# Date:   2018 May, 15
+#
+# Creates multi-node, kerberized Hadoop cluster on Docker
+
+FROM sequenceiq/pam:ubuntu-14.04
+MAINTAINER aljoscha
+
+USER root
+
+RUN addgroup hadoop
+RUN useradd -d /home/hdfs -ms /bin/bash -G hadoop -p hdfs hdfs
+RUN useradd -d /home/yarn -ms /bin/bash -G hadoop -p yarn yarn
+RUN useradd -d /home/mapred -ms /bin/bash -G hadoop -p mapred mapred
+
+RUN useradd -d /home/hadoop-user -ms /bin/bash -p hadoop-user hadoop-user
+
+# install dev tools
+RUN apt-get update
+RUN apt-get install -y curl tar sudo openssh-server openssh-client rsync unzip
+
+# Kerberos client
+RUN apt-get install krb5-user -y
+RUN mkdir -p /var/log/kerberos
+RUN touch /var/log/kerberos/kadmind.log
+
+# passwordless ssh
+RUN rm -f /etc/ssh/ssh_host_dsa_key /etc/ssh/ssh_host_rsa_key /root/.ssh/id_rsa
+RUN ssh-keygen -q -N "" -t dsa -f /etc/ssh/ssh_host_dsa_key
+RUN ssh-keygen -q -N "" -t rsa -f /etc/ssh/ssh_host_rsa_key
+RUN ssh-keygen -q -N "" -t rsa -f /root/.ssh/id_rsa
+RUN cp /root/.ssh/id_rsa.pub /root/.ssh/authorized_keys
+
+# java
+RUN mkdir -p /usr/java/default && \
+ curl -Ls 'http://download.oracle.com/otn-pub/java/jdk/8u131-b11/d54c1d3a095b4ff2b6607d096fa80163/jdk-8u131-linux-x64.tar.gz' -H 'Cookie: oraclelicense=accept-securebackup-cookie' | \
+ tar --strip-components=1 -xz -C /usr/java/default/
+
+ENV JAVA_HOME /usr/java/default
+ENV PATH $PATH:$JAVA_HOME/bin
+
+RUN curl -LOH 'Cookie: oraclelicense=accept-securebackup-cookie' 'http://download.oracle.com/otn-pub/java/jce/8/jce_policy-8.zip'
+RUN unzip jce_policy-8.zip
+RUN cp /UnlimitedJCEPolicyJDK8/local_policy.jar /UnlimitedJCEPolicyJDK8/US_export_policy.jar $JAVA_HOME/jre/lib/security
+
+ENV HADOOP_VERSION=2.8.4
--- End diff --

and I'm running the nightly tests using the `withoutHadoop` profile


> Add end-to-end test for running on YARN with Kerberos
> -
>
> Key: FLINK-8981
> URL: https://issues.apache.org/jira/browse/FLINK-8981
> Project: Flink
>  Issue Type: Sub-task
>  Components: Security, Tests
>Affects Versions: 1.5.0
>Reporter: Till Rohrmann
>Assignee: Aljoscha Krettek
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> We should add an end-to-end test which verifies Flink's integration with 
> Kerberos security. In order to do this, we should start a Kerberos secured 
> Hadoop, ZooKeeper and Kafka cluster. Then we should start a Flink cluster 
> with HA enabled and run a job which reads from and writes to Kafka. We could 
> use a simple pipe job for that purpose which has some state for checkpointing 
> to HDFS.
> See [security docs| 
> https://ci.apache.org/projects/flink/flink-docs-master/ops/se

[jira] [Commented] (FLINK-8981) Add end-to-end test for running on YARN with Kerberos

2018-07-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-8981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16552437#comment-16552437
 ] 

ASF GitHub Bot commented on FLINK-8981:
---

Github user aljoscha commented on a diff in the pull request:

https://github.com/apache/flink/pull/6377#discussion_r204305537
  
--- Diff: 
flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/Dockerfile ---
@@ -0,0 +1,159 @@

+
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.

+
+#
+# This image is a modified version of Knappek/docker-hadoop-secure
+#   * Knappek/docker-hadoop-secure 

+#
+# With bits and pieces added from Lewuathe/docker-hadoop-cluster to extend it to start a proper kerberized Hadoop cluster:
+#   * Lewuathe/docker-hadoop-cluster 

+#
+# Author: Aljoscha Krettek
+# Date:   2018 May, 15
+#
+# Creates multi-node, kerberized Hadoop cluster on Docker
+
+FROM sequenceiq/pam:ubuntu-14.04
+MAINTAINER aljoscha
+
+USER root
+
+RUN addgroup hadoop
+RUN useradd -d /home/hdfs -ms /bin/bash -G hadoop -p hdfs hdfs
+RUN useradd -d /home/yarn -ms /bin/bash -G hadoop -p yarn yarn
+RUN useradd -d /home/mapred -ms /bin/bash -G hadoop -p mapred mapred
+
+RUN useradd -d /home/hadoop-user -ms /bin/bash -p hadoop-user hadoop-user
+
+# install dev tools
+RUN apt-get update
+RUN apt-get install -y curl tar sudo openssh-server openssh-client rsync unzip
+
+# Kerberos client
+RUN apt-get install krb5-user -y
+RUN mkdir -p /var/log/kerberos
+RUN touch /var/log/kerberos/kadmind.log
+
+# passwordless ssh
+RUN rm -f /etc/ssh/ssh_host_dsa_key /etc/ssh/ssh_host_rsa_key /root/.ssh/id_rsa
+RUN ssh-keygen -q -N "" -t dsa -f /etc/ssh/ssh_host_dsa_key
+RUN ssh-keygen -q -N "" -t rsa -f /etc/ssh/ssh_host_rsa_key
+RUN ssh-keygen -q -N "" -t rsa -f /root/.ssh/id_rsa
+RUN cp /root/.ssh/id_rsa.pub /root/.ssh/authorized_keys
+
+# java
+RUN mkdir -p /usr/java/default && \
+ curl -Ls 'http://download.oracle.com/otn-pub/java/jdk/8u131-b11/d54c1d3a095b4ff2b6607d096fa80163/jdk-8u131-linux-x64.tar.gz' -H 'Cookie: oraclelicense=accept-securebackup-cookie' | \
+ tar --strip-components=1 -xz -C /usr/java/default/
+
+ENV JAVA_HOME /usr/java/default
+ENV PATH $PATH:$JAVA_HOME/bin
+
+RUN curl -LOH 'Cookie: oraclelicense=accept-securebackup-cookie' 'http://download.oracle.com/otn-pub/java/jce/8/jce_policy-8.zip'
+RUN unzip jce_policy-8.zip
+RUN cp /UnlimitedJCEPolicyJDK8/local_policy.jar /UnlimitedJCEPolicyJDK8/US_export_policy.jar $JAVA_HOME/jre/lib/security
+
+ENV HADOOP_VERSION=2.8.4
--- End diff --

I added a config option to the Dockerfile


> Add end-to-end test for running on YARN with Kerberos
> -
>
> Key: FLINK-8981
> URL: https://issues.apache.org/jira/browse/FLINK-8981
> Project: Flink
>  Issue Type: Sub-task
>  Components: Security, Tests
>Affects Versions: 1.5.0
>Reporter: Till Rohrmann
>Assignee: Aljoscha Krettek
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> We should add an end-to-end test which verifies Flink's integration with 
> Kerberos security. In order to do this, we should start a Kerberos secured 
> Hadoop, ZooKeeper and Kafka cluster. Then we should start a Flink cluster 
> with HA enabled and run a job which reads from and writes to Kafka. We could 
> use a simple pipe job for that purpose which has some state for checkpointing 
> to HDFS.
> See [security docs| 
> https://ci.apache.org/projects/flink/flink-docs-master/ops/security-kerberos.html]
>  f

[jira] [Commented] (FLINK-8981) Add end-to-end test for running on YARN with Kerberos

2018-07-20 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-8981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16550972#comment-16550972
 ] 

ASF GitHub Bot commented on FLINK-8981:
---

Github user aljoscha commented on the issue:

https://github.com/apache/flink/pull/6377
  
I also ran the new version on `flink-ci`: 
https://travis-ci.org/aljoscha/flink-ci/builds/406269018


> Add end-to-end test for running on YARN with Kerberos
> -
>
> Key: FLINK-8981
> URL: https://issues.apache.org/jira/browse/FLINK-8981
> Project: Flink
>  Issue Type: Sub-task
>  Components: Security, Tests
>Affects Versions: 1.5.0
>Reporter: Till Rohrmann
>Assignee: Aljoscha Krettek
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> We should add an end-to-end test which verifies Flink's integration with 
> Kerberos security. In order to do this, we should start a Kerberos secured 
> Hadoop, ZooKeeper and Kafka cluster. Then we should start a Flink cluster 
> with HA enabled and run a job which reads from and writes to Kafka. We could 
> use a simple pipe job for that purpose which has some state for checkpointing 
> to HDFS.
> See [security docs| 
> https://ci.apache.org/projects/flink/flink-docs-master/ops/security-kerberos.html]
>  for more information about Flink's Kerberos integration.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-8981) Add end-to-end test for running on YARN with Kerberos

2018-07-20 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-8981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16550824#comment-16550824
 ] 

ASF GitHub Bot commented on FLINK-8981:
---

Github user aljoscha commented on the issue:

https://github.com/apache/flink/pull/6377
  
@zentol I addressed most of your comments. I now added a test in there that 
verifies the job fails if we don't set a keytab. I'm not running with different 
Hadoop versions. It might work, but I'm basically setting up a Hadoop cluster in 
Docker and I don't know if this is similar enough (or exactly the same, for my 
purposes) between the versions.
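The "job must fail without a keytab" check described above can be sketched as a generic bash helper. This is a hypothetical sketch, not the actual test code from the PR; the `expect_failure` helper and the use of `false` as a stand-in for the real Flink invocation are assumptions:

```shell
#!/usr/bin/env bash
# Hypothetical sketch: run a command and treat a zero exit code as a
# test failure, since the scenario under test is *expected* to fail.
expect_failure() {
    if "$@"; then
        echo "FAIL: '$*' succeeded but was expected to fail"
        return 1
    else
        echo "OK: '$*' failed as expected"
        return 0
    fi
}

# 'false' stands in for the real "run Flink on YARN without a keytab" command.
expect_failure false   # prints: OK: 'false' failed as expected
```

The same pattern generalizes to any negative end-to-end test: the wrapper inverts the exit code so the surrounding test runner can keep treating non-zero as failure.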

@dawidwys Thanks for the thorough comments, I'll go through them next!


> Add end-to-end test for running on YARN with Kerberos
> -
>
> Key: FLINK-8981
> URL: https://issues.apache.org/jira/browse/FLINK-8981
> Project: Flink
>  Issue Type: Sub-task
>  Components: Security, Tests
>Affects Versions: 1.5.0
>Reporter: Till Rohrmann
>Assignee: Aljoscha Krettek
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> We should add an end-to-end test which verifies Flink's integration with 
> Kerberos security. In order to do this, we should start a Kerberos secured 
> Hadoop, ZooKeeper and Kafka cluster. Then we should start a Flink cluster 
> with HA enabled and run a job which reads from and writes to Kafka. We could 
> use a simple pipe job for that purpose which has some state for checkpointing 
> to HDFS.
> See [security docs| 
> https://ci.apache.org/projects/flink/flink-docs-master/ops/security-kerberos.html]
>  for more information about Flink's Kerberos integration.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-8981) Add end-to-end test for running on YARN with Kerberos

2018-07-20 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-8981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16550806#comment-16550806
 ] 

ASF GitHub Bot commented on FLINK-8981:
---

Github user dawidwys commented on a diff in the pull request:

https://github.com/apache/flink/pull/6377#discussion_r204057806
  
--- Diff: 
flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/docker-compose.yml
 ---
@@ -0,0 +1,87 @@

+
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.

+
+version: '3.5'
+
+networks:
+  docker-hadoop-cluster-network:
+driver: bridge
+name: docker-hadoop-cluster-network
+
+services:
+  kdc:
+container_name: "kdc"
+hostname: kdc.kerberos.com
+image: sequenceiq/kerberos
+networks:
+  - docker-hadoop-cluster-network
+environment:
+  REALM: EXAMPLE.COM
+  DOMAIN_REALM: kdc.kerberos.com
+
+  master:
+image: ${DOCKER_HADOOP_IMAGE_NAME:-flink/docker-hadoop-secure-cluster:latest}
+command: master
+depends_on:
+  - kdc
+ports:
+  - "50070:50070"
+  - "50470:50470"
+  - "8088:8088"
+  - "19888:19888"
+  - "8188:8188"
+container_name: "master"
+hostname: master.docker-hadoop-cluster-network
+networks:
+  - docker-hadoop-cluster-network
+environment:
+  KRB_REALM: EXAMPLE.COM
+  DOMAIN_REALM: kdc.kerberos.com
+
+  slave1:
--- End diff --

Maybe create just one slave and just use `docker-compose scale`? You run 
Flink from within a container anyway, so it could all be automated.
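The single-worker pattern suggested above could look roughly like the following compose fragment. This is a sketch, not the actual compose file from the PR; the `worker` service name is an assumption:

```yaml
# Hypothetical fragment: one generic worker service instead of fixed
# slave1/slave2 services, scaled at startup.
services:
  worker:
    image: ${DOCKER_HADOOP_IMAGE_NAME:-flink/docker-hadoop-secure-cluster:latest}
    command: worker
    depends_on:
      - kdc
      - master
    networks:
      - docker-hadoop-cluster-network
    environment:
      KRB_REALM: EXAMPLE.COM
      DOMAIN_REALM: kdc.kerberos.com
```

It would then be scaled with `docker-compose up -d --scale worker=2`. Note that the fixed `container_name` and `hostname` keys used by the `slave1`/`slave2` services must be dropped, since they cannot be unique across scaled replicas.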


> Add end-to-end test for running on YARN with Kerberos
> -
>
> Key: FLINK-8981
> URL: https://issues.apache.org/jira/browse/FLINK-8981
> Project: Flink
>  Issue Type: Sub-task
>  Components: Security, Tests
>Affects Versions: 1.5.0
>Reporter: Till Rohrmann
>Assignee: Aljoscha Krettek
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> We should add an end-to-end test which verifies Flink's integration with 
> Kerberos security. In order to do this, we should start a Kerberos secured 
> Hadoop, ZooKeeper and Kafka cluster. Then we should start a Flink cluster 
> with HA enabled and run a job which reads from and writes to Kafka. We could 
> use a simple pipe job for that purpose which has some state for checkpointing 
> to HDFS.
> See [security docs| 
> https://ci.apache.org/projects/flink/flink-docs-master/ops/security-kerberos.html]
>  for more information about Flink's Kerberos integration.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-8981) Add end-to-end test for running on YARN with Kerberos

2018-07-20 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-8981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16550805#comment-16550805
 ] 

ASF GitHub Bot commented on FLINK-8981:
---

Github user dawidwys commented on a diff in the pull request:

https://github.com/apache/flink/pull/6377#discussion_r204057939
  
--- Diff: 
flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/docker-compose.yml
 ---
@@ -0,0 +1,87 @@

+
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.

+
+version: '3.5'
+
+networks:
+  docker-hadoop-cluster-network:
+driver: bridge
+name: docker-hadoop-cluster-network
+
+services:
+  kdc:
+container_name: "kdc"
+hostname: kdc.kerberos.com
+image: sequenceiq/kerberos
+networks:
+  - docker-hadoop-cluster-network
+environment:
+  REALM: EXAMPLE.COM
+  DOMAIN_REALM: kdc.kerberos.com
+
+  master:
+image: ${DOCKER_HADOOP_IMAGE_NAME:-flink/docker-hadoop-secure-cluster:latest}
+command: master
+depends_on:
+  - kdc
+ports:
+  - "50070:50070"
--- End diff --

I think we do not need to expose ports to the host. We run the Flink job 
from within the container anyway.


> Add end-to-end test for running on YARN with Kerberos
> -
>
> Key: FLINK-8981
> URL: https://issues.apache.org/jira/browse/FLINK-8981
> Project: Flink
>  Issue Type: Sub-task
>  Components: Security, Tests
>Affects Versions: 1.5.0
>Reporter: Till Rohrmann
>Assignee: Aljoscha Krettek
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> We should add an end-to-end test which verifies Flink's integration with 
> Kerberos security. In order to do this, we should start a Kerberos secured 
> Hadoop, ZooKeeper and Kafka cluster. Then we should start a Flink cluster 
> with HA enabled and run a job which reads from and writes to Kafka. We could 
> use a simple pipe job for that purpose which has some state for checkpointing 
> to HDFS.
> See [security docs| 
> https://ci.apache.org/projects/flink/flink-docs-master/ops/security-kerberos.html]
>  for more information about Flink's Kerberos integration.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-8981) Add end-to-end test for running on YARN with Kerberos

2018-07-20 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-8981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16550802#comment-16550802
 ] 

ASF GitHub Bot commented on FLINK-8981:
---

Github user dawidwys commented on a diff in the pull request:

https://github.com/apache/flink/pull/6377#discussion_r204057245
  
--- Diff: 
flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/bootstrap.sh 
---
@@ -0,0 +1,121 @@
+#!/bin/bash

+
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.

+
+
+: ${HADOOP_PREFIX:=/usr/local/hadoop}
+
+$HADOOP_PREFIX/etc/hadoop/hadoop-env.sh
+
+rm /tmp/*.pid
+
+# installing libraries if any - (resource urls added comma separated to the ACP system variable)
+cd $HADOOP_PREFIX/share/hadoop/common ; for cp in ${ACP//,/ }; do  echo == $cp; curl -LO $cp ; done; cd -
+
+# kerberos client
+sed -i "s/EXAMPLE.COM/${KRB_REALM}/g" /etc/krb5.conf
+sed -i "s/example.com/${DOMAIN_REALM}/g" /etc/krb5.conf
+
+# update config files
+sed -i "s/HOSTNAME/$(hostname -f)/g" $HADOOP_PREFIX/etc/hadoop/core-site.xml
+sed -i "s/EXAMPLE.COM/${KRB_REALM}/g" $HADOOP_PREFIX/etc/hadoop/core-site.xml
+sed -i "s#/etc/security/keytabs#${KEYTAB_DIR}#g" $HADOOP_PREFIX/etc/hadoop/core-site.xml
+
+sed -i "s/EXAMPLE.COM/${KRB_REALM}/g" $HADOOP_PREFIX/etc/hadoop/hdfs-site.xml
+sed -i "s/HOSTNAME/$(hostname -f)/g" $HADOOP_PREFIX/etc/hadoop/hdfs-site.xml
+sed -i "s#/etc/security/keytabs#${KEYTAB_DIR}#g" $HADOOP_PREFIX/etc/hadoop/hdfs-site.xml
+
+sed -i "s/EXAMPLE.COM/${KRB_REALM}/g" $HADOOP_PREFIX/etc/hadoop/yarn-site.xml
+sed -i "s/HOSTNAME/$(hostname -f)/g" $HADOOP_PREFIX/etc/hadoop/yarn-site.xml
+sed -i "s#/etc/security/keytabs#${KEYTAB_DIR}#g" $HADOOP_PREFIX/etc/hadoop/yarn-site.xml
+
+sed -i "s/EXAMPLE.COM/${KRB_REALM}/g" $HADOOP_PREFIX/etc/hadoop/mapred-site.xml
+sed -i "s/HOSTNAME/$(hostname -f)/g" $HADOOP_PREFIX/etc/hadoop/mapred-site.xml
+sed -i "s#/etc/security/keytabs#${KEYTAB_DIR}#g" $HADOOP_PREFIX/etc/hadoop/mapred-site.xml
+
+sed -i "s#/usr/local/hadoop/bin/container-executor#${NM_CONTAINER_EXECUTOR_PATH}#g" $HADOOP_PREFIX/etc/hadoop/yarn-site.xml
+
+# create namenode kerberos principal and keytab
+kadmin -p ${KERBEROS_ADMIN} -w ${KERBEROS_ADMIN_PASSWORD} -q "addprinc -randkey hdfs/$(hostname -f)@${KRB_REALM}"
+kadmin -p ${KERBEROS_ADMIN} -w ${KERBEROS_ADMIN_PASSWORD} -q "addprinc -randkey mapred/$(hostname -f)@${KRB_REALM}"
+kadmin -p ${KERBEROS_ADMIN} -w ${KERBEROS_ADMIN_PASSWORD} -q "addprinc -randkey yarn/$(hostname -f)@${KRB_REALM}"
+kadmin -p ${KERBEROS_ADMIN} -w ${KERBEROS_ADMIN_PASSWORD} -q "addprinc -randkey HTTP/$(hostname -f)@${KRB_REALM}"
+
+kadmin -p ${KERBEROS_ADMIN} -w ${KERBEROS_ADMIN_PASSWORD} -q "xst -k hdfs.keytab hdfs/$(hostname -f) HTTP/$(hostname -f)"
+kadmin -p ${KERBEROS_ADMIN} -w ${KERBEROS_ADMIN_PASSWORD} -q "xst -k mapred.keytab mapred/$(hostname -f) HTTP/$(hostname -f)"
+kadmin -p ${KERBEROS_ADMIN} -w ${KERBEROS_ADMIN_PASSWORD} -q "xst -k yarn.keytab yarn/$(hostname -f) HTTP/$(hostname -f)"
+
+mkdir -p ${KEYTAB_DIR}
+mv hdfs.keytab ${KEYTAB_DIR}
+mv mapred.keytab ${KEYTAB_DIR}
+mv yarn.keytab ${KEYTAB_DIR}
+chmod 400 ${KEYTAB_DIR}/hdfs.keytab
+chmod 400 ${KEYTAB_DIR}/mapred.keytab
+chmod 400 ${KEYTAB_DIR}/yarn.keytab
+chown hdfs:hadoop ${KEYTAB_DIR}/hdfs.keytab
+chown mapred:hadoop ${KEYTAB_DIR}/mapred.keytab
+chown yarn:hadoop ${KEYTAB_DIR}/yarn.keytab
+
+service ssh start
+
+if [ "$1" == "--help" -o "$1" == "-h" ]; then
+echo "Usage: $(basename $0) (master|worker)"
+exit 0
+elif [ "$1" == "master" ]; then
+yes| sudo -E -u hdfs $HADOOP_PREFIX/bin/hdfs namenode -format
+
+nohup sudo -E -u hdfs $HADOOP_PREFIX/bin/hdfs namenode 2>> /var/log/hadoo
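The templating approach in the script above — placeholder tokens checked into the Hadoop `*-site.xml` files and rewritten by `sed -i` at container start — can be demonstrated in isolation. The file contents and replacement values below are illustrative, not taken from the PR:

```shell
#!/usr/bin/env bash
# Minimal, self-contained demo of the sed-based config templating used
# in bootstrap.sh: substitute placeholder tokens with runtime values.
KRB_REALM="TEST.LOCAL"
conf=$(mktemp)

cat > "$conf" <<'EOF'
<property><name>realm</name><value>EXAMPLE.COM</value></property>
<property><name>keytabs</name><value>/etc/security/keytabs</value></property>
EOF

# The same two substitutions bootstrap.sh applies; '#' is used as the sed
# delimiter where the replacement text itself contains '/'.
sed -i "s/EXAMPLE.COM/${KRB_REALM}/g" "$conf"
sed -i "s#/etc/security/keytabs#/keytabs#g" "$conf"

grep realm "$conf"
# prints: <property><name>realm</name><value>TEST.LOCAL</value></property>
```

Using `#` as the delimiter for path substitutions is what lets the script avoid escaping every `/` in `${KEYTAB_DIR}`.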

[jira] [Commented] (FLINK-8981) Add end-to-end test for running on YARN with Kerberos

2018-07-20 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-8981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16550701#comment-16550701
 ] 

ASF GitHub Bot commented on FLINK-8981:
---

Github user dawidwys commented on a diff in the pull request:

https://github.com/apache/flink/pull/6377#discussion_r204020995
  
--- Diff: 
flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/bootstrap.sh 
---
@@ -0,0 +1,121 @@
+#!/bin/bash

+
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.

+
+
+: ${HADOOP_PREFIX:=/usr/local/hadoop}
+
+$HADOOP_PREFIX/etc/hadoop/hadoop-env.sh
+
+rm /tmp/*.pid
+
+# installing libraries if any - (resource urls added comma separated to the ACP system variable)
+cd $HADOOP_PREFIX/share/hadoop/common ; for cp in ${ACP//,/ }; do  echo == $cp; curl -LO $cp ; done; cd -
+
+# kerberos client
+sed -i "s/EXAMPLE.COM/${KRB_REALM}/g" /etc/krb5.conf
+sed -i "s/example.com/${DOMAIN_REALM}/g" /etc/krb5.conf
+
+# update config files
+sed -i "s/HOSTNAME/$(hostname -f)/g" $HADOOP_PREFIX/etc/hadoop/core-site.xml
+sed -i "s/EXAMPLE.COM/${KRB_REALM}/g" $HADOOP_PREFIX/etc/hadoop/core-site.xml
+sed -i "s#/etc/security/keytabs#${KEYTAB_DIR}#g" $HADOOP_PREFIX/etc/hadoop/core-site.xml
+
+sed -i "s/EXAMPLE.COM/${KRB_REALM}/g" $HADOOP_PREFIX/etc/hadoop/hdfs-site.xml
+sed -i "s/HOSTNAME/$(hostname -f)/g" $HADOOP_PREFIX/etc/hadoop/hdfs-site.xml
+sed -i "s#/etc/security/keytabs#${KEYTAB_DIR}#g" $HADOOP_PREFIX/etc/hadoop/hdfs-site.xml
+
+sed -i "s/EXAMPLE.COM/${KRB_REALM}/g" $HADOOP_PREFIX/etc/hadoop/yarn-site.xml
+sed -i "s/HOSTNAME/$(hostname -f)/g" $HADOOP_PREFIX/etc/hadoop/yarn-site.xml
+sed -i "s#/etc/security/keytabs#${KEYTAB_DIR}#g" $HADOOP_PREFIX/etc/hadoop/yarn-site.xml
+
+sed -i "s/EXAMPLE.COM/${KRB_REALM}/g" $HADOOP_PREFIX/etc/hadoop/mapred-site.xml
+sed -i "s/HOSTNAME/$(hostname -f)/g" $HADOOP_PREFIX/etc/hadoop/mapred-site.xml
+sed -i "s#/etc/security/keytabs#${KEYTAB_DIR}#g" $HADOOP_PREFIX/etc/hadoop/mapred-site.xml
+
+sed -i "s#/usr/local/hadoop/bin/container-executor#${NM_CONTAINER_EXECUTOR_PATH}#g" $HADOOP_PREFIX/etc/hadoop/yarn-site.xml
+
+# create namenode kerberos principal and keytab
+kadmin -p ${KERBEROS_ADMIN} -w ${KERBEROS_ADMIN_PASSWORD} -q "addprinc -randkey hdfs/$(hostname -f)@${KRB_REALM}"
+kadmin -p ${KERBEROS_ADMIN} -w ${KERBEROS_ADMIN_PASSWORD} -q "addprinc -randkey mapred/$(hostname -f)@${KRB_REALM}"
+kadmin -p ${KERBEROS_ADMIN} -w ${KERBEROS_ADMIN_PASSWORD} -q "addprinc -randkey yarn/$(hostname -f)@${KRB_REALM}"
+kadmin -p ${KERBEROS_ADMIN} -w ${KERBEROS_ADMIN_PASSWORD} -q "addprinc -randkey HTTP/$(hostname -f)@${KRB_REALM}"
+
+kadmin -p ${KERBEROS_ADMIN} -w ${KERBEROS_ADMIN_PASSWORD} -q "xst -k hdfs.keytab hdfs/$(hostname -f) HTTP/$(hostname -f)"
+kadmin -p ${KERBEROS_ADMIN} -w ${KERBEROS_ADMIN_PASSWORD} -q "xst -k mapred.keytab mapred/$(hostname -f) HTTP/$(hostname -f)"
+kadmin -p ${KERBEROS_ADMIN} -w ${KERBEROS_ADMIN_PASSWORD} -q "xst -k yarn.keytab yarn/$(hostname -f) HTTP/$(hostname -f)"
+
+mkdir -p ${KEYTAB_DIR}
+mv hdfs.keytab ${KEYTAB_DIR}
+mv mapred.keytab ${KEYTAB_DIR}
+mv yarn.keytab ${KEYTAB_DIR}
+chmod 400 ${KEYTAB_DIR}/hdfs.keytab
+chmod 400 ${KEYTAB_DIR}/mapred.keytab
+chmod 400 ${KEYTAB_DIR}/yarn.keytab
+chown hdfs:hadoop ${KEYTAB_DIR}/hdfs.keytab
+chown mapred:hadoop ${KEYTAB_DIR}/mapred.keytab
+chown yarn:hadoop ${KEYTAB_DIR}/yarn.keytab
+
+service ssh start
--- End diff --

Can we just make ssh start automatically in the Dockerfile?


> Add end-to-end test for running on YARN with Kerberos
> -
>
> Key: FLINK-8981
> URL: https://issues.apache.org/jira/browse/FLINK-8981
> Pro
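The repeated kadmin stanzas quoted above follow one pattern per service principal, which could be written as a loop. This is a sketch of the structure only: `kadmin` is replaced by a stub that records the queries it would run (no KDC is available here), the admin credentials are omitted, and the real script additionally creates a standalone `HTTP` principal:

```shell
#!/usr/bin/env bash
# Sketch: the per-principal pattern from bootstrap.sh written as a loop,
# with kadmin stubbed so this runs without a KDC.
KADMIN_LOG=$(mktemp)
kadmin() { echo "$@" >> "$KADMIN_LOG"; }   # stub: record instead of execute

# Fall back to a fixed name if 'hostname -f' is unavailable.
host=$(hostname -f 2>/dev/null || echo localhost)
KRB_REALM="EXAMPLE.COM"

for svc in hdfs mapred yarn; do
    kadmin -q "addprinc -randkey ${svc}/${host}@${KRB_REALM}"
    kadmin -q "xst -k ${svc}.keytab ${svc}/${host} HTTP/${host}"
done

wc -l < "$KADMIN_LOG"   # prints 6: one line per recorded kadmin call
```

The `chmod 400` / `chown ${svc}:hadoop` lines that follow in the real script would fold into the same loop body.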

[jira] [Commented] (FLINK-8981) Add end-to-end test for running on YARN with Kerberos

2018-07-20 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-8981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16550699#comment-16550699
 ] 

ASF GitHub Bot commented on FLINK-8981:
---

Github user dawidwys commented on a diff in the pull request:

https://github.com/apache/flink/pull/6377#discussion_r204020749
  
--- Diff: 
flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/README.md ---
@@ -0,0 +1,118 @@
+# Apache Hadoop Docker image with Kerberos enabled
+
+This image is a modified version of Knappek/docker-hadoop-secure
+ * Knappek/docker-hadoop-secure 

+
+With bits and pieces added from Lewuathe/docker-hadoop-cluster to extend it to start a proper kerberized Hadoop cluster:
+ * Lewuathe/docker-hadoop-cluster 

+
+And a lot of added stuff for making this an actual, properly configured, kerberized cluster with proper user/permissions structure.
+
+Versions
+
+
+* JDK8
+* Hadoop 2.8.3
+
+Default Environment Variables
+-
+
+| Name | Value | Description |
+|  |   |  |
+| `KRB_REALM` | `EXAMPLE.COM` | The Kerberos Realm, more information [here](https://web.mit.edu/kerberos/krb5-1.12/doc/admin/conf_files/krb5_conf.html#) |
+| `DOMAIN_REALM` | `example.com` | The Kerberos Domain Realm, more information [here](https://web.mit.edu/kerberos/krb5-1.12/doc/admin/conf_files/krb5_conf.html#) |
+| `KERBEROS_ADMIN` | `admin/admin` | The KDC admin user |
+| `KERBEROS_ADMIN_PASSWORD` | `admin` | The KDC admin password |
+
+You can simply define these variables in the `docker-compose.yml`.
+
+Run image
+-
+
+Clone the [Github project](https://github.com/aljoscha/docker-hadoop-secure-cluster) and run
+
+```
+docker-compose up
+```
+
+Usage
+-
+
+Get the container name with `docker ps` and login to the container with
+
+```
+docker exec -it <container-name> /bin/bash
+```
+
+
+To obtain a Kerberos ticket, execute
+
+```
+kinit -kt /home/hadoop-user/hadoop-user.keytab hadoop-user
+```
+
+Afterwards you can use `hdfs` CLI like
+
+```
+hdfs dfs -ls /
+```
+
+
+Known issues
+
+
+### Unable to obtain Kerberos password
+
+ Error
+docker-compose up fails for the first time with the error
+
+```
+Login failure for nn/hadoop.docker@example.com from keytab /etc/security/keytabs/nn.service.keytab: javax.security.auth.login.LoginException: Unable to obtain password from user
+```
+
+ Solution
+
+Stop the containers with `docker-compose down` and start again with `docker-compose up -d`.
+
+
+### JDK 8
+
+Make sure you download a JDK version that is still available. Old versions can be deprecated by Oracle and thus the download link won't work anymore.
+
+Get the latest JDK8 Download URL with
+
+```
+curl -s https://lv.binarybabel.org/catalog-api/java/jdk8.json
+```
+
+### Java Keystore
+
+If the Keystroe has been expired, then create a new `keystore.jks`:
--- End diff --

Keystroe -> Keystore

Won't it be a problem in tests? Will the test start failing one day because 
the keystore expired?


> Add end-to-end test for running on YARN with Kerberos
> -
>
> Key: FLINK-8981
> URL: https://issues.apache.org/jira/browse/FLINK-8981
> Project: Flink
>  Issue Type: Sub-task
>  Components: Security, Tests
>Affects Versions: 1.5.0
>Reporter: Till Rohrmann
>Assignee: Aljoscha Krettek
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> We should add an end-to-end test which verifies Flink's integration with 
> Kerberos security. In order to do this, we should start a Kerberos secured 
> Hadoop, ZooKeeper and Kafka cluster. Then we should start a Flink cluster 
> with HA enabled and run a job which reads from and writes to Kafka. We could 
> use a simple pipe job for that purpose which has some state for checkpointing 
> to HDFS.
> See [security docs| 
> https://ci.apache.org/projects/flink/flink-docs-master/ops/security-kerberos.html]
>  for more information about Flink's Kerberos integration.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-8981) Add end-to-end test for running on YARN with Kerberos

2018-07-20 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-8981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16550693#comment-16550693
 ] 

ASF GitHub Bot commented on FLINK-8981:
---

Github user dawidwys commented on a diff in the pull request:

https://github.com/apache/flink/pull/6377#discussion_r204019505
  
--- Diff: 
flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/Dockerfile ---
@@ -0,0 +1,159 @@

+
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.

+
+#
+# This image is modified version of Knappek/docker-hadoop-secure
+#   * Knappek/docker-hadoop-secure 

+#
+# With bits and pieces added from Lewuathe/docker-hadoop-cluster to extend 
it to start a proper kerberized Hadoop cluster:
+#   * Lewuathe/docker-hadoop-cluster 

+#
+# Author: Aljoscha Krettek
+# Date:   2018 May, 15
+#
+# Creates multi-node, kerberized Hadoop cluster on Docker
+
+FROM sequenceiq/pam:ubuntu-14.04
+MAINTAINER aljoscha
+
+USER root
--- End diff --

I think commands in a Dockerfile are executed as root by default, so this 
command is unnecessary.


> Add end-to-end test for running on YARN with Kerberos
> -
>
> Key: FLINK-8981
> URL: https://issues.apache.org/jira/browse/FLINK-8981
> Project: Flink
>  Issue Type: Sub-task
>  Components: Security, Tests
>Affects Versions: 1.5.0
>Reporter: Till Rohrmann
>Assignee: Aljoscha Krettek
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> We should add an end-to-end test which verifies Flink's integration with 
> Kerberos security. In order to do this, we should start a Kerberos secured 
> Hadoop, ZooKeeper and Kafka cluster. Then we should start a Flink cluster 
> with HA enabled and run a job which reads from and writes to Kafka. We could 
> use a simple pipe job for that purpose which has some state for checkpointing 
> to HDFS.
> See [security docs| 
> https://ci.apache.org/projects/flink/flink-docs-master/ops/security-kerberos.html]
>  for more information about Flink's Kerberos integration.





[jira] [Commented] (FLINK-8981) Add end-to-end test for running on YARN with Kerberos

2018-07-20 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-8981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16550689#comment-16550689
 ] 

ASF GitHub Bot commented on FLINK-8981:
---

Github user dawidwys commented on a diff in the pull request:

https://github.com/apache/flink/pull/6377#discussion_r204018916
  
--- Diff: 
flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/Dockerfile ---
@@ -0,0 +1,159 @@

+
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.

+
+#
+# This image is modified version of Knappek/docker-hadoop-secure
+#   * Knappek/docker-hadoop-secure 

+#
+# With bits and pieces added from Lewuathe/docker-hadoop-cluster to extend 
it to start a proper kerberized Hadoop cluster:
+#   * Lewuathe/docker-hadoop-cluster 

+#
+# Author: Aljoscha Krettek
+# Date:   2018 May, 15
+#
+# Creates multi-node, kerberized Hadoop cluster on Docker
+
+FROM sequenceiq/pam:ubuntu-14.04
+MAINTAINER aljoscha
+
+USER root
+
+RUN addgroup hadoop
+RUN useradd -d /home/hdfs -ms /bin/bash -G hadoop -p hdfs hdfs
+RUN useradd -d /home/yarn -ms /bin/bash -G hadoop -p yarn yarn
+RUN useradd -d /home/mapred -ms /bin/bash -G hadoop -p mapred mapred
+
+RUN useradd -d /home/hadoop-user -ms /bin/bash -p hadoop-user hadoop-user
+
+# install dev tools
+RUN apt-get update
+RUN apt-get install -y curl tar sudo openssh-server openssh-client rsync 
unzip
+
+# Kerberos client
+RUN apt-get install krb5-user -y
+RUN mkdir -p /var/log/kerberos
+RUN touch /var/log/kerberos/kadmind.log
+
+# passwordless ssh
+RUN rm -f /etc/ssh/ssh_host_dsa_key /etc/ssh/ssh_host_rsa_key 
/root/.ssh/id_rsa
+RUN ssh-keygen -q -N "" -t dsa -f /etc/ssh/ssh_host_dsa_key
+RUN ssh-keygen -q -N "" -t rsa -f /etc/ssh/ssh_host_rsa_key
+RUN ssh-keygen -q -N "" -t rsa -f /root/.ssh/id_rsa
+RUN cp /root/.ssh/id_rsa.pub /root/.ssh/authorized_keys
+
+# java
+RUN mkdir -p /usr/java/default && \
+ curl -Ls 
'http://download.oracle.com/otn-pub/java/jdk/8u131-b11/d54c1d3a095b4ff2b6607d096fa80163/jdk-8u131-linux-x64.tar.gz'
 -H 'Cookie: oraclelicense=accept-securebackup-cookie' | \
+ tar --strip-components=1 -xz -C /usr/java/default/
+
+ENV JAVA_HOME /usr/java/default
+ENV PATH $PATH:$JAVA_HOME/bin
+
+RUN curl -LOH 'Cookie: oraclelicense=accept-securebackup-cookie' 
'http://download.oracle.com/otn-pub/java/jce/8/jce_policy-8.zip'
+RUN unzip jce_policy-8.zip
+RUN cp /UnlimitedJCEPolicyJDK8/local_policy.jar 
/UnlimitedJCEPolicyJDK8/US_export_policy.jar $JAVA_HOME/jre/lib/security
+
+ENV HADOOP_VERSION=2.8.4
+
+# ENV HADOOP_URL 
https://www.apache.org/dist/hadoop/common/hadoop-$HADOOP_VERSION/hadoop-$HADOOP_VERSION.tar.gz
+ENV HADOOP_URL 
http://archive.apache.org/dist/hadoop/common/hadoop-$HADOOP_VERSION/hadoop-$HADOOP_VERSION.tar.gz
+RUN set -x \
+&& curl -fSL "$HADOOP_URL" -o /tmp/hadoop.tar.gz \
+&& tar -xf /tmp/hadoop.tar.gz -C /usr/local/ \
+&& rm /tmp/hadoop.tar.gz*
+
+WORKDIR /usr/local
+RUN ln -s /usr/local/hadoop-${HADOOP_VERSION} /usr/local/hadoop
+RUN chown root:root -R /usr/local/hadoop-${HADOOP_VERSION}/
+RUN chown root:root -R /usr/local/hadoop/
+RUN chown root:yarn /usr/local/hadoop/bin/container-executor
+RUN chmod 6050 /usr/local/hadoop/bin/container-executor
+RUN mkdir -p /hadoop-data/nm-local-dirs
+RUN mkdir -p /hadoop-data/nm-log-dirs
+RUN chown yarn:yarn /hadoop-data
+RUN chown yarn:yarn /hadoop-data/nm-local-dirs
+RUN chown yarn:yarn /hadoop-data/nm-log-dirs
+RUN chmod 755 /hadoop-data
+RUN chmod 755 /hadoop-data/nm-local-dirs
+RUN chmod 755 /hadoop-data/nm-log-dirs
+
+
+ENV HAD

[jira] [Commented] (FLINK-8981) Add end-to-end test for running on YARN with Kerberos

2018-07-20 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-8981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16550681#comment-16550681
 ] 

ASF GitHub Bot commented on FLINK-8981:
---

Github user dawidwys commented on a diff in the pull request:

https://github.com/apache/flink/pull/6377#discussion_r204017957
  
--- Diff: 
flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/Dockerfile ---
@@ -0,0 +1,159 @@

+
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.

+
+#
+# This image is modified version of Knappek/docker-hadoop-secure
+#   * Knappek/docker-hadoop-secure 

+#
+# With bits and pieces added from Lewuathe/docker-hadoop-cluster to extend 
it to start a proper kerberized Hadoop cluster:
+#   * Lewuathe/docker-hadoop-cluster 

+#
+# Author: Aljoscha Krettek
+# Date:   2018 May, 15
+#
+# Creates multi-node, kerberized Hadoop cluster on Docker
+
+FROM sequenceiq/pam:ubuntu-14.04
+MAINTAINER aljoscha
+
+USER root
+
+RUN addgroup hadoop
+RUN useradd -d /home/hdfs -ms /bin/bash -G hadoop -p hdfs hdfs
+RUN useradd -d /home/yarn -ms /bin/bash -G hadoop -p yarn yarn
+RUN useradd -d /home/mapred -ms /bin/bash -G hadoop -p mapred mapred
+
+RUN useradd -d /home/hadoop-user -ms /bin/bash -p hadoop-user hadoop-user
+
+# install dev tools
+RUN apt-get update
+RUN apt-get install -y curl tar sudo openssh-server openssh-client rsync 
unzip
+
+# Kerberos client
+RUN apt-get install krb5-user -y
+RUN mkdir -p /var/log/kerberos
+RUN touch /var/log/kerberos/kadmind.log
+
+# passwordless ssh
+RUN rm -f /etc/ssh/ssh_host_dsa_key /etc/ssh/ssh_host_rsa_key 
/root/.ssh/id_rsa
+RUN ssh-keygen -q -N "" -t dsa -f /etc/ssh/ssh_host_dsa_key
+RUN ssh-keygen -q -N "" -t rsa -f /etc/ssh/ssh_host_rsa_key
+RUN ssh-keygen -q -N "" -t rsa -f /root/.ssh/id_rsa
+RUN cp /root/.ssh/id_rsa.pub /root/.ssh/authorized_keys
+
+# java
+RUN mkdir -p /usr/java/default && \
+ curl -Ls 
'http://download.oracle.com/otn-pub/java/jdk/8u131-b11/d54c1d3a095b4ff2b6607d096fa80163/jdk-8u131-linux-x64.tar.gz'
 -H 'Cookie: oraclelicense=accept-securebackup-cookie' | \
+ tar --strip-components=1 -xz -C /usr/java/default/
+
+ENV JAVA_HOME /usr/java/default
+ENV PATH $PATH:$JAVA_HOME/bin
+
+RUN curl -LOH 'Cookie: oraclelicense=accept-securebackup-cookie' 
'http://download.oracle.com/otn-pub/java/jce/8/jce_policy-8.zip'
+RUN unzip jce_policy-8.zip
+RUN cp /UnlimitedJCEPolicyJDK8/local_policy.jar 
/UnlimitedJCEPolicyJDK8/US_export_policy.jar $JAVA_HOME/jre/lib/security
+
+ENV HADOOP_VERSION=2.8.4
+
+# ENV HADOOP_URL 
https://www.apache.org/dist/hadoop/common/hadoop-$HADOOP_VERSION/hadoop-$HADOOP_VERSION.tar.gz
+ENV HADOOP_URL 
http://archive.apache.org/dist/hadoop/common/hadoop-$HADOOP_VERSION/hadoop-$HADOOP_VERSION.tar.gz
+RUN set -x \
+&& curl -fSL "$HADOOP_URL" -o /tmp/hadoop.tar.gz \
+&& tar -xf /tmp/hadoop.tar.gz -C /usr/local/ \
+&& rm /tmp/hadoop.tar.gz*
+
+WORKDIR /usr/local
+RUN ln -s /usr/local/hadoop-${HADOOP_VERSION} /usr/local/hadoop
+RUN chown root:root -R /usr/local/hadoop-${HADOOP_VERSION}/
+RUN chown root:root -R /usr/local/hadoop/
+RUN chown root:yarn /usr/local/hadoop/bin/container-executor
+RUN chmod 6050 /usr/local/hadoop/bin/container-executor
+RUN mkdir -p /hadoop-data/nm-local-dirs
+RUN mkdir -p /hadoop-data/nm-log-dirs
+RUN chown yarn:yarn /hadoop-data
+RUN chown yarn:yarn /hadoop-data/nm-local-dirs
+RUN chown yarn:yarn /hadoop-data/nm-log-dirs
+RUN chmod 755 /hadoop-data
+RUN chmod 755 /hadoop-data/nm-local-dirs
+RUN chmod 755 /hadoop-data/nm-log-dirs
+
+
+ENV HAD

[jira] [Commented] (FLINK-8981) Add end-to-end test for running on YARN with Kerberos

2018-07-20 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-8981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16550680#comment-16550680
 ] 

ASF GitHub Bot commented on FLINK-8981:
---

Github user dawidwys commented on a diff in the pull request:

https://github.com/apache/flink/pull/6377#discussion_r204017611
  
--- Diff: 
flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/Dockerfile ---
@@ -0,0 +1,159 @@

+
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.

+
+#
+# This image is modified version of Knappek/docker-hadoop-secure
+#   * Knappek/docker-hadoop-secure 

+#
+# With bits and pieces added from Lewuathe/docker-hadoop-cluster to extend 
it to start a proper kerberized Hadoop cluster:
+#   * Lewuathe/docker-hadoop-cluster 

+#
+# Author: Aljoscha Krettek
+# Date:   2018 May, 15
+#
+# Creates multi-node, kerberized Hadoop cluster on Docker
+
+FROM sequenceiq/pam:ubuntu-14.04
+MAINTAINER aljoscha
+
+USER root
+
+RUN addgroup hadoop
+RUN useradd -d /home/hdfs -ms /bin/bash -G hadoop -p hdfs hdfs
+RUN useradd -d /home/yarn -ms /bin/bash -G hadoop -p yarn yarn
+RUN useradd -d /home/mapred -ms /bin/bash -G hadoop -p mapred mapred
+
+RUN useradd -d /home/hadoop-user -ms /bin/bash -p hadoop-user hadoop-user
+
+# install dev tools
+RUN apt-get update
+RUN apt-get install -y curl tar sudo openssh-server openssh-client rsync 
unzip
+
+# Kerberos client
+RUN apt-get install krb5-user -y
+RUN mkdir -p /var/log/kerberos
+RUN touch /var/log/kerberos/kadmind.log
+
+# passwordless ssh
+RUN rm -f /etc/ssh/ssh_host_dsa_key /etc/ssh/ssh_host_rsa_key 
/root/.ssh/id_rsa
+RUN ssh-keygen -q -N "" -t dsa -f /etc/ssh/ssh_host_dsa_key
+RUN ssh-keygen -q -N "" -t rsa -f /etc/ssh/ssh_host_rsa_key
+RUN ssh-keygen -q -N "" -t rsa -f /root/.ssh/id_rsa
+RUN cp /root/.ssh/id_rsa.pub /root/.ssh/authorized_keys
+
+# java
+RUN mkdir -p /usr/java/default && \
--- End diff --

Can't we use java image as the base image?


> Add end-to-end test for running on YARN with Kerberos
> -
>
> Key: FLINK-8981
> URL: https://issues.apache.org/jira/browse/FLINK-8981
> Project: Flink
>  Issue Type: Sub-task
>  Components: Security, Tests
>Affects Versions: 1.5.0
>Reporter: Till Rohrmann
>Assignee: Aljoscha Krettek
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> We should add an end-to-end test which verifies Flink's integration with 
> Kerberos security. In order to do this, we should start a Kerberos secured 
> Hadoop, ZooKeeper and Kafka cluster. Then we should start a Flink cluster 
> with HA enabled and run a job which reads from and writes to Kafka. We could 
> use a simple pipe job for that purpose which has some state for checkpointing 
> to HDFS.
> See [security docs| 
> https://ci.apache.org/projects/flink/flink-docs-master/ops/security-kerberos.html]
>  for more information about Flink's Kerberos integration.





[jira] [Commented] (FLINK-8981) Add end-to-end test for running on YARN with Kerberos

2018-07-20 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-8981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16550679#comment-16550679
 ] 

ASF GitHub Bot commented on FLINK-8981:
---

Github user dawidwys commented on a diff in the pull request:

https://github.com/apache/flink/pull/6377#discussion_r204017355
  
--- Diff: 
flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/Dockerfile ---
@@ -0,0 +1,159 @@

+
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.

+
+#
+# This image is modified version of Knappek/docker-hadoop-secure
+#   * Knappek/docker-hadoop-secure 

+#
+# With bits and pieces added from Lewuathe/docker-hadoop-cluster to extend 
it to start a proper kerberized Hadoop cluster:
+#   * Lewuathe/docker-hadoop-cluster 

+#
+# Author: Aljoscha Krettek
+# Date:   2018 May, 15
+#
+# Creates multi-node, kerberized Hadoop cluster on Docker
+
+FROM sequenceiq/pam:ubuntu-14.04
+MAINTAINER aljoscha
+
+USER root
+
+RUN addgroup hadoop
+RUN useradd -d /home/hdfs -ms /bin/bash -G hadoop -p hdfs hdfs
+RUN useradd -d /home/yarn -ms /bin/bash -G hadoop -p yarn yarn
+RUN useradd -d /home/mapred -ms /bin/bash -G hadoop -p mapred mapred
+
+RUN useradd -d /home/hadoop-user -ms /bin/bash -p hadoop-user hadoop-user
+
+# install dev tools
+RUN apt-get update
--- End diff --

This is a Dockerfile anti-pattern that leads to some caching issues:

https://docs.docker.com/develop/develop-images/dockerfile_best-practices/#run
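The recommended pattern combines the update and install in one layer, so Docker can never cache a stale package index relative to the install step; roughly (a sketch based on the packages installed in this Dockerfile):

```dockerfile
# Combine update + install in a single RUN so the cached apt index can
# never go stale, and remove the lists to keep the layer small.
RUN apt-get update && apt-get install -y \
        curl tar sudo openssh-server openssh-client rsync unzip krb5-user \
    && rm -rf /var/lib/apt/lists/*
```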


> Add end-to-end test for running on YARN with Kerberos
> -
>
> Key: FLINK-8981
> URL: https://issues.apache.org/jira/browse/FLINK-8981
> Project: Flink
>  Issue Type: Sub-task
>  Components: Security, Tests
>Affects Versions: 1.5.0
>Reporter: Till Rohrmann
>Assignee: Aljoscha Krettek
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> We should add an end-to-end test which verifies Flink's integration with 
> Kerberos security. In order to do this, we should start a Kerberos secured 
> Hadoop, ZooKeeper and Kafka cluster. Then we should start a Flink cluster 
> with HA enabled and run a job which reads from and writes to Kafka. We could 
> use a simple pipe job for that purpose which has some state for checkpointing 
> to HDFS.
> See [security docs| 
> https://ci.apache.org/projects/flink/flink-docs-master/ops/security-kerberos.html]
>  for more information about Flink's Kerberos integration.





[jira] [Commented] (FLINK-8981) Add end-to-end test for running on YARN with Kerberos

2018-07-20 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-8981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16550612#comment-16550612
 ] 

ASF GitHub Bot commented on FLINK-8981:
---

Github user dawidwys commented on a diff in the pull request:

https://github.com/apache/flink/pull/6377#discussion_r203983391
  
--- Diff: flink-end-to-end-tests/test-scripts/test_yarn_kerberos_docker.sh 
---
@@ -0,0 +1,104 @@
+#!/usr/bin/env bash

+
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.

+
+set -o pipefail
+
+source "$(dirname "$0")"/common.sh
+
+FLINK_TARBALL_DIR=$TEST_DATA_DIR
+FLINK_TARBALL=flink.tar.gz
+FLINK_DIRNAME=$(basename $FLINK_DIR)
+
+echo "Flink Tarball directory $FLINK_TARBALL_DIR"
+echo "Flink tarball filename $FLINK_TARBALL"
+echo "Flink distribution directory name $FLINK_DIRNAME"
+echo "End-to-end directory $END_TO_END_DIR"
+docker --version
+docker-compose --version
+
+mkdir -p $FLINK_TARBALL_DIR
+tar czf $FLINK_TARBALL_DIR/$FLINK_TARBALL -C $(dirname $FLINK_DIR) .
+
+echo "Building Hadoop Docker container"
+until docker build -f 
$END_TO_END_DIR/test-scripts/docker-hadoop-secure-cluster/Dockerfile -t 
flink/docker-hadoop-secure-cluster:latest 
$END_TO_END_DIR/test-scripts/docker-hadoop-secure-cluster/; do
+# with all the downloading and ubuntu updating a lot of flakiness can 
happen, make sure
+# we don't immediately fail
+echo "Something went wrong while building the Docker image, retrying 
..."
+sleep 2
+done
+
+echo "Starting Hadoop cluster"
+docker-compose -f 
$END_TO_END_DIR/test-scripts/docker-hadoop-secure-cluster/docker-compose.yml up 
-d
+
+# make sure we stop our cluster at the end
+function cluster_shutdown {
+  # don't call ourselves again for another signal interruption
+  trap "exit -1" INT
+  # don't call ourselves again for normal exit
+  trap "" EXIT
+
+  docker-compose -f 
$END_TO_END_DIR/test-scripts/docker-hadoop-secure-cluster/docker-compose.yml 
down
+  rm $FLINK_TARBALL_DIR/$FLINK_TARBALL
+}
+trap cluster_shutdown INT
+trap cluster_shutdown EXIT
+
+until docker cp $FLINK_TARBALL_DIR/$FLINK_TARBALL 
master:/home/hadoop-user/; do
--- End diff --

Can't we set it up during image build?


> Add end-to-end test for running on YARN with Kerberos
> -
>
> Key: FLINK-8981
> URL: https://issues.apache.org/jira/browse/FLINK-8981
> Project: Flink
>  Issue Type: Sub-task
>  Components: Security, Tests
>Affects Versions: 1.5.0
>Reporter: Till Rohrmann
>Assignee: Aljoscha Krettek
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> We should add an end-to-end test which verifies Flink's integration with 
> Kerberos security. In order to do this, we should start a Kerberos secured 
> Hadoop, ZooKeeper and Kafka cluster. Then we should start a Flink cluster 
> with HA enabled and run a job which reads from and writes to Kafka. We could 
> use a simple pipe job for that purpose which has some state for checkpointing 
> to HDFS.
> See [security docs| 
> https://ci.apache.org/projects/flink/flink-docs-master/ops/security-kerberos.html]
>  for more information about Flink's Kerberos integration.





[jira] [Commented] (FLINK-8981) Add end-to-end test for running on YARN with Kerberos

2018-07-20 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-8981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16550611#comment-16550611
 ] 

ASF GitHub Bot commented on FLINK-8981:
---

Github user dawidwys commented on a diff in the pull request:

https://github.com/apache/flink/pull/6377#discussion_r203981196
  
--- Diff: 
flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/Dockerfile ---
@@ -0,0 +1,159 @@

+
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.

+
+#
+# This image is modified version of Knappek/docker-hadoop-secure
+#   * Knappek/docker-hadoop-secure 

+#
+# With bits and pieces added from Lewuathe/docker-hadoop-cluster to extend 
it to start a proper kerberized Hadoop cluster:
+#   * Lewuathe/docker-hadoop-cluster 

+#
+# Author: Aljoscha Krettek
+# Date:   2018 May, 15
+#
+# Creates multi-node, kerberized Hadoop cluster on Docker
+
+FROM sequenceiq/pam:ubuntu-14.04
+MAINTAINER aljoscha
+
+USER root
+
+RUN addgroup hadoop
+RUN useradd -d /home/hdfs -ms /bin/bash -G hadoop -p hdfs hdfs
--- End diff --

Could we merge such blocks into a single command? It would create fewer layers, 
which should decrease both build time and image size.
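Merged into one layer, the user setup from the quoted diff could look like this (a sketch):

```dockerfile
# Create the group and all Hadoop service users in one layer instead of five.
RUN addgroup hadoop \
    && useradd -d /home/hdfs -ms /bin/bash -G hadoop -p hdfs hdfs \
    && useradd -d /home/yarn -ms /bin/bash -G hadoop -p yarn yarn \
    && useradd -d /home/mapred -ms /bin/bash -G hadoop -p mapred mapred \
    && useradd -d /home/hadoop-user -ms /bin/bash -p hadoop-user hadoop-user
```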


> Add end-to-end test for running on YARN with Kerberos
> -
>
> Key: FLINK-8981
> URL: https://issues.apache.org/jira/browse/FLINK-8981
> Project: Flink
>  Issue Type: Sub-task
>  Components: Security, Tests
>Affects Versions: 1.5.0
>Reporter: Till Rohrmann
>Assignee: Aljoscha Krettek
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> We should add an end-to-end test which verifies Flink's integration with 
> Kerberos security. In order to do this, we should start a Kerberos secured 
> Hadoop, ZooKeeper and Kafka cluster. Then we should start a Flink cluster 
> with HA enabled and run a job which reads from and writes to Kafka. We could 
> use a simple pipe job for that purpose which has some state for checkpointing 
> to HDFS.
> See [security docs| 
> https://ci.apache.org/projects/flink/flink-docs-master/ops/security-kerberos.html]
>  for more information about Flink's Kerberos integration.





[jira] [Commented] (FLINK-8981) Add end-to-end test for running on YARN with Kerberos

2018-07-20 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-8981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16550610#comment-16550610
 ] 

ASF GitHub Bot commented on FLINK-8981:
---

Github user dawidwys commented on a diff in the pull request:

https://github.com/apache/flink/pull/6377#discussion_r203982453
  
--- Diff: 
flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/docker-compose.yml
 ---
@@ -0,0 +1,87 @@

+
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.

+
+version: '3.5'
+
+networks:
+  docker-hadoop-cluster-network:
--- End diff --

Do we need bridged network?
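For context, an explicit user-defined bridge network in a compose v3 file is declared roughly like this (a sketch; Compose also creates a default bridge-driver network for the project when none is declared):

```yaml
# Hypothetical excerpt: a named bridge network and a service attached to it.
networks:
  docker-hadoop-cluster-network:
    driver: bridge
services:
  master:
    networks:
      - docker-hadoop-cluster-network
```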


> Add end-to-end test for running on YARN with Kerberos
> -
>
> Key: FLINK-8981
> URL: https://issues.apache.org/jira/browse/FLINK-8981
> Project: Flink
>  Issue Type: Sub-task
>  Components: Security, Tests
>Affects Versions: 1.5.0
>Reporter: Till Rohrmann
>Assignee: Aljoscha Krettek
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> We should add an end-to-end test which verifies Flink's integration with 
> Kerberos security. In order to do this, we should start a Kerberos secured 
> Hadoop, ZooKeeper and Kafka cluster. Then we should start a Flink cluster 
> with HA enabled and run a job which reads from and writes to Kafka. We could 
> use a simple pipe job for that purpose which has some state for checkpointing 
> to HDFS.
> See [security docs| 
> https://ci.apache.org/projects/flink/flink-docs-master/ops/security-kerberos.html]
>  for more information about Flink's Kerberos integration.





[jira] [Commented] (FLINK-8981) Add end-to-end test for running on YARN with Kerberos

2018-07-20 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-8981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16550609#comment-16550609
 ] 

ASF GitHub Bot commented on FLINK-8981:
---

Github user zentol commented on a diff in the pull request:

https://github.com/apache/flink/pull/6377#discussion_r203998139
  
--- Diff: 
flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/Dockerfile ---
@@ -0,0 +1,159 @@

+
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.

+
+#
+# This image is modified version of Knappek/docker-hadoop-secure
+#   * Knappek/docker-hadoop-secure 

+#
+# With bits and pieces added from Lewuathe/docker-hadoop-cluster to extend 
it to start a proper kerberized Hadoop cluster:
+#   * Lewuathe/docker-hadoop-cluster 

+#
+# Author: Aljoscha Krettek
+# Date:   2018 May, 15
+#
+# Creates multi-node, kerberized Hadoop cluster on Docker
+
+FROM sequenceiq/pam:ubuntu-14.04
+MAINTAINER aljoscha
+
+USER root
+
+RUN addgroup hadoop
+RUN useradd -d /home/hdfs -ms /bin/bash -G hadoop -p hdfs hdfs
+RUN useradd -d /home/yarn -ms /bin/bash -G hadoop -p yarn yarn
+RUN useradd -d /home/mapred -ms /bin/bash -G hadoop -p mapred mapred
+
+RUN useradd -d /home/hadoop-user -ms /bin/bash -p hadoop-user hadoop-user
+
+# install dev tools
+RUN apt-get update
+RUN apt-get install -y curl tar sudo openssh-server openssh-client rsync 
unzip
+
+# Kerberos client
+RUN apt-get install krb5-user -y
+RUN mkdir -p /var/log/kerberos
+RUN touch /var/log/kerberos/kadmind.log
+
+# passwordless ssh
+RUN rm -f /etc/ssh/ssh_host_dsa_key /etc/ssh/ssh_host_rsa_key 
/root/.ssh/id_rsa
+RUN ssh-keygen -q -N "" -t dsa -f /etc/ssh/ssh_host_dsa_key
+RUN ssh-keygen -q -N "" -t rsa -f /etc/ssh/ssh_host_rsa_key
+RUN ssh-keygen -q -N "" -t rsa -f /root/.ssh/id_rsa
+RUN cp /root/.ssh/id_rsa.pub /root/.ssh/authorized_keys
+
+# java
+RUN mkdir -p /usr/java/default && \
+	curl -Ls 'http://download.oracle.com/otn-pub/java/jdk/8u131-b11/d54c1d3a095b4ff2b6607d096fa80163/jdk-8u131-linux-x64.tar.gz' -H 'Cookie: oraclelicense=accept-securebackup-cookie' | \
+	tar --strip-components=1 -xz -C /usr/java/default/
+
+ENV JAVA_HOME /usr/java/default
+ENV PATH $PATH:$JAVA_HOME/bin
+
+RUN curl -LOH 'Cookie: oraclelicense=accept-securebackup-cookie' 'http://download.oracle.com/otn-pub/java/jce/8/jce_policy-8.zip'
+RUN unzip jce_policy-8.zip
+RUN cp /UnlimitedJCEPolicyJDK8/local_policy.jar /UnlimitedJCEPolicyJDK8/US_export_policy.jar $JAVA_HOME/jre/lib/security
+
+ENV HADOOP_VERSION=2.8.4
--- End diff --

I agree, but for now we still have to ensure that the hadoop version in flink-dist matches, no?

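The version coupling discussed in this comment could be made to fail fast with a small guard at the top of the test script. This is only an illustrative sketch; both variables and the check itself are hypothetical and not part of the PR:

```shell
#!/usr/bin/env bash
# Hypothetical guard: abort early if the Hadoop version baked into the
# Docker image differs from the one flink-dist was built against.
# Both values below are illustrative assumptions, not code from the PR.
HADOOP_VERSION="2.8.4"          # version installed in the image (Dockerfile ENV)
FLINK_HADOOP_VERSION="2.8.4"    # version bundled in flink-dist (assumed known)

if [ "$HADOOP_VERSION" != "$FLINK_HADOOP_VERSION" ]; then
    echo "Hadoop version mismatch: image=$HADOOP_VERSION flink-dist=$FLINK_HADOOP_VERSION" >&2
    exit 1
fi
echo "Hadoop versions match: $HADOOP_VERSION"
```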

> Add end-to-end test for running on YARN with Kerberos
> -
>
> Key: FLINK-8981
> URL: https://issues.apache.org/jira/browse/FLINK-8981
> Project: Flink
>  Issue Type: Sub-task
>  Components: Security, Tests
>Affects Versions: 1.5.0
>Reporter: Till Rohrmann
>Assignee: Aljoscha Krettek
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> We should add an end-to-end test which verifies Flink's integration with 
> Kerberos security. In order to do this, we should start a Kerberos secured 
> Hadoop, ZooKeeper and Kafka cluster. Then we should start a Flink cluster 
> with HA enabled and run a job which reads from and writes to Kafka. We could 
> use a simple pipe job for that purpose which has some state for checkpointing 
> to HDFS.
> See [security docs| 
> https://ci.apache.org/projects/fli

[jira] [Commented] (FLINK-8981) Add end-to-end test for running on YARN with Kerberos

2018-07-20 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-8981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16550608#comment-16550608
 ] 

ASF GitHub Bot commented on FLINK-8981:
---

Github user zentol commented on a diff in the pull request:

https://github.com/apache/flink/pull/6377#discussion_r203997764
  
--- Diff: 
flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/bootstrap.sh 
---
@@ -0,0 +1,121 @@
+#!/bin/bash

+
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.

+
+
+: ${HADOOP_PREFIX:=/usr/local/hadoop}
+
+$HADOOP_PREFIX/etc/hadoop/hadoop-env.sh
+
+rm /tmp/*.pid
+
+# installing libraries if any - (resource urls added comma separated to the ACP system variable)
+cd $HADOOP_PREFIX/share/hadoop/common ; for cp in ${ACP//,/ }; do  echo == $cp; curl -LO $cp ; done; cd -
+
+# kerberos client
+sed -i "s/EXAMPLE.COM/${KRB_REALM}/g" /etc/krb5.conf
--- End diff --

yeah nvm, I doubt introducing a placeholder really fixes things :/
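For illustration, the substitution being debated here boils down to a plain `sed` rewrite of the stock realm in `/etc/krb5.conf`, with `EXAMPLE.COM` doubling as the placeholder. The realm value below is a made-up example, not one from the PR:

```shell
# Sketch of the krb5.conf realm substitution discussed above.
# KRB_REALM is a made-up example value; the image defaults to EXAMPLE.COM.
KRB_REALM="FLINK.TEST"
conf='default_realm = EXAMPLE.COM'

# EXAMPLE.COM is both the stock value and the placeholder that
# bootstrap.sh rewrites at container start-up.
echo "$conf" | sed "s/EXAMPLE.COM/${KRB_REALM}/g"
# prints: default_realm = FLINK.TEST
```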


> Add end-to-end test for running on YARN with Kerberos
> -
>
> Key: FLINK-8981
> URL: https://issues.apache.org/jira/browse/FLINK-8981
> Project: Flink
>  Issue Type: Sub-task
>  Components: Security, Tests
>Affects Versions: 1.5.0
>Reporter: Till Rohrmann
>Assignee: Aljoscha Krettek
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> We should add an end-to-end test which verifies Flink's integration with 
> Kerberos security. In order to do this, we should start a Kerberos secured 
> Hadoop, ZooKeeper and Kafka cluster. Then we should start a Flink cluster 
> with HA enabled and run a job which reads from and writes to Kafka. We could 
> use a simple pipe job for that purpose which has some state for checkpointing 
> to HDFS.
> See [security docs| 
> https://ci.apache.org/projects/flink/flink-docs-master/ops/security-kerberos.html]
>  for more information about Flink's Kerberos integration.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-8981) Add end-to-end test for running on YARN with Kerberos

2018-07-20 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-8981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16550582#comment-16550582
 ] 

ASF GitHub Bot commented on FLINK-8981:
---

Github user aljoscha commented on a diff in the pull request:

https://github.com/apache/flink/pull/6377#discussion_r203990327
  
--- Diff: 
flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/README.md ---
@@ -0,0 +1,118 @@
+# Apache Hadoop Docker image with Kerberos enabled
+
+This image is a modified version of Knappek/docker-hadoop-secure
+ * Knappek/docker-hadoop-secure
+
+With bits and pieces added from Lewuathe/docker-hadoop-cluster to extend it to start a proper kerberized Hadoop cluster:
+ * Lewuathe/docker-hadoop-cluster
+
+And a lot of added stuff for making this an actual, properly configured, kerberized cluster with proper user/permissions structure.
+
+Versions
+--------
+
+* JDK8
+* Hadoop 2.8.3
+
+Default Environment Variables
+-----------------------------
+
+| Name | Value | Description |
+| ---- | ----- | ----------- |
+| `KRB_REALM` | `EXAMPLE.COM` | The Kerberos Realm, more information [here](https://web.mit.edu/kerberos/krb5-1.12/doc/admin/conf_files/krb5_conf.html#) |
+| `DOMAIN_REALM` | `example.com` | The Kerberos Domain Realm, more information [here](https://web.mit.edu/kerberos/krb5-1.12/doc/admin/conf_files/krb5_conf.html#) |
+| `KERBEROS_ADMIN` | `admin/admin` | The KDC admin user |
+| `KERBEROS_ADMIN_PASSWORD` | `admin` | The KDC admin password |
+
+You can simply define these variables in the `docker-compose.yml`.
+
+Run image
+---------
+
+Clone the [Github project](https://github.com/aljoscha/docker-hadoop-secure-cluster) and run
--- End diff --

fixing


> Add end-to-end test for running on YARN with Kerberos
> -
>
> Key: FLINK-8981
> URL: https://issues.apache.org/jira/browse/FLINK-8981
> Project: Flink
>  Issue Type: Sub-task
>  Components: Security, Tests
>Affects Versions: 1.5.0
>Reporter: Till Rohrmann
>Assignee: Aljoscha Krettek
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> We should add an end-to-end test which verifies Flink's integration with 
> Kerberos security. In order to do this, we should start a Kerberos secured 
> Hadoop, ZooKeeper and Kafka cluster. Then we should start a Flink cluster 
> with HA enabled and run a job which reads from and writes to Kafka. We could 
> use a simple pipe job for that purpose which has some state for checkpointing 
> to HDFS.
> See [security docs| 
> https://ci.apache.org/projects/flink/flink-docs-master/ops/security-kerberos.html]
>  for more information about Flink's Kerberos integration.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-8981) Add end-to-end test for running on YARN with Kerberos

2018-07-20 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-8981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16550580#comment-16550580
 ] 

ASF GitHub Bot commented on FLINK-8981:
---

Github user aljoscha commented on a diff in the pull request:

https://github.com/apache/flink/pull/6377#discussion_r203990078
  
--- Diff: flink-end-to-end-tests/test-scripts/test_yarn_kerberos_docker.sh 
---
@@ -0,0 +1,104 @@
+#!/usr/bin/env bash

+
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.

+
+set -o pipefail
+
+source "$(dirname "$0")"/common.sh
+
+FLINK_TARBALL_DIR=$TEST_DATA_DIR
+FLINK_TARBALL=flink.tar.gz
+FLINK_DIRNAME=$(basename $FLINK_DIR)
+
+echo "Flink Tarball directory $FLINK_TARBALL_DIR"
+echo "Flink tarball filename $FLINK_TARBALL"
+echo "Flink distribution directory name $FLINK_DIRNAME"
+echo "End-to-end directory $END_TO_END_DIR"
+docker --version
+docker-compose --version
+
+mkdir -p $FLINK_TARBALL_DIR
+tar czf $FLINK_TARBALL_DIR/$FLINK_TARBALL -C $(dirname $FLINK_DIR) .
+
+echo "Building Hadoop Docker container"
+until docker build -f $END_TO_END_DIR/test-scripts/docker-hadoop-secure-cluster/Dockerfile -t flink/docker-hadoop-secure-cluster:latest $END_TO_END_DIR/test-scripts/docker-hadoop-secure-cluster/; do
+    # with all the downloading and ubuntu updating a lot of flakiness can happen, make sure
+    # we don't immediately fail
+    echo "Something went wrong while building the Docker image, retrying ..."
+    sleep 2
+done
+
+echo "Starting Hadoop cluster"
+docker-compose -f $END_TO_END_DIR/test-scripts/docker-hadoop-secure-cluster/docker-compose.yml up -d
+
+# make sure we stop our cluster at the end
+function cluster_shutdown {
+  # don't call ourselves again for another signal interruption
+  trap "exit -1" INT
+  # don't call ourselves again for normal exit
+  trap "" EXIT
+
+  docker-compose -f $END_TO_END_DIR/test-scripts/docker-hadoop-secure-cluster/docker-compose.yml down
+  rm $FLINK_TARBALL_DIR/$FLINK_TARBALL
+}
+trap cluster_shutdown INT
+trap cluster_shutdown EXIT
+
+until docker cp $FLINK_TARBALL_DIR/$FLINK_TARBALL master:/home/hadoop-user/; do
+    # we're retrying this one because we don't know yet if the container is ready
+    echo "Uploading Flink tarball to docker master failed, retrying ..."
+    sleep 5
+done
+
+# now, at least the container is ready
+docker exec -it master bash -c "tar xzf /home/hadoop-user/$FLINK_TARBALL --directory /home/hadoop-user/"
+
+docker exec -it master bash -c "echo \"security.kerberos.login.keytab: /home/hadoop-user/hadoop-user.keytab\" >> /home/hadoop-user/$FLINK_DIRNAME/conf/flink-conf.yaml"
+docker exec -it master bash -c "echo \"security.kerberos.login.principal: hadoop-user\" >> /home/hadoop-user/$FLINK_DIRNAME/conf/flink-conf.yaml"
+
+echo "Flink config:"
+docker exec -it master bash -c "cat /home/hadoop-user/$FLINK_DIRNAME/conf/flink-conf.yaml"
+
+# make the output path random, just in case it already exists, for example if we
+# had cached docker containers
+OUTPUT_PATH=hdfs:///user/hadoop-user/wc-out-$RANDOM
+
+# it's important to run this with higher parallelism, otherwise we might risk that
+# JM and TM are on the same YARN node and that we therefore don't test the keytab shipping
+until docker exec -it master bash -c "export HADOOP_CLASSPATH=\`hadoop classpath\` && /home/hadoop-user/$FLINK_DIRNAME/bin/flink run -m yarn-cluster -yn 3 -ys 1 -ytm 1200 -yjm 800 -p 3 /home/hadoop-user/$FLINK_DIRNAME/examples/streaming/WordCount.jar --output $OUTPUT_PATH"; do
+    echo "Running the Flink job failed, might be that the cluster is not ready yet, retrying ..."
--- End diff --

I'm afraid not, that's why ther
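The retry-until-ready idiom that this exchange (and several other spots in the test script) relies on can be factored into a small helper. `retry` and its arguments are a sketch for illustration, not code from the PR:

```shell
# Generic sketch of the retry idiom used throughout the test script:
# rerun a command until it succeeds or the attempt budget is exhausted.
retry() {
    local attempts=$1; shift
    local i
    for i in $(seq 1 "$attempts"); do
        "$@" && return 0                        # success: stop retrying
        echo "Attempt $i failed, retrying ..." >&2
        sleep 1
    done
    return 1                                    # all attempts failed
}

# Usage (hypothetical): retry 3 docker cp "$FLINK_TARBALL" master:/home/hadoop-user/
retry 3 true && echo "succeeded"
```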

[jira] [Commented] (FLINK-8981) Add end-to-end test for running on YARN with Kerberos

2018-07-20 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-8981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16550579#comment-16550579
 ] 

ASF GitHub Bot commented on FLINK-8981:
---

Github user aljoscha commented on a diff in the pull request:

https://github.com/apache/flink/pull/6377#discussion_r203989967
  
--- Diff: 
flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/config/log4j.properties
 ---
@@ -0,0 +1,354 @@

+
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.

+
+
+# Define some default values that can be overridden by system properties
+hadoop.root.logger=INFO,console
+hadoop.log.dir=.
+hadoop.log.file=hadoop.log
+
+# Define the root logger to the system property "hadoop.root.logger".
+log4j.rootLogger=${hadoop.root.logger}, EventCounter
+
+# Logging Threshold
+log4j.threshold=ALL
+
+# Null Appender
+log4j.appender.NullAppender=org.apache.log4j.varia.NullAppender
+
+#
+# Rolling File Appender - cap space usage at 5gb.
+#
+hadoop.log.maxfilesize=256MB
+hadoop.log.maxbackupindex=20
+log4j.appender.RFA=org.apache.log4j.RollingFileAppender
+log4j.appender.RFA.File=${hadoop.log.dir}/${hadoop.log.file}
+
+log4j.appender.RFA.MaxFileSize=${hadoop.log.maxfilesize}
+log4j.appender.RFA.MaxBackupIndex=${hadoop.log.maxbackupindex}
+
+log4j.appender.RFA.layout=org.apache.log4j.PatternLayout
+
+# Pattern format: Date LogLevel LoggerName LogMessage
+log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
+# Debugging Pattern format
+#log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} (%F:%M(%L)) - %m%n
+
+
+#
+# Daily Rolling File Appender
+#
+
+log4j.appender.DRFA=org.apache.log4j.DailyRollingFileAppender
+log4j.appender.DRFA.File=${hadoop.log.dir}/${hadoop.log.file}
+
+# Rollover at midnight
+log4j.appender.DRFA.DatePattern=.yyyy-MM-dd
+
+log4j.appender.DRFA.layout=org.apache.log4j.PatternLayout
+
+# Pattern format: Date LogLevel LoggerName LogMessage
+log4j.appender.DRFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
+# Debugging Pattern format
+#log4j.appender.DRFA.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} (%F:%M(%L)) - %m%n
+
+
+#
+# console
+# Add "console" to rootlogger above if you want to use this
+#
+
+log4j.appender.console=org.apache.log4j.ConsoleAppender
+log4j.appender.console.target=System.err
+log4j.appender.console.layout=org.apache.log4j.PatternLayout
+log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{2}: %m%n
+
+#
+# TaskLog Appender
+#
+
+#Default values
+hadoop.tasklog.taskid=null
+hadoop.tasklog.iscleanup=false
+hadoop.tasklog.noKeepSplits=4
+hadoop.tasklog.totalLogFileSize=100
+hadoop.tasklog.purgeLogSplits=true
+hadoop.tasklog.logsRetainHours=12
+
+log4j.appender.TLA=org.apache.hadoop.mapred.TaskLogAppender
+log4j.appender.TLA.taskId=${hadoop.tasklog.taskid}
+log4j.appender.TLA.isCleanup=${hadoop.tasklog.iscleanup}
+log4j.appender.TLA.totalLogFileSize=${hadoop.tasklog.totalLogFileSize}
+
+log4j.appender.TLA.layout=org.apache.log4j.PatternLayout
+log4j.appender.TLA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
+
+#
+# HDFS block state change log from block manager
+#
+# Uncomment the following to log normal block state change
+# messages from BlockManager in NameNode.
+#log4j.logger.BlockStateChange=DEBUG
+
+#
+#Security appender
+#
+hadoop.security.logger=INFO,NullAppender
+hadoop.security.log.maxfilesize=256MB
+hadoop.security.log.maxbackupindex=20
+log4j.category.SecurityLogger=${hadoop.security.logger}
+hadoop.security.log.file=SecurityAuth-${user.name}.audit
+log4j.appender.R

[jira] [Commented] (FLINK-8981) Add end-to-end test for running on YARN with Kerberos

2018-07-20 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-8981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16550578#comment-16550578
 ] 

ASF GitHub Bot commented on FLINK-8981:
---

Github user aljoscha commented on a diff in the pull request:

https://github.com/apache/flink/pull/6377#discussion_r203989614
  
--- Diff: 
flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/config/log4j.properties
 ---
@@ -0,0 +1,354 @@

+
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.

+
+
+# Define some default values that can be overridden by system properties
+hadoop.root.logger=INFO,console
+hadoop.log.dir=.
+hadoop.log.file=hadoop.log
+
+# Define the root logger to the system property "hadoop.root.logger".
+log4j.rootLogger=${hadoop.root.logger}, EventCounter
+
+# Logging Threshold
+log4j.threshold=ALL
+
+# Null Appender
+log4j.appender.NullAppender=org.apache.log4j.varia.NullAppender
+
+#
+# Rolling File Appender - cap space usage at 5gb.
+#
+hadoop.log.maxfilesize=256MB
+hadoop.log.maxbackupindex=20
+log4j.appender.RFA=org.apache.log4j.RollingFileAppender
+log4j.appender.RFA.File=${hadoop.log.dir}/${hadoop.log.file}
+
+log4j.appender.RFA.MaxFileSize=${hadoop.log.maxfilesize}
+log4j.appender.RFA.MaxBackupIndex=${hadoop.log.maxbackupindex}
+
+log4j.appender.RFA.layout=org.apache.log4j.PatternLayout
+
+# Pattern format: Date LogLevel LoggerName LogMessage
+log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
+# Debugging Pattern format
+#log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} (%F:%M(%L)) - %m%n
+
+
+#
+# Daily Rolling File Appender
+#
+
+log4j.appender.DRFA=org.apache.log4j.DailyRollingFileAppender
+log4j.appender.DRFA.File=${hadoop.log.dir}/${hadoop.log.file}
+
+# Rollover at midnight
+log4j.appender.DRFA.DatePattern=.yyyy-MM-dd
+
+log4j.appender.DRFA.layout=org.apache.log4j.PatternLayout
+
+# Pattern format: Date LogLevel LoggerName LogMessage
+log4j.appender.DRFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
+# Debugging Pattern format
+#log4j.appender.DRFA.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} (%F:%M(%L)) - %m%n
+
+
+#
+# console
+# Add "console" to rootlogger above if you want to use this
+#
+
+log4j.appender.console=org.apache.log4j.ConsoleAppender
+log4j.appender.console.target=System.err
+log4j.appender.console.layout=org.apache.log4j.PatternLayout
+log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{2}: %m%n
+
+#
+# TaskLog Appender
+#
+
+#Default values
+hadoop.tasklog.taskid=null
+hadoop.tasklog.iscleanup=false
+hadoop.tasklog.noKeepSplits=4
+hadoop.tasklog.totalLogFileSize=100
+hadoop.tasklog.purgeLogSplits=true
+hadoop.tasklog.logsRetainHours=12
+
+log4j.appender.TLA=org.apache.hadoop.mapred.TaskLogAppender
+log4j.appender.TLA.taskId=${hadoop.tasklog.taskid}
+log4j.appender.TLA.isCleanup=${hadoop.tasklog.iscleanup}
+log4j.appender.TLA.totalLogFileSize=${hadoop.tasklog.totalLogFileSize}
+
+log4j.appender.TLA.layout=org.apache.log4j.PatternLayout
+log4j.appender.TLA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
+
+#
+# HDFS block state change log from block manager
+#
+# Uncomment the following to log normal block state change
+# messages from BlockManager in NameNode.
+#log4j.logger.BlockStateChange=DEBUG
+
+#
+#Security appender
+#
+hadoop.security.logger=INFO,NullAppender
+hadoop.security.log.maxfilesize=256MB
+hadoop.security.log.maxbackupindex=20
+log4j.category.SecurityLogger=${hadoop.security.logger}
+hadoop.security.log.file=SecurityAuth-${user.name}.audit
+log4j.appender.R

[jira] [Commented] (FLINK-8981) Add end-to-end test for running on YARN with Kerberos

2018-07-20 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-8981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16550575#comment-16550575
 ] 

ASF GitHub Bot commented on FLINK-8981:
---

Github user aljoscha commented on a diff in the pull request:

https://github.com/apache/flink/pull/6377#discussion_r203989263
  
--- Diff: 
flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/Dockerfile ---
@@ -0,0 +1,159 @@

+
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.

+
+#
+# This image is a modified version of Knappek/docker-hadoop-secure
+#   * Knappek/docker-hadoop-secure
+#
+# With bits and pieces added from Lewuathe/docker-hadoop-cluster to extend it to start a proper kerberized Hadoop cluster:
+#   * Lewuathe/docker-hadoop-cluster

+#
+# Author: Aljoscha Krettek
+# Date:   2018 May, 15
+#
+# Creates multi-node, kerberized Hadoop cluster on Docker
+
+FROM sequenceiq/pam:ubuntu-14.04
+MAINTAINER aljoscha
+
+USER root
+
+RUN addgroup hadoop
+RUN useradd -d /home/hdfs -ms /bin/bash -G hadoop -p hdfs hdfs
+RUN useradd -d /home/yarn -ms /bin/bash -G hadoop -p yarn yarn
+RUN useradd -d /home/mapred -ms /bin/bash -G hadoop -p mapred mapred
+
+RUN useradd -d /home/hadoop-user -ms /bin/bash -p hadoop-user hadoop-user
+
+# install dev tools
+RUN apt-get update
+RUN apt-get install -y curl tar sudo openssh-server openssh-client rsync unzip
+
+# Kerberos client
+RUN apt-get install krb5-user -y
+RUN mkdir -p /var/log/kerberos
+RUN touch /var/log/kerberos/kadmind.log
+
+# passwordless ssh
+RUN rm -f /etc/ssh/ssh_host_dsa_key /etc/ssh/ssh_host_rsa_key /root/.ssh/id_rsa
+RUN ssh-keygen -q -N "" -t dsa -f /etc/ssh/ssh_host_dsa_key
+RUN ssh-keygen -q -N "" -t rsa -f /etc/ssh/ssh_host_rsa_key
+RUN ssh-keygen -q -N "" -t rsa -f /root/.ssh/id_rsa
+RUN cp /root/.ssh/id_rsa.pub /root/.ssh/authorized_keys
+
+# java
+RUN mkdir -p /usr/java/default && \
+	curl -Ls 'http://download.oracle.com/otn-pub/java/jdk/8u131-b11/d54c1d3a095b4ff2b6607d096fa80163/jdk-8u131-linux-x64.tar.gz' -H 'Cookie: oraclelicense=accept-securebackup-cookie' | \
+	tar --strip-components=1 -xz -C /usr/java/default/
+
+ENV JAVA_HOME /usr/java/default
+ENV PATH $PATH:$JAVA_HOME/bin
+
+RUN curl -LOH 'Cookie: oraclelicense=accept-securebackup-cookie' 'http://download.oracle.com/otn-pub/java/jce/8/jce_policy-8.zip'
+RUN unzip jce_policy-8.zip
+RUN cp /UnlimitedJCEPolicyJDK8/local_policy.jar /UnlimitedJCEPolicyJDK8/US_export_policy.jar $JAVA_HOME/jre/lib/security
+
+ENV HADOOP_VERSION=2.8.4
--- End diff --

I think the solution in the long run should be to never ship Flink with a Hadoop version, i.e. make the hadoop-free version the default.


> Add end-to-end test for running on YARN with Kerberos
> -
>
> Key: FLINK-8981
> URL: https://issues.apache.org/jira/browse/FLINK-8981
> Project: Flink
>  Issue Type: Sub-task
>  Components: Security, Tests
>Affects Versions: 1.5.0
>Reporter: Till Rohrmann
>Assignee: Aljoscha Krettek
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> We should add an end-to-end test which verifies Flink's integration with 
> Kerberos security. In order to do this, we should start a Kerberos secured 
> Hadoop, ZooKeeper and Kafka cluster. Then we should start a Flink cluster 
> with HA enabled and run a job which reads from and writes to Kafka. We could 
> use a simple pipe job for that purpose which has some state for checkpointing 
> to HDFS.
> See [security

[jira] [Commented] (FLINK-8981) Add end-to-end test for running on YARN with Kerberos

2018-07-20 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-8981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16550573#comment-16550573
 ] 

ASF GitHub Bot commented on FLINK-8981:
---

Github user aljoscha commented on a diff in the pull request:

https://github.com/apache/flink/pull/6377#discussion_r203989036
  
--- Diff: 
flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/Dockerfile ---
@@ -0,0 +1,159 @@

+
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.

+
+#
+# This image is a modified version of Knappek/docker-hadoop-secure
+#   * Knappek/docker-hadoop-secure
+#
+# With bits and pieces added from Lewuathe/docker-hadoop-cluster to extend it to start a proper kerberized Hadoop cluster:
+#   * Lewuathe/docker-hadoop-cluster

+#
+# Author: Aljoscha Krettek
+# Date:   2018 May, 15
+#
+# Creates multi-node, kerberized Hadoop cluster on Docker
+
+FROM sequenceiq/pam:ubuntu-14.04
+MAINTAINER aljoscha
+
+USER root
+
+RUN addgroup hadoop
+RUN useradd -d /home/hdfs -ms /bin/bash -G hadoop -p hdfs hdfs
+RUN useradd -d /home/yarn -ms /bin/bash -G hadoop -p yarn yarn
+RUN useradd -d /home/mapred -ms /bin/bash -G hadoop -p mapred mapred
+
+RUN useradd -d /home/hadoop-user -ms /bin/bash -p hadoop-user hadoop-user
+
+# install dev tools
+RUN apt-get update
+RUN apt-get install -y curl tar sudo openssh-server openssh-client rsync unzip
+
+# Kerberos client
+RUN apt-get install krb5-user -y
+RUN mkdir -p /var/log/kerberos
+RUN touch /var/log/kerberos/kadmind.log
+
+# passwordless ssh
+RUN rm -f /etc/ssh/ssh_host_dsa_key /etc/ssh/ssh_host_rsa_key /root/.ssh/id_rsa
+RUN ssh-keygen -q -N "" -t dsa -f /etc/ssh/ssh_host_dsa_key
+RUN ssh-keygen -q -N "" -t rsa -f /etc/ssh/ssh_host_rsa_key
+RUN ssh-keygen -q -N "" -t rsa -f /root/.ssh/id_rsa
+RUN cp /root/.ssh/id_rsa.pub /root/.ssh/authorized_keys
+
+# java
+RUN mkdir -p /usr/java/default && \
+	curl -Ls 'http://download.oracle.com/otn-pub/java/jdk/8u131-b11/d54c1d3a095b4ff2b6607d096fa80163/jdk-8u131-linux-x64.tar.gz' -H 'Cookie: oraclelicense=accept-securebackup-cookie' | \
+	tar --strip-components=1 -xz -C /usr/java/default/
+
+ENV JAVA_HOME /usr/java/default
+ENV PATH $PATH:$JAVA_HOME/bin
+
+RUN curl -LOH 'Cookie: oraclelicense=accept-securebackup-cookie' 'http://download.oracle.com/otn-pub/java/jce/8/jce_policy-8.zip'
+RUN unzip jce_policy-8.zip
+RUN cp /UnlimitedJCEPolicyJDK8/local_policy.jar /UnlimitedJCEPolicyJDK8/US_export_policy.jar $JAVA_HOME/jre/lib/security
+
+ENV HADOOP_VERSION=2.8.4
+
+# ENV HADOOP_URL https://www.apache.org/dist/hadoop/common/hadoop-$HADOOP_VERSION/hadoop-$HADOOP_VERSION.tar.gz
--- End diff --

removing


> Add end-to-end test for running on YARN with Kerberos
> -
>
> Key: FLINK-8981
> URL: https://issues.apache.org/jira/browse/FLINK-8981
> Project: Flink
>  Issue Type: Sub-task
>  Components: Security, Tests
>Affects Versions: 1.5.0
>Reporter: Till Rohrmann
>Assignee: Aljoscha Krettek
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> We should add an end-to-end test which verifies Flink's integration with 
> Kerberos security. In order to do this, we should start a Kerberos secured 
> Hadoop, ZooKeeper and Kafka cluster. Then we should start a Flink cluster 
> with HA enabled and run a job which reads from and writes to Kafka. We could 
> use a simple pipe job for that purpose which has some state for checkpointing 
> to HDFS.
> See [security docs|

[jira] [Commented] (FLINK-8981) Add end-to-end test for running on YARN with Kerberos

2018-07-20 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-8981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16550571#comment-16550571
 ] 

ASF GitHub Bot commented on FLINK-8981:
---

Github user aljoscha commented on a diff in the pull request:

https://github.com/apache/flink/pull/6377#discussion_r203988838
  
--- Diff: 
flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/bootstrap.sh 
---
@@ -0,0 +1,121 @@
+#!/bin/bash

+
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.

+
+
+: ${HADOOP_PREFIX:=/usr/local/hadoop}
+
+$HADOOP_PREFIX/etc/hadoop/hadoop-env.sh
+
+rm /tmp/*.pid
+
+# installing libraries if any - (resource urls added comma separated to the ACP system variable)
+cd $HADOOP_PREFIX/share/hadoop/common ; for cp in ${ACP//,/ }; do  echo == $cp; curl -LO $cp ; done; cd -
+
+# kerberos client
+sed -i "s/EXAMPLE.COM/${KRB_REALM}/g" /etc/krb5.conf
--- End diff --

`EXAMPLE.COM` is pretty much the placeholder for this and could be replaced with a different realm in `bootstrap.sh`. But the default is just to still use `EXAMPLE.COM`. I could rename this `TEMPLATE.URL` if you want. 😅


> Add end-to-end test for running on YARN with Kerberos
> -
>
> Key: FLINK-8981
> URL: https://issues.apache.org/jira/browse/FLINK-8981
> Project: Flink
>  Issue Type: Sub-task
>  Components: Security, Tests
>Affects Versions: 1.5.0
>Reporter: Till Rohrmann
>Assignee: Aljoscha Krettek
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> We should add an end-to-end test which verifies Flink's integration with 
> Kerberos security. In order to do this, we should start a Kerberos secured 
> Hadoop, ZooKeeper and Kafka cluster. Then we should start a Flink cluster 
> with HA enabled and run a job which reads from and writes to Kafka. We could 
> use a simple pipe job for that purpose which has some state for checkpointing 
> to HDFS.
> See [security docs|https://ci.apache.org/projects/flink/flink-docs-master/ops/security-kerberos.html] for more information about Flink's Kerberos integration.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-8981) Add end-to-end test for running on YARN with Kerberos

2018-07-20 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-8981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16550501#comment-16550501
 ] 

ASF GitHub Bot commented on FLINK-8981:
---

Github user zentol commented on a diff in the pull request:

https://github.com/apache/flink/pull/6377#discussion_r203974298
  
--- Diff: flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/bootstrap.sh ---
@@ -0,0 +1,121 @@
+#!/bin/bash

+
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.

+
+
+: ${HADOOP_PREFIX:=/usr/local/hadoop}
+
+$HADOOP_PREFIX/etc/hadoop/hadoop-env.sh
+
+rm /tmp/*.pid
+
+# installing libraries if any - (resource urls added comma separated to the ACP system variable)
+cd $HADOOP_PREFIX/share/hadoop/common ; for cp in ${ACP//,/ }; do  echo == $cp; curl -LO $cp ; done; cd -
+
+# kerberos client
+sed -i "s/EXAMPLE.COM/${KRB_REALM}/g" /etc/krb5.conf
--- End diff --

`EXAMPLE.COM` is used in several places, is there any way we can set this in a single place? (for example with search&replace if necessary)




[jira] [Commented] (FLINK-8981) Add end-to-end test for running on YARN with Kerberos

2018-07-20 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-8981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16550498#comment-16550498
 ] 

ASF GitHub Bot commented on FLINK-8981:
---

Github user zentol commented on a diff in the pull request:

https://github.com/apache/flink/pull/6377#discussion_r203969501
  
--- Diff: flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/README.md ---
@@ -0,0 +1,118 @@
+# Apache Hadoop Docker image with Kerberos enabled
+
+This image is a modified version of Knappek/docker-hadoop-secure
+ * Knappek/docker-hadoop-secure

+
+With bits and pieces added from Lewuathe/docker-hadoop-cluster to extend it to start a proper kerberized Hadoop cluster:
+ * Lewuathe/docker-hadoop-cluster

+
+And a lot of added stuff for making this an actual, properly configured, kerberized cluster with proper user/permissions structure.
+
+Versions
+
+
+* JDK8
+* Hadoop 2.8.3
+
+Default Environment Variables
+-
+
+| Name | Value | Description |
+| ---- | ----- | ----------- |
+| `KRB_REALM` | `EXAMPLE.COM` | The Kerberos Realm, more information [here](https://web.mit.edu/kerberos/krb5-1.12/doc/admin/conf_files/krb5_conf.html#) |
+| `DOMAIN_REALM` | `example.com` | The Kerberos Domain Realm, more information [here](https://web.mit.edu/kerberos/krb5-1.12/doc/admin/conf_files/krb5_conf.html#) |
+| `KERBEROS_ADMIN` | `admin/admin` | The KDC admin user |
+| `KERBEROS_ADMIN_PASSWORD` | `admin` | The KDC admin password |
+
+You can simply define these variables in the `docker-compose.yml`.
+
+Run image
+-
+
+Clone the [Github project](https://github.com/aljoscha/docker-hadoop-secure-cluster) and run
--- End diff --

point to apache repo instead




[jira] [Commented] (FLINK-8981) Add end-to-end test for running on YARN with Kerberos

2018-07-20 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-8981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16550503#comment-16550503
 ] 

ASF GitHub Bot commented on FLINK-8981:
---

Github user zentol commented on a diff in the pull request:

https://github.com/apache/flink/pull/6377#discussion_r203973314
  
--- Diff: flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/Dockerfile ---
@@ -0,0 +1,159 @@

+
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.

+
+#
+# This image is a modified version of Knappek/docker-hadoop-secure
+#   * Knappek/docker-hadoop-secure

+#
+# With bits and pieces added from Lewuathe/docker-hadoop-cluster to extend it to start a proper kerberized Hadoop cluster:
+#   * Lewuathe/docker-hadoop-cluster

+#
+# Author: Aljoscha Krettek
+# Date:   2018 May, 15
+#
+# Creates multi-node, kerberized Hadoop cluster on Docker
+
+FROM sequenceiq/pam:ubuntu-14.04
+MAINTAINER aljoscha
+
+USER root
+
+RUN addgroup hadoop
+RUN useradd -d /home/hdfs -ms /bin/bash -G hadoop -p hdfs hdfs
+RUN useradd -d /home/yarn -ms /bin/bash -G hadoop -p yarn yarn
+RUN useradd -d /home/mapred -ms /bin/bash -G hadoop -p mapred mapred
+
+RUN useradd -d /home/hadoop-user -ms /bin/bash -p hadoop-user hadoop-user
+
+# install dev tools
+RUN apt-get update
+RUN apt-get install -y curl tar sudo openssh-server openssh-client rsync unzip
+
+# Kerberos client
+RUN apt-get install krb5-user -y
+RUN mkdir -p /var/log/kerberos
+RUN touch /var/log/kerberos/kadmind.log
+
+# passwordless ssh
+RUN rm -f /etc/ssh/ssh_host_dsa_key /etc/ssh/ssh_host_rsa_key /root/.ssh/id_rsa
+RUN ssh-keygen -q -N "" -t dsa -f /etc/ssh/ssh_host_dsa_key
+RUN ssh-keygen -q -N "" -t rsa -f /etc/ssh/ssh_host_rsa_key
+RUN ssh-keygen -q -N "" -t rsa -f /root/.ssh/id_rsa
+RUN cp /root/.ssh/id_rsa.pub /root/.ssh/authorized_keys
+
+# java
+RUN mkdir -p /usr/java/default && \
+ curl -Ls 'http://download.oracle.com/otn-pub/java/jdk/8u131-b11/d54c1d3a095b4ff2b6607d096fa80163/jdk-8u131-linux-x64.tar.gz' -H 'Cookie: oraclelicense=accept-securebackup-cookie' | \
+ tar --strip-components=1 -xz -C /usr/java/default/
+
+ENV JAVA_HOME /usr/java/default
+ENV PATH $PATH:$JAVA_HOME/bin
+
+RUN curl -LOH 'Cookie: oraclelicense=accept-securebackup-cookie' 'http://download.oracle.com/otn-pub/java/jce/8/jce_policy-8.zip'
+RUN unzip jce_policy-8.zip
+RUN cp /UnlimitedJCEPolicyJDK8/local_policy.jar /UnlimitedJCEPolicyJDK8/US_export_policy.jar $JAVA_HOME/jre/lib/security
+
+ENV HADOOP_VERSION=2.8.4
+
+# ENV HADOOP_URL https://www.apache.org/dist/hadoop/common/hadoop-$HADOOP_VERSION/hadoop-$HADOOP_VERSION.tar.gz
--- End diff --

remove



[jira] [Commented] (FLINK-8981) Add end-to-end test for running on YARN with Kerberos

2018-07-20 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-8981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16550500#comment-16550500
 ] 

ASF GitHub Bot commented on FLINK-8981:
---

Github user zentol commented on a diff in the pull request:

https://github.com/apache/flink/pull/6377#discussion_r203972230
  
--- Diff: flink-end-to-end-tests/test-scripts/test_yarn_kerberos_docker.sh ---
@@ -0,0 +1,104 @@
+#!/usr/bin/env bash

+
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.

+
+set -o pipefail
+
+source "$(dirname "$0")"/common.sh
+
+FLINK_TARBALL_DIR=$TEST_DATA_DIR
+FLINK_TARBALL=flink.tar.gz
+FLINK_DIRNAME=$(basename $FLINK_DIR)
+
+echo "Flink Tarball directory $FLINK_TARBALL_DIR"
+echo "Flink tarball filename $FLINK_TARBALL"
+echo "Flink distribution directory name $FLINK_DIRNAME"
+echo "End-to-end directory $END_TO_END_DIR"
+docker --version
+docker-compose --version
+
+mkdir -p $FLINK_TARBALL_DIR
+tar czf $FLINK_TARBALL_DIR/$FLINK_TARBALL -C $(dirname $FLINK_DIR) .
+
+echo "Building Hadoop Docker container"
+until docker build -f $END_TO_END_DIR/test-scripts/docker-hadoop-secure-cluster/Dockerfile -t flink/docker-hadoop-secure-cluster:latest $END_TO_END_DIR/test-scripts/docker-hadoop-secure-cluster/; do
+# with all the downloading and ubuntu updating a lot of flakiness can happen, make sure
+# we don't immediately fail
+echo "Something went wrong while building the Docker image, retrying ..."
+sleep 2
+done
+
+echo "Starting Hadoop cluster"
+docker-compose -f $END_TO_END_DIR/test-scripts/docker-hadoop-secure-cluster/docker-compose.yml up -d
+
+# make sure we stop our cluster at the end
+function cluster_shutdown {
+  # don't call ourselves again for another signal interruption
+  trap "exit -1" INT
+  # don't call ourselves again for normal exit
+  trap "" EXIT
+
+  docker-compose -f $END_TO_END_DIR/test-scripts/docker-hadoop-secure-cluster/docker-compose.yml down
+  rm $FLINK_TARBALL_DIR/$FLINK_TARBALL
+}
+trap cluster_shutdown INT
+trap cluster_shutdown EXIT
+
+until docker cp $FLINK_TARBALL_DIR/$FLINK_TARBALL master:/home/hadoop-user/; do
+# we're retrying this one because we don't know yet if the container is ready
+echo "Uploading Flink tarball to docker master failed, retrying ..."
+sleep 5
+done
+
+# now, at least the container is ready
+docker exec -it master bash -c "tar xzf /home/hadoop-user/$FLINK_TARBALL --directory /home/hadoop-user/"
+
+docker exec -it master bash -c "echo \"security.kerberos.login.keytab: /home/hadoop-user/hadoop-user.keytab\" >> /home/hadoop-user/$FLINK_DIRNAME/conf/flink-conf.yaml"
+docker exec -it master bash -c "echo \"security.kerberos.login.principal: hadoop-user\" >> /home/hadoop-user/$FLINK_DIRNAME/conf/flink-conf.yaml"
+
+echo "Flink config:"
+docker exec -it master bash -c "cat /home/hadoop-user/$FLINK_DIRNAME/conf/flink-conf.yaml"
+
+# make the output path random, just in case it already exists, for example if we
+# had cached docker containers
+OUTPUT_PATH=hdfs:///user/hadoop-user/wc-out-$RANDOM
+
+# it's important to run this with higher parallelism, otherwise we might risk that
+# JM and TM are on the same YARN node and that we therefore don't test the keytab shipping
+until docker exec -it master bash -c "export HADOOP_CLASSPATH=\`hadoop classpath\` && /home/hadoop-user/$FLINK_DIRNAME/bin/flink run -m yarn-cluster -yn 3 -ys 1 -ytm 1200 -yjm 800 -p 3 /home/hadoop-user/$FLINK_DIRNAME/examples/streaming/WordCount.jar --output $OUTPUT_PATH"; do
+echo "Running the Flink job failed, might be that the cluster is not ready yet, retrying ..."
--- End diff --

is there no way to check whether 
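
One hypothetical way to answer that question (not code from the PR, and the ResourceManager host/port are assumptions): poll the YARN ResourceManager REST API until it reports at least one active NodeManager, instead of blindly retrying the job submission.

```shell
#!/usr/bin/env bash
# Hypothetical readiness probe: wait until the YARN ResourceManager
# (assumed reachable at $RM_URL) reports at least one active NodeManager.
RM_URL="${RM_URL:-http://master:8088}"

rm_reports_active_nodes() {
    # the cluster metrics endpoint returns JSON such as
    # {"clusterMetrics":{"activeNodes":3,...}}
    echo "$1" | grep -q '"activeNodes":[1-9]'
}

wait_for_yarn() {
    local retries=0
    until rm_reports_active_nodes "$(curl -sf "$RM_URL/ws/v1/cluster/metrics")"; do
        retries=$((retries + 1))
        [ "$retries" -ge 30 ] && return 1
        sleep 5
    done
}
```

Calling `wait_for_yarn` before the first `flink run` would turn the retry loop into a bounded readiness check, though a retry around the submission may still be wanted for other transient failures.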

[jira] [Commented] (FLINK-8981) Add end-to-end test for running on YARN with Kerberos

2018-07-20 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-8981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16550499#comment-16550499
 ] 

ASF GitHub Bot commented on FLINK-8981:
---

Github user zentol commented on a diff in the pull request:

https://github.com/apache/flink/pull/6377#discussion_r203973291
  
--- Diff: flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/Dockerfile ---
@@ -0,0 +1,159 @@

+
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.

+
+#
+# This image is a modified version of Knappek/docker-hadoop-secure
+#   * Knappek/docker-hadoop-secure

+#
+# With bits and pieces added from Lewuathe/docker-hadoop-cluster to extend it to start a proper kerberized Hadoop cluster:
+#   * Lewuathe/docker-hadoop-cluster

+#
+# Author: Aljoscha Krettek
+# Date:   2018 May, 15
+#
+# Creates multi-node, kerberized Hadoop cluster on Docker
+
+FROM sequenceiq/pam:ubuntu-14.04
+MAINTAINER aljoscha
+
+USER root
+
+RUN addgroup hadoop
+RUN useradd -d /home/hdfs -ms /bin/bash -G hadoop -p hdfs hdfs
+RUN useradd -d /home/yarn -ms /bin/bash -G hadoop -p yarn yarn
+RUN useradd -d /home/mapred -ms /bin/bash -G hadoop -p mapred mapred
+
+RUN useradd -d /home/hadoop-user -ms /bin/bash -p hadoop-user hadoop-user
+
+# install dev tools
+RUN apt-get update
+RUN apt-get install -y curl tar sudo openssh-server openssh-client rsync unzip
+
+# Kerberos client
+RUN apt-get install krb5-user -y
+RUN mkdir -p /var/log/kerberos
+RUN touch /var/log/kerberos/kadmind.log
+
+# passwordless ssh
+RUN rm -f /etc/ssh/ssh_host_dsa_key /etc/ssh/ssh_host_rsa_key /root/.ssh/id_rsa
+RUN ssh-keygen -q -N "" -t dsa -f /etc/ssh/ssh_host_dsa_key
+RUN ssh-keygen -q -N "" -t rsa -f /etc/ssh/ssh_host_rsa_key
+RUN ssh-keygen -q -N "" -t rsa -f /root/.ssh/id_rsa
+RUN cp /root/.ssh/id_rsa.pub /root/.ssh/authorized_keys
+
+# java
+RUN mkdir -p /usr/java/default && \
+ curl -Ls 'http://download.oracle.com/otn-pub/java/jdk/8u131-b11/d54c1d3a095b4ff2b6607d096fa80163/jdk-8u131-linux-x64.tar.gz' -H 'Cookie: oraclelicense=accept-securebackup-cookie' | \
+ tar --strip-components=1 -xz -C /usr/java/default/
+
+ENV JAVA_HOME /usr/java/default
+ENV PATH $PATH:$JAVA_HOME/bin
+
+RUN curl -LOH 'Cookie: oraclelicense=accept-securebackup-cookie' 'http://download.oracle.com/otn-pub/java/jce/8/jce_policy-8.zip'
+RUN unzip jce_policy-8.zip
+RUN cp /UnlimitedJCEPolicyJDK8/local_policy.jar /UnlimitedJCEPolicyJDK8/US_export_policy.jar $JAVA_HOME/jre/lib/security
+
+ENV HADOOP_VERSION=2.8.4
--- End diff --

This potentially uses a different Hadoop version than the one flink-dist was built against.
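
A hypothetical way to keep the two in sync (the jar naming scheme below is an assumption, and `HADOOP_VERSION` would need to be declared as a build `ARG` in the Dockerfile): derive the Hadoop version from the shaded-hadoop uber jar shipped in flink-dist's `lib/` directory and feed it to the image build.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: extract the Hadoop version from the shaded uber jar
# name so the Docker image matches what flink-dist was built against.
get_hadoop_version() {
    # e.g. flink-shaded-hadoop2-uber-2.8.4-1.6.0.jar -> 2.8.4
    basename "$1" | sed -n 's/.*hadoop2-uber-\([0-9.]*\)-.*/\1/p'
}

# usage (assumed jar location and build arg):
# docker build --build-arg HADOOP_VERSION="$(get_hadoop_version "$FLINK_DIR"/lib/flink-shaded-hadoop2-uber-*.jar)" ...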



[jira] [Commented] (FLINK-8981) Add end-to-end test for running on YARN with Kerberos

2018-07-20 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-8981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16550502#comment-16550502
 ] 

ASF GitHub Bot commented on FLINK-8981:
---

Github user zentol commented on a diff in the pull request:

https://github.com/apache/flink/pull/6377#discussion_r203972431
  
--- Diff: flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/config/log4j.properties ---
@@ -0,0 +1,354 @@

+
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.

+
+
+# Define some default values that can be overridden by system properties
+hadoop.root.logger=INFO,console
+hadoop.log.dir=.
+hadoop.log.file=hadoop.log
+
+# Define the root logger to the system property "hadoop.root.logger".
+log4j.rootLogger=${hadoop.root.logger}, EventCounter
+
+# Logging Threshold
+log4j.threshold=ALL
+
+# Null Appender
+log4j.appender.NullAppender=org.apache.log4j.varia.NullAppender
+
+#
+# Rolling File Appender - cap space usage at 5gb.
+#
+hadoop.log.maxfilesize=256MB
+hadoop.log.maxbackupindex=20
+log4j.appender.RFA=org.apache.log4j.RollingFileAppender
+log4j.appender.RFA.File=${hadoop.log.dir}/${hadoop.log.file}
+
+log4j.appender.RFA.MaxFileSize=${hadoop.log.maxfilesize}
+log4j.appender.RFA.MaxBackupIndex=${hadoop.log.maxbackupindex}
+
+log4j.appender.RFA.layout=org.apache.log4j.PatternLayout
+
+# Pattern format: Date LogLevel LoggerName LogMessage
+log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
+# Debugging Pattern format
+#log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} (%F:%M(%L)) - %m%n
+
+
+#
+# Daily Rolling File Appender
+#
+
+log4j.appender.DRFA=org.apache.log4j.DailyRollingFileAppender
+log4j.appender.DRFA.File=${hadoop.log.dir}/${hadoop.log.file}
+
+# Rollover at midnight
+log4j.appender.DRFA.DatePattern=.yyyy-MM-dd
+
+log4j.appender.DRFA.layout=org.apache.log4j.PatternLayout
+
+# Pattern format: Date LogLevel LoggerName LogMessage
+log4j.appender.DRFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
+# Debugging Pattern format
+#log4j.appender.DRFA.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} (%F:%M(%L)) - %m%n
+
+
+#
+# console
+# Add "console" to rootlogger above if you want to use this
+#
+
+log4j.appender.console=org.apache.log4j.ConsoleAppender
+log4j.appender.console.target=System.err
+log4j.appender.console.layout=org.apache.log4j.PatternLayout
+log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{2}: %m%n
+
+#
+# TaskLog Appender
+#
+
+#Default values
+hadoop.tasklog.taskid=null
+hadoop.tasklog.iscleanup=false
+hadoop.tasklog.noKeepSplits=4
+hadoop.tasklog.totalLogFileSize=100
+hadoop.tasklog.purgeLogSplits=true
+hadoop.tasklog.logsRetainHours=12
+
+log4j.appender.TLA=org.apache.hadoop.mapred.TaskLogAppender
+log4j.appender.TLA.taskId=${hadoop.tasklog.taskid}
+log4j.appender.TLA.isCleanup=${hadoop.tasklog.iscleanup}
+log4j.appender.TLA.totalLogFileSize=${hadoop.tasklog.totalLogFileSize}
+
+log4j.appender.TLA.layout=org.apache.log4j.PatternLayout
+log4j.appender.TLA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
+
+#
+# HDFS block state change log from block manager
+#
+# Uncomment the following to log normal block state change
+# messages from BlockManager in NameNode.
+#log4j.logger.BlockStateChange=DEBUG
+
+#
+#Security appender
+#
+hadoop.security.logger=INFO,NullAppender
+hadoop.security.log.maxfilesize=256MB
+hadoop.security.log.maxbackupindex=20
+log4j.category.SecurityLogger=${hadoop.security.logger}
+hadoop.security.log.file=SecurityAuth-${user.name}.audit
+log4j.appender.RFA

[jira] [Commented] (FLINK-8981) Add end-to-end test for running on YARN with Kerberos

2018-07-20 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-8981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16550368#comment-16550368
 ] 

ASF GitHub Bot commented on FLINK-8981:
---

Github user aljoscha commented on the issue:

https://github.com/apache/flink/pull/6377
  
This PR adds the test to `flink-ci`: 
https://github.com/zentol/flink-ci/pull/1

This is a run on my own `flink-ci` fork where the test is run five times without issue: https://travis-ci.org/aljoscha/flink-ci/builds/405995875




[jira] [Commented] (FLINK-8981) Add end-to-end test for running on YARN with Kerberos

2018-07-20 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-8981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16550360#comment-16550360
 ] 

ASF GitHub Bot commented on FLINK-8981:
---

GitHub user aljoscha opened a pull request:

https://github.com/apache/flink/pull/6377

[FLINK-8981] Add end-to-end test for running on YARN with Kerberos

This adds a complete Docker container setup and Docker Compose file for
starting a kerberized Hadoop cluster on Docker.

The test script does the following:
 * package "build-target" Flink dist into a tarball
 * build docker container
 * start cluster using docker compose
 * upload tarball and unpack
 * modify flink-conf.yaml to use Kerberos keytab for hadoop-user
 * Run Streaming WordCount Job
 * verify results

We set an exit trap before to ensure that we shut down the docker
compose cluster at the end.

As a prerequisite, this also fixes how we resolve directories in the 
end-to-end scripts.
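
The directory-resolution prerequisite boils down to using `pwd -P` instead of `pwd`. A small self-contained illustration (the `/tmp/e2e-demo` paths are made up for the demo): plain `pwd` keeps the logical, possibly symlinked path, while `pwd -P` resolves to the physical one, which is what the end-to-end scripts need when the Flink dist is reached through a symlink.

```shell
#!/usr/bin/env bash
# Demonstrate logical vs. physical path resolution behind the hotfix.
mkdir -p /tmp/e2e-demo/real
ln -sfn /tmp/e2e-demo/real /tmp/e2e-demo/link

LOGICAL="$(cd /tmp/e2e-demo/link && pwd)"     # keeps the symlink component
PHYSICAL="$(cd /tmp/e2e-demo/link && pwd -P)" # resolves to the real directory
```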

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/aljoscha/flink jira-8981-kerberos-end-to-end-test

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/6377.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #6377


commit 5aec051a76089f623ebc21418ec5751f9fcad780
Author: Aljoscha Krettek 
Date:   2018-07-18T09:51:27Z

[hotfix] Resolve symbolic links in test scripts

commit 634426b096a36147c3180f9c732efef51155e5bb
Author: Aljoscha Krettek 
Date:   2018-07-18T11:46:29Z

[FLINK-8981] Add end-to-end test for running on YARN with Kerberos

This adds a complete Docker container setup and Docker Compose file for
starting a kerberized Hadoop cluster on Docker.

The test script does the following:
 * package "build-target" Flink dist into a tarball
 * build docker container
 * start cluster using docker compose
 * upload tarball and unpack
 * modify flink-conf.yaml to use Kerberos keytab for hadoop-user
 * Run Streaming WordCount Job
 * verify results

We set an exit trap before to ensure that we shut down the docker
compose cluster at the end.



