Re: Flink 1.9, MapR secure cluster, high availability
Hi!

Not sure what is happening here.

- I cannot understand why MapR FS should use Flink's relocated ZK dependency.
- It might be that it doesn't, and that all the logging we see probably comes from Flink's HA services. Maybe the MapR stuff uses a different logging framework and the logs do not get forwarded (there is suspiciously no MapR-related log line at all in the code).
- Then it might be that Flink is simply not set up with the correct credentials to work against ZK.
- Can you check this page here? https://ci.apache.org/projects/flink/flink-docs-stable/ops/jobmanager_high_availability.html#configuring-for-zookeeper-security

Best,
Stephan
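For orientation, the settings that the linked ZooKeeper security section refers to live in flink-conf.yaml. The fragment below is only a sketch: the option keys are Flink configuration options, but every value (hosts, paths, principal) is a placeholder, and whether Kerberos-based SASL is the right mechanism on a MapR-secured cluster is an assumption that this thread does not confirm.

```yaml
# Sketch only: all values are placeholders, not taken from the cluster in this thread.
high-availability: zookeeper
high-availability.zookeeper.quorum: zk-host-1:2181,zk-host-2:2181   # placeholder quorum
high-availability.storageDir: maprfs:///flink/ha                    # placeholder path

# Credentials for Flink's (relocated) ZooKeeper client.
security.kerberos.login.use-ticket-cache: true
security.kerberos.login.keytab: /path/to/flink.keytab               # placeholder
security.kerberos.login.principal: flink-user@EXAMPLE.COM           # placeholder
security.kerberos.login.contexts: Client                            # must include the ZK login context

# These two are the defaults; they need changing only if the quorum differs.
zookeeper.sasl.service-name: zookeeper
zookeeper.sasl.login-context-name: Client
```

If the "Failed to find any Kerberos tgt" error persists with such settings, that would suggest the quorum expects a different authentication mechanism than the one Flink's client is attempting.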
Re: Flink 1.9, MapR secure cluster, high availability
Hi Stephan,

sorry for the late answer, I didn't have access to the cluster. Here are the log and stack trace.

Hope this helps,
Maxim.

2019-09-16 18:00:31,804 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint -
2019-09-16 18:00:31,806 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Starting YarnSessionClusterEntrypoint (Version: 1.9.0, Rev:9c32ed9, Date:19.08.2019 @ 16:16:55 UTC)
2019-09-16 18:00:31,806 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - OS current user: run
2019-09-16 18:00:32,285 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Current Hadoop/Kerberos user: run
2019-09-16 18:00:32,285 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - JVM: Java HotSpot(TM) 64-Bit Server VM - Oracle Corporation - 1.8/25.152-b16
2019-09-16 18:00:32,285 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Maximum heap size: 161 MiBytes
2019-09-16 18:00:32,285 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - JAVA_HOME: /pb/apps/java/jdk1.8.0_152/
2019-09-16 18:00:32,286 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Hadoop version: 2.7.0-mapr-1808
2019-09-16 18:00:32,286 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - JVM Options:
2019-09-16 18:00:32,286 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Xms168m
2019-09-16 18:00:32,286 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Xmx168m
2019-09-16 18:00:32,286 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Dlog.file=/opt/mapr/hadoop/hadoop-2.7.0/logs/userlogs/application_1568125115996_13508/container_e64_1568125115996_13508_01_01/jobmanager.log
2019-09-16 18:00:32,286 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Dlog4j.configuration=file:log4j.properties
2019-09-16 18:00:32,286 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Program Arguments: (none)
2019-09-16 18:00:32,287 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Classpath:
log4j.properties:flink.jar:flink-conf.yaml::/opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/hadoop-common-2.7.0-mapr-1808-tests.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/hadoop-common-2.7.0-mapr-1808.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/hadoop-nfs-2.7.0-mapr-1808.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/jackson-databind-2.9.5.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/activation-1.1.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/jsr305-3.0.0.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/apacheds-i18n-2.0.0-M15.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/jersey-server-1.9.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/apacheds-kerberos-codec-2.0.0-M15.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/junit-4.11.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/api-asn1-api-1.0.0-M20.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/log4j-1.2.17.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/api-util-1.0.0-M20.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/asm-3.2.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/maprdb-6.1.0-mapr.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/avro-1.7.6.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/mockito-all-1.8.5.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/commons-beanutils-1.7.0.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/jettison-1.1.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/commons-beanutils-core-1.8.0.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/mapr-spark-yarn-shuffle.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/commons-cli-1.2.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/netty-3.6.2.Final.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/commons-codec-1.4.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/li
b/jetty-6.1.26.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/commons-collections-3.2.2.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/xz-1.0.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/commons-compress-1.4.1.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/jetty-util-6.1.26.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/commons-configuration-1.6.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/json-smart-1.2.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/commons-digester-1.8.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/maprfs-6.1.0-mapr.jar:/opt/mapr/hado
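Because the classpath line in this log is very long, a small script can help pick out the entries that matter for the ZooKeeper question. The following is a minimal sketch, not something from the thread: the function name `find_jars` and the keyword list are illustrative choices, and the sample string is a shortened stand-in built from a few entries visible in the log.

```python
from pathlib import PurePosixPath
from typing import List, Sequence


def find_jars(classpath: str, keywords: Sequence[str] = ("zookeeper", "maprfs")) -> List[str]:
    """Return the classpath entries whose file name contains any keyword."""
    hits = []
    for entry in classpath.split(":"):
        # Compare only the file name, so directory names do not cause false matches.
        name = PurePosixPath(entry).name.lower()
        if any(keyword in name for keyword in keywords):
            hits.append(entry)
    return hits


# Shortened, illustrative classpath assembled from entries that appear in the log.
sample = (
    "log4j.properties:flink.jar:flink-conf.yaml::"
    "/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/maprfs-6.1.0-mapr.jar:"
    "/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/commons-cli-1.2.jar"
)

for jar in find_jars(sample):
    print(jar)
```

Note that such a scan only finds standalone jars. A relocated ZooKeeper client packaged inside flink.jar (as the `org.apache.flink.shaded.zookeeper` logger names suggest) would not show up as a separate entry.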
Re: Flink 1.9, MapR secure cluster, high availability
Could you share the stack trace where the failure occurs, so we can see why the Flink ZK is used during MapR FS access?

/CC Till and Tison - just FYI
Re: Flink 1.9, MapR secure cluster, high availability
Hi Stephan,

With previous versions (I tried around 1.7), I always had to compile Flink with MapR Hadoop to get it working. With 1.9 I took the Hadoop-less Flink, which worked with MapR FS until I switched on HA. So it is hard to say whether this is a regression or not.

The error happens when Flink tries to initialize BLOB storage on MapR FS. Without HA it takes ZooKeeper from the classpath (MapR org.apache.zookeeper), and with HA it takes the shaded one.

After fixing a couple of issues with the pom, I was able to compile Flink with MapR ZooKeeper, and now when I start in HA mode it uses the shaded ZooKeeper (which is now MapR's) to initialize BLOB storage and org.apache.zookeeper (which is MapR's as well) for HA recovery.

It works, but I was expecting it to work without compiling in the MapR dependencies.

Hope this helps,
Maxim.
Re: Flink 1.9, MapR secure cluster, high availability
Hi Maxim!

The change of the MapR dependency should not have an impact on that. Do you know if the same thing worked in prior Flink versions? Is that a regression in 1.9?

The exception that you report, is that from Flink's HA services trying to connect to ZK, or from the MapR FS client trying to connect to ZK?

Best,
Stephan

On Tue, Aug 27, 2019 at 11:03 AM Maxim Parkachov wrote:

> Hi everyone,
>
> I'm testing release 1.9 on a MapR secure cluster. I took the Flink binaries from the download page and am trying to start a YARN session cluster. All MapR-specific libraries and configs are added according to the documentation.
>
> When I start yarn-session without high availability, it uses ZooKeeper from the MapR distribution (org.apache.zookeeper), connects to the cluster correctly, and access to maprfs works as expected.
>
> But if I add ZooKeeper as the high-availability option, instead of the MapR ZooKeeper it tries to use the shaded ZooKeeper, and this one cannot connect with the MapR credentials:
>
> 2019-08-27 10:42:45,240 ERROR org.apache.flink.shaded.zookeeper.org.apache.zookeeper.client.ZooKeeperSaslClient - An error: (java.security.PrivilegedActionException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]) occurred when evaluating Zookeeper Quorum Member's received SASL token. Zookeeper Client will go to AUTH_FAILED state.
> 2019-08-27 10:42:45,240 ERROR org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn - SASL authentication with Zookeeper Quorum member failed: javax.security.sasl.SaslException: An error: (java.security.PrivilegedActionException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]) occurred when evaluating Zookeeper Quorum Member's received SASL token. Zookeeper Client will go to AUTH_FAILED state.
>
> I tried to use a separate ZooKeeper cluster for HA, but maprfs still doesn't work.
>
> Is this related to the removal of MapR-specific settings in release 1.9? Should I still compile a custom version of Flink with MapR dependencies? (I'm trying to do that now, but getting some errors during compilation.)
>
> Can I somehow force Flink to use the MapR ZooKeeper even in HA mode?
>
> Thanks in advance,
> Maxim.
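One rough way to approach the question of which ZooKeeper client produced the errors is to look at the logger name in each log line: Flink's relocated client logs under `org.apache.flink.shaded.zookeeper.`, while a client taken from the classpath logs under `org.apache.zookeeper.`. The sketch below assumes the log4j layout shown in the messages above; the function name and return labels are made up for illustration.

```python
SHADED_PREFIX = "org.apache.flink.shaded.zookeeper."
PLAIN_PREFIX = "org.apache.zookeeper."


def zk_client_origin(log_line: str) -> str:
    """Classify a log line by the package of the first ZooKeeper logger or class name in it."""
    for token in log_line.split():
        # Check the shaded prefix first; the two prefixes do not overlap at the
        # start of a token, but the order documents the intent.
        if token.startswith(SHADED_PREFIX):
            return "flink-shaded"
        if token.startswith(PLAIN_PREFIX):
            return "classpath"
    return "unknown"


# First line is shortened from the report above; the second is a hypothetical
# example of what an unshaded client's log line could look like.
reported = (
    "2019-08-27 10:42:45,240 ERROR "
    "org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn - "
    "SASL authentication with Zookeeper Quorum member failed"
)
hypothetical = "2019-08-27 10:42:45,240 INFO org.apache.zookeeper.ClientCnxn - Opening socket connection"

print(zk_client_origin(reported))
print(zk_client_origin(hypothetical))
```

The logger name only shows which copy of the ZooKeeper classes emitted the message, not who called it. Telling whether the caller was Flink's HA services or the MapR FS client still needs the stack trace.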