Re: Flink 1.9, MapR secure cluster, high availability

2019-09-19 Thread Stephan Ewen
Hi!

Not sure what is happening here.

  - I cannot understand why MapR FS would use Flink's relocated ZK
dependency.
  - It might be that it doesn't, and that all the logging we see comes
from Flink's HA services. Maybe the MapR code uses a different logging
framework and its logs do not get forwarded (suspiciously, there is not a
single MapR-related log line at all).
  - In that case, Flink may simply not be set up with the correct
credentials to work against ZK.
  - Can you check this page here?
https://ci.apache.org/projects/flink/flink-docs-stable/ops/jobmanager_high_availability.html#configuring-for-zookeeper-security
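The options on that page have to agree with each other before Flink's own (shaded) ZooKeeper client can log in with Kerberos. As a rough pre-flight check, a sketch like the one below could read the simple "key: value" lines of a flink-conf.yaml and flag two likely gaps: the ZooKeeper login context not being listed in security.kerberos.login.contexts, and no Kerberos credentials at all. The option names are Flink configuration keys; the defaults hard-coded here ("Client" as the login context name, ticket cache enabled) are my reading of the 1.9 defaults, and the class name and the line-based parsing are hypothetical, not Flink's YAML loader. It knows nothing about MapR's own ticket-based security, which the MapR ZooKeeper may expect instead of Kerberos.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical pre-flight check for the ZooKeeper/Kerberos options in flink-conf.yaml. */
public class ZkSecurityConfigCheck {

    /** Parses flat "key: value" lines only; comments and blank lines are skipped. */
    static Map<String, String> parse(String flinkConf) {
        Map<String, String> conf = new HashMap<>();
        for (String raw : flinkConf.split("\n")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            int colon = line.indexOf(':'); // first colon separates key and value
            if (colon <= 0) {
                continue;
            }
            conf.put(line.substring(0, colon).trim(), line.substring(colon + 1).trim());
        }
        return conf;
    }

    static List<String> check(Map<String, String> conf) {
        List<String> problems = new ArrayList<>();
        if (!"zookeeper".equalsIgnoreCase(conf.getOrDefault("high-availability", "NONE"))) {
            return problems; // ZooKeeper HA is off, so the shaded ZK client is not used for HA
        }
        if (Boolean.parseBoolean(conf.getOrDefault("zookeeper.sasl.disable", "false"))) {
            return problems; // SASL towards ZooKeeper is explicitly disabled
        }
        // Assumed 1.9 default for the JAAS section used by the ZooKeeper client.
        String contextName = conf.getOrDefault("zookeeper.sasl.login-context-name", "Client");
        List<String> contexts = Arrays.asList(
            conf.getOrDefault("security.kerberos.login.contexts", "").split("\\s*,\\s*"));
        if (!contexts.contains(contextName)) {
            problems.add("security.kerberos.login.contexts does not list '" + contextName + "'");
        }
        boolean hasKeytab = conf.containsKey("security.kerberos.login.keytab")
            && conf.containsKey("security.kerberos.login.principal");
        // Assumed 1.9 default: the ticket cache is used unless switched off.
        boolean useTicketCache = Boolean.parseBoolean(
            conf.getOrDefault("security.kerberos.login.use-ticket-cache", "true"));
        if (!hasKeytab && !useTicketCache) {
            problems.add("no Kerberos credentials: set keytab + principal or enable the ticket cache");
        }
        return problems;
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
            "high-availability: zookeeper",
            "high-availability.zookeeper.quorum: zk1:2181",
            "security.kerberos.login.use-ticket-cache: false");
        check(parse(sample)).forEach(p -> System.out.println("WARN: " + p));
    }
}
```

Running main on the three-line sample prints two WARN lines, one for each gap. An empty result only means these options are consistent, not that the credentials themselves are valid.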

Best,
Stephan


Re: Flink 1.9, MapR secure cluster, high availability

2019-09-16 Thread Maxim Parkachov
Hi Stephan,

Sorry for the late answer; I didn't have access to the cluster.
Here are the log and the stack trace.

Hope this helps,
Maxim.

-
2019-09-16 18:00:31,804 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint -
2019-09-16 18:00:31,806 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint -  Starting YarnSessionClusterEntrypoint (Version: 1.9.0, Rev:9c32ed9, Date:19.08.2019 @ 16:16:55 UTC)
2019-09-16 18:00:31,806 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint -  OS current user: run
2019-09-16 18:00:32,285 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint -  Current Hadoop/Kerberos user: run
2019-09-16 18:00:32,285 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint -  JVM: Java HotSpot(TM) 64-Bit Server VM - Oracle Corporation - 1.8/25.152-b16
2019-09-16 18:00:32,285 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint -  Maximum heap size: 161 MiBytes
2019-09-16 18:00:32,285 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint -  JAVA_HOME: /pb/apps/java/jdk1.8.0_152/
2019-09-16 18:00:32,286 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint -  Hadoop version: 2.7.0-mapr-1808
2019-09-16 18:00:32,286 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint -  JVM Options:
2019-09-16 18:00:32,286 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint -     -Xms168m
2019-09-16 18:00:32,286 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint -     -Xmx168m
2019-09-16 18:00:32,286 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint -     -Dlog.file=/opt/mapr/hadoop/hadoop-2.7.0/logs/userlogs/application_1568125115996_13508/container_e64_1568125115996_13508_01_01/jobmanager.log
2019-09-16 18:00:32,286 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint -     -Dlog4j.configuration=file:log4j.properties
2019-09-16 18:00:32,286 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint -  Program Arguments: (none)
2019-09-16 18:00:32,287 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint -  Classpath:
log4j.properties:flink.jar:flink-conf.yaml::/opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/hadoop-common-2.7.0-mapr-1808-tests.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/hadoop-common-2.7.0-mapr-1808.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/hadoop-nfs-2.7.0-mapr-1808.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/jackson-databind-2.9.5.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/activation-1.1.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/jsr305-3.0.0.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/apacheds-i18n-2.0.0-M15.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/jersey-server-1.9.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/apacheds-kerberos-codec-2.0.0-M15.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/junit-4.11.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/api-asn1-api-1.0.0-M20.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/log4j-1.2.17.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/api-util-1.0.0-M20.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/asm-3.2.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/maprdb-6.1.0-mapr.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/avro-1.7.6.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/mockito-all-1.8.5.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/commons-beanutils-1.7.0.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/jettison-1.1.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/commons-beanutils-core-1.8.0.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/mapr-spark-yarn-shuffle.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/commons-cli-1.2.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/netty-3.6.2.Final.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/commons-codec-1.4.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/jetty-6.1.26.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/commons-collections-3.2.2.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/xz-1.0.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/commons-compress-1.4.1.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/jetty-util-6.1.26.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/commons-configuration-1.6.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/json-smart-1.2.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/commons-digester-1.8.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/maprfs-6.1.0-mapr.jar:/opt/mapr/hado

Re: Flink 1.9, MapR secure cluster, high availability

2019-08-30 Thread Stephan Ewen
Could you share the stack trace where the failure occurs, so we can see why
the Flink ZK is used during MapR FS access?

/CC Till and Tison - just FYI


Re: Flink 1.9, MapR secure cluster, high availability

2019-08-30 Thread Maxim Parkachov
Hi Stephan,

With previous versions (I tried around 1.7), I always had to compile Flink
with MapR Hadoop to get it working.
With 1.9 I took the Hadoop-free Flink, which worked with MapR FS until I
switched on HA.
So it is hard to say whether this is a regression or not.

The error happens when Flink tries to initialize the BLOB storage on MapR FS.
Without HA it takes ZooKeeper from the classpath (MapR's org.apache.zookeeper);
with HA it takes the shaded one.

After fixing a couple of issues with the pom, I was able to compile Flink
with MapR ZooKeeper. Now, when I start in HA mode, it uses the shaded
ZooKeeper (which is now MapR's) to initialize the BLOB storage and
org.apache.zookeeper (also MapR's) for HA recovery.
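To see which ZooKeeper client classes a JVM can actually load, and from which jar, a small probe like the following could be run with the same classpath as the JobManager (for example the one printed in the JobManager log). It is a sketch: org.apache.zookeeper.ZooKeeper is the standard client class, while the shaded name is only inferred from the org.apache.flink.shaded.zookeeper prefix in the error messages in this thread, and the class name ZkClasspathProbe is hypothetical.

```java
import java.net.URL;
import java.security.CodeSource;

/** Hypothetical probe: reports where each ZooKeeper client class would be loaded from. */
public class ZkClasspathProbe {

    static final String[] CANDIDATES = {
        "org.apache.zookeeper.ZooKeeper",                                   // unshaded, e.g. MapR's
        "org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper"  // relocated copy in Flink (inferred name)
    };

    static String locate(String className) {
        try {
            // initialize=false: load the class without running its static initializers
            Class<?> cls = Class.forName(className, false, ZkClasspathProbe.class.getClassLoader());
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            URL location = (source == null) ? null : source.getLocation();
            return (location == null) ? "loaded (location unknown)" : location.toString();
        } catch (ClassNotFoundException e) {
            return "not on classpath";
        } catch (LinkageError e) {
            return "present but not loadable: " + e;
        }
    }

    public static void main(String[] args) {
        for (String name : CANDIDATES) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

If both names resolve to jars, the remaining question is which one a given caller links against, and the package names in the stack trace answer that.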

It works, but I was expecting it to work without compiling in the MapR
dependencies.

Hope this helps,
Maxim.


Re: Flink 1.9, MapR secure cluster, high availability

2019-08-29 Thread Stephan Ewen
Hi Maxim!

The change of the MapR dependency should not have an impact on that.
Do you know if the same thing worked in prior Flink versions? Is that a
regression in 1.9?

Does the exception you report come from Flink's HA services trying to
connect to ZK, or from the MapR FS client trying to connect to ZK?
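The logger name in each error line already shows which ZooKeeper client produced it, though not which caller opened the connection. For triaging longer logs, a sketch like the one below could classify log4j lines by logger prefix; the regular expression assumes the "timestamp LEVEL logger - message" layout seen in this thread, and the class name is hypothetical. It cannot tell Flink's HA services apart from the MapR FS client, so the stack trace is still needed for that.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: tells which ZooKeeper client logged a line, by logger prefix. */
public class ZkLogOrigin {

    // A log4j level followed by the logger name, e.g. "ERROR org.apache...ClientCnxn  - ..."
    private static final Pattern LEVEL_AND_LOGGER =
        Pattern.compile("\\b(FATAL|ERROR|WARN|INFO|DEBUG|TRACE)\\s+(\\S+)");

    static String origin(String logLine) {
        Matcher m = LEVEL_AND_LOGGER.matcher(logLine);
        if (!m.find()) {
            return "unknown";
        }
        String logger = m.group(2);
        if (logger.startsWith("org.apache.flink.shaded.zookeeper.")) {
            return "Flink's shaded ZooKeeper client";
        }
        if (logger.startsWith("org.apache.zookeeper.")) {
            return "unshaded ZooKeeper client";
        }
        return "other: " + logger;
    }

    public static void main(String[] args) {
        String line = "2019-08-27 10:42:45,240 ERROR "
            + "org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn  - SASL "
            + "authentication with Zookeeper Quorum member failed";
        System.out.println(origin(line));
    }
}
```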

Best,
Stephan


On Tue, Aug 27, 2019 at 11:03 AM Maxim Parkachov wrote:

> Hi everyone,
>
> I'm testing release 1.9 on a MapR secure cluster. I took the Flink binaries
> from the download page and am trying to start a YARN session cluster. All
> MapR-specific libraries and configs are added according to the documentation.
>
> When I start yarn-session without high availability, it uses ZooKeeper
> from the MapR distribution (org.apache.zookeeper), correctly connects to
> the cluster, and access to maprfs works as expected.
>
> But if I add ZooKeeper as the high-availability option, instead of the MapR
> ZooKeeper it tries to use the shaded ZooKeeper, and that one cannot connect
> with MapR credentials:
>
> 2019-08-27 10:42:45,240 ERROR org.apache.flink.shaded.zookeeper.org.apache.zookeeper.client.ZooKeeperSaslClient  - An error: (java.security.PrivilegedActionException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]) occurred when evaluating Zookeeper Quorum Member's  received SASL token. Zookeeper Client will go to AUTH_FAILED state.
> 2019-08-27 10:42:45,240 ERROR org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn  - SASL authentication with Zookeeper Quorum member failed: javax.security.sasl.SaslException: An error: (java.security.PrivilegedActionException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]) occurred when evaluating Zookeeper Quorum Member's  received SASL token. Zookeeper Client will go to AUTH_FAILED state.
>
> I tried to use a separate ZooKeeper cluster for HA, but maprfs still doesn't
> work.
>
> Is this related to the removal of MapR-specific settings in release 1.9?
> Should I still compile a custom version of Flink with MapR dependencies?
> (I am trying to do that now, but I get some errors during compilation.)
>
> Can I somehow force Flink to use the MapR ZooKeeper even in HA mode?
>
> Thanks in advance,
> Maxim.
>
>