Alexey and I looked at this today and realized the issue is with OpenSSL
1.1.1, which adds support for TLS 1.3. This breaks the TLS negotiation in
the krpc library. Likely Impala's usage of krpc would also break in this
environment when wire encryption is enabled. I put up a temporary fix
(disable TLS 1.3) here: http://gerrit.cloudera.org:8080/13683

Likely we need to cross-port this to Impala's krpc copy as well.
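
For illustration only (this is not the patch in the Gerrit change above, and
it is not krpc code): a minimal sketch of what "disable TLS 1.3" amounts to,
written against Python's standard ssl module rather than the C++ OpenSSL API
that krpc uses. The helper name make_tls12_capped_context is hypothetical.
The idea is to cap the negotiated protocol version at TLS 1.2, so that an
OpenSSL 1.1.1 build falls back to the handshake flow the existing negotiation
code expects.

```python
import ssl


def make_tls12_capped_context() -> ssl.SSLContext:
    """Build a client-side TLS context that will not negotiate TLS 1.3.

    Hypothetical helper for illustration; the real fix lives in krpc's
    C++ TLS context setup, which is not reproduced here.
    """
    # PROTOCOL_TLS_CLIENT negotiates the highest version both peers support.
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    # Cap the maximum version at TLS 1.2, which excludes TLS 1.3.
    ctx.maximum_version = ssl.TLSVersion.TLSv1_2
    return ctx


ctx = make_tls12_capped_context()
```

With OpenSSL's C API the analogous knob is, as far as I know, the
SSL_OP_NO_TLSv1_3 option passed to SSL_CTX_set_options; treat that as an
assumption about the shape of the fix, not a description of the actual change.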

-Todd

On Wed, Jun 19, 2019 at 3:30 PM Alexey Serbin <aser...@cloudera.com> wrote:

> Yep, some time ago over a weekend I started with an attempt to get a
> Fedora 29 machine, but I got stuck while trying to provision it.
> I.e., the machine was eventually provisioned, but I could not
> access it.  That was where I left it.
>
> Having Ubuntu 18 as a target machine is better since at least it's easier
> for me to create one.  I've provisioned one already and I'm starting a Kudu
> build there in an attempt to take a look at the issue later tonight.
>
> I'll keep you posted on my findings.
>
>
> Kind regards,
>
> Alexey
>
> On Wed, Jun 19, 2019 at 2:53 PM Todd Lipcon <t...@cloudera.com> wrote:
>
>> This same issue was reported a month or two ago for Kudu on Fedora 29. I
>> think Alexey Serbin had started to look into it. Alexey, did we figure out
>> what was going on here?
>>
>> -Todd
>>
>> On Wed, Jun 19, 2019 at 6:00 AM Laszlo Gaal <laszlo.g...@cloudera.com>
>> wrote:
>>
>>> Having looked at the failing build Jim quoted above, the failure seems to
>>> come from the security area.
>>> This is from the Kudu master's log, from the startup sequence (see
>>>
>>> https://jenkins.impala.io/job/ubuntu-18.04-from-scratch/16/artifact/Impala/logs_static/logs/cluster/cdh6-node-1/kudu/master/kudu-master.INFO/*view*/
>>> ),
>>> all this in the context of an Impala minicluster:
>>>
>>> I0612 04:12:56.129866  8515 sys_catalog.cc:424] T
>>> 00000000000000000000000000000000 P 58a05ce6efa74b30907ac4d679bd0515
>>> [sys.catalog]: configured and running, proceeding with master startup.
>>> W0612 04:12:56.130080  8522 catalog_manager.cc:1113] T
>>> 00000000000000000000000000000000 P 58a05ce6efa74b30907ac4d679bd0515:
>>> acquiring CA information for follower catalog manager: Not found: root CA
>>> entry not found
>>> W0612 04:12:56.130123  8522 catalog_manager.cc:596] Not found: root CA
>>> entry not found: failed to prepare follower catalog manager, will retry
>>> I0612 04:12:56.130151  8521 catalog_manager.cc:1055] Loading table and
>>> tablet metadata into memory...
>>> I0612 04:12:56.130228  8521 catalog_manager.cc:1066] Initializing Kudu
>>> internal certificate authority...
>>> W0612 04:12:56.167639  8636 negotiation.cc:320] Unauthorized connection
>>> attempt: Server connection negotiation failed: server connection from
>>> 127.0.0.1:50174: expected TLS_HANDSHAKE step: SASL_INITIATE
>>> W0612 04:12:56.170145  8636 negotiation.cc:320] Unauthorized connection
>>> attempt: Server connection negotiation failed: server connection from
>>> 127.0.0.1:50176: expected TLS_HANDSHAKE step: SASL_INITIATE
>>> W0612 04:12:56.172571  8636 negotiation.cc:320] Unauthorized connection
>>> attempt: Server connection negotiation failed: server connection from
>>> 127.0.0.1:50178: expected TLS_HANDSHAKE step: SASL_INITIATE
>>> W0612 04:12:56.182530  8636 negotiation.cc:320] Unauthorized connection
>>> attempt: Server connection negotiation failed: server connection from
>>> 127.0.0.1:50180: expected TLS_HANDSHAKE step: SASL_INITIATE
>>> W0612 04:12:56.185034  8636 negotiation.cc:320] Unauthorized connection
>>> attempt: Server connection negotiation failed: server connection from
>>> 127.0.0.1:50182: expected TLS_HANDSHAKE step: SASL_INITIATE
>>> W0612 04:12:56.187453  8636 negotiation.cc:320] Unauthorized connection
>>> attempt: Server connection negotiation failed: server connection from
>>> 127.0.0.1:50184: expected TLS_HANDSHAKE step: SASL_INITIATE
>>> I0612 04:12:56.197146  8521 catalog_manager.cc:950] Generated new
>>> certificate authority record
>>> I0612 04:12:56.198005  8521 catalog_manager.cc:1075] Loading token
>>> signing
>>> keys...
>>> W0612 04:12:56.293697  8636 negotiation.cc:320] Unauthorized connection
>>> attempt: Server connection negotiation failed: server connection from
>>> 127.0.0.1:50186: expected TLS_HANDSHAKE step: SASL_INITIATE
>>> W0612 04:12:56.295320  8636 negotiation.cc:320] Unauthorized connection
>>> attempt: Server connection negotiation failed: server connection from
>>> 127.0.0.1:50188: expected TLS_HANDSHAKE step: SASL_INITIATE
>>> W0612 04:12:56.296821  8636 negotiation.cc:320] Unauthorized connection
>>> attempt: Server connection negotiation failed: server connection from
>>> 127.0.0.1:50190: expected TLS_HANDSHAKE step: SASL_INITIATE
>>> I0612 04:12:56.416918  8521 catalog_manager.cc:4292] T
>>> 00000000000000000000000000000000 P 58a05ce6efa74b30907ac4d679bd0515:
>>> Generated new TSK 0
>>> W0612 04:12:57.174684  8901 negotiation.cc:320] Unauthorized connection
>>> attempt: Server connection negotiation failed: server connection from
>>> 127.0.0.1:50192: expected TLS_HANDSHAKE step: SASL_INITIATE
>>> [and so on...]
>>>
>>> The same run has very similar messages in the tablet server logs as well:
>>> 0612 04:12:56.289767  8396 rpc_server.cc:205] RPC server started. Bound
>>> to:
>>> 127.0.0.1:31202
>>> I0612 04:12:56.289903  8396 webserver.cc:308] Webserver started at
>>> http://0.0.0.0:31302/ using document root
>>>
>>> /home/ubuntu/Impala/toolchain/cdh_components-1137441/kudu-1.10.0-cdh6.x-SNAPSHOT/release/bin/../lib/kudu/www
>>> and password file <none>
>>> W0612 04:12:56.293773  8897 heartbeater.cc:587] Failed to heartbeat to
>>> 127.0.0.1:7051 (0 consecutive failures): Not authorized: Failed to ping
>>> master at 127.0.0.1:7051: Client connection negotiation failed: client
>>> connection to 127.0.0.1:7051: FATAL_UNAUTHORIZED: Not authorized:
>>> expected
>>> TLS_HANDSHAKE step: SASL_INITIATE
>>> W0612 04:12:56.296866  8897 heartbeater.cc:380] Failed 3 heartbeats in a
>>> row: no longer allowing fast heartbeat attempts.
>>> W0612 04:13:56.424613  8897 heartbeater.cc:587] Failed to heartbeat to
>>> 127.0.0.1:7051 (62 consecutive failures): Not authorized: Failed to ping
>>> master at 127.0.0.1:7051: Client connection negotiation failed: client
>>> connection to 127.0.0.1:7051: FATAL_UNAUTHORIZED: Not authorized:
>>> expected
>>> TLS_HANDSHAKE step: SASL_INITIATE
>>> W0612 04:14:56.556850  8897 heartbeater.cc:587] Failed to heartbeat to
>>> 127.0.0.1:7051 (122 consecutive failures): Not authorized: Failed to
>>> ping
>>> master at 127.0.0.1:7051: Client connection negotiation failed: client
>>> connection to 127.0.0.1:7051: FATAL_UNAUTHORIZED: Not authorized:
>>> expected
>>> TLS_HANDSHAKE step: SASL_INITIATE
>>> W0612 04:15:56.694403  8897 heartbeater.cc:587] Failed to heartbeat to
>>> 127.0.0.1:7051 (182 consecutive failures): Not authorized: Failed to
>>> ping
>>> master at 127.0.0.1:7051: Client connection negotiation failed: client
>>> connection to 127.0.0.1:7051: FATAL_UNAUTHORIZED: Not authorized:
>>> expected
>>> TLS_HANDSHAKE step: SASL_INITIATE
>>> W0612 04:16:56.826400  8897 heartbeater.cc:587] Failed to heartbeat to
>>> 127.0.0.1:7051 (242 consecutive failures): Not authorized: Failed to
>>> ping
>>> master at 127.0.0.1:7051: Client connection negotiation failed: client
>>> connection to 127.0.0.1:7051: FATAL_UNAUTHORIZED: Not authorized:
>>> expected
>>> TLS_HANDSHAKE step: SASL_INITIATE
>>> W0612 04:17:56.955927  8897 heartbeater.cc:587] Failed to heartbeat to
>>> 127.0.0.1:7051 (302 consecutive failures): Not authorized: Failed to
>>> ping
>>> master at 127.0.0.1:7051: Client connection negotiation failed: client
>>> connection to 127.0.0.1:7051: FATAL_UNAUTHORIZED: Not authorized:
>>> expected
>>> TLS_HANDSHAKE step: SASL_INITIATE
>>> W0612 04:18:57.103503  8897 heartbeater.cc:587] Failed to heartbeat to
>>> 127.0.0.1:7051 (362 consecutive failures): Not authorized: Failed to
>>> ping
>>> master at 127.0.0.1:7051: Client connection negotiation failed: client
>>> connection to 127.0.0.1:7051: FATAL_UNAUTHORIZED: Not authorized:
>>> expected
>>> TLS_HANDSHAKE step: SASL_INITIATE
>>> W0612 04:19:57.237712  8897 heartbeater.cc:587] Failed to heartbeat to
>>> 127.0.0.1:7051 (422 consecutive failures): Not authorized: Failed to
>>> ping
>>> master at 127.0.0.1:7051: Client connection negotiation failed: client
>>> connection to 127.0.0.1:7051: FATAL_UNAUTHORIZED: Not authorized:
>>> expected
>>> TLS_HANDSHAKE step: SASL_INITIATE
>>> W0612 04:20:57.393489  8897 heartbeater.cc:587] Failed to heartbeat to
>>> 127.0.0.1:7051 (482 consecutive failures): Not authorized: Failed to
>>> ping
>>> master at 127.0.0.1:7051: Client connection negotiation failed: client
>>> connection to 127.0.0.1:7051: FATAL_UNAUTHORIZED: Not authorized:
>>> expected
>>> TLS_HANDSHAKE step: SASL_INITIATE
>>> W0612 04:21:57.522513  8897 heartbeater.cc:587] Failed to heartbeat to
>>> 127.0.0.1:7051 (542 consecutive failures): Not authorized: Failed to
>>> ping
>>> master at 127.0.0.1:7051: Client connection negotiation failed: client
>>> connection to 127.0.0.1:7051: FATAL_UNAUTHORIZED: Not authorized:
>>> expected
>>> TLS_HANDSHAKE step: SASL_INITIATE
>>> W0612 04:22:57.652271  8897 heartbeater.cc:587] Failed to heartbeat to
>>> 127.0.0.1:7051 (602 consecutive failures): Not authorized: Failed to
>>> ping
>>> master at 127.0.0.1:7051: Client connection negotiation failed: client
>>> connection to 127.0.0.1:7051: FATAL_UNAUTHORIZED: Not authorized:
>>> expected
>>> TLS_HANDSHAKE step: SASL_INITIATE
>>> W0612 04:23:57.782537  8897 heartbeater.cc:587] Failed to heartbeat to
>>> 127.0.0.1:7051 (662 consecutive failures): Not authorized: Failed to
>>> ping
>>> master at 127.0.0.1:7051: Client connection negotiation failed: client
>>> connection to 127.0.0.1:7051: FATAL_UNAUTHORIZED: Not authorized:
>>> expected
>>> TLS_HANDSHAKE step: SASL_INITIATE
>>> W0612 04:24:57.910481  8897 heartbeater.cc:587] Failed to heartbeat to
>>> 127.0.0.1:7051 (722 consecutive failures): Not authorized: Failed to
>>> ping
>>> master at 127.0.0.1:7051: Client connection negotiation failed: client
>>> connection to 127.0.0.1:7051: FATAL_UNAUTHORIZED: Not authorized:
>>> expected
>>> TLS_HANDSHAKE step: SASL_INITIATE
>>>
>>>
>>> On Mon, Jun 17, 2019 at 9:08 PM Todd Lipcon <t...@cloudera.com> wrote:
>>>
>>> > On Sat, Jun 15, 2019 at 2:20 PM Jim Apple <apa...@jbapple.com> wrote:
>>> >
>>> > > My goal is to have Impala keep up with (what I perceive to be) the
>>> most
>>> > > popular version of the most popular Linux distribution, for the
>>> purpose
>>> > of
>>> > > easing the workflow of developers, especially new developers.
>>> > >
>>> >
>>> > Sure, that makes sense. I use Ubuntu 18 myself, but tend to develop
>>> Impala
>>> > on a remote box running el7 because the dev environment is too
>>> heavy-weight
>>> > to realistically run on my laptop.
>>> >
>>> >
>>> > >
>>> > > 18.04 stopped being able to load data some time between June 9th and
>>> > > https://jenkins.impala.io/job/ubuntu-18.04-from-scratch/14/ and
>>> June 12
>>> > > and
>>> > >
>>> > >
>>> >
>>> https://jenkins.impala.io/job/ubuntu-18.04-from-scratch/16/artifact/Impala/logs_static/logs/data_loading/catalogd.ERROR/*view*/
>>> > > .
>>> > > I tried reproducing the June 9 run with the same git checkouts
>>> (Impala
>>> > and
>>> > > Impala-LZO) as #14 today, and data loading still failed.
>>> > >
>>> > > What RHEL 7 components did you have in mind that are closer to Ubuntu
>>> > 16.04
>>> > > than 18.04?
>>> > >
>>> >
>>> > Stuff like libc, openssl, krb5, sasl, etc are pretty different
>>> > version-wise. At least, I know when we made Kudu pass tests on Ubuntu
>>> 18,
>>> > we dealt with issues mostly in those libraries, which aren't part of
>>> the
>>> > toolchain (for security reasons we rely on OS-provided libs).
>>> >
>>> > Generally I think precommit running on something closer to the oldest
>>> > supported OS is better than running on the newest, since it's more
>>> likely
>>> > that new OSes are backward-compatible. Otherwise it's very easy to
>>> > introduce code that uses features not available on el7, for example.
>>> >
>>> >
>>> > >
>>> > > On Wed, May 22, 2019 at 10:41 AM Todd Lipcon <t...@cloudera.com>
>>> wrote:
>>> > >
>>> > > > On Mon, May 20, 2019 at 8:36 PM Jim Apple <apa...@jbapple.com>
>>> wrote:
>>> > > >
>>> > > > > Maybe now would be a good time to implement Everblue jobs that
>>> ping
>>> > > dev@
>>> > > > > when they fail. Thoughts?
>>> > > > >
>>> > > >
>>> > > > Mixed feelings on that. We already get many test runs per day of
>>> the
>>> > > > "default" config because people are running precommit builds.
>>> Adding an
>>> > > > additional cron-based job to the mix that runs the same builds
>>> doesn't
>>> > > seem
>>> > > > like it adds much unless it tests some other config (eg Ubuntu 18
>>> or a
>>> > > > longer suite of tests). One thing I could get on board with would
>>> be
>>> > > > switching the precommit builds to run just "core" tests or some
>>> other
>>> > > > faster subset, and defer the exhaustive/long runs to scheduled
>>> builds
>>> > or
>>> > > as
>>> > > > an optional precommit for particularly invasive patches. I think
>>> that
>>> > > would
>>> > > > increase dev quality of life substantially (I find my productivity
>>> is
>>> > > often
>>> > > > hampered by only getting two shots at a precommit run per work
>>> day).
>>> > > >
>>> > > > I'm not against adding a cron-triggered full test/build on Ubuntu
>>> 18,
>>> > but
>>> > > > would like to know if someone plans to sign up to triage it when it
>>> > > fails.
>>> > > > My experience with other Apache communities is that collective
>>> > ownership
>>> > > > over test triage duty (ie "email the dev list on failure") doesn't
>>> > work. I
>>> > > > seem to recall we had such builds back in 2010 or so on Hadoop and
>>> they
>>> > > > just always got ignored. In various "day job" teams I've seen this
>>> work
>>> > > via
>>> > > > a prescriptive rotation ("all team members take a triage/build-cop
>>> > > shift")
>>> > > > but that's not really compatible with the nature of Apache projects
>>> > being
>>> > > > volunteer communities.
>>> > > >
>>> > > > So, I think I'll put the question back to you: as a committer you
>>> can
>>> > > spend
>>> > > > your time as you like. If you think an Ubuntu 18 job running on a
>>> > > schedule
>>> > > > would be useful and are willing to sign up to triage failures, sounds
>>> great
>>> > > to
>>> > > > me :) Personally I don't develop on Ubuntu 18 and in my day job
>>> it's
>>> > not
>>> > > a
>>> > > > particularly important deployment platform, so I personally don't
>>> think
>>> > > > I'll spend much time triaging that build.
>>> > > >
>>> > > > Todd
>>> > > >
>>> > > >
>>> > > > >
>>> > > > > On Mon, May 20, 2019 at 9:09 AM Todd Lipcon <t...@cloudera.com>
>>> > wrote:
>>> > > > >
>>> > > > > > Adding a build-only job for 18.04 makes sense to me. A full
>>> test
>>> > run
>>> > > on
>>> > > > > > every precommit seems a bit expensive but doing one once a
>>> week or
>>> > > > > > something like that might be a good idea to prevent runtime
>>> > > > regressions.
>>> > > > > >
>>> > > > > > As for switching the precommit from 16.04 to 18.04, I'd lean
>>> > towards
>>> > > > > > keeping to 16.04 due to it being closer in terms of component
>>> > > versions
>>> > > > to
>>> > > > > > common enterprise distros like RHEL 7.
>>> > > > > >
>>> > > > > > -Todd
>>> > > > > >
>>> > > > > > On Sun, May 19, 2019 at 5:03 PM Jim Apple <jbap...@apache.org>
>>> > > wrote:
>>> > > > > >
>>> > > > > > > HEAD now passes on Ubuntu 18.04:
>>> > > > > > >
>>> > > > > > > https://jenkins.impala.io/job/ubuntu-18.04-from-scratch/
>>> > > > > > >
>>> > > > > > > Thanks to the community members who have made this happen!
>>> > > > > > >
>>> > > > > > > Should we add Ubuntu 18.04 to our pre-merge Jenkins job,
>>> replace
>>> > > > 16.04
>>> > > > > > with
>>> > > > > > > 18.04 in our pre-merge Jenkins job, or neither?
>>> > > > > > >
>>> > > > > > > I propose adding 18.04 for now (and so running both 16.04 and
>>> > 18.04
>>> > > > on
>>> > > > > > > merge) and removing 16.04 when it starts to become
>>> inconvenient.
>>> > > > > > >
>>> > > > > >
>>> > > > > >
>>> > > > > > --
>>> > > > > > Todd Lipcon
>>> > > > > > Software Engineer, Cloudera
>>> > > > > >
>>> > > > >
>>> > > >
>>> > > >
>>> > > > --
>>> > > > Todd Lipcon
>>> > > > Software Engineer, Cloudera
>>> > > >
>>> > >
>>> >
>>> >
>>> > --
>>> > Todd Lipcon
>>> > Software Engineer, Cloudera
>>> >
>>>
>>
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
>>
>

-- 
Todd Lipcon
Software Engineer, Cloudera
