Re: Time source for Kudu tests

2019-10-27 Thread Adar Lieber-Dembo
s it safe that 'system_unsync'
> > > uses CLOCK_REALTIME under the hood, which is affected by NTP and admin
> > > time changes?
> > >
> > > On Sun, Oct 6, 2019 at 11:27 PM Alexey Serbin
> > >  wrote:
> > > >
> > > > Hi,
> > > >
> > > > Recently, built-in NTP client was been introduced in Kudu.  Todd Lipcon
> > > > put together the original WIP patch a couple of years ago [1], and
> > about
> > > a
> > > > week
> > > > ago the built-in NTP client was merged into the master branch of the
> > Kudu
> > > > repo.
> > > > With that, a new time source is now available for Kudu masters and
> > tablet
> > > > servers: the built-in NTP client.
> > > >
> > > > With the introduction of the new time source for Kudu components, we
> > have
> > > > had a few offline discussions about using different time sources in
> > Kudu
> > > > tests.  That's not only about providing test coverage for the newly
> > > > introduced
> > > > built-in NTP client, but for all other tests as well.  Last week Adar
> > > and I
> > > > talked about that offline one more time.  I'll try to summarize few key
> > > > points in this e-mail message.
> > > >
> > > > With the introduction of the built-in NTP client, a significant part of
> > > > tests
> > > > was switched to run with it as the time source.  Particularly, all
> > tests
> > > > based on ExternalMiniCluster are now run with built-in NTP client with
> > > > commit
> > > > 03e2ada69 [2].  The idea is to have more coverage for the newly added
> > > > functionality, especially given the fact that at some point we might
> > > switch
> > > > to use the built-in NTP client by default instead of relying on the
> > local
> > > > machine clock synchronized by the system NTP service.
> > > >
> > > > There are many other Kudu tests (e.g., tests based on
> > > InternalMiniCluster)
> > > > which still require the machine clock to be synchronized with NTP to
> > run.
> > > > Ideally, it would be great to run all Kudu tests with using same time
> > > source
> > > > unless they are specifically targeted to verify functionality of a
> > > > particular
> > > > time source.  An example of the latter are tests which cover the
> > > > functionality
> > > > of the built-in NTP client itself.
> > > >
> > > > Prior to the introduction of the built-in NTP client, almost all Kudu
> > > tests
> > > > were running against the local system clock synchronized with NTP, at
> > > least
> > > > on Linux.  The only exception were tests for read-your-writes and other
> > > > consistency guarantees: those use mock time source to control the time
> > > > in a very specific way.
> > > >
> > > > In retrospective, it was great to have a single time source because we
> > > > always
> > > > knew that the same time source was used for almost all of our tests.
> > > Also,
> > > > the time source used in Kudu tests was the same as in Kudu masters and
> > > > tablet
> > > > servers running in real distributed clusters.  From the other side,
> > that
> > > > required local machine's clock to be synchronized with NTP to run those
> > > > tests,
> > > > and the tests would fail if the time was not synchronized with NTP.
> > > > In addition, as far as I know, there are no Kudu tests which are
> > targeted
> > > > to detect inconsistencies due to irregularities in the time source or
> > > > non-synchronized clocks between different Kudu nodes.  Of course, we
> > have
> > > > Jepsen-based tests, but that's another story: we are not talking about
> > > > Jepsen
> > > > tests here, only C++ and Java tests based on gtest and JUnit
> > frameworks.
> > > >
> > > > Now, here comes one observation: all the components of above mentioned
> > > tests
> > > > are running at the same machine during execution of corresponding
> > > scenarios.
> > > > If all of them are using the local machine's clock, then there is no
> > need
> > > > to use NTP or other synchronisation techniques designed to synchronize
> > > > multiple clocks.
> > > >
> > > > So, here comes the question: why to require the local clock to be
> > > > synchronized
> > > > with NTP for tests?  Yes, we need to verify that the 'system' time
> > source
> > > > works for HybridTime timestamps, but as for the functionality of a
> > > generic
> > > > Kudu test which is simply based on the HybridTime timestamps without
> > any
> > > > knowledge of the underlying time source, why is it important?
> > > >
> > > > It seems we can safely change the time source in those generic tests to
> > > be
> > > > 'system_unsync': local machine clock which is not necessarily
> > > synchronized
> > > > by
> > > > the kernel NTP discipline (see SystemUnsyncTime in
> > > > $KUDU_ROOT/src/kudu/clock/system_unsync_time.{h,cc} for details).
> > > >
> > > > Once switched to 'system_unsync' time source for tests, in addition it
> > > will
> > > > be necessary to add new dedicated tests scenarios such as:
> > > >   * Smoke scenarios: using the system NTP time source for Kudu
> > clusters.
> > > >   * Smoke scenarios: using the built-in NTP time source for Kudu
> > > clusters.
> > > >   * Advanced scenarios to detect issues due to irregularities in time
> > > source
> > > > (like time jumping back and forth, etc.).
> > > >
> > > > What do you think?  Your feedback is appreciated.
> > > >
> > > > Thank you!
> > > >
> > > >
> > > >
> > > > Kind regards,
> > > >
> > > > Alexey
> > > >
> > > > [1] http://gerrit.cloudera.org:8080/7477
> > > > [2]
> > > >
> > >
> > https://github.com/apache/kudu/commit/03e2ada694290cafce0bea6ebbf092709aa64a2a
> > >
> >
> >
> > --
> > Andrew Wong
> >


Re: Time source for Kudu tests

2019-10-25 Thread Alexey Serbin
he master branch of the
> Kudu
> > > repo.
> > > With that, a new time source is now available for Kudu masters and
> tablet
> > > servers: the built-in NTP client.
> > >
> > > With the introduction of the new time source for Kudu components, we
> have
> > > had a few offline discussions about using different time sources in
> Kudu
> > > tests.  That's not only about providing test coverage for the newly
> > > introduced
> > > built-in NTP client, but for all other tests as well.  Last week Adar
> > and I
> > > talked about that offline one more time.  I'll try to summarize few key
> > > points in this e-mail message.
> > >
> > > With the introduction of the built-in NTP client, a significant part of
> > > tests
> > > was switched to run with it as the time source.  Particularly, all
> tests
> > > based on ExternalMiniCluster are now run with built-in NTP client with
> > > commit
> > > 03e2ada69 [2].  The idea is to have more coverage for the newly added
> > > functionality, especially given the fact that at some point we might
> > switch
> > > to use the built-in NTP client by default instead of relying on the
> local
> > > machine clock synchronized by the system NTP service.
> > >
> > > There are many other Kudu tests (e.g., tests based on
> > InternalMiniCluster)
> > > which still require the machine clock to be synchronized with NTP to
> run.
> > > Ideally, it would be great to run all Kudu tests with using same time
> > source
> > > unless they are specifically targeted to verify functionality of a
> > > particular
> > > time source.  An example of the latter are tests which cover the
> > > functionality
> > > of the built-in NTP client itself.
> > >
> > > Prior to the introduction of the built-in NTP client, almost all Kudu
> > tests
> > > were running against the local system clock synchronized with NTP, at
> > least
> > > on Linux.  The only exception were tests for read-your-writes and other
> > > consistency guarantees: those use mock time source to control the time
> > > in a very specific way.
> > >
> > > In retrospective, it was great to have a single time source because we
> > > always
> > > knew that the same time source was used for almost all of our tests.
> > Also,
> > > the time source used in Kudu tests was the same as in Kudu masters and
> > > tablet
> > > servers running in real distributed clusters.  From the other side,
> that
> > > required local machine's clock to be synchronized with NTP to run those
> > > tests,
> > > and the tests would fail if the time was not synchronized with NTP.
> > > In addition, as far as I know, there are no Kudu tests which are
> targeted
> > > to detect inconsistencies due to irregularities in the time source or
> > > non-synchronized clocks between different Kudu nodes.  Of course, we
> have
> > > Jepsen-based tests, but that's another story: we are not talking about
> > > Jepsen
> > > tests here, only C++ and Java tests based on gtest and JUnit
> frameworks.
> > >
> > > Now, here comes one observation: all the components of above mentioned
> > tests
> > > are running at the same machine during execution of corresponding
> > scenarios.
> > > If all of them are using the local machine's clock, then there is no
> need
> > > to use NTP or other synchronisation techniques designed to synchronize
> > > multiple clocks.
> > >
> > > So, here comes the question: why to require the local clock to be
> > > synchronized
> > > with NTP for tests?  Yes, we need to verify that the 'system' time
> source
> > > works for HybridTime timestamps, but as for the functionality of a
> > generic
> > > Kudu test which is simply based on the HybridTime timestamps without
> any
> > > knowledge of the underlying time source, why is it important?
> > >
> > > It seems we can safely change the time source in those generic tests to
> > be
> > > 'system_unsync': local machine clock which is not necessarily
> > synchronized
> > > by
> > > the kernel NTP discipline (see SystemUnsyncTime in
> > > $KUDU_ROOT/src/kudu/clock/system_unsync_time.{h,cc} for details).
> > >
> > > Once switched to 'system_unsync' time source for tests, in addition it
> > will
> > > be necessary to add new dedicated tests scenarios such as:
> > >   * Smoke scenarios: using the system NTP time source for Kudu
> clusters.
> > >   * Smoke scenarios: using the built-in NTP time source for Kudu
> > clusters.
> > >   * Advanced scenarios to detect issues due to irregularities in time
> > source
> > > (like time jumping back and forth, etc.).
> > >
> > > What do you think?  Your feedback is appreciated.
> > >
> > > Thank you!
> > >
> > >
> > >
> > > Kind regards,
> > >
> > > Alexey
> > >
> > > [1] http://gerrit.cloudera.org:8080/7477
> > > [2]
> > >
> >
> https://github.com/apache/kudu/commit/03e2ada694290cafce0bea6ebbf092709aa64a2a
> >
>
>
> --
> Andrew Wong
>


Re: Time source for Kudu tests

2019-10-07 Thread Andrew Wong
y, it would be great to run all Kudu tests with using same time
> source
> > unless they are specifically targeted to verify functionality of a
> > particular
> > time source.  An example of the latter are tests which cover the
> > functionality
> > of the built-in NTP client itself.
> >
> > Prior to the introduction of the built-in NTP client, almost all Kudu
> tests
> > were running against the local system clock synchronized with NTP, at
> least
> > on Linux.  The only exception were tests for read-your-writes and other
> > consistency guarantees: those use mock time source to control the time
> > in a very specific way.
> >
> > In retrospective, it was great to have a single time source because we
> > always
> > knew that the same time source was used for almost all of our tests.
> Also,
> > the time source used in Kudu tests was the same as in Kudu masters and
> > tablet
> > servers running in real distributed clusters.  From the other side, that
> > required local machine's clock to be synchronized with NTP to run those
> > tests,
> > and the tests would fail if the time was not synchronized with NTP.
> > In addition, as far as I know, there are no Kudu tests which are targeted
> > to detect inconsistencies due to irregularities in the time source or
> > non-synchronized clocks between different Kudu nodes.  Of course, we have
> > Jepsen-based tests, but that's another story: we are not talking about
> > Jepsen
> > tests here, only C++ and Java tests based on gtest and JUnit frameworks.
> >
> > Now, here comes one observation: all the components of above mentioned
> tests
> > are running at the same machine during execution of corresponding
> scenarios.
> > If all of them are using the local machine's clock, then there is no need
> > to use NTP or other synchronisation techniques designed to synchronize
> > multiple clocks.
> >
> > So, here comes the question: why to require the local clock to be
> > synchronized
> > with NTP for tests?  Yes, we need to verify that the 'system' time source
> > works for HybridTime timestamps, but as for the functionality of a
> generic
> > Kudu test which is simply based on the HybridTime timestamps without any
> > knowledge of the underlying time source, why is it important?
> >
> > It seems we can safely change the time source in those generic tests to
> be
> > 'system_unsync': local machine clock which is not necessarily
> synchronized
> > by
> > the kernel NTP discipline (see SystemUnsyncTime in
> > $KUDU_ROOT/src/kudu/clock/system_unsync_time.{h,cc} for details).
> >
> > Once switched to 'system_unsync' time source for tests, in addition it
> will
> > be necessary to add new dedicated tests scenarios such as:
> >   * Smoke scenarios: using the system NTP time source for Kudu clusters.
> >   * Smoke scenarios: using the built-in NTP time source for Kudu
> clusters.
> >   * Advanced scenarios to detect issues due to irregularities in time
> source
> > (like time jumping back and forth, etc.).
> >
> > What do you think?  Your feedback is appreciated.
> >
> > Thank you!
> >
> >
> >
> > Kind regards,
> >
> > Alexey
> >
> > [1] http://gerrit.cloudera.org:8080/7477
> > [2]
> >
> https://github.com/apache/kudu/commit/03e2ada694290cafce0bea6ebbf092709aa64a2a
>


-- 
Andrew Wong


Re: Time source for Kudu tests

2019-10-07 Thread Adar Lieber-Dembo
Thanks for the detailed summary.

I agree with the goal of using the same time source across all tests
if possible. Originally, I was hoping this would be 'builtin', but
that'd require deploying MiniChronyd too, both to avoid a dependency
on Internet connectivity and to ensure that clock synchronization at
test startup is sufficiently fast. And, like you said, not only is
that a good chunk of work, it's also somewhat antithetical to those
pure "in-proc" test setups.

So I reluctantly agree that 'system_unsync' may be the next best
option, with targeted uses of 'system' and 'builtin' for testing
specific functionality. That said, is it safe that 'system_unsync'
uses CLOCK_REALTIME under the hood, which is affected by NTP and admin
time changes?

On Sun, Oct 6, 2019 at 11:27 PM Alexey Serbin
 wrote:
>
> Hi,
>
> Recently, built-in NTP client was been introduced in Kudu.  Todd Lipcon
> put together the original WIP patch a couple of years ago [1], and about a
> week
> ago the built-in NTP client was merged into the master branch of the Kudu
> repo.
> With that, a new time source is now available for Kudu masters and tablet
> servers: the built-in NTP client.
>
> With the introduction of the new time source for Kudu components, we have
> had a few offline discussions about using different time sources in Kudu
> tests.  That's not only about providing test coverage for the newly
> introduced
> built-in NTP client, but for all other tests as well.  Last week Adar and I
> talked about that offline one more time.  I'll try to summarize few key
> points in this e-mail message.
>
> With the introduction of the built-in NTP client, a significant part of
> tests
> was switched to run with it as the time source.  Particularly, all tests
> based on ExternalMiniCluster are now run with built-in NTP client with
> commit
> 03e2ada69 [2].  The idea is to have more coverage for the newly added
> functionality, especially given the fact that at some point we might switch
> to use the built-in NTP client by default instead of relying on the local
> machine clock synchronized by the system NTP service.
>
> There are many other Kudu tests (e.g., tests based on InternalMiniCluster)
> which still require the machine clock to be synchronized with NTP to run.
> Ideally, it would be great to run all Kudu tests with using same time source
> unless they are specifically targeted to verify functionality of a
> particular
> time source.  An example of the latter are tests which cover the
> functionality
> of the built-in NTP client itself.
>
> Prior to the introduction of the built-in NTP client, almost all Kudu tests
> were running against the local system clock synchronized with NTP, at least
> on Linux.  The only exception were tests for read-your-writes and other
> consistency guarantees: those use mock time source to control the time
> in a very specific way.
>
> In retrospective, it was great to have a single time source because we
> always
> knew that the same time source was used for almost all of our tests.  Also,
> the time source used in Kudu tests was the same as in Kudu masters and
> tablet
> servers running in real distributed clusters.  From the other side, that
> required local machine's clock to be synchronized with NTP to run those
> tests,
> and the tests would fail if the time was not synchronized with NTP.
> In addition, as far as I know, there are no Kudu tests which are targeted
> to detect inconsistencies due to irregularities in the time source or
> non-synchronized clocks between different Kudu nodes.  Of course, we have
> Jepsen-based tests, but that's another story: we are not talking about
> Jepsen
> tests here, only C++ and Java tests based on gtest and JUnit frameworks.
>
> Now, here comes one observation: all the components of above mentioned tests
> are running at the same machine during execution of corresponding scenarios.
> If all of them are using the local machine's clock, then there is no need
> to use NTP or other synchronisation techniques designed to synchronize
> multiple clocks.
>
> So, here comes the question: why to require the local clock to be
> synchronized
> with NTP for tests?  Yes, we need to verify that the 'system' time source
> works for HybridTime timestamps, but as for the functionality of a generic
> Kudu test which is simply based on the HybridTime timestamps without any
> knowledge of the underlying time source, why is it important?
>
> It seems we can safely change the time source in those generic tests to be
> 'system_unsync': local machine clock which is not necessarily synchronized
> by
> the kernel NTP discipline (see SystemUnsyncTime in
> $KUDU_ROOT/src/kudu/clock/system_unsync_time.{h,cc} for details).
>
> Once switched to 's

Time source for Kudu tests

2019-10-07 Thread Alexey Serbin
Hi,

Recently, built-in NTP client was been introduced in Kudu.  Todd Lipcon
put together the original WIP patch a couple of years ago [1], and about a
week
ago the built-in NTP client was merged into the master branch of the Kudu
repo.
With that, a new time source is now available for Kudu masters and tablet
servers: the built-in NTP client.

With the introduction of the new time source for Kudu components, we have
had a few offline discussions about using different time sources in Kudu
tests.  That's not only about providing test coverage for the newly
introduced
built-in NTP client, but for all other tests as well.  Last week Adar and I
talked about that offline one more time.  I'll try to summarize few key
points in this e-mail message.

With the introduction of the built-in NTP client, a significant part of
tests
was switched to run with it as the time source.  Particularly, all tests
based on ExternalMiniCluster are now run with built-in NTP client with
commit
03e2ada69 [2].  The idea is to have more coverage for the newly added
functionality, especially given the fact that at some point we might switch
to use the built-in NTP client by default instead of relying on the local
machine clock synchronized by the system NTP service.

There are many other Kudu tests (e.g., tests based on InternalMiniCluster)
which still require the machine clock to be synchronized with NTP to run.
Ideally, it would be great to run all Kudu tests with using same time source
unless they are specifically targeted to verify functionality of a
particular
time source.  An example of the latter are tests which cover the
functionality
of the built-in NTP client itself.

Prior to the introduction of the built-in NTP client, almost all Kudu tests
were running against the local system clock synchronized with NTP, at least
on Linux.  The only exception were tests for read-your-writes and other
consistency guarantees: those use mock time source to control the time
in a very specific way.

In retrospective, it was great to have a single time source because we
always
knew that the same time source was used for almost all of our tests.  Also,
the time source used in Kudu tests was the same as in Kudu masters and
tablet
servers running in real distributed clusters.  From the other side, that
required local machine's clock to be synchronized with NTP to run those
tests,
and the tests would fail if the time was not synchronized with NTP.
In addition, as far as I know, there are no Kudu tests which are targeted
to detect inconsistencies due to irregularities in the time source or
non-synchronized clocks between different Kudu nodes.  Of course, we have
Jepsen-based tests, but that's another story: we are not talking about
Jepsen
tests here, only C++ and Java tests based on gtest and JUnit frameworks.

Now, here comes one observation: all the components of above mentioned tests
are running at the same machine during execution of corresponding scenarios.
If all of them are using the local machine's clock, then there is no need
to use NTP or other synchronisation techniques designed to synchronize
multiple clocks.

So, here comes the question: why to require the local clock to be
synchronized
with NTP for tests?  Yes, we need to verify that the 'system' time source
works for HybridTime timestamps, but as for the functionality of a generic
Kudu test which is simply based on the HybridTime timestamps without any
knowledge of the underlying time source, why is it important?

It seems we can safely change the time source in those generic tests to be
'system_unsync': local machine clock which is not necessarily synchronized
by
the kernel NTP discipline (see SystemUnsyncTime in
$KUDU_ROOT/src/kudu/clock/system_unsync_time.{h,cc} for details).

Once switched to 'system_unsync' time source for tests, in addition it will
be necessary to add new dedicated tests scenarios such as:
  * Smoke scenarios: using the system NTP time source for Kudu clusters.
  * Smoke scenarios: using the built-in NTP time source for Kudu clusters.
  * Advanced scenarios to detect issues due to irregularities in time source
(like time jumping back and forth, etc.).

What do you think?  Your feedback is appreciated.

Thank you!



Kind regards,

Alexey

[1] http://gerrit.cloudera.org:8080/7477
[2]
https://github.com/apache/kudu/commit/03e2ada694290cafce0bea6ebbf092709aa64a2a