I think such nightly builds will be useful for testing and debugging in the
future.

I also wonder if we can somehow create builds even from previous commits
(e.g., for the past few years). Such builds from previous commits don't
have to be daily builds, and I think weekly builds (or even monthly builds)
would also be very useful.
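
As a rough illustration of how weekly builds could be chosen from past history, here is a minimal sketch that keeps the latest commit of each ISO week. It assumes the (hash, commit time) pairs have already been extracted from the repository; the function name and the sample hashes are hypothetical and not part of any Hive tooling.

```python
from datetime import datetime
from typing import Iterable, List, Tuple


def one_commit_per_week(commits: Iterable[Tuple[str, datetime]]) -> List[str]:
    """Return the latest commit hash of each ISO week, oldest week first.

    `commits` is any iterable of (hash, commit_time) pairs, in any order.
    """
    latest = {}
    for sha, when in commits:
        iso = when.isocalendar()
        key = (iso[0], iso[1])  # (ISO year, ISO week number)
        # Keep only the most recent commit seen so far for this week.
        if key not in latest or when > latest[key][1]:
            latest[key] = (sha, when)
    return [latest[key][0] for key in sorted(latest)]


if __name__ == "__main__":
    # Hypothetical sample history, for illustration only.
    history = [
        ("aaa111", datetime(2023, 5, 15, 9, 0)),
        ("bbb222", datetime(2023, 5, 18, 17, 30)),
        ("ccc333", datetime(2023, 5, 22, 11, 15)),
    ]
    for sha in one_commit_per_week(history):
        print(sha)
```

Each selected hash would then be checked out and built once, so a few years of history would need on the order of a hundred to a few hundred builds rather than one per commit.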

The reason I wish such builds were available is to facilitate debugging and
testing. When tested against the TPC-DS benchmark, the current master
branch has several correctness problems that were introduced after the
release of Hive 3.1.2. We have reported all the problems known to us in [1]
and have also submitted several patches. If such nightly builds had been
available, we could have saved quite a bit of time in implementing the
patches by quickly finding the offending commits that introduced the new
correctness bugs.
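
To make the idea concrete, here is a minimal sketch of how a series of archived builds could be searched for the first one that shows a given bug. It assumes the builds are ordered from oldest to newest and that the bug stays present once it has been introduced; `first_bad_build` and the `is_bad` callback (for example, a script that runs one TPC-DS query and checks the result) are hypothetical names, not existing tools.

```python
from typing import Callable, Optional, Sequence


def first_bad_build(
    builds: Sequence[str], is_bad: Callable[[str], bool]
) -> Optional[str]:
    """Binary-search `builds` (oldest first) for the earliest bad build.

    Assumes that once a build is bad, every later build is bad too.
    Returns None if the newest build is not bad (nothing to find).
    """
    if not builds or not is_bad(builds[-1]):
        return None
    lo, hi = 0, len(builds) - 1  # invariant: builds[hi] is known to be bad
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(builds[mid]):
            hi = mid       # the first bad build is at mid or earlier
        else:
            lo = mid + 1   # the first bad build is after mid
    return builds[lo]


if __name__ == "__main__":
    # Hypothetical build labels; pretend the bug appeared in build 37.
    builds = ["build-%03d" % i for i in range(100)]
    print(first_bad_build(builds, lambda b: int(b.split("-")[1]) >= 37))
```

With this approach the number of builds to test grows only logarithmically with the number of archived builds, and the offending commit then lies between the last good build and the first bad one.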

In addition, you can find quite a few commits in the master branch that
report bugs that cannot be reproduced in Hive 3.1.2. Examples: HIVE-19990,
HIVE-14557, HIVE-21132, HIVE-21188, HIVE-21544, HIVE-22114,
HIVE-22227, HIVE-22236, HIVE-23911, HIVE-24198, HIVE-22777,
HIVE-25170, HIVE-25864, HIVE-26671.
(There may be some errors in this list because we compared against Hive
3.1.2 with many patches backported.) Nightly builds would be useful for
finding the root causes of such bugs.

Ideally, I wish there were an automated procedure to create nightly builds,
run the TPC-DS benchmark, and report correctness/performance results,
although this would be quite hard to implement. (I remember Spark
implemented such a procedure in the era of Spark 2, but my memory could be
wrong.)

[1] https://issues.apache.org/jira/browse/HIVE-26654


On Tue, May 23, 2023 at 10:44 AM Ayush Saxena <ayush...@gmail.com> wrote:

> Hi Vihang,
> +1, We were even exploring publishing the docker images of the snapshot
> version as well per commit or maybe weekly, so just shoot 2 docker commands
> and you get a Hive cluster running with master code.
>
> Sai, I think to spin up an env via Docker with all these things should be
> doable for sure, but would require someone with real good expertise with
> docker as well as setting up these services with Hive. Obviously, I am not
> that guy :-)
>
> @Simhadri has a PR which publishes docker images once a release tag is
> pushed, you can explore to have similar stuff for the Snapshot version,
> maybe if that sounds cool
>
> -Ayush
>
> On Tue, 23 May 2023 at 04:26, Sai Hemanth Gantasala
> <saihema...@cloudera.com.invalid> wrote:
>
> > Hi Vihang,
> >
> > +1 on the idea.
> >
> > This is a great idea to quickly test if a certain feature is working as
> > expected on a certain branch.
> > This way we test data loss, correctness, or any other unexpected scenarios
> > that are Hive specific only. However, I'm wondering if it is possible to
> > deploy/test in a kerberized environment or issues involving authorization
> > services like sentry/ranger.
> >
> > Thanks,
> > Sai.
> >
> > On Mon, May 22, 2023 at 11:15 AM vihang karajgaonkar <vihan...@apache.org>
> > wrote:
> >
> > > Hello Team,
> > >
> > > I have observed that it is a common use-case where users would like to
> > > test out unreleased features/bug fixes either to unblock them or test out
> > > if the bug fixes really work as intended in their environments. Today in
> > > the case of Apache Hive, this is not very user friendly because it
> > > requires the end user to build the binaries directly from the hive source
> > > code.
> > >
> > > I found that Apache Spark has a very useful infrastructure [1] which
> > > deploys nightly snapshots [2] [3] from the branch using github actions.
> > > This is super useful for any user who wants to try out the latest and
> > > greatest using the nightly builds.
> > >
> > > I was wondering if we should also adopt this. We can use github actions
> > > to upload the snapshot jars to the public repository (e.g github packages)
> > > and schedule it as a nightly job.
> > >
> > > [1] https://issues.apache.org/jira/browse/INFRA-21167
> > > [2] https://github.com/apache/spark/pkgs/container/apache-spark-ci-image
> > > [3] https://github.com/apache/spark/pull/30623
> > >
> > > I can take a stab at this if the community thinks that this is a nice
> > > thing to have.
> > >
> > > Thanks,
> > > Vihang
> > >
> >
>
