Hi Stamatis and Sungwoo,

I agree with several points. Hive has millions of lines of code that are
here and will stay with us in the same way; that is not in question. But we
need to think about the future of the project. No engineer wants to use
old, legacy technologies; every engineer wants to use cool stuff where they
can learn new things, patterns, and designs. If we do not improve our
codebase, it will become a legacy zombieland, which won't be
touched by love and passion. *(Oh what a management bullshit - you can tell
:) )* But I truly think that introducing new principles could give us
speed, motivation, and power to continue innovating. As an engineer I
always want to use a modern approach, because it gives me more excitement,
and I think that introducing DI into this type of project is hard,
challenging, and exciting. I want to live in a world where Hive is a leader
in adopting new principles, stable and easy to use, and where the
on-boarding experience is much faster and easier.

I don't wanna live in a world <https://coub.com/view/34gga3>

As you wrote, DI is powerful, and Hive does not use it because it became
widely used only after Hive had started. If we introduce it, that does not
mean we have to refactor every module with DI. We can try to identify some
components where we would introduce it, and we could write docs for others
on how to use and implement it. Maybe just 1-2 components at first; others
can follow later as we touch them, if it makes sense. We won't remove every
static utils class, because that would not make sense, but with baby steps
we could try to introduce DI, and for new development we could introduce a
loosely coupled standard, where every dependency is more lightweight and
the components are easier to test. (Which -could- improve the quality as
well.)
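
To make the testing point concrete, here is a minimal sketch of
constructor injection in plain Java, wired by hand with no DI framework.
All names in it (DiSketch, StaticNames, TableNameResolver, StaticPlanner,
InjectedPlanner) are hypothetical and are not existing Hive classes; it
only illustrates how a component that used to call a static helper could
take the same logic as an injected dependency instead.

```java
public class DiSketch {

    // Hypothetical static helper, standing in for a typical utils class.
    static final class StaticNames {
        static String qualify(String table) {
            return "default." + table;
        }
    }

    // Hypothetical dependency expressed as an interface, so that a test
    // can substitute its own implementation.
    interface TableNameResolver {
        String qualify(String table);
    }

    // Tightly coupled variant: the static call is hard-wired and cannot be
    // replaced without a static-mocking tool.
    static final class StaticPlanner {
        String plan(String table) {
            return "compact " + StaticNames.qualify(table);
        }
    }

    // Loosely coupled variant: the dependency arrives via the constructor.
    // A DI framework would do this wiring; here it is done by hand.
    static final class InjectedPlanner {
        private final TableNameResolver resolver;

        InjectedPlanner(TableNameResolver resolver) {
            this.resolver = resolver;
        }

        String plan(String table) {
            return "compact " + resolver.qualify(table);
        }
    }

    public static void main(String[] args) {
        // Production-style wiring can reuse the existing static helper.
        InjectedPlanner prod = new InjectedPlanner(StaticNames::qualify);
        System.out.println(prod.plan("t1"));    // prints "compact default.t1"

        // Test-style wiring passes a stub; no mocking library is needed.
        InjectedPlanner stubbed = new InjectedPlanner(t -> "testdb." + t);
        System.out.println(stubbed.plan("t1")); // prints "compact testdb.t1"
    }
}
```

The static helper does not have to be deleted for this to work: the
injected variant can wrap it at first, which fits the baby-steps idea of
converting one component at a time.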


#2 The quality of 3.1.x vs 4.0.x is a somewhat different topic. I don't
think it has much connection to DI, and I think we should discuss the root
causes in a separate thread. You had several good points. We - ALL of us -
should be more careful about this type of issue. It was the same in the
past: when Hive 3 was introduced, there were several similar issues. When
groundbreaking changes come to the repository, this can happen. Also, I
think the "alpha" in 4.0.0 alpha signals that it is not yet rock solid. But
anyhow, you are right, we have to be more careful! Let's start a different
thread about it.


-Attila

On Wed, Apr 12, 2023 at 5:07 PM Sungwoo Park <glap...@gmail.com> wrote:

> Hello,
>
> I am not a committer, but I would like to add my opinion. At this stage of
> development, I think it is quite risky to switch to a DI framework for a
> couple of reasons.
>
> 1. A DI framework would have been a powerful tool if it had been
> incorporated into the project from the early stage. Now, however, Hive has
> way over 1 million lines of code and tens of thousands of test cases, and my
> guess is that the overhead associated with introducing DI into Hive
> (whether gradually or globally at once) is very likely to outweigh the
> additional benefit, if any, of introducing DI, especially if we consider
> the stability of its development infrastructure.
>
> 2. Implementing new features, such as DI, in Hive can be an exciting
> sub-project and fun, but I think more pressing issues are to stabilize the
> current Hive code, although this is certainly less motivating and more
> boring. I hope that no new major features, such as DI, will be introduced
> until Hive becomes, say, as stable as Hive 3.1.
>
> For 2, I can give a few examples to substantiate my claim.
>
> 1) For the past few years, several new techniques for query compilation
> have been introduced. Unfortunately they were buggy and Hive started to
> return wrong results, on the assumption that Hive 3.1.2 was working
> correctly. (Yes, Hive 3.1.2 also has correctness bugs, but when tested
> against TPC-DS, Hive 3.1.2 returned the same results as other frameworks,
> so it can be used as a basis for comparison.) From our own testing, Hive
> 4.0.0-SNAPSHOT returns wrong results on several queries in TPC-DS, and this
> should be a major setback for Hive. If interested, please see [1] and [2].
>
> 2) Perhaps due to the same reason as in 1), Hive 4.0.0-SNAPSHOT is
> noticeably slower than Hive 3.1.2 on the TPC-DS benchmark. However, this is
> only from my own testing (using 10TB TPC-DS), and I hope that someone in
> the Hive team will try similar experiments to confirm/refute my claim.
>
> 3) Currently many q tests are run against MapReduce (which is not
> officially supported as far as I remember). However, some of these q tests
> fail when run against Tez. If Tez and LLAP are the new execution engines,
> these tests should be migrated as well.
>
> Sungwoo Park
>
> [1] https://issues.apache.org/jira/browse/HIVE-26654
> [2] https://issues.apache.org/jira/browse/HIVE-27226
>
> On Wed, Apr 12, 2023 at 10:12 PM Stamatis Zampetakis <zabe...@gmail.com>
> wrote:
>
> > Hey Laszlo,
> >
> > Dependency injection is a very powerful and useful tool/design pattern.
> >
> > I don't think there is a particular reason why Hive does not use a
> > DI framework, apart maybe from the fact that we have lots of legacy
> > code that existed before DI became that popular.
> >
> > I am open to ideas and suggestions about parts of the code that we
> > could improve via DI. I would probably avoid big refactorings to core
> > components of Hive for the sake of introducing a DI framework but I
> > see no big issue using such frameworks in new code. As usual when we
> > are about to introduce a new dependency to the project we should be
> > mindful of all the implications that this might have.
> >
> > It's hard to make a generally applicable claim that we should use this
> > or that framework since I guess it has to do a lot with personal
> > preferences; we tend to prefer things that we have already used. I
> > haven't used DI frameworks that much, so I don't have a strong opinion
> > on which framework is the best and am willing to follow the majority.
> >
> > Best,
> > Stamatis
> >
> > On Tue, Apr 4, 2023 at 1:19 PM Laszlo Vegh <lv...@cloudera.com.invalid>
> > wrote:
> > >
> > >
> > > Hi all,
> > >
> > > I would like to start a conversation about introducing some Dependency
> > > Injection framework (like Spring, Guice, Weld, etc.) in Hive.
> > >
> > > IMHO the lack of such a framework makes the codebase way less organised
> > > and harder to maintain. Moreover, I think it has also led to a huge
> > > number of static/utility methods and classes (which are highly
> > > discouraged when using DI frameworks). When there is no DI framework,
> > > utility classes with static methods often seem to be the simplest and
> > > best way to share code across different Hive components/classes, but
> > > these constructs are really killing testability. For example, it is
> > > much harder to mock static method calls than to mock service/component
> > > instances. Poor testability is a major issue on its own, but having a
> > > DI framework could bring many more benefits, like greater flexibility
> > > (modularity), better organised services, etc.
> > >
> > >
> > > I’m interested if there’s any reason why there is no DI in Hive so far.
> > > I know there’s no way to introduce it everywhere in a single step, but
> > > we could start using it where it is easy to start, and continuously
> > > expand its usage from class to class. If there is no strong reason not
> > > to do it, I would like to start an open conversation around this topic.
> > > (Possible benefits, drawbacks, which framework to use, where to
> > > introduce it first, etc.)
> > >
> > > If anybody is interested in this initiative, please join the
> > > conversation, and add your thoughts, ideas, doubts, anything.
> > >
> > > Thanks,
> > >
> > > Laszlo Vegh
> > > veghlac...@gmail.com <mailto:veghlac...@gmail.com>
> >
>
