Hi Jacques and Micah,

Thanks a lot for your valuable feedback.

The purpose of this change;
1. reduce the amount of code (by referencing implementations in the super
class)
2. remove duplicated code by centralizing the code for get/set integers

I agree that it violates the principle of accessing single value from
sub-classes.
However, it seems we sometimes breaks some principles in favor of others
for better code structure.
For example, the set/get int logics for TimestampXXVectors are based on
implementations in super class TimestampVector.

I can understand that there is a good reason for the current class
hierarchy.
This is just an tentative proposal. The PR is only to make discussions
easier.

Best,
Liya Fan

On Thu, Aug 29, 2019 at 12:08 PM Jacques Nadeau <jacq...@apache.org> wrote:

> On Wed, Aug 28, 2019, 7:42 PM Micah Kornfield <emkornfi...@gmail.com>
> wrote:
>
> > Hi Jacques,
> >
> >
> > > What problem are you trying to solve? It seems like you're proposing
> > > refactoring for refactoring's sake.
> >
> > I think Liya Fan is trying to eliminate duplicate code, which as long as
> it
> > doesn't violate design principles seems like a good thing.
> >
>
> I both agree and disagree. Change always introduce risk. With highly used
> code, the bar for cosmetic change needs to be higher. If the maintenance
> burden is high than it can make more sense. In this case, this seems like
> cosmetic change on something that changes rarely and thus has a low burden
> of maintebabce. Add to this that we just refactored all of this code, among
> other things trying hard to minimize complex class hierarchies. During that
> time we tried several different class hierarchies before settling on this
> one.
>
>
>
> > From the PR [1]:
> >
> > >  One of the fundamentals of vector design is single value level access
> > > doesn't go through subclasses. This change violates that design
> criteria.
> >
> > Is there a design document someplace that enumerates the design
> > fundamentals someplace and the rationale for them?  I think it would help
> > maintainers and contributors alike to understand the invariants the code
> is
> > supposed to adhere to and why.
>
>
> Yeah, this is an excellent point. When these things are in someone's head
> (mine in this case), it can feel very arbtrirary to others. I think a bunch
> of this was covered extensively on the list and in the reviews during the
> major refractor done last year. I'll also see if I can dig up some more
> background.
>
>
>
> > Thanks,
> > Micah
> >
> > [1] https://github.com/apache/arrow/pull/5213
> >
> >
> > On Wed, Aug 28, 2019 at 12:11 PM Jacques Nadeau <jacq...@apache.org>
> > wrote:
> >
> > > What problem are you trying to solve? It seems like you're proposing
> > > refactoring for refactoring's sake.
> > >
> > > On Mon, Aug 26, 2019, 7:13 PM Fan Liya <liya.fa...@gmail.com> wrote:
> > >
> > > > Hi Jacques,
> > > >
> > > > Thanks for the valuable feedback.
> > > >
> > > > I agree that it is a good idea to explicitly make field vectors
> final.
> > > >
> > > > However, I think there are still lots of duplicated code for
> > > > getInt/getLong/setInt/setLong/setIntSafe/setLongSafe. In addition, we
> > > have
> > > > a large number of time related vector classes.
> > > >
> > > > So after making field vectors final, maybe we can:
> > > > 1. Extract static utility methods for get/set int/long methods.
> > > > 2. Push the get/set int/long methods to BaseFixedWidthVector base
> > class.
> > > >
> > > > What do you think?
> > > >
> > > > Best,
> > > > Liya Fan
> > > >
> > > > On Tue, Aug 27, 2019 at 7:16 AM Jacques Nadeau <jacq...@apache.org>
> > > wrote:
> > > >
> > > > > I think you'd have to refractor differently since one of the
> patterns
> > > is
> > > > > get(index, object holder) is consistently matched. That means you
> > > should
> > > > > never subclass a field vector I believe. In fact we should probably
> > > just
> > > > > formally finalize them.
> > > > >
> > > > > Once you do that, I'm not sure how much duplicated code you're
> > actually
> > > > > removing.
> > > > >
> > > > > On Mon, Aug 26, 2019, 6:13 PM Fan Liya <liya.fa...@gmail.com>
> wrote:
> > > > >
> > > > > > Dear all,
> > > > > >
> > > > > > Currently, we have two classes of time related vectors. One class
> > are
> > > > > named
> > > > > > TimeXXVector, while the other class are named TimeStampXXVector.
> We
> > > > found
> > > > > > there are some redundant code for these classes, so we want to do
> > > some
> > > > > > refactorings for them.
> > > > > >
> > > > > > 1. For TimeXXVector, all classes extend BaseFixedWidthVector.
> > > > Internally,
> > > > > > they are just IntVector or BigIntVector. The only differences
> are:
> > > > > >
> > > > > > 1) minor type
> > > > > > 2) the result of getObject method.
> > > > > >
> > > > > > Proposed way of refactoring: we make them extend IntVector or
> > > > > BigIntVector,
> > > > > > so much get/set code for int/big int can be reused.
> > > > > >
> > > > > > 2. For TimeStampXXVector, all classes extend TimeStampVector.
> > > > Internally,
> > > > > > they are BigIntVectors.
> > > > > >
> > > > > > Proposed way of refactoring: we remove the TimeStampVector, and
> > make
> > > > all
> > > > > > classes extend BigIntVector, so much of the get/set methods can
> be
> > > > reused
> > > > > > from super class.
> > > > > >
> > > > > > Please give your valuable feedback.
> > > > > >
> > > > > > Best,
> > > > > > Liya Fan
> > > > > >
> > > > >
> > > >
> > >
> >
>

Reply via email to