On Thu, Jun 4, 2020 at 11:09 AM Rémi Dettai <rdet...@gmail.com> wrote:
>
> It makes sense to me that the default behaviour of such a low-level API as
> the kernels does not do any automagic promotion, but shouldn't this kind of
> promotion still be requestable by the so-called "system developer" user?
> Otherwise he would need to materialize a promoted version of each original
> array before the kernel operation, wouldn't he?

I assume yes. For 1 million elements:

Promotion inside kernel:
u32 + u32 = u64
4MB + 4MB -> 8MB
New allocation: 8MB

Promotion outside kernel:
(u32 -> u64) + (u32 -> u64) = u64
(4MB -> 8MB) + (4MB -> 8MB) -> 8MB
New allocation: 24MB
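
In pyarrow terms, materializing the promotion outside the kernel would look
roughly like this (an untested sketch; sizes count values only and ignore
validity bitmaps):

  import pyarrow as pa

  a = pa.array(range(1_000_000), type=pa.uint32())   # ~4MB of values
  b = pa.array(range(1_000_000), type=pa.uint32())   # ~4MB of values

  a64 = a.cast(pa.uint64())            # extra ~8MB allocation
  b64 = b.cast(pa.uint64())            # extra ~8MB allocation
  result = pa.compute.add(a64, b64)    # ~8MB result, cannot overflow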

>
> On Wed, Jun 3, 2020 at 6:27 PM Wes McKinney <wesmck...@gmail.com> wrote:
>
> > On Wed, Jun 3, 2020 at 11:25 AM Krisztián Szűcs
> > <szucs.kriszt...@gmail.com> wrote:
> > >
> > > On Wed, Jun 3, 2020 at 6:16 PM Krisztián Szűcs
> > > <szucs.kriszt...@gmail.com> wrote:
> > > >
> > > > On Wed, Jun 3, 2020 at 5:52 PM Wes McKinney <wesmck...@gmail.com> wrote:
> > > > >
> > > > > On Wed, Jun 3, 2020 at 10:49 AM Krisztián Szűcs
> > > > > <szucs.kriszt...@gmail.com> wrote:
> > > > > >
> > > > > > From the user perspective I find the following pretty confusing:
> > > > > >
> > > > > > In [1]: np.array([-128, 127], dtype=np.int8()) * 2
> > > > > > Out[1]: array([ 0, -2], dtype=int8)
> > > > > >
> > > > > > In [2]: np.array([-128, 127], dtype=np.int16()) * 2
> > > > > > Out[2]: array([-256,  254], dtype=int16)
> > > > > >
> > > > > > In my opinion, somewhere (maybe at a higher level) we should provide
> > > > > > the correct results by implicitly promoting to a wider type.
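> > > > > >
> > > > > > To make the idea concrete, a higher-level layer could do something
> > > > > > like this (just a sketch; promoted_add and its promotion table are
> > > > > > made up, not an existing API, and it assumes both inputs share a type):
> > > > > >
> > > > > >   import pyarrow as pa
> > > > > >
> > > > > >   WIDER = {pa.int8(): pa.int16(), pa.int16(): pa.int32(),
> > > > > >            pa.int32(): pa.int64()}
> > > > > >
> > > > > >   def promoted_add(left, right):
> > > > > >       # widen both inputs first, then call the low-level kernel
> > > > > >       target = WIDER.get(left.type, left.type)
> > > > > >       return pa.compute.add(left.cast(target), right.cast(target))
> > > > > >
> > > > > >   a = pa.array([-128, 127], type=pa.int8())
> > > > > >   promoted_add(a, a)   # int16 values [-256, 254] instead of wrapped int8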
> > > > >
> > > > > Yes, I agree with you that the best place to address this is at a
> > > > > higher level rather than having this logic implemented at the lowest
> > > > > level (kernels) -- I think database systems handle this during
> > > > > logical->physical planning.
> > > >
> > > > This raises another question: where should we incorporate the implicit
> > > > promotions?
> > > > // correct me if I'm wrong, but these implicit promotions are
> > > > // operation-dependent and distinct from the kernel dispatching issue [1]
> > > >
> > > > The numpy example above can be roughly translated to:
> > > > >>> a = pa.array([-128, 127], type=pa.int8())
> > > > >>> pa.compute.add(a, a)
> > > > array([ 0, -2])
> > > >
> > > > Which is rather surprising from the user's perspective.
> > >
> > > Would it be enough to document the exact behavior and advise the user
> > > to place casts until we have the logical -> physical machinery?
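> > >
> > > For example, the documented workaround would be something like this
> > > (rough sketch, assuming the kernel keeps the input type):
> > >
> > >   import pyarrow as pa
> > >
> > >   a = pa.array([-128, 127], type=pa.int8())
> > >   pa.compute.add(a.cast(pa.int16()), a.cast(pa.int16()))
> > >   # int16 result [-256, 254]; the caller opts into the wider type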
> >
> > I think it's enough to clearly document the behavior and assume that
> > the "user" will act according to what semantics are desired for their
> > use cases. Per my comments in my last e-mail, I don't think the users
> > of these functions need to be handled with "kid gloves".
> >
> > > I'm updating my PR as discussed.
> > > >
> > > > [1] https://issues.apache.org/jira/browse/ARROW-8919
> > > > >
> > > > > > ClickHouse, for example, does the type promotion.
> > > > > >
> > > > > > On Wed, Jun 3, 2020 at 5:29 PM Antoine Pitrou <solip...@pitrou.net> wrote:
> > > > > > >
> > > > > > > On Wed, 3 Jun 2020 10:47:38 -0400
> > > > > > > Ben Kietzman <ben.kietz...@rstudio.com> wrote:
> > > > > > > >
> > > > > > > > https://github.com/apache/arrow/pull/7341#issuecomment-638241193
> > > > > > > >
> > > > > > > > How should arithmetic kernels handle integer overflow?
> > > > > > > >
> > > > > > > > The approach currently taken in the linked PR is to promote such
> > > > > > > > that overflow will not occur, for example `(int8, int8)->int16`
> > > > > > > > and `(uint16, uint16)->uint32`.
> > > > > > > >
> > > > > > > > I'm not sure that's desirable. For one thing this leads to
> > > > > > > > inconsistent handling of 64 bit integer types, which are currently
> > > > > > > > allowed to overflow since we cannot promote further (NB: that means
> > > > > > > > this kernel includes undefined behavior for int64).
> > > > > > >
> > > > > > > I agree with you.  I would strongly advise against implicit
> > > > > > > promotion across arithmetic operations.  We initially did that in
> > > > > > > Numba and it quickly became a can of worms.
> > > > > > >
> > > > > > > The most desirable behaviour IMHO is to keep the original type, so:
> > > > > > > - (int8, int8) -> int8
> > > > > > > - (uint16, uint16) -> uint16
> > > > > > >
> > > > > > > Then the question is what happens when the actual overflow occurs.
> > > > > > > I think this should be directed by a kernel option.  By default an
> > > > > > > error should probably be raised (letting errors pass and silently
> > > > > > > produce erroneous data is wrong), but we might want to allow people
> > > > > > > to bypass overflow checks for speed.
> > > > > > >
> > > > > > > Note that even if overflow detection is enabled, it *should* be
> > > > > > > possible to enable vectorization, e.g. by making overflow detection
> > > > > > > a separate pass (itself vectorizable).
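> > > > > > >
> > > > > > > For instance, roughly (a numpy-flavoured sketch of a two-pass
> > > > > > > checked add, not Arrow code; signed-add overflow is detected with
> > > > > > > the usual sign trick):
> > > > > > >
> > > > > > >   import numpy as np
> > > > > > >
> > > > > > >   a = np.array([100, 27], dtype=np.int8)
> > > > > > >   b = np.array([100, 100], dtype=np.int8)
> > > > > > >   result = a + b                       # pass 1: wrapping add
> > > > > > >   # pass 2: overflow iff result's sign differs from both inputs'
> > > > > > >   overflowed = ((result ^ a) & (result ^ b)) < 0
> > > > > > >   if overflowed.any():
> > > > > > >       raise OverflowError("integer overflow in add")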
> > > > > > >
> > > > > > > Regards
> > > > > > >
> > > > > > > Antoine.
> > > > > > >
> > > > > > >
