Sure. Here is the Jira issue:
https://issues.apache.org/jira/browse/ARROW-12554

On Mon, Apr 26, 2021 at 4:31 PM Wes McKinney <wesmck...@gmail.com> wrote:

> In principle I don't see an issue with having duplicates in the value set,
> could you open a Jira issue?
>
> On Mon, Apr 26, 2021 at 3:27 PM Niranda Perera <niranda.per...@gmail.com>
> wrote:
>
> > Hi all,
> >
> > In the arrow release-4.0.0 branch, the compute::is_in operation rejects
> > duplicate values in the value_set [1]. This was not the case in
> > arrow 2.0 >=.
> >
> > I was wondering whether this strict restriction is required. Ultimately,
> > a hash set would be created from the value_set values, and there is no
> > harm in having duplicates while doing so, is there?
> > PS: I understand that the param name "value_set" indicates that the values
> > need to be unique, but from a usability perspective, this can be relaxed
> > IMO. ex: Pandas isin [2].
> >
> > I would like to know your thoughts on this.
> >
> > Best
> >
> > [1] https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/scalar_set_lookup.cc#L53
> > [2] https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.isin.html
> > --
> > Niranda Perera
> > https://niranda.dev/
> > @n1r44 <https://twitter.com/N1R44>
> >
>
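
To illustrate the hash-set argument from the original message, here is a minimal pure-Python sketch. It is not Arrow's implementation; the function name `is_in_sketch` and the sample data are hypothetical and used only for illustration. It shows that once the value set is turned into a hash set, duplicates collapse and cannot change the membership result.

```python
def is_in_sketch(values, value_set):
    """Return one boolean per element of `values`: is it in `value_set`?

    Hypothetical helper for illustration only, not the Arrow kernel.
    """
    # Building a hash set collapses duplicate entries, so repeated
    # values in `value_set` have no effect on the lookup below.
    lookup = set(value_set)
    return [v in lookup for v in values]


values = [1, 2, 3, 4]
with_duplicates = is_in_sketch(values, [2, 4, 4, 2])
without_duplicates = is_in_sketch(values, [2, 4])
```

Both calls produce the same list of booleans, which is the behaviour being requested for is_in.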


-- 
Niranda Perera
https://niranda.dev/
@n1r44 <https://twitter.com/N1R44>
