Sure, here is the Jira issue: https://issues.apache.org/jira/browse/ARROW-12554

On Mon, Apr 26, 2021 at 4:31 PM Wes McKinney <wesmck...@gmail.com> wrote:

> In principle I don't see an issue with having duplicates in the value set,
> could you open a Jira issue?
>
> On Mon, Apr 26, 2021 at 3:27 PM Niranda Perera <niranda.per...@gmail.com>
> wrote:
>
> > Hi all,
> >
> > In the arrow release-4.0.0 branch, the compute::is_in operation rejects
> > duplicate values in the value_set [1]. This was not the case in arrow 2.0
> > >=.
> >
> > I was wondering if this strict restriction is required? Ultimately, a
> > hash set would be created from the value_set values, and there's no harm
> > in having duplicates while doing so, is there?
> >
> > PS: I understand that the param name "value_set" indicates that the
> > values need to be unique, but from a usability perspective, this can be
> > relaxed IMO, e.g. pandas isin [2].
> >
> > I would like to know your thoughts on this.
> >
> > Best
> >
> > [1]
> > https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/scalar_set_lookup.cc#L53
> > [2]
> > https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.isin.html
> >
> > --
> > Niranda Perera
> > https://niranda.dev/
> > @n1r44 <https://twitter.com/N1R44>

--
Niranda Perera
https://niranda.dev/
@n1r44 <https://twitter.com/N1R44>
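
To illustrate the argument in the original message that duplicates are harmless once the value set is turned into a hash set, here is a minimal plain-Python sketch. It is not Arrow's implementation: the function name `is_in` and its signature are hypothetical stand-ins chosen to mirror the kernel's name, and the only assumption is that membership is answered from a hash set built from the value set.

```python
from typing import Hashable, Iterable, List


def is_in(values: Iterable[Hashable], value_set: Iterable[Hashable]) -> List[bool]:
    """Hypothetical stand-in: for each element of `values`, report whether
    it occurs in `value_set`."""
    # Building a set collapses repeated entries, so duplicates in
    # `value_set` cannot change the lookup result.
    lookup = set(value_set)
    return [v in lookup for v in values]


values = [1, 2, 3, 4]
with_duplicates = is_in(values, [2, 2, 4, 4, 4])
without_duplicates = is_in(values, [2, 4])

# Both calls give the same answer, with or without repeated entries.
assert with_duplicates == without_duplicates
print(with_duplicates)  # [False, True, False, True]
```

Under that assumption, rejecting duplicates up front is a validation choice rather than something the lookup itself requires.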