Re: [DISCUSS][Format][C++] Improvement of sparse tensor format and implementation

Wes McKinney Tue, 27 Aug 2019 16:19:06 -0700

On Tue, Aug 27, 2019 at 6:07 PM Neal Richardson
<neal.p.richard...@gmail.com> wrote:
>
> Forgive me if this is off topic; I haven't been following this closely
> and I haven't used scipy.sparse. But there are some very reasonable
> cases where you might want to fill sparse data with a value other than
> 0:
>
> * The sparseness is missing data, and 0 is not the same as NA
> * Better compression: figure out which value is most common in the
> data and make that the default that gets filled. E.g. how many fingers
> a person has.
>

Definitely. I am the original author of pandas's Sparse* family of
types, and they were created for the case where the data is mostly
null/NA. But, as far as I'm aware, this component of pandas is
fairly unusual and was never intended as an alternative to sparse
matrix libraries.
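
To make the two cases Neal lists concrete, here is a minimal sketch using
pandas's `pd.arrays.SparseArray`. The sample values and variable names are
made up for illustration; this shows only how a non-zero fill value behaves
in pandas, not anything about the Arrow SparseTensor format.

```python
import numpy as np
import pandas as pd

# Case 1: sparseness means missing data, so 0 must stay distinct from NA.
# For float data the default fill_value is NaN, so only non-NaN entries
# are physically stored, including the explicit 0.0.
values = np.array([np.nan, np.nan, 1.5, np.nan, 0.0, np.nan])
na_sparse = pd.arrays.SparseArray(values)
stored_na = na_sparse.sp_values

# Case 2: use the most common value as the fill value for compression
# (the "how many fingers a person has" example; data is hypothetical).
fingers = np.array([10, 10, 10, 9, 10, 10, 8, 10])
uniq, counts = np.unique(fingers, return_counts=True)
most_common = uniq[counts.argmax()]
finger_sparse = pd.arrays.SparseArray(fingers, fill_value=most_common)
stored_fingers = finger_sparse.sp_values

# Converting back to a dense array restores the filled positions.
roundtrip = np.asarray(finger_sparse)
```

In both cases only the entries that differ from the fill value are stored,
which is a different contract from a sparse matrix whose implicit value is
always 0.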

It seems like the sparse-with-fill-value case might be better discussed
on Micah's thread regarding Array compression and encoding.

> Neal
>
> On Tue, Aug 27, 2019 at 3:46 PM Rok Mihevc <rok.mih...@gmail.com> wrote:
> >
> > On Tue, Aug 27, 2019 at 11:05 PM Wes McKinney <wesmck...@gmail.com> wrote:
> >
> > > I don't think this has been discussed. I think the SparseTensor
> > > discussions have been intended to reach compatibility with "sparse
> > > matrix" projects like scipy.sparse. pandas's "SparseArray" objects are
> > > a distinct thing -- I don't know many examples of sparse matrices with
> > > fill values other than 0
> >
> >
> > The reason for implementing fill_value would be the case where a user
> > wants another value to play the role of '0' and it's practical for the
> > SparseTensor object to keep that value for them. I am not sure how common
> > such a case would be, and since scipy.sparse is time-tested I'd agree
> > with compatibility as the current goal.
