Hi Christoph,

the current implementation of SparseVector only lets you modify entries that are stored as “non-zero”. All other entries are not represented in the underlying data structures, so you have to create a new SparseVector if you want to set a zero entry to a non-zero value. If the user specifies entries explicitly, they might want to modify these entries later on. Therefore, we implemented the SparseVector initialization in such a way that elements whose values add up to *0* are explicitly represented and thus modifiable. I agree that this might not be intuitive, and maybe the other way around, i.e. filtering out these 0 values, would be better.
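
To make the difference between the two behaviours concrete, here is a minimal sketch in plain Scala. It does not use the actual SparseVector code; `CooSketch`, `sumDuplicates` and the `dropZeros` flag are made-up names for illustration only.

```scala
// Hypothetical helper, NOT the Flink SparseVector API.
object CooSketch {

  /** Sums the values of all entries that share an index and returns the
    * result sorted by index.
    *
    * dropZeros = false: an index whose values cancel out is kept as an
    *                    explicit 0.0 entry (and would stay modifiable).
    * dropZeros = true:  such an index is filtered out.
    */
  def sumDuplicates(entries: Seq[(Int, Double)], dropZeros: Boolean): Seq[(Int, Double)] = {
    val summed = entries
      .groupBy(_._1)                                               // group entries by index
      .map { case (index, group) => (index, group.map(_._2).sum) } // sum values per index
      .toSeq
      .sortBy(_._1)

    if (dropZeros) summed.filter(_._2 != 0.0) else summed
  }
}
```

For the entries (0, 1.0), (1, 1.0), (1, -1.0), calling this with `dropZeros = false` yields (0, 1.0), (1, 0.0), while `dropZeros = true` yields only (0, 1.0).
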
I’m not so sure whether it makes sense to initialize a SparseVector from an array of values. My gut feeling is that you would use an array to represent a DenseVector, because you have to specify a value for each index. If you have only a few non-zero entries, then a different data structure, e.g. a set of (index, value) pairs, seems more efficient to me. But adding such an initialization method is not a big deal. What kind of use case do you have in mind?

Cheers,
Till

On Fri, May 8, 2015 at 3:20 PM, Christoph Alt <christoph....@posteo.de> wrote:

> Hi,
>
> Felix and I are currently working on the implementation of the
> FeatureHasher (Issue #1735), which in the end returns a SparseVector.
>
> When using “SparseVector.fromCOO” I’m facing some odd behaviour I haven’t
> expected.
>
> Assume I create a SparseVector.fromCOO(numFeatures, Map((0, 1.0), (1,
> 1.0), (1, -1.0))), this returns a SparseVector((0, 1.0), *(1, 0.0)*).
> I would have expected that after summing up the values of similar indices,
> an index with a resulting value of 0.0 would be dropped during the creation
> of a SparseVector.
> Is this the expected behaviour or does this need to be fixed?
>
> Furthermore, are there any plans to extend the SparseVector implementation
> by a SparseVector.fromArray(), which takes an array like Array(0.0, 1.0,
> 2.0, 0.0, 3.2) as parameter and creates a SparseVector((1, 1.0), (2, 2.0),
> (4, 3.2)) of array.length while only keeping non-zero entries?
>
> Best,
> Christoph
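
For reference, the array-to-sparse conversion asked about in the quoted message could look roughly like the sketch below. It is plain Scala and only produces the (index, value) pairs; `FromArraySketch` and `nonZeroEntries` are hypothetical names, and no such method is claimed to exist on SparseVector.

```scala
// Hypothetical helper, NOT part of the Flink SparseVector API.
object FromArraySketch {

  /** Returns the (index, value) pairs of all non-zero elements of a dense
    * array, in index order. The vector size would be values.length.
    */
  def nonZeroEntries(values: Array[Double]): List[(Int, Double)] =
    values.zipWithIndex
      .collect { case (value, index) if value != 0.0 => (index, value) }
      .toList
}
```

Applied to Array(0.0, 1.0, 2.0, 0.0, 3.2), this gives the pairs (1, 1.0), (2, 2.0), (4, 3.2), which could then be handed to a COO-style constructor together with the array length.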