Re: [Math] "iterator" and "sparseIterator" in "RealVector" hierarchy

Ted Dunning Wed, 17 Aug 2011 11:46:51 -0700

I would think neither.

I think it should mean that the sparseIterator will be enough faster than
the dense iterator to make it worth using.  The indication doesn't have to
be be crisp or even always quite correct.  Thus, a densely filled
OpenMapReal might just return true because that usage is out of the
ordinary.


Regarding what the sparseIterator returns, skipping zeroes is a fine idea,
but the contract should really be that all non-zeros will be returned rather
than no zeros will be returned.  There should be flexibility to return zero
values if that is much more convenient, but there should be no case where a
non-zero is missed.

Regarding the additional indicators, these are occasionally useful in
particular cases, but many of those special cases actually work out just as
well if you simply treat them as sparse.  As an example, multiplication by a
diagonal matrix can be special cased, but simply iterating over the
non-zeros (i.e. the diagonal) is just as good.  Matrix power is a
counter-example where the computation can be reduced to element-wise power
for diagonal, but not for sparse.  Even so, the sparse implementation is
surprisingly competitive even there especially for small powers if you have
a special sparse-sparse multiply.  Mahout has an additional indicator
function that tells whether sparseIterator returns items in a sequential
order which can reduce many of these cases to a merge.

For banded arrays, the economies available beyond simple sparse algorithms
are even more limited.

Symmetric and triangular matrices also have special properties but it is
hard to decide what is really important there.  Many of the special
operations for these kinds of matrix are subject to solving by overloads
instead of indicators since we aren't dealing with binary operations.  For
example, left and right inverse multiplication with triangular matrices is
handled by normal single dispatch and qualifying an argument for real
Cholesky decomposition is specific to the Cholesky decomposition itself.

On Wed, Aug 17, 2011 at 11:28 AM, Arne Ploese <aplo...@gmx.de> wrote:

> So what exactly is the meaning of isSparse?
>
> * the vector at hand implements some kind of sparse storage? I.E:
> OpenMapRealVector
>
> or
>
> * the vector is actually sparsely filled (then what is sparse anyway
> 10%, 50%, 90%?) in this case a SparserelVector interface is useless for
> the compiler.
>
> My suggestion is: ArrayRealVector implements a sparse iterator which
> returns only values != 0?
>
> Having the backing storage implementation open, one could think of:
>
> * isArray
> * isBanded
> * isSymetrical
> * isDiagonal
>
> as additional markers or put the whole thing in an enum or EnumSet?
>
> Am Mittwoch, den 17.08.2011, 08:51 -0700 schrieb Ted Dunning:
> > I think that all vectors and matrices should be able to answer the
> question
> > about whether they are sparse and they should support sparse iterators,
> > defaulting to the normal iterators in the general case.  So yes to the
> first
> > question.  This allows the application programmer to be much less
> concerned
> > about whether they are using a sparse or dense matrix without losing much
> > performance, if any.
> >
> > The second question I think is much less clear.  It might well be a good
> > idea to keep the interface to help the compiler in cases where the
> > programmer knows which general class of matrix they have.
> >
> > On Wed, Aug 17, 2011 at 8:29 AM, Arne Ploese <aplo...@gmx.de> wrote:
> >
> > > Am Mittwoch, den 17.08.2011, 07:25 -0700 schrieb Ted Dunning:
> > > > Arne,
> > > >
> > > > Please read the thread again.  I am providing an example of how I
> think
> > > > things *should* be.
> > > OK.
> > > if I understand you right: isSparse() should be added to RealVector and
> > > the interface SparseRealVector can be dropped?
> > >
> > > >
> > > > The point of doing so is that things are not that way now.  Telling
> me
> > > that
> > > > they are not that way is pretty redundant.
> > > >
> > > > On Wed, Aug 17, 2011 at 2:37 AM, Arne Ploese <aplo...@gmx.de> wrote:
> > > >
> > > > > Currently sparseIterator is only used in RealVector, no matrix
> class
> > > > > will be affected, because there is no sparseIterator.
> > > > > Search you sources for "sparseIterator" ?
> > > > >
> > > > > Arne
> > > > > Am Dienstag, den 16.08.2011, 14:09 -0700 schrieb Ted Dunning:
> > > > > > Here is an example from the perspective of somebody adding a new
> kind
> > > of
> > > > > > matrix.
> > > > > >
> > > > > > Take the two kinds of matrix as RandomTrinaryMatrix(rows,
> columns, p)
> > > > > that
> > > > > > has elements that are -1, 0 or 1.  1 and -1 have equal
> probabilities
> > > of
> > > > > p/2.
> > > > > >  The value of p should be in [0,1].
> > > > > >
> > > > > > It would be very nice if the implementor of this matrix could
> extend
> > > an
> > > > > > abstract matrix and over-ride get() to generate a value and set()
> to
> > > > > throw
> > > > > > an unsupported operation exception.  If p < 0.1, then the matrix
> > > should
> > > > > be
> > > > > > marked as sparse, else as dense.
> > > > > >
> > > > > > All operations against other matrices, sparse or dense should
> work
> > > well
> > > > > > without any special handling by the implementor of this matrix.
> > > > > >
> > > > > > This works in Mahout for instance by having the default
> operations in
> > > > > > AbstractMatrix test for sparseness of left or right operands and
> do
> > > the
> > > > > > right thing.  Obviously, a type test will not tell you whether
> this
> > > > > matrix
> > > > > > is sparse or not.
> > > > > >
> > > > > > This matrix and siblings is very important in compressed sensing
> and
> > > > > > stochastic projection algorithms.
> > > > > >
> > > > > > On Tue, Aug 16, 2011 at 1:55 PM, Phil Steitz <
> phil.ste...@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > > > On 8/16/11 4:46 AM, Gilles Sadowski wrote:
> > > > > > > > Hi.
> > > > > > > >
> > > > > > > >> I understood what he was suggesting.  I still disagree.
>  Dynamic
> > > > > > > dispatch
> > > > > > > >> and non-lattice typing structure is still required to make
> this
> > > all
> > > > > > > work.
> > > > > > > >>  Java doesn't really do that.  Pretending that what Java
> does is
> > > > > > > sufficient
> > > > > > > >> is hammer-looking-for-a-nail, not solving the problems at
> hand.
> > > > > > > > Maybe that *I* don't understand what you are hinting at.
> Sorry
> > > for
> > > > > being
> > > > > > > > dense. [Although that seems appropriate in this discussion
> :-).]
> > > > > > > >
> > > > > > > > Polymorphism provides dynamic dispatch, overloading does not;
> > > that's
> > > > > why
> > > > > > > my
> > > > > > > > proposition is that when you manipulate "unknown" types,
> those
> > > should
> > > > > > > come
> > > > > > > > as "this", not as the argument of the method.
> > > > > > > >
> > > > > > > > What's wrong with that?
> > > > > > > >
> > > > > > > > As for "hammer-looking-for-a-nail", I also don't see what you
> > > mean:
> > > > > What
> > > > > > > is
> > > > > > > > the problem? I guess that there are lots of applications who
> > > never
> > > > > need
> > > > > > > to
> > > > > > > > know about sparse vectors/matrices. In those cases, the added
> > > > > complexity
> > > > > > > is
> > > > > > > > not a "feature". The issue reported contends that the current
> > > design
> > > > > in
> > > > > > > CM
> > > > > > > > can cause problems for dense implementations. I'm not even
> sure
> > > that
> > > > > the
> > > > > > > > current design is usable for the type of applications that
> make
> > > heavy
> > > > > use
> > > > > > > of
> > > > > > > > sparseness. Those are problems, IMHO.
> > > > > > >
> > > > > > > I have been out of pocket the last couple of days and may not
> have
> > > > > > > time to dig into this until late tonight, but I agree with
> Gilles
> > > > > > > that we need to get the conversation here more concrete.  I
> know we
> > > > > > > discussed this before and Ted and others had good examples
> > > > > > > justifying the current setup.  Can we revisit these, please?
> What
> > > > > > > would be great would be some examples both from the perspective
> of
> > > > > > > the [math] developer looking to add a new or specialized class
> and
> > > > > > > [math] users writing code that leverages the setup.
> > > > > > >
> > > > > > > Phil
> > > > > > > >
> > > > > > > >
> > > > > > > > Gilles
> > > > > > > >
> > > > > > > >> On Mon, Aug 15, 2011 at 6:52 PM, Greg Sterijevski <
> > > > > > > gsterijev...@gmail.com>wrote:
> > > > > > > >>
> > > > > > > >>> Forgive me for pushing my nose under the tent... I couldn't
> > > resist.
> > > > > > > >>>
> > > > > > > >>> I think Gilles is saying that each specialization of the
> > > > > matrix/vector
> > > > > > > >>> objects would need to support pre (and post) multiplication
> > > with a
> > > > > > > dense.
> > > > > > > >>> So
> > > > > > > >>> the type issue would not be problematic.
> > > > > > > >>>
> > > > > > > >>> On Mon, Aug 15, 2011 at 6:34 PM, Ted Dunning <
> > > > > ted.dunn...@gmail.com>
> > > > > > > >>> wrote:
> > > > > > > >>>
> > > > > > > >>>> No.
> > > > > > > >>>>
> > > > > > > >>>> You can't.  This is because the type is lost as you enter
> the
> > > > > generic
> > > > > > > >>>> library.
> > > > > > > >>>>
> > > > > > > >>>> On Mon, Aug 15, 2011 at 4:28 PM, Gilles Sadowski <
> > > > > > > >>>> gil...@harfang.homelinux.org> wrote:
> > > > > > > >>>>
> > > > > > > >>>>>> They know that their own object is dense, but they don't
> > > know
> > > > > what
> > > > > > > >>> kind
> > > > > > > >>>>> of
> > > > > > > >>>>>> input they were given.  They should still run fast if
> the
> > > input
> > > > > is
> > > > > > > >>>>> sparse.
> > > > > > > >>>>>
> > > > > > > >>>>> Couldn't we still rely on polymorphism by implementing
> > > > > "preTimes":
> > > > > > > >>>>>   unknown.preTimes(dense)
> > > > > > > >
> > > ---------------------------------------------------------------------
> > > > > > > > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> > > > > > > > For additional commands, e-mail: dev-h...@commons.apache.org
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > ---------------------------------------------------------------------
> > > > > > > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> > > > > > > For additional commands, e-mail: dev-h...@commons.apache.org
> > > > > > >
> > > > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> ---------------------------------------------------------------------
> > > > > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> > > > > For additional commands, e-mail: dev-h...@commons.apache.org
> > > > >
> > > > >
> > >
> > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> > > For additional commands, e-mail: dev-h...@commons.apache.org
> > >
> > >
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> For additional commands, e-mail: dev-h...@commons.apache.org
>
>

Re: [Math] "iterator" and "sparseIterator" in "RealVector" hierarchy

Reply via email to