Re: FeatureEncoder Question

Ted Dunning Wed, 01 Sep 2010 21:25:01 -0700

You could go down that sort of path, but you lose all the power of
quasi-orthogonality that way.

The basic idea is that random points on the sphere are almost all orthogonal
to within an epsilon that depends on the dimensionality of the space and the
number of vectors.  For epsilon = 0, the number n of vectors that you
can place into the space is clearly the dimension d.  For non-trivial values
of epsilon, however, the number is exponential in d.  This holds
constructively as a hard bound, or in probability for random vectors.
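
To make the random-vector case concrete, here is a minimal sketch (not from
the original thread) that draws random unit vectors and reports the largest
absolute cosine between any pair.  The function names, the choice of n = 50,
and the seed are assumptions picked for illustration only.

```python
import math
import random


def random_unit_vector(d, rng):
    """Draw a point uniformly on the unit sphere in d dimensions."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def max_abs_cosine(n, d, seed=0):
    """Largest |cosine| over all pairs among n random unit vectors."""
    rng = random.Random(seed)
    vectors = [random_unit_vector(d, rng) for _ in range(n)]
    worst = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(vectors[i], vectors[j]))
            worst = max(worst, abs(dot))
    return worst


if __name__ == "__main__":
    # The worst-case overlap (the epsilon above) shrinks as d grows.
    for d in (16, 64, 256, 1024):
        print(d, round(max_abs_cosine(n=50, d=d), 3))
```

Note that with d = 16 there are more vectors than dimensions, so exact
orthogonality is impossible, yet in higher dimensions the same 50 vectors
fit with only a small pairwise overlap.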

This exponential growth means that if you divide the space into two
sub-spaces, you get vastly less than half the capacity in each sub-space.
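
As a rough worked version of that claim, suppose the capacity has the form
below.  The constant c and the epsilon-squared dependence are assumptions
typical of this kind of bound; the only thing taken from the argument above
is that capacity is exponential in d.

```latex
n(d) \approx e^{c\,\epsilon^{2} d}
\quad\Longrightarrow\quad
n(d/2) \approx e^{c\,\epsilon^{2} d / 2} = \sqrt{n(d)},
\qquad
2\,n(d/2) \approx 2\sqrt{n(d)} \ll n(d).
```

Under that assumption, each half holds only about the square root of the
full capacity, so even the two halves together fall far short of it.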

This behavior is related to Bloom filters where a fixed false positive rate
leads to a size proportional to the number of documents and the number of
bits per document is proportional to the negative log of the false positive
rate.
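
For reference, the Bloom filter relationship can be sketched with the
standard sizing formula for a classic Bloom filter using the optimal number
of hash functions.  This formula is not stated in the thread, and the
function names are made up for illustration.

```python
import math


def bloom_bits_per_item(p):
    """Bits per stored item for a target false positive rate p.

    Standard result for a classic Bloom filter with the optimal number
    of hash functions: m / n = -ln(p) / (ln 2)^2.
    """
    return -math.log(p) / (math.log(2) ** 2)


def bloom_size_bits(n_items, p):
    """Total filter size in bits: linear in the number of items."""
    return math.ceil(n_items * bloom_bits_per_item(p))


if __name__ == "__main__":
    for p in (0.1, 0.01, 0.001):
        print(p, round(bloom_bits_per_item(p), 2), bloom_size_bits(1000, p))
```

Halving the false positive rate costs a fixed number of extra bits per item
(about 1.44), which is the logarithmic trade-off described above.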

The upshot is that it is better to mix everything into the same space and
take our chances.

On Wed, Sep 1, 2010 at 7:24 AM, Robin Anil <[email protected]> wrote:

>
> What if you don't mix it in the same vector length? Make the vector 2l and
> add these in the second half so they won't mix, right?
>
>
