You could go down that sort of path, but you lose all the power of quasi-orthogonality that way.
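
To make that concrete, here is a rough sketch of my own (plain numpy, not Mahout code; the function names, the sizes d and n, and the seed are all just assumptions for illustration). It draws random unit vectors and reports the largest absolute dot product between any two distinct vectors, first with everything mixed into one d-dimensional space, then for one half of a split layout where half the vectors get half the dimensions.

```python
import numpy as np


def random_unit_vectors(n, d, rng):
    """Draw n points uniformly from the unit sphere in d dimensions."""
    x = rng.standard_normal((n, d))
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def max_abs_cosine(vectors):
    """Largest |dot product| between two distinct unit vectors."""
    gram = vectors @ vectors.T
    np.fill_diagonal(gram, 0.0)  # ignore each vector's product with itself
    return float(np.abs(gram).max())


rng = np.random.default_rng(0)  # arbitrary seed, for repeatability only
d, n = 256, 2000                # illustrative sizes; note n is far larger than d

# Everything mixed into a single d-dimensional space.
mixed = max_abs_cosine(random_unit_vectors(n, d, rng))

# One half of a split layout: n/2 vectors confined to d/2 dimensions.
split = max_abs_cosine(random_unit_vectors(n // 2, d // 2, rng))

print(f"mixed: n={n}, d={d}, worst |cos| = {mixed:.3f}")
print(f"split: n={n // 2}, d={d // 2}, worst |cos| = {split:.3f}")
```

This only measures the worst pairwise overlap, not capacity as such, but it shows the trade: vectors in different halves of a split layout never collide, yet the ones that share a half collide harder because they have only half the dimensions to spread out in.
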
The basic idea is that random points on the sphere are almost all orthogonal to each other to within an epsilon that depends on the dimensionality of the space and the number of vectors.

For epsilon = 0, the number n of vectors that you can place into the space is clearly the dimension d. For non-trivial values of epsilon, however, the number is exponential in d. This holds constructively as a hard bound, or in probability for random vectors.

Because capacity is exponential in d, dividing the space into two sub-spaces gives you vastly less than half the capacity in each sub-space.

This behavior is related to Bloom filters, where a fixed false positive rate leads to a filter size proportional to the number of documents, and the number of bits per document is proportional to the negative log of the false positive rate.

The upshot is that it is better to mix everything into the same space and take our chances.

On Wed, Sep 1, 2010 at 7:24 AM, Robin Anil <[email protected]> wrote:

> What if you dont mix it in the same vector length. Make vector 2l and add
> these in the second half so they wont mix right?
