On Wed, Sep 1, 2010 at 7:49 PM, Ted Dunning <[email protected]> wrote:
> The goal of the interaction encoder is to produce a vector that is
> orthogonal to the original. The current strategy is to add the hash values
> together, which should leave the new locations in different positions from
> either of the originals (on average).
>
> The place where this gets trickier is when text interacts with something,
> especially text. This is because text encodes as a vector with (nearly) as
> many non-zeros as unique words in the original text for each probe. When
> text with n words interacts with text with m words, you get n x m non-zeros
> in the result. I think that is the best thing to do, but it can be costly
> if you have gobs of words in your text.
>
> I am wide open for suggestions on this. What we have so far is good enough
> for the current application, and the current tests verify the orthogonality
> for a few examples, but more thought would be good.

What if you don't mix it into a vector of the same length? Make the vector
2L and add the interaction terms in the second half so they won't mix, right?

> On Wed, Sep 1, 2010 at 6:44 AM, Robin Anil <[email protected]> wrote:
>
> > I am trying to put FeatureEncoder in front of the Mahout Bayes trainer
> > and classifier, and I have a doubt about the interaction encoder. How
> > does a difference in the hash bit correlate with an interaction?
> >
> > Robin
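To make the discussion concrete, here is a minimal sketch of the two ideas being compared: Ted's strategy of adding the two tokens' hash values so the interaction lands at a different location from either original, and Robin's suggestion of doubling the vector to 2L and confining interactions to the second half. This is an illustration only, not Mahout's actual FeatureEncoder code; the `token_hash` function is a hypothetical stand-in (Mahout's encoders use a real hash such as MurmurHash), and `encode_interaction` and its parameters are invented for this sketch.

```python
def token_hash(token):
    # Hypothetical stand-in hash for illustration; not what Mahout uses.
    h = 0
    for ch in token:
        h = (h * 31 + ord(ch)) & 0xFFFFFFFF
    return h


def encode_interaction(text_a, text_b, vector_size, separate_half=False):
    """Encode the interaction of two text fields as one hashed vector.

    separate_half=False: add the two hash values and index into a vector
    of length vector_size (the add-the-hashes strategy described above).
    separate_half=True: double the vector to 2 * vector_size and shift
    all interaction terms into the second half, so they can never
    collide with first-order features hashed into the first half.
    """
    length = 2 * vector_size if separate_half else vector_size
    vector = [0.0] * length
    for a in text_a.split():
        for b in text_b.split():
            # n words x m words -> up to n * m non-zero positions.
            idx = (token_hash(a) + token_hash(b)) % vector_size
            if separate_half:
                idx += vector_size  # confine interactions to [L, 2L)
            vector[idx] += 1.0
    return vector


# 3 words interacting with 2 words -> six unit weights in the vector.
v = encode_interaction("the quick fox", "lazy dog", 100)
w = encode_interaction("the quick fox", "lazy dog", 100, separate_half=True)
```

Note the trade-off the sketch makes visible: the 2L layout guarantees interactions stay disjoint from the original features, but it does not reduce the n x m non-zero count that makes large texts costly.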
