The goal of the interaction encoder is to produce a vector that is
orthogonal to the originals.  The current strategy is to add the hash values
together, which should (on average) put the interaction features in
locations different from either of the originals.
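To make the idea concrete, here is a minimal sketch of that strategy.  The
names (hashLocation, interactionLocation, NUM_FEATURES) and the particular
hash are my own illustration, not Mahout's actual FeatureEncoder API:

```java
public class InteractionHashDemo {
    static final int NUM_FEATURES = 1 << 10;  // size of the hashed feature vector

    // Hash a single feature value into a vector location (hypothetical helper;
    // the probe offset gives each probe a different location).
    static int hashLocation(String value, int probe) {
        return Math.floorMod(value.hashCode() + probe * 0x9E3779B9, NUM_FEATURES);
    }

    // Interaction location: add the two hashes, then reduce mod the vector
    // size.  On average the sum lands somewhere different from either input
    // location, which is what keeps the interaction (nearly) orthogonal.
    static int interactionLocation(String a, String b, int probe) {
        return Math.floorMod(hashLocation(a, probe) + hashLocation(b, probe),
                             NUM_FEATURES);
    }

    public static void main(String[] args) {
        int ha = hashLocation("age=30", 0);
        int hb = hashLocation("state=CA", 0);
        int hi = interactionLocation("age=30", "state=CA", 0);
        System.out.println(ha + " " + hb + " " + hi);
    }
}
```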

Where this gets trickier is when text interacts with something,
especially other text.  This is because text encodes as a vector with
(nearly) as many non-zeros per probe as there are unique words in the
original text.  When text with n words interacts with text with m words,
you get n x m non-zeros in the result.  I think that is the best thing to
do, but it can be costly if you have gobs of words in your text.
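The n x m blowup comes from hashing every pair of words, one from each
text.  A small sketch of that cross product (again with hypothetical
helper names, not the real encoder code):

```java
import java.util.Arrays;

public class TextInteractionDemo {
    static final int NUM_FEATURES = 1 << 12;

    // hash one word into a vector location (hypothetical helper)
    static int hashLocation(String w, int probe) {
        return Math.floorMod(w.hashCode() + probe * 31, NUM_FEATURES);
    }

    public static void main(String[] args) {
        String[] left = "the quick brown fox".split(" ");  // n = 4 words
        String[] right = "jumps over".split(" ");          // m = 2 words

        double[] vector = new double[NUM_FEATURES];
        // Every (left word, right word) pair gets its own combined hash
        // location, so the interaction touches up to n * m distinct
        // non-zeros (fewer only if hashes collide).
        for (String a : left) {
            for (String b : right) {
                int loc = Math.floorMod(hashLocation(a, 0) + hashLocation(b, 0),
                                        NUM_FEATURES);
                vector[loc] += 1;
            }
        }
        long nonZeros = Arrays.stream(vector).filter(v -> v != 0).count();
        System.out.println(nonZeros);  // at most 4 * 2 = 8
    }
}
```

With realistic documents (hundreds of words each) the cross product runs
into the tens of thousands of non-zeros, which is where the cost concern
comes from.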

I am wide open to suggestions on this.  What we have so far is good enough
for the current application, and the current tests verify orthogonality
for a few examples, but more thought would be good.

On Wed, Sep 1, 2010 at 6:44 AM, Robin Anil <[email protected]> wrote:

> I am trying to put FeatureEncoder in front of Mahout Bayes trainer and
> classifier, I have a doubt about the interaction encoder. How does
> a difference in the hash bits correlate with an interaction?
>
>
> Robin
>
