> If a sentence can be rewritten in 1000 different ways without changing
> its meaning, then that only adds 10 bits.

Yes, provided that you have an efficient encoding/decoding scheme for that particular sentence. Now, what is the overhead for having efficient encoding/decoding schemes for *all* possible sentences?
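
To put a number on that concession (my arithmetic, not yours, and only a toy): picking one of N interchangeable renderings costs about log2(N) bits in the compressed stream, which is where the "10 bits" comes from, but only if encoder and decoder already share the same numbered list of renderings for that sentence. A sketch in Python:

    import math

    n_paraphrases = 1000
    index_cost_bits = math.log2(n_paraphrases)   # ~9.97, i.e. the "10 bits"
    print(f"Cost to pick one of {n_paraphrases} equivalent renderings: "
          f"{index_cost_bits:.2f} bits")

    # The catch: this assumes encoder and decoder can both produce the
    # same enumerated list of 1000 paraphrases for *this* sentence.
    # The 10 bits pay for the index only, not for that shared ability.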

You state that "The amount of extra knowledge needed to encode the choice of representations is small." I strenuously disagree. While the number of bits required in the encoded text is small, the amount of extra knowledge required in the encoder and decoder is much, *MUCH* larger. What model did you have in mind that combines deep knowledge with the very shallow lossless algorithms you cite? I don't believe that you can cite *any* deep-knowledge algorithm/model that doesn't suffer when you try to add losslessness.
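
To make the accounting concrete, here is a deliberately crude toy (the sentence list, the numbers, and the use of pickle as a stand-in for "model size" are my own made-up example, not a claim about how any real system is built): the index that goes into the compressed stream is tiny, while the table of equivalent forms that the decoder must already carry, for this one sentence alone, is not. A real lossless system needs the analogue of that table for every sentence that could ever occur.

    import math, pickle

    # Hypothetical shared knowledge: the equivalent renderings of one sentence.
    # (A toy list; imagine 997 more entries, and then the analogue of this
    # table for every possible sentence.)
    paraphrases = [
        "The cat sat on the mat.",
        "On the mat sat the cat.",
        "The cat was sitting on the mat.",
    ]

    chosen = 2                                    # index transmitted in the stream
    reconstructed = paraphrases[chosen]           # exact (lossless) reconstruction

    stream_bits = math.ceil(math.log2(1000))      # ~10 bits added to the stream
    codec_bytes = len(pickle.dumps(paraphrases))  # knowledge the decoder must carry

    print(f"Bits added to the stream: {stream_bits}")
    print(f"Bytes of shared knowledge for this one sentence alone: {codec_bytes}")
    print(f"Reconstructed exactly: {reconstructed!r}")

That is the extra knowledge I am talking about: it lives in the encoder and decoder, not in the bit count of the compressed text.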

----- Original Message -----
From: "Matt Mahoney" <[EMAIL PROTECTED]>
To: <agi@v2.listbox.com>
Sent: Wednesday, April 18, 2007 9:19 PM
Subject: Re: Goals of AGI (was Re: [agi] AGI interests)


--- Mark Waser <[EMAIL PROTECTED]> wrote:

>> I could have used a lossy test by using human subjects to judge the
>> equivalence of the reproduced output text, but it seemed like more
>> trouble than it is worth.  The lossless test is fair because everyone
>> still has to encode the (incompressible) choice of representation.

> Whether or not the lossless test is fair is irrelevant and you entirely
> failed to address my argument that "Requiring an AI to decompress the same
> knowledge into a variety of different forms based upon what was input is a
> tremendously more difficult problem than AI without that requirement (and
> having that requirement doesn't seem to have any benefit)."

The benefit of a lossless test is that you don't need human judges to do a
subjective evaluation.

I don't believe that it is tremendously more difficult to decompress text
exactly than to decompress to different text that has the same meaning. The amount of extra knowledge needed to encode the choice of representations is small. If a sentence can be rewritten in 1000 different ways without changing
its meaning, then that only adds 10 bits.

As for whether more computation is required, that is debatable. It depends on
how you implement the model.  Efficient, lossless language models already
exist, for example, PPM, distant bigram, and LSA. What model did you have in
mind where losslessness would be a hardship?



-- Matt Mahoney, [EMAIL PROTECTED]

-----
This list is sponsored by AGIRI: http://www.agiri.org/email
To unsubscribe or change your options, please go to:
http://v2.listbox.com/member/?member_id=231415&user_secret=fabd7936
