Removing the ordering constraint on enwik8 should reduce the compressed
size by about 50K bytes, or roughly 2 bytes per article. But it wouldn't
change the nature of the research. Here is more about the data:
http://mattmahoney.net/dc/textdata.html
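
One way to sanity-check a figure of that size is to count the bits needed
to pick one ordering out of n! equally likely ones. This is only a sketch:
the uniform-ordering model is an assumption, and the article count of
25,000 is a hypothetical value implied by 50K bytes at 2 bytes per article,
not a number taken from the data.

```python
import math


def ordering_cost_bytes(n_articles: int) -> float:
    """Bytes needed to name one of n! equally likely article orderings."""
    # log2(n!) computed as ln(n!) / ln(2); lgamma avoids building a huge integer.
    bits = math.lgamma(n_articles + 1) / math.log(2)
    return bits / 8


if __name__ == "__main__":
    n = 25_000  # hypothetical article count (assumption, see text above)
    total = ordering_cost_bytes(n)
    print(f"{total:.0f} bytes total, {total / n:.2f} bytes per article")
```

Under these assumptions the result comes out a little over 40K bytes, the
same order of magnitude as the estimate above.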

On Tue, Jan 14, 2020, 7:59 AM James Bowery <jabow...@gmail.com> wrote:

> Here's a simple modification to The Hutter Prize
> <http://prize.hutter1.net/> and the Large Text Compression Benchmark
> <http://mattmahoney.net/dc/text.html> to illustrate my point:
>
> Split the Wikipedia corpus into separate files, one per Wikipedia
> article.  An entry qualifies only if the set of checksums of the files
> produced by the self-extracting archive matches that of the original corpus.
>
> This reduces the over-constraint imposed by the strictly serialized corpus.
>
>
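
A minimal sketch of the qualification check proposed above, assuming the
original articles and the extracted articles each sit in their own
directory. SHA-256, the directory layout, and the function names are
illustrative choices, not part of the proposal. It compares multisets of
checksums rather than plain sets so that duplicate files still have to be
reproduced the right number of times.

```python
import hashlib
from collections import Counter
from pathlib import Path


def checksum_multiset(directory: Path) -> Counter:
    """Count SHA-256 digests of every regular file, ignoring names and order."""
    digests = Counter()
    for path in Path(directory).rglob("*"):
        if path.is_file():
            digests[hashlib.sha256(path.read_bytes()).hexdigest()] += 1
    return digests


def qualifies(original_dir: Path, extracted_dir: Path) -> bool:
    """True if both directories hold the same file contents in any order."""
    return checksum_multiset(original_dir) == checksum_multiset(extracted_dir)
```

File names and extraction order play no part in the comparison, so the
archive is free to emit the articles in whatever order compresses best.
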
> On Sun, Jan 5, 2020 at 12:12 PM James Bowery <jabow...@gmail.com> wrote:
>
>> In reality, sensors and effectors exist in space as well as time.
>> Serializing the spatial dimension of observations to formalize their
>> Kolmogorov Complexity, so they conform to the serialized input to a
>> Universal Turing machine, over-constrains the observations, introducing
>> order not relevant to their natural information content, hence artificially
>> inflating the, so-defined, KC.
>>
>> Since virtually all models in machine learning are based on tabular data,
>> even if they can be cast as time series, row-indexed by a timestamp, each
>> row is an observation with multiple dimensions.   So it seems rather
>> interesting, if not frustrating, that the default assumption in Algorithmic
>> Information Theory is of a serial UTM.

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/Tc33b8ed7189d2a18-Ma929612907338546069466a8
Delivery options: https://agi.topicbox.com/groups/agi/subscription
