Paragraphs and sections in an article share mutual information. However, I
saw on the forum that a transform that groups the footers linking to the
article in other languages improves compression. For any reordering within
an article you also have to save the original order. With whole articles,
you can restore the original order by sorting by page ID, since the IDs are
sequential in enwik9.
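
Here is a rough sketch of what I mean by restoring article order from the page ID. It assumes each article is a <page>...</page> block whose first <id> element is the page ID (later <id> elements belong to the revision and contributor), and that the IDs increase through the file. The function names are made up for illustration, and text between pages is ignored.

```python
import re

# Assumption: each article is one <page>...</page> block.
PAGE_RE = re.compile(r"<page>.*?</page>", re.DOTALL)
ID_RE = re.compile(r"<id>(\d+)</id>")


def split_pages(xml_text):
    """Return the list of <page> blocks in document order."""
    return PAGE_RE.findall(xml_text)


def page_id(page):
    """Return the page ID, taken to be the first <id> in the block."""
    match = ID_RE.search(page)
    if match is None:
        raise ValueError("page has no <id> element")
    return int(match.group(1))


def reorder(pages, key):
    """Reorder pages by any key, e.g. a precomputed similarity rank."""
    return sorted(pages, key=key)


def restore(pages):
    """Undo any reordering by sorting on the page ID."""
    return sorted(pages, key=page_id)
```

Because the inverse is just a sort, nothing about the permutation needs to be stored in the compressed file, as long as the IDs really are increasing in the input.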

About a third of the articles are redirects. It is easy to group these
together to improve compression. Another 10-15% are articles about places
that were automatically generated from a US census table. These are highly
compressible and can also be grouped.
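
A minimal sketch of the grouping, assuming a redirect is a page whose text starts with "#REDIRECT" (in any case). The census test is only a guess at a marker phrase for the generated place articles, not something I have checked against the data, so treat is_census_place as a placeholder. The names are hypothetical.

```python
import re

# Assumption: redirect pages have text beginning with "#REDIRECT".
REDIRECT_RE = re.compile(r"<text[^>]*>\s*#redirect", re.IGNORECASE)


def is_redirect(page):
    return REDIRECT_RE.search(page) is not None


def is_census_place(page):
    # ASSUMPTION: placeholder heuristic. The marker phrase is a guess and
    # should be replaced by whatever actually identifies these articles.
    return "As of the [[census]]" in page


def group_pages(pages):
    """Put redirects first, then census place articles, then the rest.

    Order inside each group is preserved, so each group stays sorted by
    page ID if the input was.
    """
    redirects, places, rest = [], [], []
    for page in pages:
        if is_redirect(page):
            redirects.append(page)
        elif is_census_place(page):
            places.append(page)
        else:
            rest.append(page)
    return redirects + places + rest
```

The original order can still be recovered by sorting the grouped pages by page ID.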

-- Matt Mahoney, [email protected]

On Sat, Jan 10, 2026, 8:37 AM James Bowery <[email protected]> wrote:

>
>
> On Fri, Jan 9, 2026 at 9:44 PM Matt Mahoney <[email protected]>
> wrote:
>
>> 2. Improved article sort order by Kaitz. I believe this is based on
>> k-means clustering on a 1K vector space model. I was never able to
>> produce the same result myself so I just used the list he supplied.
>>
>
> I wonder to what extent in-line intra-article reordering may:
>
> 1) Be reasonably fast
> 2) Contribute to both speed and compression
> ?
>
> The most obvious granularity would be paragraphs.

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T0518db1e3a0c25c5-Ma4ac8c726e0e7f28299f7acc
Delivery options: https://agi.topicbox.com/groups/agi/subscription
