On Tue, Dec 2, 2025 at 12:25 AM <[email protected]> wrote:
>
> https://encode.su/threads/3595-Star-Engine-AI-data-compressor?p=86553#post86553

I'm glad somebody here is still doing concrete work toward AGI.

The encode forum gives good advice. Learn C/C++. Almost all data compression code is written in it, and it is 100 times faster than Python. Document your code. What does your program do? How do you run it? Describe the compressed format and the algorithm for decoding it. Then describe the encoding algorithm.

It looks like your program uses some kind of PPM or context model with arithmetic coding, but I'm not sure. It produced a 56,487 byte file filled with random digits, which I assume is the arithmetic coder output in base 10 instead of base 256.

I tested it in Ubuntu on my Lenovo (Core i7-1165G7, 2.80 GHz, 16 GB), and it looks like it compressed pre-processed-enwik5 in 65 seconds. It reports a compressed size of 23,465, which is about the actual output size divided by log10(256). Here are some results I got with zip -9, 7zip, zpaq -m5, and paq8px_v67 -8:

54,781 pre-processed enwik5.txt
29,895 x.zip
26,631 x.7z
23,479 x-m5.zpaq
20,582 x.paq8px

paq8px_v67 compressed in about 5 seconds. The others took less than 1 second.

I didn't compare with enwik5 because preprocessing by cmix -s hides information in the external dictionary, which has to be present to decompress. For the Hutter prize, cmix appends a compressed copy of the dictionary to the compressed file. When I compress the 100,000 byte enwik5 directly with paq8px_v67, I get 24,838 bytes.

--
-- Matt Mahoney, [email protected]

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: https://agi.topicbox.com/groups/agi/Tf0bedfcd44454678-M076e00c1d6b0835aec5fb0ab
Delivery options: https://agi.topicbox.com/groups/agi/subscription
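
P.S. A small Python sketch of the base-10 size estimate mentioned above. It assumes every byte of the 56,487 byte output file is a decimal digit, which I have not verified, and the function name is made up for illustration. Each decimal digit carries log2(10) bits, so dividing the digit count by log10(256) gives the size the same output would have in base 256.

```python
import math


def decimal_digits_to_bytes(num_digits: int) -> float:
    """Size in bytes that num_digits decimal digits would occupy in base 256.

    Assumption: the digits come from an arithmetic coder and are close to
    uniformly random, so each one carries log2(10) bits of information.
    """
    # bytes = digits * log2(10) / 8 = digits / log10(256)
    return num_digits / math.log10(256)


# 56,487 is the size of the base-10 output file reported in the message.
estimated_bytes = decimal_digits_to_bytes(56_487)
print(round(estimated_bytes))
```

The estimate comes out at about 23,456 bytes, close to the 23,465 the program reports. The small gap may come from non-digit bytes in the file or from how the program counts its own output.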
