Re: [fonc] Error trying to compile COLA

2012-03-11 Thread BGB

On 3/11/2012 4:51 PM, Martin Baldan wrote:

I won't pretend I really know what I'm talking about, I'm just
guessing here, but don't you think the requirement for "independent
and identically-distributed random variable data" in Shannon's source
coding theorem may not be applicable to pictures, sounds or frame
sequences normally handled by compression algorithms?


that is a description of random data, which, granted, doesn't apply to 
most (compressible) data.

that wasn't really the point though.

once one gets to the point where one's data looks like this, further 
compression is no longer possible (hence the limit).


typically, compression will transform low-entropy data (with many 
repeating patterns and redundancies) into a smaller amount of 
high-entropy compressed data (with almost no repeating patterns or 
redundancy).
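
a quick way to see this in practice (a minimal sketch, C with zlib, 
link with -lz; the repeated pattern is just an illustration): the first 
pass squeezes the redundancy out, and a second pass over the 
already-compressed output gains almost nothing, because the result of 
the first pass already looks random.

    #include <stdio.h>
    #include <stdlib.h>
    #include <zlib.h>   /* link with -lz */

    int main(void) {
        /* low-entropy input: one 15-byte pattern repeated throughout */
        static Bytef src[65536];
        uLong srcLen = sizeof(src);
        for (uLong i = 0; i < srcLen; i++)
            src[i] = "(foo (bar baz))"[i % 15];

        /* first pass: the redundancy gets squeezed out */
        uLongf len1 = compressBound(srcLen);
        Bytef *buf1 = malloc(len1);
        compress(buf1, &len1, src, srcLen);  /* error checking omitted */

        /* second pass: input now looks random, so almost no gain */
        uLongf len2 = compressBound(len1);
        Bytef *buf2 = malloc(len2);
        compress(buf2, &len2, buf1, len1);

        printf("original: %lu, pass 1: %lu, pass 2: %lu\n",
               srcLen, (unsigned long)len1, (unsigned long)len2);
        free(buf1);
        free(buf2);
        return 0;
    }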




I mean, many
compression techniques rely on domain knowledge about the things to be
compressed. For instance, a complex picture or video sequence may
consist of a well-known background with a few characters from a
well-known inventory in well-known positions. If you know those facts,
you can increase the compression dramatically. A practical example may
be Xtranormal stories, where you get a cute 3-D animated dialogue from
a small script.


yes, but this can only compress whatever redundancy exists.
once the redundancy is gone, one is at the limit.

specialized knowledge allows one to do a little better, but does not 
change the basic nature of the limit.


for example, I was able to devise a compression scheme which reduced 
S-Expressions to only 5% of their original size. now what if I want 3%, 
or 1%? this is not an easy problem. it is much easier to get from 10% 
to 5% than to get from 5% to 3%.
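
as a toy illustration of how domain knowledge helps (this is NOT the 
scheme mentioned above, just a sketch): if one knows in advance which 
atoms are common in the S-Expressions, one can replace them with single 
bytes before handing the text to a general-purpose compressor, leaving 
the entropy coder less redundancy to find on its own.

    #include <stdio.h>
    #include <string.h>

    /* atoms assumed common; a real scheme would build this from the data */
    static const char *dict[] = { "define", "lambda", "quote", "if", "let" };
    #define NDICT (sizeof(dict) / sizeof(dict[0]))

    /* replace dictionary atoms with bytes 0x80+index; everything else is
       copied through. toy version: it doesn't check atom boundaries (so
       "if" inside "iffy" would match), and 'out' must hold strlen(in)+1. */
    size_t sexp_precompress(const char *in, char *out) {
        size_t o = 0;
        while (*in) {
            int hit = -1;
            for (size_t i = 0; i < NDICT; i++) {
                size_t n = strlen(dict[i]);
                if (strncmp(in, dict[i], n) == 0) { hit = (int)i; in += n; break; }
            }
            if (hit >= 0) out[o++] = (char)(0x80 + hit);
            else          out[o++] = *in++;
        }
        out[o] = '\0';
        return o;
    }

    int main(void) {
        const char *s = "(define f (lambda (x) (if x 1 0)))";
        char out[256];
        printf("%zu bytes, was %zu\n", sexp_precompress(s, out), strlen(s));
        return 0;
    }

the output would then go through something like zlib anyway; the 
pre-pass just converts knowledge about the domain into bytes the 
general-purpose compressor no longer has to pay for.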



the big question then is how much redundancy exists within a typical OS, 
or other large piece of software?


I expect one can likely reduce it by a fair amount (such as by 
aggressive refactoring and DSLs), but there will be a limit, and once 
one approaches this limit, there is little more that can be done (it 
quickly becomes a fight against diminishing returns).


otherwise, one can start throwing away features, but then there is still 
a limit, namely how much one can discard while still keeping the 
"essence" of the software intact.



although many current programs are, arguably, huge, the vast majority of 
the code is likely still there for a reason, and is unlikely to be the 
result of programmers just endlessly writing the same stuff over and 
over again, or of other simple patterns. rather, it is more likely piles 
of special-case logic, optimizations, and similar.



(BTW: I now have an in-console text editor, but ended up using full 
words for most command names; seems basically workable...).



Best,

-Martin

On Sun, Mar 11, 2012 at 7:53 PM, BGB wrote:

On 3/11/2012 5:28 AM, Jakub Piotr Cłapa wrote:

On 28.02.12 06:42, BGB wrote:

but, anyways, here is a link to another article:
http://en.wikipedia.org/wiki/Shannon%27s_source_coding_theorem


Shannon's theory applies to lossless transmission. I doubt anybody here
wants to reproduce everything down to the timings and bugs of the original
software. Information theory is not thermodynamics.


Shannon's theory also applies, to a degree, to lossy transmission, as it also
sets a lower bound on the size of the data as expressed with a given degree of
loss.

this is why, for example, with JPEGs or MP3s, getting a smaller size tends
to result in reduced quality: the higher quality can't be expressed in a
smaller size.


___
fonc mailing list
fonc@vpri.org
http://vpri.org/mailman/listinfo/fonc


Re: [fonc] Error trying to compile COLA

2012-03-11 Thread Martin Baldan
I won't pretend I really know what I'm talking about, I'm just
guessing here, but don't you think the requirement for "independent
and identically-distributed random variable data" in Shannon's source
coding theorem may not be applicable to pictures, sounds or frame
sequences normally handled by compression algorithms? I mean, many
compression techniques rely on domain knowledge about the things to be
compressed. For instance, a complex picture or video sequence may
consist of a well-known background with a few characters from a
well-known inventory in well-known positions. If you know those facts,
you can increase the compression dramatically. A practical example may
be Xtranormal stories, where you get a cute 3-D animated dialogue from
a small script.

Best,

-Martin

On Sun, Mar 11, 2012 at 7:53 PM, BGB wrote:
> On 3/11/2012 5:28 AM, Jakub Piotr Cłapa wrote:
>>
>> On 28.02.12 06:42, BGB wrote:
>>>
>>> but, anyways, here is a link to another article:
>>> http://en.wikipedia.org/wiki/Shannon%27s_source_coding_theorem
>>
>>
>> Shannon's theory applies to lossless transmission. I doubt anybody here
>> wants to reproduce everything down to the timings and bugs of the original
>> software. Information theory is not thermodynamics.
>>
>
> Shannon's theory also applies, to a degree, to lossy transmission, as it also
> sets a lower bound on the size of the data as expressed with a given degree of
> loss.
>
> this is why, for example, with JPEGs or MP3s, getting a smaller size tends
> to result in reduced quality: the higher quality can't be expressed in a
> smaller size.
___
fonc mailing list
fonc@vpri.org
http://vpri.org/mailman/listinfo/fonc


Re: [fonc] Error trying to compile COLA

2012-03-11 Thread BGB

On 3/11/2012 5:28 AM, Jakub Piotr Cłapa wrote:

On 28.02.12 06:42, BGB wrote:

but, anyways, here is a link to another article:
http://en.wikipedia.org/wiki/Shannon%27s_source_coding_theorem


Shannon's theory applies to lossless transmission. I doubt anybody 
here wants to reproduce everything down to the timings and bugs of the 
original software. Information theory is not thermodynamics.




Shannon's theory also applies, to a degree, to lossy transmission, as 
it also sets a lower bound on the size of the data as expressed with a 
given degree of loss.


this is why, for example, with JPEGs or MP3s, getting a smaller size 
tends to result in reduced quality: the higher quality can't be 
expressed in a smaller size.
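
(to make that bound concrete with the standard textbook case: for a 
memoryless Gaussian source with variance sigma^2, under mean-squared 
distortion D, rate-distortion theory gives the minimum achievable rate 
as

    R(D) = (1/2) * log2(sigma^2 / D) bits per sample,  0 < D <= sigma^2

so each halving of the allowed distortion costs another half bit per 
sample; the extra quality has to be paid for in bits.)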



I had originally figured the assumption would be to try to recreate 
everything in a reasonably feature-complete way.



this means such things in the OS as:
an OpenGL implementation;
a command-line interface, probably implementing ANSI / VT100 style 
control-codes (even in my 3D engine, my in-program console currently 
implements a subset of these codes);

a loader for program binaries (ELF or PE/COFF);
POSIX or some other similar OS APIs;
probably a C compiler, assembler, linker, run-time libraries, ...;
network stack, probably a web-browser, ...;
...

then it would be a question of how small one could get everything while 
still implementing a reasonably complete (if basic) feature-set, using 
any DSLs/... one could think up to shave off lines of code.


one could probably shave off OS-specific features which few people use 
anyways (for example, no need to implement support for things like GDI 
or the X11 protocol). a "simple" solution being that OpenGL largely is 
the interface for the GUI subsystem (probably with a widget toolkit 
built on this, and some calls for things not directly supported by 
OpenGL, like managing mouse/keyboard/windows/...).


also, potentially, a vast amount of what would be standalone tools could 
be reimplemented as library code and merged (say, one has the "shell" as 
a kernel module, which directly implements nearly all of the basic 
command-line tools, like ls/cp/sed/grep/...).
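
BusyBox already does roughly this in userspace: one binary, with the 
tool chosen by the name it was invoked under. a minimal sketch of that 
dispatch idea (tool bodies stubbed out here; a kernel-module variant 
would obviously differ in the details):

    #include <stdio.h>
    #include <string.h>

    /* stub tool implementations; real ones would do the actual work */
    static int cmd_ls(int argc, char **argv) { puts("(ls stub)"); return 0; }
    static int cmd_cp(int argc, char **argv) { puts("(cp stub)"); return 0; }

    static const struct { const char *name; int (*main)(int, char **); }
    applets[] = { { "ls", cmd_ls }, { "cp", cmd_cp } };

    int main(int argc, char **argv) {
        /* dispatch on the basename the binary was invoked as */
        const char *name = strrchr(argv[0], '/');
        name = name ? name + 1 : argv[0];
        for (size_t i = 0; i < sizeof(applets) / sizeof(applets[0]); i++)
            if (strcmp(name, applets[i].name) == 0)
                return applets[i].main(argc, argv);
        fprintf(stderr, "unknown applet: %s\n", name);
        return 1;
    }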


the result of such an effort, by my estimates, would likely still end 
up in the Mloc range, but maybe one could get from, say, 200 Mloc (for 
a Linux-like configuration) down to maybe 10-15 Mloc, or, if one tried 
really hard, closer to 1 Mloc; much smaller is fairly unlikely.



apparently this wasn't the plan though; rather, the intent was to 
substitute something entirely different in its place. but this sort of 
implies that it isn't really feature-complete per se (and it would be a 
bit difficult to port existing software to it).


someone asks: "hey, how can I build Quake 3 Arena for your OS?", and 
gets back a response roughly along the lines of "you will need to 
largely rewrite it from the ground up".


much nicer and simpler would be if porting could be reduced to maybe a 
few patches and modifying some of the OS glue stubs or something.



(tangent time):

but, alas, there seems to be a bit of a philosophical split here.

I tend to be a bit more conservative, even if some of this stuff is put 
together in dubious ways. one adds features, but often ends up 
jerry-rigging things and reusing bits of functionality in different 
contexts. for example, an in-program command-entry console is not 
normally where one expects ANSI codes, but at the time it seemed a sane 
enough strategy (adding ANSI codes was a fairly straightforward way to 
support things like embedding color information in console message 
strings, ...). so, the basic idea still works, and was applied in a 
new context (a console in a 3D engine, vs a terminal window in the OS).
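
for instance, a message string with color embedded directly in it might 
look like this (con_print is a made-up name for the engine's 
console-print call; the escapes themselves are standard SGR codes):

    /* "\x1b[31m" = red foreground, "\x1b[0m" = reset attributes (ANSI SGR) */
    con_print("\x1b[31mwarning:\x1b[0m ammo low");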


side note: internally, the console is represented as a 2D array of 
characters, and another 2D array to store color and modifier flags 
(underline, strikeout, blink, italic, ...).
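
a minimal sketch of what such a layout might look like (the names, 
sizes, and attribute packing here are made up):

    #define CON_COLS 80
    #define CON_ROWS 25

    /* low 4 bits of attr: color index; high bits: modifier flags */
    enum { CF_UNDERLINE = 0x10, CF_STRIKEOUT = 0x20,
           CF_BLINK = 0x40, CF_ITALIC = 0x80 };

    typedef struct {
        char ch[CON_ROWS][CON_COLS];            /* character cells */
        unsigned char attr[CON_ROWS][CON_COLS]; /* color + CF_* flags */
    } Console;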


the console can be used both for program-related commands, for accessing 
"cvars", and for evaluating script fragments (sadly, limited to what can 
reasonably be typed into a console command, which doesn't allow much 
more than "make that thing over there explode" or similar). 
functionally, the console is less advanced than something like bash.


I have also considered supporting multiple consoles, and maybe a 
console-integrated text editor, but have yet to decide on the specifics 
(I am torn between a specialized text-editor interface and making the 
text editor a console command which hijacks the console and does most 
of its user interface via ANSI codes or similar...).


but, it is not obvious what the "best" way is to integrate a text 
editor into the UI of a 3D engine, hence why I have had this idea 
floating around for months but haven't really acted on it (out of 
humor, it could be given a VIM-like user-interface... ok, probably not, 
I was imagining mo

Re: [fonc] Error trying to compile COLA

2012-03-11 Thread Jakub Piotr Cłapa

On 28.02.12 06:42, BGB wrote:

but, anyways, here is a link to another article:
http://en.wikipedia.org/wiki/Shannon%27s_source_coding_theorem


Shannon's theory applies to lossless transmission. I doubt anybody here 
wants to reproduce everything down to the timings and bugs of the 
original software. Information theory is not thermodynamics.


--
regards,
Jakub Piotr Cłapa
___
fonc mailing list
fonc@vpri.org
http://vpri.org/mailman/listinfo/fonc