On Fri, Nov 21, 2014 at 9:57 PM, Alan Grimes via AGI <[email protected]> wrote:
> My real problem is software because building for this kind of
> application will require new research along with the usual heavy lifting...
Yeah, software can be a problem. You are building something with the same complexity as a human. We have some idea how many bits that is. Using the best known compression algorithms, your DNA compresses to about the same size as 300 million lines of code.

I did a quick analysis of the OpenCog download from https://github.com/opencog/opencog since this is one of the most ambitious open source AGI projects that I am aware of. It is the second oldest (17 years) after Cyc (30 years). Of course there are better funded closed source projects at IBM (Watson), Google, and Facebook. But I digress.

To do the analysis I unzipped the 7 MB download and packed it into a zpaq archive so I could analyse the code. The "zpaq list -summary" command shows the biggest file types and directories.

C:\tmp>zpaq list opencog.zpaq -summary 30
zpaq v6.55 journaling archiver, compiled Jul 24 2014
opencog.zpaq: 1 versions, 3017 files, 2842 fragments, 6.190125 MB

Rank      Size (MB) Ratio      Files File, Directory/, or .Type
---- -------------- ------ --------- --------------------------
   1      23.049800 0.2651      3017
   2      23.049800 0.2651      3018 opencog-master/
   3      17.257392 0.2616      2101 opencog-master/opencog/
   4       5.828416 0.2390       423 opencog-master/opencog/embodiment/
   5       5.444238 0.2970       534 .cc
   6       3.537585 0.2794       414 opencog-master/opencog/learning/
   7       3.227577 0.2970       611 .h
   8       3.120504 0.2657       617 opencog-master/tests/
   9       3.033294 0.1787       186 .txt
  10       2.646998 0.2736       299 opencog-master/opencog/learning/moses/
  11       2.551398 0.1800        20 opencog-master/opencog/embodiment/AutomatedSystemTest/
  12       2.522369 0.1787         6 opencog-master/opencog/embodiment/AutomatedSystemTest/GoldenStandardFiles/
  13       1.723003 0.2937       203 opencog-master/opencog/embodiment/Control/
  14       1.718453 0.2458       251 opencog-master/opencog/python/
  15       1.619435 0.2970         5 .pdf
  16       1.584644 0.1787       343 .scm
  17       1.567517 0.2970       182 .cxxtest
  18       1.498255 0.2364       238 opencog-master/opencog/nlp/
  19       1.204053 0.2697       100 opencog-master/opencog/python/pln/
  20       1.110829 0.1787         1 opencog-master/opencog/embodiment/AutomatedSystemTest/GoldenStandardFiles/gsfile_0.txt
  21       1.065273 0.2873        49 opencog-master/doc/
  22       0.901893 0.2962       136 opencog-master/opencog/comboreduct/
  23       0.894117 0.2664       112 opencog-master/opencog/learning/moses/diary/
  24       0.886689 0.1787       197 .py
  25       0.856565 0.1787         1 opencog-master/opencog/embodiment/AutomatedSystemTest/GoldenStandardFiles/gsfile_1.txt
  26       0.813223 0.2970       447 .
  27       0.801463 0.2968        65 opencog-master/opencog/embodiment/Control/OperationalAvatarController/
  28       0.788446 0.2970         4 opencog-master/opencog/python/pln/notes/
  29       0.787802 0.2966        93 opencog-master/opencog/spatial/
  30       0.787222 0.2970         1 opencog-master/opencog/python/pln/notes/distributions.pdf

Shares Fragments Deduplicated MB    Extracted MB
------ --------- --------------- ---------------
     1      2810       22.763673       22.763673
     2        27        0.142732        0.285464
     5         3        0.000033        0.000165
     9         1        0.000022        0.000198
   10+         1        0.000030        0.000300
 Total      2842       22.906490       23.049800

Ver  Last frag Date      Time (UT) Files Deleted    Original MB  Compressed MB
---- -------- ---------- -------- ------ ------ -------------- --------------
   1     2842 2014-11-22 02:57:19   3017      0      23.049800       6.190125

0 references to 0 of 2842 fragments have unknown size.
2 of 2 blocks used.
Compression 23.049800 -> 6.190125 MB (ratio 26.855%)
0.187 seconds (all OK)

So it looks like about 9 MB of C++ (.cc and .h), 0.9 MB of Python, and 13 MB of various data and documentation files. A quick test with wc shows a typical 30-40 characters per line of code, but of course it depends a lot on coding style. So to get a better measure of the information content I extracted just the *.cc, *.h, and *.py files to a new directory tree and compressed them again with zpaq's highest compression level (-method 5) to 1.08 MB. In my previous comparison tests with DNA [1], I measured 32 bytes per line (averaged over 1M lines of Gimp and MinGW sources), compressing to 2 bytes per line. At 2 compressed bytes per line, 1.08 MB suggests about 540K lines.
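The line-count estimate is a single division, so it is easy to redo with other numbers. Below is a rough sketch in Python. The function name is made up for illustration, the inputs are the 1.08 MB compressed size and the 2 bytes per line figure quoted above, and it assumes 1 MB means 10^6 bytes.

```python
def estimate_lines(compressed_bytes, compressed_bytes_per_line=2.0):
    """Estimate lines of code from the compressed size of a source tree.

    compressed_bytes_per_line is the empirical figure from [1]; it is an
    assumption and will vary with language and coding style.
    """
    if compressed_bytes_per_line <= 0:
        raise ValueError("compressed_bytes_per_line must be positive")
    return compressed_bytes / compressed_bytes_per_line


# 1.08 MB of compressed .cc, .h and .py files (assuming 1 MB = 10**6 bytes)
opencog_compressed_bytes = 1.08e6
opencog_lines = estimate_lines(opencog_compressed_bytes)
print(f"about {opencog_lines / 1000:.0f}K lines")
```

Changing the second argument shows how sensitive the estimate is to the bytes-per-line assumption: at 3 compressed bytes per line the same archive would suggest 360K lines.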
Typical software productivity is 10 lines per day, or about 2000 lines per person per year. So this is about a 270 person-year effort. A typical budget at $250K per person-year (Ph.D. level salary plus overhead) would imply a cost of about $67 million. Using my original estimate of 300M lines for AGI, OpenCog is 0.2% complete.

And yes, I know you can write 100 lines per day. That would be nice if all you did was code and never threw any of it away. About 80 to 85% of the cost of large projects is maintenance. You spend 10% of your time actually writing new code that other people will use. zpaq is 10K lines that took me 5 years to write.

[1] The Cost of AI. https://docs.google.com/document/d/1Z0kr3XDoM6cr5TgHH0GXQTjyikr7WpCkpWFn9IglW3o (see appendix for DNA and source code compression tests).
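For anyone who wants to plug in different assumptions, the effort and cost arithmetic above can be sketched in a few lines of Python. The function and variable names are hypothetical. The defaults are only the figures quoted in this message: 2000 lines per person-year, $250K per person-year, and 300M lines for AGI.

```python
def project_estimate(lines, lines_per_person_year=2000,
                     cost_per_person_year=250_000, target_lines=300e6):
    """Return (person_years, cost_dollars, fraction_complete) for a code base.

    All defaults are the rough figures from the message, not measured values.
    """
    person_years = lines / lines_per_person_year
    cost_dollars = person_years * cost_per_person_year
    fraction_complete = lines / target_lines
    return person_years, cost_dollars, fraction_complete


# 540K lines is the OpenCog estimate derived from the compressed size.
person_years, cost_dollars, fraction_complete = project_estimate(540_000)
print(f"{person_years:.0f} person-years")
print(f"${cost_dollars / 1e6:.1f} million")
print(f"{fraction_complete:.2%} of the 300M-line target")
```

With the numbers above this gives 270 person-years, $67.5 million, and 0.18%, which round to the figures in the text. Passing lines_per_person_year=20000 (the 100 lines per day case) cuts the effort and cost by a factor of 10 but leaves the fraction complete unchanged.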
