On Fri, Nov 21, 2014 at 9:57 PM, Alan Grimes via AGI <[email protected]> wrote:
> My real problem is software because building for this kind of
> application will require new research along with the usual heavy lifting...

Yeah, software can be a problem. You are building something with the
same complexity as a human. We have some idea how many bits that is.
Using the best known compression algorithms, your DNA is about the
same size as 300 million lines of code.

I did a quick analysis of the OpenCog download from
https://github.com/opencog/opencog
since this is one of the most ambitious open source AGI projects that
I am aware of. It is the second oldest (17 years) after Cyc (30
years). Of course there are better-funded closed-source projects at
IBM (Watson), Google, and Facebook. But I digress. To do the analysis
I unzipped the 7 MB download and packed it into a zpaq archive so I
could analyse the code. The "zpaq list -summary" command shows the
biggest file types and directories.

C:\tmp>zpaq list opencog.zpaq -summary 30
zpaq v6.55 journaling archiver, compiled Jul 24 2014
opencog.zpaq: 1 versions, 3017 files, 2842 fragments, 6.190125 MB

Rank      Size (MB) Ratio     Files File, Directory/, or .Type
---- -------------- ------ --------- --------------------------
   1      23.049800 0.2651      3017
   2      23.049800 0.2651      3018 opencog-master/
   3      17.257392 0.2616      2101 opencog-master/opencog/
   4       5.828416 0.2390       423 opencog-master/opencog/embodiment/
   5       5.444238 0.2970       534 .cc
   6       3.537585 0.2794       414 opencog-master/opencog/learning/
   7       3.227577 0.2970       611 .h
   8       3.120504 0.2657       617 opencog-master/tests/
   9       3.033294 0.1787       186 .txt
  10       2.646998 0.2736       299 opencog-master/opencog/learning/moses/
  11       2.551398 0.1800        20 opencog-master/opencog/embodiment/AutomatedSystemTest/
  12       2.522369 0.1787         6 opencog-master/opencog/embodiment/AutomatedSystemTest/GoldenStandardFiles/
  13       1.723003 0.2937       203 opencog-master/opencog/embodiment/Control/
  14       1.718453 0.2458       251 opencog-master/opencog/python/
  15       1.619435 0.2970         5 .pdf
  16       1.584644 0.1787       343 .scm
  17       1.567517 0.2970       182 .cxxtest
  18       1.498255 0.2364       238 opencog-master/opencog/nlp/
  19       1.204053 0.2697       100 opencog-master/opencog/python/pln/
  20       1.110829 0.1787         1 opencog-master/opencog/embodiment/AutomatedSystemTest/GoldenStandardFiles/gsfile_0.txt
  21       1.065273 0.2873        49 opencog-master/doc/
  22       0.901893 0.2962       136 opencog-master/opencog/comboreduct/
  23       0.894117 0.2664       112 opencog-master/opencog/learning/moses/diary/
  24       0.886689 0.1787       197 .py
  25       0.856565 0.1787         1 opencog-master/opencog/embodiment/AutomatedSystemTest/GoldenStandardFiles/gsfile_1.txt
  26       0.813223 0.2970       447 .
  27       0.801463 0.2968        65 opencog-master/opencog/embodiment/Control/OperationalAvatarController/
  28       0.788446 0.2970         4 opencog-master/opencog/python/pln/notes/
  29       0.787802 0.2966        93 opencog-master/opencog/spatial/
  30       0.787222 0.2970         1 opencog-master/opencog/python/pln/notes/distributions.pdf

Shares Fragments Deduplicated MB    Extracted MB
------ --------- --------------- ---------------
     1      2810       22.763673       22.763673
     2        27        0.142732        0.285464
     5         3        0.000033        0.000165
     9         1        0.000022        0.000198
   10+         1        0.000030        0.000300
 Total      2842       22.906490       23.049800

Ver Last frag Date      Time (UT) Files Deleted   Original MB  Compressed MB
---- -------- ---------- -------- ------ ------ -------------- --------------
   1     2842 2014-11-22 02:57:19   3017      0      23.049800       6.190125

0 references to 0 of 2842 fragments have unknown size.
2 of 2 blocks used.
Compression 23.049800 -> 6.190125 MB (ratio 26.855%)
0.187 seconds (all OK)
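
For anyone without zpaq installed, a similar by-extension breakdown
can be approximated in a few lines of Python. This is only a sketch,
not what was run above: the function names are made up for
illustration, it reports uncompressed sizes only (no compression
ratios or deduplication), and it takes 1 MB to mean 10^6 bytes.

```python
import os
from collections import Counter

def size_by_extension(root):
    """Return a Counter mapping file extension to total bytes under root."""
    totals = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Files with no extension are grouped under "(none)".
            ext = os.path.splitext(name)[1].lower() or "(none)"
            totals[ext] += os.path.getsize(os.path.join(dirpath, name))
    return totals

def print_summary(root, top=10):
    """Print the largest file types, biggest first (1 MB = 10^6 bytes, assumed)."""
    for ext, nbytes in size_by_extension(root).most_common(top):
        print(f"{nbytes / 1e6:12.6f} MB  {ext}")

# Hypothetical usage, assuming the download was unzipped here:
# print_summary("opencog-master")
```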

So it looks like about 9 MB of C++, 0.9 MB of Python, and 12 MB of
various data and documentation files. A quick test with wc shows a
typical 30-40 characters per line of code, but of course it depends a
lot on coding style. So to get a better measure of the information
content I extracted just the *.cc, *.h, and *.py files to a new
directory tree and compressed them again with zpaq's highest
compression level (-method 5), which gave 1.08 MB. In my previous
comparison tests with DNA [1], I measured 32 bytes per line (averaged
over 1M lines of GIMP and MinGW sources), compressing to 2 bytes per
line. At 2 bytes per line, 1.08 MB suggests about 540K lines.
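
The line-count estimate is just the compressed size divided by the
compressed bytes per line. A minimal sketch of that arithmetic, with
a function name invented for illustration and 1 MB assumed to be
10^6 bytes:

```python
def estimate_lines(compressed_bytes, compressed_bytes_per_line=2.0):
    """Estimate equivalent lines of code from a compressed source size.

    The 2 bytes/line default is the figure measured on other source
    trees in [1]; it is an assumption when applied to a new code base.
    """
    return compressed_bytes / compressed_bytes_per_line

compressed = 1.08e6  # 1.08 MB of compressed .cc, .h and .py files
print(f"{estimate_lines(compressed):,.0f} estimated lines")  # 540,000 estimated lines
```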

Typical software productivity is 10 lines per day or 2000 lines per
person per year. So this is about a 270 person-year effort. A typical
budget at $250K per person-year (Ph.D. level salary plus overhead)
would imply a cost of $67 million. Using my original estimate of 300M
lines for AGI, OpenCog is 0.2% complete.
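
The same numbers as a small sketch, so the assumptions are easy to
change. The function and parameter names are made up for
illustration; the defaults are the figures quoted above (2000 lines
per person-year, $250K per person-year, 300M lines for AGI).

```python
def project_estimate(lines,
                     lines_per_person_year=2000,
                     cost_per_person_year=250_000,
                     target_lines=300_000_000):
    """Return (person-years, cost in dollars, fraction of target complete)."""
    person_years = lines / lines_per_person_year
    cost = person_years * cost_per_person_year
    fraction_complete = lines / target_lines
    return person_years, cost, fraction_complete

person_years, cost, fraction = project_estimate(540_000)
print(f"{person_years:.0f} person-years, "
      f"${cost / 1e6:.1f} million, "
      f"{fraction:.2%} complete")
# 270 person-years, $67.5 million, 0.18% complete
```

Rounded, that is the $67 million and 0.2% given above.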

And yes, I know you can write 100 lines per day. That would be nice if
all you did was code and never threw any of it away. About 80 to 85%
of the cost of large projects is maintenance. You spend 10% of your
time actually writing new code that other people will use. zpaq is 10K
lines that took me 5 years to write, or 2000 lines per year.

[1] The Cost of AI.
https://docs.google.com/document/d/1Z0kr3XDoM6cr5TgHH0GXQTjyikr7WpCkpWFn9IglW3o
(see appendix for DNA and source code compression tests).

