Here is one way you could generate a list of programming languages.

Look at the bottom of

http://en.wikipedia.org/wiki/D_programming_language

and you’ll see categories like “C Programming Language Family”. If you then
look at

http://en.wikipedia.org/wiki/Category:C_programming_language_family

you’ll see that it is a member of the category

http://en.wikipedia.org/wiki/Category:Programming_language_families

By traversing this graph you can find both the categories that contain
programming languages and the languages themselves. All of the category links
are in DBpedia, so this is straightforward to do.
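
A minimal sketch of that traversal, assuming the category links have already been pulled out of DBpedia into two plain dicts (as far as I know, DBpedia exposes article-to-category links as dcterms:subject and category-to-parent links as skos:broader). The names collect_members, subcategories and members are made up for illustration, and the data is a toy sample, not a real extract.

```python
from collections import deque


def collect_members(root, subcategories, members, max_depth=3):
    """Breadth-first walk down the category graph starting at `root`.

    subcategories: category -> list of child categories (hypothetical input)
    members:       category -> list of articles filed under it (hypothetical input)
    Returns (categories visited, articles found).
    """
    seen = {root}
    queue = deque([(root, 0)])
    found = set()
    while queue:
        category, depth = queue.popleft()
        found.update(members.get(category, ()))
        if depth == max_depth:
            continue  # depth limit keeps the walk from drifting off-topic
        for child in subcategories.get(category, ()):
            if child not in seen:  # category graphs can contain cycles
                seen.add(child)
                queue.append((child, depth + 1))
    return seen, found


# Toy data standing in for what you would load from DBpedia.
subcategories = {
    "Category:Programming_language_families": [
        "Category:C_programming_language_family",
    ],
}
members = {
    "Category:C_programming_language_family": [
        "C_(programming_language)",
        "D_(programming_language)",
    ],
}

categories, languages = collect_members(
    "Category:Programming_language_families", subcategories, members
)
```

The depth limit is there because Wikipedia categories wander off topic quickly; the right value is something you would have to tune by looking at the results.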

The great thing is that you can seed this with a query that gets partial
results; for instance, you can start from your search for “programming
language” in the name. To be fair, you’ll need to put some human effort into
this. Some of the categories that turn up will be wrong, and you’ll probably
get some items like “Generics in Java” and “Dennis Ritchie”. Still, my
experience is that I can create categories of 10,000 or so things (like “things
in New York City that don’t have coordinates” or “things related to sex and
drugs”) in a few hours of work. It’s helpful to sort the results by a
subjective importance score so that you can at least see the worst outliers.
(At one point I got Hillary Clinton as the top “sex” topic, for instance,
because she was the victim of adultery. It’s quite interesting that the
perpetrator of adultery didn’t get flagged...)
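
Roughly, the seeding and review steps could look like the sketch below. The function names are made up, and the importance scores are made-up numbers standing in for whatever subjective score you have on hand.

```python
def seed_categories(category_names, phrase="programming language"):
    """Pick seed categories whose name contains the phrase (case-insensitive)."""
    needle = phrase.lower()
    return [
        name for name in category_names
        if needle in name.replace("_", " ").lower()
    ]


def review_order(candidates, importance):
    """Sort candidates so the most 'important' ones are reviewed first.

    importance: item -> score (hypothetical); items without a score sort last.
    """
    return sorted(candidates, key=lambda c: importance.get(c, 0.0), reverse=True)


names = [
    "Category:Programming_language_families",
    "Category:C_programming_language_family",
    "Category:Computer_scientists",
]
seeds = seed_categories(names)

# Made-up scores; prominent wrong items float to the top where you can spot them.
scores = {"Fortran": 0.9, "Dennis Ritchie": 0.8, "Generics in Java": 0.2}
to_review = review_order(["Generics in Java", "Fortran", "Dennis Ritchie"], scores)
```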

The graph traversal has a similar structure to Kleinberg’s hubs and authorities
algorithm, and there’s probably some way to assign the nodes scores that are
related to the probability of a topic or category being in the set.
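
One crude way to do that, as a sketch: this is not Kleinberg’s algorithm, just a single round of the same back-and-forth idea. It assumes you have hand-checked a few items as definitely in the set. A category is scored by the fraction of its members that are known good, and an item by the average score of the categories it sits in. All names and data here are hypothetical.

```python
def score_categories(members, known_positive):
    """Score each category by the fraction of its members known to be in the set."""
    scores = {}
    for category, items in members.items():
        if items:
            hits = sum(1 for item in items if item in known_positive)
            scores[category] = hits / len(items)
    return scores


def score_items(members, category_scores):
    """Score each item by the mean score of the categories it belongs to."""
    totals, counts = {}, {}
    for category, items in members.items():
        for item in items:
            totals[item] = totals.get(item, 0.0) + category_scores.get(category, 0.0)
            counts[item] = counts.get(item, 0) + 1
    return {item: totals[item] / counts[item] for item in totals}


# Toy data: two categories that overlap on one article.
members = {
    "Category:C_programming_language_family": ["C", "D"],
    "Category:Bell_Labs": ["C", "Dennis Ritchie"],
}
known_positive = {"C", "D"}  # hand-checked programming languages

category_scores = score_categories(members, known_positive)
item_scores = score_items(members, category_scores)
```

On the toy data the language family category scores 1.0 and the Bell Labs category 0.5, so “Dennis Ritchie” ends up with a lower score than “D”. Whether these numbers behave like probabilities on real data is something you would have to check.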

Note also that Freebase has a programming language type; see

http://www.freebase.com/view/en/fortran

and you could get a list of programming languages there and then map the IDs
back to DBpedia.
