Re: Social graph (was Re: [twitter-dev] Does this exist?)

2009-02-26 Thread Nick Arnett
On Thu, Feb 26, 2009 at 4:19 PM, Nick Arnett nick.arn...@gmail.com wrote:


 A relational database falls down very fast on this kind of analysis.  For
 example, I have more than 300 followers, which is a simple query... but it
 returns 300 users and now the query needs to ask who the followers of those
 300 are, to answer question No. 1.  That's a big, slow query, since it has
 to specify the ids of the 300 that I follow... or it is 300 smaller queries.
  Either way, ugh.  That query is going to return a very large number of
 items, many of which need to be compared with one another.


FYI, there are 345,000 nodes and 1.4 million edges in the graph of me, my
followers and their followers.  I'm sure this could be pared down
considerably by eliminating a handful of extremely popular people, but it's
still a hard problem to scale.

Nick


Re: Social graph (was Re: [twitter-dev] Does this exist?)

2009-02-26 Thread Pete Warden
(Privately mailed, since I'm nervous about edging off-topic)

I'm working on some related areas, capturing conversation data from Twitter
at http://twitter.mailana.com/ . My approach has been the classic disk-space
trade off, creating massive indices to pre-cache queries. You're right
though, even with that approach the overhead of updating the denormalized
data of a complete friends-of-friends list for all users every time a link
changed would be enormous.

Pete

On Thu, Feb 26, 2009 at 4:39 PM, Nick Arnett nick.arn...@gmail.com wrote:



 On Thu, Feb 26, 2009 at 4:19 PM, Nick Arnett nick.arn...@gmail.comwrote:


 A relational database falls down very fast on this kind of analysis.  For
 example, I have more than 300 followers, which is a simple query... but it
 returns 300 users and now the query needs to ask who the followers of those
 300 are, to answer question No. 1.  That's a big, slow query, since it has
 to specify the ids of the 300 that I follow... or it is 300 smaller queries.
  Either way, ugh.  That query is going to return a very large number of
 items, many of which need to be compared with one another.


 FYI, there are 345,000 nodes and 1.4 million edges in the graph of me, my
 followers and their followers.  I'm sure this could be pared down
 considerably by eliminating a handful of extremely popular people, but it's
 still a hard problem to scale.

 Nick



Re: Social graph (was Re: [twitter-dev] Does this exist?)

2009-02-26 Thread Dossy Shiobara


On 2/26/09 7:19 PM, Nick Arnett wrote:

Yes, you probably don't, unless you just want to do it occasionally.
  I'm working on some graph manipulations like that... the challenge is
that Twitter is so open that the graphs are enormous for many people.
This isn't a very hard problem on a small scale, but generalizing it
looks like it will take a lot of resources.


Huh?  Now that Twitter has added the two social graph APIs that return 
the entire list of IDs, this kind of social graph inspection becomes 
trivially easy: two API calls, and one set operation (intersect).  Even 
at 1M followers, if the two lists are sorted, this is an O(N) operation.



A relational database falls down very fast on this kind of analysis.
  For example, I have more than 300 followers, which is a simple
query... but it returns 300 users and now the query needs to ask who the
followers of those 300 are, to answer question No. 1.  That's a big,
slow query, since it has to specify the ids of the 300 that I follow...
or it is 300 smaller queries.  Either way, ugh.  That query is going to
return a very large number of items, many of which need to be compared
with one another.


Say WHAT?  Are we looking at the same question?  Here's what I saw:

 1) Does anyone I follow also Ed?

Two API calls - one to fetch everyone Joe follows, one to fetch all of 
Ed's followers.  Compute the set intersection ... done.  This should 
take no longer than 5 minutes to implement, and should perform just fine.



So... I'm also curious who else is working on social graph problems.

Also... show of hands, please... who else would use this kind of
capability if it were available?  I may be able to come up with
resources to make it happen.


Seriously, I could have put this into Twitter Karma when the new social 
graph API methods were announced, but who cares?  Studying the social 
graph is only interesting to ivory tower academic wankers who want an 
excuse to hang out on social networking sites as part of their PhD study.


If anyone can come up with an actual valuable use case for these kinds 
of reports that aren't purely navel gazing and ego fluffing exercises, 
I'll implement it for you in Twitter Karma.  It's not hard stuff.


--
Dossy Shiobara  | do...@panoptic.com | http://dossy.org/
Panoptic Computer Network   | http://panoptic.com/
  He realized the fastest way to change is to laugh at your own
folly -- then you can let go and quickly move on. (p. 70)


Re: Social graph (was Re: [twitter-dev] Does this exist?)

2009-02-26 Thread Nick Arnett
On Thu, Feb 26, 2009 at 5:54 PM, Dossy Shiobara do...@panoptic.com wrote:


 On 2/26/09 7:39 PM, Nick Arnett wrote:

 FYI, there are 345,000 nodes and 1.4 million edges in the graph of me,
 my followers and their followers.  I'm sure this could be pared down
 considerably by eliminating a handful of extremely popular people, but
 it's still a hard problem to scale.


 Considering the modern GPU can handle in the range of 1.5 billion vertices
 per second, does 1.4M edges really sound like a large number any more?


No.  But that's for ONE Twitter user.  There is more than one.  ;-)

Nick


Re: Social graph (was Re: [twitter-dev] Does this exist?)

2009-02-26 Thread Nick Arnett
On Thu, Feb 26, 2009 at 6:10 PM, Andrew Badera and...@badera.us wrote:

 If you're trying to use cheap-a$$ unlimited web hosting in lieu of true
 application/cloud/SaaS hosting, sure, 1.4M sounds like a lot.

 If you're willing to spend a couple bucks a month, or write Python against
 GAE, 1.4M is hotdog-down-a-hallway ...


GAE??  Google Application Engine?  Python is, indeed, what I'm working with
and I've been looking at GAE... but to make the point a second time, this
was for only ONE user with only 300 followers.  Yes, easy to do as a
one-off.  Harder if you want to answer the same questions quickly about
large numbers of people.

Nick