[twitter-dev] Re: Bulk id -> screen_name resolution.

2009-05-30 Thread Dossy Shiobara


On 5/30/09 3:46 PM, David W wrote:

> [... David asks about bulk resolving of Twitter user IDs to screen_name ...]


I don't know what the Twitter TOS says, but I've got a sizable cache of 
(reasonably fresh) Twitter user data thanks to Twitter Karma.


Would it be a Twitter TOS violation for me to publish an API to allow 
bulk resolution of IDs to screen_name?  Is this something that folks 
would use if I made it available?


--
Dossy Shiobara  | do...@panoptic.com | http://dossy.org/
Panoptic Computer Network   | http://panoptic.com/
  "He realized the fastest way to change is to laugh at your own
folly -- then you can let go and quickly move on." (p. 70)


[twitter-dev] Re: Bulk id -> screen_name resolution.

2009-05-30 Thread David M. Wilson



On May 30, 11:28 pm, Dossy Shiobara  wrote:
> On 5/30/09 3:46 PM, David W wrote:
>
> > [... David asks about bulk resolving of Twitter user IDs to screen_name ...]
>
> I don't know what the Twitter TOS says, but I've got a sizable cache of
> (reasonably fresh) Twitter user data thanks to Twitter Karma.
>
> Would it be a Twitter TOS violation for me to publish an API to allow
> bulk resolution of IDs to screen_name?  Is this something that folks
> would use if I made it available?

In comment to your TOS question: Twitter as a company seem a whole lot
more liberal (and realistic) when it comes to their data. I think I
may have even read this somewhere semiofficial in the past. Profile
information itself is also available to the public, and so, keeping a
local cache is probably no more harmful (from Twitter's perspective)
than what happens when a search engine crawls a user's profile page.

Compare and contrast to Facebook's approach. :P


David.
>
> --
> Dossy Shiobara              | do...@panoptic.com |http://dossy.org/
> Panoptic Computer Network   |http://panoptic.com/
>    "He realized the fastest way to change is to laugh at your own
>      folly -- then you can let go and quickly move on." (p. 70)


[twitter-dev] Re: Bulk id -> screen_name resolution.

2009-05-30 Thread David M. Wilson

Hey Dossy,

This sounds awesome, and I'd be very tempted, however I've come up
with a solution which should cater for any size of user account.

Right now when I periodically (~24 hours) note a set of changes to the
social graph, I create an entry in a (persistent) ring buffer
recording the new graph state. Previously, I wanted to use my existing
uid->name cache and a bunch of API calls to resolve all the changed
IDs in this entry. The solution is really simple.

Instead, if there are more id->name calls required than there is quota,
I simply use up half the remaining quota resolving some names, and
reschedule the graph check for 1 hour's time rather than 24. The next
check will pick up where the previous one left off doing resolution,
until eventually the entire entry is resolved (and my cache is bigger
for the future:).

This also neatly breaks down the amount of work done for very large
sets of changes into chunks of at most 35 (quota/2) HTTP requests per
hour per user.
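
A rough sketch of the scheme described above, in Python. Nothing here is
from my actual code: PendingEntry, resolve_step and fetch_name are
hypothetical names, and fetch_name stands in for whatever single-user API
call does the lookup. It only illustrates the "spend half the remaining
quota, then recheck in 1 hour instead of 24" idea.

```python
from dataclasses import dataclass, field


@dataclass
class PendingEntry:
    """One ring-buffer entry whose changed IDs still need screen names."""
    unresolved: list[int]
    resolved: dict[int, str] = field(default_factory=dict)


def resolve_step(entry, cache, remaining_quota, fetch_name):
    """Resolve what the quota allows; return hours until the next check."""
    # Serve whatever the local uid->name cache already knows, at no API cost.
    still_missing = []
    for uid in entry.unresolved:
        if uid in cache:
            entry.resolved[uid] = cache[uid]
        else:
            still_missing.append(uid)

    # If the remainder fits within the quota, resolve all of it; otherwise
    # spend only half of the remaining quota and come back sooner.
    if len(still_missing) <= remaining_quota:
        budget = len(still_missing)
    else:
        budget = remaining_quota // 2

    for uid in still_missing[:budget]:
        name = fetch_name(uid)  # assumed: one API call per ID
        cache[uid] = name       # the cache grows for future checks
        entry.resolved[uid] = name

    # The next check picks up where this one left off.
    entry.unresolved = still_missing[budget:]
    return 1 if entry.unresolved else 24
```

With a remaining quota of 70 and more than 70 uncached IDs, each pass makes
35 requests and asks to be run again in an hour, until the entry is fully
resolved and the schedule returns to 24 hours.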


David.

On May 30, 11:28 pm, Dossy Shiobara  wrote:
> On 5/30/09 3:46 PM, David W wrote:
>
> > [... David asks about bulk resolving of Twitter user IDs to screen_name ...]
>
> I don't know what the Twitter TOS says, but I've got a sizable cache of
> (reasonably fresh) Twitter user data thanks to Twitter Karma.
>
> Would it be a Twitter TOS violation for me to publish an API to allow
> bulk resolution of IDs to screen_name?  Is this something that folks
> would use if I made it available?
>
> --
> Dossy Shiobara              | do...@panoptic.com |http://dossy.org/
> Panoptic Computer Network   |http://panoptic.com/
>    "He realized the fastest way to change is to laugh at your own
>      folly -- then you can let go and quickly move on." (p. 70)


[twitter-dev] Re: Bulk id -> screen_name resolution.

2009-05-30 Thread Dossy Shiobara


On 5/30/09 7:06 PM, David M. Wilson wrote:

> In comment to your TOS question: Twitter as a company seem a whole lot
> more liberal (and realistic) when it comes to their data. I think I
> may have even read this somewhere semiofficial in the past. Profile
> information itself is also available to the public, and so, keeping a
> local cache is probably no more harmful (from Twitter's perspective)
> than what happens when a search engine crawls a user's profile page.
>
> Compare and contrast to Facebook's approach. :P


Yeah, Facebook has very strict guidelines as to what you can cache, how 
long you can cache it, and I'm almost positive they have a 
no-redistribute policy.


I can totally understand Twitter not allowing third-party API consumers 
to redistribute data retrieved by the API - their valuation probably 
relies greatly on the number of requests ("hits") they receive - if a 
third-party service adds a layer of indirection in front of Twitter, 
then that traffic is no longer hitting Twitter directly which makes them 
appear less active than they really are.


Can someone either point to the clause in the Twitter TOS that says a 
third-party application can redistribute Twitter data to other services 
directly, or can someone from Twitter issue an official statement to 
this effect?  Or, equally useful would be a statement that clearly 
states that this would be forbidden ... so I know not to waste my time 
even thinking about this.  :-)


--
Dossy Shiobara  | do...@panoptic.com | http://dossy.org/
Panoptic Computer Network   | http://panoptic.com/
  "He realized the fastest way to change is to laugh at your own
folly -- then you can let go and quickly move on." (p. 70)


[twitter-dev] Re: Bulk id -> screen_name resolution.

2009-05-30 Thread jstrellner

We also have a sizable cache of this data (around 4 million users)
that we are already using in an API format internally at Twitturly. If
Twitter approves it, we can add it to our publicly available API.
Currently it allows conversion from both ID to username and username
to ID one at a time, and in bulk (up to 100 at a time).
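
To give an idea of the shape of such a lookup, here is a minimal in-memory
sketch in Python. It is not the Twitturly code: UserDirectory and its
method names are made up for illustration, the 100-item cap is the only
detail taken from the description above, and treating screen names as
case-insensitive is an assumption.

```python
MAX_BULK = 100  # bulk limit mentioned above


class UserDirectory:
    """Two-way mapping between user IDs and screen names."""

    def __init__(self):
        self._name_by_id = {}
        self._id_by_name = {}

    def add(self, user_id, screen_name):
        # Screen names can change, so drop any stale reverse mapping first.
        old = self._name_by_id.get(user_id)
        if old is not None:
            self._id_by_name.pop(old.lower(), None)
        self._name_by_id[user_id] = screen_name
        # Assumption: names are matched case-insensitively.
        self._id_by_name[screen_name.lower()] = user_id

    def names_for(self, user_ids):
        """Bulk ID -> screen name; unknown IDs are left out of the result."""
        ids = self._checked(user_ids)
        return {u: self._name_by_id[u] for u in ids if u in self._name_by_id}

    def ids_for(self, screen_names):
        """Bulk screen name -> ID; unknown names are left out of the result."""
        names = self._checked(screen_names)
        return {n: self._id_by_name[n.lower()]
                for n in names if n.lower() in self._id_by_name}

    @staticmethod
    def _checked(items):
        items = list(items)
        if len(items) > MAX_BULK:
            raise ValueError(f"at most {MAX_BULK} items per request")
        return items
```

A single lookup is just a bulk request with one item, so both directions
share the same two methods.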

Doug or Alex, does the TOS allow us to provide this via our API?

-Joel

On May 30, 3:28 pm, Dossy Shiobara  wrote:
> On 5/30/09 3:46 PM, David W wrote:
>
> > [... David asks about bulk resolving of Twitter user IDs to screen_name ...]
>
> I don't know what the Twitter TOS says, but I've got a sizable cache of
> (reasonably fresh) Twitter user data thanks to Twitter Karma.
>
> Would it be a Twitter TOS violation for me to publish an API to allow
> bulk resolution of IDs to screen_name?  Is this something that folks
> would use if I made it available?
>
> --
> Dossy Shiobara              | do...@panoptic.com |http://dossy.org/
> Panoptic Computer Network   |http://panoptic.com/
>    "He realized the fastest way to change is to laugh at your own
>      folly -- then you can let go and quickly move on." (p. 70)


[twitter-dev] Re: Bulk id -> screen_name resolution.

2009-05-31 Thread Stuart

Since there's clearly a lot of demand for this feature is it not
possible for it to be added to the official API? I'd hesitate before
building anything on top of Twitter that also relies on a third party
for something so basic.

-Stuart

-- 
http://stut.net/projects/twitter/

2009/5/31 jstrellner :
>
> We also have a sizable cache of this data (around 4 million users)
> that we are already using in an API format internally at Twitturly. If
> Twitter approves it, we can add it to our publicly available API.
> Currently it allows conversion from both ID to username and username
> to ID one at a time, and in bulk (up to 100 at a time).
>
> Doug or Alex, does the TOS allow us to provide this via our API?
>
> -Joel
>
> On May 30, 3:28 pm, Dossy Shiobara  wrote:
>> On 5/30/09 3:46 PM, David W wrote:
>>
>> > [... David asks about bulk resolving of Twitter user IDs to screen_name 
>> > ...]
>>
>> I don't know what the Twitter TOS says, but I've got a sizable cache of
>> (reasonably fresh) Twitter user data thanks to Twitter Karma.
>>
>> Would it be a Twitter TOS violation for me to publish an API to allow
>> bulk resolution of IDs to screen_name?  Is this something that folks
>> would use if I made it available?
>>
>> --
>> Dossy Shiobara              | do...@panoptic.com |http://dossy.org/
>> Panoptic Computer Network   |http://panoptic.com/
>>    "He realized the fastest way to change is to laugh at your own
>>      folly -- then you can let go and quickly move on." (p. 70)
>


[twitter-dev] Re: Bulk id -> screen_name resolution.

2009-05-31 Thread Dan Brickley


On 31/5/09 13:03, Stuart wrote:

> Since there's clearly a lot of demand for this feature is it not
> possible for it to be added to the official API? I'd hesitate before
> building anything on top of Twitter that also relies on a third party
> for something so basic.


Related suggestion: have common REST API for external services who can 
provide this information. You can probably get it from google social 
graph API too, for example.


Dan


[twitter-dev] Re: Bulk id -> screen_name resolution.

2009-05-31 Thread Philip Plante

I would like to hear a response from Twitter on the sharing of this
data.  My db has about 2 million active users, and I have another db
with 6 million or so I would gladly share.

Previously I think the response from Twitter is that they cannot
provide this as a bulk translation due to the demand it would place on
their servers.  They are able to provide the entire list of follower
IDs simply because that lives in memory and requires no joins.  The
joining to get this data would be too intensive for them.

If this is allowed maybe the community could take this a step further
and provide a common interface to share data like this.  Any thoughts?

On May 31, 1:42 pm, Dan Brickley  wrote:
> On 31/5/09 13:03, Stuart wrote:
>
> > Since there's clearly a lot of demand for this feature is it not
> > possible for it to be added to the official API? I'd hesitate before
> > building anything on top of Twitter that also relies on a third party
> > for something so basic.
>
> Related suggestion: have common REST API for external services who can
> provide this information. You can probably get it from google social
> graph API too, for example.
>
> Dan


[twitter-dev] Re: Bulk id -> screen_name resolution.

2009-05-31 Thread Nick Arnett
On Sun, May 31, 2009 at 1:39 PM, Philip Plante wrote:

>
>
> If this is allowed maybe the community could take this a step further
> and provide a common interface to share data like this.  Any thoughts?


Some sort of shared system that distributes the load across the various
databases (perhaps including the Google social graph API) would be cool from
the standpoint of collaborative activity, if nothing else.  That gets me
wondering what sort of architecture might make sense if it were to be
designed to be extended to additional data types.  One that comes to mind
from discussions here is the idea of tagging accounts business or personal.

I've been scratching my head a bit for the last few days about how all the
social media APIs might interact. A collaborative project like this might be
a very cool step in that direction, since it essentially would be a set of
Twitter-based social apps helping each other.

And... I'm game to participate.  I have about 800K in my database, selected
via some evolving social network analysis.  I'm tagging some as aggregators,
since I choose to ignore them in my analysis, and I've created a flag for
spammers, too, now that they have managed to get through my analytics a few
times.  Sharing that kind of categorization data could be quite powerful.

Memcached seems like a potential platform for this (see
http://www.danga.com/memcached/).  Any others?

Nick


[twitter-dev] Re: Bulk id -> screen_name resolution.

2009-05-31 Thread Stuart

2009/5/31 Philip Plante :
>
> I would like to hear a response from Twitter on the sharing of this
> data.  My db has about 2 million active users, and I have another db
> with 6 million or so I would gladly share.
>
> Previously I think the response from Twitter is that they cannot
> provide this as a bulk translation due to the demand it would place on
> their servers.  They are able to provide the entire list of follower
> IDs simply because that lives in memory and requires no joins.  The
> joining to get this data would be too intensive for them.
>
> If this is allowed maybe the community could take this a step further
> and provide a common interface to share data like this.  Any thoughts?

Much as I respect Twitter and the great people who work there, I don't
buy that this would place too much demand on their servers. They
already use Memcached extensively, and this would be a pretty simple
addition to that data store.

Size-wise we're talking about no more than 50 bytes per user to store
a user ID to username. Even at 100 million users that's less than 5
gig of memory, which I'm sure is pretty small compared to their
overall Memcached footprint. And as for load on the servers each call
for up to 100 IDs would count as an API request, so it's unlikely this
method would add a huge amount to the existing usage.
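
For anyone who wants to check the arithmetic, here it is as a few lines of
Python. The 50 bytes per user, 100 million users and 100 IDs per call are
the estimates from the paragraph above, not measured figures, and the
function names are just for illustration.

```python
BYTES_PER_USER = 50       # estimated upper bound per ID -> username entry
USERS = 100_000_000       # hypothetical user count
IDS_PER_CALL = 100        # proposed bulk limit per API request


def cache_size_bytes(users=USERS, bytes_per_user=BYTES_PER_USER):
    """Total memory needed for the whole ID -> username map."""
    return users * bytes_per_user


def calls_needed(id_count, ids_per_call=IDS_PER_CALL):
    """API requests needed to resolve id_count IDs (ceiling division)."""
    return -(-id_count // ids_per_call)


total = cache_size_bytes()
print(f"{total / 10**9:.2f} GB, {total / 2**30:.2f} GiB")
```

That works out to 5 GB in decimal units, or about 4.66 GiB, and resolving
5,000 IDs would take 50 requests rather than 5,000.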

Clearly I don't know much about Twitter's architecture, but this seems
to me to be a pretty simple feature to implement, and relatively
cheap.

If Twitter won't implement it then maybe it's time to consider some of
us getting together to build a user cache. If enough of us get
together I'm sure we can build something that won't cost each of us
too much but will allow us to build the user API methods we need. I'd
hope that Twitter would be ok with this, and most of the useful data
could be kept up to date if they give us single user access to the
firehose. I'd be happy to lead such an effort.

-Stuart

-- 
http://stut.net/projects/twitter/


[twitter-dev] Re: Bulk id -> screen_name resolution.

2009-05-31 Thread Philip Plante

Depending on the response from Twitter on this I would like to sponsor
this effort.  I have an architecture that will already support a larger
number of tweets, and have the resources at my disposal to scale it
out as it grows.

We will just have to wait for Twitter to weigh in on the sharing
aspect.  *crosses fingers*

On May 31, 5:57 pm, Stuart  wrote:
> 2009/5/31 Philip Plante :
>
>
>
> > I would like to hear a response from Twitter on the sharing of this
> > data.  My db has about 2 million active users, and I have another db
> > with 6 million or so I would gladly share.
>
> > Previously I think the response from Twitter is that they cannot
> > provide this as a bulk translation due to the demand it would place on
> > their servers.  They are able to provide the entire list of follower
> > IDs simply because that lives in memory and requires no joins.  The
> > joining to get this data would be too intensive for them.
>
> > If this is allowed maybe the community could take this a step further
> > and provide a common interface to share data like this.  Any thoughts?
>
> Much as I respect Twitter and the great people who work there, I don't
> buy that this would place too much demand on their servers. They
> already use Memcached extensively, and this would be a pretty simple
> addition to that data store.
>
> Size-wise we're talking about no more than 50 bytes per user to store
> a user ID to username. Even at 100 million users that's less than 5
> gig of memory, which I'm sure is pretty small compared to their
> overall Memcached footprint. And as for load on the servers each call
> for up to 100 IDs would count as an API request, so it's unlikely this
> method would add a huge amount to the existing usage.
>
> Clearly I don't know much about Twitter's architecture, but this seems
> to me to be a pretty simple feature to implement, and relatively
> cheap.
>
> If Twitter won't implement it then maybe it's time to consider some of
> us getting together to build a user cache. If enough of us get
> together I'm sure we can build something that won't cost each of us
> too much but will allow us to build the user API methods we need. I'd
> hope that Twitter would be ok with this, and most of the useful data
> could be kept up to date if they give us single user access to the
> firehose. I'd be happy to lead such an effort.
>
> -Stuart
>
> --http://stut.net/projects/twitter/


[twitter-dev] Re: Bulk id -> screen_name resolution.

2009-06-01 Thread Nick Arnett
On Sun, May 31, 2009 at 3:57 PM, Stuart  wrote:

>
> Much as I respect Twitter and the great people who work there, I don't
> buy that this would place too much demand on their servers. They
> already use Memcached extensively, and this would be a pretty simple
> addition to that data store.


For that very reason, I'm not sure it makes sense for third parties to
collaborate on a single-purpose distributed store.  There are user/account
properties that Twitter won't implement, at least not until there's a lot of
demonstrated value.  In other words, the developer community could
collaborate on problems that have marginal value to Twitter in the short
run.

Nick


[twitter-dev] Re: Bulk id -> screen_name resolution.

2009-06-01 Thread Doug Williams
There is currently nothing in our TOS preventing this type of project from
developing. However, we are writing our Terms for the API and at this time I
cannot speak of how a service redistributing our data will be classified.
So, proceed as you will, but be warned that we may have to have a discussion
down the road when the API Terms of Service better defines our relationship
with developers.
Thanks,
Doug
--

Doug Williams
Twitter Platform Support
http://twitter.com/dougw




On Mon, Jun 1, 2009 at 8:52 AM, Nick Arnett  wrote:

>
>
> On Sun, May 31, 2009 at 3:57 PM, Stuart  wrote:
>
>>
>> Much as I respect Twitter and the great people who work there, I don't
>> buy that this would place too much demand on their servers. They
>> already use Memcached extensively, and this would be a pretty simple
>> addition to that data store.
>
>
> For that very reason, I'm not sure it makes sense for third parties to
> collaborate on a single-purpose distributed store.  There are user/account
> properties that Twitter won't implement, at least not until there's a lot of
> demonstrated value.  In other words, the developer community could
> collaborate on problems that have marginal value to Twitter in the short
> run.
>
> Nick
>
>


[twitter-dev] Re: Bulk id -> screen_name resolution.

2009-06-01 Thread Dossy Shiobara


On 6/1/09 6:59 PM, Doug Williams wrote:

> There is currently nothing in our TOS preventing this type of project
> from developing. However, we are writing our Terms for the API and at
> this time I cannot speak of how a service redistributing our data will
> be classified.
>
> So, proceed as you will, but be warned that we may have to have a
> discussion down the road when the API Terms of Service better defines
> our relationship with developers.


Doug, any kind of rough timeline for such an API TOS?  Weeks?  Months? 
Years?


--
Dossy Shiobara  | do...@panoptic.com | http://dossy.org/
Panoptic Computer Network   | http://panoptic.com/
  "He realized the fastest way to change is to laugh at your own
folly -- then you can let go and quickly move on." (p. 70)


[twitter-dev] Re: Bulk id -> screen_name resolution.

2009-06-01 Thread Doug Williams
Dossy,

Within the next few months.

Thanks,
Doug
--

Doug Williams
Twitter Platform Support
http://twitter.com/dougw


On Mon, Jun 1, 2009 at 4:05 PM, Dossy Shiobara  wrote:

>
> On 6/1/09 6:59 PM, Doug Williams wrote:
>
>> There is currently nothing in our TOS preventing this type of project
>> from developing. However, we are writing our Terms for the API and at
>> this time I cannot speak of how a service redistributing our data will
>> be classified.
>>
>> So, proceed as you will, but be warned that we may have to have a
>> discussion down the road when the API Terms of Service better defines
>> our relationship with developers.
>>
>
> Doug, any kind of rough timeline for such an API TOS?  Weeks?  Months?
> Years?
>
> --
> Dossy Shiobara  | do...@panoptic.com | http://dossy.org/
>
> Panoptic Computer Network   | http://panoptic.com/
>  "He realized the fastest way to change is to laugh at your own
>folly -- then you can let go and quickly move on." (p. 70)
>


[twitter-dev] Re: Bulk id -> screen_name resolution.

2009-06-02 Thread Stuart

2009/6/1 Nick Arnett :
>
>
> On Sun, May 31, 2009 at 3:57 PM, Stuart  wrote:
>>
>> Much as I respect Twitter and the great people who work there, I don't
>> buy that this would place too much demand on their servers. They
>> already use Memcached extensively, and this would be a pretty simple
>> addition to that data store.
>
> For that very reason, I'm not sure it makes sense for third parties to
> collaborate on a single-purpose distributed store.  There are user/account
> properties that Twitter won't implement, at least not until there's a lot of
> demonstrated value.  In other words, the developer community could
> collaborate on problems that have marginal value to Twitter in the short
> run.

I'm not suggesting that it would only be usable as an ID =>
screen_name repository. I'm suggesting that we could build our own
"copy" of the user data so we can provide API calls that Twitter don't
or won't. Clearly this is not ideal, but if there's no other choice I
definitely believe it's worth the effort.

At the end of the day it comes down to this: would you pay to have
higher API limits? Would Twitter be interested in providing higher
limits to paying developers?

At any rate, based on what Doug has just said it's probably not worth
doing anything until the new TOS are published, just in case it turns
out to be wasted effort.

-Stuart

-- 
http://stut.net/projects/twitter/