Re: [twitter-dev] Re: parsing out entities from tweets (a.k.a. parsing out hashtags is hard!)
Disambiguating short URLs and delivering the true URL and title would be a real plus, not just for developers but also for the target of a URL. While it does add load to Twitter's servers, it will save many, many useless hits to the target. Imagine 100,000 Twitter apps resolving each short URL found in a tweet, all of them doing it within seconds of the tweet arriving via the streaming API. It would be an automatic DoS against every site mentioned in a tweet. If this sounds hyperbolic, read the APIwiki docs that say 2,000 followers is an expected max. Ha!

On Fri, May 14, 2010 at 9:15 AM, Zhami wrote:
> +1 for it being optional as well -- keep the bandwidth to a minimum
> for scenarios where it's not needed.
>
> +1 for having short URLs' original (long) URL provided (perhaps also
> an option?)
Re: [twitter-dev] Re: parsing out entities from tweets (a.k.a. parsing out hashtags is hard!)
> Besides, if this is the library used for web, you're not doing it
> right. :)
> For example, to mention URL parsing only, you don't check for valid
> domain names (e.g. www.test.failure is matched as URL),
> some characters are not recognized as part of a link (e.g. "|" in
> "http://translate.google.com/?hl=en#auto|en|bonjour")...

all we're trying to do is help people standardize on how they parse stuff. making sure you can represent what is a hash tag, a url, a username, etc., in the same way that twitter.com does it, can be difficult.

--
Raffi Krikorian
Twitter Platform Team
http://twitter.com/raffi
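The two pitfalls quoted above (no check for valid domain names, and "|" not being treated as part of a link) can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: `find_links`, `CANDIDATE_RE`, and the tiny `KNOWN_TLDS` allow-list are hypothetical names, and this is not how twitter.com or any Twitter library actually parses URLs.

```python
import re
from urllib.parse import urlsplit

# Hypothetical, deliberately tiny allow-list; a real parser would need
# the full, current list of top-level domains.
KNOWN_TLDS = {"com", "net", "org"}

# Candidate links run until whitespace, an angle bracket, or a double
# quote, so characters such as "|" stay inside the link.
CANDIDATE_RE = re.compile(r'(?:https?://|www\.)[^\s<>"]+')

# Punctuation that usually belongs to the sentence, not the link.
TRAILING = ".,;:!?)'"


def find_links(text):
    """Return candidate links whose host ends in a known TLD."""
    links = []
    for match in CANDIDATE_RE.finditer(text):
        candidate = match.group().rstrip(TRAILING)
        url = candidate if "://" in candidate else "http://" + candidate
        try:
            host = urlsplit(url).hostname or ""
        except ValueError:
            continue  # malformed host, e.g. an unbalanced "[" bracket
        tld = host.rsplit(".", 1)[-1].lower()
        if tld in KNOWN_TLDS:
            links.append(candidate)
    return links
```

With this sketch, the translate.google.com link from the quoted message is kept whole, including the "|" characters, while www.test.failure is rejected because "failure" is not in the allow-list.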
Re: [twitter-dev] Re: parsing out entities from tweets (a.k.a. parsing out hashtags is hard!)
yeah - i'm extremely sensitive to that not happening again. i'll keep that in mind. i expect there may be another draft floated around before we start to roll this out.

On Thu, May 13, 2010 at 11:14 PM, Rich wrote:
> I can see the inside some of the entities tag causing some
> developers some problems as it's the same tag name as the status. Of
> course all of us should be able to handle it, but just look what
> happened with the extra user id tag inside a status
>
> On May 13, 11:11 pm, Raffi Krikorian wrote:
> > hey glenn.
> >
> > i think something went wrong in the copy and paste -- there should have been
> > a space between the URL and the hashtag.
> >
> > On Thu, May 13, 2010 at 11:02 PM, glenn gillen wrote:
> > > Raffi,
> > >
> > > This follows on nicely from the presentation at Warblecamp last week
> > > discussing how difficult it is to do this right, and I think a
> > > consistent approach across all clients (including twitter.com,
> > > mobile.twitter, and 3rd party apps) should be priority number 1.
> > > However looking at your example:
> > >
> > > On May 13, 10:25 pm, Raffi Krikorian wrote:
> > > > {
> > > > "text" : "hey @raffi tell @noradio to check out http://dev.twitter.com#hot",
> > > >
> > > > {
> > > > "url" : "http://dev.twitter.com",
> > > > "indices" : [38, 64]
> > > > },
> > > > ],
> > > > "hashtags" : [
> > > > {
> > > > "text" : "#hot",
> > > > "indices" : [66, 69]
> > > > "url" : "http://search.twitter.com/search?q=%23hot"
> > > > }
> > > > ]
> > > > }
> > >
> > > Without looking at how twitter.com would currently handle that
> > > example, I would have expected the url to be "http://dev.twitter.com/
> > > #hot" and for the tweet to contain no hashtag. If the hashtag always
> > > takes precedence I'd have no way to link to the following without
> > > using a URL shortener: http://oauth.net/core/1.0a/#anchor41
> > > --
> > > Glenn Gillen
> > > http://glenngillen.com/
> >
> > --
> > Raffi Krikorian
> > Twitter Platform Team
> > http://twitter.com/raffi

--
Raffi Krikorian
Twitter Platform Team
http://twitter.com/raffi
Re: [twitter-dev] Re: parsing out entities from tweets (a.k.a. parsing out hashtags is hard!)
hey glenn.

i think something went wrong in the copy and paste -- there should have been a space between the URL and the hashtag.

On Thu, May 13, 2010 at 11:02 PM, glenn gillen wrote:
> Raffi,
>
> This follows on nicely from the presentation at Warblecamp last week
> discussing how difficult it is to do this right, and I think a
> consistent approach across all clients (including twitter.com,
> mobile.twitter, and 3rd party apps) should be priority number 1.
> However looking at your example:
>
> On May 13, 10:25 pm, Raffi Krikorian wrote:
> > {
> > "text" : "hey @raffi tell @noradio to check out http://dev.twitter.com#hot",
> >
> > {
> > "url" : "http://dev.twitter.com",
> > "indices" : [38, 64]
> > },
> > ],
> > "hashtags" : [
> > {
> > "text" : "#hot",
> > "indices" : [66, 69]
> > "url" : "http://search.twitter.com/search?q=%23hot"
> > }
> > ]
> > }
>
> Without looking at how twitter.com would currently handle that
> example, I would have expected the url to be "http://dev.twitter.com/
> #hot" and for the tweet to contain no hashtag. If the hashtag always
> takes precedence I'd have no way to link to the following without
> using a URL shortener: http://oauth.net/core/1.0a/#anchor41
> --
> Glenn Gillen
> http://glenngillen.com/

--
Raffi Krikorian
Twitter Platform Team
http://twitter.com/raffi
RE: [twitter-dev] Re: parsing out entities from tweets (a.k.a. parsing out hashtags is hard!)
Glenn Gillen wrote:
> Without looking at how twitter.com would currently handle that example, I
> would have expected the url to be "http://dev.twitter.com/ #hot" and for the
> tweet to contain no hashtag. If the hashtag always takes precedence I'd have no
> way to link to the following without using a URL shortener:
> http://oauth.net/core/1.0a/#anchor41

I think you are overlooking the space between the last slash and "#hot". URLs cannot contain (un-encoded) spaces.

Regards,
Brian
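The point about un-encoded spaces suggests one possible disambiguation rule: a "#" that directly follows URL characters stays part of the URL as a fragment, while a "#" at the start of the text or after whitespace begins a hashtag. The Python sketch below assumes that rule purely for illustration. `extract_entities` and both regular expressions are hypothetical, the offsets are half-open character positions, and nothing here is claimed to match what twitter.com or the proposed entities format actually does.

```python
import re

# Hypothetical patterns, for illustration only.
URL_RE = re.compile(r"https?://\S+")      # a URL ends at the first whitespace
HASHTAG_RE = re.compile(r"(?<!\S)#\w+")   # "#" only at start or after whitespace


def extract_entities(text):
    """Return URLs and hashtags with [start, end) character offsets."""
    urls = [
        {"url": m.group(), "indices": [m.start(), m.end()]}
        for m in URL_RE.finditer(text)
    ]
    hashtags = [
        {"text": m.group(), "indices": [m.start(), m.end()]}
        for m in HASHTAG_RE.finditer(text)
    ]
    return {"urls": urls, "hashtags": hashtags}


if __name__ == "__main__":
    for tweet in (
        "hey @raffi tell @noradio to check out http://dev.twitter.com #hot",
        "hey @raffi tell @noradio to check out http://dev.twitter.com#hot",
        "http://oauth.net/core/1.0a/#anchor41",
    ):
        print(extract_entities(tweet))
```

Under this assumed rule, the tweet with a space yields one URL and one hashtag, while the tweet without a space and the oauth.net link each yield a single URL that keeps its fragment and no hashtag.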