Re: [twitter-dev] Re: Apps that Site Hack

2011-02-25 Thread Pascal Jürgens
How about a competition to develop spam-detection algorithms :)

Pascal

On Feb 24, 2011, at 10:38 PM, Dewald Pretorius wrote:

> Apart from implementing reCAPTCHA on tweet submission, follow, and
> unfollow, I can't see what Twitter can do to prevent that kind of
> abuse (can you imagine the revolt by bona fide users?). How else do
> you determine that it is an actual human and not a piece of automated
> software behind the browser on the user's desktop or laptop? The only
> other option is legally, and that depends on the country of residence
> of the owners of the software. At this point in time, it appears that
> anyone who is able to and has the inclination to write desktop
> software that bypasses the API might have carte blanche to do so.

-- 
Twitter developer documentation and resources: http://dev.twitter.com/doc
API updates via Twitter: http://twitter.com/twitterapi
Issues/Enhancements Tracker: http://code.google.com/p/twitter-api/issues/list
Change your membership to this group: 
http://groups.google.com/group/twitter-development-talk


Re: [twitter-dev] I've got the error: OAuth Authentication Failed. And I tried all the WordPress to Twitter plugins.

2011-02-12 Thread Pascal Jürgens
I'm no OAuth expert, but did you make sure your system time is properly 
synchronized with a regional NTP server?
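For context: OAuth request signing embeds a timestamp, and servers commonly reject requests whose clock has drifted too far from their own. A minimal sketch of such a skew check (the 300-second tolerance and the helper name are illustrative assumptions, not documented Twitter values):

```python
import time

# OAuth providers typically reject requests whose oauth_timestamp is too
# far from server time. The 300-second tolerance is an illustrative
# assumption, not a documented Twitter value.
ALLOWED_SKEW_SECONDS = 300

def clock_skew_ok(server_ts, local_ts=None, tolerance=ALLOWED_SKEW_SECONDS):
    """Return True if the local clock is close enough to the server's
    for timestamp-based auth requests to be accepted."""
    if local_ts is None:
        local_ts = time.time()
    return abs(local_ts - server_ts) <= tolerance
```

In practice you would compare `time.time()` against a trusted source such as an NTP server, then fix the system clock rather than work around it.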

Pascal

On Feb 12, 2011, at 3:13 AM, Winson wrote:

> Hi there.
> 
> Using WP 3.0.5 and WP to Twitter 2.2.6 on a CentOS server, which I
> don't manage at all. I've got the error "OAuth Authentication Failed.
> Check your credentials and verify that Twitter is running.".
> 
> I've also checked all the data from the application, including erasing
> the old one and creating a new one, but this issue remains. Twitter is
> up and working at the very moment I'm writing this message.
> 
> Can anybody help?
> 
> PS: I thought the problem was the WP to Twitter 2.2.6 plugin, so I
> tried to install other WordPress-to-Twitter plugins.
> 
> The result: Authentication Failed.
> 



Re: [twitter-dev] Help me plz: query speed and geocode

2010-11-28 Thread Pascal Jürgens
Hello noname,

the Search API is rate limited and only allows an undisclosed number of 
queries per hour. You will need to look into the Streaming API: consume the 
sample stream and extract the geodata. This also gives you tweets from all over 
the world.

Have a look at

http://dev.twitter.com/pages/streaming_api
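As a rough sketch, extracting geodata from sample-stream lines could look like this (field names follow the 2010-era streaming payload, where `geo` is a GeoJSON-style point; the helper is illustrative, not part of any library):

```python
import json

def extract_geo(tweet_line):
    """Parse one line from the streaming API and return (lat, lon) if
    the tweet carries a geo point, else None."""
    try:
        status = json.loads(tweet_line)
    except ValueError:
        return None  # keep-alive newlines and partial lines
    geo = status.get("geo") if isinstance(status, dict) else None
    if geo and geo.get("type") == "Point":
        lat, lon = geo["coordinates"]
        return lat, lon
    return None
```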

Cheers,
Pascal

On Nov 28, 2010, at 12:34 AM, disc31 wrote:

> search.twitter.com/1/statuses/filter.json?
> location=-168.75,9.79,158.90,83.02
> 
> The problem is that I am only getting a tweet about
> every 30 seconds with this, and after about 5-10 posts it stops feeding
> me posts and won't let me connect for about another hour.



Re: [twitter-dev] Re: Sending 1600 DMs?

2010-07-28 Thread Pascal Jürgens
Just curious:

the limit is on sending, not receiving. Why exactly would one want to send more 
than 250 DMs for one incident? Wouldn't that many messages overwhelm any 
helpful agency and actually have a detrimental effect?

Pascal


On Jul 29, 2010, at 2:06 AM, Bess wrote:

> Is there no way to lift this DM daily limit?
> 
> If I build an emergency system to report accidents, then an official
> Twitter account for the police or the Red Cross won't be able to
> receive more than 250 DMs per day.
> 
> If there is a major accident that involves more than 250 injuries,
> assuming one DM per injury report, will Twitter send out a "Whale"
> error after the limit is exceeded?



Re: [twitter-dev] Basic Auth deprecation August 16th?

2010-07-28 Thread Pascal Jürgens
http://countdowntooauth.com/


On Jul 29, 2010, at 1:22 AM, chinaski007 wrote:

> 
> Any word on if this is still planned?
> 
> Any further extensions?
> 
> Or is the drop-dead deadline still August 16th?



Re: [twitter-dev] Twitter feed showing incorrect dates

2010-07-26 Thread Pascal Jürgens
Ben,

did you account for UTC time?

http://apiwiki.twitter.com/Return-Values

Pascal

On 26.Jul2010, at 18:21, Ben Juneau wrote:

> The dates are incorrect on my website... http://www.bjuneau.com
> 
> I'm not sure what I'm doing wrong here. Any help is much appreciated.



Re: [twitter-dev] Re: Home_timeline after tweet destroy

2010-07-23 Thread Pascal Jürgens
Yes. You can't trust anything on twitter. Hope for good, valid results, prepare 
for anything else.

Pascal

On 23.Jul2010, at 15:03, luisg wrote:

> This means that the count property is not something that you can
> trust, right?
> 
> Luis



Re: [twitter-dev] Re: Home_timeline after tweet destroy

2010-07-23 Thread Pascal Jürgens
Hi Luis,

yes, that's what I mean. You can either get the second page, or just request 
some more, as in:

> http://api.twitter.com/1/statuses/home_timeline.xml?count=25


Pascal

On 23.Jul2010, at 11:40, luisg wrote:

> Hi Pascal,
> 
> Thanks for your reply.
> What do you mean by cursors?
> 
> I have a way to solve this problem:
> 
> 1- get the home_timeline
> 2- count the number of tweets got from (1) and if length < 20 I do
> another home_timeline call with page=2
> 
> I think this might work. The problem is, I need to do 2 calls to
> Twitter and I don't think that's a good way...
> 
> Is that the type of solution you mean? Do you have a better solution?
> 
> Thanks a lot,
> 
> Luis



Re: [twitter-dev] Home_timeline after tweet destroy

2010-07-23 Thread Pascal Jürgens
Hi Luis,

I might be wrong there, but I think this is the way it works because of 
twitter's caching and distribution architecture. You can never assume to get 
the full amount of tweets or users - some might be filtered, deleted or 
whatnot. If you need more, just get the next page/set using cursors.
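Paging until the cursor runs out could be sketched like this (the stub `fetch_page` callable stands in for a real API call; the -1/0 cursor convention follows Twitter's cursored methods):

```python
def fetch_all(fetch_page):
    """Collect every item from a cursored endpoint. `fetch_page(cursor)`
    must return (items, next_cursor); cursor -1 asks for the first page
    and next_cursor 0 signals the last page, following Twitter's
    cursor convention."""
    items, cursor = [], -1
    while cursor != 0:
        page, cursor = fetch_page(cursor)
        items.extend(page)
    return items
```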

Pascal

On 23.Jul2010, at 10:18, luisg wrote:

> Hi there...
> 
> I'm experiencing something strange, I think...
> If I do a tweet destroy through my application and after that I get
> the home_timeline with the property count=20, I'm not getting the 20
> tweets that I should get. I'm getting just 19.
> If I do another tweet destroy and execute home_timeline again, I will
> get only 18... and so on...
> 
> Am I doing something wrong? Is there a way to force to get the 20
> tweets?
> 
> Thanks,
> 
> Luis



Re: [twitter-dev] Re: Bug: friends/ids returning invalid user IDs

2010-07-22 Thread Pascal Jürgens
Stale caches.

Pascal

On 22.Jul2010, at 23:38, soung3 wrote:

> It's not only suspended users, but also users that are no longer
> found.  Why does Twitter return ids of users that no longer exist?



Re: [twitter-dev] Need a confirmation - no way to count number of tweets found when searching for a keyword?

2010-07-22 Thread Pascal Jürgens
Hi Terrence,

if you use the tracking stream, you will get limit notices telling you how many 
tweets you missed (because the stream didn't provide the volume). This should 
give you a way to determine the absolute N. Note that spam tweets might still 
be filtered out.
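A small sketch of picking those limit notices out of the stream (the helper name is an assumption; the `{"limit": {"track": N}}` shape follows the streaming documentation):

```python
import json

def missed_count(stream_line):
    """Return the number of tweets withheld, as reported by a streaming
    'limit' notice such as {"limit": {"track": 1234}}; returns None for
    ordinary status lines or non-JSON keep-alives."""
    try:
        msg = json.loads(stream_line)
    except ValueError:
        return None
    if isinstance(msg, dict) and "limit" in msg:
        return msg["limit"].get("track")
    return None
```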

Pascal

On 22.Jul2010, at 10:44, Terrence Tee wrote:

> Then only i know the number of results available?
> Any smarter way? No?



Re: [twitter-dev] Re: Retrieving new tweets for 40-60 thousand users

2010-07-21 Thread Pascal Jürgens
Patrick,

given this explanation, you will need a lot of streams.

Assume 60k users with an average of 100 friends (a low estimate), and let's 
guess that every user shares 50% of their friends with others.

This gives us: 60,000 × 100 × 0.5 = 3,000,000 accounts to follow.

The biggest streaming role for this ("birddog") allows following 400,000 users
(http://dev.twitter.com/pages/streaming_api_methods).

So: you will need at least eight birddog-level streams 
(3,000,000 / 400,000 = 7.5, rounded up).
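The arithmetic above, as a quick sanity check (the 50% friend overlap is the same guess as in the text, and 400,000 is the stated birddog follow limit):

```python
import math

# Back-of-the-envelope numbers from the message above.
users = 60000
avg_friends = 100
shared_fraction = 0.5    # guessed overlap of friend lists
birddog_limit = 400000   # birddog role follow limit

unique_accounts = int(users * avg_friends * shared_fraction)  # 3,000,000
streams_needed = math.ceil(unique_accounts / birddog_limit)   # 8
```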


Pascal


On 21.Jul2010, at 11:51, PBro wrote:

> The situation is:
> We have 60k users all with their own twitter account, so not a single
> account with 60k followers.
> Those 60k users all have people they follow and we want to give the
> user a message when someone they are following posted a tweet.
> Hope this clarify things a bit.
> I will have a look into the streaming api.
> 
> Patrick



Re: [twitter-dev] Method statuses/filter from Streaming API return only incoming tweets and no updates

2010-07-20 Thread Pascal Jürgens
Hello Rostand,

I did my master's thesis using twitter data. My recommendations:

- You cannot, will not, and should not get DMs. They are *private*. Even if you 
do a closed study with 300 consenting people, it's unethical. If you're in the 
US, the ethics committee of your university will have you for lunch.

- Using the stream is easier because the REST API requires OAuth, but you will 
need to apply and wait for streaming access. However, the REST API will give 
you historical data.

- Coding your software will take more time than you think!

---
SUMMARY

Use an external service instead! There are several people who can collect 
tweets for you. Have a look at:

http://140kit.com/
http://www.contextminer.org/

Good luck!

Pascal

On 20.Jul2010, at 21:49, Rostand wrote:

> Hi All,
> 
> For my master's research project I need the public timeline of approx. 300
> users.
> 
> That is, all the outgoing messages (updates, retweets.., DM) + all
> incoming messages (replies, @user...).
> 
> I first thought that the option 'follow' from the Streaming API
> following will do.
> 
> http://stream.twitter.com/1/statuses/filter.json?follow=3004231
> 
> But I just discovered that I get only the incoming tweets and not the
> outgoing ones.
> 
> Is this correct? Any workaround for this? Suggestions?
> 
> It is kind of urgent. Any quick reaction will be appreciated.
> 
> Greetings
> 
> Rostand



Re: [twitter-dev] Re: Twitter backup script

2010-07-19 Thread Pascal Jürgens
Thomas,

last time I heard from the project, they were busy sorting the technical 
details out and still not sure who would even get access. It'll probably be 
open to a selected group of researchers first.

Pascal

On Jul 18, 2010, at 8:16 PM, Tomas Roggero wrote:

> Hi Pascal
> 
> What I'm doing is requesting 150 per hour. I've got 43 pages, so in about a week 
> I'll get my "almost full" backup. :D
> 
> (I've written a PHP script to do that automatically, of course)
> 
> Does the Library of Congress have an API or something?



Re: [twitter-dev] Re: Twitter backup script

2010-07-18 Thread Pascal Jürgens
Tom,

at least you know that the Library of Congress has a backup :)

Pascal

On Jul 18, 2010, at 7:07 , Tom Roggero wrote:

> I've tried your script on Mac; it only works for the first 3 pages, which is
> weird (I'm running Darwin ports for the XML functions)...
> Anyway, I tried doing it manually through Firefox; the latest page is 16.
> That's the limit. But if you have the IDs of the previous tweets you
> could use statuses/show for each ID... the problem is:
> 
> If you are backing up your account and you have all your tweet
> IDs, you will need 1 request per tweet, and the maximum is 350 (with HUGE
> luck) per hour. In my case, I got 10k: 3k via REST and 7k left to
> do... 7 THOUSAND requests???
> 
> Come on Twitter, help us developers, start thinking about backups; we
> know you are gonna explode!



Re: [twitter-dev] A feed greater equivalent to the old gardenhose?

2010-07-16 Thread Pascal Jürgens
In addition to the note from Taylor, I think it's a good idea to remind people 
that stream contents are identical - it's absolutely no use and a waste of 
resources to consume more than one sample stream. Just pick the largest one - 
that will contain all messages you can get.


Pascal

On Jul 16, 2010, at 16:07 , Sanjay wrote:

> Just saw the posting about the reduction in the gardenhose (and
> spritzer) feeds ( http://t.co/d6o1npx ).  So for those of us who need
> the additional data and are designed around it (and can consume it),
> is there a way to get that level of feed back?  For me in particular
> this is going to significantly hamper the app that I'm working on and
> was looking to launch in a few weeks.
> 
> Help...?
> 
> Sanjay



Re: [twitter-dev] Re: Gardenhose feed down to a trickle

2010-07-15 Thread Pascal Jürgens
Tomo,

John replied on another thread just minutes after you:

> I hoped we'd have an email out on Thursday about this, but I'd imagine it'll 
> go out on Friday. There isn't a problem with your client.

Pascal

On Jul 16, 2010, at 6:14 , Tomo Osumi wrote:

> Dear John,
> 
> Could I have any updates about the streaming API issues? I am in the same
> situation as Sanjay: both the 'sample' and 'gardenhose' streaming APIs
> still have 1/3 to 1/5 as much traffic as usual.
> 
> Tomo
> http://twitter.com/elrana/



Re: [twitter-dev] Gardenhose feed down to a trickle

2010-07-15 Thread Pascal Jürgens
# Idle musing

Inflation adjustment?

# end

Pascal

On Jul 15, 2010, at 17:14 , John Kalucki wrote:

> This is a known issue. We'll have an email about the Gardenhose and Spritzer 
> later today.
> 
> -John Kalucki
> http://twitter.com/jkalucki
> Infrastructure, Twitter Inc.



Re: [twitter-dev] Twitter API HTTP Heading

2010-07-15 Thread Pascal Jürgens
Nicholas,

Did you just publish your account credentials?


Pascal

On Jul 15, 2010, at 14:02 , Nicholas Kingsley wrote:

>   INC send$, "Authorization: Basic FishyMcFlipFlop:burpmachine\r\n"



Re: [twitter-dev] Juitter - Some accounts aren't indexed by search?

2010-07-13 Thread Pascal Jürgens
Hi,

those are probably accounts which twitter filtered out. The docs are pretty 
clear on this and give practical advice:

http://dev.twitter.com/pages/streaming_api_concepts

> Both the Streaming API and the Search API filter statuses created by a small 
> proportion of accounts based upon status quality metrics. For example, 
> frequent and repetitious status updates may, in some instances, and in 
> combination with other metrics, result in a different status quality score 
> for a given account. Results that are not selected by user id, for example: 
> samples and keyword track, are filtered by this status quality metric. 
> Results that are selected by user id, currently only results from the follow 
> predicate, are unfiltered and allow all matching statuses to pass. If an 
> expected user's statuses are not present in a non-follow-predicate stream 
> type, manually cross-check the user against Search results. If the user's 
> statuses are also not returned in Search, you can assume that the user's 
> statuses will not be returned by non-follow-predicated streams.
> For more details see: http://help.twitter.com/forums/10713/entries/42646 
> which states, in part:
> In order to keep your search results relevant, Twitter filters search results 
> for quality. Our search results will not include suspended accounts, or 
> accounts that may jeopardize search quality. Material that degrades search 
> relevancy or creates a bad search experience for people using Twitter may be 
> permanently removed.


On Jul 13, 2010, at 23:30 , codeless wrote:

> Hey there, I'm working with Juitter (http://juitter.com) on a project
> and have noticed that some users' tweets won't appear on the live
> stream. And not just accounts that are set to private, but regular
> twitter accounts. I have done some research and have found some saying
> that not all Twitter accounts are indexed by the Twitter search
> engine. Is there anything I can change in Juitter to fix it so it'll
> work with all non-private accounts? Or a way to force Twitter into
> indexing a certain account? Any and all information is welcomed. Thank
> you for your help.



Re: [twitter-dev] Re: Can not tweet to e.g. #Studentenjob anymore

2010-07-12 Thread Pascal Jürgens
Michael,

you can find out how to check here:

http://help.twitter.com/entries/15790-how-to-contest-account-suspension

Pascal

On Jul 12, 2010, at 10:32 , microcosmic wrote:

> Hello Pascal.
> 
> It's not the case that our account is disabled. Or is there a "hidden"
> message that is saying "account disabled"?
> 
> Regards,
> 
> Michael



Re: [twitter-dev] Can not tweet to e.g. #Studentenjob anymore

2010-07-11 Thread Pascal Jürgens
Hello Michael,

just an idea:
try to log into the twitter website with your account and see whether it was 
disabled for spam.

Pascal
On Jul 11, 2010, at 17:34 , microcosmic wrote:

> Hello there.
> 
> Since Friday I am not able to send tweets to e.g. #Studentenjob or
> #Nebenjob anymore. I found out that it is not possible to send tweets
> to any #... for our twitter account.
> 
> What can I do? Is there an error in our program we use?
> 
> Thanks in advance.
> 
> Regards,
> 
> Michael



Re: [twitter-dev] Streaming API time drifting problem and possible solutions

2010-07-08 Thread Pascal Jürgens
Larry,

moreover, I assume you checked I/O and CPU load. But even if that's not the 
issue, you should absolutely check whether you have simplejson with the C 
extension installed. The version included with Python is 1.9, which is decidedly 
slower than the new 2.x branch. You might see JSON decoding load drop by 50% or 
more.
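A defensive sketch for checking which JSON decoder you actually have (the `c_scanstring` attribute check relies on simplejson internals and is an assumption; the fallback keeps it runnable without simplejson installed):

```python
def best_json_module():
    """Prefer simplejson, noting whether its compiled C speedups are
    active, and fall back to the stdlib json module otherwise."""
    try:
        import simplejson
        from simplejson import decoder
        # c_scanstring is the C-accelerated scanner; it is None when the
        # extension did not compile (assumption about simplejson internals).
        if getattr(decoder, "c_scanstring", None) is not None:
            return simplejson, "simplejson (C speedups)"
        return simplejson, "simplejson (pure Python)"
    except ImportError:
        import json
        return json, "stdlib json"
```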


Pascal


On Jul 8, 2010, at 17:31 , Larry Zhang wrote:

> Hi everyone,
> 
> I have a program calling the statuses/sample method of a garden hose
> of the Streaming API, and I am experiencing the following problem: the
> timestamps of the tweets that I downloaded constantly drift behind
> real-time, the time drift keeps increasing until it reaches around 25
> minutes, and then I get a timeout from the request, sleep for 5
> seconds and reset the connection. The time drift is also reset to 0
> when the connection is reset.
> 
> One solution for this I have now is to proactively reset the
> connection more frequently, e.g., if I reconnect every 1 minute, the
> time drift I get will be at most 1 minute. But I am not sure whether
> this is allowed by the API.
> 
> So could anyone tell me if you have the same problem as mine, or whether I am
> using the API in the wrong way? And is it OK to reset the connection every
> minute?
> 
> I am using Tweepy (http://github.com/joshthecoder/tweepy) as the
> library for accessing the Streaming API.
> 
> Thanks a lot!
> -Larry



Re: [twitter-dev] Streaming API time drifting problem and possible solutions

2010-07-08 Thread Pascal Jürgens
Larry,

have you decoupled the processing code from tweepy's StreamListener, for 
example using a Queue.Queue or some message queue server?
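A minimal sketch of that decoupling with the standard library (the stream-reading thread, e.g. a tweepy StreamListener callback, only ever does `q.put(status)`; the worker drains the queue so slow processing never blocks the stream; names are illustrative):

```python
import queue
import threading

def start_consumer(q, handle):
    """Drain items from `q` on a worker thread, calling `handle` on
    each. A None item is a shutdown sentinel."""
    def worker():
        while True:
            item = q.get()
            if item is None:  # sentinel: stop the worker
                break
            handle(item)
            q.task_done()
    t = threading.Thread(target=worker, daemon=True)
    t.start()
    return t
```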

Pascal

On Jul 8, 2010, at 17:31 , Larry Zhang wrote:

> Hi everyone,
> 
> I have a program calling the statuses/sample method of a garden hose
> of the Streaming API, and I am experiencing the following problem: the
> timestamps of the tweets that I downloaded constantly drift behind
> real-time, the time drift keeps increasing until it reaches around 25
> minutes, and then I get a timeout from the request, sleep for 5
> seconds and reset the connection. The time drift is also reset to 0
> when the connection is reset.
> 
> One solution for this I have now is to proactively reset the
> connection more frequently, e.g., if I reconnect every 1 minute, the
> time drift I get will be at most 1 minute. But I am not sure whether
> this is allowed by the API.
> 
> So could anyone tell me if you have the same problem as mine, or whether I am
> using the API in the wrong way? And is it OK to reset the connection every
> minute?
> 
> I am using Tweepy (http://github.com/joshthecoder/tweepy) as the
> library for accessing the Streaming API.
> 
> Thanks a lot!
> -Larry



Re: [twitter-dev] Re: Search API rate limit

2010-07-07 Thread Pascal Jürgens
Shan,

as far as I know twitter has been reluctant to state definite numbers, so 
you'll have to experiment and implement a backoff mechanism in your app. Here 
is the relevant part of the docs:

> Search API Rate Limiting
> The Search API is rate limited by IP address. The number of search requests 
> that originate from a given IP address are counted against the search rate 
> limiter. The specific number of requests a client is able to make to the 
> Search API for a given hour is not released. Note that the Search API is not 
> limited by the same 150 requests per hour limit as the REST API. The number 
> is quite a bit higher and we feel it is both liberal and sufficient for most 
> applications. We do not give the exact number because we want to discourage 
> unnecessary search usage.
>  
> Search API usage requires that applications include a unique and identifying 
> User Agent string. A HTTP Referrer is expected but is not required. Consumers 
> using the Search API but failing to include a User Agent string will receive 
> a lower rate limit.
>  
> An application that exceeds the rate limitations of the Search API will 
> receive HTTP 420 response codes to requests. It is a best practice to watch 
> for this error condition and honor the Retry-After header that instructs the 
> application when it is safe to continue. The Retry-After header's value is 
> the number of seconds your application should wait before submitting another 
> query (for example: Retry-After: 67).
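The backoff the docs describe could be sketched like this (the 60-second fallback when no Retry-After header is present is an assumption for illustration):

```python
def search_backoff_seconds(status_code, headers, default_wait=60):
    """Decide how long to sleep before the next Search API request,
    honoring the Retry-After header on HTTP 420 as the docs describe."""
    if status_code != 420:
        return 0
    try:
        return int(headers.get("Retry-After", default_wait))
    except (TypeError, ValueError):
        return default_wait
```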

Cheers,

Pascal


On Jul 7, 2010, at 1:55 , Ramanean wrote:

> Matt,
> 
> 
> What is the exact limit? Can I write to Twitter for whitelisting
> of the IP?
> 
> Would whitelisting the IP do any good?
> 
> 
> Shan



Re: [twitter-dev] Re: Friend and Follower count - since timestamp

2010-07-07 Thread Pascal Jürgens
Just wanted to add:

it's a sad thing that ETags see hardly any use today. Back when the graph 
methods weren't paginated, you could just send a request with the ETag header 
set and it would come back as Not Modified, a very efficient thing to do. It 
won't give you the difference between arbitrary points in time, but for most 
applications, it's quite enough.

I don't think anybody ever confirmed that this even works with paginated calls, 
but I don't see why it couldn't (especially since pages are apparently newest 
first, as Raffi said).
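A small sketch of the conditional-request dance (a toy in-memory cache; the header semantics follow standard HTTP ETag/If-None-Match behavior, not anything Twitter-specific):

```python
def conditional_headers(etag):
    """Build headers for a conditional GET; the server answers
    304 Not Modified when the resource hasn't changed."""
    return {"If-None-Match": etag} if etag else {}

def apply_response(cache, url, status_code, body=None, etag=None):
    """Update a tiny {url: (etag, body)} cache from a response and
    return the current body. 304 means the cached copy is still valid."""
    if status_code == 304:
        return cache[url][1]
    cache[url] = (etag, body)
    return body
```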

Pascal

On Jul 7, 2010, at 10:50 , nischalshetty wrote:

> Raised an issue: http://code.google.com/p/twitter-api/issues/detail?id=1732
> 
> Hope one of you finds time to work on this, would be a big help for me
> as well whole lot of other apps that deal with a users friend and
> followers.
> 
> -Nischal



Re: [twitter-dev] Re: Rate Limiting

2010-07-06 Thread Pascal Jürgens
Just a sidenote: This can be coincidental. Unless you try several dozen times 
with each client, no valid inference can be drawn from the tests.

Pascal
On Jul 6, 2010, at 18:46 , Johnson wrote:

> I notice that the rate limit is application specific. I've tried a few
> clients; some of them go through, some don't.



Re: [twitter-dev] lockouts are the new black

2010-07-06 Thread Pascal Jürgens
With "multi-level loosely-coordinated best-effort distributed cache" you 
certainly got the naming, all that's left is the cache invalidation. :)

Pascal

On Jul 6, 2010, at 18:10 , John Kalucki wrote:

> These lockouts are almost certainly due to a performance optimization 
> intended to reduce network utilization by increasing physical reference 
> locality in a multi-level loosely-coordinated best-effort distributed cache. 
> Not easy to get right, and the engineers involved are working to resolve the 
> issue. There's absolutely no intention to lock people out.
> 
> -John Kalucki
> http://twitter.com/jkalucki
> Infrastructure, Twitter Inc.
> 



Re: [twitter-dev] Streaming API and Oauth

2010-07-05 Thread Pascal Jürgens
Quoting John Kalucki:

> We haven't announced our plans for streaming and oAuth, beyond stating that 
> User Streams will only be on oAuth.


Right now, basic auth and oAuth both work on streaming, and that won't change 
when basic for REST turns off. Since there's no set shutdown date yet for 
basic/streaming, I wouldn't expect it to happen soon.

Pascal

On Jul 5, 2010, at 20:25 , Zhami wrote:

> The Oauth Overview page 
> has sections for three APIs: REST, Search, and Streaming. The bottom
> of the page displays a ribbon stating that "The @twitterapi team will
> be shutting off basic authentication for the Twitter API."  Does this
> mean all of the Twitter APIs (REST, Search, and Streaming)? or just
> the REST API?
> 
> Most specifically, while I know that the Streaming API end-point now
> supports OAuth, I do not know if Streaming will require OAuth come
> August 16th...  can someone please clarify. TIA.



Re: [twitter-dev] Farsi Twitter App

2010-07-04 Thread Pascal Jürgens
Interesting. Your method is similar to the breadth-first crawl that many people 
do (for example, see the academic paper by Kwak et al. 2010).

You have to keep in mind, however, that you are only crawling the giant 
component of the network, the connected part. If there are any turkish users 
who have their *separate* subpopulation, which is not connected to the rest, 
you won't find those.

You could easily find those with a sample stream. Although I have to admit that 
the number of non-connected users is probably not that big, no one has really 
tested it so far.

Pascal

On Jul 3, 2010, at 20:00 , Furkan Kuru wrote:

> We have implemented the Turkish version, Twitturk: 
> http://twitturk.com/home/lang/en
> 
> We skipped the first three steps but started with a few Turkish users and 
> crawled all the network and for each new user we tested if the description or 
> latest tweets are in Turkish language.
> 
> We have almost 100,000 Turkish users identified so far.
> 
> Using stream api we collect their tweets and we find out the popular people 
> and key-words, top tweets (most retweeted ones) among Turkish people.



Re: [twitter-dev] Re: Farsi Twitter App

2010-07-03 Thread Pascal Jürgens
Google Translate is easy, but *very* inaccurate. I tested it on a set of 30,000 
tweets, and more than 60% were unreliably classified (Google will tell you the 
confidence of the classification inline).

Don't rely on it for language detection unless you pretty much don't care!

On Jul 4, 2010, at 4:43 , Sami wrote:

> A simple solution I experimented with is using the Google AJAX translation
> APIs, which are pretty reliable:
> http://code.google.com/apis/ajaxlanguage/documentation/
> but it works only for web apps, and you have to push all the tweets
> from the sample stream to the client to filter.



Re: [twitter-dev] Farsi Twitter App

2010-07-03 Thread Pascal Jürgens
John,

yes, thanks a lot for the design proposal - that is what inspired my own 
system. I am not primarily filtering by language, however, but by country, so 
I'm using time zone and location data together with a list of cities from 
http://www.geonames.org/

The manual cross-check in my thesis shows that this gets you close to 1.0 in 
specificity and above 0.7 in sensitivity.

From my experience, the key is to develop efficient language-specific tests 
with as low an error rate as possible (this, sadly, largely rules out 
conventional SVM and HMM models, etc., because tweets are so short and full of 
weird punctuation).

Pascal

On Jul 3, 2010, at 15:26 , John Kalucki wrote:

> It's great to hear that someone implemented all this. There's a similar 
> technique documented here: 
> http://dev.twitter.com/pages/streaming_api_concepts, under By Language and 
> Country. My suggestion was to start with a list of stop words to build your 
> user corpus -- but I don't know how well Farsi works with track, so random 
> sample method might indeed be better.
> 
> -John Kalucki
> http://twitter.com/jkalucki
> Infrastructure, Twitter Inc.



Re: [twitter-dev] Farsi Twitter App

2010-07-03 Thread Pascal Jürgens
Hi Lucas,

as someone who approached a similar problem, my recommendation would be to 
track users.  In order to get results quickly (rather than every few hours via 
user timeline calls), you need streaming access, which is a bit more 
complicated. I implemented such a system in order to track the german-speaking 
population of twitter users, and it works extremely well.

1) get access to the sample stream (5% or 15% type) (warning: the 15% stream is 
~10GB+ a day)

2) construct an efficient cascading language filter, i.e.:
- first test the computationally cheap AND precise attributes, such as a list 
of known farsi-only keywords or the location box
- if those attribute tests are negative, perform more computationally expensive 
tests
- if in doubt, count it as non-farsi! False positives will kill you if you 
sample a very small population!

3) With said filter, identify the accounts using farsi

4) Perform a first-degree network sweep and scan all their friends+followers, 
since those have a higher likelihood to speak farsi as well

5) compile a list of those known users

6) track those users with the shadow role stream (80,000 users) or higher.

If your language detection code is not efficient enough, you might want to 
include a cheap, fast and precise negative filter of known non-farsi 
attributes. Test that one before all the others and you should be able to 
filter out a large part of the volume.
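The cheap-first cascade in step 2 could be sketched like this (the Unicode ranges, the thresholds, and the helper names are illustrative guesses, and `expensive_test` stands in for whatever heavier classifier you plug in; it is not a complete language detector):

```python
# Arabic-script blocks (Farsi uses Arabic script); a cheap first-pass
# test, chosen for illustration.
ARABIC_BLOCKS = ((0x0600, 0x06FF), (0x0750, 0x077F), (0xFB50, 0xFDFF))

def _arabic_ratio(text):
    """Fraction of alphabetic characters that fall in Arabic-script blocks."""
    letters = [c for c in text if c.isalpha()]
    if not letters:
        return 0.0
    hits = sum(1 for c in letters
               if any(lo <= ord(c) <= hi for lo, hi in ARABIC_BLOCKS))
    return hits / len(letters)

def looks_farsi(tweet, expensive_test=None):
    """Cascade: cheap script check first; only borderline cases fall
    through to an (optional) expensive classifier. When in doubt,
    answer False -- false positives poison a small sample."""
    ratio = _arabic_ratio(tweet.get("text", ""))
    if ratio > 0.8:
        return True
    if ratio < 0.2:
        return False
    return bool(expensive_test and expensive_test(tweet))
```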


Don't hesitate to ask for any clarification!

Pascal Juergens
Graduate Student / Mass Communication
U of Mainz, Germany

On Jul 3, 2010, at 0:36 , Lucas Vickers wrote:

> Hello,
> 
> I am trying to create an app that will show tweets and trends in
> Farsi, for native speakers.  I would like to somehow get a sample
> 'garden hose' of Farsi based tweets, but I am unable to come up with
> an elegant solution.
> 
> I see the following options:
> 
> - Sample all tweets, and run a language detection algorithm on the
> tweet to determine which are/could be Farsi.
>  * Problem: only a very very small % of the tweets will be in Farsi
> 
> - Use the location filter to try and sample tweets from countries that
> are known to speak Farsi, and then run a language detection algorithm
> on the tweets.
>  * Problem: I seem to be limited on the size of the coordinate box I
> can provide.  I can not even cover all of Iran for example.
> 
> - Filter a standard farsi term.
>  * Problem: will limit my results to only tweets with this term
> 
> - Search for language = farsi
>   * Problem: Not a stream, I will need to keep searching.
> 
> I think of the given options I mentioned what makes the most sense is
> to search for tweets where language=farsi, and use the since_id to
> keep my results new.  Given this method, I have three questions
> 1 - since_id I imagine is the highest tweet_id from the previous
> result set?
> 2 - How often can I search (given API limits of course) in order to
> ensure I get new data?
> 3 - Will the language filter provide me with users whose default
> language is farsi, or will it actually find tweets in farsi?
> 
> I am aware that the user can select their native language in the user
> profile, but I also know this is not 100% reliable.
> 
> Can anyone think of a more elegant solution?
> Are there any hidden/experimental language type filters available to
> us?
> 
> Thanks!
> Lucas



Re: [twitter-dev] My Client API was Decreased to 175

2010-07-01 Thread Pascal Jürgens
http://status.twitter.com/post/750140886/site-tweaks


On Jul 1, 2010, at 9:49 , PiPS wrote:

> Hi.
> 
> I am developing a Twitter client.
> 
> My client uses xAuth.
> 
> But my client's API limit is now 175.
> 
> It was 350 before.
> 
> Why was it suddenly reduced by half?
> 



Re: [twitter-dev] Re: Twitter 1500 search results

2010-06-07 Thread Pascal Jürgens
Good to know. Did you mean to say "consume … streaming results"? I don't really 
see where you use the stream here.

Also, please note that it's not a good idea to work with "since_id" and 
"max_id" any more, because those will soon be (already are?) NON-SEQUENTIAL. 
This means you will lose tweets if you rely on the IDs incrementing over time. 
To quote the relevant email from Taylor Singletary:

> Please don't depend on the exact format of the ID. As our infrastructure 
> needs evolve, we might need to tweak the generation algorithm again.
> 
> If you've been trying to divine meaning from status IDs aside from their role 
> as a primary key, you won't be able to anymore. Likewise for usage of IDs in 
> mathematical operations -- for instance, subtracting two status IDs to 
> determine the number of tweets in between will no longer be possible

Cheers.

On Jun 8, 2010, at 0:06 , sahmed10 wrote:

> yes it works! This algorithm works.
> It goes something like this:
> Set the query to a string with appropriate To and From dates. Then
> consume the 1500 results and also save the status id of the
> very last tweet you got. As they are in sequential order (with gaps)
> it won't be a problem. The very last tweet's status id should be assigned
> as the max_id for the next set of results, and so on.
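A minimal sketch of the loop described above, under the assumption that `search_page` stands in for the actual Search API call (it is not a real library function). Since max_id is inclusive, the boundary tweet comes back once per page and must be dropped:

```python
def page_back(search_page):
    """Page backwards through search results, newest first.
    search_page(max_id) returns one page of tweets; max_id=None
    means 'start from the newest'."""
    out, max_id = [], None
    while True:
        batch = search_page(max_id)
        # max_id is inclusive, so the boundary tweet is returned
        # again; keep only ids we have not collected yet
        seen = {t["id"] for t in out}
        fresh = [t for t in batch if t["id"] not in seen]
        if not fresh:
            break
        out.extend(fresh)
        max_id = fresh[-1]["id"]
    return out
```

Stopping when a page yields nothing new avoids relying on any arithmetic over the ids themselves, which matters once ids are no longer sequential.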



Re: [twitter-dev] Twitter 1500 search results

2010-06-07 Thread Pascal Jürgens
As stated in the API WIKI, the number of search results you can get at any 
given point in time for one search term is indeed ~1500.
(http://apiwiki.twitter.com/Twitter-Search-API-Method:-search)

There are several ways to go beyond that.

a) Do perpetual searches (say, one every day), and merge the results
b) Get streaming access and track keywords in real time
c) Vary search terms and combine the results

Good luck.
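Option (a) above boils down to deduplicating by status id when combining runs. A minimal sketch (the function name is my own, not part of any API):

```python
def merge_runs(runs):
    """Merge several search runs into one deduplicated,
    newest-first list, keyed on status id."""
    by_id = {}
    for run in runs:
        for tweet in run:
            by_id[tweet["id"]] = tweet
    return [by_id[i] for i in sorted(by_id, reverse=True)]
```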


On Jun 7, 2010, at 22:53 , sahmed10 wrote:

> I am developing an application where I am trying to get more than 1500
> results for a search query. Is it possible?
> For example, when I ask for all results from 2nd June to 6th June
> with the search string "iphone", I only get the 1500 latest tweets, but
> I am actually interested in all the tweets that mention
> iphone from 2nd June to 6th June. Is there a workaround for this?



[twitter-dev] Re: 2 week advance notice: changes to /friends/ids and /followers/ids

2009-08-01 Thread Pascal Jürgens

Thanks for the notification.

What will this mean for etag checks?
I currently fetch a large number of graphs in regular intervals. Any
check that returns a 304 should incur little cost.

Will I need to crawl all the pages and check for their 304?
If I get a 304 on the first one, can I assume that the rest remains
equally unchanged?

Pascal

--
Pascal Juergens
twitter.com/pascal


On Jul 31, 7:35 pm, Alex Payne  wrote:
> The Twitter API currently has two methods for returning a user's
> denormalized "social graph": /friends/ids [1] and /followers/ids [2].
> These methods presently allow pagination by use of a "?page=n"
> parameter; without that parameter, they attempt to return all user IDs
> in the specified set. If you've used these methods, particularly for
> exploring the social graphs of users that are following or followed by
> a large number of other users, you've probably run into lag and server
> errors.
>
> In two weeks, we'll be addressing this with a change in back-end
> infrastructure. The "page" parameter will be replaced with a "cursor"
> parameter, which in turn will result in a change in the response
> bodies for these two methods. Whereas currently you'd receive an array
> response like this (in JSON):
>
>   [1,2,3,...]
>
> You will now receive:
>
>   {ids: [1,2,3], next_id: 1231232}
>
> You can then use the "next_id" value to paginate through the set:
>
>   /followers/ids.json?cursor=1231232
>
> To "start" paginating:
>
>   /followers/ids.json?cursor=-1
>
> The negative one (-1) indicates that you want to begin paginating.
> When the next_id value is zero (0), you're at the last page.
>
> Documentation of the new functionality will, of course, be provided on
> the API Wiki in advance of the change going live. If you have any
> questions or concerns, please contact us as soon as possible.
>
> [1] http://apiwiki.twitter.com/Twitter-REST-API-Method%3A-friends%C2%A0ids
> [2] http://apiwiki.twitter.com/Twitter-REST-API-Method%3A-followers%C2%A0ids
>
> --
> Alex Payne - Platform Lead, Twitter, Inc. - http://twitter.com/al3x
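The cursor scheme described in the announcement (start at cursor=-1, follow next_id until it is 0) can be sketched as follows; `fetch` is a stand-in for the HTTP request to /followers/ids, returning the parsed JSON response.

```python
def all_follower_ids(fetch, cursor=-1):
    """Walk a cursored endpoint: -1 starts pagination, and a
    next_id of 0 marks the last page."""
    ids = []
    while cursor != 0:
        page = fetch(cursor)      # e.g. GET /followers/ids.json?cursor=...
        ids.extend(page["ids"])
        cursor = page["next_id"]
    return ids
```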