[twitter-dev] Re: Languages available on Twitter.

2010-12-05 Thread Jim DeLaHunt
Adam:

On Dec 5, 5:59 am, a.przewo...@yahoo.pl a.przewo...@yahoo.pl
wrote:
 Hello,

 I'd like to ask: will Twitter be available in other languages? I
 think Twitter might gain more popularity if it were available in many
 languages. I hope you will think about that.

Twitter is gradually translating the UI of its web site into other
languages. See their blog posts: "Growing Around the World"
http://blog.twitter.com/2010/04/growing-around-world.html and "Coming Soon:
Twitter in More Languages"
http://blog.twitter.com/2009/10/coming-soon-twitter-in-more-languages.html.
Perhaps you are interested in Twitter.com localised into Polish. I'm
only an outside developer, not a Twitter decision maker, but I expect
that Twitter agrees on the value of translating Twitter.com into many
languages. It's only a matter of when, and with what priority relative
to other projects.

If you use a different Twitter client than Twitter.com, then of course
the developer of that client is responsible for making it available in
other languages.

And, of course, the Twitter message stream has been open to messages
in most languages from the very start.

I hope this is helpful for you.

-- 
Twitter developer documentation and resources: http://dev.twitter.com/doc
API updates via Twitter: http://twitter.com/twitterapi
Issues/Enhancements Tracker: http://code.google.com/p/twitter-api/issues/list
Change your membership to this group: 
http://groups.google.com/group/twitter-development-talk


[twitter-dev] Re: parsing out entities from tweets (a.k.a. parsing out hashtags is hard!)

2010-05-13 Thread Jim DeLaHunt
Raffi:



On May 13, 2:25 pm, Raffi Krikorian ra...@twitter.com wrote:
 as shown above, we'll be parsing out all mentioned users, all lists, all
 included URLs, and all hashtags

This is an interesting step forward.  The internationalisation
considerations can be sticky, though.  I did some entity-parsing from
tweets as part of my Twanguages project (a language census of
Twitter). One discovery was that people are in fact using hashtags in
non-Latin scripts. Another is that some people use the '#' character
without intending to create a hashtag (e.g. "we are #2 in line"). How
will your entity parsing handle non-Latin hashtags, Latin-script
hashtags with accented characters, and strings starting with '#' that
are not intended as hashtags?

Also note that URLs can now have non-Latin top-level domain names as
well as second-level domain names and other path parts. For instance,
http://وزارة-الأتصالات.مصر is a valid URL in the .مصر top-level
domain. Will your entity parsing code handle such URLs?
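
To make the IDN point concrete, here is a hedged Python sketch (not Twitter's implementation) of how a client turns such a hostname into the ASCII-compatible "xn--" (Punycode) form that actually goes on the wire in DNS queries and Host: headers, using Python's built-in IDNA codec:

```python
# Hedged sketch: ASCII-compatible encoding of an internationalised
# domain name, using Python's built-in "idna" codec (IDNA 2003).

def to_ascii_host(hostname):
    # Encode each dot-separated label separately, then rejoin.
    return '.'.join(label.encode('idna').decode('ascii')
                    for label in hostname.split('.'))

# The .مصر ccTLD mentioned above, in its wire (xn--) form:
print(to_ascii_host('مصر'))
```

Any entity parser that recognises URLs only by an ASCII hostname pattern would miss the Unicode form entirely, which is the concern raised above.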

In any case, it would be very helpful if the platform team would
document exactly what regular expressions govern the entities you
recognise. I might not agree with your definition of hashtag syntax,
but at least I want to know what it is.  See for example the running
questions on how to measure the length of a status message. 
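
For illustration only, a naive Unicode-aware hashtag extractor might look like the sketch below. This is a hypothetical rule, not Twitter's actual syntax, and it deliberately exhibits the "we are #2 in line" false positive described above:

```python
import re

# \w matches Unicode word characters by default for str patterns in
# Python 3, so this accepts non-Latin and accented hashtags -- but it
# also happily matches "#2", illustrating the false-positive problem.
HASHTAG = re.compile(r'#(\w+)')

def extract_hashtags(text):
    return HASHTAG.findall(text)

print(extract_hashtags('I like #café and #日本語 but we are #2 in line'))
# -> ['café', '日本語', '2']
```

Whatever rule Twitter settles on, publishing the exact regular expression would let developers reproduce it instead of guessing.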

 Matt Sanford
 (@mzsanford) on our internationalization team released the twitter-text
 library (http://github.com/mzsanford/twitter-text-rb) to help make parsing
 easier and standardized (in fact, we use this library ourselves), but we on
 the Platform team wondered if we could make this even easier for our
 developers. ...

I wasn't aware of this, and I'll take a look.  Thank you for the tip!
— Jim DeLaHunt, Vancouver, Canada


[twitter-dev] Re: Empty reply from server on Streaming API?

2009-11-18 Thread Jim DeLaHunt
Just to close the loop on my issue: I got some off-list help, Twitter
investigated, and it turns out my IP address had been blacklisted in
error. The blacklisting is removed, and I'm back in business.

I must say it's nice that I could ask a question on this list, and get
pretty much immediate attention from the proper Twitter person, and
over a weekend at that. Thanks, John Kalucki, and thanks, Twitter.

  --Jim DeLaHunt, Vancouver, Canada   @jdlh
  http://jdlh.com/en/pr/twanguages.html
  Twanguages: a language census of Twitter   @twanguages


On Nov 15, 6:52 am, John Kalucki jkalu...@gmail.com wrote:
 There are two levels of blacklisting. One is a temporary ban that
 resets every few minutes. This one gives you 401 errors. Then there's
 an IP black hole that is removed by an operator. Currently the IP
 black hole sends a TCP RST, but we might also null route you.
 You can verify an IP block by attempting to connect from a different
 network.

 If you provide an account name, I can look through the logs and see
 what happened. An IP address can also be helpful. In the absence of
 these keys, I can only speculate as to what occurred.

 -John Kalucki http://twitter.com/jkalucki
 Services, Twitter Inc.

 On Nov 15, 12:54 am, Jim DeLaHunt jim.delah...@gmail.com wrote:

  John:

  Thanks very much for the reply.

  On Nov 14, 8:30 pm, John Kalucki jkalu...@gmail.com wrote:

   This sounds like you were ignoring HTTP error codes and eventually got
   blacklisted. 
   Consider: http://apiwiki.twitter.com/Streaming-API-Documentation#Connecting

  Hmm... I was launching single curl requests, making one connection
  then breaking it after max 3 seconds. I would then wait 6 minutes
  before trying to connect again. I didn't record the HTTP result codes
  I got back, but according to Streaming-API-Documentation#Connecting
  I was being tremendously conservative: that doc recommends backing
  off for 10 to 240 seconds after an HTTP error code, while I always
  backed off for 360 seconds, whether the HTTP result code was good or
  bad.

  How would backing off by *more* than the docs call for get me
  blacklisted?

   You can tell for sure by turning off --silent and using -v to see
   what's going on. You should be getting some sort of message back, or
   absolutely nothing back. Those codes are not HTTP error codes, they
   must be some curl artifact.

  Correct, the codes 6 and 52 are defined by curl. See
  http://curl.haxx.se/docs/manpage.html . Using -v and other curl
  options, I see clearly that what I'm getting back is absolutely
  nothing: 0 bytes in response to my HTTP query. (That's the
  meaning of code 52.)

  For the last 6 hours, I've polled once per hour (once per 3600
  seconds), and this null response has not changed.

  The docs don't say how to confirm that I've been blacklisted. Any
  suggestions for how to confirm that? Nor do they say what to do if I
  am in fact blacklisted. They say that the blacklist lasts an
  indeterminate period of time, so maybe they are implying I should
  just wait and the system will lift the blacklist itself.

  The biggest issue, though, is to understand why I could have become
  blacklisted, when I backed off for 360 seconds after each attempt.
  Because right now, I don't know what I should do differently.

  Thanks again for the guidance.
     --Jim DeLaHunt, Vancouver, Canada   @jdlh
     Twanguages: a language census of Twitter   @twanguages
     http://jdlh.com/en/pr/twanguages.html

   Tcpdump is also sometimes useful.

    -John Kalucki http://twitter.com/jkalucki
   Services, Twitter Inc.

   On Nov 14, 6:13 pm, Jim DeLaHunt jim.delah...@gmail.com wrote:

Am I the only one seeing this? I call the Streaming API 10x/hour. For
the last 23 hours or so, I've been getting bad responses every time.

I use a cron job to call from the Linux shell:

curl --user myid:mypassword --silent --fail --max-time 3 --retry 0 http://stream.twitter.com/1/statuses/sample.xml

and I usually get a curl return code of (52) "Empty reply from server",
though sometimes (6) "name lookup timed out". The same thing happens
when I ask for .json instead of .xml.

The failures started at a rate of 1-2/hour on 2009/11/13 09:00h UTC
(Friday early morning PST), became continuous as of 2009/11/14 03:24h
UTC (Friday evening PST), and remain continuous.

Is anyone else calling this API and failing? Or succeeding? in the
last 24 hours?

Thank you,
   --Jim DeLaHunt, Vancouver, Canada   @jdlh
   Twanguages: a language census of Twitter   @twanguages
   http://jdlh.com/en/pr/twanguages.html


[twitter-dev] Re: Empty reply from server on Streaming API?

2009-11-15 Thread Jim DeLaHunt

John:

Thanks very much for the reply.

On Nov 14, 8:30 pm, John Kalucki jkalu...@gmail.com wrote:
 This sounds like you were ignoring HTTP error codes and eventually got
 blacklisted. 
 Consider: http://apiwiki.twitter.com/Streaming-API-Documentation#Connecting

Hmm... I was launching single curl requests, making one connection
then breaking it after max 3 seconds. I would then wait 6 minutes
before trying to connect again. I didn't record the HTTP result codes
I got back, but according to Streaming-API-Documentation#Connecting
I was being tremendously conservative: that doc recommends backing
off for 10 to 240 seconds after an HTTP error code, while I always
backed off for 360 seconds, whether the HTTP result code was good or
bad.

How would backing off by *more* than the docs call for get me
blacklisted?
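
For reference, the backoff policy the Connecting doc describes can be sketched as follows. This is a hedged illustration of exponential backoff between 10 and 240 seconds, not Twitter's actual client code; `backoff_intervals` is a hypothetical helper that simulates the waits a client would apply:

```python
def backoff_intervals(statuses, min_wait=10, max_wait=240):
    """Given a sequence of HTTP status codes, return the wait
    (in seconds) the exponential-backoff policy applies after each:
    reset on a healthy 200, otherwise double the wait up to a cap."""
    waits = []
    wait = min_wait
    for status in statuses:
        if status == 200:
            wait = min_wait      # healthy connection: reset backoff
            waits.append(0)
        else:
            waits.append(wait)
            wait = min(wait * 2, max_wait)
    return waits

print(backoff_intervals([500, 500, 500, 200, 503]))
# -> [10, 20, 40, 0, 10]
```

A flat 360-second wait is indeed longer than anything this schedule ever produces, which is why the blacklisting was surprising.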

 You can tell for sure by turning off --silent and using -v to see
 what's going on. You should be getting some sort of message back, or
 absolutely nothing back. Those codes are not HTTP error codes, they
 must be some curl artifact.

Correct, the codes 6 and 52 are defined by curl. See
http://curl.haxx.se/docs/manpage.html . Using -v and other curl
options, I see clearly that what I'm getting back is absolutely
nothing: 0 bytes in response to my HTTP query. (That's the
meaning of code 52.)
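
These two curl failure modes can also be reproduced and told apart programmatically. A hedged Python sketch (the `probe` helper is hypothetical, not part of any Twitter tooling): curl's (6) "couldn't resolve host" surfaces as `socket.gaierror`, while (52) "empty reply from server" surfaces as `http.client.RemoteDisconnected` (a subclass of `BadStatusLine`) when the server closes the connection without sending any response bytes:

```python
import http.client
import socket

def probe(host_and_port, path='/'):
    """Return the HTTP status, or a label for the two curl-style
    failure modes discussed above."""
    conn = http.client.HTTPConnection(host_and_port, timeout=3)
    try:
        conn.request('GET', path)
        return conn.getresponse().status
    except socket.gaierror:
        return 'dns-failure'   # like curl exit code 6
    except http.client.BadStatusLine:
        return 'empty-reply'   # like curl exit code 52
    finally:
        conn.close()
```

An 'empty-reply' result here corresponds to the 0-byte responses described above, and is consistent with a TCP-level block rather than an HTTP-level error.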

For the last 6 hours, I've polled once per hour (once per 3600
seconds), and this null response has not changed.

The docs don't say how to confirm that I've been blacklisted. Any
suggestions for how to confirm that? Nor do they say what to do if I
am in fact blacklisted. They say that the blacklist lasts an
indeterminate period of time, so maybe they are implying I should
just wait and the system will lift the blacklist itself.

The biggest issue, though, is to understand why I could have become
blacklisted, when I backed off for 360 seconds after each attempt.
Because right now, I don't know what I should do differently.

Thanks again for the guidance.
   --Jim DeLaHunt, Vancouver, Canada   @jdlh
   Twanguages: a language census of Twitter   @twanguages
   http://jdlh.com/en/pr/twanguages.html

 Tcpdump is also sometimes useful.

 -John Kalucki http://twitter.com/jkalucki
 Services, Twitter Inc.

 On Nov 14, 6:13 pm, Jim DeLaHunt jim.delah...@gmail.com wrote:

  Am I the only one seeing this? I call the Streaming API 10x/hour. For
  the last 23 hours or so, I've been getting bad responses every time.

  I use a cron job to call from the Linux shell:

  curl --user myid:mypassword --silent --fail --max-time 3 --retry 0 http://stream.twitter.com/1/statuses/sample.xml

  and I usually get a curl return code of (52) "Empty reply from server",
  though sometimes (6) "name lookup timed out". The same thing happens
  when I ask for .json instead of .xml.

  The failures started at a rate of 1-2/hour on 2009/11/13 09:00h UTC
  (Friday early morning PST), became continuous as of 2009/11/14 03:24h
  UTC (Friday evening PST), and remain continuous.

  Is anyone else calling this API and failing? Or succeeding? in the
  last 24 hours?

  Thank you,
     --Jim DeLaHunt, Vancouver, Canada   @jdlh
     Twanguages: a language census of Twitter   @twanguages
     http://jdlh.com/en/pr/twanguages.html


[twitter-dev] Empty reply from server on Streaming API?

2009-11-14 Thread Jim DeLaHunt

Am I the only one seeing this? I call the Streaming API 10x/hour. For
the last 23 hours or so, I've been getting bad responses every time.

I use a cron job to call from the Linux shell:

curl --user myid:mypassword --silent --fail --max-time 3 --retry 0
http://stream.twitter.com/1/statuses/sample.xml

and I usually get a curl return code of (52) "Empty reply from server",
though sometimes (6) "name lookup timed out". The same thing happens
when I ask for .json instead of .xml.

The failures started at a rate of 1-2/hour on 2009/11/13 09:00h UTC
(Friday early morning PST), became continuous as of 2009/11/14 03:24h
UTC (Friday evening PST), and remain continuous.

Is anyone else calling this API and failing? Or succeeding? in the
last 24 hours?

Thank you,
   --Jim DeLaHunt, Vancouver, Canada   @jdlh
   Twanguages: a language census of Twitter   @twanguages
   http://jdlh.com/en/pr/twanguages.html


[twitter-dev] Re: Japanese Characters in Image URLs

2009-10-12 Thread Jim DeLaHunt

Happy Canadian Thanksgiving, Kyle:

On Oct 12, 12:00 am, Kyle Mulka repalvigla...@yahoo.com wrote:
... Twitter API
 developers have to deal with non-ASCII characters in image URLs
 because Twitter doesn't change the name the user gave their image file
 to something cleaner.

 The PHP code below is giving me the standard Amazon S3 access denied
 error message, but if I copy the URL of the image and paste it into my
 browser, that doesn't happen. What do I need to do to get this to
 work?

I expect the URLs you are getting back are UTF-8 encoded strings. I
believe the authoritative spec is RFC 3986
http://tools.ietf.org/html/rfc3986. My understanding of its contents is:
a) the path part of the URL is an octet stream, and
b) the web server may interpret that octet stream as it pleases, so it
may be in any encoding, but
c) it's good practice for agents to present and interpret path parts
of URLs as UTF-8 encoded text, and
d) any octets in the path part of the URL which aren't in the subset of
ASCII permitted in URLs should be percent-encoded, but
e) it would be nice for agents to accept unpermitted byte values in
the path part of the URL, and
f) it would be nice for agents to interpret path parts of URLs as
being encoded in UTF-8 unless they know otherwise.

As usual, Wikipedia also has a nice writeup.  See
http://en.wikipedia.org/wiki/Percent-encoding and linked articles.
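
Point d) above is easy to demonstrate. A minimal sketch in Python (the file name is a hypothetical example, and the PHP equivalents differ by version): `quote` percent-encodes the UTF-8 octets of each non-ASCII character in a path, and `unquote` reverses it:

```python
from urllib.parse import quote, unquote

# Percent-encode the octets of a UTF-8 path so that only the ASCII
# subset permitted in URLs remains. '/' is left alone by default.
path = '/images/写真.jpg'     # hypothetical image path
encoded = quote(path)
print(encoded)                # /images/%E5%86%99%E7%9C%9F.jpg
assert unquote(encoded) == path   # the round trip recovers the text
```

This is exactly the transformation Firefox appears to perform silently in the experiment described below.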

I did an experiment with Firefox 3 which showed it was respecting the
above spec.  I pasted a URL with non-ASCII UTF-8 characters in it, and
it blithely accepted them, perhaps percent-encoded them, and
successfully requested the page. Then I took a URL with
percent-encoded characters (non-English versions of Wikipedia are a
bounty of such URLs), pasted it into the Firefox location field, and
Firefox removed the percent-encoding and displayed the URL as a UTF-8
string.

Thus you might want to experiment with revising the code which handles
URLs and other strings from the Twitter API so that it works with
UTF-8 encoded strings, or with byte strings carrying no encoding
interpretation. Be ready to apply your own percent-encoding to
received URLs, per d) above.
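
As a hedged sketch of that suggestion (in Python rather than PHP; the function name and example URL are hypothetical): percent-encode only the path part of a URL received from the API, leaving the scheme and host untouched, before handing it to an HTTP client:

```python
from urllib.parse import urlsplit, urlunsplit, quote

def encode_url_path(url):
    """Percent-encode the path component of a URL, leaving scheme and
    host alone. safe='/%' preserves path separators and any percent
    escapes already present, so already-encoded URLs pass through."""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(path=quote(parts.path, safe='/%')))

print(encode_url_path('http://example.com/images/画像.png'))
# -> http://example.com/images/%E7%94%BB%E5%83%8F.png
```

A similar transformation applied to `profile_image_url` before the second curl request may be all that's needed to satisfy S3.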

I don't know the PHP incantations for string encoding, sorry. I do
know it differs between PHP 4, PHP 5, and PHP 6. (I just brushed up on
the corresponding Python incantations last night, as it happens.)

 $ch = curl_init('http://twitter.com/users/show.json?
 screen_name=rennri');
 curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
 $json = curl_exec($ch);
 curl_close($ch);

 $data = json_decode($json, true);

 $ch2 = curl_init($data['profile_image_url']);
 curl_exec($ch2);
 curl_close($ch2);

 --
 Kyle Mulka http://twilk.com - put your friends' faces on your Twitter background

Hope this helps!

—Jim DeLaHunt, Vancouver, Canada, http://jdlh.com/ multilingual
websites consultant.