Re: [twitter-dev] Re: Snowflake: An update and some very important information
Signed BIGINT is what Snowflake itself uses. Tom On 12/3/10 7:26 PM, TCI wrote: Help. I had been using BINARY(16) to store tweet IDs in MySQL. I can't find a good resource that tells me which data type I should use now. Reading the MySQL documentation, I believe it should be unsigned BIGINT, but would BINARY(64) be a better option? I also read this post arguing against BIGINT, which has only a few comments: http://blog.salientdigital.com/2010/11/13/twitter-api-snowflake-and-mysql/ Is anybody storing tweet IDs in MySQL willing to share which data type they're using? TCI -- Twitter developer documentation and resources: http://dev.twitter.com/doc API updates via Twitter: http://twitter.com/twitterapi Issues/Enhancements Tracker: http://code.google.com/p/twitter-api/issues/list Change your membership to this group: http://groups.google.com/group/twitter-development-talk
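For anyone weighing signed against unsigned BIGINT, here is a minimal Python sketch of the range check behind that answer. It assumes the 41 + 10 + 12 bit layout quoted elsewhere in this thread, which adds up to 63 bits; the name `fits_signed_bigint` is made up for illustration and is not part of any Twitter or MySQL API.

```python
# Bit widths quoted in this thread for a Snowflake ID.
TIMESTAMP_BITS = 41
MACHINE_BITS = 10
SEQUENCE_BITS = 12

# Largest value a signed 64-bit integer (e.g. a signed BIGINT column) can hold.
SIGNED_BIGINT_MAX = 2**63 - 1


def fits_signed_bigint(tweet_id: int) -> bool:
    """Return True if tweet_id can be stored in a signed 64-bit column."""
    return 0 <= tweet_id <= SIGNED_BIGINT_MAX


# The largest ID a 63-bit layout can produce.
largest_snowflake = 2 ** (TIMESTAMP_BITS + MACHINE_BITS + SEQUENCE_BITS) - 1

print(fits_signed_bigint(largest_snowflake))     # True
print(fits_signed_bigint(10765432100123456789))  # False
```

The second value is the sample ID from the announcement. It needs the 64th bit, so it appears to be a parser test value rather than an ID the 63-bit layout could generate.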
[twitter-dev] Re: Snowflake: An update and some very important information
Help. I had been using BINARY(16) to store tweet IDs in MySQL. I can't find a good resource that tells me which data type I should use now. Reading the MySQL documentation, I believe it should be unsigned BIGINT, but would BINARY(64) be a better option? I also read this post arguing against BIGINT, which has only a few comments: http://blog.salientdigital.com/2010/11/13/twitter-api-snowflake-and-mysql/ Is anybody storing tweet IDs in MySQL willing to share which data type they're using? TCI
[twitter-dev] Re: Snowflake: An update and some very important information
Maybe this is a little naive, and I know you gotta consider backwards compatibility, but it seems like a bad idea to rely on status ID for chronological sorting. If you want tweets displayed in order, sort by the timestamp. If you get rid of the sorting requirement on the ID, why not do what Dean said? Just use a GUID for the ID. Languages without GUIDs would treat it as a unique string. It might be a pain to switch to GUIDs, but it sounds like switching to Snowflake is no walk in the park either. On Nov 22, 3:42 pm, jough wrote: > I gather the reason for the 64-bit int type was to maintain some > backwards-compatibility around the old sequential IDs, so both the > old-style and Snowflake IDs could be sorted and you could glean that > smaller IDs are older than larger integers. U/GUIDs wouldn't be > sortable in any meaningful fashion. > > - Jough > > On Nov 19, 10:42 pm, dean wrote: > > > Why not just use a GUID or UUID type for the ID type (IE: > > 3F2504E0-4F89-11D3-9A0C-0305E82C3301)? This way you're not restricted > > by using a numeric data type that each language could potentially > > define differently. > > > For languages that don't directly have a GUID or UUID type, they can > > treat that ID as a string, and the higher level languages can use the > > GUID data type directly. > > > On Oct 18, 7:19 pm, Matt Harris wrote: > > > > Last week you may remember Twitter planned to enable the new Status ID > > > generator - 'Snowflake' but didn't. The purpose of this email is to > > > explain > > > the reason why this didn't happen, what we are doing about it, and what > > > the > > > new release plan is. > > > > So what is Snowflake? > > > -- > > > Snowflake is a service we will be using to generate unique Tweet IDs. > > > These > > > Tweet IDs are unique 64bit unsigned integers, which, instead of being > > > sequential like the current IDs, are based on time. The full ID is > > > composed > > > of a timestamp, a worker number, and a sequence number. 
> > > > The problem > > > - > > > Before launch it came to our attention that some programming languages > > > such > > > as Javascript cannot support numbers with >53bits. This can be easily > > > examined by running a command similar to: (90071992547409921).toString() > > > in > > > your browsers console or by running the following JSON snippet through > > > your > > > JSON parser. > > > > {"id": 10765432100123456789, "id_str": "10765432100123456789"} > > > > In affected JSON parsers the ID will not be converted successfully and > > > will > > > lose accuracy. In some parsers there may even be an exception. > > > > The solution > > > > > > To allow javascript and JSON parsers to read the IDs we need to include a > > > string version of any ID when responding in the JSON format. What this > > > means > > > is Status, User, Direct Message and Saved Search IDs in the Twitter API > > > will > > > now be returned as an integer and a string in JSON responses. This will > > > apply to the main Twitter API, the Streaming API and the Search API. > > > > For example, a status object will now contain an id and an id_str. The > > > following JSON representation of a status object shows the two versions of > > > the ID fields for each data point. 
> > > > [ > > > { > > > "coordinates": null, > > > "truncated": false, > > > "created_at": "Thu Oct 14 22:20:15 + 2010", > > > "favorited": false, > > > "entities": { > > > "urls": [ > > > ], > > > "hashtags": [ > > > ], > > > "user_mentions": [ > > > { > > > "name": "Matt Harris", > > > "id": 777925, > > > "id_str": "777925", > > > "indices": [ > > > 0, > > > 14 > > > ], > > > "screen_name": "themattharris" > > > } > > > ] > > > }, > > > "text": "@themattharris hey how are things?", > > > "annotations": null, > > > "contributors": [ > > > { > > > "id": 819797, > > > "id_str": "819797", > > > "screen_name": "episod" > > > } > > > ], > > > "id": 12738165059, > > > "id_str": "12738165059", > > > "retweet_count": 0, > > > "geo": null, > > > "retweeted": false, > > > "in_reply_to_user_id": 777925, > > > "in_reply_to_user_id_str": "777925", > > > "in_reply_to_screen_name": "themattharris", > > > "user": { > > > "id": 6253282 > > > "id_str": "6253282" > > > }, > > > "source": "web", > > > "place": null, > > > "in_reply_to_status_id": 12738040524 > > > "in_reply_to_status_id_str": "12738040524" > > > } > > > ] > > > > What should you do - RIGHT NOW > > > -- > > > The first thing you should do is attempt to decode the JSON snippet above > > > using your production code parser. Observe the output to confirm the ID >
[twitter-dev] Re: Snowflake: An update and some very important information
I gather the reason for the 64-bit int type was to maintain some backwards-compatibility around the old sequential IDs, so both the old- style and Snowflake IDs could be sorted and you could glean that smaller IDs are older than larger integers. U/GUIDs wouldn't be sortable in any meaningful fashion. - Jough On Nov 19, 10:42 pm, dean wrote: > Why not just use a GUID or UUID type for the ID type (IE: > 3F2504E0-4F89-11D3-9A0C-0305E82C3301)? This way you're not restricted > by using a numeric data type that each language could potentially > define differently. > > For languages that don't directly have a GUID or UUID type, they can > treat that ID as a string, and the higher level languages can use the > GUID data type directly. > > On Oct 18, 7:19 pm, Matt Harris wrote: > > > Last week you may remember Twitter planned to enable the new Status ID > > generator - 'Snowflake' but didn't. The purpose of this email is to explain > > the reason why this didn't happen, what we are doing about it, and what the > > new release plan is. > > > So what is Snowflake? > > -- > > Snowflake is a service we will be using to generate unique Tweet IDs. These > > Tweet IDs are unique 64bit unsigned integers, which, instead of being > > sequential like the current IDs, are based on time. The full ID is composed > > of a timestamp, a worker number, and a sequence number. > > > The problem > > - > > Before launch it came to our attention that some programming languages such > > as Javascript cannot support numbers with >53bits. This can be easily > > examined by running a command similar to: (90071992547409921).toString() in > > your browsers console or by running the following JSON snippet through your > > JSON parser. > > > {"id": 10765432100123456789, "id_str": "10765432100123456789"} > > > In affected JSON parsers the ID will not be converted successfully and will > > lose accuracy. In some parsers there may even be an exception. 
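The precision loss described in "The problem" can be reproduced outside a browser. This is a rough Python sketch, assuming the affected parser stores every number as an IEEE 754 double, as JavaScript does. Passing `parse_int=float` to `json.loads` is only a way to imitate such a parser here; it is not something the Twitter API asks for.

```python
import json

SNIPPET = '{"id": 10765432100123456789, "id_str": "10765432100123456789"}'

# Python's default decoder keeps arbitrary-precision integers, so nothing is lost.
exact = json.loads(SNIPPET)
print(exact["id"] == int(exact["id_str"]))       # True

# Imitate a double-only parser by forcing every JSON integer through float.
lossy = json.loads(SNIPPET, parse_int=float)
print(int(lossy["id"]) == int(lossy["id_str"]))  # False

# The browser-console test from the announcement, done with a Python float.
print(int(float(90071992547409921)))             # 90071992547409920
```

If the second comparison prints False in your own stack, the integer `id` field cannot be trusted and the string field is the one to read.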
> > > The solution > > > > To allow javascript and JSON parsers to read the IDs we need to include a > > string version of any ID when responding in the JSON format. What this means > > is Status, User, Direct Message and Saved Search IDs in the Twitter API will > > now be returned as an integer and a string in JSON responses. This will > > apply to the main Twitter API, the Streaming API and the Search API. > > > For example, a status object will now contain an id and an id_str. The > > following JSON representation of a status object shows the two versions of > > the ID fields for each data point. > > > [ > > { > > "coordinates": null, > > "truncated": false, > > "created_at": "Thu Oct 14 22:20:15 + 2010", > > "favorited": false, > > "entities": { > > "urls": [ > > ], > > "hashtags": [ > > ], > > "user_mentions": [ > > { > > "name": "Matt Harris", > > "id": 777925, > > "id_str": "777925", > > "indices": [ > > 0, > > 14 > > ], > > "screen_name": "themattharris" > > } > > ] > > }, > > "text": "@themattharris hey how are things?", > > "annotations": null, > > "contributors": [ > > { > > "id": 819797, > > "id_str": "819797", > > "screen_name": "episod" > > } > > ], > > "id": 12738165059, > > "id_str": "12738165059", > > "retweet_count": 0, > > "geo": null, > > "retweeted": false, > > "in_reply_to_user_id": 777925, > > "in_reply_to_user_id_str": "777925", > > "in_reply_to_screen_name": "themattharris", > > "user": { > > "id": 6253282 > > "id_str": "6253282" > > }, > > "source": "web", > > "place": null, > > "in_reply_to_status_id": 12738040524 > > "in_reply_to_status_id_str": "12738040524" > > } > > ] > > > What should you do - RIGHT NOW > > -- > > The first thing you should do is attempt to decode the JSON snippet above > > using your production code parser. Observe the output to confirm the ID has > > not lost accuracy. 
> > > What you do next depends on what happens: > > > * If your code converts the ID successfully without losing accuracy you are > > OK but should consider converting to the _str versions of IDs as soon as > > possible. > > * If your code has lost accuracy, convert your code to using the _str > > version immediately. If you do not do this your code will be unable to > > interact with the Twitter API reliably. > > * In some language parsers, the JSON may throw an exception when reading the > > ID value. If this happens in your parser you will need to ‘pre-parse’ the > > data, removing or replacing ID parameters with their _str versions. > > > Summary > > - > > 1) If you develop in Javascript, know that you will have to update your code > > to read the string version instead of t
[twitter-dev] Re: Snowflake: An update and some very important information
Why not just use a GUID or UUID type for the ID type (IE: 3F2504E0-4F89-11D3-9A0C-0305E82C3301)? This way you're not restricted by using a numeric data type that each language could potentially define differently. For languages that don't directly have a GUID or UUID type, they can treat that ID as a string, and the higher level languages can use the GUID data type directly. On Oct 18, 7:19 pm, Matt Harris wrote: > Last week you may remember Twitter planned to enable the new Status ID > generator - 'Snowflake' but didn't. The purpose of this email is to explain > the reason why this didn't happen, what we are doing about it, and what the > new release plan is. > > So what is Snowflake? > -- > Snowflake is a service we will be using to generate unique Tweet IDs. These > Tweet IDs are unique 64bit unsigned integers, which, instead of being > sequential like the current IDs, are based on time. The full ID is composed > of a timestamp, a worker number, and a sequence number. > > The problem > - > Before launch it came to our attention that some programming languages such > as Javascript cannot support numbers with >53bits. This can be easily > examined by running a command similar to: (90071992547409921).toString() in > your browsers console or by running the following JSON snippet through your > JSON parser. > > {"id": 10765432100123456789, "id_str": "10765432100123456789"} > > In affected JSON parsers the ID will not be converted successfully and will > lose accuracy. In some parsers there may even be an exception. > > The solution > > To allow javascript and JSON parsers to read the IDs we need to include a > string version of any ID when responding in the JSON format. What this means > is Status, User, Direct Message and Saved Search IDs in the Twitter API will > now be returned as an integer and a string in JSON responses. This will > apply to the main Twitter API, the Streaming API and the Search API. 
> > For example, a status object will now contain an id and an id_str. The > following JSON representation of a status object shows the two versions of > the ID fields for each data point. > > [ > { > "coordinates": null, > "truncated": false, > "created_at": "Thu Oct 14 22:20:15 + 2010", > "favorited": false, > "entities": { > "urls": [ > ], > "hashtags": [ > ], > "user_mentions": [ > { > "name": "Matt Harris", > "id": 777925, > "id_str": "777925", > "indices": [ > 0, > 14 > ], > "screen_name": "themattharris" > } > ] > }, > "text": "@themattharris hey how are things?", > "annotations": null, > "contributors": [ > { > "id": 819797, > "id_str": "819797", > "screen_name": "episod" > } > ], > "id": 12738165059, > "id_str": "12738165059", > "retweet_count": 0, > "geo": null, > "retweeted": false, > "in_reply_to_user_id": 777925, > "in_reply_to_user_id_str": "777925", > "in_reply_to_screen_name": "themattharris", > "user": { > "id": 6253282 > "id_str": "6253282" > }, > "source": "web", > "place": null, > "in_reply_to_status_id": 12738040524 > "in_reply_to_status_id_str": "12738040524" > } > ] > > What should you do - RIGHT NOW > -- > The first thing you should do is attempt to decode the JSON snippet above > using your production code parser. Observe the output to confirm the ID has > not lost accuracy. > > What you do next depends on what happens: > > * If your code converts the ID successfully without losing accuracy you are > OK but should consider converting to the _str versions of IDs as soon as > possible. > * If your code has lost accuracy, convert your code to using the _str > version immediately. If you do not do this your code will be unable to > interact with the Twitter API reliably. > * In some language parsers, the JSON may throw an exception when reading the > ID value. If this happens in your parser you will need to ‘pre-parse’ the > data, removing or replacing ID parameters with their _str versions. 
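The three cases in the list above can be sketched in Python as follows. This is a naive illustration under stated assumptions, not a recommended parser: the regular expression, `quote_integer_ids`, and `status_id` are hypothetical helpers, and the pre-parse step assumes ID keys are named `id` or end in `_id`, as in the sample status object.

```python
import json
import re

# Matches integer-valued "id" and "..._id" keys such as "in_reply_to_status_id".
# Keys ending in "_str" are skipped because the closing quote must follow "id".
_INT_ID_FIELD = re.compile(r'"((?:\w+_)?id)"\s*:\s*(\d+)')


def quote_integer_ids(raw_json: str) -> str:
    """Pre-parse step: wrap bare integer ID values in quotes before decoding."""
    return _INT_ID_FIELD.sub(r'"\1": "\2"', raw_json)


def status_id(status: dict) -> str:
    """Prefer id_str; fall back to the integer id when id_str is absent."""
    id_str = status.get("id_str")
    return id_str if id_str is not None else str(status["id"])


raw = (
    '{"id": 10765432100123456789, "id_str": "10765432100123456789", '
    '"in_reply_to_status_id": null}'
)
status = json.loads(quote_integer_ids(raw))
print(status_id(status))
print(status["id"])
```

Rewriting the raw text before decoding is only worth doing when the decoder throws on large integers. When it merely loses precision, reading the `_str` field is enough.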
> > Summary > - > 1) If you develop in Javascript, know that you will have to update your code > to read the string version instead of the integer version. > > 2) If you use a JSON decoder, validate that the example JSON, above, decodes > without throwing exceptions. If exceptions are thrown, you will need to > pre-parse the data. Please let us know the name, version, and language of > the parser which throws the exception so we can investigate. > > Timeline > --- > by 22nd October 2010 (Friday): String versions of ID numbers will start > appearing in the API responses > 4th November 2010 (Thursday) : Snowflake will be turned on but at ~41bit > length > 26th November 2010 (Friday) : Sta
[twitter-dev] Re: Snowflake: An update and some very important information
+1 for bumping the version number and maintaining backwards compatibility. On Oct 20, 2:23 pm, Josh Roesslein wrote: > Isn't the point of having versioned APIs so changes can be rolled out w/o > breaking a bunch of applications at once? > Why not increment to version 2 and replace all IDs with strings in the JSON > format? Keep version 1 around for a few months, > allowing everyone to upgrade, and then kill it off. This can also give > Twitter a chance to make any other breaking changes. > > If Twitter is never going to take advantage of the versioning they added, > what is the point of having it? > I think just creating new fields to avoid versioning issues is unclean and > messy.
[twitter-dev] Re: Snowflake: An update and some very important information
Thanks, Tom. I also hope the answer to my first question is No. I asked the 2nd question because in Matt's message that started this thread, he says: * If your code converts the ID successfully without losing accuracy you are OK but should consider converting to the _str versions of IDs as soon as possible. Why would we consider converting to the _str version if we can already parse the integers? On Nov 10, 10:56 pm, Tom van der Woerdt wrote: > 1) Please don't, I don't want to have to convert everything back to > integers within my code. I consider the string representation a hack > around some issues with certain programming languages, and not an > optimal solution. Wouldn't want this to become the default option. > > 2) No > > Tom > > On 11/11/10 6:34 AM, SM wrote: > > > > > > > > > Hello. Couple questions: > > > 1.) Are you planning on eventually eliminating the integer > > representation and only using strings for id's? > > > 2.) If an application doesn't use Javascript to parse JSON (for > > example, YAJL-OBJC and NSNumbers in Obj-C), is it necessary to make > > any changes at all? > > > Thanks. > > > On Oct 19, 3:52 pm, Matt Harris wrote: > >> Hey everyone, > > >> Thank you to all of you for your questions, patience and contributions to > >> this thread. Hearing your views and knowing how you use the API helps us > >> provide more information where there wasn't enough, and clarify details > >> where there was ambiguity. > > >> I've collated the questions i've received from you directly, over Twitter > >> to > >> @twitterapi and through this list. I hope the comments below provide > >> enough > >> information to answer those questions and explain the reasoning being our > >> decisions. > > >> Thanks for your support and patience, > >> @themattharris > > >> 1) Will search.twitter.com also include id_str and > >> in_reply_to_status_id_str? > >> Yes, Search will include the String representations of those IDs. > > >> 2) Which fields are affected by this change? 
> >> All IDs which are transmitted as Integers will have a String representation > >> in the API response. Only Tweet IDs (which includes mentions and retweets) > >> will be moving to new Snowflake IDs. Messages (DMs), Saved Searches and > >> Users may change to a Snowflake ID scheme in the future but this isn’t > >> planned for this year. > > >> We are adding String representations of the Integer IDs now so you can > >> update all of your code to use the String representations throughout. This > >> allows developers to make the change now for all the ID fields and be > >> prepared should any other IDs break the 53bit boundary. > > >> 3) Which fields will have String representations? > >> The fields which will have String representations are: > > >> id (DM, Saved Search, User, List ) > >> in_reply_to_status_id > >> in_reply_to_user_id > >> new_id (Streaming only. Will be removed when Snowflake is enabled) > >> current_user_retweet_id (When include_my_retweet=1 is passed) > > >> 4) Can you provide a complete Tweet example with Snowflake ID to test? > > >> [{"coordinates":null,"truncated":false,"created_at":"Thu Oct 14 22:20:15 > >> + > >> 2010","favorited":false,"entities":{"urls":[],"hashtags":[],"user_mentions" > >> :[{"name":"Matt > >> Harris","id":777925,"id_str":"777925","indices":[0,14],"screen_name":"thema > >> ttharris"}]},"text":"@themattharris > >> hey how are > >> things?","annotations":null,"contributors":[{"id":819797,"id_str":"819797", > >> > >> "screen_name":"episod"}],"id":10765432100123456789,"id_str":"10765432100123 > >> > >> 456789","retweet_count":0,"geo":null,"retweeted":false,"in_reply_to_user_id > >> > >> ":777925,"in_reply_to_user_id_str":"777925","in_reply_to_screen_name":"them > >> > >> attharris","user":{"id":6253282,"id_str":"6253282"},"source":"web","place": > >> > >> null,"in_reply_to_status_id":10586268426842688951,"in_reply_to_status_id_st > >> r":"10586268426842688951"}] > > >> 5) What is happening with new_id in the Streaming API? 
> >> new_id and new_id_str will be switched off when or soon after Snowflake is > >> enabled on November 4th. > > >> 6) Why not restrict IDs to 53bits? > > >> A Snowflake ID is composed: > >> * 41bits for millisecond precision time (69 years) > >> * 10bits for a configured machine identity (1024 machines) > >> * 12bits for a sequence number (4096 per machine) > > >> The factor influencing the length of the ID is the time. For a 53bit ID > >> this > >> would mean only 31bits are available for the time. 31bits is only enough > >> for > >> 24 days (2147483648/(1000*60*60*24)) of time. > > >> Reducing the resolution of the timestamp would prevent a K-sorted > >> resolution > >> of 1 second or less. > > >> Reducing the configured machine identity or sequence number by 1bit would > >> mean we couldn’t scale Twitter, or operate our infrastructure in an > >> uncoordinated high-available way. > > >> 7) When will the 53bit Integer overflow happen? > >> 24 days a
Re: [twitter-dev] Re: Snowflake: An update and some very important information
1) Please don't, I don't want to have to convert everything back to integers within my code. I consider the string representation a hack around some issues with certain programming languages, and not an optimal solution. Wouldn't want this to become the default option. 2) No Tom On 11/11/10 6:34 AM, SM wrote: > Hello. Couple questions: > > 1.) Are you planning on eventually eliminating the integer > representation and only using strings for id's? > > 2.) If an application doesn't use Javascript to parse JSON (for > example, YAJL-OBJC and NSNumbers in Obj-C), is it necessary to make > any changes at all? > > Thanks. > > > > On Oct 19, 3:52 pm, Matt Harris wrote: >> Hey everyone, >> >> Thank you to all of you for your questions, patience and contributions to >> this thread. Hearing your views and knowing how you use the API helps us >> provide more information where there wasn't enough, and clarify details >> where there was ambiguity. >> >> I've collated the questions i've received from you directly, over Twitter to >> @twitterapi and through this list. I hope the comments below provide enough >> information to answer those questions and explain the reasoning being our >> decisions. >> >> Thanks for your support and patience, >> @themattharris >> >> 1) Will search.twitter.com also include id_str and >> in_reply_to_status_id_str? >> Yes, Search will include the String representations of those IDs. >> >> 2) Which fields are affected by this change? >> All IDs which are transmitted as Integers will have a String representation >> in the API response. Only Tweet IDs (which includes mentions and retweets) >> will be moving to new Snowflake IDs. Messages (DMs), Saved Searches and >> Users may change to a Snowflake ID scheme in the future but this isn’t >> planned for this year. >> >> We are adding String representations of the Integer IDs now so you can >> update all of your code to use the String representations throughout. 
to >> allow developers to make the change now for all the ID fields and be >> prepared should any other IDs break the 53bit boundary. >> >> 3) Which fields will have String representations? >> The fields which will have String representations are: >> >> id (DM, Saved Search, User, List ) >> in_reply_to_status_id >> in_reply_to_user_id >> new_id (Streaming only. Will be removed when Snowflake is enabled) >> current_user_retweet_id (When include_my_retweet=1 is passed) >> >> 4) Can you provide a complete Tweet example with Snowflake ID to test? >> >> [{"coordinates":null,"truncated":false,"created_at":"Thu Oct 14 22:20:15 >> + >> 2010","favorited":false,"entities":{"urls":[],"hashtags":[],"user_mentions" >> :[{"name":"Matt >> Harris","id":777925,"id_str":"777925","indices":[0,14],"screen_name":"thema >> ttharris"}]},"text":"@themattharris >> hey how are >> things?","annotations":null,"contributors":[{"id":819797,"id_str":"819797", >> "screen_name":"episod"}],"id":10765432100123456789,"id_str":"10765432100123 >> 456789","retweet_count":0,"geo":null,"retweeted":false,"in_reply_to_user_id >> ":777925,"in_reply_to_user_id_str":"777925","in_reply_to_screen_name":"them >> attharris","user":{"id":6253282,"id_str":"6253282"},"source":"web","place": >> null,"in_reply_to_status_id":10586268426842688951,"in_reply_to_status_id_st >> r":"10586268426842688951"}] >> >> 5) What is happening with new_id in the Streaming API? >> new_id and new_id_str will be switched off when or soon after Snowflake is >> enabled on November 4th. >> >> 6) Why not restrict IDs to 53bits? >> >> A Snowflake ID is composed: >> * 41bits for millisecond precision time (69 years) >> * 10bits for a configured machine identity (1024 machines) >> * 12bits for a sequence number (4096 per machine) >> >> The factor influencing the length of the ID is the time. For a 53bit ID this >> would mean only 31bits are available for the time. 31bits is only enough for >> 24 days (2147483648/(1000*60*60*24)) of time. 
>> >> Reducing the resolution of the timestamp would prevent a K-sorted resolution >> of 1 second or less. >> >> Reducing the configured machine identity or sequence number by 1bit would >> mean we couldn’t scale Twitter, or operate our infrastructure in an >> uncoordinated high-available way. >> >> 7) When will the 53bit Integer overflow happen? >> 24 days after Snowflake starts counting. >> >> 8) Is it safe to parse and store IDs as signed 64bit Integers? >> Yes. >> >> 9) Why offer both the String and Integer versions of the ID? >> The String representation is needed to ensure languages which cannot convert >> the >53bit Integer can still use the ID in other API requests. >> >> The Integer value is being retained for languages which can handle >> numbers>53bit and to prevent applications which have not converted from being >> >> cut-off from Twitter. >> >> 10) When ID is null what will the _str representation be? >> The _str representation will also be null. >> >> 11) Did you really mean ‘unsigned’ 64bit Int
[twitter-dev] Re: Snowflake: An update and some very important information
Hello. Couple questions: 1.) Are you planning on eventually eliminating the integer representation and only using strings for id's? 2.) If an application doesn't use Javascript to parse JSON (for example, YAJL-OBJC and NSNumbers in Obj-C), is it necessary to make any changes at all? Thanks. On Oct 19, 3:52 pm, Matt Harris wrote: > Hey everyone, > > Thank you to all of you for your questions, patience and contributions to > this thread. Hearing your views and knowing how you use the API helps us > provide more information where there wasn't enough, and clarify details > where there was ambiguity. > > I've collated the questions i've received from you directly, over Twitter to > @twitterapi and through this list. I hope the comments below provide enough > information to answer those questions and explain the reasoning being our > decisions. > > Thanks for your support and patience, > @themattharris > > 1) Will search.twitter.com also include id_str and > in_reply_to_status_id_str? > Yes, Search will include the String representations of those IDs. > > 2) Which fields are affected by this change? > All IDs which are transmitted as Integers will have a String representation > in the API response. Only Tweet IDs (which includes mentions and retweets) > will be moving to new Snowflake IDs. Messages (DMs), Saved Searches and > Users may change to a Snowflake ID scheme in the future but this isn’t > planned for this year. > > We are adding String representations of the Integer IDs now so you can > update all of your code to use the String representations throughout. to > allow developers to make the change now for all the ID fields and be > prepared should any other IDs break the 53bit boundary. > > 3) Which fields will have String representations? > The fields which will have String representations are: > > id (DM, Saved Search, User, List ) > in_reply_to_status_id > in_reply_to_user_id > new_id (Streaming only. 
Will be removed when Snowflake is enabled) > current_user_retweet_id (When include_my_retweet=1 is passed) > > 4) Can you provide a complete Tweet example with Snowflake ID to test? > > [{"coordinates":null,"truncated":false,"created_at":"Thu Oct 14 22:20:15 > + > 2010","favorited":false,"entities":{"urls":[],"hashtags":[],"user_mentions" > :[{"name":"Matt > Harris","id":777925,"id_str":"777925","indices":[0,14],"screen_name":"thema > ttharris"}]},"text":"@themattharris > hey how are > things?","annotations":null,"contributors":[{"id":819797,"id_str":"819797", > "screen_name":"episod"}],"id":10765432100123456789,"id_str":"10765432100123 > 456789","retweet_count":0,"geo":null,"retweeted":false,"in_reply_to_user_id > ":777925,"in_reply_to_user_id_str":"777925","in_reply_to_screen_name":"them > attharris","user":{"id":6253282,"id_str":"6253282"},"source":"web","place": > null,"in_reply_to_status_id":10586268426842688951,"in_reply_to_status_id_st > r":"10586268426842688951"}] > > 5) What is happening with new_id in the Streaming API? > new_id and new_id_str will be switched off when or soon after Snowflake is > enabled on November 4th. > > 6) Why not restrict IDs to 53bits? > > A Snowflake ID is composed: > * 41bits for millisecond precision time (69 years) > * 10bits for a configured machine identity (1024 machines) > * 12bits for a sequence number (4096 per machine) > > The factor influencing the length of the ID is the time. For a 53bit ID this > would mean only 31bits are available for the time. 31bits is only enough for > 24 days (2147483648/(1000*60*60*24)) of time. > > Reducing the resolution of the timestamp would prevent a K-sorted resolution > of 1 second or less. > > Reducing the configured machine identity or sequence number by 1bit would > mean we couldn’t scale Twitter, or operate our infrastructure in an > uncoordinated high-available way. > > 7) When will the 53bit Integer overflow happen? > 24 days after Snowflake starts counting. 
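The arithmetic in answers 6 and 7 is easy to check. Below is a small Python sketch, assuming the three fields are packed with the timestamp in the highest bits. The thread gives the field widths but not their order or the epoch, so the order and the `pack`/`unpack` names are illustrative only.

```python
from typing import Tuple

TIMESTAMP_BITS = 41
MACHINE_BITS = 10
SEQUENCE_BITS = 12

MS_PER_DAY = 1000 * 60 * 60 * 24


def pack(timestamp_ms: int, machine: int, sequence: int) -> int:
    """Combine the three fields into one integer, timestamp first (assumed order)."""
    return (
        (timestamp_ms << (MACHINE_BITS + SEQUENCE_BITS))
        | (machine << SEQUENCE_BITS)
        | sequence
    )


def unpack(snowflake: int) -> Tuple[int, int, int]:
    """Split an integer back into (timestamp_ms, machine, sequence)."""
    sequence = snowflake & ((1 << SEQUENCE_BITS) - 1)
    machine = (snowflake >> SEQUENCE_BITS) & ((1 << MACHINE_BITS) - 1)
    timestamp_ms = snowflake >> (MACHINE_BITS + SEQUENCE_BITS)
    return timestamp_ms, machine, sequence


# 31 bits of milliseconds: just under 25 days.
print(2**31 / MS_PER_DAY)
# 41 bits of milliseconds: a little under 70 years.
print(2**41 / MS_PER_DAY / 365)
```

With the timestamp in the high bits, an ID generated in a later millisecond always compares greater than one from an earlier millisecond, whatever the machine and sequence values are. That is consistent with the rough time ordering the answers describe.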
> > 8) Is it safe to parse and store IDs as signed 64bit Integers? > Yes. > > 9) Why offer both the String and Integer versions of the ID? > The String representation is needed to ensure languages which cannot convert > the >53bit Integer can still use the ID in other API requests. > > The Integer value is being retained for languages which can handle > numbers>53bit and to prevent applications which have not converted from being > > cut-off from Twitter. > > 10) When ID is null what will the _str representation be? > The _str representation will also be null. > > 11) Did you really mean ‘unsigned’ 64bit Integer? > Strictly speaking the Snowflake is a signed 64bit long under the hood. That > being said, we will never use the negative bit and won’t require the full > 64bits for positive numbers for about 69 years: > > http://www.google.com/search?q=%282**41%29+%2F+%2860*60*24*1000%29+%2... > > 12) Why not make the strings opt-in? > We did consider this as an option but decided against it for a number of > reasons. The first reason is that th
[twitter-dev] Re: Snowflake: An update and some very important information
Hey everyone,

Further to this morning's email, the Search API now contains the _str representations of its IDs.

Best
@themattharris
Developer Advocate, Twitter
http://twitter.com/themattharris

On Mon, Oct 25, 2010 at 1:33 PM, Matt Harris wrote:
> The Search API hasn't rolled those fields in yet but they are due soon. The
> API does include these fields but as Tim has highlighted the xyz/ids methods
> don't yet have a String version.
>
> The engineering teams are aware of this and are working on it. I'll post an
> update once those additional responses have the fields.
>
> Thanks for noticing they were missing and commenting on the thread.
>
> @themattharris
> Developer Advocate, Twitter
> http://twitter.com/themattharris
>
>
> On Mon, Oct 25, 2010 at 3:28 AM, Johannes la Poutre wrote:
>
>> @Themattharris: was there any change to the implementation timeline?
>> Quote: "by 22nd October 2010 (Friday): String versions of ID numbers
>> will start appearing in the API responses"
>>
>> I'm still not seeing id_str, to_user_id_str and from_user_id_str etc.
>> in the current search API output, example:
>>
>> http://search.twitter.com/search.json?geocode=52.155018%2C4.487658%2C1km
>>
>> is there an updated timeline or did I miss something?
>>
>> Best,
>>
>> -- Johannes / @jlapoutre / @tweepsaround
>>
>>
>> On Oct 19, 2:19 am, Matt Harris wrote:
>> > Last week you may remember Twitter planned to enable the new Status ID
>> > generator - 'Snowflake' but didn't. The purpose of this email is to explain
>> > the reason why this didn't happen, what we are doing about it, and what the
>> > new release plan is.
>> >
>> > So what is Snowflake?
>> > --
>> > Snowflake is a service we will be using to generate unique Tweet IDs. These
>> > Tweet IDs are unique 64bit unsigned integers, which, instead of being
>> > sequential like the current IDs, are based on time. The full ID is composed
>> > of a timestamp, a worker number, and a sequence number.
>> > >> > The problem >> > - >> > Before launch it came to our attention that some programming languages >> such >> > as Javascript cannot support numbers with >53bits. This can be easily >> > examined by running a command similar to: (90071992547409921).toString() >> in >> > your browsers console or by running the following JSON snippet through >> your >> > JSON parser. >> > >> > {"id": 10765432100123456789, "id_str": "10765432100123456789"} >> > >> > In affected JSON parsers the ID will not be converted successfully and >> will >> > lose accuracy. In some parsers there may even be an exception. >> > >> > The solution >> > >> > To allow javascript and JSON parsers to read the IDs we need to include >> a >> > string version of any ID when responding in the JSON format. What this >> means >> > is Status, User, Direct Message and Saved Search IDs in the Twitter API >> will >> > now be returned as an integer and a string in JSON responses. This will >> > apply to the main Twitter API, the Streaming API and the Search API. >> > >> > For example, a status object will now contain an id and an id_str. The >> > following JSON representation of a status object shows the two versions >> of >> > the ID fields for each data point. 
>> > >> > [ >> > { >> > "coordinates": null, >> > "truncated": false, >> > "created_at": "Thu Oct 14 22:20:15 + 2010", >> > "favorited": false, >> > "entities": { >> > "urls": [ >> > ], >> > "hashtags": [ >> > ], >> > "user_mentions": [ >> > { >> > "name": "Matt Harris", >> > "id": 777925, >> > "id_str": "777925", >> > "indices": [ >> > 0, >> > 14 >> > ], >> > "screen_name": "themattharris" >> > } >> > ] >> > }, >> > "text": "@themattharris hey how are things?", >> > "annotations": null, >> > "contributors": [ >> > { >> > "id": 819797, >> > "id_str": "819797", >> > "screen_name": "episod" >> > } >> > ], >> > "id": 12738165059, >> > "id_str": "12738165059", >> > "retweet_count": 0, >> > "geo": null, >> > "retweeted": false, >> > "in_reply_to_user_id": 777925, >> > "in_reply_to_user_id_str": "777925", >> > "in_reply_to_screen_name": "themattharris", >> > "user": { >> > "id": 6253282 >> > "id_str": "6253282" >> > }, >> > "source": "web", >> > "place": null, >> > "in_reply_to_status_id": 12738040524 >> > "in_reply_to_status_id_str": "12738040524" >> > } >> > ] >> > >> > What should you do - RIGHT NOW >> > -- >> > The first thing you should do is attempt to decode the JSON snippet >> above >> > using your production code parser. Observe the output to confirm the ID >> has >> > not lost accuracy. >> > >> > What you do next depends on what happens: >> > >> > * If your co
[twitter-dev] Re: Snowflake: An update and some very important information
The Search API hasn't rolled those fields in yet but they are due soon. The API does include these fields but as Tim has highlighted the xyz/ids methods don't yet have a String version. The engineering teams are aware of this and are working on it. I'll post an update once those additional responses have the fields. Thanks for noticing they were missing and commenting on the thread. @themattharris Developer Advocate, Twitter http://twitter.com/themattharris On Mon, Oct 25, 2010 at 3:28 AM, Johannes la Poutre wrote: > @Themattharris: was there any change to the implementation timeline? > Quote: "by 22nd October 2010 (Friday): String versions of ID numbers > will start appearing in the API responses" > > I'm still not seeing id_str, to_user_id_str and from_user_id_str etc. > in the current search API output, example: > > http://search.twitter.com/search.json?geocode=52.155018%2C4.487658%2C1km > > is there an updated timeline or did I miss something? > > Best, > > -- Johannes / @jlapoutre / @tweepsaround > > > On Oct 19, 2:19 am, Matt Harris wrote: > > Last week you may remember Twitter planned to enable the new Status ID > > generator - 'Snowflake' but didn't. The purpose of this email is to > explain > > the reason why this didn't happen, what we are doing about it, and what > the > > new release plan is. > > > > So what is Snowflake? > > -- > > Snowflake is a service we will be using to generate unique Tweet IDs. > These > > Tweet IDs are unique 64bit unsigned integers, which, instead of being > > sequential like the current IDs, are based on time. The full ID is > composed > > of a timestamp, a worker number, and a sequence number. > > > > The problem > > - > > Before launch it came to our attention that some programming languages > such > > as Javascript cannot support numbers with >53bits. 
This can be easily > > examined by running a command similar to: (90071992547409921).toString() > in > > your browsers console or by running the following JSON snippet through > your > > JSON parser. > > > > {"id": 10765432100123456789, "id_str": "10765432100123456789"} > > > > In affected JSON parsers the ID will not be converted successfully and > will > > lose accuracy. In some parsers there may even be an exception. > > > > The solution > > > > To allow javascript and JSON parsers to read the IDs we need to include a > > string version of any ID when responding in the JSON format. What this > means > > is Status, User, Direct Message and Saved Search IDs in the Twitter API > will > > now be returned as an integer and a string in JSON responses. This will > > apply to the main Twitter API, the Streaming API and the Search API. > > > > For example, a status object will now contain an id and an id_str. The > > following JSON representation of a status object shows the two versions > of > > the ID fields for each data point. 
> > > > [ > > { > > "coordinates": null, > > "truncated": false, > > "created_at": "Thu Oct 14 22:20:15 + 2010", > > "favorited": false, > > "entities": { > > "urls": [ > > ], > > "hashtags": [ > > ], > > "user_mentions": [ > > { > > "name": "Matt Harris", > > "id": 777925, > > "id_str": "777925", > > "indices": [ > > 0, > > 14 > > ], > > "screen_name": "themattharris" > > } > > ] > > }, > > "text": "@themattharris hey how are things?", > > "annotations": null, > > "contributors": [ > > { > > "id": 819797, > > "id_str": "819797", > > "screen_name": "episod" > > } > > ], > > "id": 12738165059, > > "id_str": "12738165059", > > "retweet_count": 0, > > "geo": null, > > "retweeted": false, > > "in_reply_to_user_id": 777925, > > "in_reply_to_user_id_str": "777925", > > "in_reply_to_screen_name": "themattharris", > > "user": { > > "id": 6253282 > > "id_str": "6253282" > > }, > > "source": "web", > > "place": null, > > "in_reply_to_status_id": 12738040524 > > "in_reply_to_status_id_str": "12738040524" > > } > > ] > > > > What should you do - RIGHT NOW > > -- > > The first thing you should do is attempt to decode the JSON snippet above > > using your production code parser. Observe the output to confirm the ID > has > > not lost accuracy. > > > > What you do next depends on what happens: > > > > * If your code converts the ID successfully without losing accuracy you > are > > OK but should consider converting to the _str versions of IDs as soon as > > possible. > > * If your code has lost accuracy, convert your code to using the _str > > version immediately. If you do not do this your code will be unable to > > interact with the Twitter API reliably. > > * In some language parsers, the JSON may throw an excep
Re: [twitter-dev] Re: Snowflake: An update and some very important information
It's there. But for some reason not in all types of json requests -

On Oct 25, 2010, at 6:28 AM, Johannes la Poutre wrote:

@Themattharris: was there any change to the implementation timeline?
Quote: "by 22nd October 2010 (Friday): String versions of ID numbers will start appearing in the API responses"

I'm still not seeing id_str, to_user_id_str and from_user_id_str etc. in the current search API output, example:

http://search.twitter.com/search.json?geocode=52.155018%2C4.487658%2C1km

is there an updated timeline or did I miss something?

Best,

-- Johannes / @jlapoutre / @tweepsaround

On Oct 19, 2:19 am, Matt Harris wrote:

Last week you may remember Twitter planned to enable the new Status ID generator - 'Snowflake' but didn't. The purpose of this email is to explain the reason why this didn't happen, what we are doing about it, and what the new release plan is.

So what is Snowflake?
--
Snowflake is a service we will be using to generate unique Tweet IDs. These Tweet IDs are unique 64bit unsigned integers, which, instead of being sequential like the current IDs, are based on time. The full ID is composed of a timestamp, a worker number, and a sequence number.

The problem
-
Before launch it came to our attention that some programming languages such as Javascript cannot support numbers with >53bits. This can be easily examined by running a command similar to: (90071992547409921).toString() in your browser's console or by running the following JSON snippet through your JSON parser.

{"id": 10765432100123456789, "id_str": "10765432100123456789"}

In affected JSON parsers the ID will not be converted successfully and will lose accuracy. In some parsers there may even be an exception.

The solution

To allow javascript and JSON parsers to read the IDs we need to include a string version of any ID when responding in the JSON format. What this means is Status, User, Direct Message and Saved Search IDs in the Twitter API will now be returned as an integer and a string in JSON responses. This will apply to the main Twitter API, the Streaming API and the Search API.

For example, a status object will now contain an id and an id_str. The following JSON representation of a status object shows the two versions of the ID fields for each data point.

[
  {
    "coordinates": null,
    "truncated": false,
    "created_at": "Thu Oct 14 22:20:15 + 2010",
    "favorited": false,
    "entities": {
      "urls": [
      ],
      "hashtags": [
      ],
      "user_mentions": [
        {
          "name": "Matt Harris",
          "id": 777925,
          "id_str": "777925",
          "indices": [
            0,
            14
          ],
          "screen_name": "themattharris"
        }
      ]
    },
    "text": "@themattharris hey how are things?",
    "annotations": null,
    "contributors": [
      {
        "id": 819797,
        "id_str": "819797",
        "screen_name": "episod"
      }
    ],
    "id": 12738165059,
    "id_str": "12738165059",
    "retweet_count": 0,
    "geo": null,
    "retweeted": false,
    "in_reply_to_user_id": 777925,
    "in_reply_to_user_id_str": "777925",
    "in_reply_to_screen_name": "themattharris",
    "user": {
      "id": 6253282,
      "id_str": "6253282"
    },
    "source": "web",
    "place": null,
    "in_reply_to_status_id": 12738040524,
    "in_reply_to_status_id_str": "12738040524"
  }
]

What should you do - RIGHT NOW
--
The first thing you should do is attempt to decode the JSON snippet above using your production code parser. Observe the output to confirm the ID has not lost accuracy.

What you do next depends on what happens:

* If your code converts the ID successfully without losing accuracy you are OK but should consider converting to the _str versions of IDs as soon as possible.
* If your code has lost accuracy, convert your code to using the _str version immediately. If you do not do this your code will be unable to interact with the Twitter API reliably.
* In some language parsers, the JSON may throw an exception when reading the ID value. If this happens in your parser you will need to ‘pre-parse’ the data, removing or replacing ID parameters with their _str versions.

Summary
-
1) If you develop in Javascript, know that you will have to update your code to read the string version instead of the integer version.

2) If you use a JSON decoder, validate that the example JSON, above, decodes without throwing exceptions. If exceptions are thrown, you will need to pre-parse the data. Please let us know the name, version, and language of the parser which throws the exception so we can investigate.

Timeline
---
by 22nd October 2010 (Friday): String versions of ID numbers will start appearing in the API responses
4th November 2010 (Thursday) : Snowflake will be turned o
[twitter-dev] Re: Snowflake: An update and some very important information
What about methods that list IDs without keys? e.g. blocks/ids which produces an array like

[ 12345, 6789, ]

These appear to be still cast as integers.

I don't see the point in having integer IDs at all. Surely the case of people wishing to handle them as integers is far smaller than the case of people who simply need a unique reference. It is much easier for the minority to cast string IDs to integers than it is for the rest of us to do the reverse.
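For key-less ID lists like the one above, a client that controls its JSON decoder could sidestep the problem by never turning the integers into numbers at all. A minimal sketch in Python, assuming the standard-library `json` module; the function name `decode_ids_as_strings` and the sample payload are made up for illustration and are not actual API output:

```python
import json


def decode_ids_as_strings(raw: str) -> list:
    """Decode a JSON array of bare integer IDs, keeping each ID as a string."""
    # parse_int is called with the literal digits of every JSON integer,
    # so passing str means no numeric conversion ever takes place.
    return json.loads(raw, parse_int=str)


# Hypothetical payload shaped like a blocks/ids response.
sample = "[12345, 6789, 10765432100123456789]"
print(decode_ids_as_strings(sample))
```

This only helps in languages whose decoder exposes such a hook; where it does not, the string fields (or a pre-parse step) remain the way out.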
[twitter-dev] Re: Snowflake: An update and some very important information
@Themattharris: was there any change to the implementation timeline? Quote: "by 22nd October 2010 (Friday): String versions of ID numbers will start appearing in the API responses" I'm still not seeing id_str, to_user_id_str and from_user_id_str etc. in the current search API output, example: http://search.twitter.com/search.json?geocode=52.155018%2C4.487658%2C1km is there an updated timeline or did I miss something? Best, -- Johannes / @jlapoutre / @tweepsaround On Oct 19, 2:19 am, Matt Harris wrote: > Last week you may remember Twitter planned to enable the new Status ID > generator - 'Snowflake' but didn't. The purpose of this email is to explain > the reason why this didn't happen, what we are doing about it, and what the > new release plan is. > > So what is Snowflake? > -- > Snowflake is a service we will be using to generate unique Tweet IDs. These > Tweet IDs are unique 64bit unsigned integers, which, instead of being > sequential like the current IDs, are based on time. The full ID is composed > of a timestamp, a worker number, and a sequence number. > > The problem > - > Before launch it came to our attention that some programming languages such > as Javascript cannot support numbers with >53bits. This can be easily > examined by running a command similar to: (90071992547409921).toString() in > your browsers console or by running the following JSON snippet through your > JSON parser. > > {"id": 10765432100123456789, "id_str": "10765432100123456789"} > > In affected JSON parsers the ID will not be converted successfully and will > lose accuracy. In some parsers there may even be an exception. > > The solution > > To allow javascript and JSON parsers to read the IDs we need to include a > string version of any ID when responding in the JSON format. What this means > is Status, User, Direct Message and Saved Search IDs in the Twitter API will > now be returned as an integer and a string in JSON responses. 
This will > apply to the main Twitter API, the Streaming API and the Search API. > > For example, a status object will now contain an id and an id_str. The > following JSON representation of a status object shows the two versions of > the ID fields for each data point. > > [ > { > "coordinates": null, > "truncated": false, > "created_at": "Thu Oct 14 22:20:15 + 2010", > "favorited": false, > "entities": { > "urls": [ > ], > "hashtags": [ > ], > "user_mentions": [ > { > "name": "Matt Harris", > "id": 777925, > "id_str": "777925", > "indices": [ > 0, > 14 > ], > "screen_name": "themattharris" > } > ] > }, > "text": "@themattharris hey how are things?", > "annotations": null, > "contributors": [ > { > "id": 819797, > "id_str": "819797", > "screen_name": "episod" > } > ], > "id": 12738165059, > "id_str": "12738165059", > "retweet_count": 0, > "geo": null, > "retweeted": false, > "in_reply_to_user_id": 777925, > "in_reply_to_user_id_str": "777925", > "in_reply_to_screen_name": "themattharris", > "user": { > "id": 6253282 > "id_str": "6253282" > }, > "source": "web", > "place": null, > "in_reply_to_status_id": 12738040524 > "in_reply_to_status_id_str": "12738040524" > } > ] > > What should you do - RIGHT NOW > -- > The first thing you should do is attempt to decode the JSON snippet above > using your production code parser. Observe the output to confirm the ID has > not lost accuracy. > > What you do next depends on what happens: > > * If your code converts the ID successfully without losing accuracy you are > OK but should consider converting to the _str versions of IDs as soon as > possible. > * If your code has lost accuracy, convert your code to using the _str > version immediately. If you do not do this your code will be unable to > interact with the Twitter API reliably. > * In some language parsers, the JSON may throw an exception when reading the > ID value. 
If this happens in your parser you will need to ‘pre-parse’ the > data, removing or replacing ID parameters with their _str versions. > > Summary > - > 1) If you develop in Javascript, know that you will have to update your code > to read the string version instead of the integer version. > > 2) If you use a JSON decoder, validate that the example JSON, above, decodes > without throwing exceptions. If exceptions are thrown, you will need to > pre-parse the data. Please let us know the name, version, and language of > the parser which throws the exception so we can investigate. > > Timeline > --- > by 22nd October 2010 (Friday): String versions of ID numbers will start > appearing in the API responses > 4th November 2010 (Thu
[twitter-dev] Re: Snowflake: An update and some very important information
What's the point? That would still require a change in the JSON output, to replace integers with strings.

On 21 oct, 15:38, David Nicol wrote:
> Unless I did the math wrong, a 64 bit quantity is expressible in
>
> (64 * log(2)) / log(62) = 10.7487219
>
> eleven characters drawn from A-Za-z0-9
>
> and they can still be sortable!
[twitter-dev] Re: Snowflake: An update and some very important information
> Isn't the point of having versioned APIs so changes can be rolled out w/o
> breaking a bunch of applications at once?

YES! 1000 times YES! This is exactly correct. We need to use the RESTful API correctly.

> Why not increment to version 2 and replace all IDs with strings in the JSON
> format? Keep version 1 around for a few months

Or 24 days after Snowflake, or foreverish... since anyone still using it isn't using an "affected" parser (e.g. one in a conformant JavaScript/ECMAScript).

> allowing everyone to upgrade and then kill it off. This can also give
> twitter a chance to make any other breaking changes.

Like, you know, revisioning the search.twitter.com API endpoints to start giving us _real_ TweetIDs, Tweep UserIDs, etc... you know, maybe giving us complete details like the rest of the API.

I can dream, right?

Marc
Re: [twitter-dev] Re: Snowflake: An update and some very important information
Unless I did the math wrong, a 64 bit quantity is expressible in

(64 * log(2)) / log(62) = 10.7487219

eleven characters drawn from A-Za-z0-9

and they can still be sortable!
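The arithmetic above can be checked with a short sketch. The alphabet order and the fixed-width zero padding are my own assumptions about how such an encoding would stay sortable; the message itself only gives the character count. The names `encode_base62` and `decode_base62` are illustrative.

```python
import math
import string

# Digits, then upper case, then lower case: the same order as their ASCII
# codes, so comparing encoded strings matches comparing the numbers.
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase
WIDTH = math.ceil(64 * math.log(2) / math.log(62))  # 11 characters


def encode_base62(value: int) -> str:
    """Encode an unsigned 64-bit integer as a fixed-width base-62 string."""
    if not 0 <= value < 2**64:
        raise ValueError("value must fit in 64 unsigned bits")
    chars = []
    for _ in range(WIDTH):
        value, digit = divmod(value, 62)
        chars.append(ALPHABET[digit])
    return "".join(reversed(chars))


def decode_base62(text: str) -> int:
    """Inverse of encode_base62."""
    value = 0
    for ch in text:
        value = value * 62 + ALPHABET.index(ch)
    return value


print(WIDTH, encode_base62(10765432100123456789))
```

Sorting only works with a byte-wise, case-sensitive comparison; a case-insensitive collation would break the ordering.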
Re: [twitter-dev] Re: Snowflake: An update and some very important information
Isn't the point of having versioned APIs so changes can be rolled out w/o breaking a bunch of applications at once?

Why not increment to version 2 and replace all IDs with strings in the JSON format? Keep version 1 around for a few months, allowing everyone to upgrade, and then kill it off. This can also give Twitter a chance to make any other breaking changes.

If Twitter is never going to take advantage of the versioning they added, what is the point of having it? I think just creating new fields to avoid versioning issues is unclean and messy.
[twitter-dev] Re: Snowflake: An update and some very important information
Matt,

In your example you show an unsigned long that uses the full 64 bits, which fails to be properly represented on a Java platform. In a later explanation you say that there will only ever be 63 bits used. Can you confirm one way or the other? i.e. will we ever see an unsigned integer that will take more than 63 bits to represent?

-Paul

On Oct 19, 7:52 pm, Matt Harris wrote:
> Hey everyone,
>
> Thank you to all of you for your questions, patience and contributions to
> this thread. Hearing your views and knowing how you use the API helps us
> provide more information where there wasn't enough, and clarify details
> where there was ambiguity.
>
> I've collated the questions I've received from you directly, over Twitter to
> @twitterapi and through this list. I hope the comments below provide enough
> information to answer those questions and explain the reasoning behind our
> decisions.
>
> Thanks for your support and patience,
> @themattharris
>
> 1) Will search.twitter.com also include id_str and
> in_reply_to_status_id_str?
> Yes, Search will include the String representations of those IDs.
>
> 2) Which fields are affected by this change?
> All IDs which are transmitted as Integers will have a String representation
> in the API response. Only Tweet IDs (which includes mentions and retweets)
> will be moving to new Snowflake IDs. Messages (DMs), Saved Searches and
> Users may change to a Snowflake ID scheme in the future but this isn’t
> planned for this year.
>
> We are adding String representations of the Integer IDs now so you can
> update all of your code to use the String representations throughout. This
> allows developers to make the change now for all the ID fields and be
> prepared should any other IDs break the 53bit boundary.
>
> 3) Which fields will have String representations?
> The fields which will have String representations are:
>
> id (DM, Saved Search, User, List)
> in_reply_to_status_id
> in_reply_to_user_id
> new_id (Streaming only. Will be removed when Snowflake is enabled)
> current_user_retweet_id (When include_my_retweet=1 is passed)
>
> 4) Can you provide a complete Tweet example with Snowflake ID to test?
>
> [{"coordinates":null,"truncated":false,"created_at":"Thu Oct 14 22:20:15 + 2010","favorited":false,"entities":{"urls":[],"hashtags":[],"user_mentions":[{"name":"Matt Harris","id":777925,"id_str":"777925","indices":[0,14],"screen_name":"themattharris"}]},"text":"@themattharris hey how are things?","annotations":null,"contributors":[{"id":819797,"id_str":"819797","screen_name":"episod"}],"id":10765432100123456789,"id_str":"10765432100123456789","retweet_count":0,"geo":null,"retweeted":false,"in_reply_to_user_id":777925,"in_reply_to_user_id_str":"777925","in_reply_to_screen_name":"themattharris","user":{"id":6253282,"id_str":"6253282"},"source":"web","place":null,"in_reply_to_status_id":10586268426842688951,"in_reply_to_status_id_str":"10586268426842688951"}]
>
> 5) What is happening with new_id in the Streaming API?
> new_id and new_id_str will be switched off when or soon after Snowflake is
> enabled on November 4th.
>
> 6) Why not restrict IDs to 53bits?
>
> A Snowflake ID is composed of:
> * 41bits for millisecond precision time (69 years)
> * 10bits for a configured machine identity (1024 machines)
> * 12bits for a sequence number (4096 per machine)
>
> The factor influencing the length of the ID is the time. For a 53bit ID this
> would mean only 31bits are available for the time. 31bits is only enough for
> 24 days (2147483648/(1000*60*60*24)) of time.
>
> Reducing the resolution of the timestamp would prevent a K-sorted resolution
> of 1 second or less.
>
> Reducing the configured machine identity or sequence number by 1bit would
> mean we couldn’t scale Twitter, or operate our infrastructure in an
> uncoordinated, highly available way.
>
> 7) When will the 53bit Integer overflow happen?
> 24 days after Snowflake starts counting.
>
> 8) Is it safe to parse and store IDs as signed 64bit Integers?
> Yes.
>
> 9) Why offer both the String and Integer versions of the ID?
> The String representation is needed to ensure languages which cannot convert
> the >53bit Integer can still use the ID in other API requests.
>
> The Integer value is being retained for languages which can handle
> numbers >53bit and to prevent applications which have not converted from
> being cut off from Twitter.
>
> 10) When ID is null what will the _str representation be?
> The _str representation will also be null.
>
> 11) Did you really mean ‘unsigned’ 64bit Integer?
> Strictly speaking the Snowflake is a signed 64bit long under the hood. That
> being said, we will never use the negative bit and won’t require the full
> 64bits for positive numbers for about 69 years:
>
> http://www.google.com/search?q=%282**41%29+%2F+%2860*60*24*1000%29+%2...
>
> 12) Why not make the strings opt-in?
> We did consider this as an option but decided against
Re: [twitter-dev] Re: Snowflake: An update and some very important information
Yes, you can.

Tom

On Oct 20, 2010, at 1:37 PM, Navin Kabra wrote:
> On Oct 19, 5:19 am, Matt Harris wrote:
>> So what is Snowflake?
>> --
>> Snowflake is a service we will be using to generate unique Tweet IDs. These
>> Tweet IDs are unique 64bit unsigned integers, which, instead of being
>> sequential like the current IDs, are based on time. The full ID is composed
>> of a timestamp, a worker number, and a sequence number.
>
> Will the new IDs be monotonically increasing? If my programming
> language can handle 64bit integers, can I compare two IDs and expect
> that the lower ID represents an older tweet (approximately)?
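Why plain integer comparison gives an (approximate) time ordering can be sketched as follows. The bit widths are the ones quoted elsewhere in this thread (41 timestamp, 10 machine, 12 sequence); placing the timestamp in the highest bits is my assumption, and the epoch is not stated in the thread, so the timestamps here are just relative millisecond counts. `compose_id` and `split_id` are illustrative names, not Twitter code.

```python
TIMESTAMP_BITS = 41
MACHINE_BITS = 10
SEQUENCE_BITS = 12


def compose_id(timestamp_ms: int, machine: int, sequence: int) -> int:
    """Pack the three fields into one integer, timestamp in the high bits."""
    if not 0 <= timestamp_ms < 2**TIMESTAMP_BITS:
        raise ValueError("timestamp out of range")
    if not 0 <= machine < 2**MACHINE_BITS:
        raise ValueError("machine id out of range")
    if not 0 <= sequence < 2**SEQUENCE_BITS:
        raise ValueError("sequence out of range")
    return (
        (timestamp_ms << (MACHINE_BITS + SEQUENCE_BITS))
        | (machine << SEQUENCE_BITS)
        | sequence
    )


def split_id(snowflake_id: int) -> tuple:
    """Unpack an ID into (timestamp_ms, machine, sequence)."""
    sequence = snowflake_id & (2**SEQUENCE_BITS - 1)
    machine = (snowflake_id >> SEQUENCE_BITS) & (2**MACHINE_BITS - 1)
    timestamp_ms = snowflake_id >> (MACHINE_BITS + SEQUENCE_BITS)
    return timestamp_ms, machine, sequence


# An ID from an earlier millisecond is smaller, whatever the other fields are.
earlier = compose_id(1000, 1023, 4095)
later = compose_id(1001, 0, 0)
print(earlier < later)  # True
```

Within the same millisecond the order depends on the machine and sequence fields, which is why the ordering is only approximate.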
[twitter-dev] Re: Snowflake: An update and some very important information
On Oct 19, 5:19 am, Matt Harris wrote:
> So what is Snowflake?
> --
> Snowflake is a service we will be using to generate unique Tweet IDs. These
> Tweet IDs are unique 64bit unsigned integers, which, instead of being
> sequential like the current IDs, are based on time. The full ID is composed
> of a timestamp, a worker number, and a sequence number.

Will the new IDs be monotonically increasing? If my programming language can handle 64bit integers, can I compare two IDs and expect that the lower ID represents an older tweet (approximately)?
Re: [twitter-dev] Re: Snowflake: An update and some very important information
On 10/20/10 8:12 AM, M. Edward (Ed) Borasky wrote: >> 6) Why not restrict IDs to 53bits? >> >> A Snowflake ID is composed: >> * 41bits for millisecond precision time (69 years) >> * 10bits for a configured machine identity (1024 machines) >> * 12bits for a sequence number (4096 per machine) >> >> The factor influencing the length of the ID is the time. For a 53bit >> ID this >> would mean only 31bits are available for the time. 31bits is only >> enough for >> 24 days (2147483648/(1000*60*60*24)) of time. >> >> Reducing the resolution of the timestamp would prevent a K-sorted >> resolution >> of 1 second or less. >> >> Reducing the configured machine identity or sequence number by 1bit would >> mean we couldn’t scale Twitter, or operate our infrastructure in an >> uncoordinated high-available way. > > Interesting ... so you have the theoretical capacity to scale to 2**22 > (about 4 million) tweets per millisecond? Even 4 million tweets a second > seems unrealistic, as does a single "machine" only being able to > generate 4096 IDs. I think if you're really expecting this kind of > volume, the FPGA vendors probably can help you out. We are talking > clocks and counters, here, right, not Javascript interpreters or robust > linear regressions? ;-) > > Ah, well, I'll check back on you guys in 69 years to see how you're > holding up. ;-) > I'd say that you could remove a maximum of 2 bits from the time - this would divide the 69 years by 4, making it a max of 17 years. By then, I'd assume that we are past the 53-bit limit. You could remove 3 bits from the machine ID and 5 bits from the sequence number. It would mean that there could be only 128 ID servers with 128 IDs per millisecond per machine -> 16 million tweets per second. In total you would have removed 10 bits from a number that had only 63 bits -> 53 bits. The question is: do you want that? I don't think you do. I really prefer the current solution. 
Tom
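The bit-budget arithmetic in this exchange can be checked with a few lines of Python. This is only an illustrative sketch, not anything taken from the Snowflake source: the helper name `budget` is made up here, and it assumes a 365-day year, as the rough "69 years" figure in the thread does.

```python
MS_PER_DAY = 1000 * 60 * 60 * 24
MS_PER_YEAR = MS_PER_DAY * 365  # assumption: 365-day year


def budget(time_bits, machine_bits, sequence_bits):
    """Return capacity figures for a hypothetical ID layout."""
    return {
        "total_bits": time_bits + machine_bits + sequence_bits,
        "days": 2 ** time_bits / MS_PER_DAY,
        "years": 2 ** time_bits / MS_PER_YEAR,
        "machines": 2 ** machine_bits,
        # IDs per millisecond across all machines combined
        "ids_per_ms": 2 ** (machine_bits + sequence_bits),
    }


current = budget(41, 10, 12)   # layout described in the FAQ
trimmed = budget(39, 7, 7)     # the 53-bit variant suggested in this reply
naive_53 = budget(31, 10, 12)  # 53 bits with machine/sequence left untouched

for name, b in [("41/10/12", current), ("39/7/7", trimmed), ("31/10/12", naive_53)]:
    print(name, b["total_bits"], "bits,",
          round(b["years"], 1), "years,",
          b["machines"], "machines,",
          b["ids_per_ms"] * 1000, "IDs/second")
```

Under these assumptions the 39/7/7 layout comes out at 53 bits, roughly 17 years of timestamps and about 16 million IDs per second, which matches the figures in the reply; the 31-bit timestamp lasts just under 25 days.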
Re: [twitter-dev] Re: Snowflake: An update and some very important information
> 6) Why not restrict IDs to 53bits?
>
> A Snowflake ID is composed:
> * 41bits for millisecond precision time (69 years)
> * 10bits for a configured machine identity (1024 machines)
> * 12bits for a sequence number (4096 per machine)
>
> The factor influencing the length of the ID is the time. For a 53bit ID this
> would mean only 31bits are available for the time. 31bits is only enough for
> 24 days (2147483648/(1000*60*60*24)) of time.
>
> Reducing the resolution of the timestamp would prevent a K-sorted resolution
> of 1 second or less.
>
> Reducing the configured machine identity or sequence number by 1bit would
> mean we couldn’t scale Twitter, or operate our infrastructure in an
> uncoordinated high-available way.

Interesting ... so you have the theoretical capacity to scale to 2**22 (about 4 million) tweets per millisecond? Even 4 million tweets a second seems unrealistic, as does a single "machine" only being able to generate 4096 IDs. I think if you're really expecting this kind of volume, the FPGA vendors probably can help you out. We are talking clocks and counters, here, right, not Javascript interpreters or robust linear regressions? ;-)

Ah, well, I'll check back on you guys in 69 years to see how you're holding up. ;-)

--
M. Edward (Ed) Borasky
http://borasky-research.net http://twitter.com/znmeb

"A mathematician is a device for turning coffee into theorems." - Paul Erdos
[twitter-dev] Re: Snowflake: An update and some very important information
Hey everyone,

Thank you to all of you for your questions, patience and contributions to this thread. Hearing your views and knowing how you use the API helps us provide more information where there wasn't enough, and clarify details where there was ambiguity.

I've collated the questions I've received from you directly, over Twitter to @twitterapi and through this list. I hope the comments below provide enough information to answer those questions and explain the reasoning behind our decisions.

Thanks for your support and patience,
@themattharris

1) Will search.twitter.com also include id_str and in_reply_to_status_id_str?

Yes, Search will include the String representations of those IDs.

2) Which fields are affected by this change?

All IDs which are transmitted as Integers will have a String representation in the API response. Only Tweet IDs (which includes mentions and retweets) will be moving to new Snowflake IDs. Messages (DMs), Saved Searches and Users may change to a Snowflake ID scheme in the future but this isn’t planned for this year. We are adding String representations of the Integer IDs now so that you can update all of your code to use the String representations throughout, and be prepared should any other IDs break the 53bit boundary.

3) Which fields will have String representations?

The fields which will have String representations are:
* id (DM, Saved Search, User, List)
* in_reply_to_status_id
* in_reply_to_user_id
* new_id (Streaming only. Will be removed when Snowflake is enabled)
* current_user_retweet_id (When include_my_retweet=1 is passed)

4) Can you provide a complete Tweet example with Snowflake ID to test?
[{"coordinates":null,"truncated":false,"created_at":"Thu Oct 14 22:20:15 + 2010","favorited":false,"entities":{"urls":[],"hashtags":[],"user_mentions":[{"name":"Matt Harris","id":777925,"id_str":"777925","indices":[0,14],"screen_name":"themattharris"}]},"text":"@themattharris hey how are things?","annotations":null,"contributors":[{"id":819797,"id_str":"819797","screen_name":"episod"}],"id":10765432100123456789,"id_str":"10765432100123456789","retweet_count":0,"geo":null,"retweeted":false,"in_reply_to_user_id":777925,"in_reply_to_user_id_str":"777925","in_reply_to_screen_name":"themattharris","user":{"id":6253282,"id_str":"6253282"},"source":"web","place":null,"in_reply_to_status_id":10586268426842688951,"in_reply_to_status_id_str":"10586268426842688951"}]

5) What is happening with new_id in the Streaming API?

new_id and new_id_str will be switched off when or soon after Snowflake is enabled on November 4th.

6) Why not restrict IDs to 53bits?

A Snowflake ID is composed:
* 41bits for millisecond precision time (69 years)
* 10bits for a configured machine identity (1024 machines)
* 12bits for a sequence number (4096 per machine)

The factor influencing the length of the ID is the time. For a 53bit ID this would mean only 31bits are available for the time. 31bits is only enough for 24 days (2147483648/(1000*60*60*24)) of time.

Reducing the resolution of the timestamp would prevent a K-sorted resolution of 1 second or less.

Reducing the configured machine identity or sequence number by 1bit would mean we couldn’t scale Twitter, or operate our infrastructure in an uncoordinated high-available way.

7) When will the 53bit Integer overflow happen?

24 days after Snowflake starts counting.

8) Is it safe to parse and store IDs as signed 64bit Integers?

Yes.

9) Why offer both the String and Integer versions of the ID?

The String representation is needed to ensure languages which cannot convert the >53bit Integer can still use the ID in other API requests. The Integer value is being retained for languages which can handle numbers >53bit and to prevent applications which have not converted from being cut-off from Twitter.

10) When ID is null what will the _str representation be?

The _str representation will also be null.

11) Did you really mean ‘unsigned’ 64bit Integer?

Strictly speaking the Snowflake is a signed 64bit long under the hood. That being said, we will never use the negative bit and won’t require the full 64bits for positive numbers for about 69 years:
http://www.google.com/search?q=%282**41%29+%2F+%2860*60*24*1000%29+%2F+365

12) Why not make the strings opt-in?

We did consider this as an option but decided against it for a number of reasons. The first reason is that the ID is fundamental to being able to work with the data from the API so receiving the correct ID shouldn’t be something you have to opt into. The second, more influential reason, is that making the _str representations opt-in would create significant, performance affecting issues for the API.

On Mon, Oct 18, 2010 at 5:34 PM, themattharris wrote:
> Thanks to @gotwalt for spotting the missing commas.
>
> Fixed JSON sample ...
>
> [
> {
>"coordinates": null,
>"truncated": false,
>"created_at": "Thu Oct 14 22:20
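Answers 9 and 10 suggest a simple reading rule for client code: prefer the _str field, fall back to the Integer field, and pass null through. The sketch below shows one way that could look in Python. The helper name `read_id` is hypothetical and not part of any Twitter library, and the sample payload is the two-field snippet from the original announcement.

```python
import json

# Minimal sample from the thread: the same ID as an Integer and as a String.
SAMPLE = '{"id": 10765432100123456789, "id_str": "10765432100123456789"}'


def read_id(obj, field="id"):
    """Return the ID as an int, or None when the field is null or absent.

    Prefers the "<field>_str" form and falls back to the Integer form.
    """
    as_str = obj.get(field + "_str")
    if as_str is not None:
        return int(as_str)
    return obj.get(field)


status = json.loads(SAMPLE)
print(read_id(status))

# A null ID has a null _str form as well, so the result is None.
not_a_reply = {"in_reply_to_status_id": None, "in_reply_to_status_id_str": None}
print(read_id(not_a_reply, "in_reply_to_status_id"))
```

Python integers are arbitrary precision, so converting the string to an int is safe here. In a language whose numbers are doubles, the ID would presumably be kept as a string instead.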
[twitter-dev] Re: Snowflake: An update and some very important information
It's kind of cool that Craig has been programming professionally longer than some people on this list have even been alive! I've thought about the problem, and at least came up with a catchy name for it to go with the crises of yore: "Bitpocalypse". Since Twittelator uses the JSON library YAJL, my approach will be to correctly populate the "id" with an unsigned long long created from the "id_str" field at the point of data arrival. Then it's done and no other code needs changing. But I'll find out Friday! On Oct 19, 12:27 pm, Craig Hockenberry wrote: > This still gives you some API legacy that you need to maintain, but > it's a much cleaner approach than what is currently being proposed. > > -ch > > On Oct 19, 10:39 am, Tom van der Woerdt wrote: > > > > > I wouldn't blame this on JSON, because it's not JSON that has the > > problems, but JavaScript. All of my Objective-C apps that communicate > > use JSON as well, and they don't have the limitation. The issue does not > > apply to XML either - there's no type specification in XML. > > > As far as I know, this issue will only cause trouble for a few > > applications that work with JavaScript and depend on the IDs a lot. > > > My suggestion to solve this issue would be to introduce an additional > > parameter (just like include_rts, just with a different name) that turns > > all IDs into strings. No extra fields, just an additional optional > > parameter. Won't cause trouble for the applications that can't parse it > > and requires minimal implementation effort for developers. > > > I hope I'm not too late with my suggestion :-) > > > Tom > > > On 10/19/10 7:10 PM, Craig Hockenberry wrote: > > > > This approach feels wrong to me. The red flag is the duplication of > > > data within the payload: in 30+ years of professional development, > > > I've never seen that work out well. 
> > > > The root of the problem is that you've chosen to deliver data in a > > > format (JSON) that can't support integers with a value greater than > > > 2^53 bits. And some of your data uses 2^64 bits. > > > > The result is that you're working around the problem in a language by > > > using a string. Avoiding the root problem will encumber you with > > > legacy that you'll regret later. > > > > Look at your proposed solution from a different point-of-view: say you > > > have a language that can't handle Unicode well (e.g. BASIC or Ruby.) > > > Would you solve this problem by adding another field called > > > "text_ascii"? > > > > "text": "@themattharris hey how are things in K benhavn?". > > > "text_ascii": "@themattharris hey how are things in Kobenhavn?". > > > > Seems silly, yet that is exactly what you're doing for Javascript and > > > long integers. > > > > A part of this legacy in your payload is future confusion for > > > developers. Someone new to the Twitter API is going to be confused as > > > to why your ID values have both numeric and string representations. > > > And smart developers are going to lean towards the numeric > > > representation: > > > > * 8 bytes of storage for 10765432100123456789 instead of 20 bytes. > > > * Faster sorting (less data to compare.) > > > * Correct sorting: "011" and "10" have different order depending on > > > whether you're sorting the string or numeric representation. > > > > They'll eventually pay the price for choosing incorrectly. > > > > Every ID in the API is going to need documentation as a result. For > > > example, are place IDs affected by this change? And what about the IDs > > > returned by the Search API? (there's no mention of "since_id_str" and > > > "max_id_str" above.) > > > > Losing consistency with the XML format is also a problem. Unless > > > you're planning on adding _str elements to the XML payload, you're > > > presenting developers with a one-way street. 
A consumer of JSON > > > "id_str" can't easily change the format of data they want to consume. > > > > In my mind, you really only have two good choices at this point: > > > > 1) Limit Snowflake's ID space to 2^53 bits. Easier for developers, > > > harder for Twitter. > > > > 2) Make all Twitter IDs into strings. Easier for Twitter, harder for > > > developers. > > > > The second choice is obviously more disruptive, but if you really need > > > the ID space, it's the right one. Even if it means I need to make > > > major changes to my code. > > > > On Oct 18, 5:19 pm, Matt Harris wrote: > > >> Last week you may remember Twitter planned to enable the new Status ID > > >> generator - 'Snowflake' but didn't. The purpose of this email is to > > >> explain > > >> the reason why this didn't happen, what we are doing about it, and what > > >> the > > >> new release plan is. > > > >> So what is Snowflake? > > >> -- > > >> Snowflake is a service we will be using to generate unique Tweet IDs. > > >> These > > >> Tweet IDs are unique 64bit unsigned integers, which, instead of being > > >> sequential like the current IDs, are based on tim
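The sorting point in the quoted message ("011" versus "10") is easy to reproduce. The following Python sketch uses made-up ID strings, apart from the sample ID from the thread, and shows two ways to order string IDs by their numeric value; neither is claimed to be what any Twitter client actually does.

```python
ids = ["10", "011", "9", "10765432100123456789"]

# Plain string sort compares character by character.
lexicographic = sorted(ids)

# Sorting by integer value restores numeric order while keeping the strings.
numeric = sorted(ids, key=int)

# Alternative without integer conversion: left-pad to a fixed width.
# 20 digits is enough for any unsigned 64-bit value.
padded = sorted(ids, key=lambda s: s.zfill(20))

print(lexicographic)
print(numeric)
print(padded == numeric)
```

The padding variant is the kind of key a store without 64-bit integers could use, at the cost of the extra bytes the message complains about.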
[twitter-dev] Re: Snowflake: An update and some very important information
This still gives you some API legacy that you need to maintain, but it's a much cleaner approach than what is currently being proposed. -ch On Oct 19, 10:39 am, Tom van der Woerdt wrote: > I wouldn't blame this on JSON, because it's not JSON that has the > problems, but JavaScript. All of my Objective-C apps that communicate > use JSON as well, and they don't have the limitation. The issue does not > apply to XML either - there's no type specification in XML. > > As far as I know, this issue will only cause trouble for a few > applications that work with JavaScript and depend on the IDs a lot. > > My suggestion to solve this issue would be to introduce an additional > parameter (just like include_rts, just with a different name) that turns > all IDs into strings. No extra fields, just an additional optional > parameter. Won't cause trouble for the applications that can't parse it > and requires minimal implementation effort for developers. > > I hope I'm not too late with my suggestion :-) > > Tom > > On 10/19/10 7:10 PM, Craig Hockenberry wrote: > > > > > This approach feels wrong to me. The red flag is the duplication of > > data within the payload: in 30+ years of professional development, > > I've never seen that work out well. > > > The root of the problem is that you've chosen to deliver data in a > > format (JSON) that can't support integers with a value greater than > > 2^53 bits. And some of your data uses 2^64 bits. > > > The result is that you're working around the problem in a language by > > using a string. Avoiding the root problem will encumber you with > > legacy that you'll regret later. > > > Look at your proposed solution from a different point-of-view: say you > > have a language that can't handle Unicode well (e.g. BASIC or Ruby.) > > Would you solve this problem by adding another field called > > "text_ascii"? > > > "text": "@themattharris hey how are things in K benhavn?". > > "text_ascii": "@themattharris hey how are things in Kobenhavn?". 
> > > Seems silly, yet that is exactly what you're doing for Javascript and > > long integers. > > > A part of this legacy in your payload is future confusion for > > developers. Someone new to the Twitter API is going to be confused as > > to why your ID values have both numeric and string representations. > > And smart developers are going to lean towards the numeric > > representation: > > > * 8 bytes of storage for 10765432100123456789 instead of 20 bytes. > > * Faster sorting (less data to compare.) > > * Correct sorting: "011" and "10" have different order depending on > > whether you're sorting the string or numeric representation. > > > They'll eventually pay the price for choosing incorrectly. > > > Every ID in the API is going to need documentation as a result. For > > example, are place IDs affected by this change? And what about the IDs > > returned by the Search API? (there's no mention of "since_id_str" and > > "max_id_str" above.) > > > Losing consistency with the XML format is also a problem. Unless > > you're planning on adding _str elements to the XML payload, you're > > presenting developers with a one-way street. A consumer of JSON > > "id_str" can't easily change the format of data they want to consume. > > > In my mind, you really only have two good choices at this point: > > > 1) Limit Snowflake's ID space to 2^53 bits. Easier for developers, > > harder for Twitter. > > > 2) Make all Twitter IDs into strings. Easier for Twitter, harder for > > developers. > > > The second choice is obviously more disruptive, but if you really need > > the ID space, it's the right one. Even if it means I need to make > > major changes to my code. > > > On Oct 18, 5:19 pm, Matt Harris wrote: > >> Last week you may remember Twitter planned to enable the new Status ID > >> generator - 'Snowflake' but didn't. The purpose of this email is to explain > >> the reason why this didn't happen, what we are doing about it, and what the > >> new release plan is. 
> > >> So what is Snowflake? > >> -- > >> Snowflake is a service we will be using to generate unique Tweet IDs. These > >> Tweet IDs are unique 64bit unsigned integers, which, instead of being > >> sequential like the current IDs, are based on time. The full ID is composed > >> of a timestamp, a worker number, and a sequence number. > > >> The problem > >> - > >> Before launch it came to our attention that some programming languages such > >> as Javascript cannot support numbers with >53bits. This can be easily > >> examined by running a command similar to: (90071992547409921).toString() in > >> your browsers console or by running the following JSON snippet through your > >> JSON parser. > > >> {"id": 10765432100123456789, "id_str": "10765432100123456789"} > > >> In affected JSON parsers the ID will not be converted successfully and will > >> lose accuracy. In some parsers there may even be an exception. > > >> The solution > >>
Re: [twitter-dev] Re: Snowflake: An update and some very important information
You normally don't need all 53 bits - but as snowflake simply skips a few IDs every now and then, you get a lot more IDs.

Chance of it actually reaching 53 bits? I'd say that it happens at the end of November... Friday the 26th?

Tom

On 10/19/10 8:04 PM, M. Edward (Ed) Borasky wrote:
> With all due respect, the root of the problem is that "computer
> scientists" think in terms of abstract machines with infinitely-wide
> registers, infinitely many addressable RAM cells, etc., and "business
> people" think in terms of human populations and their tweet rates
> growing geometrically for all time. Journalists believe neither of
> these. ;-) And neither assumption is realistic, which is why we have to
> make decisions like this from time to time, and why sometimes we predict
> disasters like Y2K or 32-bit machines crumbling in 2038 that don't
> actually happen. ;-)
>
> So - for Twitter: what is your *realistic* projection for when a 53-bit
> integer ID will overflow? What are the underlying assumptions about
> human population growth, spread of Twitter, revenue models, competition,
> etc.? I know this is all highly confidential, so for sake of argument,
> assume current tweet rates per user and the goal your executives have
> stated of a billion users, with a plateau at that point. The question
> I'm asking is whether you *really* need 64-bit integer IDs for tweets or
> for users. ;-)
>
> By the way, I ask similar questions of all the "big data" geeks out
> there - so many naked emperors, so little time. ;-)
>
Re: [twitter-dev] Re: Snowflake: An update and some very important information
With all due respect, the root of the problem is that "computer scientists" think in terms of abstract machines with infinitely-wide registers, infinitely many addressable RAM cells, etc., and "business people" think in terms of human populations and their tweet rates growing geometrically for all time. Journalists believe neither of these. ;-) And neither assumption is realistic, which is why we have to make decisions like this from time to time, and why sometimes we predict disasters like Y2K or 32-bit machines crumbling in 2038 that don't actually happen. ;-) So - for Twitter: what is your *realistic* projection for when a 53-bit integer ID will overflow? What are the underlying assumptions about human population growth, spread of Twitter, revenue models, competition, etc.? I know this is all highly confidential, so for sake of argument, assume current tweet rates per user and the goal your executives have stated of a billion users, with a plateau at that point. The question I'm asking is whether you *really* need 64-bit integer IDs for tweets or for users. ;-) By the way, I ask similar questions of all the "big data" geeks out there - so many naked emperors, so little time. ;-) -- M. Edward (Ed) Borasky http://borasky-research.net http://twitter.com/znmeb "A mathematician is a device for turning coffee into theorems." - Paul Erdos Quoting Craig Hockenberry : This approach feels wrong to me. The red flag is the duplication of data within the payload: in 30+ years of professional development, I've never seen that work out well. The root of the problem is that you've chosen to deliver data in a format (JSON) that can't support integers with a value greater than 2^53 bits. And some of your data uses 2^64 bits. The result is that you're working around the problem in a language by using a string. Avoiding the root problem will encumber you with legacy that you'll regret later. 
Look at your proposed solution from a different point-of-view: say you have a language that can't handle Unicode well (e.g. BASIC or Ruby.) Would you solve this problem by adding another field called "text_ascii"? "text": "@themattharris hey how are things in København?". "text_ascii": "@themattharris hey how are things in Kobenhavn?". Seems silly, yet that is exactly what you're doing for Javascript and long integers. A part of this legacy in your payload is future confusion for developers. Someone new to the Twitter API is going to be confused as to why your ID values have both numeric and string representations. And smart developers are going to lean towards the numeric representation: * 8 bytes of storage for 10765432100123456789 instead of 20 bytes. * Faster sorting (less data to compare.) * Correct sorting: "011" and "10" have different order depending on whether you're sorting the string or numeric representation. They'll eventually pay the price for choosing incorrectly. Every ID in the API is going to need documentation as a result. For example, are place IDs affected by this change? And what about the IDs returned by the Search API? (there's no mention of "since_id_str" and "max_id_str" above.) Losing consistency with the XML format is also a problem. Unless you're planning on adding _str elements to the XML payload, you're presenting developers with a one-way street. A consumer of JSON "id_str" can't easily change the format of data they want to consume. In my mind, you really only have two good choices at this point: 1) Limit Snowflake's ID space to 2^53 bits. Easier for developers, harder for Twitter. 2) Make all Twitter IDs into strings. Easier for Twitter, harder for developers. The second choice is obviously more disruptive, but if you really need the ID space, it's the right one. Even if it means I need to make major changes to my code. 
On Oct 18, 5:19 pm, Matt Harris wrote: Last week you may remember Twitter planned to enable the new Status ID generator - 'Snowflake' but didn't. The purpose of this email is to explain the reason why this didn't happen, what we are doing about it, and what the new release plan is. So what is Snowflake? -- Snowflake is a service we will be using to generate unique Tweet IDs. These Tweet IDs are unique 64bit unsigned integers, which, instead of being sequential like the current IDs, are based on time. The full ID is composed of a timestamp, a worker number, and a sequence number. The problem - Before launch it came to our attention that some programming languages such as Javascript cannot support numbers with >53bits. This can be easily examined by running a command similar to: (90071992547409921).toString() in your browsers console or by running the following JSON snippet through your JSON parser. {"id": 10765432100123456789, "id_str": "10765432100123456789"} In affected JSON parsers the ID will not be converted successfully and will lose accuracy. In some parsers there may
Re: [twitter-dev] Re: Snowflake: An update and some very important information
I wouldn't blame this on JSON, because it's not JSON that has the problems, but JavaScript. All of my Objective-C apps that communicate use JSON as well, and they don't have the limitation. The issue does not apply to XML either - there's no type specification in XML. As far as I know, this issue will only cause trouble for a few applications that work with JavaScript and depend on the IDs a lot. My suggestion to solve this issue would be to introduce an additional parameter (just like include_rts, just with a different name) that turns all IDs into strings. No extra fields, just an additional optional parameter. Won't cause trouble for the applications that can't parse it and requires minimal implementation effort for developers. I hope I'm not too late with my suggestion :-) Tom On 10/19/10 7:10 PM, Craig Hockenberry wrote: > This approach feels wrong to me. The red flag is the duplication of > data within the payload: in 30+ years of professional development, > I've never seen that work out well. > > The root of the problem is that you've chosen to deliver data in a > format (JSON) that can't support integers with a value greater than > 2^53 bits. And some of your data uses 2^64 bits. > > The result is that you're working around the problem in a language by > using a string. Avoiding the root problem will encumber you with > legacy that you'll regret later. > > Look at your proposed solution from a different point-of-view: say you > have a language that can't handle Unicode well (e.g. BASIC or Ruby.) > Would you solve this problem by adding another field called > "text_ascii"? > > "text": "@themattharris hey how are things in København?". > "text_ascii": "@themattharris hey how are things in Kobenhavn?". > > Seems silly, yet that is exactly what you're doing for Javascript and > long integers. > > A part of this legacy in your payload is future confusion for > developers. 
Someone new to the Twitter API is going to be confused as > to why your ID values have both numeric and string representations. > And smart developers are going to lean towards the numeric > representation: > > * 8 bytes of storage for 10765432100123456789 instead of 20 bytes. > * Faster sorting (less data to compare.) > * Correct sorting: "011" and "10" have different order depending on > whether you're sorting the string or numeric representation. > > They'll eventually pay the price for choosing incorrectly. > > Every ID in the API is going to need documentation as a result. For > example, are place IDs affected by this change? And what about the IDs > returned by the Search API? (there's no mention of "since_id_str" and > "max_id_str" above.) > > Losing consistency with the XML format is also a problem. Unless > you're planning on adding _str elements to the XML payload, you're > presenting developers with a one-way street. A consumer of JSON > "id_str" can't easily change the format of data they want to consume. > > In my mind, you really only have two good choices at this point: > > 1) Limit Snowflake's ID space to 2^53 bits. Easier for developers, > harder for Twitter. > > 2) Make all Twitter IDs into strings. Easier for Twitter, harder for > developers. > > The second choice is obviously more disruptive, but if you really need > the ID space, it's the right one. Even if it means I need to make > major changes to my code. > > > On Oct 18, 5:19 pm, Matt Harris wrote: >> Last week you may remember Twitter planned to enable the new Status ID >> generator - 'Snowflake' but didn't. The purpose of this email is to explain >> the reason why this didn't happen, what we are doing about it, and what the >> new release plan is. >> >> So what is Snowflake? >> -- >> Snowflake is a service we will be using to generate unique Tweet IDs. These >> Tweet IDs are unique 64bit unsigned integers, which, instead of being >> sequential like the current IDs, are based on time. 
The full ID is composed >> of a timestamp, a worker number, and a sequence number. >> >> The problem >> - >> Before launch it came to our attention that some programming languages such >> as Javascript cannot support numbers with >53bits. This can be easily >> examined by running a command similar to: (90071992547409921).toString() in >> your browsers console or by running the following JSON snippet through your >> JSON parser. >> >> {"id": 10765432100123456789, "id_str": "10765432100123456789"} >> >> In affected JSON parsers the ID will not be converted successfully and will >> lose accuracy. In some parsers there may even be an exception. >> >> The solution >> >> To allow javascript and JSON parsers to read the IDs we need to include a >> string version of any ID when responding in the JSON format. What this means >> is Status, User, Direct Message and Saved Search IDs in the Twitter API will >> now be returned as an integer and a string in JSON responses. This will >> apply to the main Twi
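The 53-bit limit described in the quoted announcement comes from storing numbers as 64-bit floating point values. It can be reproduced outside a browser; the Python sketch below pushes integers through a float round trip to imitate what a double-only number type would do. The helper name `survives_double` is made up for this illustration, and the test values are the ones that appear in the thread.

```python
LIMIT = 2 ** 53  # every integer up to this value is exactly representable in a double


def survives_double(n):
    """True if n is unchanged by a round trip through a 64-bit float."""
    return int(float(n)) == n


print(survives_double(LIMIT))                 # True: still exact
print(survives_double(LIMIT + 1))             # False: rounds to a neighbouring value
print(survives_double(12738165059))           # True: old-style ID from the sample
print(survives_double(90071992547409921))     # False: console example from the announcement
print(survives_double(10765432100123456789))  # False: sample Snowflake-sized ID
```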
Re: [twitter-dev] Re: Snowflake: An update and some very important information
I did some investigation into the snowflake algorithm recently and yes, it's safe for 64bit signed longs. Even if Twitter moved away from using scala/java longs internally (which are definitely signed), you'd still have something like 65 years from now before the algorithm rolled past the 2^63-1 barrier.

I've posted a gist[1] in ruby with a few little methods for playing with the time part of a snowflake id if you're interested.

Hayes

1 http://gist.github.com/634586

On Tue, Oct 19, 2010 at 6:27 AM, Dan Checkoway wrote:
> I'm also patiently awaiting a response from twitter about this. Are the
> ids sane for 64-bit *signed* long?
>
> Dan
>
>
> On Mon, Oct 18, 2010 at 9:08 PM, jon wrote:
>
>> Hi,
>>
>> You wrote that the IDs are "unsigned" 64 bit ints, but the IdWorker is
>> pumping out java Longs which are signed. I'm assuming that was a
>> typo, but please clarify.
>>
>>
>> http://github.com/twitter/snowflake/blob/master/src/main/scala/com/twitter/service/snowflake/IdWorker.scala
>>
>> Thanks,
>>
>> - Jon
>>
>> On Oct 18, 8:19 pm, Matt Harris wrote:
>> > Last week you may remember Twitter planned to enable the new Status ID
>> > generator - 'Snowflake' but didn't. The purpose of this email is to explain
>> > the reason why this didn't happen, what we are doing about it, and what the
>> > new release plan is.
>> >
>> > So what is Snowflake?
>> > --
>> > Snowflake is a service we will be using to generate unique Tweet IDs. These
>> > Tweet IDs are unique 64bit unsigned integers, which, instead of being
>> > sequential like the current IDs, are based on time. The full ID is composed
>> > of a timestamp, a worker number, and a sequence number.
>> >
>> > The problem
>> > -
>> > Before launch it came to our attention that some programming languages such
>> > as Javascript cannot support numbers with >53bits.
This can be easily >> > examined by running a command similar to: (90071992547409921).toString() >> in >> > your browsers console or by running the following JSON snippet through >> your >> > JSON parser. >> > >> > {"id": 10765432100123456789, "id_str": "10765432100123456789"} >> > >> > In affected JSON parsers the ID will not be converted successfully and >> will >> > lose accuracy. In some parsers there may even be an exception. >> > >> > The solution >> > >> > To allow javascript and JSON parsers to read the IDs we need to include >> a >> > string version of any ID when responding in the JSON format. What this >> means >> > is Status, User, Direct Message and Saved Search IDs in the Twitter API >> will >> > now be returned as an integer and a string in JSON responses. This will >> > apply to the main Twitter API, the Streaming API and the Search API. >> > >> > For example, a status object will now contain an id and an id_str. The >> > following JSON representation of a status object shows the two versions >> of >> > the ID fields for each data point. 
>> > >> > [ >> > { >> > "coordinates": null, >> > "truncated": false, >> > "created_at": "Thu Oct 14 22:20:15 + 2010", >> > "favorited": false, >> > "entities": { >> > "urls": [ >> > ], >> > "hashtags": [ >> > ], >> > "user_mentions": [ >> > { >> > "name": "Matt Harris", >> > "id": 777925, >> > "id_str": "777925", >> > "indices": [ >> > 0, >> > 14 >> > ], >> > "screen_name": "themattharris" >> > } >> > ] >> > }, >> > "text": "@themattharris hey how are things?", >> > "annotations": null, >> > "contributors": [ >> > { >> > "id": 819797, >> > "id_str": "819797", >> > "screen_name": "episod" >> > } >> > ], >> > "id": 12738165059, >> > "id_str": "12738165059", >> > "retweet_count": 0, >> > "geo": null, >> > "retweeted": false, >> > "in_reply_to_user_id": 777925, >> > "in_reply_to_user_id_str": "777925", >> > "in_reply_to_screen_name": "themattharris", >> > "user": { >> > "id": 6253282 >> > "id_str": "6253282" >> > }, >> > "source": "web", >> > "place": null, >> > "in_reply_to_status_id": 12738040524 >> > "in_reply_to_status_id_str": "12738040524" >> > } >> > ] >> > >> > What should you do - RIGHT NOW >> > -- >> > The first thing you should do is attempt to decode the JSON snippet >> above >> > using your production code parser. Observe the output to confirm the ID >> has >> > not lost accuracy. >> > >> > What you do next depends on what happens: >> > >> > * If your code converts the ID successfully without losing accuracy you >> are >> > OK but should consider converting to the _str versions of IDs as soon as >> > possible. >> > * If your code has lost accuracy, convert your code to using the _str >> > version immediately. If you do not do this you
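In the spirit of the gist mentioned in this message (which is in Ruby and is not reproduced here), the following Python sketch splits an ID into its parts and checks the signed 64-bit range. It rests on assumptions that the thread does not spell out: that the timestamp occupies the high bits above the 10 machine bits and 12 sequence bits, and that the time part is milliseconds since some service-specific epoch, which is left out because the thread does not give its value. The function names are made up for this illustration.

```python
MACHINE_BITS = 10
SEQUENCE_BITS = 12
TIME_SHIFT = MACHINE_BITS + SEQUENCE_BITS  # assumption: timestamp sits above both fields
MAX_SIGNED_64 = 2 ** 63 - 1


def split_id(snowflake_id):
    """Split an ID into (time_ms, machine, sequence) under the assumed layout."""
    time_ms = snowflake_id >> TIME_SHIFT
    machine = (snowflake_id >> SEQUENCE_BITS) & (2 ** MACHINE_BITS - 1)
    sequence = snowflake_id & (2 ** SEQUENCE_BITS - 1)
    return time_ms, machine, sequence


def fits_signed_64(n):
    """True if n is a non-negative value that fits in a signed 64-bit long."""
    return 0 <= n <= MAX_SIGNED_64


# Build an ID from known parts and take it apart again.
example = (123456789 << TIME_SHIFT) | (5 << SEQUENCE_BITS) | 42
print(split_id(example))

# The largest ID this layout can produce uses 41 + 10 + 12 = 63 bits.
largest = ((2 ** 41 - 1) << TIME_SHIFT) | (1023 << SEQUENCE_BITS) | 4095
print(fits_signed_64(largest))

# The sample value used in the announcement is larger than 2^63-1, so it
# appears to be illustrative rather than an ID this layout would generate.
print(fits_signed_64(10765432100123456789))
```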
[twitter-dev] Re: Snowflake: An update and some very important information
This approach feels wrong to me. The red flag is the duplication of data within the payload: in 30+ years of professional development, I've never seen that work out well.

The root of the problem is that you've chosen to deliver data in a format (JSON) whose JavaScript consumers can't represent integers larger than 2^53, and some of your data needs the full 64 bits. The result is that you're working around a language limitation by using a string. Avoiding the root problem will encumber you with legacy that you'll regret later.

Look at your proposed solution from a different point of view: say you have a language that can't handle Unicode well (e.g. BASIC or Ruby). Would you solve this problem by adding another field called "text_ascii"?

"text": "@themattharris hey how are things in København?",
"text_ascii": "@themattharris hey how are things in Kobenhavn?",

Seems silly, yet that is exactly what you're doing for Javascript and long integers.

Part of this legacy in your payload is future confusion for developers. Someone new to the Twitter API is going to be confused as to why your ID values have both numeric and string representations. And smart developers are going to lean towards the numeric representation:

* 8 bytes of storage for 10765432100123456789 instead of 20 bytes.
* Faster sorting (less data to compare).
* Correct sorting: "011" and "10" have different order depending on whether you're sorting the string or numeric representation.

They'll eventually pay the price for choosing incorrectly.

Every ID in the API is going to need documentation as a result. For example, are place IDs affected by this change? And what about the IDs returned by the Search API? (There's no mention of "since_id_str" and "max_id_str" above.)

Losing consistency with the XML format is also a problem. Unless you're planning on adding _str elements to the XML payload, you're presenting developers with a one-way street. A consumer of JSON "id_str" can't easily change the format of data they want to consume.

In my mind, you really only have two good choices at this point:

1) Limit Snowflake's ID space to 53 bits. Easier for developers, harder for Twitter.
2) Make all Twitter IDs into strings. Easier for Twitter, harder for developers.

The second choice is obviously more disruptive, but if you really need the ID space, it's the right one. Even if it means I need to make major changes to my code.

On Oct 18, 5:19 pm, Matt Harris wrote: > Last week you may remember Twitter planned to enable the new Status ID > generator - 'Snowflake' but didn't. The purpose of this email is to explain > the reason why this didn't happen, what we are doing about it, and what the > new release plan is. > > So what is Snowflake? > -- > Snowflake is a service we will be using to generate unique Tweet IDs. These > Tweet IDs are unique 64bit unsigned integers, which, instead of being > sequential like the current IDs, are based on time. The full ID is composed > of a timestamp, a worker number, and a sequence number. > > The problem > - > Before launch it came to our attention that some programming languages such > as Javascript cannot support numbers with >53bits. This can be easily > examined by running a command similar to: (90071992547409921).toString() in > your browsers console or by running the following JSON snippet through your > JSON parser. > > {"id": 10765432100123456789, "id_str": "10765432100123456789"} > > In affected JSON parsers the ID will not be converted successfully and will > lose accuracy. In some parsers there may even be an exception. > > The solution > > To allow javascript and JSON parsers to read the IDs we need to include a > string version of any ID when responding in the JSON format. What this means > is Status, User, Direct Message and Saved Search IDs in the Twitter API will > now be returned as an integer and a string in JSON responses. 
This will > apply to the main Twitter API, the Streaming API and the Search API. > > For example, a status object will now contain an id and an id_str. The > following JSON representation of a status object shows the two versions of > the ID fields for each data point. > > [ > { > "coordinates": null, > "truncated": false, > "created_at": "Thu Oct 14 22:20:15 + 2010", > "favorited": false, > "entities": { > "urls": [ > ], > "hashtags": [ > ], > "user_mentions": [ > { > "name": "Matt Harris", > "id": 777925, > "id_str": "777925", > "indices": [ > 0, > 14 > ], > "screen_name": "themattharris" > } > ] > }, > "text": "@themattharris hey how are things?", > "annotations": null, > "contributors": [ > { > "id": 819797, > "id_str": "819797", > "screen_name": "episod" > } > ], > "id": 12738165059, > "
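The storage and sorting points raised in the reply above are easy to check for yourself. What follows is only a rough Python sketch, not anything from Twitter's code: the sample ID is the one from the announcement, and `numeric_order` is a made-up helper name that assumes the ID strings are plain decimal digits with no leading zeros.

```python
import struct

# The sample ID from the announcement; it fits in an unsigned 64-bit integer.
SAMPLE_ID = 10765432100123456789

# Storage: 8 bytes packed as an unsigned 64-bit integer vs. 20 ASCII digits.
packed = struct.pack("<Q", SAMPLE_ID)
as_text = str(SAMPLE_ID).encode("ascii")

# Sorting: the same values order differently as strings and as numbers.
ids = ["10", "011", "9"]
string_order = sorted(ids)            # lexicographic comparison
number_order = sorted(ids, key=int)   # numeric comparison


def numeric_order(id_strings):
    """Hypothetical helper: sort decimal ID strings numerically without
    converting them, assuming no leading zeros (shorter means smaller)."""
    return sorted(id_strings, key=lambda s: (len(s), s))


if __name__ == "__main__":
    print(len(packed), len(as_text))
    print(string_order, number_order)
    print(numeric_order(["12738165059", "9", "10765432100123456789"]))
```

So anyone who stores only the string form can still get numeric ordering by comparing length first and text second, provided the no-leading-zeros assumption holds for the IDs they receive.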
Re: [twitter-dev] Re: Snowflake: An update and some very important information
Java doesn't have unsigned types, so a (signed) long is the only way to transfer the value. IIRC from peeking at that code, the top bit is unused, which would mean there's no danger of creating an id value that's ambiguous. Storing and comparing ids in signed or unsigned 64-bit longs should be fine. -- brion vibber (brion @ status.net) On Oct 19, 2010 5:52 AM, "jon" wrote: > Hi, > > You wrote that the IDs are "unsigned" 64 bit ints, but the IdWorker is > pumping out java Longs which are signed. I'm assuming that was a > typo, but please clarify. > > http://github.com/twitter/snowflake/blob/master/src/main/scala/com/twitter/service/snowflake/IdWorker.scala > > Thanks, > > - Jon
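On the signed-versus-unsigned question, a small sketch may help show why an unused top bit makes the two interchangeable. The announcement only says an ID is built from a timestamp, a worker number, and a sequence number; the 41/10/12 bit split used here is an assumption taken from the open-source Snowflake project rather than from this thread, and `compose_id` / `decompose_id` are illustrative names, not Twitter's API.

```python
import struct

# Assumed field widths (from the open-source Snowflake project, not this thread).
TIMESTAMP_BITS = 41
WORKER_BITS = 10
SEQUENCE_BITS = 12


def compose_id(timestamp_ms, worker, sequence):
    """Pack the three fields into one integer, timestamp in the high bits."""
    if not 0 <= timestamp_ms < (1 << TIMESTAMP_BITS):
        raise ValueError("timestamp out of range")
    if not 0 <= worker < (1 << WORKER_BITS):
        raise ValueError("worker out of range")
    if not 0 <= sequence < (1 << SEQUENCE_BITS):
        raise ValueError("sequence out of range")
    return (
        (timestamp_ms << (WORKER_BITS + SEQUENCE_BITS))
        | (worker << SEQUENCE_BITS)
        | sequence
    )


def decompose_id(snowflake_id):
    """Split an ID back into (timestamp_ms, worker, sequence)."""
    sequence = snowflake_id & ((1 << SEQUENCE_BITS) - 1)
    worker = (snowflake_id >> SEQUENCE_BITS) & ((1 << WORKER_BITS) - 1)
    timestamp_ms = snowflake_id >> (WORKER_BITS + SEQUENCE_BITS)
    return timestamp_ms, worker, sequence


# With every field at its maximum, only 63 bits are used, so the sign bit
# of a 64-bit signed long is never set.
LARGEST_ID = compose_id(
    (1 << TIMESTAMP_BITS) - 1, (1 << WORKER_BITS) - 1, (1 << SEQUENCE_BITS) - 1
)

# The signed and unsigned 64-bit encodings of such a value are byte-identical.
signed_bytes = struct.pack(">q", LARGEST_ID)
unsigned_bytes = struct.pack(">Q", LARGEST_ID)
```

Under that assumed layout the largest possible ID is exactly the largest positive signed 64-bit value, and an ID with a later timestamp always compares greater than one with an earlier timestamp, whatever the worker and sequence fields hold.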
Re: [twitter-dev] Re: Snowflake: An update and some very important information
I'm also patiently awaiting a response from twitter about this. Are the ids sane for 64-bit *signed* long? Dan On Mon, Oct 18, 2010 at 9:08 PM, jon wrote: > Hi, > > You wrote that the IDs are "unsigned" 64 bit ints, but the IdWorker is > pumping out java Longs which are signed. I'm assuming that was a > typo, but please clarify. > > > http://github.com/twitter/snowflake/blob/master/src/main/scala/com/twitter/service/snowflake/IdWorker.scala > > Thanks, > > - Jon
Re: [twitter-dev] Re: Snowflake: An update and some very important information
Probably null as well. Tom On 10/19/10 11:49 AM, Reivax wrote: > Hi > > In the case where an id is null (as in "in_reply_to_status_id":null ) > what will the value of "in_reply_to_status_id_str" be ? > > Thanks > Xavier
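Since the answer on null IDs is only a "probably", client code is safer if it copes with either outcome. Here is a minimal Python sketch of a defensive reader; `read_id` is a hypothetical helper, and the fallback to the numeric field is only safe in a language like Python whose JSON decoder keeps large integers exact.

```python
import json


def read_id(obj, field):
    """Return the ID stored under `field` as a string, or None when it is null.

    Prefers the `<field>_str` twin and falls back to the numeric field when
    the string twin is missing or null.
    """
    value = obj.get(field + "_str")
    if value is not None:
        return value
    raw = obj.get(field)
    return None if raw is None else str(raw)


# Hypothetical payload: one null ID pair and one populated pair.
status = json.loads(
    '{"id": 12738165059, "id_str": "12738165059",'
    ' "in_reply_to_status_id": null, "in_reply_to_status_id_str": null}'
)

reply_to = read_id(status, "in_reply_to_status_id")   # None for a null ID
status_id = read_id(status, "id")                     # string form of the ID
```

The same helper also covers older payloads that carry no `_str` field at all.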
Re: [twitter-dev] Re: Snowflake: An update and some very important information
Because a lot of Twitter applications (including my own) would go crazy immediately. Tom On 10/19/10 6:09 AM, Detroitpro wrote: > Out of curiosity; what advantage is there of including both the int > and the string? Why not only offer the string and put the parsing on > the client side? > > Thanks, > @detroitpro
[twitter-dev] Re: Snowflake: An update and some very important information
Hi In the case where an id is null (as in "in_reply_to_status_id":null ) what will the value of "in_reply_to_status_id_str" be ? Thanks Xavier On 19 oct, 02:34, themattharris wrote: > Thanks to @gotwalt for spotting the missing commas. > > Fixed JSON sample ... > > [ > { > "coordinates": null, > "truncated": false, > "created_at": "Thu Oct 14 22:20:15 + 2010", > "favorited": false, > "entities": { > "urls": [ > ], > "hashtags": [ > ], > "user_mentions": [ > { > "name": "Matt Harris", > "id": 777925, > "id_str": "777925", > "indices": [ > 0, > 14 > ], > "screen_name": "themattharris" > } > ] > }, > "text": "@themattharris hey how are things?", > "annotations": null, > "contributors": [ > { > "id": 819797, > "id_str": "819797", > "screen_name": "episod" > } > ], > "id": 12738165059, > "id_str": "12738165059", > "retweet_count": 0, > "geo": null, > "retweeted": false, > "in_reply_to_user_id": 777925, > "in_reply_to_user_id_str": "777925", > "in_reply_to_screen_name": "themattharris", > "user": { > "id": 6253282, > "id_str": "6253282" > }, > "source": "web", > "place": null, > "in_reply_to_status_id": 12738040524, > "in_reply_to_status_id_str": "12738040524" > } > ] > > Best, > @themattharris
[twitter-dev] Re: Snowflake: An update and some very important information
Out of curiosity; what advantage is there of including both the int and the string? Why not only offer the string and put the parsing on the client side? Thanks, @detroitpro On Oct 18, 8:34 pm, themattharris wrote: > Thanks to @gotwalt for spotting the missing commas. > > Fixed JSON sample ... > > [ > { > "coordinates": null, > "truncated": false, > "created_at": "Thu Oct 14 22:20:15 + 2010", > "favorited": false, > "entities": { > "urls": [ > ], > "hashtags": [ > ], > "user_mentions": [ > { > "name": "Matt Harris", > "id": 777925, > "id_str": "777925", > "indices": [ > 0, > 14 > ], > "screen_name": "themattharris" > } > ] > }, > "text": "@themattharris hey how are things?", > "annotations": null, > "contributors": [ > { > "id": 819797, > "id_str": "819797", > "screen_name": "episod" > } > ], > "id": 12738165059, > "id_str": "12738165059", > "retweet_count": 0, > "geo": null, > "retweeted": false, > "in_reply_to_user_id": 777925, > "in_reply_to_user_id_str": "777925", > "in_reply_to_screen_name": "themattharris", > "user": { > "id": 6253282, > "id_str": "6253282" > }, > "source": "web", > "place": null, > "in_reply_to_status_id": 12738040524, > "in_reply_to_status_id_str": "12738040524" > } > ] > > Best, > @themattharris
[twitter-dev] Re: Snowflake: An update and some very important information
Hi,

You wrote that the IDs are "unsigned" 64 bit ints, but the IdWorker is pumping out Java Longs, which are signed. I'm assuming that was a typo, but please clarify.

http://github.com/twitter/snowflake/blob/master/src/main/scala/com/twitter/service/snowflake/IdWorker.scala

Thanks,
- Jon

On Oct 18, 8:19 pm, Matt Harris wrote:
> Last week you may remember Twitter planned to enable the new Status ID
> generator - 'Snowflake' but didn't. The purpose of this email is to explain
> the reason why this didn't happen, what we are doing about it, and what the
> new release plan is.
>
> So what is Snowflake?
> ---------------------
> Snowflake is a service we will be using to generate unique Tweet IDs. These
> Tweet IDs are unique 64bit unsigned integers, which, instead of being
> sequential like the current IDs, are based on time. The full ID is composed
> of a timestamp, a worker number, and a sequence number.
>
> The problem
> -----------
> Before launch it came to our attention that some programming languages such
> as Javascript cannot support numbers with >53bits. This can be easily
> examined by running a command similar to: (90071992547409921).toString() in
> your browsers console or by running the following JSON snippet through your
> JSON parser.
>
> {"id": 10765432100123456789, "id_str": "10765432100123456789"}
>
> In affected JSON parsers the ID will not be converted successfully and will
> lose accuracy. In some parsers there may even be an exception.
>
> The solution
> ------------
> To allow javascript and JSON parsers to read the IDs we need to include a
> string version of any ID when responding in the JSON format. What this means
> is Status, User, Direct Message and Saved Search IDs in the Twitter API will
> now be returned as an integer and a string in JSON responses. This will
> apply to the main Twitter API, the Streaming API and the Search API.
>
> For example, a status object will now contain an id and an id_str. The
> following JSON representation of a status object shows the two versions of
> the ID fields for each data point.
>
> [
>   {
>     "coordinates": null,
>     "truncated": false,
>     "created_at": "Thu Oct 14 22:20:15 + 2010",
>     "favorited": false,
>     "entities": {
>       "urls": [
>       ],
>       "hashtags": [
>       ],
>       "user_mentions": [
>         {
>           "name": "Matt Harris",
>           "id": 777925,
>           "id_str": "777925",
>           "indices": [
>             0,
>             14
>           ],
>           "screen_name": "themattharris"
>         }
>       ]
>     },
>     "text": "@themattharris hey how are things?",
>     "annotations": null,
>     "contributors": [
>       {
>         "id": 819797,
>         "id_str": "819797",
>         "screen_name": "episod"
>       }
>     ],
>     "id": 12738165059,
>     "id_str": "12738165059",
>     "retweet_count": 0,
>     "geo": null,
>     "retweeted": false,
>     "in_reply_to_user_id": 777925,
>     "in_reply_to_user_id_str": "777925",
>     "in_reply_to_screen_name": "themattharris",
>     "user": {
>       "id": 6253282
>       "id_str": "6253282"
>     },
>     "source": "web",
>     "place": null,
>     "in_reply_to_status_id": 12738040524
>     "in_reply_to_status_id_str": "12738040524"
>   }
> ]
>
> What should you do - RIGHT NOW
> ------------------------------
> The first thing you should do is attempt to decode the JSON snippet above
> using your production code parser. Observe the output to confirm the ID has
> not lost accuracy.
>
> What you do next depends on what happens:
>
> * If your code converts the ID successfully without losing accuracy you are
> OK but should consider converting to the _str versions of IDs as soon as
> possible.
> * If your code has lost accuracy, convert your code to using the _str
> version immediately. If you do not do this your code will be unable to
> interact with the Twitter API reliably.
> * In some language parsers, the JSON may throw an exception when reading the
> ID value. If this happens in your parser you will need to ‘pre-parse’ the
> data, removing or replacing ID parameters with their _str versions.
>
> Summary
> -------
> 1) If you develop in Javascript, know that you will have to update your code
> to read the string version instead of the integer version.
>
> 2) If you use a JSON decoder, validate that the example JSON, above, decodes
> without throwing exceptions. If exceptions are thrown, you will need to
> pre-parse the data. Please let us know the name, version, and language of
> the parser which throws the exception so we can investigate.
>
> Timeline
> --------
> by 22nd October 2010 (Friday): String versions of ID numbers will start
> appearing in the API responses
> 4th November 2010 (Thursday) : Snowflake will be turned on but at ~41bit
> length
> 26th November 2010 (Friday) : Status IDs will break 53bits in length and
> cease being usable as Integers in Javascript
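For anyone who wants to try the parser check from the announcement outside a browser, here is a minimal sketch in Python. It is only an illustration, not Twitter's code: the helper name `id_survives` is made up for this example, and the "affected parser" is imitated by asking Python's `json` module to store every integer as a float, which is the same IEEE 754 double that a JavaScript Number uses.

```python
import json

# The two test documents from the announcement: the large test ID and the
# (much smaller) status ID from the sample status object.
LARGE = '{"id": 10765432100123456789, "id_str": "10765432100123456789"}'
SMALL = '{"id": 12738165059, "id_str": "12738165059"}'


def id_survives(parsed):
    """Return True if the numeric id still matches id_str digit for digit."""
    return int(parsed["id"]) == int(parsed["id_str"])


# Python's default parser keeps arbitrary-precision integers, so nothing is lost.
exact = json.loads(LARGE)

# parse_int=float stores each integer as a 64-bit double, imitating a parser
# (such as a JavaScript one) that cannot hold more than 53 bits exactly.
lossy_large = json.loads(LARGE, parse_int=float)
lossy_small = json.loads(SMALL, parse_int=float)

print("default parser, large id ok:", id_survives(exact))
print("double-based parser, large id ok:", id_survives(lossy_large))
print("double-based parser, small id ok:", id_survives(lossy_small))

# Whatever the numeric field did, the string field is always safe to use.
print("id_str:", lossy_large["id_str"])
```

The small status ID passes even in the double-based parser because it is far below 2^53; only IDs past that size are damaged, which is why the announcement recommends switching to the `_str` fields before status IDs reach that length.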
[twitter-dev] Re: Snowflake: An update and some very important information
Hi,

Will user IDs be generated by Snowflake in the near future? Is it safe to parse and store them as signed 64bit integers?

Thanks.

On Oct 18, 8:34 pm, themattharris wrote:
> Thanks to @gotwalt for spotting the missing commas.
>
> Fixed JSON sample ...
>
> [
>   {
>     "coordinates": null,
>     "truncated": false,
>     "created_at": "Thu Oct 14 22:20:15 + 2010",
>     "favorited": false,
>     "entities": {
>       "urls": [
>       ],
>       "hashtags": [
>       ],
>       "user_mentions": [
>         {
>           "name": "Matt Harris",
>           "id": 777925,
>           "id_str": "777925",
>           "indices": [
>             0,
>             14
>           ],
>           "screen_name": "themattharris"
>         }
>       ]
>     },
>     "text": "@themattharris hey how are things?",
>     "annotations": null,
>     "contributors": [
>       {
>         "id": 819797,
>         "id_str": "819797",
>         "screen_name": "episod"
>       }
>     ],
>     "id": 12738165059,
>     "id_str": "12738165059",
>     "retweet_count": 0,
>     "geo": null,
>     "retweeted": false,
>     "in_reply_to_user_id": 777925,
>     "in_reply_to_user_id_str": "777925",
>     "in_reply_to_screen_name": "themattharris",
>     "user": {
>       "id": 6253282,
>       "id_str": "6253282"
>     },
>     "source": "web",
>     "place": null,
>     "in_reply_to_status_id": 12738040524,
>     "in_reply_to_status_id_str": "12738040524"
>   }
> ]
>
> Best,
> @themattharris
>
> On Oct 18, 5:19 pm, Matt Harris wrote:
>
> > Last week you may remember Twitter planned to enable the new Status ID
> > generator - 'Snowflake' but didn't. The purpose of this email is to explain
> > the reason why this didn't happen, what we are doing about it, and what the
> > new release plan is.
> >
> > So what is Snowflake?
> > ---------------------
> > Snowflake is a service we will be using to generate unique Tweet IDs. These
> > Tweet IDs are unique 64bit unsigned integers, which, instead of being
> > sequential like the current IDs, are based on time. The full ID is composed
> > of a timestamp, a worker number, and a sequence number.
> >
> > The problem
> > -----------
> > Before launch it came to our attention that some programming languages such
> > as Javascript cannot support numbers with >53bits.
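On the signed 64-bit question raised in this thread, a rough sketch in Python may help. It assumes the bit widths used in the public Snowflake source (a 41-bit millisecond timestamp, 10 bits of datacenter and worker number combined, and a 12-bit sequence number); the announcement itself only says the ID is made of a timestamp, a worker number and a sequence number, so treat the widths, and the helper names `compose` and `decompose`, as assumptions of this example.

```python
# Assumed field widths (from the public Snowflake source, not the announcement).
TIMESTAMP_BITS = 41   # milliseconds since a service-specific epoch
WORKER_BITS = 10      # datacenter and worker number combined
SEQUENCE_BITS = 12    # per-millisecond counter

SIGNED_64_MAX = 2**63 - 1


def compose(timestamp_ms, worker, sequence):
    """Pack the three fields into one integer, timestamp in the high bits."""
    return (
        (timestamp_ms << (WORKER_BITS + SEQUENCE_BITS))
        | (worker << SEQUENCE_BITS)
        | sequence
    )


def decompose(snowflake_id):
    """Split an ID back into (timestamp_ms, worker, sequence)."""
    sequence = snowflake_id & (2**SEQUENCE_BITS - 1)
    worker = (snowflake_id >> SEQUENCE_BITS) & (2**WORKER_BITS - 1)
    timestamp_ms = snowflake_id >> (WORKER_BITS + SEQUENCE_BITS)
    return timestamp_ms, worker, sequence


# The largest ID this layout can produce: every field at its maximum.
largest = compose(2**TIMESTAMP_BITS - 1, 2**WORKER_BITS - 1, 2**SEQUENCE_BITS - 1)

print("bits used by the largest ID:", largest.bit_length())
print("fits in a signed 64-bit integer:", largest <= SIGNED_64_MAX)
print("round trip:", decompose(compose(123456789, 37, 5)))
```

Under those assumed widths the fields add up to 63 bits, so every generated ID fits in a signed 64-bit column. The test value 10765432100123456789 from the announcement is larger than the signed 64-bit maximum, so it appears to be a parser stress test rather than an ID this layout would generate.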
[twitter-dev] Re: Snowflake: An update and some very important information
Thanks to @gotwalt for spotting the missing commas.

Fixed JSON sample ...

[
  {
    "coordinates": null,
    "truncated": false,
    "created_at": "Thu Oct 14 22:20:15 + 2010",
    "favorited": false,
    "entities": {
      "urls": [
      ],
      "hashtags": [
      ],
      "user_mentions": [
        {
          "name": "Matt Harris",
          "id": 777925,
          "id_str": "777925",
          "indices": [
            0,
            14
          ],
          "screen_name": "themattharris"
        }
      ]
    },
    "text": "@themattharris hey how are things?",
    "annotations": null,
    "contributors": [
      {
        "id": 819797,
        "id_str": "819797",
        "screen_name": "episod"
      }
    ],
    "id": 12738165059,
    "id_str": "12738165059",
    "retweet_count": 0,
    "geo": null,
    "retweeted": false,
    "in_reply_to_user_id": 777925,
    "in_reply_to_user_id_str": "777925",
    "in_reply_to_screen_name": "themattharris",
    "user": {
      "id": 6253282,
      "id_str": "6253282"
    },
    "source": "web",
    "place": null,
    "in_reply_to_status_id": 12738040524,
    "in_reply_to_status_id_str": "12738040524"
  }
]

Best,
@themattharris

On Oct 18, 5:19 pm, Matt Harris wrote:
> Last week you may remember Twitter planned to enable the new Status ID
> generator - 'Snowflake' but didn't. The purpose of this email is to explain
> the reason why this didn't happen, what we are doing about it, and what the
> new release plan is.
>
> So what is Snowflake?
> ---------------------
> Snowflake is a service we will be using to generate unique Tweet IDs. These
> Tweet IDs are unique 64bit unsigned integers, which, instead of being
> sequential like the current IDs, are based on time. The full ID is composed
> of a timestamp, a worker number, and a sequence number.
>
> The problem
> -----------
> Before launch it came to our attention that some programming languages such
> as Javascript cannot support numbers with >53bits. This can be easily
> examined by running a command similar to: (90071992547409921).toString() in
> your browsers console or by running the following JSON snippet through your
> JSON parser.
> > {"id": 10765432100123456789, "id_str": "10765432100123456789"} > > In affected JSON parsers the ID will not be converted successfully and will > lose accuracy. In some parsers there may even be an exception. > > The solution > > To allow javascript and JSON parsers to read the IDs we need to include a > string version of any ID when responding in the JSON format. What this means > is Status, User, Direct Message and Saved Search IDs in the Twitter API will > now be returned as an integer and a string in JSON responses. This will > apply to the main Twitter API, the Streaming API and the Search API. > > For example, a status object will now contain an id and an id_str. The > following JSON representation of a status object shows the two versions of > the ID fields for each data point. > > [ > { > "coordinates": null, > "truncated": false, > "created_at": "Thu Oct 14 22:20:15 + 2010", > "favorited": false, > "entities": { > "urls": [ > ], > "hashtags": [ > ], > "user_mentions": [ > { > "name": "Matt Harris", > "id": 777925, > "id_str": "777925", > "indices": [ > 0, > 14 > ], > "screen_name": "themattharris" > } > ] > }, > "text": "@themattharris hey how are things?", > "annotations": null, > "contributors": [ > { > "id": 819797, > "id_str": "819797", > "screen_name": "episod" > } > ], > "id": 12738165059, > "id_str": "12738165059", > "retweet_count": 0, > "geo": null, > "retweeted": false, > "in_reply_to_user_id": 777925, > "in_reply_to_user_id_str": "777925", > "in_reply_to_screen_name": "themattharris", > "user": { > "id": 6253282 > "id_str": "6253282" > }, > "source": "web", > "place": null, > "in_reply_to_status_id": 12738040524 > "in_reply_to_status_id_str": "12738040524" > } > ] > > What should you do - RIGHT NOW > -- > The first thing you should do is attempt to decode the JSON snippet above > using your production code parser. Observe the output to confirm the ID has > not lost accuracy. 
> > What you do next depends on what happens: > > * If your code converts the ID successfully without losing accuracy you are > OK but should consider converting to the _str versions of IDs as soon as > possible. > * If your code has lost accuracy, convert your code to using the _str > version immediately. If you do not do this your code will be unable to > interact with the Twitter API reliably. > * In some language parsers, the JSON may throw an exception when reading the > ID value. If this happens in your pars