http://git-wip-us.apache.org/repos/asf/incubator-hawq/blob/72ea8afd/depends/thirdparty/thrift/doc/specs/idl.md ---------------------------------------------------------------------- diff --git a/depends/thirdparty/thrift/doc/specs/idl.md b/depends/thirdparty/thrift/doc/specs/idl.md deleted file mode 100644 index 6da4696..0000000 --- a/depends/thirdparty/thrift/doc/specs/idl.md +++ /dev/null @@ -1,236 +0,0 @@ -## Thrift interface description language -The Thrift interface definition language (IDL) allows for the definition of [Thrift Types](/docs/types). A Thrift IDL file is processed by the Thrift code generator to produce code for the various target languages to support the defined structs and services in the IDL file. - -## Description - -*Under construction* - -Here is a description of the Thrift IDL. - -## Document - -Every Thrift document contains 0 or more headers followed by 0 or more definitions. - - [1] Document ::= Header* Definition* - -## Header - -A header is either a Thrift include, a C++ include, or a namespace declaration. - - [2] Header ::= Include | CppInclude | Namespace - -### Thrift Include - -An include makes all the symbols from another file visible (with a prefix) and adds corresponding include statements into the code generated for this Thrift document. - - [3] Include ::= 'include' Literal - -### C++ Include - -A C++ include adds a custom C++ include to the output of the C++ code generator for this Thrift document. - - [4] CppInclude ::= 'cpp_include' Literal - -### Namespace - -A namespace declares which namespaces/package/module/etc. the type definitions in this file will be declared in for the target languages. The namespace scope indicates which language the namespace applies to; a scope of '*' indicates that the namespace applies to all target languages. 
- - [5] Namespace ::= ( 'namespace' ( NamespaceScope Identifier ) | - ( 'smalltalk.category' STIdentifier ) | - ( 'smalltalk.prefix' Identifier ) ) | - ( 'php_namespace' Literal ) | - ( 'xsd_namespace' Literal ) - - [6] NamespaceScope ::= '*' | 'cpp' | 'java' | 'py' | 'perl' | 'rb' | 'cocoa' | 'csharp' - -N.B.: Smalltalk has two distinct types of namespace commands: - -- smalltalk.prefix: Prepended to generated classnames. - - Smalltalk does not have namespaces for classes, so prefixes - are used to avoid class-name collisions. - Often, the prefix is the author's initials, like "KB" or "JWS", - or an abbreviation of the package name, like "MC" for "Monticello". -- smalltalk.category: Determines the category for generated classes. - Any dots in the identifier will be replaced with hyphens when generating - the category name. - If not provided, defaults to "Generated-" + the program name. - Methods will not be categorized beyond "as yet uncategorized". - - Smalltalk allows filing both classes and methods within classes into named - groups. These named groups of methods are called categories. - -N.B.: The `php_namespace` directive will be deprecated at some point in the future in favor of the scoped syntax, but the scoped syntax is not yet supported for PHP. - -N.B.: The `xsd_namespace` directive has some purpose internal to Facebook but serves no purpose in Thrift itself. Use of this feature is strongly discouraged - -## Definition - - [7] Definition ::= Const | Typedef | Enum | Senum | Struct | Union | Exception | Service - -### Const - - [8] Const ::= 'const' FieldType Identifier '=' ConstValue ListSeparator? - -### Typedef - -A typedef creates an alternate name for a type. - - [9] Typedef ::= 'typedef' DefinitionType Identifier - -### Enum - -An enum creates an enumerated type, with named values. If no constant value is supplied, the value is either 0 for the first element, or one greater than the preceding value for any subsequent element. 
Any constant value that is supplied must be non-negative. - - [10] Enum ::= 'enum' Identifier '{' (Identifier ('=' IntConstant)? ListSeparator?)* '}' - -### Senum - -Senum (and Slist) are now deprecated and should both be replaced with String. - - [11] Senum ::= 'senum' Identifier '{' (Literal ListSeparator?)* '}' - -### Struct - -Structs are the fundamental compositional type in Thrift. The name of each field must be unique within the struct. - - [12] Struct ::= 'struct' Identifier 'xsd_all'? '{' Field* '}' - -N.B.: The `xsd_all` keyword has some purpose internal to Facebook but serves no purpose in Thrift itself. Use of this feature is strongly discouraged. - -### Union - -Unions are similar to structs, except that they provide a means to transport exactly one field of a possible set of fields, just like union {} in C++. Consequently, union members cannot be required fields. - - [13] Union ::= 'union' Identifier 'xsd_all'? '{' Field* '}' - -N.B.: The `xsd_all` keyword has some purpose internal to Facebook but serves no purpose in Thrift itself. Use of this feature is strongly discouraged. - -### Exception - -Exceptions are similar to structs except that they are intended to integrate with the native exception handling mechanisms in the target languages. The name of each field must be unique within the exception. - - [14] Exception ::= 'exception' Identifier '{' Field* '}' - -### Service - -A service provides the interface for a set of functionality provided by a Thrift server. The interface is simply a list of functions. A service can extend another service, which simply means that it provides the functions of the extended service in addition to its own. - - [15] Service ::= 'service' Identifier ( 'extends' Identifier )? '{' Function* '}' - -## Field - - [16] Field ::= FieldID? FieldReq? FieldType Identifier ('=' ConstValue)? XsdFieldOptions ListSeparator? 
- -### Field ID - - [17] FieldID ::= IntConstant ':' - -### Field Requiredness - - [18] FieldReq ::= 'required' | 'optional' - -### XSD Options - -N.B.: These have some internal purpose at Facebook but serve no current purpose in Thrift. Use of these options is strongly discouraged. - - [19] XsdFieldOptions ::= 'xsd_optional'? 'xsd_nillable'? XsdAttrs? - - [20] XsdAttrs ::= 'xsd_attrs' '{' Field* '}' - -## Functions - - [21] Function ::= 'oneway'? FunctionType Identifier '(' Field* ')' Throws? ListSeparator? - - [22] FunctionType ::= FieldType | 'void' - - [23] Throws ::= 'throws' '(' Field* ')' - -## Types - - [24] FieldType ::= Identifier | BaseType | ContainerType - - [25] DefinitionType ::= BaseType | ContainerType - - [26] BaseType ::= 'bool' | 'byte' | 'i16' | 'i32' | 'i64' | 'double' | 'string' | 'binary' | 'slist' - - [27] ContainerType ::= MapType | SetType | ListType - - [28] MapType ::= 'map' CppType? '<' FieldType ',' FieldType '>' - - [29] SetType ::= 'set' CppType? '<' FieldType '>' - - [30] ListType ::= 'list' '<' FieldType '>' CppType? - - [31] CppType ::= 'cpp_type' Literal - -## Constant Values - - [32] ConstValue ::= IntConstant | DoubleConstant | Literal | Identifier | ConstList | ConstMap - - [33] IntConstant ::= ('+' | '-')? Digit+ - - [34] DoubleConstant ::= ('+' | '-')? Digit* ('.' Digit+)? ( ('E' | 'e') IntConstant )? - - [35] ConstList ::= '[' (ConstValue ListSeparator?)* ']' - - [36] ConstMap ::= '{' (ConstValue ':' ConstValue ListSeparator?)* '}' - -## Basic Definitions - -### Literal - - [37] Literal ::= ('"' [^"]* '"') | ("'" [^']* "'") - -### Identifier - - [38] Identifier ::= ( Letter | '_' ) ( Letter | Digit | '.' | '_' )* - - [39] STIdentifier ::= ( Letter | '_' ) ( Letter | Digit | '.' 
| '_' | '-' )* - -### List Separator - - [40] ListSeparator ::= ',' | ';' - -### Letters and Digits - - [41] Letter ::= ['A'-'Z'] | ['a'-'z'] - - [42] Digit ::= ['0'-'9'] - -## Examples - -Here are some examples of Thrift definitions, using the Thrift IDL: - - * [ThriftTest.thrift][] used by the Thrift TestFramework - * Thrift [tutorial][] - * Facebook's [fb303.thrift][] - * [Apache Cassandra's][] Thrift IDL: [cassandra.thrift][] - * [Evernote API][] - - [ThriftTest.thrift]: https://git-wip-us.apache.org/repos/asf?p=thrift.git;a=blob_plain;f=test/ThriftTest.thrift;hb=HEAD - [tutorial]: /tutorial/ - [fb303.thrift]: https://git-wip-us.apache.org/repos/asf?p=thrift.git;a=blob_plain;f=contrib/fb303/if/fb303.thrift;hb=HEAD - [Apache Cassandra's]: http://cassandra.apache.org/ - [cassandra.thrift]: http://svn.apache.org/viewvc/cassandra/trunk/interface/cassandra.thrift?view=co - [Evernote API]: http://www.evernote.com/about/developer/api/ - -## To Do/Questions - -Initialization of Base Types for all Languages? - - * Do all Languages initialize them to 0, bool=false and string=""? or null, undefined? - -Why does position of `CppType` vary between `SetType` and `ListType`? - - * std::set does sort the elements automatically, that's the design. See [Thrift Types](/docs/types) or the [C++ std:set reference][] for further details - * The question is, how do other languages do that? What about custom objects, do they have a Compare function to set the order correctly? - - [C++ std:set reference]: http://www.cplusplus.com/reference/stl/set/ - -Why can't `DefinitionType` be the same as `FieldType` (i.e. include `Identifier`)? - -Examine the `smalltalk.prefix` and `smalltalk.category` status (esp `smalltalk.category`, which takes `STIdentifier` as its argument)... - -What to do about `ListSeparator`? Do we really want to be as lax as we currently are? - -Should `Field*` really be `Field+` in `Struct`, `Enum`, etc.? -
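The lexical productions [33] `IntConstant`, [37] `Literal`, and [38] `Identifier` in the idl.md grammar above translate almost directly into regular expressions. The following is a minimal Python sketch of that translation, not the Thrift compiler's actual lexer; the names `INT_CONSTANT`, `LITERAL`, `IDENTIFIER`, and `matches` are hypothetical and exist only for this illustration.

```python
import re

# Hypothetical regexes transcribed from the grammar productions; this is an
# illustration of the rules, not code taken from the Thrift compiler.

# [33] IntConstant ::= ('+' | '-')? Digit+
INT_CONSTANT = re.compile(r"[+-]?[0-9]+")

# [37] Literal ::= ('"' [^"]* '"') | ("'" [^']* "'")
LITERAL = re.compile(r'"[^"]*"|\'[^\']*\'')

# [38] Identifier ::= ( Letter | '_' ) ( Letter | Digit | '.' | '_' )*
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9._]*")


def matches(pattern, text):
    """Return True when the whole string is derivable from the production."""
    return pattern.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ("shared.SharedStruct", "9lives", "_private"):
        print(sample, matches(IDENTIFIER, sample))
```

Note that, per production [37], a literal has no escape mechanism: a double-quoted literal simply cannot contain a double quote.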
http://git-wip-us.apache.org/repos/asf/incubator-hawq/blob/72ea8afd/depends/thirdparty/thrift/doc/specs/thrift-protocol-spec.md ---------------------------------------------------------------------- diff --git a/depends/thirdparty/thrift/doc/specs/thrift-protocol-spec.md b/depends/thirdparty/thrift/doc/specs/thrift-protocol-spec.md deleted file mode 100644 index 950e163..0000000 --- a/depends/thirdparty/thrift/doc/specs/thrift-protocol-spec.md +++ /dev/null @@ -1,99 +0,0 @@ -Thrift Protocol Structure -==================================================================== - -Last Modified: 2007-Jun-29 - --------------------------------------------------------------------- - -Licensed to the Apache Software Foundation (ASF) under one -or more contributor license agreements. See the NOTICE file -distributed with this work for additional information -regarding copyright ownership. The ASF licenses this file -to you under the Apache License, Version 2.0 (the -"License"); you may not use this file except in compliance -with the License. You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, -software distributed under the License is distributed on an -"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -KIND, either express or implied. See the License for the -specific language governing permissions and limitations -under the License. - --------------------------------------------------------------------- - -This document describes the structure of the Thrift protocol -without specifying the encoding. Thus, the order of elements -could in some cases be rearranged depending upon the TProtocol -implementation, but this document specifies the minimum required -structure. There are some "dumb" terminals like STRING and INT -that take the place of an actual encoding specification. - -The key point to notice is that ALL messages are just one wrapped -`<struct>`. 
Depending upon the message type, the `<struct>` can be -interpreted as the argument list to a function, the return value -of a function, or an exception. - --------------------------------------------------------------------- - -``` - <message> ::= <message-begin> <struct> <message-end> - - <message-begin> ::= <method-name> <message-type> <message-seqid> - - <method-name> ::= STRING - - <message-type> ::= T_CALL | T_REPLY | T_EXCEPTION | T_ONEWAY - - <message-seqid> ::= I32 - - <struct> ::= <struct-begin> <field>* <field-stop> <struct-end> - - <struct-begin> ::= <struct-name> - - <struct-name> ::= STRING - - <field-stop> ::= T_STOP - - <field> ::= <field-begin> <field-data> <field-end> - - <field-begin> ::= <field-name> <field-type> <field-id> - - <field-name> ::= STRING - - <field-type> ::= T_BOOL | T_BYTE | T_I8 | T_I16 | T_I32 | T_I64 | T_DOUBLE - | T_STRING | T_BINARY | T_STRUCT | T_MAP | T_SET | T_LIST - - <field-id> ::= I16 - - <field-data> ::= I8 | I16 | I32 | I64 | DOUBLE | STRING | BINARY - | <struct> | <map> | <list> | <set> - - <map> ::= <map-begin> <field-datum>* <map-end> - - <map-begin> ::= <map-key-type> <map-value-type> <map-size> - - <map-key-type> ::= <field-type> - -<map-value-type> ::= <field-type> - - <map-size> ::= I32 - - <list> ::= <list-begin> <field-data>* <list-end> - - <list-begin> ::= <list-elem-type> <list-size> - -<list-elem-type> ::= <field-type> - - <list-size> ::= I32 - - <set> ::= <set-begin> <field-data>* <set-end> - - <set-begin> ::= <set-elem-type> <set-size> - - <set-elem-type> ::= <field-type> - - <set-size> ::= I32 -``` http://git-wip-us.apache.org/repos/asf/incubator-hawq/blob/72ea8afd/depends/thirdparty/thrift/doc/specs/thrift-sasl-spec.txt ---------------------------------------------------------------------- diff --git a/depends/thirdparty/thrift/doc/specs/thrift-sasl-spec.txt b/depends/thirdparty/thrift/doc/specs/thrift-sasl-spec.txt deleted file mode 100644 index 02cf79e..0000000 --- 
a/depends/thirdparty/thrift/doc/specs/thrift-sasl-spec.txt +++ /dev/null @@ -1,108 +0,0 @@ -A Thrift SASL message shall be a byte array of the following form: - -| 1-byte status code | 4-byte payload length | variable-length payload | - -The length fields shall be interpreted as integers, with the high byte sent -first. This indicates the length of the field immediately following it, not -including the status code or the length bytes. - -The possible status codes are: - -0x01 - START - Hello, let's go on a date. -0x02 - OK - Everything's been going alright so far, let's see each other again. -0x03 - BAD - I understand what you're saying. I really do. I just don't like it. We have to break up. -0x04 - ERROR - We can't go on like this. It's like you're speaking another language. -0x05 - COMPLETE - Will you marry me? - -The Thrift SASL communication will proceed as follows: - -1. The client is configured at instantiation of the transport with a single -underlying SASL security mechanism that it supports. - -2. The server is configured with a mapping of underlying security mechanism -name -> mechanism options. - -3. At connection time, the client will initiate communication by sending the -server a START message. The payload of this message will be the name of the -underlying security mechanism that the client would like to use. -This mechanism name shall be 1-20 characters in length, and follow the -specifications for SASL mechanism names specified in RFC 2222. - -4. The server receives this message and, if the mechanism name provided is -among the set of mechanisms this server transport is configured to accept, -appropriate initialization of the underlying security mechanism may take place. 
-If the mechanism name is not one which the server is configured to support, the -server shall return the BAD byte, followed by a 4-byte, potentially zero-value -message length, followed by the potentially zero-length payload which may be a -status code or message indicating failure. No further communication may take -place via this transport. If the mechanism name is one which the server -supports, then proceed to step 5. - -5. Following the START message, the client must send another message containing -the "initial response" of the chosen SASL implementation. The client may send -this message piggy-backed on the "START" message of step 3. The message type -of this message must be either "OK" or "COMPLETE", depending on whether the -SASL implementation indicates that this side of the authentication has been -satisfied. - -6. The server then provides the byte array of the payload received to its -underlying security mechanism. A challenge is generated by the underlying -security mechanism on the server, and this is used as the payload for a message -sent to the client. This message shall consist of an OK byte, followed by the -non-zero message length word, followed by the payload. - -7. The client receives this message from the server and passes the payload to -its underlying security mechanism to generate a response. The client then sends -the server an OK byte, followed by the non-zero-value length of the response, -followed by the bytes of the response as the payload. - -8. Steps 6 and 7 are repeated until both security mechanisms are satisfied with -the challenge/response exchange. When either side has completed its security -protocol, its next message shall be the COMPLETE byte, followed by a 4-byte -potentially zero-value length word, followed by a potentially zero-length -payload. This payload will be empty except for those underlying security -mechanisms which provide additional data with success. 
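The message layout given at the top of this specification (a 1-byte status code, a 4-byte length with the high byte first, then the payload) can be sketched in Python as follows. This is a minimal illustration of the framing only, not code from any Thrift SASL transport; the names `STATUS`, `encode_message`, and `decode_message` are hypothetical, and the mechanism name `PLAIN` in the usage lines is just an example payload.

```python
import struct

# Status codes as listed in the specification.
STATUS = {"START": 0x01, "OK": 0x02, "BAD": 0x03, "ERROR": 0x04, "COMPLETE": 0x05}
STATUS_NAMES = {code: name for name, code in STATUS.items()}


def encode_message(status, payload=b""):
    """Build | 1-byte status | 4-byte big-endian length | payload |."""
    # ">" selects big-endian with no padding: B = 1 byte, I = 4 bytes.
    return struct.pack(">BI", STATUS[status], len(payload)) + payload


def decode_message(data):
    """Split a complete message into (status name, payload bytes)."""
    if len(data) < 5:
        raise ValueError("message shorter than the 5-byte header")
    code, length = struct.unpack(">BI", data[:5])
    if code not in STATUS_NAMES:
        raise ValueError("unknown status code")
    payload = data[5:5 + length]
    if len(payload) != length:
        raise ValueError("payload shorter than the declared length")
    return STATUS_NAMES[code], payload


if __name__ == "__main__":
    # Step 3: the client opens with START carrying the mechanism name.
    wire = encode_message("START", b"PLAIN")
    print(wire.hex())
    print(decode_message(wire))
```

The length counts only the payload, so a COMPLETE message with no additional data is exactly five bytes long.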
- -If at any point in time either side is able to interpret the challenge or -response sent by the other, but is dissatisfied with the contents thereof, this -side should send the other a BAD byte, followed by a 4-byte potentially -zero-value length word, followed by an optional, potentially zero-length -message encoded in UTF-8 indicating failure. This message should be passed to -the protocol above the thrift transport by whatever mechanism is appropriate -and idiomatic for the particular language these thrift bindings are for. - -If at any point in time either side fails to interpret the challenge or -response sent by the other, this side should send the other an ERROR byte, -followed by a 4-byte potentially zero-value length word, followed by an -optional, potentially zero-length message encoded in UTF-8. This message should -be passed to the protocol above the thrift transport by whatever mechanism is -appropriate and idiomatic for the particular language these thrift bindings are -for. - -If step 8 completes successfully, then the communication is considered -authenticated and subsequent communication may commence. - -If step 8 fails to complete successfully, then no further communication may -take place via this transport. - -8. All writes to the underlying transport must be prefixed by the 4-byte length -of the payload data, followed by the payload. All reads from this transport -should read the 4-byte length word, then read the full quantity of bytes -specified by this length word. - -If no SASL QOP (quality of protection) is negotiated during steps 6 and 7, then -all subsequent writes to/reads from this transport are written/read unaltered, -save for the length prefix, to the underlying transport. - -If a SASL QOP is negotiated, then this must be used by the Thrift transport for -all subsequent communication. 
This is done by wrapping subsequent writes to the -transport using the underlying security mechanism, and unwrapping subsequent -reads from the underlying transport. Note that in this case, the length prefix -of the write to the underlying transport is the length of the data after it has -been wrapped by the underlying security mechanism. Note that the complete -message must be read before giving this data to the underlying security -mechanism for unwrapping. - -If at any point in time reading of a message fails either because of a -malformed length word or failure to unwrap by the underlying security -mechanism, then all further communication on this transport must cease. http://git-wip-us.apache.org/repos/asf/incubator-hawq/blob/72ea8afd/depends/thirdparty/thrift/doc/specs/thrift.tex ---------------------------------------------------------------------- diff --git a/depends/thirdparty/thrift/doc/specs/thrift.tex b/depends/thirdparty/thrift/doc/specs/thrift.tex deleted file mode 100644 index a706fcb..0000000 --- a/depends/thirdparty/thrift/doc/specs/thrift.tex +++ /dev/null @@ -1,1057 +0,0 @@ -%----------------------------------------------------------------------------- -% -% Thrift whitepaper -% -% Name: thrift.tex -% -% Authors: Mark Slee (mcs...@facebook.com) -% -% Created: 05 March 2007 -% -% You will need a copy of sigplanconf.cls to format this document. -% It is available at <http://www.sigplan.org/authorInformation.htm>. -% -%----------------------------------------------------------------------------- - - -\documentclass[nocopyrightspace,blockstyle]{sigplanconf} - -\usepackage{amssymb} -\usepackage{amsfonts} -\usepackage{amsmath} -\usepackage{url} - -\begin{document} - -% \conferenceinfo{WXYZ '05}{date, City.} -% \copyrightyear{2007} -% \copyrightdata{[to be supplied]} - -% \titlebanner{banner above paper title} % These are ignored unless -% \preprintfooter{short description of paper} % 'preprint' option specified. 
- -\title{Thrift: Scalable Cross-Language Services Implementation} -\subtitle{} - -\authorinfo{Mark Slee, Aditya Agarwal and Marc Kwiatkowski} - {Facebook, 156 University Ave, Palo Alto, CA} - {\{mcslee,aditya,marc\}@facebook.com} - -\maketitle - -\begin{abstract} -Thrift is a software library and set of code-generation tools developed at -Facebook to expedite development and implementation of efficient and scalable -backend services. Its primary goal is to enable efficient and reliable -communication across programming languages by abstracting the portions of each -language that tend to require the most customization into a common library -that is implemented in each language. Specifically, Thrift allows developers to -define datatypes and service interfaces in a single language-neutral file -and generate all the necessary code to build RPC clients and servers. - -This paper details the motivations and design choices we made in Thrift, as -well as some of the more interesting implementation details. It is not -intended to be taken as research, but rather it is an exposition on what we did -and why. -\end{abstract} - -% \category{D.3.3}{Programming Languages}{Language constructs and features} - -%\terms -%Languages, serialization, remote procedure call - -%\keywords -%Data description language, interface definition language, remote procedure call - -\section{Introduction} -As Facebook's traffic and network structure have scaled, the resource -demands of many operations on the site (i.e. search, -ad selection and delivery, event logging) have presented technical requirements -drastically outside the scope of the LAMP framework. In our implementation of -these services, various programming languages have been selected to -optimize for the right combination of performance, ease and speed of -development, availability of existing libraries, etc. 
By and large, -Facebook's engineering culture has tended towards choosing the best -tools and implementations available over standardizing on any one -programming language and begrudgingly accepting its inherent limitations. - -Given this design choice, we were presented with the challenge of building -a transparent, high-performance bridge across many programming languages. -We found that most available solutions were either too limited, did not offer -sufficient datatype freedom, or suffered from subpar performance. -\footnote{See Appendix A for a discussion of alternative systems.} - -The solution that we have implemented combines a language-neutral software -stack implemented across numerous programming languages and an associated code -generation engine that transforms a simple interface and data definition -language into client and server remote procedure call libraries. -Choosing static code generation over a dynamic system allows us to create -validated code that can be run without the need for -any advanced introspective run-time type checking. It is also designed to -be as simple as possible for the developer, who can typically define all -the necessary data structures and interfaces for a complex service in a single -short file. - -Surprised that a robust open solution to these relatively common problems -did not yet exist, we committed early on to making the Thrift implementation -open source. - -In evaluating the challenges of cross-language interaction in a networked -environment, some key components were identified: - -\textit{Types.} A common type system must exist across programming languages -without requiring that the application developer use custom Thrift datatypes -or write their own serialization code. That is, -a C++ programmer should be able to transparently exchange a strongly typed -STL map for a dynamic Python dictionary. Neither -programmer should be forced to write any code below the application layer -to achieve this. 
Section 2 details the Thrift type system. - -\textit{Transport.} Each language must have a common interface to -bidirectional raw data transport. The specifics of how a given -transport is implemented should not matter to the service developer. -The same application code should be able to run against TCP stream sockets, -raw data in memory, or files on disk. Section 3 details the Thrift Transport -layer. - -\textit{Protocol.} Datatypes must have some way of using the Transport -layer to encode and decode themselves. Again, the application -developer need not be concerned by this layer. Whether the service uses -an XML or binary protocol is immaterial to the application code. -All that matters is that the data can be read and written in a consistent, -deterministic manner. Section 4 details the Thrift Protocol layer. - -\textit{Versioning.} For robust services, the involved datatypes must -provide a mechanism for versioning themselves. Specifically, -it should be possible to add or remove fields in an object or alter the -argument list of a function without any interruption in service (or, -worse yet, nasty segmentation faults). Section 5 details Thrift's versioning -system. - -\textit{Processors.} Finally, we generate code capable of processing data -streams to accomplish remote procedure calls. Section 6 details the generated -code and TProcessor paradigm. - -Section 7 discusses implementation details, and Section 8 describes -our conclusions. - -\section{Types} - -The goal of the Thrift type system is to enable programmers to develop using -completely natively defined types, no matter what programming language they -use. By design, the Thrift type system does not introduce any special dynamic -types or wrapper objects. It also does not require that the developer write -any code for object serialization or transport. 
The Thrift IDL (Interface -Definition Language) file is -logically a way for developers to annotate their data structures with the -minimal amount of extra information necessary to tell a code generator -how to safely transport the objects across languages. - -\subsection{Base Types} - -The type system rests upon a few base types. In considering which types to -support, we aimed for clarity and simplicity over abundance, focusing -on the key types available in all programming languages, omitting any -niche types available only in specific languages. - -The base types supported by Thrift are: -\begin{itemize} -\item \texttt{bool} A boolean value, true or false -\item \texttt{byte} A signed byte -\item \texttt{i16} A 16-bit signed integer -\item \texttt{i32} A 32-bit signed integer -\item \texttt{i64} A 64-bit signed integer -\item \texttt{double} A 64-bit floating point number -\item \texttt{string} An encoding-agnostic text or binary string -\item \texttt{binary} A byte array representation for blobs -\end{itemize} - -Of particular note is the absence of unsigned integer types. Because these -types have no direct translation to native primitive types in many languages, -the advantages they afford are lost. Further, there is no way to prevent the -application developer in a language like Python from assigning a negative value -to an integer variable, leading to unpredictable behavior. From a design -standpoint, we observed that unsigned integers were very rarely, if ever, used -for arithmetic purposes, but in practice were much more often used as keys or -identifiers. In this case, the sign is irrelevant. Signed integers serve this -same purpose and can be safely cast to their unsigned counterparts (most -commonly in C++) when absolutely necessary. - -\subsection{Structs} - -A Thrift struct defines a common object to be used across languages. A struct -is essentially equivalent to a class in object oriented programming -languages. 
A struct has a set of strongly typed fields, each with a unique -name identifier. The basic syntax for defining a Thrift struct looks very -similar to a C struct definition. Fields may be annotated with an integer field -identifier (unique to the scope of that struct) and optional default values. -Field identifiers will be automatically assigned if omitted, though they are -strongly encouraged for versioning reasons discussed later. - -\subsection{Containers} - -Thrift containers are strongly typed containers that map to the most commonly -used containers in common programming languages. They are annotated using -the C++ template (or Java Generics) style. There are three types available: -\begin{itemize} -\item \texttt{list<type>} An ordered list of elements. Translates directly into -an STL \texttt{vector}, Java \texttt{ArrayList}, or native array in scripting languages. May -contain duplicates. -\item \texttt{set<type>} An unordered set of unique elements. Translates into -an STL \texttt{set}, Java \texttt{HashSet}, \texttt{set} in Python, or native -dictionary in PHP/Ruby. -\item \texttt{map<type1,type2>} A map of strictly unique keys to values. -Translates into an STL \texttt{map}, Java \texttt{HashMap}, PHP associative -array, or Python/Ruby dictionary. -\end{itemize} - -While defaults are provided, the type mappings are not explicitly fixed. Custom -code generator directives have been added to substitute custom types in -destination languages (e.g. -\texttt{hash\_map} or Google's sparse hash map can be used in C++). The -only requirement is that the custom types support all the necessary iteration -primitives. Container elements may be of any valid Thrift type, including other -containers or structs. 
- -\begin{verbatim} -struct Example { - 1:i32 number=10, - 2:i64 bigNumber, - 3:double decimals, - 4:string name="thrifty" -}\end{verbatim} - -In the target language, each definition generates a type with two methods, -\texttt{read} and \texttt{write}, which perform serialization and transport -of the objects using a Thrift TProtocol object. - -\subsection{Exceptions} - -Exceptions are syntactically and functionally equivalent to structs except -that they are declared using the \texttt{exception} keyword instead of the -\texttt{struct} keyword. - -The generated objects inherit from an exception base class as appropriate -in each target programming language, in order to seamlessly -integrate with native exception handling in any given -language. Again, the design emphasis is on making the code familiar to the -application developer. - -\subsection{Services} - -Services are defined using Thrift types. Definition of a service is -semantically equivalent to defining an interface (or a pure virtual abstract -class) in object oriented -programming. The Thrift compiler generates fully functional client and -server stubs that implement the interface. Services are defined as follows: - -\begin{verbatim} -service <name> { - <returntype> <name>(<arguments>) - [throws (<exceptions>)] - ... -}\end{verbatim} - -An example: - -\begin{verbatim} -service StringCache { - void set(1:i32 key, 2:string value), - string get(1:i32 key) throws (1:KeyNotFound knf), - void delete(1:i32 key) -} -\end{verbatim} - -Note that \texttt{void} is a valid type for a function return, in addition to -all other defined Thrift types. Additionally, an \texttt{async} modifier -keyword may be added to a \texttt{void} function, which will generate code that does -not wait for a response from the server. Note that a pure \texttt{void} -function will return a response to the client which guarantees that the -operation has completed on the server side. 
With \texttt{async} method calls -the client will only be guaranteed that the request succeeded at the -transport layer. (In many transport scenarios this is inherently unreliable -due to the Byzantine Generals' Problem. Therefore, application developers -should take care only to use the async optimization in cases where dropped -method calls are acceptable or the transport is known to be reliable.) - -Also of note is the fact that argument lists and exception lists for functions -are implemented as Thrift structs. All three constructs are identical in both -notation and behavior. - -\section{Transport} - -The transport layer is used by the generated code to facilitate data transfer. - -\subsection{Interface} - -A key design choice in the implementation of Thrift was to decouple the -transport layer from the code generation layer. Though Thrift is typically -used on top of the TCP/IP stack with streaming sockets as the base layer of -communication, there was no compelling reason to build that constraint into -the system. The performance tradeoff incurred by an abstracted I/O layer -(roughly one virtual method lookup / function call per operation) was -immaterial compared to the cost of actual I/O operations (typically invoking -system calls). - -Fundamentally, generated Thrift code only needs to know how to read and -write data. The origin and destination of the data are irrelevant; it may be a -socket, a segment of shared memory, or a file on the local disk. 
The Thrift -transport interface supports the following methods: - -\begin{itemize} -\item \texttt{open} Opens the transport -\item \texttt{close} Closes the transport -\item \texttt{isOpen} Indicates whether the transport is open -\item \texttt{read} Reads from the transport -\item \texttt{write} Writes to the transport -\item \texttt{flush} Forces any pending writes -\end{itemize} - -There are a few additional methods not documented here which are used to aid -in batching reads and optionally signaling the completion of a read or -write operation from the generated code. - -In addition to the above -\texttt{TTransport} interface, there is a\\ -\texttt{TServerTransport} interface -used to accept or create primitive transport objects. Its interface is as -follows: - -\begin{itemize} -\item \texttt{open} Opens the transport -\item \texttt{listen} Begins listening for connections -\item \texttt{accept} Returns a new client transport -\item \texttt{close} Closes the transport -\end{itemize} - -\subsection{Implementation} - -The transport interface is designed for simple implementation in any -programming language. New transport mechanisms can be easily defined as needed -by application developers. - -\subsubsection{TSocket} - -The \texttt{TSocket} class is implemented across all target languages. It -provides a common, simple interface to a TCP/IP stream socket. - -\subsubsection{TFileTransport} - -The \texttt{TFileTransport} is an abstraction of an on-disk file to a data -stream. It can be used to write out a set of incoming Thrift requests to a file -on disk. The on-disk data can then be replayed from the log, either for -post-processing or for reproduction and/or simulation of past events. - -\subsubsection{Utilities} - -The Transport interface is designed to support easy extension using common -OOP techniques, such as composition. 
Some simple utilities include the -\texttt{TBufferedTransport}, which buffers the writes and reads on an -underlying transport, the \texttt{TFramedTransport}, which transmits data with frame -size headers for chunking optimization or nonblocking operation, and the -\texttt{TMemoryBuffer}, which allows reading and writing directly from the heap -or stack memory owned by the process. - -\section{Protocol} - -A second major abstraction in Thrift is the separation of data structure from -transport representation. Thrift enforces a certain messaging structure when -transporting data, but it is agnostic to the protocol encoding in use. That is, -it does not matter whether data is encoded as XML, human-readable ASCII, or a -dense binary format as long as the data supports a fixed set of operations -that allow it to be deterministically read and written by generated code. - -\subsection{Interface} - -The Thrift Protocol interface is very straightforward. It fundamentally -supports two things: 1) bidirectional sequenced messaging, and -2) encoding of base types, containers, and structs. 
- -\begin{verbatim} -writeMessageBegin(name, type, seq) -writeMessageEnd() -writeStructBegin(name) -writeStructEnd() -writeFieldBegin(name, type, id) -writeFieldEnd() -writeFieldStop() -writeMapBegin(ktype, vtype, size) -writeMapEnd() -writeListBegin(etype, size) -writeListEnd() -writeSetBegin(etype, size) -writeSetEnd() -writeBool(bool) -writeByte(byte) -writeI16(i16) -writeI32(i32) -writeI64(i64) -writeDouble(double) -writeString(string) - -name, type, seq = readMessageBegin() - readMessageEnd() -name = readStructBegin() - readStructEnd() -name, type, id = readFieldBegin() - readFieldEnd() -k, v, size = readMapBegin() - readMapEnd() -etype, size = readListBegin() - readListEnd() -etype, size = readSetBegin() - readSetEnd() -bool = readBool() -byte = readByte() -i16 = readI16() -i32 = readI32() -i64 = readI64() -double = readDouble() -string = readString() -\end{verbatim} - -Note that every \texttt{write} function has exactly one \texttt{read} counterpart, with -the exception of \texttt{writeFieldStop()}. This is a special method -that signals the end of a struct. The procedure for reading a struct is to -\texttt{readFieldBegin()} until the stop field is encountered, and then to -\texttt{readStructEnd()}. The -generated code relies upon this call sequence to ensure that everything written by -a protocol encoder can be read by a matching protocol decoder. Further note -that this set of functions is by design more robust than necessary. -For example, \texttt{writeStructEnd()} is not strictly necessary, as the end of -a struct may be implied by the stop field. This method is a convenience for -verbose protocols in which it is cleaner to separate these calls (e.g. a closing -\texttt{</struct>} tag in XML). - -\subsection{Structure} - -Thrift structures are designed to support encoding into a streaming -protocol. The implementation should never need to frame or compute the -entire data length of a structure prior to encoding it. 
This is critical to -performance in many scenarios. Consider a long list of relatively large -strings. If the protocol interface required reading or writing a list to be an -atomic operation, then the implementation would need to perform a linear pass over the -entire list before encoding any data. However, if the list can be written -as iteration is performed, the corresponding read may begin in parallel, -theoretically offering an end-to-end speedup of $(kN - C)$, where $N$ is the size -of the list, $k$ is the cost factor associated with serializing a single -element, and $C$ is a fixed offset for the delay between data being written -and becoming available to read. - -Similarly, structs do not encode their data lengths a priori. Instead, they are -encoded as a sequence of fields, with each field having a type specifier and a -unique field identifier. Note that the inclusion of type specifiers allows -the protocol to be safely parsed and decoded without any generated code -or access to the original IDL file. Structs are terminated by a field header -with a special \texttt{STOP} type. Because all the basic types can be read -deterministically, all structs (even those containing other structs) can be -read deterministically. The Thrift protocol is self-delimiting without any -framing and regardless of the encoding format. - -In situations where streaming is unnecessary or framing is advantageous, it -can be very simply added into the transport layer, using the -\texttt{TFramedTransport} abstraction. - -\subsection{Implementation} - -Facebook has implemented and deployed a space-efficient binary protocol which -is used by most backend services. Essentially, it writes all data -in a flat binary format. Integer types are converted to network byte order, -strings are prepended with their byte length, and all message and field headers -are written using the primitive integer serialization constructs.
String names -for fields are omitted; when using generated code, field identifiers are -sufficient. - -We decided against some extreme storage optimizations (e.g., packing -small integers into ASCII or using a 7-bit continuation format) for the sake -of simplicity and clarity in the code. These alterations can easily be made -if and when we encounter a performance-critical use case that demands them. - -\section{Versioning} - -Thrift is robust in the face of versioning and data definition changes. This -is critical to enable staged rollouts of changes to deployed services. The -system must be able to support reading of old data from log files, as well as -requests from out-of-date clients to new servers, and vice versa. - -\subsection{Field Identifiers} - -Versioning in Thrift is implemented via field identifiers. The field header -for every member of a struct in Thrift is encoded with a unique field -identifier. The combination of this field identifier and its type specifier -is used to uniquely identify the field. The Thrift definition language -supports automatic assignment of field identifiers, but it is good -programming practice to always explicitly specify field identifiers. -Identifiers are specified as follows: - -\begin{verbatim} -struct Example { - 1:i32 number=10, - 2:i64 bigNumber, - 3:double decimals, - 4:string name="thrifty" -}\end{verbatim} - -To avoid conflicts between manually and automatically assigned identifiers, -fields with identifiers omitted are assigned identifiers -decrementing from -1, and the language only supports the manual assignment of -positive identifiers. - -When data is being deserialized, the generated code can use these identifiers -to properly identify the field and determine whether it aligns with a field in -its definition file. If a field identifier is not recognized, the generated -code can use the type specifier to skip the unknown field without any error.
-Again, this is possible because all datatypes are -self-delimiting. - -Field identifiers can (and should) also be specified in function argument -lists. In fact, argument lists are not only represented as structs on the -backend, but actually share the same code in the compiler frontend. This -allows for version-safe modification of method parameters. - -\begin{verbatim} -service StringCache { - void set(1:i32 key, 2:string value), - string get(1:i32 key) throws (1:KeyNotFound knf), - void delete(1:i32 key) -} -\end{verbatim} - -The syntax for specifying field identifiers was chosen to echo their structure. -Structs can be thought of as a dictionary where the identifiers are keys, and -the values are strongly-typed named fields. - -Field identifiers internally use the \texttt{i16} Thrift type. Note, however, -that the \texttt{TProtocol} abstraction may encode identifiers in any format. - -\subsection{Isset} - -When an unexpected field is encountered, it can be safely ignored and -discarded. When an expected field is not found, there must be some way to -signal to the developer that it was not present. This is implemented via an -inner \texttt{isset} structure inside the defined objects. (Isset functionality -is implicit with a \texttt{null} value in PHP, \texttt{None} in Python -and \texttt{nil} in Ruby.) Essentially, -the inner \texttt{isset} object of each Thrift struct contains a boolean value -for each field which denotes whether or not that field is present in the -struct. When a reader receives a struct, it should check for a field being set -before operating directly on it.
- -\begin{verbatim} -class Example { - public: - Example() : - number(10), - bigNumber(0), - decimals(0), - name("thrifty") {} - - int32_t number; - int64_t bigNumber; - double decimals; - std::string name; - - struct __isset { - __isset() : - number(false), - bigNumber(false), - decimals(false), - name(false) {} - bool number; - bool bigNumber; - bool decimals; - bool name; - } __isset; -... -} -\end{verbatim} - -\subsection{Case Analysis} - -There are four cases in which version mismatches may occur. - -\begin{enumerate} -\item \textit{Added field, old client, new server.} In this case, the old -client does not send the new field. The new server recognizes that the field -is not set, and implements default behavior for out-of-date requests. -\item \textit{Removed field, old client, new server.} In this case, the old -client sends the removed field. The new server simply ignores it. -\item \textit{Added field, new client, old server.} The new client sends a -field that the old server does not recognize. The old server simply ignores -it and processes as normal. -\item \textit{Removed field, new client, old server.} This is the most -dangerous case, as the old server is unlikely to have suitable default -behavior implemented for the missing field. It is recommended that in this -situation the new server be rolled out prior to the new clients. -\end{enumerate} - -\subsection{Protocol/Transport Versioning} -The \texttt{TProtocol} abstractions are also designed to give protocol -implementations the freedom to version themselves in whatever manner they -see fit. Specifically, any protocol implementation is free to send whatever -it likes in the \texttt{writeMessageBegin()} call. It is entirely up to the -implementor how to handle versioning at the protocol level. The key point is -that protocol encoding changes are safely isolated from interface definition -version changes. - -Note that the exact same is true of the \texttt{TTransport} interface. 
For -example, if we wished to add some new checksumming or error detection to the -\texttt{TFileTransport}, we could simply add a version header into the -data it writes to the file in such a way that it would still accept old -log files without the given header. - -\section{RPC Implementation} - -\subsection{TProcessor} - -The last core interface in the Thrift design is the \texttt{TProcessor}, -perhaps the most simple of the constructs. The interface is as follows: - -\begin{verbatim} -interface TProcessor { - bool process(TProtocol in, TProtocol out) - throws TException -} -\end{verbatim} - -The key design idea here is that the complex systems we build can fundamentally -be broken down into agents or services that operate on inputs and outputs. In -most cases, there is actually just one input and output (an RPC client) that -needs handling. - -\subsection{Generated Code} - -When a service is defined, we generate a -\texttt{TProcessor} instance capable of handling RPC requests to that service, -using a few helpers. The fundamental structure (illustrated in pseudo-C++) is -as follows: - -\begin{verbatim} -Service.thrift - => Service.cpp - interface ServiceIf - class ServiceClient : virtual ServiceIf - TProtocol in - TProtocol out - class ServiceProcessor : TProcessor - ServiceIf handler - -ServiceHandler.cpp - class ServiceHandler : virtual ServiceIf - -TServer.cpp - TServer(TProcessor processor, - TServerTransport transport, - TTransportFactory tfactory, - TProtocolFactory pfactory) - serve() -\end{verbatim} - -From the Thrift definition file, we generate the virtual service interface. -A client class is generated, which implements the interface and -uses two \texttt{TProtocol} instances to perform the I/O operations. The -generated processor implements the \texttt{TProcessor} interface. 
The generated -code has all the logic to handle RPC invocations via the \texttt{process()} -call, and takes as a parameter an instance of the service interface, as -implemented by the application developer. - -The user provides an implementation of the application interface in separate, -non-generated source code. - -\subsection{TServer} - -Finally, the Thrift core libraries provide a \texttt{TServer} abstraction. -The \texttt{TServer} object generally works as follows. - -\begin{itemize} -\item Use the \texttt{TServerTransport} to get a \texttt{TTransport} -\item Use the \texttt{TTransportFactory} to optionally convert the primitive -transport into a suitable application transport (typically the -\texttt{TBufferedTransportFactory} is used here) -\item Use the \texttt{TProtocolFactory} to create an input and output protocol -for the \texttt{TTransport} -\item Invoke the \texttt{process()} method of the \texttt{TProcessor} object -\end{itemize} - -The layers are appropriately separated such that the server code needs to know -nothing about any of the transports, encodings, or applications in play. The -server encapsulates the logic around connection handling, threading, etc. -while the processor deals with RPC. The only code written by the application -developer lives in the definitional Thrift file and the interface -implementation. - -Facebook has deployed multiple \texttt{TServer} implementations, including -the single-threaded \texttt{TSimpleServer}, thread-per-connection -\texttt{TThreadedServer}, and thread-pooling \texttt{TThreadPoolServer}. - -The \texttt{TProcessor} interface is very general by design. There is no -requirement that a \texttt{TServer} take a generated \texttt{TProcessor} -object. Thrift allows the application developer to easily write any type of -server that operates on \texttt{TProtocol} objects (for instance, a server -could simply stream a certain type of object without any actual RPC method -invocation). 
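The separation between server loop and processor described in this section can be sketched as follows. All types here (\texttt{ToyTransport}, \texttt{ToyProtocol}, \texttt{ToyProcessor}, \texttt{UppercaseProcessor}, and the free function \texttt{serve}) are hypothetical stand-ins written for this illustration; they are not the Thrift library's \texttt{TTransport}, \texttt{TProtocol}, \texttt{TProcessor}, or \texttt{TServer} classes, and the in-memory queues replace any real socket.

```cpp
#include <cassert>
#include <cctype>
#include <queue>
#include <string>

// Toy transport (assumption): two in-memory queues instead of a socket.
struct ToyTransport {
  std::queue<std::string> inbox;   // requests waiting to be read
  std::queue<std::string> outbox;  // responses written back
};

// Toy protocol (assumption): reads and writes whole strings on a transport.
struct ToyProtocol {
  explicit ToyProtocol(ToyTransport& t) : transport(t) {}
  bool hasMessage() const { return !transport.inbox.empty(); }
  std::string readString() {
    assert(hasMessage());
    std::string s = transport.inbox.front();
    transport.inbox.pop();
    return s;
  }
  void writeString(const std::string& s) { transport.outbox.push(s); }
  ToyTransport& transport;
};

// Same shape as the process(in, out) interface shown earlier.
struct ToyProcessor {
  virtual ~ToyProcessor() = default;
  // Handles one request; returns false when nothing is left to read.
  virtual bool process(ToyProtocol& in, ToyProtocol& out) = 0;
};

// Application logic lives only in the processor.
struct UppercaseProcessor : ToyProcessor {
  bool process(ToyProtocol& in, ToyProtocol& out) override {
    if (!in.hasMessage()) return false;
    std::string req = in.readString();
    for (char& c : req) {
      c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    }
    out.writeString(req);
    return true;
  }
};

// Server loop: wraps the transport in protocols and calls process()
// repeatedly, without knowing what the processor does with each request.
int serve(ToyTransport& client, ToyProcessor& processor) {
  ToyProtocol in(client);
  ToyProtocol out(client);
  int handled = 0;
  while (processor.process(in, out)) ++handled;
  return handled;
}
```

Swapping \texttt{UppercaseProcessor} for another \texttt{ToyProcessor} subclass requires no change to \texttt{serve}, which is the property the layering is meant to provide.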
- -\section{Implementation Details} -\subsection{Target Languages} -Thrift currently supports five target languages: C++, Java, Python, Ruby, and -PHP. At Facebook, we have deployed servers predominantly in C++, Java, and -Python. Thrift services implemented in PHP have also been embedded into the -Apache web server, providing transparent backend access to many of our -frontend constructs using a \texttt{THttpClient} implementation of the -\texttt{TTransport} interface. - -Though Thrift was explicitly designed to be much more efficient and robust -than typical web technologies, as we were designing our XML-based REST web -services API we noticed that Thrift could be easily used to define our -service interface. Though we do not currently employ SOAP envelopes (in the -authors' opinions there is already far too much repetitive enterprise Java -software to do that sort of thing), we were able to quickly extend Thrift to -generate XML Schema Definition files for our service, as well as a framework -for versioning different implementations of our web service. Though public -web services are admittedly tangential to Thrift's core use case and design, -Thrift facilitated rapid iteration and affords us the ability to quickly -migrate our entire XML-based web service onto a higher performance system -should the need arise. - -\subsection{Generated Structs} -We made a conscious decision to make our generated structs as transparent as -possible. All fields are publicly accessible; there are no \texttt{set()} and -\texttt{get()} methods. Similarly, use of the \texttt{isset} object is not -enforced. We do not include any \texttt{FieldNotSetException} construct. -Developers have the option to use these fields to write more robust code, but -the system is robust to the developer ignoring the \texttt{isset} construct -entirely and will provide suitable default behavior in all cases. - -This choice was motivated by the desire to ease application development. 
Our stated -goal is not to make developers learn a rich new library in their language of -choice, but rather to generate code that allows them to work with the constructs -that are most familiar in each language. - -We also made the \texttt{read()} and \texttt{write()} methods of the generated -objects public so that the objects can be used outside of the context -of RPC clients and servers. Thrift is a useful tool simply for generating -objects that are easily serializable across programming languages. - -\subsection{RPC Method Identification} -Method calls in RPC are implemented by sending the method name as a string. One -issue with this approach is that longer method names require more bandwidth. -We experimented with using fixed-size hashes to identify methods, but in the -end concluded that the savings were not worth the headaches incurred. Reliably -dealing with conflicts across versions of an interface definition file is -impossible without a meta-storage system (i.e. to generate non-conflicting -hashes for the current version of a file, we would have to know about all -conflicts that ever existed in any previous version of the file). - -We wanted to avoid too many unnecessary string comparisons upon -method invocation. To deal with this, we generate maps from strings to function -pointers, so that invocation is effectively accomplished via a constant-time -hash lookup in the common case. This requires the use of a couple of interesting -code constructs. Because Java does not have function pointers, process -functions are all private member classes implementing a common interface. - -\begin{verbatim} -private class ping implements ProcessFunction { - public void process(int seqid, - TProtocol iprot, - TProtocol oprot) - throws TException - { ...} -} - -HashMap<String,ProcessFunction> processMap_ = - new HashMap<String,ProcessFunction>(); -\end{verbatim} - -In C++, we use a relatively esoteric language construct: member function -pointers.
- -\begin{verbatim} -std::map<std::string, - void (ExampleServiceProcessor::*)(int32_t, - facebook::thrift::protocol::TProtocol*, - facebook::thrift::protocol::TProtocol*)> - processMap_; -\end{verbatim} - -Using these techniques, the cost of string processing is minimized, and we -reap the benefit of being able to easily debug corrupt or misunderstood data by -inspecting it for known string method names. - -\subsection{Servers and Multithreading} -Thrift services require basic multithreading to handle simultaneous -requests from multiple clients. For the Python and Java implementations of -Thrift server logic, the standard threading libraries distributed with the -languages provide adequate support. For the C++ implementation, no standard multithread runtime -library exists. Specifically, robust, lightweight, and portable -thread manager and timer class implementations do not exist. We investigated -existing implementations, namely \texttt{boost::thread}, -\texttt{boost::threadpool}, \texttt{ACE\_Thread\_Manager} and -\texttt{ACE\_Timer}. - -While \texttt{boost::threads}\cite{boost.threads} provides clean, -lightweight and robust implementations of multi-thread primitives (mutexes, -conditions, threads) it does not provide a thread manager or timer -implementation. - -\texttt{boost::threadpool}\cite{boost.threadpool} also looked promising but -was not far enough along for our purposes. We wanted to limit the dependency on -third-party libraries as much as possible. Because\\ -\texttt{boost::threadpool} is -not a pure template library and requires runtime libraries and because it is -not yet part of the official Boost distribution we felt it was not ready for -use in Thrift. As \texttt{boost::threadpool} evolves and especially if it is -added to the Boost distribution we may reconsider our decision to not use it. - -ACE has both a thread manager and timer class in addition to multi-thread -primitives. The biggest problem with ACE is that it is ACE. 
Unlike Boost, ACE -API quality is poor. Everything in ACE has large numbers of dependencies on -everything else in ACE, thus forcing developers to throw out standard -classes, such as STL collections, in favor of ACE's homebrewed implementations. In -addition, unlike Boost, ACE implementations demonstrate little understanding -of the power and pitfalls of C++ programming and take no advantage of modern -templating techniques to ensure compile-time safety and reasonable compiler -error messages. For all these reasons, ACE was rejected. Instead, we chose -to implement our own library, described in the following sections. - -\subsection{Thread Primitives} - -The Thrift thread libraries are implemented in the namespace\\ -\texttt{facebook::thrift::concurrency} and have three components: -\begin{itemize} -\item primitives -\item thread pool manager -\item timer manager -\end{itemize} - -As mentioned above, we were hesitant to introduce any additional dependencies -into Thrift. We decided to use \texttt{boost::shared\_ptr} because it is so -useful for multithreaded applications, it requires no link-time or -runtime libraries (i.e. it is a pure template library) and it is due -to become part of the C++0x standard. - -We implement standard \texttt{Mutex} and \texttt{Condition} classes, and a - \texttt{Monitor} class. The latter is simply a combination of a mutex and -condition variable and is analogous to the \texttt{Monitor} implementation provided for -the Java \texttt{Object} class. This is also sometimes referred to as a barrier. We -provide a \texttt{Synchronized} guard class to allow Java-like synchronized blocks. -This is just a bit of syntactic sugar, but, like its Java counterpart, clearly -delimits critical sections of code. Unlike its Java counterpart, we still -have the ability to programmatically lock, unlock, block, and signal monitors.
- -\begin{verbatim} -void run() { - {Synchronized s(manager->monitor); - if (manager->state == TimerManager::STARTING) { - manager->state = TimerManager::STARTED; - manager->monitor.notifyAll(); - } - } -} -\end{verbatim} - -We again borrowed from Java the distinction between a thread and a runnable -class. A \texttt{Thread} is the actual schedulable object. The -\texttt{Runnable} is the logic to execute within the thread. -The \texttt{Thread} implementation deals with all the platform-specific thread -creation and destruction issues, while the \texttt{Runnable} implementation deals -with the application-specific per-thread logic. The benefit of this approach -is that developers can easily subclass the Runnable class without pulling in -platform-specific super-classes. - -\subsection{Thread, Runnable, and shared\_ptr} -We use \texttt{boost::shared\_ptr} throughout the \texttt{ThreadManager} and -\texttt{TimerManager} implementations to guarantee cleanup of dead objects that can -be accessed by multiple threads. For \texttt{Thread} class implementations, -\texttt{boost::shared\_ptr} usage requires particular attention to make sure -\texttt{Thread} objects are neither leaked nor dereferenced prematurely while -creating and shutting down threads. - -Thread creation requires calling into a C library. (In our case the POSIX -thread library, \texttt{libpthread}, but the same would be true for WIN32 threads). -Typically, the OS makes few, if any, guarantees about when \texttt{ThreadMain}, a C thread's entry-point function, will be called. Therefore, it is -possible that our thread create call, -\texttt{ThreadFactory::newThread()} could return to the caller -well before that time. To ensure that the returned \texttt{Thread} object is not -prematurely cleaned up if the caller gives up its reference prior to the -\texttt{ThreadMain} call, the \texttt{Thread} object makes a weak reference to -itself in its \texttt{start} method. 
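The weak self-reference idea can be sketched in isolation. The example below is only an approximation under stated assumptions: it uses \texttt{std::shared\_ptr} and \texttt{std::weak\_ptr} in place of the \texttt{boost::shared\_ptr} family named in the text, the class \texttt{ToyThread} and its members are hypothetical, and no operating-system thread is actually created; \texttt{threadMain} is simply called directly to stand in for the thread's entry-point function.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a Thread class; not Thrift's implementation.
class ToyThread {
 public:
  // Factory: builds the object inside a shared_ptr and lets the object
  // record a weak reference to itself through that same shared_ptr.
  static std::shared_ptr<ToyThread> create() {
    std::shared_ptr<ToyThread> t(new ToyThread());
    t->self_ = t;
    assert(!t->self_.expired());
    return t;
  }

  // Stand-in for the entry-point function: try to upgrade the weak
  // reference to a strong one before touching the object.
  static bool threadMain(const std::weak_ptr<ToyThread>& weak) {
    std::shared_ptr<ToyThread> strong = weak.lock();
    if (!strong) return false;  // object already destroyed: exit at once
    strong->ran_ = true;        // the Runnable's work would happen here
    return true;
  }

  std::weak_ptr<ToyThread> weakRef() const { return self_; }
  bool ran() const { return ran_; }

 private:
  ToyThread() = default;           // construction only through create()
  std::weak_ptr<ToyThread> self_;  // does not keep the object alive
  bool ran_ = false;
};
```

Because \texttt{self\_} is weak, dropping the last strong reference destroys the object, and a later \texttt{threadMain} call observes an empty pointer instead of dereferencing freed memory.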
- -With the weak reference in hand, the \texttt{ThreadMain} function can attempt to get -a strong reference before entering the \texttt{Runnable::run} method of the -\texttt{Runnable} object bound to the \texttt{Thread}. If no strong references to the -thread are obtained between exiting \texttt{Thread::start} and entering \texttt{ThreadMain}, the weak reference returns \texttt{null} and the function -exits immediately. - -The need for the \texttt{Thread} to make a weak reference to itself has a -significant impact on the API. Since references are managed through the -\texttt{boost::shared\_ptr} templates, the \texttt{Thread} object must have a reference -to itself wrapped by the same \texttt{boost::shared\_ptr} envelope that is returned -to the caller. This necessitated the use of the factory pattern. -\texttt{ThreadFactory} creates the raw \texttt{Thread} object and a -\texttt{boost::shared\_ptr} wrapper, and calls a private helper method of the class -implementing the \texttt{Thread} interface (in this case, \texttt{PosixThread::weakRef}) - to allow it to make a weak reference to itself through the - \texttt{boost::shared\_ptr} envelope. - -\texttt{Thread} and \texttt{Runnable} objects reference each other. A \texttt{Runnable} -object may need to know about the thread in which it is executing, and a \texttt{Thread}, obviously, -needs to know what \texttt{Runnable} object it is hosting. This interdependency is -further complicated because the lifecycle of each object is independent of the -other. An application may create a set of \texttt{Runnable} objects to be reused in different threads, or it may create and forget a \texttt{Runnable} object -once a thread has been created and started for it. - -The \texttt{Thread} class takes a \texttt{boost::shared\_ptr} reference to the hosted -\texttt{Runnable} object in its constructor, while the \texttt{Runnable} class has an -explicit \texttt{thread} method to allow explicit binding of the hosted thread.
-\texttt{ThreadFactory::newThread} binds the objects to each other. - -\subsection{ThreadManager} - -\texttt{ThreadManager} creates a pool of worker threads and -allows applications to schedule tasks for execution as free worker threads -become available. The \texttt{ThreadManager} does not implement dynamic -thread pool resizing, but provides primitives so that applications can add -and remove threads based on load. This approach was chosen because -implementing load metrics and thread pool sizing is very -application-specific. For example, some applications may want to adjust pool size based -on a running average of work arrival rates that are measured via polled -samples. Others may simply wish to react immediately to work-queue -depth high and low water marks. Rather than trying to create a complex -API abstract enough to capture these different approaches, we -simply leave it up to the particular application and provide the -primitives to enact the desired policy and sample current status. - -\subsection{TimerManager} - -\texttt{TimerManager} allows applications to schedule - \texttt{Runnable} objects for execution at some point in the future. Its specific task -is to allow applications to sample \texttt{ThreadManager} load at regular -intervals and make changes to the thread pool size based on application policy. -Of course, it can be used to generate any number of timer or alarm events. - -The default implementation of \texttt{TimerManager} uses a single thread to -execute expired \texttt{Runnable} objects. Thus, if a timer operation needs to -do a large amount of work and especially if it needs to do blocking I/O, -that should be done in a separate thread. - -\subsection{Nonblocking Operation} -Though the Thrift transport interfaces map more directly to a blocking I/O -model, we have implemented a high-performance \texttt{TNonBlockingServer} -in C++ based on \texttt{libevent} and the \texttt{TFramedTransport}.
We -implemented this by moving all I/O into one tight event loop using a -state machine. Essentially, the event loop reads framed requests into -\texttt{TMemoryBuffer} objects. Once entire requests are ready, they are -dispatched to the \texttt{TProcessor} object, which can read directly from -the data in memory. - -\subsection{Compiler} -The Thrift compiler is implemented in C++ using standard \texttt{lex}/\texttt{yacc} -lexing and parsing. Though it could have been implemented with fewer -lines of code in another language (e.g., Python Lex-Yacc (PLY) or \texttt{ocamlyacc}), using C++ -forces explicit definition of the language constructs. Strongly typing the -parse tree elements (debatably) makes the code more approachable for new -developers. - -Code generation is done using two passes. The first pass looks only for -include files and type definitions. Type definitions are not checked during -this phase, since they may depend upon include files. All included files -are sequentially scanned in a first pass. Once the include tree has been -resolved, a second pass over all files is taken that inserts type definitions -into the parse tree and raises an error on any undefined types. The program is -then generated against the parse tree. - -Due to inherent complexities and potential for circular dependencies, -we explicitly disallow forward declaration. Two Thrift structs cannot -each contain an instance of the other. (Since we do not allow \texttt{null} -struct instances in the generated C++ code, this would actually be impossible.) - -\subsection{TFileTransport} -The \texttt{TFileTransport} logs Thrift requests/structs by -framing incoming data with its length and writing it out to disk. -Using a framed on-disk format allows for better error checking and -helps with the processing of a finite number of discrete events. The\\ -\texttt{TFileWriterTransport} uses a system of swapping in-memory buffers -to ensure good performance while logging large amounts of data.
-A Thrift log file is split up into chunks of a specified size; logged messages -are not allowed to cross chunk boundaries. A message that would cross a chunk -boundary will cause padding to be added until the end of the chunk and the -first byte of the message are aligned to the beginning of the next chunk. -Partitioning the file into chunks makes it possible to read and interpret data -from a particular point in the file. - -\section{Facebook Thrift Services} -Thrift has been employed in a large number of applications at Facebook, including -search, logging, mobile, ads and the developer platform. Two specific usages are discussed below. - -\subsection{Search} -Thrift is used as the underlying protocol and transport layer for the Facebook Search service. -The multi-language code generation is well suited for search because it allows for application -development in an efficient server side language (C++) and allows the Facebook PHP-based web application -to make calls to the search service using Thrift PHP libraries. There is also a large -variety of search stats, deployment and testing functionality that is built on top -of generated Python code. Additionally, the Thrift log file format is -used as a redo log for providing real-time search index updates. Thrift has allowed the -search team to leverage each language for its strengths and to develop code at a rapid pace. - -\subsection{Logging} -The Thrift \texttt{TFileTransport} functionality is used for structured logging. Each -service function definition along with its parameters can be considered to be -a structured log entry identified by the function name. This log can then be used for -a variety of purposes, including inline and offline processing, stats aggregation and as a redo log. - -\section{Conclusions} -Thrift has enabled Facebook to build scalable backend -services efficiently by enabling engineers to divide and conquer. 
Application -developers can focus on application code without worrying about the -sockets layer. We avoid duplicated work by writing buffering and I/O logic -in one place, rather than interspersing it in each application. - -Thrift has been employed in a wide variety of applications at Facebook, -including search, logging, mobile, ads, and the developer platform. We have -found that the marginal performance cost incurred by an extra layer of -software abstraction is far eclipsed by the gains in developer efficiency and -systems reliability. - -\appendix - -\section{Similar Systems} -The following are software systems similar to Thrift. Each is (very!) briefly -described: - -\begin{itemize} -\item \textit{SOAP.} XML-based. Designed for web services via HTTP, excessive -XML parsing overhead. -\item \textit{CORBA.} Relatively comprehensive, debatably overdesigned and -heavyweight. Comparably cumbersome software installation. -\item \textit{COM.} Embraced mainly in Windows client software. Not an entirely -open solution. -\item \textit{Pillar.} Lightweight and high-performance, but missing versioning -and abstraction. -\item \textit{Protocol Buffers.} Closed-source, owned by Google. Described in -Sawzall paper. -\end{itemize} - -\acks - -Many thanks for feedback on Thrift (and extreme trial by fire) are due to -Martin Smith, Karl Voskuil and Yishan Wong. - -Thrift is a successor to Pillar, a similar system developed -by Adam D'Angelo, first while at Caltech and continued later at Facebook. -Thrift simply would not have happened without Adam's insights. 
- -\begin{thebibliography}{} - -\bibitem{boost.threads} -Kempf, William, -``Boost.Threads'', -\url{http://www.boost.org/doc/html/threads.html} - -\bibitem{boost.threadpool} -Henkel, Philipp, -``threadpool'', -\url{http://threadpool.sourceforge.net} - -\end{thebibliography} - -\end{document} http://git-wip-us.apache.org/repos/asf/incubator-hawq/blob/72ea8afd/depends/thirdparty/thrift/hawqbuild/Makefile ---------------------------------------------------------------------- diff --git a/depends/thirdparty/thrift/hawqbuild/Makefile b/depends/thirdparty/thrift/hawqbuild/Makefile deleted file mode 100644 index 3e00b41..0000000 --- a/depends/thirdparty/thrift/hawqbuild/Makefile +++ /dev/null @@ -1,37 +0,0 @@ -subdir = depends/thirdparty/thrift -top_builddir = ../../../.. -include Makefile.global - -.PHONY: all install distclean maintainer-clean clean pre-config - -ifeq ($(with_thrift), yes) - -all: build - cd $(top_srcdir)/$(subdir); mkdir -p install; \ - $(MAKE) DESTDIR=$(abs_top_builddir)/$(subdir)/install install - -install: - cd $(top_srcdir)/$(subdir) && $(MAKE) install - -distclean: - cd $(top_srcdir)/$(subdir) && $(MAKE) distclean - cd $(top_srcdir)/$(subdir) && rm -rf install - -maintainer-clean: distclean - -clean: - cd $(top_srcdir)/$(subdir) && $(MAKE) clean - -build: pre-config - cd $(top_srcdir)/$(subdir) && $(MAKE) - -pre-config: - cd $(top_srcdir)/$(subdir); \ - ./bootstrap.sh; \ - ./configure --without-php --prefix=$(prefix) - -else - -all install distclean maintainer-clean clean pre-config: - -endif http://git-wip-us.apache.org/repos/asf/incubator-hawq/blob/72ea8afd/depends/thirdparty/thrift/hawqbuild/Makefile.global.in ---------------------------------------------------------------------- diff --git a/depends/thirdparty/thrift/hawqbuild/Makefile.global.in b/depends/thirdparty/thrift/hawqbuild/Makefile.global.in deleted file mode 100644 index 22a11b0..0000000 --- a/depends/thirdparty/thrift/hawqbuild/Makefile.global.in +++ /dev/null @@ -1,17 +0,0 @@ -# 
Makefile.global.in for HAWQ build with --with-thrift -prefix := @prefix@ -with_thrift = @with_thrift@ - -# Support for VPATH builds -vpath_build = @vpath_build@ -abs_top_srcdir = @abs_top_srcdir@ -abs_top_builddir = @abs_top_builddir@ - -ifneq ($(vpath_build),yes) -top_srcdir = $(top_builddir) -srcdir = ./thrift -else # vpath_build = yes -top_srcdir = $(abs_top_srcdir) -srcdir = $(top_srcdir)/$(subdir) -VPATH = $(srcdir) -endif http://git-wip-us.apache.org/repos/asf/incubator-hawq/blob/72ea8afd/depends/thirdparty/thrift/json-schema.json ---------------------------------------------------------------------- diff --git a/depends/thirdparty/thrift/json-schema.json b/depends/thirdparty/thrift/json-schema.json deleted file mode 100644 index d61216a..0000000 --- a/depends/thirdparty/thrift/json-schema.json +++ /dev/null @@ -1,310 +0,0 @@ -{ - "$schema": "http://json-schema.org/draft-04/schema#", - - "id": "http://thrift.apache.org/program-schema#", - "description": "Schema for Apache Thrift protocol descriptors", - - "definitions": { - "type-id": { - "enum": [ - "void", - "string", - "bool", - "byte", - "i16", - "i32", - "i64", - "double", - "list", - "set", - "map", - "union", - "struct", - "exception", - "binary" - ] - }, - "base-type": { - "title": "Base types", - "type": "object", - "properties": { - "typeId": { - "enum": [ "void", "string", "bool", "byte", "i16", "i32", "i64", "double", "binary" ] - } - }, - "required": [ "typeId" ] - }, - "list-type": { - "title": "List and set types", - "type": "object", - "properties": { - "typeId": { "enum": [ "list", "set" ] }, - "elemTypeId": { "$ref": "#/definitions/type-id" }, - "elemType": { "$ref": "#/definitions/type-spec" } - }, - "required": [ "typeId", "elemTypeId", "elemType" ] - }, - "map-type": { - "title": "Map type", - "type": "object", - "properties": { - "typeId": { "enum": [ "map" ] }, - "keyTypeId": { "$ref": "#/definitions/type-id" }, - "keyType": { "$ref": "#/definitions/type-spec" }, - "valueTypeId": { 
"$ref": "#/definitions/type-id" }, - "valueType": { "$ref": "#/definitions/type-spec" } - }, - "required": [ "typeId", "keyTypeId", "valueTypeId" ] - }, - "struct-spec": { - "title": "Struct and union types", - "type": "object", - "properties": { - "typeId": { "enum": [ "union", "struct" ] }, - "class": { "type": "string" } - }, - "required": [ "typeId", "class" ] - }, - "type-spec": { - "allOf": [ - { "type": "object" }, - { - "oneOf": - [ - { "$ref": "#/definitions/base-type" }, - { "$ref": "#/definitions/list-type" }, - { "$ref": "#/definitions/map-type" }, - { "$ref": "#/definitions/struct-spec" } - ] - } - ] - }, - "name-and-doc": { - "type": "object", - "properties": { - "name": { "type": "string" }, - "doc": { "type": "string" } - }, - "required": [ "name" ] - }, - "enum": { - "type": "object", - "allOf": [ - { "$ref": "#/definitions/name-and-doc" }, - { - "required": [ "members" ], - "properties": { - "members": { - "type": "array", - "items": { - "type": "object", - "properties": { - "name": { "type": "string" }, - "value": { "type": "integer" } - }, - "required": [ "name", "value" ] - } - } - } - } - ] - }, - "typedef": { - "type": "object", - "allOf": [ - { "$ref": "#/definitions/name-and-doc" }, - { - "properties": { - "typeId": { "$ref": "#/definitions/type-id" }, - "type": { "$ref": "#/definitions/type-spec" } - }, - "required": [ "typeId" ] - } - ] - }, - "constant": { - "type": "object", - "allOf": [ - { "$ref": "#/definitions/name-and-doc" }, - { "$ref": "#/definitions/type-spec" }, - { - "properties": { - "value": { - "oneOf": [ - { "type": "string" }, - { "type": "number" }, - { "type": "array" }, - { "type": "object" } - ] - } - }, - "required": [ "value" ] - } - ] - }, - "field": { - "type": "object", - "allOf": [ - { "$ref": "#/definitions/name-and-doc" }, - { "$ref": "#/definitions/type-spec" }, - { - "properties": { - "key": { - "type": "integer", - "minimum": 1, - "maximum": 65535 - }, - "required": { - "enum": [ "required", "optional", 
"req_out" ] - }, - "default": { - "oneOf": [ - { "type": "string" }, - { "type": "number" }, - { "type": "array" }, - { "type": "object" } - ] - } - }, - "required": [ "key", "required" ] - } - ] - }, - "struct": { - "type": "object", - "allOf": [ - { "$ref": "#/definitions/name-and-doc" }, - { - "properties": { - "isException": { "type": "boolean" }, - "isUnion": { "type": "boolean" }, - "fields": { - "type": "array", - "items": { - "$ref": "#/definitions/field" - } - } - }, - "required": [ "isException", "isUnion", "fields" ] - } - ] - }, - "union": { - "$ref": "#/definitions/struct" - }, - "exception": { - "$ref": "#/definitions/struct" - }, - "function": { - "type": "object", - "allOf": [ - { "$ref": "#/definitions/name-and-doc" }, - { - "oneOf": [ - { - "properties": { "oneway": { "type": "boolean" } }, - "required": [ "oneway" ] - }, - { - "properties": { "returnType": { "$ref": "#/definitions/type-spec" } }, - "required": [ "returnType" ] - } - ] - }, - { - "properties": { - "arguments": { - "type": "array", - "items": { - "allOf": [ - { "$ref": "#/definitions/field" }, - { - "properties": { } - } - ] - } - }, - "exceptions": { - "type": "array", - "items": { - "$ref": "#/definitions/exception" - } - } - }, - "required": [ "oneway", "arguments", "exceptions" ] - } - ] - }, - "service": { - "type": "object", - "allOf": [ - { "$ref": "#/definitions/name-and-doc" }, - { - "properties": { - "functions": { - "type": "array", - "items": { - "$ref": "#/definitions/function" - } - } - }, - "required": [ "functions" ] - } - ] - } - }, - - "type": "object", - "required": [ - "name", - "namespaces", - "includes", - "enums", - "typedefs", - "structs", - "constants", - "services" - ], - "properties": { - "name": { - "type": "string" - }, - "includes": { - "type": "array", - "items": { - "type": "string" - }, - "uniqueItems": true - }, - "enums": { - "type": "array", - "items": { - "$ref": "#/definitions/enum" - } - }, - "typedefs": { - "type": "array", - "items": { - 
"$ref": "#/definitions/typedef" - } - }, - "structs": { - "type": "array", - "items": { - "$ref": "#/definitions/struct" - } - }, - "constants": { - "type": "array", - "items": { - "$ref": "#/definitions/constant" - } - }, - "services": { - "type": "array", - "items": { - "$ref": "#/definitions/service" - } - } - } -} http://git-wip-us.apache.org/repos/asf/incubator-hawq/blob/72ea8afd/depends/thirdparty/thrift/lib/Makefile.am ---------------------------------------------------------------------- diff --git a/depends/thirdparty/thrift/lib/Makefile.am b/depends/thirdparty/thrift/lib/Makefile.am deleted file mode 100644 index 5066a00..0000000 --- a/depends/thirdparty/thrift/lib/Makefile.am +++ /dev/null @@ -1,101 +0,0 @@ -# -# Licensed to the Apache Software Foundation (ASF) under one -# or more contributor license agreements. See the NOTICE file -# distributed with this work for additional information -# regarding copyright ownership. The ASF licenses this file -# to you under the Apache License, Version 2.0 (the -# "License"); you may not use this file except in compliance -# with the License. You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, -# software distributed under the License is distributed on an -# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -# KIND, either express or implied. See the License for the -# specific language governing permissions and limitations -# under the License. 
-# - -SUBDIRS = -PRECROSS_TARGET = - -if WITH_CPP -SUBDIRS += cpp -endif - -if WITH_C_GLIB -SUBDIRS += c_glib -endif - -if WITH_MONO -SUBDIRS += csharp -PRECROSS_TARGET += precross-csharp -endif - -if WITH_JAVA -SUBDIRS += java -PRECROSS_TARGET += precross-java -# JavaScript unit test depends on java -# so test only if java, ant & co is available -SUBDIRS += js/test -endif - -if WITH_PYTHON -SUBDIRS += py -endif - -if WITH_ERLANG -SUBDIRS += erl -endif - -if WITH_RUBY -SUBDIRS += rb -endif - -if WITH_HASKELL -SUBDIRS += hs -endif - -if WITH_PERL -SUBDIRS += perl -endif - -if WITH_PHP -SUBDIRS += php -endif - -if WITH_GO -SUBDIRS += go -endif - -if WITH_D -SUBDIRS += d -endif - -if WITH_NODEJS -SUBDIRS += nodejs -PRECROSS_TARGET += precross-nodejs -endif - -if WITH_LUA -SUBDIRS += lua -endif - -# All of the libs that don't use Automake need to go in here -# so they will end up in our release tarballs. -EXTRA_DIST = \ - as3 \ - cocoa \ - d \ - delphi \ - haxe \ - javame \ - js \ - ocaml \ - st \ - ts - -precross-%: - $(MAKE) -C $* precross -precross: $(PRECROSS_TARGET)