http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/drill-docs/datasets/001-aol.md ---------------------------------------------------------------------- diff --git a/_docs/drill-docs/datasets/001-aol.md b/_docs/drill-docs/datasets/001-aol.md deleted file mode 100644 index 472f52f..0000000 --- a/_docs/drill-docs/datasets/001-aol.md +++ /dev/null @@ -1,47 +0,0 @@ ---- -title: "AOL Search" -parent: "Sample Datasets" ---- -## Quick Stats - -The [AOL Search dataset](http://en.wikipedia.org/wiki/AOL_search_data_leak) is -a collection of real query log data from real users. - -## The Data Source - -The dataset consists of 20M Web queries from 650k users over a period of three -months, 440MB in total and available [for -download](http://zola.di.unipi.it/smalltext/datasets.html). The format used in -the dataset is: - - AnonID, Query, QueryTime, ItemRank, ClickURL - -... with: - - * AnonID, an anonymous user ID number. - * Query, the query issued by the user, case shifted with most punctuation removed. - * QueryTime, the time at which the query was submitted for search. - * ItemRank, if the user clicked on a search result, the rank of the item on which they clicked is listed. - * ClickURL, if the user clicked on a search result, the domain portion of the URL in the clicked result is listed. - -Each line in the data represents one of two types of events: - - * A query that was NOT followed by the user clicking on a result item. - * A click through on an item in the result list returned from a query. - -In the first case (query only) there is data in only the first three columns; -in the second case (click through), there is data in all five columns. For -click through events, the query that preceded the click through is included. -Note that if a user clicked on more than one result in the list returned from -a single query, there will be TWO lines in the data to represent the two -events. 
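The two event types described above can be told apart purely by which fields are populated. A minimal sketch of parsing and classifying one log line — the field order follows the description above, and the released dump is assumed to be tab-separated; the class and method names are illustrative, not part of the dataset:

```java
import java.util.Optional;

/** One line of the AOL log: a query-only event or a click-through event. */
public class AolEvent {
    final String anonId;
    final String query;
    final String queryTime;
    final Optional<Integer> itemRank; // present only for click-throughs
    final Optional<String> clickUrl;  // present only for click-throughs

    AolEvent(String anonId, String query, String queryTime,
             Optional<Integer> itemRank, Optional<String> clickUrl) {
        this.anonId = anonId;
        this.query = query;
        this.queryTime = queryTime;
        this.itemRank = itemRank;
        this.clickUrl = clickUrl;
    }

    /** A click-through carries all five fields; a plain query only three. */
    boolean isClickThrough() {
        return clickUrl.isPresent();
    }

    static AolEvent parse(String line) {
        String[] f = line.split("\t", -1); // assumption: tab-separated dump
        boolean click = f.length == 5 && !f[3].isEmpty() && !f[4].isEmpty();
        return new AolEvent(f[0], f[1], f[2],
                click ? Optional.of(Integer.parseInt(f[3])) : Optional.empty(),
                click ? Optional.of(f[4]) : Optional.empty());
    }
}
```

A user clicking two results after one query would yield two lines, and hence two `AolEvent` instances with the same `anonId`, `query`, and `queryTime`.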
- -## The Queries - -Interesting queries, for example: - - * Users querying for topic X - * Users that click on the first (second, third) ranked item - * TOP 10 domains searched - * TOP 10 domains clicked on -
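The "TOP 10 domains" queries above boil down to a count-and-sort over one column. A self-contained sketch of that aggregation, independent of any particular query engine (names are illustrative):

```java
import java.util.*;
import java.util.stream.*;

public class TopDomains {
    /** Count occurrences and return the top-k domains, most frequent first. */
    static List<String> topK(List<String> domains, int k) {
        Map<String, Long> counts = domains.stream()
                .collect(Collectors.groupingBy(d -> d, Collectors.counting()));
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .limit(k)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```

Fed the ClickURL column it answers "TOP 10 domains clicked on"; fed a domain extracted from the Query column, "TOP 10 domains searched".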
http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/drill-docs/datasets/002-enron.md ---------------------------------------------------------------------- diff --git a/_docs/drill-docs/datasets/002-enron.md b/_docs/drill-docs/datasets/002-enron.md deleted file mode 100644 index 2ddbef6..0000000 --- a/_docs/drill-docs/datasets/002-enron.md +++ /dev/null @@ -1,21 +0,0 @@ ---- -title: "Enron Emails" -parent: "Sample Datasets" ---- -## Quick Stats - -The [Enron Email dataset](http://www.cs.cmu.edu/~enron/) contains data from -about 150 users, mostly senior management of Enron. - -## The Data Source - -Totalling some 500,000 messages, the [raw -data](http://www.cs.cmu.edu/~enron/enron_mail_20110402.tgz) (2009 version of -the dataset; ~423MB) is available for download as well as a [MySQL -dump](ftp://ftp.isi.edu/sims/philpot/data/enron-mysqldump.sql.gz) (~177MB). - -## The Queries - -Interesting queries, for example: - - * Via [Query Dataset for Email Search](https://dbappserv.cis.upenn.edu/spell/) \ No newline at end of file http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/drill-docs/datasets/003-wikipedia.md ---------------------------------------------------------------------- diff --git a/_docs/drill-docs/datasets/003-wikipedia.md b/_docs/drill-docs/datasets/003-wikipedia.md deleted file mode 100644 index 99e6e24..0000000 --- a/_docs/drill-docs/datasets/003-wikipedia.md +++ /dev/null @@ -1,105 +0,0 @@ ---- -title: "Wikipedia Edit History" -parent: "Sample Datasets" ---- -# Quick Stats - -The Wikipedia Edit History is a public dump of the website made available by -the Wikimedia Foundation. You can find details -[here](http://en.wikipedia.org/wiki/Wikipedia:Database_download). The dumps -are made available as SQL or XML dumps. 
You can find the entire schema drawn -together in this great [diagram](http://upload.wikimedia.org/wikipedia/commons -/thumb/4/42/MediaWiki_1.20_%2844edaa2%29_database_schema.svg/2193px- -MediaWiki_1.20_%2844edaa2%29_database_schema.svg.png). - -# Approach - -The _main_ distribution files are: - - * Current Pages: As of January 2013 this SQL dump was 9.0GB in its compressed format. - * Complete Archive: This is what we actually want, but at a size of multiple terabytes it clearly exceeds the storage available at home. - -To have some real historic data, it is recommended to download a _Special -Export_ using this -[link](http://en.wikipedia.org/w/index.php?title=Special:Export). Using this -tool you generate a category-specific XML dump and configure various export -options. There are some limits like a maximum of 1000 revisions per export, -but otherwise this should work out just fine. - -![](../../img/Overview.png) - -The entities used in the query use cases. - -# Use Cases - -## Select Change Volume Based on Time - -**Query** - - select rev.::parent.title, rev.::parent.id, sum(rev.text.bytes) - from mediawiki.page.revision as rev - where rev.timestamp.between(?, ?) - group by rev.::parent; - -_Explanation_: This is my attempt at mixing records and structures. The `from` -statement refers to `mediawiki` as a record type / row, but also mixes in -structural information, i.e. `page.revision`, internal to the record. The -query now uses `page.revision` as the base for all other statements, in this case -the `select`, `where` and the `group by`. The `where` statement again uses a -JSON-like expression to state that the timestamp must be between two values; -parameters are written as question marks, similar to JDBC. The `group by` -statement instructs the query to aggregate results based on the parent of a -`revision`, in this case a `page`. The `::parent` syntax is borrowed from -XPath. 
As we are aggregating on `page` it is safe to select the `title` and -`id` from the element in the `select`. We also use an aggregation function to -sum the number of bytes changed in the given time frame; this should be self-explanatory. - -_Discussion_: - - * I am not very satisfied with the `::` syntax, as it is _ugly_. We probably won't need that many axis specifiers, e.g. we don't need any attribute specifiers, but for now, I could not think of anything better. - * Using an `as` expression in the `from` statement is optional; you would simply have to replace all references to `rev` with `revision`. - * I am not sure if this is desired, but you cannot see at first glance where the _hierarchical_ stuff starts. This may be confusing to an RDBMS purist, at least it was for me at the beginning. But now I think this strikes the right mix between verbosity and elegance. - * I assume we would need some good indexing, but this should be achievable. We would need to translate the relative index `rev.timestamp` to a record-absolute index `$.mediawiki.page.revision.timestamp`. Unclear to me now is whether the index would point to the record, or would it point to some kind of record substructure? - -## Select Change Volume Aggregated on Time - -**Query** - - select rev.::parent.title, rev.::parent.id, sum(rev.text.bytes), rev.timestamp.monthYear() - from mediawiki.page.revision as rev - where rev.timestamp.between(?, ?) - group by rev.::parent, rev.timestamp.monthYear() - order by rev.::parent.id, rev.timestamp.monthYear(); - -_Explanation_: This is a refinement of the previous query. In this case we are -again returning a flat list, but are using an additional scalar result and -`group` statement. In the previous example we were returning one result per -found page; now we are returning one result per page and month of changes. -`Order by` is nothing special in this case. 
- -_Discussion_: - - * I always found MySQL's implicit group by statements confusing, as I prefer fail-fast mechanisms. Hence I would opt for explicit `group by` operators. - * I would not add implicit nodes to the records, i.e. if you want some attribute of a timestamp, call a function and do not expect an automatically added element. So we want `rev.timestamp.monthYear()` and not `rev.timestamp.monthYear`. This may be quite confusing, especially if we have heterogeneous record structures. We might even go ahead and support namespaces for custom, experimental features like `rev.timestamp.custom.maya:doomsDay()`. - -## Select Change Volume Based on Contributor - -**Query** - - select ctbr.username, ctbr.ip, ctbr.userid, sum(ctbr.::parent.bytes) as bytesContributed - from mediawiki.page..contributor as ctbr - group by ctbr.canonize() - order by bytesContributed; - -_Explanation_: This query looks quite similar to the previous queries, but I -added this one nonetheless, as it hints at an aggregation which may span -multiple records. The previous examples were based on pages, which are unique -to a record, whereas the contributor may appear many times in many different -records. - -_Discussion_: - - * I have added the `..` operator in this example. Besides being syntactic sugar, it also allows us to search for `revision` and `upload` which are both children of `page` and may both have a `contributor`. The more RDBMS-like alternative would be a `union`, but this did not feel natural enough. - * I am sure the `ctbr.canonize()` will cause lots of discussions :-). The thing is that a contributor may repeat itself in many different records, and we don't really have an ID. If you look at the wikimedia XSD, all three attributes are optional, and the data says the same, so we cannot simply say `ctbr.userid`. Hence the canonize function should create a scalar value containing all available information of the node in a canonical form. 
- * Last but not least, I always hated that MySQL cannot reuse column definitions from the `select` statement in the `order by` statement. So it is on my wishlist that the `bytesContributed` definition be reusable. \ No newline at end of file http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/drill-docs/design/001-plan.md ---------------------------------------------------------------------- diff --git a/_docs/drill-docs/design/001-plan.md b/_docs/drill-docs/design/001-plan.md deleted file mode 100644 index 66147cb..0000000 --- a/_docs/drill-docs/design/001-plan.md +++ /dev/null @@ -1,25 +0,0 @@ ---- -title: "Drill Plan Syntax" -parent: "Design Docs" ---- -### What's the plan? - -This section is about the end-to-end plan flow for Drill. The incoming query -to Drill can be a SQL 2003 query/DrQL or MongoQL. The query is converted to a -_Logical Plan_ that is Drill's internal representation of the query -(language-agnostic). Drill then applies its optimization rules to the Logical -Plan to optimize it for best performance and crafts a _Physical Plan_. The -Physical Plan is the actual plan that Drill then executes for the final data -processing. Below is a diagram to illustrate the flow: - -![](../../img/slide-15-638.png) - -**The Logical Plan** describes the abstract data flow of a language-independent query, i.e. it is a representation of the input query which is not dependent on the actual input query language. It generally tries to work with primitive operations without a focus on optimization. This makes it more verbose than traditional query languages. This is to allow a substantial level of flexibility in defining higher-level query language features. It would be forwarded to the optimizer to get a physical plan. - -**The Physical Plan** is often called the execution plan, since it is the input to the execution engine. 
It's a description of the physical operations the execution engine will undertake to get the desired result. It is the output of the query planner and is a transformation of the logical plan after applying the optimization rules. - -Typically, the physical and execution plans will be represented using the same -JSON format as the logical plan. - -**Detailed document**: Here is a document that explains the Drill logical & physical plans in full detail. [Drill detailed plan syntax document](https://docs.google.com/document/d/1QTL8warUYS2KjldQrGUse7zp8eA72VKtLOHwfXy6c7I/edit). - http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/drill-docs/design/002-rpc.md ---------------------------------------------------------------------- diff --git a/_docs/drill-docs/design/002-rpc.md b/_docs/drill-docs/design/002-rpc.md deleted file mode 100644 index 05cb1d6..0000000 --- a/_docs/drill-docs/design/002-rpc.md +++ /dev/null @@ -1,19 +0,0 @@ ---- -title: "RPC Overview" -parent: "Design Docs" ---- -Drill leverages the Netty 4 project as its underlying RPC layer. From there, we -built a simple protobuf-based communication layer optimized to minimize the -requirement for on-heap data transformations. Both client and server utilize -the CompleteRpcMessage protobuf envelope to communicate requests, responses -and errors. The communication model is that each endpoint sends a stream of -CompleteRpcMessages to its peer. The CompleteRpcMessage is prefixed by a -protobuf-encoded length. - -CompleteRpcMessage is broken into three key components: RpcHeader, Protobuf -Body (bytes), RawBody (bytes). - -RpcHeader has the following fields: - -Drillbits communicate through the BitCom intermediary. BitCom manages... 
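The protobuf-encoded length prefix mentioned above is a base-128 varint: seven payload bits per byte, with the high bit flagging continuation. A minimal sketch of that framing, independent of the actual Drill/Netty classes:

```java
import java.io.ByteArrayOutputStream;

/** Sketch of the base-128 varint length prefix used to frame messages. */
public class VarintFrame {
    /** Encode a non-negative length as a protobuf varint. */
    static byte[] encodeLength(int n) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((n & ~0x7F) != 0) {
            out.write((n & 0x7F) | 0x80); // low 7 bits, continuation bit set
            n >>>= 7;
        }
        out.write(n); // final byte, continuation bit clear
        return out.toByteArray();
    }

    /** Decode a varint length from the start of buf. */
    static int decodeLength(byte[] buf) {
        int result = 0, shift = 0, i = 0;
        byte b;
        do {
            b = buf[i++];
            result |= (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return result;
    }
}
```

A receiver thus reads varint bytes until the continuation bit clears, then reads exactly that many bytes as the serialized CompleteRpcMessage.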
- http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/drill-docs/design/003-query-stages.md ---------------------------------------------------------------------- diff --git a/_docs/drill-docs/design/003-query-stages.md b/_docs/drill-docs/design/003-query-stages.md deleted file mode 100644 index 5c54249..0000000 --- a/_docs/drill-docs/design/003-query-stages.md +++ /dev/null @@ -1,42 +0,0 @@ ---- -title: "Query Stages" -parent: "Design Docs" ---- -## Overview - -Apache Drill is a system for interactive analysis of large-scale datasets. It -was designed to allow users to query across multiple big data systems -using traditional query technologies such as SQL. It is built as a flexible -framework to support a wide variety of data operations, query languages and -storage engines. - -## Query Parsing - -A Drillbit is capable of parsing a provided query into a logical plan. In -theory, Drill is capable of parsing a large range of query languages. At -launch, this will likely be restricted to an enhanced SQL2003 language. - -## Physical Planning - -Once a query is parsed into a logical plan, a Drillbit will then translate the -plan into a physical plan. The physical plan will then be optimized for -performance. Since plan optimization can be computationally intensive, a -distributed in-memory cache will provide LRU retrieval of previously generated -optimized plans to speed query execution. - -## Execution Planning - -Once a physical plan is generated, the physical plan is then rendered into a -set of detailed execution plan fragments (EPFs). This rendering is based on -available resources, cluster load, query priority and detailed information -about data distribution. In the case of large clusters, a subset of nodes will -be responsible for rendering the EPFs. Shared state will be managed through -the use of a distributed in-memory cache. 
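The LRU retrieval of previously optimized plans described under Physical Planning can be sketched, for a single node rather than the distributed cache the design calls for, with an access-ordered `LinkedHashMap`:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Single-node sketch of an LRU cache for optimized plans, keyed by query text. */
public class PlanCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public PlanCache(int capacity) {
        super(16, 0.75f, true); // accessOrder = true yields LRU iteration order
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // evict least recently used entry over capacity
    }
}
```

A real implementation would shard this across the cluster's in-memory cache; the eviction policy itself is unchanged.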
- -## Execution Operation - -Query execution starts with each Drillbit being provided with one or more EPFs -associated with query execution. A portion of these EPFs may be identified as -initial EPFs and thus they are executed immediately. Other EPFs are executed -as data flows into them. - http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/drill-docs/design/004-research.md ---------------------------------------------------------------------- diff --git a/_docs/drill-docs/design/004-research.md b/_docs/drill-docs/design/004-research.md deleted file mode 100644 index 77be828..0000000 --- a/_docs/drill-docs/design/004-research.md +++ /dev/null @@ -1,48 +0,0 @@ ---- -title: "Useful Research" -parent: "Design Docs" ---- -## Drill itself - - * Apache Proposal: <http://wiki.apache.org/incubator/DrillProposal> - * Mailing List Archive: <http://mail-archives.apache.org/mod_mbox/incubator-drill-dev/> - * DrQL ANTLR grammar: <https://gist.github.com/3483314> - * Apache Drill, Architecture outlines: <http://www.slideshare.net/jasonfrantz/drill-architecture-20120913> - -## Background info - - * Dremel Paper: <http://research.google.com/pubs/pub36632.html> - * Dremel Presentation: <http://www.slideshare.net/robertlz/dremel-interactive-analysis-of-webscale-datasets> - * Query Language: <http://developers.google.com/bigquery/docs/query-reference> - * Protobuf: <http://developers.google.com/protocol-buffers/docs/proto> - * Dryad: <http://research.microsoft.com/en-us/projects/dryad/> - * SQLServer Query Plan: <http://msdn.microsoft.com/en-us/library/ms191158.aspx> - * CStore: <http://db.csail.mit.edu/projects/cstore/> - * Vertica (commercial evolution of C-Store): <http://vldb.org/pvldb/vol5/p1790_andrewlamb_vldb2012.pdf> - * <http://pdf.aminer.org/000/094/728/database_cracking.pdf> - * <http://homepages.cwi.nl/~idreos/NoDBsigmod2012.pdf> - * <http://db.csail.mit.edu/projects/cstore/abadiicde2007.pdf> - * Hive Architecture: 
<https://cwiki.apache.org/confluence/display/Hive/Design#Design-HiveArchitecture> - * Fast Response in an unreliable world: <http://research.google.com/people/jeff/latency.html> - * Column-Oriented Database Systems: <http://www.vldb.org/pvldb/2/vldb09-tutorial6.pdf> (SLIDES: <http://phdopen.mimuw.edu.pl/lato10/boncz_mimuw.pdf>) - -## OpenDremel - - * OpenDremel site: <http://code.google.com/p/dremel/> - * Design Proposal for Drill: <http://www.slideshare.net/CamuelGilyadov/apache-drill-14071739> - -## Dazo (second generation OpenDremel) - - * Dazo repos: <https://github.com/Dazo-org> - * ZeroVM (multi-tenant executor): <http://zerovm.org/> - * ZeroVM elaboration: <http://news.ycombinator.com/item?id=3746222> - -## Rob Grzywinski Dremel adventures - - * <https://github.com/rgrzywinski/field-stripe/> - -## Code generation / Physical plan generation - - * <http://www.vldb.org/pvldb/vol4/p539-neumann.pdf> (SLIDES: <http://www.vldb.org/2011/files/slides/research9/rSession9-3.pdf>) - * <http://www.vldb.org/pvldb/2/vldb09-327.pdf> (SLIDES: <http://www.slideserve.com/cher/simd-scan-ultra-fast-in-memory-table-scan-using-on-chip-vector-processing-units>) - http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/drill-docs/design/005-value.md ---------------------------------------------------------------------- diff --git a/_docs/drill-docs/design/005-value.md b/_docs/drill-docs/design/005-value.md deleted file mode 100644 index 0d19a96..0000000 --- a/_docs/drill-docs/design/005-value.md +++ /dev/null @@ -1,191 +0,0 @@ ---- -title: "Value Vectors" -parent: "Design Docs" ---- -This document defines the data structures required for passing sequences of -columnar data between [Operators](https://docs.google.com/a/maprtech.com/docum -ent/d/1zaxkcrK9mYyfpGwX1kAV80z0PCi8abefL45zOzb97dI/edit#bookmark=id.iip15ful18 -mm). - -# Goals - -#### Support Operators Written in Multiple Languages - -ValueVectors should support operators written in C/C++/Assembly. 
To support -this, the underlying ByteBuffer will not require modification when passed -through the JNI interface. The ValueVector will be considered immutable once -constructed. Endianness has not yet been considered. - -#### Access - -Reading a random element from a ValueVector must be a constant-time operation. -To accommodate this, elements are identified by their offset from the start of the -buffer. Repeated, nullable and variable width ValueVectors utilize an -additional fixed width value vector to index each element. Write access is not -supported once the ValueVector has been constructed by the RecordBatch. - -#### Efficient Subsets of Value Vectors - -When an operator returns a subset of values from a ValueVector, it should -reuse the original ValueVector. To accomplish this, a level of indirection is -introduced to skip over certain values in the vector. This level of -indirection is a sequence of offsets which reference an offset in the original -ValueVector and the count of subsequent values which are to be included in the -subset. - -#### Pooled Allocation - -ValueVectors utilize one or more buffers under the covers. These buffers will -be drawn from a pool. Value vectors are themselves created and destroyed as a -schema changes during the course of record iteration. - -#### Homogeneous Value Types - -Each value in a Value Vector is of the same type. The [Record Batch](https://d -ocs.google.com/a/maprtech.com/document/d/1zaxkcrK9mYyfpGwX1kAV80z0PCi8abefL45z -Ozb97dI/edit#bookmark=kix.s2xuoqnr8obe) implementation is responsible for -creating a new Value Vector any time there is a change in schema. - -# Definitions - -Data Types - -The canonical source for value type definitions is the [Drill -Datatypes](http://bit.ly/15JO9bC) document. The individual types are listed -under the "Basic Data Types" tab, while the value vector types can be found -under the "Value Vectors" tab. - -Operators - -An operator is responsible for transforming a stream of fields. 
It operates on -Record Batches or constant values. - -Record Batch - -A set of field values for some range of records. The batch may be composed of -Value Vectors, in which case each batch consists of exactly one schema. - -Value Vector - -The value vector is comprised of one or more contiguous buffers; one which -stores a sequence of values, and zero or more which store any metadata -associated with the ValueVector. - -# Data Structure - -A ValueVector stores values in a ByteBuf, which is a contiguous region of -memory. Additional levels of indirection are used to support variable value -widths, nullable values, repeated values and selection vectors. These levels -of indirection are primarily lookup tables which consist of one or more fixed -width ValueVectors which may be combined (e.g. for nullable, variable width -values). A fixed width ValueVector of non-nullable, non-repeatable values does -not require an indirect lookup; elements can be accessed directly by -multiplying position by stride. - -Fixed Width Values - -Fixed width ValueVectors simply contain a packed sequence of values. Random -access is supported by accessing element n at ByteBuf[0] + Index * Stride, -where Index is 0-based. The following illustrates the underlying buffer of -INT4 values [1 .. 6]: - -![image](../../img/value1.png) -<!--https://lh5.googleusercontent.com/iobQUgeF4dyrWFeqVfhIBZKbkjrLk5sBJqYhWdzm -IyMmmcX1pzZaeQiKZ5OzYeafxcY5IZHXDKuG_JkPwJrjxeLJITpXBbn7r5ep1V07a3JBQC0cJg4qKf -VhzPZ0PDeh--> - -Nullable Values - -Nullable values are represented by a vector of bit values. Each bit in the -vector corresponds to an element in the ValueVector. If the bit is not set, -the value is NULL. Otherwise the value is retrieved from the underlying -buffer. 
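The two access paths just described — constant-time fixed-width lookup at `ByteBuf[0] + Index * Stride`, plus a bit vector marking non-null elements — can be sketched as follows; the class and field names are illustrative, not Drill's actual API:

```java
import java.nio.ByteBuffer;

/** Sketch of fixed-width INT4 access with a separate null bit vector. */
public class NullableInt4Vector {
    private static final int STRIDE = 4; // INT4 values are 4 bytes wide

    private final ByteBuffer values; // packed 4-byte values
    private final byte[] nullBits;   // bit i set => element i is non-null

    NullableInt4Vector(ByteBuffer values, byte[] nullBits) {
        this.values = values;
        this.nullBits = nullBits;
    }

    /** True when the bit for this element is set in the null vector. */
    boolean isSet(int index) {
        return (nullBits[index >> 3] & (1 << (index & 7))) != 0;
    }

    /** Constant-time random access: base + index * stride, or null. */
    Integer get(int index) {
        return isSet(index) ? values.getInt(index * STRIDE) : null;
    }
}
```

Dropping the bit vector and the `isSet` check leaves exactly the non-nullable fixed-width case: a direct multiply-and-read.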
The following illustrates a NullableValueVector of INT4 values 2, 3 -and 6: - -![](../../img/value2.png) - -<!--![](https://lh5.googleusercontent.com/3M19t18av5cuXflB3WYHS0OJBaO-zFHD8TcNaKF0 -ua6g9h_LPnBijkGavCCwDDsbQzSoT5Glj1dgIwfhzK_xFPjPzc3w5O2NaVrbvEQgFhuOpK3yEr- -nSyMocEjRuhGB)--> - - - -#### Repeated Values - -A repeated ValueVector is used for elements which can contain multiple values -(e.g. a JSON array). A table of offset and count pairs is used to represent -each repeated element in the ValueVector. A count of zero means the element -has no values (note the offset field is unused in this case). The following -illustrates three fields; one with two values, one with no values, and one -with a single value: - -![](../../img/value3.png) -<!--![](https://lh6.googleusercontent.com/nFIJjIOPAl9zXttVURgp-xkW8v6z6F7ikN7sMREm -58pdtfTlwdfjEUH4CHxknHexGdIeEhPHbMMzAgqMwnL99IZlR_YzAWvJaiStOO4QMtML8zLuwLvFDr -hJKLMNc0zg)--> - -ValueVector Representation of the equivalent JSON: - -x:[1, 2] - -x:[ ] - -x:[3] - -Variable Width Values - -Variable width values are stored contiguously in a ByteBuf. Each element is -represented by an entry in a fixed width ValueVector of offsets. The length of -an entry is deduced by subtracting the offset of the following field. Because -of this, the offset table will always contain one more entry than total -elements, with the last entry pointing to the end of the buffer. - - -![](../../img/value4.png) -<!--![](https://lh5.googleusercontent.com/ZxAfkmCVRJsKgLYO0pLbRM- -aEjR2yyNZWfYkFSmlsod8GnM3huKHQuc6Do-Bp4U1wK- -hF3e6vGHTiGPqhEc25YEHEuVTNqb1sBj0LdVrOlvGBzL8nywQbn8O1RlN-vrw)--> - -Repeated Map Vectors - -A repeated map vector contains one or more maps (akin to an array of objects -in JSON). The values of each field in the map are stored contiguously within a -ByteBuf. To access a specific record, a lookup table of count and offset pairs -is used. 
This lookup table points to the first repeated field in each column, -while the count indicates the maximum number of elements for the column. The -following example illustrates a RepeatedMap with two records; one with two -objects, and one with a single object: - -![](../../img/value5.png) -<!--![](https://lh3.googleusercontent.com -/l8yo_z_MbBz9C3OoGQEy1bNOrmnNbo2e0XtCUDRbdRR4mbCYK8h- -Lz7_VlhDtbTkPQziwwyNpw3ylfEKjMKtj-D0pUah4arohs1hcnHrzoFfE-QZRwUdQmEReMdpSgIT)--> - -ValueVector representation of the equivalent JSON: - -x: [ {name:"Sam", age:1}, {name:"Max", age:2} ] - -x: [ {name:"Joe", age:3} ] - -Selection Vectors - -A Selection Vector represents a subset of a ValueVector. It is implemented -with a list of offsets which identify each element in the ValueVector to be -included in the SelectionVector. In the case of a fixed width ValueVector, the -offsets reference the underlying ByteBuf. In the case of a nullable, repeated -or variable width ValueVector, the offset references the corresponding lookup -table. The following illustrates a SelectionVector of INT4 (fixed width) -values 2, 3 and 5 from the original vector of [1 .. 
6]: - -![](../../img/value6.png) -<!--![](https://lh5.googleusercontent.com/-hLlAaq9n-Q0_fZ_MKk3yFpXWZO7JOJLm- -NDh_a_x2Ir5BhZDrZX0t-6e_w3K7R4gfgQIsv-sPxryTUzrJRszNpA3pEEn5V5uRCAlMtHejTpcu- -_QFPfSTzzpdsf88OS)--> - -The following illustrates the same ValueVector with nullable fields: - -![](../../img/value7.png) -<!--![](https://lh3.googleusercontent.com -/cJxo5H_nsWWlKFUFxjOHHC6YI4sPyG5Fjj1gbdAT2AEo-c6cdkZelso6rYeZV4leMWMfbei_- -rncjasvR9u4MUXgkpFpM22CUSnnkVX6ynpkcLW1Q-s5F2NgqCez1Fa_)--> - http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/drill-docs/dev-custom-fcn/001-dev-simple.md ---------------------------------------------------------------------- diff --git a/_docs/drill-docs/dev-custom-fcn/001-dev-simple.md b/_docs/drill-docs/dev-custom-fcn/001-dev-simple.md deleted file mode 100644 index 47be7d9..0000000 --- a/_docs/drill-docs/dev-custom-fcn/001-dev-simple.md +++ /dev/null @@ -1,51 +0,0 @@ ---- -title: "Develop a Simple Function" -parent: "Develop Custom Functions" ---- -Create a class within a Java package that implements Drill's simple function -interface, and include the required information for the function type. -Your function must include data types that Drill supports, such as int or -BigInt. For a list of supported data types, refer to the Apache Drill SQL -Reference. - -Complete the following steps to develop a simple function using Drill's simple -function interface: - - 1. Create a Maven project and add the following dependency: - - <dependency> - <groupId>org.apache.drill.exec</groupId> - <artifactId>drill-java-exec</artifactId> - <version>1.0.0-m2-incubating-SNAPSHOT</version> - </dependency> - - 2. Create a class that implements the `DrillSimpleFunc` interface and identify the scope as `FunctionScope.SIMPLE`. - - **Example** - - @FunctionTemplate(name = "myaddints", scope = FunctionScope.SIMPLE, nulls = NullHandling.NULL_IF_NULL) - public static class IntIntAdd implements DrillSimpleFunc { - - 3. 
Provide the variables used in the code in the `Param` and `Output` bit holders. - - **Example** - - @Param IntHolder in1; - @Param IntHolder in2; - @Output IntHolder out; - - 4. Add the code that performs operations for the function in the `eval()` method. - - **Example** - - public void setup(RecordBatch b) { - } - public void eval() { - out.value = (int) (in1.value + in2.value); - } - - 5. Use the maven-source-plugin to compile the sources and classes JAR files. Verify that an empty `drill-module.conf` is included in the resources folder of the JARs. -Drill searches this module during classpath scanning. If the file is not -included in the resources folder, you can add it to the JAR file or add it to -`etc/drill/conf`. - http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/drill-docs/dev-custom-fcn/002-dev-aggregate.md ---------------------------------------------------------------------- diff --git a/_docs/drill-docs/dev-custom-fcn/002-dev-aggregate.md b/_docs/drill-docs/dev-custom-fcn/002-dev-aggregate.md deleted file mode 100644 index fe6f406..0000000 --- a/_docs/drill-docs/dev-custom-fcn/002-dev-aggregate.md +++ /dev/null @@ -1,59 +0,0 @@ ---- -title: "Developing an Aggregate Function" -parent: "Develop Custom Functions" ---- -Create a class within a Java package that implements Drill's aggregate -function interface. Include the required information for the function. -Your function must include data types that Drill supports, such as int or -BigInt. For a list of supported data types, refer to the Drill SQL Reference. - -Complete the following steps to create an aggregate function: - - 1. Create a Maven project and add the following dependency: - - <dependency> - <groupId>org.apache.drill.exec</groupId> - <artifactId>drill-java-exec</artifactId> - <version>1.0.0-m2-incubating-SNAPSHOT</version> - </dependency> - - 2. 
Create a class that implements the `DrillAggFunc` interface and identify the scope as `FunctionTemplate.FunctionScope.POINT_AGGREGATE`. - - **Example** - - @FunctionTemplate(name = "count", scope = FunctionTemplate.FunctionScope.POINT_AGGREGATE) - public static class BitCount implements DrillAggFunc { - - 3. Provide the variables used in the code in the `Param`, `Workspace`, and `Output` bit holders. - - **Example** - - @Param BitHolder in; - @Workspace BitHolder value; - @Output BitHolder out; - - 4. Include the `setup()`, `add()`, `output()`, and `reset()` methods. - - **Example** - public void setup(RecordBatch b) { - value = new BitHolder(); - value.value = 0; - } - - @Override - public void add() { - value.value++; - } - @Override - public void output() { - out.value = value.value; - } - @Override - public void reset() { - value.value = 0; - } - - 5. Use the maven-source-plugin to compile the sources and classes JAR files. Verify that an empty `drill-module.conf` is included in the resources folder of the JARs. -Drill searches this module during classpath scanning. If the file is not -included in the resources folder, you can add it to the JAR file or add it to -`etc/drill/conf`. \ No newline at end of file http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/drill-docs/dev-custom-fcn/003-add-custom.md ---------------------------------------------------------------------- diff --git a/_docs/drill-docs/dev-custom-fcn/003-add-custom.md b/_docs/drill-docs/dev-custom-fcn/003-add-custom.md deleted file mode 100644 index 7efcdce..0000000 --- a/_docs/drill-docs/dev-custom-fcn/003-add-custom.md +++ /dev/null @@ -1,28 +0,0 @@ ---- -title: "Adding Custom Functions to Drill" -parent: "Develop Custom Functions" ---- -After you develop your custom function and generate the sources and classes -JAR files, add both JAR files to the Drill classpath, and include the name of -the package that contains the classes to the main Drill configuration file. 
Restart the Drillbit on each node to refresh the configuration. - -To add a custom function to Drill, complete the following steps: - - 1. Add the sources JAR file and the classes JAR file for the custom function to the Drill classpath on all nodes running a Drillbit. To add the JAR files, copy them to `<drill installation directory>/jars/3rdparty`. - 2. On all nodes running a Drillbit, add the name of the package that contains the classes to the main Drill configuration file in the following location: - - <drill installation directory>/conf/drill-override.conf - - To add the package, add the package name to - `drill.logical.function.package+=`. Separate package names with a comma. - - **Example** - - drill.logical.function.package+= ["org.apache.drill.exec.expr.fn.impl","org.apache.drill.udfs"] - - 3. On each Drill node in the cluster, navigate to the Drill installation directory, and issue the following command to restart the Drillbit: - - <drill installation directory>/bin/drillbit.sh restart - - Now you can issue queries with your custom functions to Drill. \ No newline at end of file http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/drill-docs/dev-custom-fcn/004-use-custom.md ---------------------------------------------------------------------- diff --git a/_docs/drill-docs/dev-custom-fcn/004-use-custom.md b/_docs/drill-docs/dev-custom-fcn/004-use-custom.md deleted file mode 100644 index 6a0245a..0000000 --- a/_docs/drill-docs/dev-custom-fcn/004-use-custom.md +++ /dev/null @@ -1,55 +0,0 @@ ---- -title: "Using Custom Functions in Queries" -parent: "Develop Custom Functions" ---- -When you issue a query with a custom function to Drill, Drill searches the -classpath for the function that matches the request in the query. Once Drill -locates the function for the request, Drill processes the query and applies -the function during processing. - -Your Drill installation includes sample files in the Drill classpath. 
One -sample file, `employee.json`, contains some fictitious employee data that you -can query with a custom function. - -## Simple Function Example - -This example uses the `myaddints` simple function in a query on the -`employee.json` file. - -If you issue the following query to Drill, you can see all of the employee -data within the `employee.json` file: - - 0: jdbc:drill:zk=local> SELECT * FROM cp.`employee.json`; - -The query returns the following results: - - | employee_id | full_name | first_name | last_name | position_id | position_title | store_id | department_id | birth_da | - +-------------+------------+------------+------------+-------------+----------------+------------+---------------+----------+----------- - | 1101 | Steve Eurich | Steve | Eurich | 16 | Store Temporary Checker | 12 | 16 | - | 1102 | Mary Pierson | Mary | Pierson | 16 | Store Temporary Checker | 12 | 16 | - | 1103 | Leo Jones | Leo | Jones | 16 | Store Temporary Checker | 12 | 16 | - … - -Since the `position_id` and `store_id` columns contain integers, you can issue -a query with the `myaddints` custom function on these columns to add the -integers in the columns. - -The following query tells Drill to apply the `myaddints` function to the -`position_id` and `store_id` columns in the `employee.json` file: - - 0: jdbc:drill:zk=local> SELECT myaddints(CAST(position_id AS int),CAST(store_id AS int)) FROM cp.`employee.json`; - -Since JSON files do not store information about data types, you must apply the -`CAST` function in the query to tell Drill that the columns contain integer -values.
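For intuition, here is the arithmetic that query performs per row, sketched in plain Java. `myaddints` is the hypothetical custom function of this example; this sketch only mirrors its assumed behavior (cast each untyped JSON value to an integer, then add), not Drill's actual generated code.

```java
// Plain-Java sketch of the assumed myaddints(a, b) behavior. JSON stores
// values without type information, so the query CASTs them to INT first;
// here that cast is mimicked with Integer.parseInt.
public class MyAddIntsSketch {
    static int myaddints(String positionId, String storeId) {
        // Mirrors CAST(position_id AS int) + CAST(store_id AS int)
        return Integer.parseInt(positionId) + Integer.parseInt(storeId);
    }

    public static void main(String[] args) {
        // First row shown above: position_id 16, store_id 12
        System.out.println(myaddints("16", "12"));  // 28
    }
}
```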
- -The query returns the following results: - - +------------+ - | EXPR$0 | - +------------+ - | 28 | - | 28 | - | 36 | - +------------+ - … \ No newline at end of file http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/drill-docs/dev-custom-fcn/005-cust-interface.md ---------------------------------------------------------------------- diff --git a/_docs/drill-docs/dev-custom-fcn/005-cust-interface.md b/_docs/drill-docs/dev-custom-fcn/005-cust-interface.md deleted file mode 100644 index b84cad0..0000000 --- a/_docs/drill-docs/dev-custom-fcn/005-cust-interface.md +++ /dev/null @@ -1,14 +0,0 @@ ---- -title: "Custom Function Interfaces" -parent: "Develop Custom Functions" ---- -Implement the Drill interface appropriate for the type of function that you -want to develop. Each interface provides a set of required holders where you -input data types that your function uses and required methods that Drill calls -to perform your function's operations. - -Click on either of the links for more information about custom function -interfaces for Drill: - - * [Simple Function Interface](/confluence/display/DRILL/Simple+Function+Interface) - * [Aggregate Function Interface](/confluence/display/DRILL/Aggregate+Function+Interface) \ No newline at end of file http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/drill-docs/develop/001-compile.md ---------------------------------------------------------------------- diff --git a/_docs/drill-docs/develop/001-compile.md b/_docs/drill-docs/develop/001-compile.md deleted file mode 100644 index 85db854..0000000 --- a/_docs/drill-docs/develop/001-compile.md +++ /dev/null @@ -1,37 +0,0 @@ ---- -title: "Compiling Drill From Source" -parent: "Develop Drill" ---- -## Prerequisites - - * Maven 3.0.4 or later - * Oracle JDK 7 or later - -Run the following commands to verify that you have the correct versions of -Maven and JDK installed: - - java -version - mvn -version - -## 1\.
Clone the Repository - - git clone https://git-wip-us.apache.org/repos/asf/incubator-drill.git - -## 2\. Compile the Code - - cd incubator-drill - mvn clean install -DskipTests - -## 3\. Explode the Tarball in the Installation Directory - - mkdir ~/compiled-drill - tar xvzf distribution/target/*.tar.gz --strip=1 -C ~/compiled-drill - -Now that you have Drill installed, you can connect to Drill and query sample -data or you can connect Drill to your data sources. - - * To connect Drill to your data sources, refer to [Connecting to Data Sources](https://cwiki.apache.org/confluence/display/DRILL/Connecting+to+Data+Sources) for instructions. - * To connect to Drill and query sample data, refer to the following topics: - * [Start Drill ](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=44994063)(For Drill installed in embedded mode) - * [Query Data ](https://cwiki.apache.org/confluence/display/DRILL/Query+Data) - http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/drill-docs/develop/002-setup.md ---------------------------------------------------------------------- diff --git a/_docs/drill-docs/develop/002-setup.md b/_docs/drill-docs/develop/002-setup.md deleted file mode 100644 index 19fb554..0000000 --- a/_docs/drill-docs/develop/002-setup.md +++ /dev/null @@ -1,5 +0,0 @@ ---- -title: "Setting Up Your Development Environment" -parent: "Develop Drill" ---- -TBD \ No newline at end of file http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/drill-docs/develop/003-patch-tool.md ---------------------------------------------------------------------- diff --git a/_docs/drill-docs/develop/003-patch-tool.md b/_docs/drill-docs/develop/003-patch-tool.md deleted file mode 100644 index 5b94577..0000000 --- a/_docs/drill-docs/develop/003-patch-tool.md +++ /dev/null @@ -1,160 +0,0 @@ ---- -title: "Drill Patch Review Tool" -parent: "Develop Drill" ---- - * Drill JIRA and Reviewboard script - * 1\. Setup - * 2\. Usage - * 3\.
Upload patch - * 4\. Update patch - * JIRA command line tool - * 1\. Download the JIRA command line package - * 2\. Configure JIRA username and password - * Reviewboard - * 1\. Install the post-review tool - * 2\. Configure Stuff - * FAQ - * When I run the script, it throws the following error and exits - * When I run the script, it throws the following error and exits - -### Drill JIRA and Reviewboard script - -#### 1\. Setup - - 1. Follow instructions [here](https://cwiki.apache.org/confluence/display/DRILL/Drill+Patch+Review+Tool#Drillpatchreviewtool-JIRAcommandlinetool) to setup the jira-python package - 2. Follow instructions [here](https://cwiki.apache.org/confluence/display/DRILL/Drill+Patch+Review+Tool#Drillpatchreviewtool-Reviewboard) to setup the reviewboard python tools - 3. Install the argparse module - - On Linux -> sudo yum install python-argparse - On Mac -> sudo easy_install argparse - -#### 2\. Usage - - nnarkhed-mn: nnarkhed$ python drill-patch-review.py --help - usage: drill-patch-review.py [-h] -b BRANCH -j JIRA [-s SUMMARY] - [-d DESCRIPTION] [-r REVIEWBOARD] [-t TESTING] - [-v VERSION] [-db] -rbu REVIEWBOARDUSER -rbp REVIEWBOARDPASSWORD - - Drill patch review tool - - optional arguments: - -h, --help show this help message and exit - -b BRANCH, --branch BRANCH - Tracking branch to create diff against - -j JIRA, --jira JIRA JIRA corresponding to the reviewboard - -s SUMMARY, --summary SUMMARY - Summary for the reviewboard - -d DESCRIPTION, --description DESCRIPTION - Description for reviewboard - -r REVIEWBOARD, --rb REVIEWBOARD - Review board that needs to be updated - -t TESTING, --testing-done TESTING - Text for the Testing Done section of the reviewboard - -v VERSION, --version VERSION - Version of the patch - -db, --debug Enable debug mode - -rbu, --reviewboard-user Reviewboard user name - -rbp, --reviewboard-password Reviewboard password - -#### 3\. Upload patch - - 1. Specify the branch against which the patch should be created (-b) - 2. 
Specify the corresponding JIRA (-j) - 3. Specify an **optional** summary (-s) and description (-d) for the reviewboard - -Example: - - python drill-patch-review.py -b origin/master -j DRILL-241 -rbu tnachen -rbp password - -#### 4\. Update patch - - 1. Specify the branch against which the patch should be created (-b) - 2. Specify the corresponding JIRA (--jira) - 3. Specify the rb to be updated (-r) - 4. Specify an **optional** summary (-s) and description (-d) for the reviewboard, if you want to update it - 5. Specify an **optional** version of the patch. This will be appended to the jira to create a file named JIRA-<version>.patch. The purpose is to be able to upload multiple patches to the JIRA. This has no bearing on the reviewboard update. - -Example: - - python drill-patch-review.py -b origin/master -j DRILL-241 -r 14081 -rbu tnachen -rbp password - -### JIRA command line tool - -#### 1\. Download the JIRA command line package - -Install the jira-python package. - - sudo easy_install jira-python - -#### 2\. Configure JIRA username and password - -Include a jira.ini file in your $HOME directory that contains your Apache JIRA -username and password. - - nnarkhed-mn:~ nnarkhed$ cat ~/jira.ini - user=nehanarkhede - password=*********** - -### Reviewboard - -This is a quick tutorial on using [Review Board](https://reviews.apache.org) -with Drill. - -#### 1\. Install the post-review tool - -If you are on RHEL, Fedora or CentOS, follow these steps: - - sudo yum install python-setuptools - sudo easy_install -U RBTools - -If you are on Mac, follow these steps: - - sudo easy_install -U setuptools - sudo easy_install -U RBTools - -For other platforms, follow the [instructions](http://www.reviewboard.org/docs/manual/dev/users/tools/post-review/) to -set up the post-review tool. - -#### 2\. Configure Stuff - -Then you need to configure a few things to make it work. - -First set the review board url to use.
You can do this from within git: - - git config reviewboard.url https://reviews.apache.org - -If you checked out using the git-wip HTTP URL, that will not work with -Review Board, so you need to configure an override to use the non-HTTP URL. -You can do this by adding a config file like this: - - jkreps$ cat ~/.reviewboardrc - REPOSITORY = 'git://git.apache.org/incubator-drill.git' - TARGET_GROUPS = 'drill-git' - GUESS_FIELDS = True - - - -### FAQ - -#### When I run the script, it throws the following error and exits - - nnarkhed$ python drill-patch-review.py -b trunk -j DRILL-241 - There don't seem to be any diffs - -There are two reasons for this: - - * The code is not checked into your local branch - * The -b branch is not pointing to the remote branch. In the example above, "trunk" is specified as the branch, which is the local branch. The correct value for the -b (--branch) option is the remote branch. "git branch -r" gives the list of the remote branch names. - -#### When I run the script, it throws the following error and exits - -Error uploading diff - -Your review request still exists, but the diff is not attached. - -One of the most common root causes of this error is that the git remote -branches are not up to date. Since the script already updates them, it is -probably due to some other problem. You can run the script with the --debug -option, which will make post-review run in debug mode and list the root -cause of the issue.
- http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/drill-docs/install/001-drill-in-10.md ---------------------------------------------------------------------- diff --git a/_docs/drill-docs/install/001-drill-in-10.md b/_docs/drill-docs/install/001-drill-in-10.md deleted file mode 100644 index bd60141..0000000 --- a/_docs/drill-docs/install/001-drill-in-10.md +++ /dev/null @@ -1,395 +0,0 @@ ---- -title: "Apache Drill in 10 Minutes" -parent: "Install Drill" ---- -* Objective -* A Few Bits About Apache Drill -* Process Overview -* Install Drill - * Installing Drill on Linux - * Installing Drill on Mac OS X - * Installing Drill on Windows -* Start Drill -* Query Sample Data -* Summary -* Next Steps -* More Information - - - -# Objective - -Use Apache Drill to query sample data in 10 minutes. For simplicity, you'll -run Drill in _embedded_ mode rather than _distributed_ mode to try out Drill -without having to perform any setup tasks. - -# A Few Bits About Apache Drill - -Drill is a clustered, powerful MPP (Massively Parallel Processing) query -engine for Hadoop that can process petabytes of data, fast. Drill is useful -for short, interactive ad-hoc queries on large-scale data sets. Drill is -capable of querying nested data in formats like JSON and Parquet and -performing dynamic schema discovery. Drill does not require a centralized -metadata repository. - -### **_Dynamic schema discovery_** - -Drill does not require schema or type specification for data in order to start -the query execution process. Drill starts data processing in record-batches -and discovers the schema during processing. Self-describing data formats such -as Parquet, JSON, AVRO, and NoSQL databases have schema specified as part of -the data itself, which Drill leverages dynamically at query time. Because -schema can change over the course of a Drill query, all Drill operators are -designed to reconfigure themselves when schemas change.
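As an illustration of the idea only (not Drill's record-batch implementation), schema-on-read can be sketched as discovering fields while records are scanned, with the schema allowed to grow mid-batch:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Teaching sketch of dynamic schema discovery: the schema is whatever fields
// the records actually carry, discovered during processing rather than
// declared up front. Records are modeled as simple maps for illustration.
public class SchemaDiscoverySketch {
    static Set<String> discoverSchema(List<Map<String, Object>> batch) {
        Set<String> schema = new LinkedHashSet<>();
        for (Map<String, Object> record : batch) {
            schema.addAll(record.keySet());   // schema can change mid-batch
        }
        return schema;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> batch = new ArrayList<>();
        Map<String, Object> r1 = new LinkedHashMap<>();
        r1.put("name", "max");                // first record: name and age
        r1.put("age", 33);
        Map<String, Object> r2 = new LinkedHashMap<>();
        r2.put("name", "ann");                // second record adds a new field
        r2.put("city", "SF");
        batch.add(r1);
        batch.add(r2);
        System.out.println(discoverSchema(batch)); // [name, age, city]
    }
}
```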
- -### **_Flexible data model_** - -Drill allows access to nested data attributes, just like SQL columns, and -provides intuitive extensions to easily operate on them. From an architectural -point of view, Drill provides a flexible hierarchical columnar data model that -can represent complex, highly dynamic and evolving data models. Drill allows -for efficient processing of these models without the need to flatten or -materialize them at design time or at execution time. Relational data in Drill -is treated as a special or simplified case of complex/multi-structured data. - -### **_De-centralized metadata_** - -Drill does not have a centralized metadata requirement. You do not need to -create and manage tables and views in a metadata repository, or rely on a -database administrator group for such a function. Drill metadata is derived -from the storage plugins that correspond to data sources. Storage plugins -provide a spectrum of metadata ranging from full metadata (Hive), partial -metadata (HBase), or no central metadata (files). De-centralized metadata -means that Drill is NOT tied to a single Hive repository. You can query -multiple Hive repositories at once and then combine the data with information -from HBase tables or with a file in a distributed file system. You can also -use SQL DDL syntax to create metadata within Drill, which gets organized just -like a traditional database. Drill metadata is accessible through the ANSI -standard INFORMATION_SCHEMA database. - -### **_Extensibility_** - -Drill provides an extensible architecture at all layers, including the storage -plugin, query, query optimization/execution, and client API layers. You can -customize any layer for the specific needs of an organization or you can -extend the layer to a broader array of use cases. Drill provides a built in -classpath scanning and plugin concept to add additional storage plugins, -functions, and operators with minimal configuration. 
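The plugin idea behind that extensibility can be sketched generically: implementations are registered under a name and resolved at query time. This is an illustrative pattern only, not Drill's storage-plugin or function-registry API; Drill's real registry is populated by classpath scanning, which is not reproduced here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntBinaryOperator;

// Generic sketch of a name-to-implementation registry, the pattern behind
// pluggable functions and operators. Illustrative only.
public class FunctionRegistrySketch {
    private final Map<String, IntBinaryOperator> functions = new HashMap<>();

    void register(String name, IntBinaryOperator fn) {
        functions.put(name, fn);              // a "plugin" contributes an entry
    }

    int apply(String name, int a, int b) {
        IntBinaryOperator fn = functions.get(name);
        if (fn == null) {
            throw new IllegalArgumentException("unknown function: " + name);
        }
        return fn.applyAsInt(a, b);
    }

    public static void main(String[] args) {
        FunctionRegistrySketch registry = new FunctionRegistrySketch();
        registry.register("add", (a, b) -> a + b);       // built-in function
        registry.register("max", Math::max);             // added by a "plugin"
        System.out.println(registry.apply("add", 2, 3)); // 5
        System.out.println(registry.apply("max", 2, 3)); // 3
    }
}
```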
- -# Process Overview - -Download the Apache Drill archive and extract the contents to a directory on -your machine. The Apache Drill archive contains sample JSON and Parquet files -that you can query immediately. - -Query the sample JSON and Parquet files using SQLLine. SQLLine is a pure-Java -console-based utility for connecting to relational databases and executing SQL -commands. SQLLine is used as the shell for Drill. Drill follows the ANSI SQL: -2011 standard with a few extensions for nested data formats. - -### Prerequisite - -You must have the following software installed on your machine to run Drill: - -<div class="table-wrap"><table class="confluenceTable"><tbody><tr><td class="confluenceTd"><p><strong>Software</strong></p></td><td class="confluenceTd"><p><strong>Description</strong></p></td></tr><tr><td class="confluenceTd"><p><a href="http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html" class="external-link" rel="nofollow">Oracle JDK version 7</a></p></td><td class="confluenceTd"><p>A set of programming tools for developing Java applications.</p></td></tr></tbody></table></div> - - -### Prerequisite Validation - -Run the following command to verify that the system meets the software -prerequisite: -<table class="confluenceTable"><tbody><tr><td class="confluenceTd"><p><strong>Command </strong></p></td><td class="confluenceTd"><p><strong>Example Output</strong></p></td></tr><tr><td class="confluenceTd"><p><code>java -version</code></p></td><td class="confluenceTd"><p><code>java version "1.7.0_65"</code><br /><code>Java(TM) SE Runtime Environment (build 1.7.0_65-b19)</code><br /><code>Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)</code></p></td></tr></tbody></table> - -# Install Drill - -You can install Drill on a machine running Linux, Mac OS X, or Windows. - -## Installing Drill on Linux - -Complete the following steps to install Drill: - - 1.
Issue the following command to download the latest, stable version of Apache Drill to a directory on your machine: - - wget http://www.apache.org/dyn/closer.cgi/drill/drill-0.7.0/apache-drill-0.7.0.tar.gz - - 2. Issue the following command to create a new directory to which you can extract the contents of the Drill `tar.gz` file: - - sudo mkdir -p /opt/drill - - 3. Navigate to the directory where you downloaded the Drill `tar.gz` file. - - - 4. Issue the following command to extract the contents of the Drill `tar.gz` file: - - sudo tar -xvzf apache-drill-<version>.tar.gz -C /opt/drill - - 5. Issue the following command to navigate to the Drill installation directory: - - cd /opt/drill/apache-drill-<version> - -At this point, you can [start Drill](https://cwiki.apache.org/confluence/displ -ay/DRILL/Apache+Drill+in+10+Minutes#ApacheDrillin10Minutes-StartDrill). - -## Installing Drill on Mac OS X - -Complete the following steps to install Drill: - - 1. Open a Terminal window, and create a `drill` directory inside your home directory (or in some other location if you prefer). - - **Example** - - $ pwd - /Users/max - $ mkdir drill - $ cd drill - $ pwd - /Users/max/drill - - 2. Click the following link to download the latest, stable version of Apache Drill: - - [http://www.apache.org/dyn/closer.cgi/drill/drill-0.7.0/apache-drill-0.7.0.tar.gz](http://www.apache.org/dyn/closer.cgi/drill/drill-0.7.0/apache-drill-0.7.0.tar.gz) - - 3. Open the downloaded `TAR` file with the Mac Archive utility or a similar tool for unzipping files. - - 4. Move the resulting `apache-drill-<version>` folder into the `drill` directory that you created. - - 5. Issue the following command to navigate to the `apache-drill-<version>` directory: - - cd /Users/max/drill/apache-drill-<version> - -At this point, you can [start Drill](https://cwiki.apache.org/confluence/displ -ay/DRILL/Apache+Drill+in+10+Minutes#ApacheDrillin10Minutes-StartDrill). 
- -## Installing Drill on Windows - -You can install Drill on Windows 7 or 8. To install Drill on Windows, you must -have JDK 7, and you must set the `JAVA_HOME` path in the Windows Environment -Variables. You must also have a utility, such as -[7-zip](http://www.7-zip.org/), installed on your machine. These instructions -assume that the [7-zip](http://www.7-zip.org/) decompression utility is -installed to extract a Drill archive file that you download. - -#### Setting JAVA_HOME - -Complete the following steps to set `JAVA_HOME`: - - 1. Navigate to `Control Panel\All Control Panel Items\System`, and select **Advanced System Settings**. The System Properties window appears. - 2. On the Advanced tab, click **Environment Variables**. The Environment Variables window appears. - 3. Add/Edit `JAVA_HOME` to point to the location where the JDK software is located. - - **Example** - - C:\Program Files\Java\jdk1.7.0_65 - - 4. Click **OK** to exit the windows. - -#### Installing Drill - -Complete the following steps to install Drill: - - 1. Create a `drill` directory on your `C:\` drive (or in some other location if you prefer). - - **Example** - - C:\drill - - Do not include spaces in your directory path. If you include spaces in the -directory path, Drill fails to run. - - 2. Click the following link to download the latest, stable version of Apache Drill: - - [http://www.apache.org/dyn/closer.cgi/drill/drill-0.7.0/apache-drill-0.7.0.tar.gz](http://www.apache.org/dyn/closer.cgi/drill/drill-0.7.0/apache-drill-0.7.0.tar.gz) - - 3. Move the `apache-drill-<version>.tar.gz` file to the `drill` directory that you created on your `C:\` drive. - - 4. Unzip the `TAR.GZ` file and the resulting `TAR` file. - - 1. Right-click `apache-drill-<version>.tar.gz`, and select `7-Zip > Extract Here`. The utility extracts the `apache-drill-<version>.tar` file. - 2. Right-click `apache-drill-<version>.tar`, and select `7-Zip > Extract Here`. The utility extracts the `apache-drill-<version>` folder. - 5. Open the `apache-drill-<version>` folder. - - 6. Open the `bin` folder, and double-click on the `sqlline.bat` file. The Windows command prompt opens. - 7. At the `sqlline>` prompt, type `!connect jdbc:drill:zk=local` and then press `Enter`. - 8. Enter the username and password. - a. When prompted, enter the user name `admin` and then press Enter. - b. When prompted, enter the password `admin` and then press Enter. The cursor blinks for a few seconds and then `0: jdbc:drill:zk=local>` displays in the prompt. - -At this point, you can submit queries to Drill. Refer to the [Query Sample Dat -a](https://cwiki.apache.org/confluence/display/DRILL/Apache+Drill+in+10+Minute -s#ApacheDrillin10Minutes-QuerySampleData) section of this document. - -# Start Drill - -Launch SQLLine, the Drill shell, to start and run Drill in embedded mode. -Launching SQLLine automatically starts a new Drillbit within the shell. In a -production environment, Drillbits are the daemon processes that run on each -node in a Drill cluster. - -Complete the following steps to launch SQLLine and start Drill: - - 1. Verify that you are in the Drill installation directory. -Example: `~/apache-drill-<version>` - - 2. Issue the following command to launch SQLLine: - - bin/sqlline -u jdbc:drill:zk=local - - `-u` is a JDBC connection string that directs SQLLine to connect to Drill. It -also starts a local Drillbit. If you are connecting to an Apache Drill -cluster, the value of `zk=` would be a list of Zookeeper quorum nodes. For -more information about how to run Drill in clustered mode, go to [Deploying -Apache Drill in a Clustered Environment](/confluence/display/DRILL/Deploying+A -pache+Drill+in+a+Clustered+Environment).
- -When SQLLine starts, the system displays the following prompt: -`0: jdbc:drill:zk=local>` - -Issue the following command when you want to exit SQLLine: - - !quit - - -# Query Sample Data - -Your Drill installation includes a `sample-data` directory with JSON and -Parquet files that you can query. The local file system on your machine is -configured as the `dfs` storage plugin instance by default when you install -Drill in embedded mode. For more information about storage plugin -configuration, refer to [Storage Plugin Registration](https://cwiki.apache.org -/confluence/display/DRILL/Connecting+to+Data+Sources#ConnectingtoDataSources- -StoragePluginRegistration). - -Use SQL syntax to query the sample `JSON` and `Parquet` files in the `sample- -data` directory on your local file system. - -### Querying a JSON File - -A sample JSON file, `employee.json`, contains fictitious employee data. - -To view the data in the `employee.json` file, submit the following SQL query -to Drill: - - 0: jdbc:drill:zk=local> SELECT * FROM cp.`employee.json`; - -The query returns the following results: - -**Example of partial output** - - +-------------+------------+------------+------------+-------------+-----------+ - | employee_id | full_name | first_name | last_name | position_id | position_ | - +-------------+------------+------------+------------+-------------+-----------+ - | 1101 | Steve Eurich | Steve | Eurich | 16 | Store T | - | 1102 | Mary Pierson | Mary | Pierson | 16 | Store T | - | 1103 | Leo Jones | Leo | Jones | 16 | Store Tem | - | 1104 | Nancy Beatty | Nancy | Beatty | 16 | Store T | - | 1105 | Clara McNight | Clara | McNight | 16 | Store | - | 1106 | Marcella Isaacs | Marcella | Isaacs | 17 | Stor | - | 1107 | Charlotte Yonce | Charlotte | Yonce | 17 | Stor | - | 1108 | Benjamin Foster | Benjamin | Foster | 17 | Stor | - | 1109 | John Reed | John | Reed | 17 | Store Per | - | 1110 | Lynn Kwiatkowski | Lynn | Kwiatkowski | 17 | St | - | 1111 | Donald Vann | Donald | 
Vann | 17 | Store Pe | - | 1112 | William Smith | William | Smith | 17 | Store | - | 1113 | Amy Hensley | Amy | Hensley | 17 | Store Pe | - | 1114 | Judy Owens | Judy | Owens | 17 | Store Per | - | 1115 | Frederick Castillo | Frederick | Castillo | 17 | S | - | 1116 | Phil Munoz | Phil | Munoz | 17 | Store Per | - | 1117 | Lori Lightfoot | Lori | Lightfoot | 17 | Store | - +-------------+------------+------------+------------+-------------+-----------+ - 1,155 rows selected (0.762 seconds) - 0: jdbc:drill:zk=local> - -### Querying a Parquet File - -Query the `region.parquet` and `nation.parquet` files in the `sample-data` -directory on your local file system. - -#### Region File - -If you followed the Apache Drill in 10 Minutes instructions to install Drill -in embedded mode, the path to the parquet file varies between operating -systems. - -**Note:** When you enter the query, include the version of Drill that you are currently running. - -To view the data in the `region.parquet` file, issue the query appropriate for -your operating system: - -* Linux -`SELECT * FROM dfs.`/opt/drill/apache-drill-<version>/sample- -data/region.parquet`; ` -* Mac OS X -`SELECT * FROM dfs.`/Users/max/drill/apache-drill-<version>/sample- -data/region.parquet`;` -* Windows -`SELECT * FROM dfs.`C:\drill\apache-drill-<version>\sample- -data\region.parquet`;` - -The query returns the following results: - - +------------+------------+ - | EXPR$0 | EXPR$1 | - +------------+------------+ - | AFRICA | lar deposits. blithely final packages cajole. regular waters ar | - | AMERICA | hs use ironic, even requests. s | - | ASIA | ges. 
thinly even pinto beans ca | - | EUROPE | ly final courts cajole furiously final excuse | - | MIDDLE EAST | uickly special accounts cajole carefully blithely close reques | - +------------+------------+ - 5 rows selected (0.165 seconds) - 0: jdbc:drill:zk=local> - -#### Nation File - -If you followed the Apache Drill in 10 Minutes instructions to install Drill -in embedded mode, the path to the parquet file varies between operating -systems. - -**Note:** When you enter the query, include the version of Drill that you are currently running. - -To view the data in the `nation.parquet` file, issue the query appropriate for -your operating system: - -* Linux - - ``SELECT * FROM dfs.`/opt/drill/apache-drill-<version>/sample- -data/nation.parquet`;`` -* Mac OS X - - ``SELECT * FROM dfs.`/Users/max/drill/apache-drill-<version>/sample- -data/nation.parquet`;`` - -* Windows - - ``SELECT * FROM dfs.`C:\drill\apache-drill-<version>\sample- -data\nation.parquet`;`` - -The query returns the following results: - -# Summary - -Now you know a bit about Apache Drill. To summarize, you have completed the -following tasks: - - * Learned that Apache Drill supports nested data, schema-less execution, and decentralized metadata. - * Downloaded and installed Apache Drill. - * Invoked SQLLine with Drill in embedded mode. - * Queried the sample JSON file, `employee.json`, to view its data. - * Queried the sample `region.parquet` file to view its data. - * Queried the sample `nation.parquet` file to view its data. - -# Next Steps - -Now that you have an idea about what Drill can do, you might want to: - - * [Deploy Drill in a clustered environment.](https://cwiki.apache.org/confluence/display/DRILL/Deploying+Apache+Drill+in+a+Clustered+Environment) - * [Configure storage plugins to connect Drill to your data sources](https://cwiki.apache.org/confluence/display/DRILL/Connecting+to+Data+Sources). 
- * Query [Hive](https://cwiki.apache.org/confluence/display/DRILL/Connecting+to+Data+Sources#ConnectingtoDataSources-QueryingHiveTables) and [HBase](https://cwiki.apache.org/confluence/display/DRILL/Connecting+to+Data+Sources#ConnectingtoDataSources-QueryingHiveTables) data. - * [Query Complex Data](https://cwiki.apache.org/confluence/display/DRILL/Querying+Complex+Data) - - * [Query Plain Text Files](https://cwiki.apache.org/confluence/display/DRILL/Querying+Plain+Text+Files) - -# More Information - -For more information about Apache Drill, go to [Apache Drill -Wiki](https://cwiki.apache.org/confluence/display/DRILL/Apache+Drill+Wiki). \ No newline at end of file http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/drill-docs/install/002-deploy.md ---------------------------------------------------------------------- diff --git a/_docs/drill-docs/install/002-deploy.md b/_docs/drill-docs/install/002-deploy.md deleted file mode 100644 index 49ba68b..0000000 --- a/_docs/drill-docs/install/002-deploy.md +++ /dev/null @@ -1,102 +0,0 @@ ---- -title: "Deploying Apache Drill in a Clustered Environment" -parent: "Install Drill" ---- -# Overview - -To run Drill in a clustered environment, complete the following steps: - - 1. Install Drill on each designated node in the cluster. - 2. Configure a cluster ID and add Zookeeper information. - 3. Connect Drill to your data sources. - 4. Start Drill. - -### Prerequisites - -Before you install Apache Drill on nodes in your cluster, you must have the -following software and services installed: - - * [Oracle JDK version 7](http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html) - * Configured and running ZooKeeper quorum - * Configured and running Hadoop cluster (Recommended) - * DNS (Recommended) - -## Installing Drill - -Complete the following steps to install Drill on designated nodes: - - 1. Download the Drill tarball.
- - curl http://www.apache.org/dyn/closer.cgi/drill/drill-0.7.0/apache-drill-0.7.0.tar.gz - - 2. Issue the following command to create a Drill installation directory and then explode the tarball to the directory: - - mkdir /opt/drill - tar xzf apache-drill-<version>.tar.gz --strip=1 -C /opt/drill - - 3. If you are using external JAR files, edit `drill-env.sh`, located in `/opt/drill/conf/`, and define `HADOOP_HOME`: - - export HADOOP_HOME="~/hadoop/hadoop-0.20.2/" - - 4. In `drill-override.conf`, create a unique Drill `cluster ID`, and provide Zookeeper host names and port numbers to configure a connection to your Zookeeper quorum. - - a. Edit `drill-override.conf`, located in `~/drill/drill-<version>/conf/`. - - b. Provide a unique `cluster-id` and the Zookeeper host names and port numbers in `zk.connect`. If you install Drill on multiple nodes, assign the same `cluster ID` to each Drill node so that all Drill nodes share the same ID. The default Zookeeper port is 2181. - -**Example** - - drill.exec: { - cluster-id: "<mydrillcluster>", - zk.connect: "<zkhostname1>:<port>,<zkhostname2>:<port>,<zkhostname3>:<port>", - debug.error_on_leak: false, - buffer.size: 6, - functions: ["org.apache.drill.expr.fn.impl", "org.apache.drill.udfs"] - } - -## Connecting Drill to Data Sources - -You can connect Drill to various types of data sources. Refer to [Connect -Apache Drill to Data Sources](https://cwiki.apache.org/confluence/display/DRIL -L/Connecting+to+Data+Sources) to get configuration instructions for the -particular type of data source that you want to connect to Drill. - -## Starting Drill - -Complete the following steps to start Drill: - - 1. Navigate to the Drill installation directory, and issue the following command to start a Drillbit: - - bin/drillbit.sh restart - - 2. Issue the following command to invoke SQLLine and start Drill: - - bin/sqlline -u jdbc:drill: - - When connected, the Drill prompt appears.
- Example: - - `0: jdbc:drill:zk=<zk1host>:<port>>` - - If you cannot connect to Drill, invoke SQLLine with the ZooKeeper quorum: - - `bin/sqlline -u jdbc:drill:zk=<zk1host>:<port>,<zk2host>:<port>,<zk3host>:<port>` - - 3. Issue the following query to Drill to verify that all Drillbits have joined the cluster: - - 0: jdbc:drill:zk=<zk1host>:<port>> select * from sys.drillbits; - -Drill provides a list of Drillbits that have joined. - - +------------+------------+--------------+--------------------+ - | host | user_port | control_port | data_port | - +------------+------------+--------------+--------------------+ - | <host address> | <port number>| <port number>| <port number>| - +------------+------------+--------------+--------------------+ - -Now you can query data with Drill. The Drill installation includes sample data -that you can query. Refer to [Query Sample Data](https://cwiki.apache.org/conf -luence/display/DRILL/Apache+Drill+in+10+Minutes#ApacheDrillin10Minutes- -QuerySampleData). \ No newline at end of file http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/drill-docs/install/003-install-embedded.md ---------------------------------------------------------------------- diff --git a/_docs/drill-docs/install/003-install-embedded.md b/_docs/drill-docs/install/003-install-embedded.md deleted file mode 100644 index eb4fa2a..0000000 --- a/_docs/drill-docs/install/003-install-embedded.md +++ /dev/null @@ -1,30 +0,0 @@ ---- -title: "Installing Drill in Embedded Mode" -parent: "Install Drill" ---- -Installing Drill in embedded mode installs Drill locally on your machine. -Embedded mode is a quick, easy way to install and try Drill without having to -perform any configuration tasks. When you install Drill in embedded mode, the -Drillbit service is installed locally and starts automatically when you invoke -SQLLine, the Drill shell. You can install Drill in embedded mode on a machine -running Linux, Mac OS X, or Windows.
- -**Prerequisite:** - -You must have the following software installed on your machine to run Drill: - -<div class="table-wrap"><table class="confluenceTable"><tbody><tr><td class="confluenceTd"><p><strong>Software</strong></p></td><td class="confluenceTd"><p><strong>Description</strong></p></td></tr><tr><td class="confluenceTd"><p><a class="external-link" href="http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html" rel="nofollow">Oracle JDK version 7</a></p></td><td class="confluenceTd"><p>A set of programming tools for developing Java applications.</p></td></tr></tbody></table></div> - -You can run the following command to verify that the system meets the software -prerequisite: - -<div class="table-wrap"><table class="confluenceTable"><tbody><tr><td class="confluenceTd"><p><strong>Command</strong></p></td><td class="confluenceTd"><p><strong>Example Output</strong></p></td></tr><tr><td class="confluenceTd"><p><code>java -version</code></p></td><td class="confluenceTd"><p><code>java version "1.7.0_65"</code><br /><code>Java(TM) SE Runtime Environment (build 1.7.0_65-b19)</code><br /><code>Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)</code></p></td></tr></tbody></table></div> - -Click on the installation link appropriate for your operating system: - - * [Installing Drill on Linux](/confluence/display/DRILL/Installing+Drill+on+Linux) - * [Installing Drill on Mac OS X](/confluence/display/DRILL/Installing+Drill+on+Mac+OS+X) - * [Installing Drill on Windows](/confluence/display/DRILL/Installing+Drill+on+Windows) \ No newline at end of file http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/drill-docs/install/004-install-distributed.md ---------------------------------------------------------------------- diff --git a/_docs/drill-docs/install/004-install-distributed.md b/_docs/drill-docs/install/004-install-distributed.md deleted file mode 
100644 index 3d993e7..0000000 --- a/_docs/drill-docs/install/004-install-distributed.md +++ /dev/null @@ -1,61 +0,0 @@ ---- -title: "Installing Drill in Distributed Mode" -parent: "Install Drill" ---- -You can install Apache Drill in distributed mode on one or multiple nodes to -run it in a clustered environment. - -To install Apache Drill in distributed mode, complete the following steps: - - 1. Install Drill on each designated node in the cluster. - 2. Configure a cluster ID and add ZooKeeper information. - 3. Connect Drill to your data sources. - 4. Start Drill. - -**Prerequisites** - -Before you install Apache Drill on nodes in your cluster, you must have the -following software and services installed: - - * [Oracle JDK version 7](http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html) - * Configured and running ZooKeeper quorum - * Configured and running Hadoop cluster (Recommended) - * DNS (Recommended) - -## Installing Drill - -Complete the following steps to install Drill on designated nodes: - - 1. Download the Drill tarball. - - curl http://www.apache.org/dyn/closer.cgi/drill/drill-0.7.0/apache-drill-0.7.0.tar.gz - - 2. Issue the following command to create a Drill installation directory and then explode the tarball to the directory: - - mkdir /opt/drill - tar xzf apache-drill-<version>.tar.gz --strip=1 -C /opt/drill - - 3. If you are using external JAR files, edit `drill-env.sh`, located in `/opt/drill/conf/`, and define `HADOOP_HOME`: - - export HADOOP_HOME="~/hadoop/hadoop-0.20.2/" - - 4. In `drill-override.conf`, create a unique Drill `cluster-id`, and provide ZooKeeper host names and port numbers to configure a connection to your ZooKeeper quorum. - - a. Edit `drill-override.conf`, located in `~/drill/drill-<version>/conf/`. - - b. Provide a unique `cluster-id` and the ZooKeeper host names and port numbers in `zk.connect`. 
If you install Drill on multiple nodes, assign the same `cluster-id` to each Drill node so that all Drill nodes share the same ID. The default ZooKeeper port is 2181. - - **Example** - - drill.exec: { - cluster-id: "<mydrillcluster>", - zk.connect: "<zkhostname1>:<port>,<zkhostname2>:<port>,<zkhostname3>:<port>", - debug.error_on_leak: false, - buffer.size: 6, - functions: ["org.apache.drill.expr.fn.impl", "org.apache.drill.udfs"] - } - -You can connect Drill to various types of data sources. Refer to [Connect -Apache Drill to Data Sources](https://cwiki.apache.org/confluence/display/DRILL/Connecting+to+Data+Sources) to get configuration instructions for the -particular type of data source that you want to connect to Drill. \ No newline at end of file http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/drill-docs/install/install-embedded/001-install-linux.md ---------------------------------------------------------------------- diff --git a/_docs/drill-docs/install/install-embedded/001-install-linux.md b/_docs/drill-docs/install/install-embedded/001-install-linux.md deleted file mode 100644 index 4dbc3c0..0000000 --- a/_docs/drill-docs/install/install-embedded/001-install-linux.md +++ /dev/null @@ -1,30 +0,0 @@ ---- -title: "Installing Drill on Linux" -parent: "Installing Drill in Embedded Mode" ---- -Complete the following steps to install Apache Drill on a machine running -Linux: - - 1. Issue the following command to download the latest, stable version of Apache Drill to a directory on your machine: - - wget http://www.apache.org/dyn/closer.cgi/drill/drill-0.7.0/apache-drill-0.7.0.tar.gz - - - 2. Issue the following command to create a new directory to which you can extract the contents of the Drill `tar.gz` file: - - sudo mkdir -p /opt/drill - - 3. Navigate to the directory where you downloaded the Drill `tar.gz` file. - - - 4. 
Issue the following command to extract the contents of the Drill `tar.gz` file to the directory you created: - - sudo tar -xvzf apache-drill-<version>.tar.gz -C /opt/drill - - 5. Issue the following command to navigate to the Drill installation directory: - - cd /opt/drill/apache-drill-<version> - -At this point, you can [invoke -SQLLine](/confluence/pages/viewpage.action?pageId=44994063#Starting/StoppingDrill-invokeSQLLine) to run Drill. \ No newline at end of file http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/drill-docs/install/install-embedded/002-install-mac.md ---------------------------------------------------------------------- diff --git a/_docs/drill-docs/install/install-embedded/002-install-mac.md b/_docs/drill-docs/install/install-embedded/002-install-mac.md deleted file mode 100644 index f20a01d..0000000 --- a/_docs/drill-docs/install/install-embedded/002-install-mac.md +++ /dev/null @@ -1,33 +0,0 @@ ---- -title: "Installing Drill on Mac OS X" -parent: "Installing Drill in Embedded Mode" ---- -Complete the following steps to install Apache Drill on a machine running Mac -OS X: - - 1. Open a Terminal window, and create a `drill` directory inside your home directory (or in some other location if you prefer). - - **Example** - - $ pwd - /Users/max - $ mkdir drill - $ cd drill - $ pwd - /Users/max/drill - - 2. Click the following link to download the latest, stable version of Apache Drill: - - [http://www.apache.org/dyn/closer.cgi/drill/drill-0.7.0/apache-drill-0.7.0.tar.gz](http://www.apache.org/dyn/closer.cgi/drill/drill-0.7.0/apache-drill-0.7.0.tar.gz) - - 3. Open the downloaded `TAR` file with the Mac Archive utility or a similar tool for unzipping files. - - 4. Move the resulting `apache-drill-<version>` folder into the `drill` directory that you created. - - 5. 
Issue the following command to navigate to the `apache-drill-<version>` directory: - - cd /Users/max/drill/apache-drill-<version> - -At this point, you can [invoke SQLLine](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=44994063#Starting/StoppingDrill-invokeSQLLine) to -run Drill. \ No newline at end of file http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/drill-docs/install/install-embedded/003-install-win.md ---------------------------------------------------------------------- diff --git a/_docs/drill-docs/install/install-embedded/003-install-win.md b/_docs/drill-docs/install/install-embedded/003-install-win.md deleted file mode 100644 index 285b584..0000000 --- a/_docs/drill-docs/install/install-embedded/003-install-win.md +++ /dev/null @@ -1,57 +0,0 @@ ---- -title: "Installing Drill on Windows" -parent: "Installing Drill in Embedded Mode" ---- -You can install Drill on Windows 7 or 8. To install Drill on Windows, you must -have JDK 7, and you must set the `JAVA_HOME` path in the Windows Environment -Variables. You must also have a utility, such as -[7-zip](http://www.7-zip.org/), installed on your machine. These instructions -assume that the [7-zip](http://www.7-zip.org/) decompression utility is -installed to extract the Drill archive file that you download. - -#### Setting JAVA_HOME - -Complete the following steps to set `JAVA_HOME`: - - 1. Navigate to `Control Panel\All Control Panel Items\System`, and select **Advanced System Settings**. The System Properties window appears. - 2. On the Advanced tab, click **Environment Variables**. The Environment Variables window appears. - 3. Add/Edit `JAVA_HOME` to point to the location where the JDK software is located. - - **Example** - - C:\Program Files\Java\jdk1.7.0_65 - - 4. Click **OK** to exit the windows. - -#### Installing Drill - -Complete the following steps to install Drill: - - 1. Create a `drill` directory on your `C:\` drive (or in some other location if you prefer). 
- - **Note:** - - Do not include spaces in your directory path. If you include spaces in the -directory path, Drill fails to run. - - 2. Click the following link to download the latest, stable version of Apache Drill: - - [http://www.apache.org/dyn/closer.cgi/drill/drill-0.7.0/apache-drill-0.7.0.tar.gz](http://www.apache.org/dyn/closer.cgi/drill/drill-0.7.0/apache-drill-0.7.0.tar.gz) - - 3. Move the `apache-drill-<version>.tar.gz` file to the `drill` directory that you created on your `C:\` drive. - - 4. Unzip the `TAR.GZ` file and the resulting `TAR` file. - - a. Right-click `apache-drill-<version>.tar.gz`, and select `7-Zip > Extract Here`. The utility extracts the `apache-drill-<version>.tar` file. - b. Right-click `apache-drill-<version>.tar`, and select `7-Zip > Extract Here`. The utility extracts the `apache-drill-<version>` folder. - 5. Open the `apache-drill-<version>` folder. - - 6. Open the `bin` folder, and double-click on the `sqlline.bat` file. The Windows command prompt opens. - 7. At the `sqlline>` prompt, type `!connect jdbc:drill:zk=local` and then press Enter. - 8. Enter the username and password. - a. When prompted, enter the user name `admin` and then press Enter. - b. When prompted, enter the password `admin` and then press Enter. The cursor blinks for a few seconds and then `0: jdbc:drill:zk=local>` displays in the prompt. - -At this point, you can submit queries to Drill. Refer to the [Query Sample Data](https://cwiki.apache.org/confluence/display/DRILL/Apache+Drill+in+10+Minutes#ApacheDrillin10Minutes-QuerySampleData) section of this document.
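As a sketch of what a first session might look like once the `0: jdbc:drill:zk=local>` prompt appears (the `cp` schema points at sample files bundled on Drill's classpath; the column names are assumed from the bundled `employee.json`, and output is elided):

```
0: jdbc:drill:zk=local> SELECT first_name, last_name FROM cp.`employee.json` LIMIT 3;
```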
\ No newline at end of file http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/drill-docs/manage/001-conf.md ---------------------------------------------------------------------- diff --git a/_docs/drill-docs/manage/001-conf.md b/_docs/drill-docs/manage/001-conf.md deleted file mode 100644 index 5b68b40..0000000 --- a/_docs/drill-docs/manage/001-conf.md +++ /dev/null @@ -1,20 +0,0 @@ ---- -title: "Configuration Options" -parent: "Manage Drill" ---- -Drill provides several configuration options that you can enable, disable, or -modify. Modifying certain configuration options can impact Drill's -performance. Many of Drill's configuration options reside in the `drill-env.sh` and `drill-override.conf` files. Drill stores these files in the -`/conf` directory. Drill sources `/etc/drill/conf` if it exists. Otherwise, -Drill sources the local `<drill_installation_directory>/conf` directory. - -Refer to the following documentation for information about configuration -options that you can modify: - - * [Memory Allocation](/confluence/display/DRILL/Memory+Allocation) - * [Start-Up Options](/confluence/display/DRILL/Start-Up+Options) - * [Planning and Execution Options](/confluence/display/DRILL/Planning+and+Execution+Options) - * [Persistent Configuration Storage](/confluence/display/DRILL/Persistent+Configuration+Storage) - - http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/drill-docs/manage/002-start-stop.md ---------------------------------------------------------------------- diff --git a/_docs/drill-docs/manage/002-start-stop.md b/_docs/drill-docs/manage/002-start-stop.md deleted file mode 100644 index c533bd3..0000000 --- a/_docs/drill-docs/manage/002-start-stop.md +++ /dev/null @@ -1,45 +0,0 @@ ---- -title: "Starting/Stopping Drill" -parent: "Manage Drill" ---- -How you start Drill depends on the installation method you followed. If you -installed Drill in embedded mode, invoking SQLLine automatically starts a -Drillbit locally.
If you installed Drill in distributed mode on one or -multiple nodes in a cluster, you must start the Drillbit service and then -invoke SQLLine. Once SQLLine starts, you can issue queries to Drill. - -### Starting a Drillbit - -If you installed Drill in embedded mode, you do not need to start the -Drillbit. - -To start a Drillbit, navigate to the Drill installation directory, and issue -the following command: - -`bin/drillbit.sh start` - -### Invoking SQLLine/Connecting to a Schema - -SQLLine is used as the Drill shell. SQLLine connects to relational databases -and executes SQL commands. You invoke SQLLine for Drill in embedded or -distributed mode. If you want to connect directly to a particular schema, you -can indicate the schema name when you invoke SQLLine. - -To start SQLLine, issue the appropriate command for your Drill installation -type: - -<div class="table-wrap"><table class="confluenceTable"><tbody><tr><td class="confluenceTd"><p><strong>Drill Install Type</strong></p></td><td class="confluenceTd"><p><strong>Example</strong></p></td><td class="confluenceTd"><p><strong>Command</strong></p></td></tr><tr><td class="confluenceTd"><p>Embedded</p></td><td class="confluenceTd"><p>Drill installed locally (embedded mode);</p><p>Hive with embedded metastore</p></td><td class="confluenceTd"><p>To connect without specifying a schema, navigate to the Drill installation directory and issue the following command:</p><p><code>$ bin/sqlline -u jdbc:drill:zk=local -n admin -p admin </code><span> </span></p><p>Once you are in the prompt, you can issue<code> USE <schema> </code>or you can use absolute notation: <code>schema.table.column.</code></p><p>To connect to a schema directly, issue the command with the schema name:</p><p><code>$ bin/sqlline -u jdbc:drill:schema=<database>;zk=local -n admin -p admin</code></p></td></tr><tr><td class="confluenceTd"><p>Distributed</p></td><td class="confluenceTd"><p>Drill installed in distributed mode;</p><p>Hive with remote 
metastore;</p><p>HBase</p></td><td class="confluenceTd"><p>To connect without specifying a schema, navigate to the Drill installation directory and issue the following command:</p><p><code>$ bin/sqlline -u jdbc:drill:zk=<zk1host>:<port>,<zk2host>:<port>,<zk3host>:<port> -n admin -p admin</code></p><p>Once you are in the prompt, you can issue<code> USE <schema> </code>or you can use absolute notation: <code>schema.table.column.</code></p><p>To connect to a schema directly, issue the command with the schema name:</p><p><code>$ bin/sqlline -u jdbc:drill:schema=<database>;zk=<zk1host>:<port>,<zk2host>:<port>,<zk3host>:<port> -n admin -p admin</code></p></td></tr></tbody></table></div> - -When SQLLine starts, the system displays the following prompt: - -`0: jdbc:drill:schema=<database>;zk=<zkhost>:<port>>` - -At this point, you can use Drill to query your data source or you can discover -metadata. - -### Exiting SQLLine - -To exit SQLLine, issue the following command: - -`!quit` - http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/drill-docs/manage/003-ports.md ---------------------------------------------------------------------- diff --git a/_docs/drill-docs/manage/003-ports.md b/_docs/drill-docs/manage/003-ports.md deleted file mode 100644 index 70539de..0000000 --- a/_docs/drill-docs/manage/003-ports.md +++ /dev/null @@ -1,9 +0,0 @@ ---- -title: "Ports Used by Drill" -parent: "Manage Drill" ---- -The following table provides a list of the ports that Drill uses, the port -type, and a description of how Drill uses the port: - -<div class="table-wrap"><table class="confluenceTable"><tbody><tr><th class="confluenceTh">Port</th><th colspan="1" class="confluenceTh">Type</th><th class="confluenceTh">Description</th></tr><tr><td valign="top" class="confluenceTd">8047</td><td valign="top" colspan="1" class="confluenceTd">TCP</td><td valign="top" class="confluenceTd">Needed for <span style="color: rgb(34,34,34);">the Drill Web UI.</span><span 
style="color: rgb(34,34,34);"> </span></td></tr><tr><td valign="top" class="confluenceTd">31010</td><td valign="top" colspan="1" class="confluenceTd">TCP</td><td valign="top" class="confluenceTd">User port address. Used between nodes in a Drill cluster. <br />Needed for an external client, such as Tableau, to connect into the<br />cluster nodes. Also needed for the Drill Web UI.</td></tr><tr><td valign="top" class="confluenceTd">31011</td><td valign="top" colspan="1" class="confluenceTd">TCP</td><td valign="top" class="confluenceTd">Control port address. Used between nodes in a Drill cluster. <br />Needed for multi-node installation of Apache Drill.</td></tr><tr><td valign="top" colspan="1" class="confluenceTd">31012</td><td valign="top" colspan="1" class="confluenceTd">TCP</td><td valign="top" colspan="1" class="confluenceTd">Data port address. Used between nodes in a Drill cluster. <br />Needed for multi-node installation of Apache Drill.</td></tr><tr><td valign="top" colspan="1" class="confluenceTd">46655</td><td valign="top" colspan="1" class="confluenceTd">UDP</td><td valign="top" colspan="1" class="confluenceTd">Used for JGroups and Infinispan. Needed for multi-node installation of Apache Drill.</td></tr></tbody></table></div> -
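A quick way to check which of these ports are reachable on a node is a bash loop over `/dev/tcp` (a sketch; `localhost` is a placeholder for your Drillbit host, and the UDP port 46655 is not probed because `/dev/tcp` only tests TCP):

```shell
#!/usr/bin/env bash
# Probe each Drill TCP port on a host and report whether it accepts connections.
# /dev/tcp/<host>/<port> is a bash redirection feature, not a real file.
host="${1:-localhost}"
for port in 8047 31010 31011 31012; do
  if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
    echo "${port} open"
  else
    echo "${port} closed"
  fi
done
```

On a healthy single-node install you would expect 8047 and 31010 to report open; the control and data ports matter only between nodes of a multi-node cluster.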