http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/design/005-value.md
----------------------------------------------------------------------
diff --git a/_docs/design/005-value.md b/_docs/design/005-value.md
new file mode 100644
index 0000000..828376a
--- /dev/null
+++ b/_docs/design/005-value.md
@@ -0,0 +1,163 @@
+---
+title: "Value Vectors"
+parent: "Design Docs"
+---
+This document defines the data structures required for passing sequences of
+columnar data between [Operators](https://docs.google.com/a/maprtech.com/document/d/1zaxkcrK9mYyfpGwX1kAV80z0PCi8abefL45zOzb97dI/edit#bookmark=id.iip15ful18mm).
+
+## Goals
+
+### Support Operators Written in Multiple Languages
+
+ValueVectors should support operators written in C/C++/Assembly. To support
+this, the underlying ByteBuffer will not require modification when passed
+through the JNI interface. The ValueVector will be considered immutable once
+constructed. Endianness has not yet been considered.
+
+### Access
+
+Reading a random element from a ValueVector must be a constant time operation.
+To accommodate this, elements are identified by their offset from the start of
+the buffer. Repeated, nullable and variable width ValueVectors utilize an
+additional fixed width value vector to index each element. Write access is not
+supported once the ValueVector has been constructed by the RecordBatch.
+
+### Efficient Subsets of Value Vectors
+
+When an operator returns a subset of values from a ValueVector, it should
+reuse the original ValueVector. To accomplish this, a level of indirection is
+introduced to skip over certain values in the vector. This level of
+indirection is a sequence of offsets which reference an offset in the original
+ValueVector and the count of subsequent values which are to be included in the
+subset.
+
+### Pooled Allocation
+
+ValueVectors utilize one or more buffers under the covers. These buffers will
+be drawn from a pool. Value vectors are themselves created and destroyed as a
+schema changes during the course of record iteration.
+
+### Homogeneous Value Types
+
+Each value in a Value Vector is of the same type. The [Record Batch](https://docs.google.com/a/maprtech.com/document/d/1zaxkcrK9mYyfpGwX1kAV80z0PCi8abefL45zOzb97dI/edit#bookmark=kix.s2xuoqnr8obe) implementation is responsible for
+creating a new Value Vector any time there is a change in schema.
+
+## Definitions
+
+Data Types
+
+The canonical source for value type definitions is the [Drill
+Datatypes](http://bit.ly/15JO9bC) document. The individual types are listed
+under the "Basic Data Types" tab, while the value vector types can be found
+under the "Value Vectors" tab.
+
+Operators
+
+An operator is responsible for transforming a stream of fields. It operates on
+Record Batches or constant values.
+
+Record Batch
+
+A set of field values for some range of records. The batch may be composed of
+Value Vectors, in which case each batch has exactly one schema.
+
+Value Vector
+
+The value vector comprises one or more contiguous buffers: one which
+stores a sequence of values, and zero or more which store any metadata
+associated with the ValueVector.
+
+## Data Structure
+
+A ValueVector stores values in a ByteBuf, which is a contiguous region of
+memory. Additional levels of indirection are used to support variable value
+widths, nullable values, repeated values and selection vectors. These levels
+of indirection are primarily lookup tables which consist of one or more fixed
+width ValueVectors which may be combined (e.g. for nullable, variable width
+values). A fixed width ValueVector of non-nullable, non-repeated values does
+not require an indirect lookup; elements can be accessed directly by
+multiplying position by stride.
+
+Fixed Width Values
+
+Fixed width ValueVectors simply contain a packed sequence of values. Random
+access is supported by accessing element n at ByteBuf[0] + Index * Stride,
+where Index is 0-based. The following illustrates the underlying buffer of
+INT4 values [1 .. 6]:
+
+![drill query flow]({{ site.baseurl }}/docs/img/value1.png)
+
+Nullable Values
+
+Nullable values are represented by a vector of bit values. Each bit in the
+vector corresponds to an element in the ValueVector. If the bit is not set,
+the value is NULL. Otherwise the value is retrieved from the underlying
+buffer. The following illustrates a NullableValueVector of INT4 values 2, 3
+and 6:
+
+![drill query flow]({{ site.baseurl }}/docs/img/value2.png)
+
+### Repeated Values
+
+A repeated ValueVector is used for elements which can contain multiple values
+(e.g. a JSON array). A table of offset and count pairs is used to represent
+each repeated element in the ValueVector. A count of zero means the element
+has no values (note the offset field is unused in this case). The following
+illustrates three fields; one with two values, one with no values, and one
+with a single value:
+
+![drill query flow]({{ site.baseurl }}/docs/img/value3.png)
+
+ValueVector representation of the equivalent JSON:
+
+x:[1, 2]
+
+x:[ ]
+
+x:[3]
+
+Variable Width Values
+
+Variable width values are stored contiguously in a ByteBuf. Each element is
+represented by an entry in a fixed width ValueVector of offsets. The length of
+an entry is deduced by subtracting its offset from the offset of the following
+entry. Because of this, the offset table will always contain one more entry
+than total elements, with the last entry pointing to the end of the buffer.
+
+![drill query flow]({{ site.baseurl }}/docs/img/value4.png)
+
+Repeated Map Vectors
+
+A repeated map vector contains one or more maps (akin to an array of objects
+in JSON). The values of each field in the map are stored contiguously within a
+ByteBuf. To access a specific record, a lookup table of count and offset pairs
+is used. The offset in this lookup table points to the first repeated field in
+each column, while the count indicates the maximum number of elements for the
+column. The following example illustrates a RepeatedMap with two records; one
+with two objects, and one with a single object:
+
+![drill query flow]({{ site.baseurl }}/docs/img/value5.png)
+
+ValueVector representation of the equivalent JSON:
+
+x: [ {name:"Sam", age:1}, {name:"Max", age:2} ]
+
+x: [ {name:"Joe", age:3} ]
+
+Selection Vectors
+
+A Selection Vector represents a subset of a ValueVector. It is implemented
+with a list of offsets which identify each element in the ValueVector to be
+included in the SelectionVector. In the case of a fixed width ValueVector, the
+offsets reference the underlying ByteBuf. In the case of a nullable, repeated
+or variable width ValueVector, the offset references the corresponding lookup
+table. The following illustrates a SelectionVector of INT4 (fixed width)
+values 2, 3 and 5 from the original vector of [1 .. 6]:
+
+![drill query flow]({{ site.baseurl }}/docs/img/value6.png)
+
+The following illustrates the same ValueVector with nullable fields:
+
+![drill query flow]({{ site.baseurl }}/docs/img/value7.png)
+
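The offset-table and selection-vector layouts described above can be sketched with plain arrays. The following Java snippet is an illustration only — it is not Drill's ValueVector implementation, and the class and method names are invented for the example. It shows how a variable-width vector derives element lengths from an offset table that has one more entry than there are elements, and how a selection vector picks a subset by index without copying the data buffer:

```java
// Illustrative sketch of ValueVector indirection -- NOT Drill's actual
// implementation; all names here are invented for this example.
public class ValueVectorSketch {

    // Variable-width layout: values "a", "bb", "cccc" packed back to back
    // in one buffer, plus an offset table with one extra entry whose last
    // value points to the end of the buffer.
    static final byte[] DATA = "abbcccc".getBytes();
    static final int[] OFFSETS = {0, 1, 3, 7};

    // Length of element n = OFFSETS[n + 1] - OFFSETS[n].
    static String element(int n) {
        return new String(DATA, OFFSETS[n], OFFSETS[n + 1] - OFFSETS[n]);
    }

    // A selection vector stores only element indices, so the subset
    // shares the original data buffer instead of copying it.
    static String[] select(int[] selection) {
        String[] out = new String[selection.length];
        for (int i = 0; i < selection.length; i++) {
            out[i] = element(selection[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(element(1));            // prints "bb"
        for (String s : select(new int[]{0, 2})) {
            System.out.println(s);                 // prints "a" then "cccc"
        }
    }
}
```

Because the selection vector holds only indices, returning a subset reuses the original buffer — the property described under "Efficient Subsets of Value Vectors."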
http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/dev-custom-fcn/001-dev-simple.md
----------------------------------------------------------------------
diff --git a/_docs/dev-custom-fcn/001-dev-simple.md b/_docs/dev-custom-fcn/001-dev-simple.md
new file mode 100644
index 0000000..ebf3831
--- /dev/null
+++ b/_docs/dev-custom-fcn/001-dev-simple.md
@@ -0,0 +1,50 @@
+---
+title: "Develop a Simple Function"
+parent: "Develop Custom Functions"
+---
+Create a class within a Java package that implements Drill's simple function
+interface, and include the required information for the function type.
+Your function must use data types that Drill supports, such as int or
+BigInt. For a list of supported data types, refer to the [SQL Reference](/drill/docs/sql-reference).
+
+Complete the following steps to develop a simple function using Drill's simple
+function interface:
+
+ 1. Create a Maven project and add the following dependency:
+
+        <dependency>
+        <groupId>org.apache.drill.exec</groupId>
+        <artifactId>drill-java-exec</artifactId>
+        <version>1.0.0-m2-incubating-SNAPSHOT</version>
+        </dependency>
+
+ 2. Create a class that implements the `DrillSimpleFunc` interface and identify the scope as `FunctionScope.SIMPLE`.
+
+    **Example**
+
+        @FunctionTemplate(name = "myaddints", scope = FunctionScope.SIMPLE, nulls = NullHandling.NULL_IF_NULL)
+        public static class IntIntAdd implements DrillSimpleFunc {
+
+ 3. Provide the variables used in the code in the `Param` and `Output` value holders.
+
+    **Example**
+
+        @Param IntHolder in1;
+        @Param IntHolder in2;
+        @Output IntHolder out;
+
+ 4. Add the code that performs operations for the function in the `eval()` method.
+
+    **Example**
+
+        public void setup(RecordBatch b) {
+        }
+        public void eval() {
+          out.value = (int) (in1.value + in2.value);
+        }
+
+ 5. Use the maven-source-plugin to build the sources and classes JAR files. Verify that an empty `drill-module.conf` is included in the resources folder of the JARs.
+Drill finds this module during classpath scanning. If the file is not
+included in the resources folder, you can add it to the JAR file or add it to
+`etc/drill/conf`.
+
http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/dev-custom-fcn/002-dev-aggregate.md
----------------------------------------------------------------------
diff --git a/_docs/dev-custom-fcn/002-dev-aggregate.md b/_docs/dev-custom-fcn/002-dev-aggregate.md
new file mode 100644
index 0000000..d1a3cfb
--- /dev/null
+++ b/_docs/dev-custom-fcn/002-dev-aggregate.md
@@ -0,0 +1,55 @@
+---
+title: "Developing an Aggregate Function"
+parent: "Develop Custom Functions"
+---
+Create a class within a Java package that implements Drill's aggregate
+function interface. Include the required information for the function.
+Your function must use data types that Drill supports, such as int or
+BigInt. For a list of supported data types, refer to the [SQL Reference](/drill/docs/sql-reference).
+
+Complete the following steps to create an aggregate function:
+
+ 1. Create a Maven project and add the following dependency:
+
+        <dependency>
+        <groupId>org.apache.drill.exec</groupId>
+        <artifactId>drill-java-exec</artifactId>
+        <version>1.0.0-m2-incubating-SNAPSHOT</version>
+        </dependency>
+ 2. Create a class that implements the `DrillAggFunc` interface and identify the scope as `FunctionTemplate.FunctionScope.POINT_AGGREGATE`.
+
+    **Example**
+
+        @FunctionTemplate(name = "count", scope = FunctionTemplate.FunctionScope.POINT_AGGREGATE)
+        public static class BitCount implements DrillAggFunc {
+ 3. Provide the variables used in the code in the `Param`, `Workspace`, and `Output` holders.
+
+    **Example**
+
+        @Param BitHolder in;
+        @Workspace BitHolder value;
+        @Output BitHolder out;
+ 4. Include the `setup()`, `add()`, `output()`, and `reset()` methods.
+
+    **Example**
+
+        public void setup(RecordBatch b) {
+          value = new BitHolder();
+          value.value = 0;
+        }
+
+        @Override
+        public void add() {
+          value.value++;
+        }
+        @Override
+        public void output() {
+          out.value = value.value;
+        }
+        @Override
+        public void reset() {
+          value.value = 0;
+        }
+ 5. Use the maven-source-plugin to build the sources and classes JAR files. Verify that an empty `drill-module.conf` is included in the resources folder of the JARs.
+Drill finds this module during classpath scanning. If the file is not
+included in the resources folder, you can add it to the JAR file or add it to
+`etc/drill/conf`.
\ No newline at end of file
http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/dev-custom-fcn/003-add-custom.md
----------------------------------------------------------------------
diff --git a/_docs/dev-custom-fcn/003-add-custom.md b/_docs/dev-custom-fcn/003-add-custom.md
new file mode 100644
index 0000000..1858e44
--- /dev/null
+++ b/_docs/dev-custom-fcn/003-add-custom.md
@@ -0,0 +1,26 @@
+---
+title: "Adding Custom Functions to Drill"
+parent: "Develop Custom Functions"
+---
+After you develop your custom function and generate the sources and classes
+JAR files, add both JAR files to the Drill classpath, and add the name of
+the package that contains the classes to the main Drill configuration file.
+Restart the Drillbit on each node to refresh the configuration.
+
+To add a custom function to Drill, complete the following steps:
+
+ 1. Add the sources JAR file and the classes JAR file for the custom function to the Drill classpath on all nodes running a Drillbit. To add the JAR files, copy them to `<drill installation directory>/jars/3rdparty`.
+ 2. On all nodes running a Drillbit, add the name of the package that contains the classes to the main Drill configuration file in the following location:
+
+        <drill installation directory>/conf/drill-override.conf
+
+    To add the package, add the package name to
+    `drill.logical.function.package+=`. Separate package names with a comma.
+
+    **Example**
+
+        drill.logical.function.package+= ["org.apache.drill.exec.expr.fn.impl","org.apache.drill.udfs"]
+ 3. On each Drill node in the cluster, navigate to the Drill installation directory, and issue the following command to restart the Drillbit:
+
+        <drill installation directory>/bin/drillbit.sh restart
+
+    Now you can issue queries with your custom functions to Drill.
\ No newline at end of file
http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/dev-custom-fcn/004-use-custom.md
----------------------------------------------------------------------
diff --git a/_docs/dev-custom-fcn/004-use-custom.md b/_docs/dev-custom-fcn/004-use-custom.md
new file mode 100644
index 0000000..6a0245a
--- /dev/null
+++ b/_docs/dev-custom-fcn/004-use-custom.md
@@ -0,0 +1,55 @@
+---
+title: "Using Custom Functions in Queries"
+parent: "Develop Custom Functions"
+---
+When you issue a query with a custom function to Drill, Drill searches the
+classpath for the function that matches the request in the query. Once Drill
+locates the function for the request, Drill processes the query and applies
+the function during processing.
+
+Your Drill installation includes sample files in the Drill classpath. One
+sample file, `employee.json`, contains some fictitious employee data that you
+can query with a custom function.
+
+## Simple Function Example
+
+This example uses the `myaddints` simple function in a query on the
+`employee.json` file.
+
+If you issue the following query to Drill, you can see all of the employee
+data within the `employee.json` file:
+
+    0: jdbc:drill:zk=local> SELECT * FROM cp.`employee.json`;
+
+The query returns the following results:
+
+    | employee_id | full_name | first_name | last_name | position_id | position_title | store_id | department_id | birth_da |
+    +-------------+------------+------------+------------+-------------+----------------+------------+---------------+----------+-----------
+    | 1101 | Steve Eurich | Steve | Eurich | 16 | Store Temporary Checker | 12 | 16 |
+    | 1102 | Mary Pierson | Mary | Pierson | 16 | Store Temporary Checker | 12 | 16 |
+    | 1103 | Leo Jones | Leo | Jones | 16 | Store Temporary Checker | 12 | 16 |
+    …
+
+Since the `position_id` and `store_id` columns contain integers, you can issue
+a query with the `myaddints` custom function on these columns to add the
+integers in the columns.
+
+The following query tells Drill to apply the `myaddints` function to the
+`position_id` and `store_id` columns in the `employee.json` file:
+
+    0: jdbc:drill:zk=local> SELECT myaddints(CAST(position_id AS int),CAST(store_id AS int)) FROM cp.`employee.json`;
+
+Since JSON files do not store information about data types, you must apply the
+`CAST` function in the query to tell Drill that the columns contain integer
+values.
+
+The query returns the following results:
+
+    +------------+
+    | EXPR$0 |
+    +------------+
+    | 28 |
+    | 28 |
+    | 36 |
+    +------------+
+    …
\ No newline at end of file
http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/dev-custom-fcn/005-cust-interface.md
----------------------------------------------------------------------
diff --git a/_docs/dev-custom-fcn/005-cust-interface.md b/_docs/dev-custom-fcn/005-cust-interface.md
new file mode 100644
index 0000000..35af922
--- /dev/null
+++ b/_docs/dev-custom-fcn/005-cust-interface.md
@@ -0,0 +1,8 @@
+---
+title: "Custom Function Interfaces"
+parent: "Develop Custom Functions"
+---
+Implement the Drill interface appropriate for the type of function that you
+want to develop. Each interface provides a set of required holders where you
+input data types that your function uses and required methods that Drill calls
+to perform your function's operations.
\ No newline at end of file
http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/develop/001-compile.md
----------------------------------------------------------------------
diff --git a/_docs/develop/001-compile.md b/_docs/develop/001-compile.md
new file mode 100644
index 0000000..2cf6ac9
--- /dev/null
+++ b/_docs/develop/001-compile.md
@@ -0,0 +1,37 @@
+---
+title: "Compiling Drill from Source"
+parent: "Develop Drill"
+---
+## Prerequisites
+
+ * Maven 3.0.4 or later
+ * Oracle JDK 7 or later
+
+Run the following commands to verify that you have the correct versions of
+Maven and JDK installed:
+
+    java -version
+    mvn -version
+
+## 1\. Clone the Repository
+
+    git clone https://git-wip-us.apache.org/repos/asf/incubator-drill.git
+
+## 2\. Compile the Code
+
+    cd incubator-drill
+    mvn clean install -DskipTests
+
+## 3\. Explode the Tarball in the Installation Directory
+
+    mkdir ~/compiled-drill
+    tar xvzf distribution/target/*.tar.gz --strip=1 -C ~/compiled-drill
+
+Now that you have Drill installed, you can connect to Drill and query sample
+data or you can connect Drill to your data sources.
+
+ * To connect Drill to your data sources, refer to [Connect to Data Sources](/drill/docs/connect-to-data-sources) for instructions.
+ * To connect to Drill and query sample data, refer to the following topics:
+   * [Start Drill](/drill/docs/starting-stopping-drill) (for Drill installed in embedded mode)
+   * [Query Data](/drill/docs/query-data)
+
http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/develop/002-setup.md
----------------------------------------------------------------------
diff --git a/_docs/develop/002-setup.md b/_docs/develop/002-setup.md
new file mode 100644
index 0000000..19fb554
--- /dev/null
+++ b/_docs/develop/002-setup.md
@@ -0,0 +1,5 @@
+---
+title: "Setting Up Your Development Environment"
+parent: "Develop Drill"
+---
+TBD
\ No newline at end of file
http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/develop/003-patch-tool.md
----------------------------------------------------------------------
diff --git a/_docs/develop/003-patch-tool.md b/_docs/develop/003-patch-tool.md
new file mode 100644
index 0000000..3ef3fe5
--- /dev/null
+++ b/_docs/develop/003-patch-tool.md
@@ -0,0 +1,160 @@
+---
+title: "Drill Patch Review Tool"
+parent: "Develop Drill"
+---
+ * Drill JIRA and Reviewboard script
+   * 1\. Setup
+   * 2\. Usage
+   * 3\. Upload patch
+   * 4\. Update patch
+ * JIRA command line tool
+   * 1\. Download the JIRA command line package
+   * 2\. Configure JIRA username and password
+ * Reviewboard
+   * 1\. Install the post-review tool
+   * 2\.
Configure Stuff + * FAQ + * When I run the script, it throws the following error and exits + * When I run the script, it throws the following error and exits + +### Drill JIRA and Reviewboard script + +#### 1\. Setup + + 1. Follow instructions [here](/drill/docs/drill-patch-review-tool#jira-command-line-tool) to setup the jira-python package + 2. Follow instructions [here](/drill/docs/drill-patch-review-tool#reviewboard) to setup the reviewboard python tools + 3. Install the argparse module + + On Linux -> sudo yum install python-argparse + On Mac -> sudo easy_install argparse + +#### 2\. Usage + + nnarkhed-mn: nnarkhed$ python drill-patch-review.py --help + usage: drill-patch-review.py [-h] -b BRANCH -j JIRA [-s SUMMARY] + [-d DESCRIPTION] [-r REVIEWBOARD] [-t TESTING] + [-v VERSION] [-db] -rbu REVIEWBOARDUSER -rbp REVIEWBOARDPASSWORD + + Drill patch review tool + + optional arguments: + -h, --help show this help message and exit + -b BRANCH, --branch BRANCH + Tracking branch to create diff against + -j JIRA, --jira JIRA JIRA corresponding to the reviewboard + -s SUMMARY, --summary SUMMARY + Summary for the reviewboard + -d DESCRIPTION, --description DESCRIPTION + Description for reviewboard + -r REVIEWBOARD, --rb REVIEWBOARD + Review board that needs to be updated + -t TESTING, --testing-done TESTING + Text for the Testing Done section of the reviewboard + -v VERSION, --version VERSION + Version of the patch + -db, --debug Enable debug mode + -rbu, --reviewboard-user Reviewboard user name + -rbp, --reviewboard-password Reviewboard password + +#### 3\. Upload patch + + 1. Specify the branch against which the patch should be created (-b) + 2. Specify the corresponding JIRA (-j) + 3. Specify an **optional** summary (-s) and description (-d) for the reviewboard + +Example: + + python drill-patch-review.py -b origin/master -j DRILL-241 -rbu tnachen -rbp password + +#### 4\. Update patch + + 1. Specify the branch against which the patch should be created (-b) + 2. 
Specify the corresponding JIRA (--jira)
+ 3. Specify the rb to be updated (-r)
+ 4. Specify an **optional** summary (-s) and description (-d) for the reviewboard, if you want to update it
+ 5. Specify an **optional** version of the patch. This will be appended to the jira to create a file named JIRA-<version>.patch. The purpose is to be able to upload multiple patches to the JIRA. This has no bearing on the reviewboard update.
+
+Example:
+
+    python drill-patch-review.py -b origin/master -j DRILL-241 -r 14081 -rbu tnachen -rbp password
+
+### JIRA command line tool
+
+#### 1\. Download the JIRA command line package
+
+Install the jira-python package.
+
+    sudo easy_install jira-python
+
+#### 2\. Configure JIRA username and password
+
+Include a jira.ini file in your $HOME directory that contains your Apache JIRA
+username and password.
+
+    nnarkhed-mn:~ nnarkhed$ cat ~/jira.ini
+    user=nehanarkhede
+    password=***********
+
+### Reviewboard
+
+This is a quick tutorial on using [Review Board](https://reviews.apache.org)
+with Drill.
+
+#### 1\. Install the post-review tool
+
+If you are on RHEL, Fedora or CentOS, follow these steps:
+
+    sudo yum install python-setuptools
+    sudo easy_install -U RBTools
+
+If you are on Mac, follow these steps:
+
+    sudo easy_install -U setuptools
+    sudo easy_install -U RBTools
+
+For other platforms, follow the [instructions](http://www.reviewboard.org/docs/manual/dev/users/tools/post-review/) to
+set up the post-review tool.
+
+#### 2\. Configure Stuff
+
+Then you need to configure a few things to make it work.
+
+First, set the Review Board URL to use. You can do this from within git:
+
+    git config reviewboard.url https://reviews.apache.org
+
+If you checked out using the git-wip HTTP URL, that URL confusingly won't work
+with Review Board, so you need to configure an override to use the non-HTTP URL.
You can do this by adding a config file like this:
+
+    jkreps$ cat ~/.reviewboardrc
+    REPOSITORY = 'git://git.apache.org/incubator-drill.git'
+    TARGET_GROUPS = 'drill-git'
+    GUESS_FIELDS = True
+
+### FAQ
+
+#### When I run the script, it throws the following error and exits
+
+    nnarkhed$ python drill-patch-review.py -b trunk -j DRILL-241
+    There don't seem to be any diffs
+
+There are two reasons for this:
+
+ * The code is not checked into your local branch
+ * The -b branch is not pointing to the remote branch. In the example above, "trunk" is specified as the branch, which is the local branch. The correct value for the -b (--branch) option is the remote branch. "git branch -r" gives the list of the remote branch names.
+
+#### When I run the script, it throws the following error and exits
+
+Error uploading diff
+
+Your review request still exists, but the diff is not attached.
+
+The most common root cause of this error is that the git remote branches are
+not up to date. Since the script already updates them, the error is probably
+due to some other problem. You can run the script with the --debug option,
+which will make post-review run in debug mode and list the root cause of the
+issue.
+
http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/drill-docs/001-arch.md
----------------------------------------------------------------------
diff --git a/_docs/drill-docs/001-arch.md b/_docs/drill-docs/001-arch.md
deleted file mode 100644
index e4b26fc..0000000
--- a/_docs/drill-docs/001-arch.md
+++ /dev/null
@@ -1,58 +0,0 @@
----
-title: "Architectural Overview"
-parent: "Apache Drill Documentation"
----
-Apache Drill is a low latency distributed query engine for large-scale
-datasets, including structured and semi-structured/nested data. Inspired by
-Google's Dremel, Drill is designed to scale to several thousands of nodes and
-query petabytes of data at interactive speeds that BI/Analytics environments
-require.
- -### High-Level Architecture - -Drill includes a distributed execution environment, purpose built for large- -scale data processing. At the core of Apache Drill is the âDrillbitâ service, -which is responsible for accepting requests from the client, processing the -queries, and returning results to the client. - -A Drillbit service can be installed and run on all of the required nodes in a -Hadoop cluster to form a distributed cluster environment. When a Drillbit runs -on each data node in the cluster, Drill can maximize data locality during -query execution without moving data over the network or between nodes. Drill -uses ZooKeeper to maintain cluster membership and health-check information. - -Though Drill works in a Hadoop cluster environment, Drill is not tied to -Hadoop and can run in any distributed cluster environment. The only pre- -requisite for Drill is Zookeeper. - -### Query Flow in Drill - -The following image represents the flow of a Drill query: - -![](../img/queryFlow.PNG?version=1&modifica -tionDate=1400017845000&api=v2) - -The flow of a Drill query typically involves the following steps: - - 1. The Drill client issues a query. Any Drillbit in the cluster can accept queries from clients. There is no master-slave concept. - 2. The Drillbit then parses the query, optimizes it, and generates an optimized distributed query plan for fast and efficient execution. - 3. The Drillbit that accepts the query becomes the driving Drillbit node for the request. It gets a list of available Drillbit nodes in the cluster from ZooKeeper. The driving Drillbit determines the appropriate nodes to execute various query plan fragments to maximize data locality. - 4. The Drillbit schedules the execution of query fragments on individual nodes according to the execution plan. - 5. The individual nodes finish their execution and return data to the driving Drillbit. - 6. The driving Drillbit returns results back to the client. 
- -### Drill Clients - -You can access Drill through the following interfaces: - - * Drill shell (SQLLine) - * Drill Web UI - * ODBC - * JDBC - * C++ API - -Click on either of the following links to continue reading about Drill's -architecture: - - * [Core Modules within a Drillbit](/confluence/display/DRILL/Core+Modules+within+a+Drillbit) - * [Architectural Highlights](/confluence/display/DRILL/Architectural+Highlights) \ No newline at end of file http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/drill-docs/002-tutorial.md ---------------------------------------------------------------------- diff --git a/_docs/drill-docs/002-tutorial.md b/_docs/drill-docs/002-tutorial.md deleted file mode 100644 index 597f994..0000000 --- a/_docs/drill-docs/002-tutorial.md +++ /dev/null @@ -1,58 +0,0 @@ ---- -title: "Apache Drill Tutorial" -parent: "Apache Drill Documentation" ---- -This tutorial uses the MapR Sandbox, which is a Hadoop environment pre- -configured with Apache Drill. - -To complete the tutorial on the MapR Sandbox with Apache Drill, work through -the following pages in order: - - * [Installing the Apache Drill Sandbox](/confluence/display/DRILL/Installing+the+Apache+Drill+Sandbox) - * [Getting to Know the Drill Setup](/confluence/display/DRILL/Getting+to+Know+the+Drill+Setup) - * [Lesson 1: Learn About the Data Set](/confluence/display/DRILL/Lesson+1%3A+Learn+About+the+Data+Set) - * [Lesson 2: Run Queries with ANSI SQL](/confluence/display/DRILL/Lesson+2%3A+Run+Queries+with+ANSI+SQL) - * [Lesson 3: Run Queries on Complex Data Types](/confluence/display/DRILL/Lesson+3%3A+Run+Queries+on+Complex+Data+Types) - * [Summary](/confluence/display/DRILL/Summary) - -# About Apache Drill - -Drill is an Apache open-source SQL query engine for Big Data exploration. 
-Drill is designed from the ground up to support high-performance analysis on -the semi-structured and rapidly evolving data coming from modern Big Data -applications, while still providing the familiarity and ecosystem of ANSI SQL, -the industry-standard query language. Drill provides plug-and-play integration -with existing Apache Hive and Apache HBase deployments.Apache Drill 0.5 offers -the following key features: - - * Low-latency SQL queries - - * Dynamic queries on self-describing data in files (such as JSON, Parquet, text) and MapR-DB/HBase tables, without requiring metadata definitions in the Hive metastore. - - * ANSI SQL - - * Nested data support - - * Integration with Apache Hive (queries on Hive tables and views, support for all Hive file formats and Hive UDFs) - - * BI/SQL tool integration using standard JDBC/ODBC drivers - -# MapR Sandbox with Apache Drill - -MapR includes Apache Drill as part of the Hadoop distribution. The MapR -Sandbox with Apache Drill is a fully functional single-node cluster that can -be used to get an overview on Apache Drill in a Hadoop environment. Business -and technical analysts, product managers, and developers can use the sandbox -environment to get a feel for the power and capabilities of Apache Drill by -performing various types of queries. Once you get a flavor for the technology, -refer to the [Apache Drill web site](http://incubator.apache.org/drill/) and -[Apache Drill documentation -](https://cwiki.apache.org/confluence/display/DRILL/Apache+Drill+Wiki)for more -details. - -Note that Hadoop is not a prerequisite for Drill and users can start ramping -up with Drill by running SQL queries directly on the local file system. Refer -to [Apache Drill in 10 minutes](https://cwiki.apache.org/confluence/display/DR -ILL/Apache+Drill+in+10+Minutes) for an introduction to using Drill in local -(embedded) mode. 
- http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/drill-docs/003-yelp.md ---------------------------------------------------------------------- diff --git a/_docs/drill-docs/003-yelp.md b/_docs/drill-docs/003-yelp.md deleted file mode 100644 index b9339ed..0000000 --- a/_docs/drill-docs/003-yelp.md +++ /dev/null @@ -1,402 +0,0 @@ ---- -title: "Analyzing Yelp JSON Data with Apache Drill" -parent: "Apache Drill Documentation" ---- -[Apache Drill](https://www.mapr.com/products/apache-drill) is one of the -fastest growing open source projects, with the community making rapid progress -with monthly releases. The key difference is Drillâs agility and flexibility. -Along with meeting the table stakes for SQL-on-Hadoop, which is to achieve low -latency performance at scale, Drill allows users to analyze the data without -any ETL or up-front schema definitions. The data could be in any file format -such as text, JSON, or Parquet. Data could have simple types such as string, -integer, dates, or more complex multi-structured data, such as nested maps and -arrays. Data can exist in any file system, local or distributed, such as HDFS, -[MapR FS](https://www.mapr.com/blog/comparing-mapr-fs-and-hdfs-nfs-and- -snapshots), or S3. Drill, has a âno schemaâ approach, which enables you to get -value from your data in just a few minutes. - -Letâs quickly walk through the steps required to install Drill and run it -against the Yelp data set. The publicly available data set used for this -example is downloadable from [Yelp](http://www.yelp.com/dataset_challenge) -(business reviews) and is in JSON format. 
- -## Installing and Starting Drill - -### Step 1: Download Apache Drill onto your local machine - -[http://incubator.apache.org/drill/download/](http://incubator.apache.org/drill/download/) - -You can also [deploy Drill in clustered mode](https://cwiki.apache.org/confluence/display/DRILL/Deploying+Apache+Drill+in+a+Clustered+Environment) if you -want to scale your environment. - -### Step 2: Extract the Drill tar file - -`tar -xvf apache-drill-0.6.0-incubating.tar` - -### Step 3: Launch sqlline, a JDBC application that ships with Drill - -`bin/sqlline -u jdbc:drill:zk=local` - -That's it! You are now ready to explore the data. - -Let's try out some SQL examples to see how Drill makes raw data -analysis extremely easy. - -**Note**: You need to substitute your local path to the Yelp data set in the FROM clause of each query you run. - -## Querying Data with Drill - -### **1\. View the contents of the Yelp business data** - -`0: jdbc:drill:zk=local> !set maxwidth 10000` - -``0: jdbc:drill:zk=local> select * from -dfs.`/users/nrentachintala/Downloads/yelp/yelp_academic_dataset_business.json` -limit 1;`` - - +-------------+--------------+------------+------------+------------+------------+--------------+------------+------------+------------+------------+------------+------------+------------+---------------+ - | business_id | full_address | hours | open | categories | city | review_count | name | longitude | state | stars | latitude | attributes | type | neighborhoods | - +-------------+--------------+------------+------------+------------+------------+--------------+------------+------------+------------+------------+------------+------------+------------+---------------+ - | vcNAWiLM4dR7D2nwwJ7nCA | 4840 E Indian School Rd - Ste 101 - Phoenix, AZ 85018 |
{"Tuesday":{"close":"17:00","open":"08:00"},"Friday":{"close":"17:00","open":"08:00"},"Monday":{"close":"17:00","open":"08:00"},"Wednesday":{"close":"17:00","open":"08:00"},"Thursday":{"close":"17:00","open":"08:00"},"Sunday":{},"Saturday":{}} | true | ["Doctors","Health & Medical"] | Phoenix | 7 | Eric Goldberg, MD | -111.983758 | AZ | 3.5 | 33.499313 | {"By Appointment Only":true,"Good For":{},"Ambience":{},"Parking":{},"Music":{},"Hair Types Specialized In":{},"Payment Types":{},"Dietary Restrictions":{}} | business | [] | - +-------------+--------------+------------+------------+------------+------------+--------------+------------+------------+------------+------------+------------+------------+------------+---------------+ - -**Note: **You can directly query self-describing files such as JSON, Parquet, and text. There is no need to create metadata definitions in the Hive metastore. - -### **2\. Explore the business data set further** - -#### Total reviews in the data set - -``0: jdbc:drill:zk=local> select sum(review_count) as totalreviews from -dfs.`/users/nrentachintala/Downloads/yelp/yelp_academic_dataset_business.json` -;`` - - +--------------+ - | totalreviews | - +--------------+ - | 1236445 | - +--------------+ - -#### Top states and cities in total number of reviews - -``0: jdbc:drill:zk=local> select state, city, count(*) totalreviews from -dfs.`/users/nrentachintala/Downloads/yelp/yelp_academic_dataset_business.json` -group by state, city order by count(*) desc limit 10;`` - - +------------+------------+--------------+ - | state | city | totalreviews | - +------------+------------+--------------+ - | NV | Las Vegas | 12021 | - | AZ | Phoenix | 7499 | - | AZ | Scottsdale | 3605 | - | EDH | Edinburgh | 2804 | - | AZ | Mesa | 2041 | - | AZ | Tempe | 2025 | - | NV | Henderson | 1914 | - | AZ | Chandler | 1637 | - | WI | Madison | 1630 | - | AZ | Glendale | 1196 | - +------------+------------+--------------+ - -#### **Average number of reviews per 
business star rating** - -``0: jdbc:drill:zk=local> select stars,trunc(avg(review_count)) reviewsavg from -dfs.`/users/nrentachintala/Downloads/yelp/yelp_academic_dataset_business.json` -group by stars order by stars desc;`` - - +------------+------------+ - | stars | reviewsavg | - +------------+------------+ - | 5.0 | 8.0 | - | 4.5 | 28.0 | - | 4.0 | 48.0 | - | 3.5 | 35.0 | - | 3.0 | 26.0 | - | 2.5 | 16.0 | - | 2.0 | 11.0 | - | 1.5 | 9.0 | - | 1.0 | 4.0 | - +------------+------------+ - -#### **Top businesses with high review counts (> 1000)** - -``0: jdbc:drill:zk=local> select name, state, city, `review_count` from -dfs.`/users/nrentachintala/Downloads/yelp/yelp_academic_dataset_business.json` -where review_count > 1000 order by `review_count` desc limit 10;`` - - +------------+------------+------------+----------------------------+ - | name | state | city | review_count | - +------------+------------+------------+----------------------------+ - | Mon Ami Gabi | NV | Las Vegas | 4084 | - | Earl of Sandwich | NV | Las Vegas | 3655 | - | Wicked Spoon | NV | Las Vegas | 3408 | - | The Buffet | NV | Las Vegas | 2791 | - | Serendipity 3 | NV | Las Vegas | 2682 | - | Bouchon | NV | Las Vegas | 2419 | - | The Buffet at Bellagio | NV | Las Vegas | 2404 | - | Bacchanal Buffet | NV | Las Vegas | 2369 | - | The Cosmopolitan of Las Vegas | NV | Las Vegas | 2253 | - | Aria Hotel & Casino | NV | Las Vegas | 2224 | - +------------+------------+------------+----------------------------+ - -#### **Saturday open and close times for a few businesses** - -``0: jdbc:drill:zk=local> select b.name, b.hours.Saturday.`open`, -b.hours.Saturday.`close` -from -dfs.`/users/nrentachintala/Downloads/yelp/yelp_academic_dataset_business.json` -b limit 10;`` - - +------------+------------+----------------------------+ - | name | EXPR$1 | EXPR$2 | - +------------+------------+----------------------------+ - | Eric Goldberg, MD | 08:00 | 17:00 | - | Pine Cone Restaurant | null | null | - | 
Deforest Family Restaurant | 06:00 | 22:00 | - | Culver's | 10:30 | 22:00 | - | Chang Jiang Chinese Kitchen | 11:00 | 22:00 | - | Charter Communications | null | null | - | Air Quality Systems | null | null | - | McFarland Public Library | 09:00 | 20:00 | - | Green Lantern Restaurant | 06:00 | 02:00 | - | Spartan Animal Hospital | 07:30 | 18:00 | - +------------+------------+----------------------------+ - -Note how Drill can traverse and reference multiple levels of nesting. - -### **3\. Get the amenities of each business in the data set** - -Note that the attributes column in the Yelp business data set contains different -elements for every row, reflecting that each business can have a different set of -amenities. Drill makes it easy to quickly access data sets with changing -schemas. - -First, change Drill to work in all text mode (so we can take a look at all of -the data). - - 0: jdbc:drill:zk=local> alter system set `store.json.all_text_mode` = true; - +------------+-----------------------------------+ - | ok | summary | - +------------+-----------------------------------+ - | true | store.json.all_text_mode updated. | - +------------+-----------------------------------+ - -Then, query the attributes column.
- - 0: jdbc:drill:zk=local> select attributes from dfs.`/users/nrentachintala/Downloads/yelp/yelp_academic_dataset_business.json` limit 10; - +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ - | attributes | - +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ - | {"By Appointment Only":"true","Good For":{},"Ambience":{},"Parking":{},"Music":{},"Hair Types Specialized In":{},"Payment Types":{},"Dietary Restrictions":{}} | - | {"Take-out":"true","Good For":{"dessert":"false","latenight":"false","lunch":"true","dinner":"false","breakfast":"false","brunch":"false"},"Caters":"false","Noise Level":"averag | - | {"Take-out":"true","Good For":{"dessert":"false","latenight":"false","lunch":"false","dinner":"false","breakfast":"false","brunch":"true"},"Caters":"false","Noise Level":"quiet" | - | {"Take-out":"true","Good For":{},"Takes Reservations":"false","Delivery":"false","Ambience":{},"Parking":{"garage":"false","street":"false","validated":"false","lot":"true","val | - | {"Take-out":"true","Good For":{},"Ambience":{},"Parking":{},"Has TV":"false","Outdoor Seating":"false","Attire":"casual","Music":{},"Hair Types Specialized In":{},"Payment Types | - | {"Good For":{},"Ambience":{},"Parking":{},"Music":{},"Hair Types Specialized In":{},"Payment Types":{},"Dietary Restrictions":{}} | - | {"Good For":{},"Ambience":{},"Parking":{},"Music":{},"Hair Types Specialized In":{},"Payment Types":{},"Dietary Restrictions":{}} | - | {"Good For":{},"Ambience":{},"Parking":{},"Wi-Fi":"free","Music":{},"Hair Types Specialized In":{},"Payment Types":{},"Dietary Restrictions":{}} | - | {"Take-out":"true","Good For":{"dessert":"false","latenight":"false","lunch":"false","dinner":"true","breakfast":"false","brunch":"false"},"Noise 
Level":"average","Takes Reserva | - | {"Good For":{},"Ambience":{},"Parking":{},"Music":{},"Hair Types Specialized In":{},"Payment Types":{},"Dietary Restrictions":{}} | - +------------+ - -Turn off the all text mode so we can continue to perform arithmetic operations -on data. - - 0: jdbc:drill:zk=local> alter system set `store.json.all_text_mode` = false; - +------------+------------+ - | ok | summary | - +------------+------------+ - | true | store.json.all_text_mode updated. | - -**4\. Explore the restaurant businesses in the data set** - -#### **Number of restaurants in the data set**** ** - - 0: jdbc:drill:zk=local> select count(*) as TotalRestaurants from dfs.`/users/nrentachintala/Downloads/yelp/yelp_academic_dataset_business.json` where true=repeated_contains(categories,'Restaurants'); - +------------------+ - | TotalRestaurants | - +------------------+ - | 14303 | - +------------------+ - -#### **Top restaurants in number of reviews** - - 0: jdbc:drill:zk=local> select name,state,city,`review_count` from dfs.`/users/nrentachintala/Downloads/yelp/yelp_academic_dataset_business.json` where true=repeated_contains(categories,'Restaurants') order by `review_count` desc limit 10 - . . . . . . . . . . . 
> ; - +------------+------------+------------+--------------+ - | name | state | city | review_count | - +------------+------------+------------+--------------+ - | Mon Ami Gabi | NV | Las Vegas | 4084 | - | Earl of Sandwich | NV | Las Vegas | 3655 | - | Wicked Spoon | NV | Las Vegas | 3408 | - | The Buffet | NV | Las Vegas | 2791 | - | Serendipity 3 | NV | Las Vegas | 2682 | - | Bouchon | NV | Las Vegas | 2419 | - | The Buffet at Bellagio | NV | Las Vegas | 2404 | - | Bacchanal Buffet | NV | Las Vegas | 2369 | - | Hash House A Go Go | NV | Las Vegas | 2201 | - | Mesa Grill | NV | Las Vegas | 2004 | - +------------+------------+------------+--------------+ - -**Top restaurants in number of listed categories** - - 0: jdbc:drill:zk=local> select name,repeated_count(categories) as categorycount, categories from dfs.`/users/nrentachintala/Downloads/yelp/yelp_academic_dataset_business.json` where true=repeated_contains(categories,'Restaurants') order by repeated_count(categories) desc limit 10; - +------------+---------------+------------+ - | name | categorycount | categories | - +------------+---------------+------------+ - | Binion's Hotel & Casino | 10 | ["Arts & Entertainment","Restaurants","Bars","Casinos","Event Planning & Services","Lounges","Nightlife","Hotels & Travel","American (N | - | Stage Deli | 10 | ["Arts & Entertainment","Food","Hotels","Desserts","Delis","Casinos","Sandwiches","Hotels & Travel","Restaurants","Event Planning & Services"] | - | Jillian's | 9 | ["Arts & Entertainment","American (Traditional)","Music Venues","Bars","Dance Clubs","Nightlife","Bowling","Active Life","Restaurants"] | - | Hotel Chocolat | 9 | ["Coffee & Tea","Food","Cafes","Chocolatiers & Shops","Specialty Food","Event Planning & Services","Hotels & Travel","Hotels","Restaurants"] | - | Hotel du Vin & Bistro Edinburgh | 9 | ["Modern European","Bars","French","Wine Bars","Event Planning & Services","Nightlife","Hotels & Travel","Hotels","Restaurants" | - | Elixir | 9 | ["Arts 
& Entertainment","American (Traditional)","Music Venues","Bars","Cocktail Bars","Nightlife","American (New)","Local Flavor","Restaurants"] | - | Tocasierra Spa and Fitness | 8 | ["Beauty & Spas","Gyms","Medical Spas","Health & Medical","Fitness & Instruction","Active Life","Day Spas","Restaurants"] | - | Costa Del Sol At Sunset Station | 8 | ["Steakhouses","Mexican","Seafood","Event Planning & Services","Hotels & Travel","Italian","Restaurants","Hotels"] | - | Scottsdale Silverado Golf Club | 8 | ["Fashion","Shopping","Sporting Goods","Active Life","Golf","American (New)","Sports Wear","Restaurants"] | - | House of Blues | 8 | ["Arts & Entertainment","Music Venues","Restaurants","Hotels","Event Planning & Services","Hotels & Travel","American (New)","Nightlife"] | - +------------+---------------+------------+ - -#### **Top first categories in number of review counts** - - 0: jdbc:drill:zk=local> select categories[0], count(categories[0]) as categorycount from dfs.`/users/nrentachintala/Downloads/yelp_dataset_challenge_academic_dataset/yelp_academic_dataset_business.json` group by categories[0] - order by count(categories[0]) desc limit 10; - +------------+---------------+ - | EXPR$0 | categorycount | - +------------+---------------+ - | Food | 4294 | - | Shopping | 1885 | - | Active Life | 1676 | - | Bars | 1366 | - | Local Services | 1351 | - | Mexican | 1284 | - | Hotels & Travel | 1283 | - | Fast Food | 963 | - | Arts & Entertainment | 906 | - | Hair Salons | 901 | - +------------+---------------+ - -**5\. 
Explore the Yelp reviews data set and combine it with the businesses.** - -#### **Take a look at the contents of the Yelp reviews data set** - - 0: jdbc:drill:zk=local> select * from dfs.`/users/nrentachintala/Downloads/yelp/yelp_academic_dataset_review.json` limit 1; - +------------+------------+------------+------------+------------+------------+------------+-------------+ - | votes | user_id | review_id | stars | date | text | type | business_id | - +------------+------------+------------+------------+------------+------------+------------+-------------+ - | {"funny":0,"useful":2,"cool":1} | Xqd0DzHaiyRqVH3WRG7hzg | 15SdjuK7DmYqUAj6rjGowg | 5 | 2007-05-17 | dr. goldberg offers everything i look for in a general practitioner. he's nice and easy to talk to without being patronizing; he's always on time in seeing his patients; he's affiliated with a top-notch hospital (nyu) which my parents have explained to me is very important in case something happens and you need surgery; and you can get referrals to see specialists without having to see him first. really, what more do you need? i'm sitting here trying to think of any complaints i have about him, but i'm really drawing a blank. | review | vcNAWiLM4dR7D2nwwJ7nCA | - +------------+------------+------------+------------+------------+------------+------------+-------------+ - -#### **Top businesses with cool-rated reviews** - -Note that we are combining the Yelp business data set, which has the overall -review_count, with the Yelp review data, which holds additional details on each -of the reviews themselves.
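The query below answers this with a semi-join plus a HAVING filter. The same logic can be sketched in plain Python over a couple of inlined records; the names and vote counts here are hypothetical, not real Yelp values:

```python
from collections import defaultdict

# Hypothetical stand-ins for the two Yelp files (names and counts invented).
businesses = [
    {"business_id": "b1", "name": "Earl of Sandwich"},
    {"business_id": "b2", "name": "Quiet Cafe"},
]
reviews = [
    {"business_id": "b1", "votes": {"cool": 1500}},
    {"business_id": "b1", "votes": {"cool": 900}},
    {"business_id": "b2", "votes": {"cool": 3}},
]

# GROUP BY r.business_id HAVING sum(r.votes.cool) > 2000
cool_totals = defaultdict(int)
for r in reviews:
    cool_totals[r["business_id"]] += r["votes"]["cool"]
hot_ids = {bid for bid, total in cool_totals.items() if total > 2000}

# WHERE b.business_id IN (<subquery>)
names = [b["name"] for b in businesses if b["business_id"] in hot_ids]
print(names)  # ['Earl of Sandwich']
```

Drill pushes exactly this aggregate-then-filter-then-lookup pipeline into the SQL engine, so you never write the loops yourself.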
- - 0: jdbc:drill:zk=local> Select b.name from dfs.`/users/nrentachintala/Downloads/yelp/yelp_academic_dataset_business.json` b where b.business_id in (SELECT r.business_id FROM dfs.`/users/nrentachintala/Downloads/yelp/yelp_academic_dataset_review.json` r - GROUP BY r.business_id having sum(r.votes.cool) > 2000 order by sum(r.votes.cool) desc); - +------------+ - | name | - +------------+ - | Earl of Sandwich | - | XS Nightclub | - | The Cosmopolitan of Las Vegas | - | Wicked Spoon | - +------------+ - -**Create a view with the combined business and reviews data sets** - -Note that Drill views are lightweight, and can just be created in the local -file system. Drill in standalone mode comes with a dfs.tmp workspace, which we -can use to create views (or you can define your own workspaces on a local -or distributed file system). If you want to persist the data physically -instead of in a logical view, you can use CREATE TABLE AS SELECT syntax. - - 0: jdbc:drill:zk=local> create or replace view dfs.tmp.businessreviews as Select b.name,b.stars,b.state,b.city,r.votes.funny,r.votes.useful,r.votes.cool, r.`date` from dfs.`/users/nrentachintala/Downloads/yelp/yelp_academic_dataset_business.json` b , dfs.`/users/nrentachintala/Downloads/yelp/yelp_academic_dataset_review.json` r where r.business_id=b.business_id - +------------+------------+ - | ok | summary | - +------------+------------+ - | true | View 'businessreviews' created successfully in 'dfs.tmp' schema | - +------------+------------+ - -Let's get the total number of records from the view. - - 0: jdbc:drill:zk=local> select count(*) as Total from dfs.tmp.businessreviews; - +------------+ - | Total | - +------------+ - | 1125458 | - +------------+ - -In addition to these queries, you can get many deeper insights using -Drill's [SQL functionality](https://cwiki.apache.org/confluence/display/DRILL/SQL+Reference).
If you are not comfortable with writing queries manually, you -can use BI/analytics tools such as Tableau or MicroStrategy to query raw -files, Hive/HBase data, or Drill-created views directly using Drill ODBC/JDBC -drivers. - -The goal of Apache Drill is to provide freedom and flexibility in -exploring data in ways never before possible with SQL technologies. The -community is working on more exciting features around nested data and -supporting data with changing schemas in upcoming releases. - -As an example, a new FLATTEN function is in development (an upcoming feature -in 0.7). This function can be used to dynamically rationalize semi-structured -data so you can apply even deeper SQL functionality. Here is a sample query: - -#### **Get a flattened list of categories for each business** - - 0: jdbc:drill:zk=local> select name, flatten(categories) as category from dfs.`/users/nrentachintala/Downloads/yelp/yelp_academic_dataset_business.json` limit 20; - +------------+------------+ - | name | category | - +------------+------------+ - | Eric Goldberg, MD | Doctors | - | Eric Goldberg, MD | Health & Medical | - | Pine Cone Restaurant | Restaurants | - | Deforest Family Restaurant | American (Traditional) | - | Deforest Family Restaurant | Restaurants | - | Culver's | Food | - | Culver's | Ice Cream & Frozen Yogurt | - | Culver's | Fast Food | - | Culver's | Restaurants | - | Chang Jiang Chinese Kitchen | Chinese | - | Chang Jiang Chinese Kitchen | Restaurants | - | Charter Communications | Television Stations | - | Charter Communications | Mass Media | - | Air Quality Systems | Home Services | - | Air Quality Systems | Heating & Air Conditioning/HVAC | - | McFarland Public Library | Libraries | - | McFarland Public Library | Public Services & Government | - | Green Lantern Restaurant | American (Traditional) | - | Green Lantern Restaurant | Restaurants | - | Spartan Animal Hospital | Veterinarians | - +------------+------------+ - -**Top categories used in 
business reviews** - - 0: jdbc:drill:zk=local> select celltbl.catl, count(celltbl.catl) categorycnt from (select flatten(categories) catl from dfs.`/users/nrentachintala/Downloads/yelp_dataset_challenge_academic_dataset/yelp_academic_dataset_business.json` ) celltbl group by celltbl.catl order by count(celltbl.catl) desc limit 10 ; - +------------+-------------+ - | catl | categorycnt | - +------------+-------------+ - | Restaurants | 14303 | - | Shopping | 6428 | - | Food | 5209 | - | Beauty & Spas | 3421 | - | Nightlife | 2870 | - | Bars | 2378 | - | Health & Medical | 2351 | - | Automotive | 2241 | - | Home Services | 1957 | - | Fashion | 1897 | - +------------+-------------+ - -Stay tuned for more features and upcoming activities in the Drill community. - -To learn more about Drill, please refer to the following resources: - - * Download Drill here: <http://incubator.apache.org/drill/download/> - * 10 reasons we think Drill is cool: <http://incubator.apache.org/drill/why-drill/> - * A simple 10-minute tutorial: <https://cwiki.apache.org/confluence/display/DRILL/Apache+Drill+in+10+Minutes> - * A more comprehensive tutorial: <https://cwiki.apache.org/confluence/display/DRILL/Apache+Drill+Tutorial> - http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/drill-docs/004-install.md ---------------------------------------------------------------------- diff --git a/_docs/drill-docs/004-install.md b/_docs/drill-docs/004-install.md deleted file mode 100644 index fe7578c..0000000 --- a/_docs/drill-docs/004-install.md +++ /dev/null @@ -1,20 +0,0 @@ ---- -title: "Install Drill" -parent: "Apache Drill Documentation" ---- -You can install Drill in embedded mode or in distributed mode. Installing -Drill in embedded mode does not require any configuration, which means that -you can quickly get started with Drill. If you want to use Drill in a -clustered Hadoop environment, you can install Drill in distributed mode.
-Installing in distributed mode requires some configuration; however, once you -install it, you can connect Drill to your Hive, HBase, or distributed file system -data sources and run queries on them. - -Click on any of the following links for more information about how to install -Drill in embedded or distributed mode: - - * [Apache Drill in 10 Minutes](/confluence/display/DRILL/Apache+Drill+in+10+Minutes) - * [Deploying Apache Drill in a Clustered Environment](/confluence/display/DRILL/Deploying+Apache+Drill+in+a+Clustered+Environment) - * [Installing Drill in Embedded Mode](/confluence/display/DRILL/Installing+Drill+in+Embedded+Mode) - * [Installing Drill in Distributed Mode](/confluence/display/DRILL/Installing+Drill+in+Distributed+Mode) - http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/drill-docs/005-connect.md ---------------------------------------------------------------------- diff --git a/_docs/drill-docs/005-connect.md b/_docs/drill-docs/005-connect.md deleted file mode 100644 index 039fc78..0000000 --- a/_docs/drill-docs/005-connect.md +++ /dev/null @@ -1,49 +0,0 @@ ---- -title: "Connect to Data Sources" -parent: "Apache Drill Documentation" ---- -Apache Drill serves as a query layer that connects to data sources through -storage plugins. Drill uses the storage plugins to interact with data sources. -You can think of a storage plugin as a connection between Drill and a data -source.
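Concretely, a storage plugin is registered as a small JSON configuration document. The sketch below builds one in Python purely for illustration; the field names mimic the general shape of a file-system plugin definition and are an assumption, not an exact Drill schema:

```python
import json

# Hypothetical file-system storage plugin definition. The keys below are
# illustrative of the general shape only, not an exact Drill schema.
dfs_plugin = {
    "type": "file",            # which plugin implementation handles this source
    "connection": "file:///",  # where the data lives
    "workspaces": {
        "tmp": {"location": "/tmp", "writable": True},
    },
    "formats": {
        "json": {"type": "json"},  # how files in this source are parsed
    },
}

print(json.dumps(dfs_plugin, indent=2))
```

The configuration tells Drill where the data lives, how to read it, and which workspaces (such as the dfs.tmp workspace used for views in the Yelp walkthrough) are available to queries.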
- -The following image represents the storage plugin layer between Drill and a -data source: - -![](../img/storageplugin.png) - -Storage plugins provide the following information to Drill: - - * Metadata available in the underlying data source - * Location of data - * Interfaces that Drill can use to read from and write to data sources - * A set of storage plugin optimization rules that assist with efficient and faster execution of Drill queries, such as pushdowns, statistics, and partition awareness - -Storage plugins perform scanner and writer functions, and inform the metadata -repository of any known metadata, such as: - - * Schema - * File size - * Data ordering - * Secondary indices - * Number of blocks - -Storage plugins inform the execution engine of any native capabilities, such -as predicate pushdown, joins, and SQL. - -Drill provides storage plugins for files and HBase/M7. Drill also integrates -with Hive through a storage plugin. Hive provides a metadata abstraction layer -on top of files and HBase/M7. - -When you run Drill to query files in HBase/M7, Drill can perform direct -queries on the data or go through Hive, if you have metadata defined there. -Drill integrates with the Hive metastore for metadata and also uses a Hive -SerDe for the deserialization of records. Drill does not invoke the Hive -execution engine for any requests. 
- -For information about how to connect Drill to your data sources, refer to -storage plugin registration: - - * [Storage Plugin Registration](/confluence/display/DRILL/Storage+Plugin+Registration) - * [MongoDB Plugin for Apache Drill](/confluence/display/DRILL/MongoDB+Plugin+for+Apache+Drill) - * [MapR-DB Plugin for Apache Drill](/confluence/display/DRILL/MapR-DB+Plugin+for+Apache+Drill) \ No newline at end of file http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/drill-docs/006-query.md ---------------------------------------------------------------------- diff --git a/_docs/drill-docs/006-query.md b/_docs/drill-docs/006-query.md deleted file mode 100644 index 4b4fda0..0000000 --- a/_docs/drill-docs/006-query.md +++ /dev/null @@ -1,57 +0,0 @@ ---- -title: "Query Data" -parent: "Apache Drill Documentation" ---- -You can query local and distributed file systems, Hive, and HBase data sources -registered with Drill. If you connected directly to a particular schema when -you invoked SQLLine, you can issue SQL queries against that schema. If you did -not indicate a schema when you invoked SQLLine, you can issue the `USE -<schema>` statement to run your queries against a particular schema. After you -issue the `USE` statement, you can use absolute notation, such as -`schema.table.column`. 
- -Click on any of the following links for information about various data source -queries and examples: - - * [Querying a File System](/confluence/display/DRILL/Querying+a+File+System) - * [Querying HBase](/confluence/display/DRILL/Querying+HBase) - * [Querying Hive](/confluence/display/DRILL/Querying+Hive) - * [Querying Complex Data](/confluence/display/DRILL/Querying+Complex+Data) - * [Querying the INFORMATION_SCHEMA](/confluence/display/DRILL/Querying+the+INFORMATION_SCHEMA) - * [Querying System Tables](/confluence/display/DRILL/Querying+System+Tables) - * [Drill Interfaces](/confluence/display/DRILL/Drill+Interfaces) - -You may need to use casting functions in some queries. For example, you may -have to cast a string `"100"` to an integer in order to apply a math function -or an aggregate function. - -You can use the EXPLAIN command to analyze errors and troubleshoot queries -that do not run. For example, if you run into a casting error, the query plan -text may help you isolate the problem. - - 0: jdbc:drill:zk=local> !set maxwidth 10000 - 0: jdbc:drill:zk=local> explain plan for select ... ; - -The set command increases the default text display (number of characters). By -default, most of the plan output is hidden. - -You may see errors if you try to use non-standard or unsupported SQL syntax in -a query. - -Remember the following tips when querying data with Drill: - - * Include a semicolon at the end of SQL statements, except when you issue a command with an exclamation point (`!`). Example: `!set maxwidth 10000` - - * Use backticks around file and directory names that contain special characters and also around reserved words when you query a file system. -The following special characters require backticks: - - * . 
(period) - * / (forward slash) - * _ (underscore) - -Example: ``SELECT * FROM dfs.default.`sample_data/my_sample.json`; `` - - * `CAST` data to `VARCHAR` if an expression in a query returns `VARBINARY` as the result type in order to view the `VARBINARY` types as readable data. If you do not use the `CAST` function, Drill returns the results as byte data. -Example: `CAST (VARBINARY_expr as VARCHAR(50))` - http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/drill-docs/006-sql-ref.md ---------------------------------------------------------------------- diff --git a/_docs/drill-docs/006-sql-ref.md b/_docs/drill-docs/006-sql-ref.md deleted file mode 100644 index 8818ca3..0000000 --- a/_docs/drill-docs/006-sql-ref.md +++ /dev/null @@ -1,25 +0,0 @@ ---- -title: "SQL Reference" -parent: "Apache Drill Documentation" ---- -Drill supports the ANSI standard for SQL. You can use SQL to query your Hive, -HBase, and distributed file system data sources. Drill can discover the form -of the data when you submit a query. You can query text files and nested data -formats, such as JSON and Parquet. Drill provides special operators and -functions that you can use to _drill down_ into nested data formats. - -Drill queries do not require information about the data that you are trying to -access, regardless of its source system or its schema and data types. The -sweet spot for Apache Drill is a SQL query workload against "complex data": -data made up of various types of records and fields, rather than data in a -recognizable relational form (discrete rows and columns).
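To make the "drill down" idea concrete: a nested JSON record is maps within maps, and Drill's dotted notation (for example, b.hours.Saturday.`open` in the Yelp walkthrough) walks those maps. A rough Python analog, using a made-up sample record:

```python
# A nested record shaped like a row of the Yelp business file (sample values).
business = {
    "name": "Deforest Family Restaurant",
    "hours": {"Saturday": {"open": "06:00", "close": "22:00"}},
}

# SQL:    select b.name, b.hours.Saturday.`open` from ... b
# Python: chained lookups walk the same nested maps.
saturday = business["hours"].get("Saturday", {})
print(business["name"], saturday.get("open"))  # Deforest Family Restaurant 06:00
```

Drill performs this traversal inside the query engine, so nested fields can be selected, filtered, and aggregated like ordinary columns.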
- -Refer to the following SQL reference pages for more information: - - * [Data Types](/confluence/display/DRILL/Data+Types) - * [Operators](/confluence/display/DRILL/Operators) - * [SQL Functions](/confluence/display/DRILL/SQL+Functions) - * [Nested Data Functions](/confluence/display/DRILL/Nested+Data+Functions) - * [SQL Commands Summary](/confluence/display/DRILL/SQL+Commands+Summary) - * [Reserved Keywords](/confluence/display/DRILL/Reserved+Keywords) - http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/drill-docs/007-dev-custom-func.md ---------------------------------------------------------------------- diff --git a/_docs/drill-docs/007-dev-custom-func.md b/_docs/drill-docs/007-dev-custom-func.md deleted file mode 100644 index 9bc8e65..0000000 --- a/_docs/drill-docs/007-dev-custom-func.md +++ /dev/null @@ -1,47 +0,0 @@ ---- -title: "Develop Custom Functions" -parent: "Apache Drill Documentation" ---- - -Drill provides a high performance Java API with interfaces that you can -implement to develop simple and aggregate custom functions. Custom functions -are reusable SQL functions that you develop in Java to encapsulate code that -processes column values during a query. Custom functions can perform -calculations and transformations that built-in SQL operators and functions do -not provide. Custom functions are called from within a SQL statement, like a -regular function, and return a single value. - -### Simple Function - -A simple function operates on a single row and produces a single row as the -output. When you include a simple function in a query, the function is called -once for each row in the result set. Mathematical and string functions are -examples of simple functions. - -### Aggregate Function - -Aggregate functions differ from simple functions in the number of rows that -they accept as input. An aggregate function operates on multiple input rows -and produces a single row as output. 
The COUNT(), MAX(), SUM(), and AVG() -functions are examples of aggregate functions. You can use an aggregate -function in a query with a GROUP BY clause to produce a result set with a -separate aggregate value for each combination of values from the GROUP BY -clause. - -### Process - -To develop custom functions that you can use in your Drill queries, you must -complete the following tasks: - - 1. Create a Java program that implements Drill's simple or aggregate interface, and compile it into a sources JAR file and a classes JAR file. - 2. Add the sources and classes JAR files to Drill's classpath. - 3. Add the name of the package that contains the classes to Drill's main configuration file, drill-override.conf. - -Click on one of the following links to learn how to create custom functions -for Drill: - - * [Developing a Simple Function](/confluence/display/DRILL/Developing+a+Simple+Function) - * [Developing an Aggregate Function](/confluence/display/DRILL/Developing+an+Aggregate+Function) - * [Adding Custom Functions to Drill](/confluence/display/DRILL/Adding+Custom+Functions+to+Drill) - * [Using Custom Functions in Queries](/confluence/display/DRILL/Using+Custom+Functions+in+Queries) - * [Custom Function Interfaces](/confluence/display/DRILL/Custom+Function+Interfaces) \ No newline at end of file http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/drill-docs/008-manage.md ---------------------------------------------------------------------- diff --git a/_docs/drill-docs/008-manage.md b/_docs/drill-docs/008-manage.md deleted file mode 100644 index e629b20..0000000 --- a/_docs/drill-docs/008-manage.md +++ /dev/null @@ -1,23 +0,0 @@ ---- -title: "Manage Drill" -parent: "Apache Drill Documentation" ---- -When using Drill, you may need to stop and restart a Drillbit on a node, or -modify various options. For example, the default storage format for CTAS -statements is Parquet. You can modify the default setting so that output data -is stored in CSV or JSON format.
- -You can use certain SQL commands to manage Drill from within the Drill shell -(SQLLine). You can also modify Drill configuration options, such as memory -allocation, in Drill's configuration files. - -Refer to the following documentation for information about managing Drill in -your cluster: - - * [Configuration Options](/confluence/display/DRILL/Configuration+Options) - * [Starting/Stopping Drill](/confluence/pages/viewpage.action?pageId=44994063) - * [Ports Used by Drill](/confluence/display/DRILL/Ports+Used+by+Drill) - * [Partition Pruning](/confluence/display/DRILL/Partition+Pruning) - * [Monitoring and Canceling Queries in the Drill Web UI](/confluence/display/DRILL/Monitoring+and+Canceling+Queries+in+the+Drill+Web+UI) - - http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/drill-docs/009-develop.md ---------------------------------------------------------------------- diff --git a/_docs/drill-docs/009-develop.md b/_docs/drill-docs/009-develop.md deleted file mode 100644 index d95f986..0000000 --- a/_docs/drill-docs/009-develop.md +++ /dev/null @@ -1,16 +0,0 @@ ---- -title: "Develop Drill" -parent: "Apache Drill Documentation" ---- -To develop Drill, you compile Drill from source code and then set up a project -in Eclipse for use as your development environment. To review or contribute to -Drill code, you must complete the steps required to install and use the Drill -patch review tool. 
- -For information about contributing to the Apache Drill project, you can refer -to the following pages: - - * [Compiling Drill from Source](/confluence/display/DRILL/Compiling+Drill+from+Source) - * [Setting Up Your Development Environment](/confluence/display/DRILL/Setting+Up+Your+Development+Environment) - * [Drill Patch Review Tool](/confluence/display/DRILL/Drill+Patch+Review+Tool) - http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/drill-docs/010-rn.md ---------------------------------------------------------------------- diff --git a/_docs/drill-docs/010-rn.md b/_docs/drill-docs/010-rn.md deleted file mode 100644 index f196714..0000000 --- a/_docs/drill-docs/010-rn.md +++ /dev/null @@ -1,192 +0,0 @@ ---- -title: "Release Notes" -parent: "Apache Drill Documentation" ---- -## Apache Drill 0.7.0 Release Notes - -Apache Drill 0.7.0, the third beta release for Drill, is designed to help -enthusiasts start working and experimenting with Drill. It also continues the -Drill monthly release cycle as we drive towards general availability. - -This release is available as -[binary](http://www.apache.org/dyn/closer.cgi/drill/drill-0.7.0/apache- -drill-0.7.0.tar.gz) and -[source](http://www.apache.org/dyn/closer.cgi/drill/drill-0.7.0/apache- -drill-0.7.0-src.tar.gz) tarballs that are compiled against Apache Hadoop. -Drill has been tested against MapR, Cloudera, and Hortonworks Hadoop -distributions. 
There are associated build profiles and JIRAs that can help you -run Drill against your preferred distribution. - -Apache Drill 0.7.0 Key Features - - * No more dependency on UDP/Multicast - Making it possible for Drill to work well in the following scenarios: - - * UDP multicast not enabled (as in EC2) - - * Cluster spans multiple subnets - - * Cluster has a multihomed configuration - - * New functions to natively work with nested data - KVGen and Flatten - - * Support for Hive 0.13 (Hive 0.12 with Drill is not supported any more) - - * Improved performance when querying Hive tables and file system through partition pruning - - * Improved performance for HBase with LIKE operator pushdown - - * Improved memory management - - * Drill web UI monitoring and query profile improvements - - * Ability to parse files without explicit extensions using default storage format specification - - * Fixes for dealing with complex/nested data objects in Parquet/JSON - - * Fast schema return - Improved experience working with BI/query tools by returning metadata quickly - - * Several hang-related fixes - - * Parquet writer fixes for handling large datasets - - * Stability improvements in ODBC and JDBC drivers - -Apache Drill 0.7.0 Key Notes and Limitations - - * The current release supports in-memory and beyond-memory execution. However, you must disable memory-intensive hash aggregate and hash join operations to leverage this functionality. - * While the Drill execution engine supports dynamic schema changes during the course of a query, some operators have yet to implement support for this behavior, such as Sort. Other operations, such as streaming aggregate, may have partial support that leads to unexpected results. - -## Apache Drill 0.6.0 Release Notes - -Apache Drill 0.6.0, the second beta release for Drill, is designed to help -enthusiasts start working and experimenting with Drill. It also continues the -Drill monthly release cycle as we drive towards general availability. 
- -This release is available as [binary](http://www.apache.org/dyn/closer.cgi/inc -ubator/drill/drill-0.6.0-incubating/apache-drill-0.6.0-incubating.tar.gz) and -[source](http://www.apache.org/dyn/closer.cgi/incubator/drill/drill-0.6.0-incu -bating/apache-drill-0.6.0-incubating-src.tar.gz) tarballs that are compiled -against Apache Hadoop. Drill has been tested against MapR, Cloudera, and -Hortonworks Hadoop distributions. There are associated build profiles and -JIRAs that can help you run Drill against your preferred distribution. - -Apache Drill 0.6.0 Key Features - -This release is primarily a bug fix release, with [more than 30 JIRAs closed]( -https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12313820&vers -ion=12327472), but there are some notable features: - - * Direct ANSI SQL access to MongoDB, using the latest [MongoDB Plugin for Apache Drill](/confluence/display/DRILL/MongoDB+Plugin+for+Apache+Drill) - * Filesystem query performance improvements with partition pruning - * Ability to use the file system as a persistent store for query profiles and diagnostic information - * Window function support (alpha) - -Apache Drill 0.6.0 Key Notes and Limitations - - * The current release supports in-memory and beyond-memory execution. However, you must disable memory-intensive hash aggregate and hash join operations to leverage this functionality. - * While the Drill execution engine supports dynamic schema changes during the course of a query, some operators have yet to implement support for this behavior, such as Sort. Other operations, such as streaming aggregate, may have partial support that leads to unexpected results. - -## Apache Drill 0.5.0 Release Notes - -Apache Drill 0.5.0, the first beta release for Drill, is designed to help -enthusiasts start working and experimenting with Drill. It also continues the -Drill monthly release cycle as we drive towards general availability. 
- -The 0.5.0 release is primarily a bug fix release, with [more than 100 JIRAs](h -ttps://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12313820&versi -on=12324880) closed, but there are some notable features. For information -about the features, see the [Apache Drill Blog for the 0.5.0 -release](https://blogs.apache.org/drill/entry/apache_drill_beta_release_see). - -This release is available as [binary](http://www.apache.org/dyn/closer.cgi/inc -ubator/drill/drill-0.5.0-incubating/apache-drill-0.5.0-incubating.tar.gz) and -[source](http://www.apache.org/dyn/closer.cgi/incubator/drill/drill-0.5.0-incu -bating/apache-drill-0.5.0-incubating-src.tar.gz) tarballs that are compiled -against Apache Hadoop. Drill has been tested against MapR, Cloudera, and -Hortonworks Hadoop distributions. There are associated build profiles and -JIRAs that can help you run Drill against your preferred distribution. - -Apache Drill 0.5.0 Key Notes and Limitations - - * The current release supports in-memory and beyond-memory execution. However, you must disable memory-intensive hash aggregate and hash join operations to leverage this functionality. - * While the Drill execution engine supports dynamic schema changes during the course of a query, some operators have yet to implement support for this behavior, such as Sort. Other operations, such as streaming aggregate, may have partial support that leads to unexpected results. - * There are known issues with joining text files without using an intervening view. See [DRILL-1401](https://issues.apache.org/jira/browse/DRILL-1401) for more information. - -## Apache Drill 0.4.0 Release Notes - -The 0.4.0 release is a developer preview release, designed to help enthusiasts -start to work with and experiment with Drill. It is the first Drill release -that provides distributed query execution. - -This release is built upon [more than 800 -JIRAs](https://issues.apache.org/jira/browse/DRILL/fixforversion/12324963/). 
-It is a pre-beta release on the way towards Drill. As a developer snapshot, -the release contains a large number of outstanding bugs that will make some -use cases challenging. Feel free to consult outstanding issues [targeted for -the 0.5.0 -release](https://issues.apache.org/jira/browse/DRILL/fixforversion/12324880/) -to see whether your use case is affected. - -To read more about this release and new features introduced, please view the -[0.4.0 announcement blog -entry](https://blogs.apache.org/drill/entry/announcing_apache_drill_0_4). - -The release is available as both [binary](http://www.apache.org/dyn/closer.cgi -/incubator/drill/drill-0.4.0-incubating/apache-drill-0.4.0-incubating.tar.gz) -and [source](http://www.apache.org/dyn/closer.cgi/incubator/drill/drill-0.4.0- -incubating/apache-drill-0.4.0-incubating-src.tar.gz) tarballs. In both cases, -these are compiled against Apache Hadoop. Drill has also been tested against -MapR, Cloudera and Hortonworks Hadoop distributions and there are associated -build profiles or JIRAs that can help you run against your preferred -distribution. - -Some Key Notes & Limitations - - * The current release supports in-memory and beyond-memory execution. However, users must disable memory-intensive hash aggregate and hash join operations to leverage this functionality. - * In many cases, merge join operations return incorrect results. - * Use of a local filter in a join "on" clause when using left, right or full outer joins may result in incorrect results. - * Because of known memory leaks and memory overrun issues you may need more memory and you may need to restart the system in some cases. - * Some types of complex expressions, especially those involving empty arrays, may fail or return incorrect results. - * While the Drill execution engine supports dynamic schema changes during the course of a query, some operators have yet to implement support for this behavior (such as Sort). 
Other operations (such as streaming aggregate) may have partial support that leads to unexpected results. - * Protobuf, UDF, query plan interfaces and all interfaces are subject to change in incompatible ways. - * Multiplication of some types of DECIMAL(28+,*) will return incorrect results. - -## Apache Drill M1 -- Release Notes (Apache Drill Alpha) - -### Milestone 1 Goals - -The first release of Apache Drill is designed as a technology preview for -people to better understand the architecture and vision. It is a functional -release trying to piece together the key components of a next-generation MPP -query engine. It is designed to allow milestone 2 (M2) to focus on -architectural analysis and performance optimization. - - * Provide a new optimistic DAG execution engine for data analysis - * Build a new columnar shredded in-memory format and execution model that minimizes data serialization/deserialization costs and operator complexity - * Provide a model for runtime generated functions and relational operators that minimizes complexity and maximizes performance - * Support queries against columnar on-disk format (Parquet) and JSON - * Support the most common set of standard SQL read-only phrases using ANSI standards. Includes: SELECT, FROM, WHERE, HAVING, ORDER, GROUP BY, IN, DISTINCT, LEFT JOIN, RIGHT JOIN, INNER JOIN - * Support schema-on-read querying and execution - * Build a set of columnar operation primitives including Merge Join, Sort, Streaming Aggregate, Filter, Selection Vector removal. - * Support unlimited levels of subqueries and correlated subqueries - * Provide an extensible query-language-agnostic JSON-based logical data flow syntax. - * Support complex data type manipulation via logical plan operations - -### Known Issues - -SQL Parsing -Because Apache Drill is built to support late-bound changing schemas while SQL -is statically typed, there are a couple of special requirements for writing SQL -queries. 
These are limited to the current release and -will be corrected in a future milestone release. - - * All tables are exposed as a single map field that contains - * Drill Alpha doesn't support implicit or explicit casts outside those required above. - * Drill Alpha does not include, there are currently a couple of differences for how to write a query in In order to query against - -UDFs - - * Drill currently supports simple and aggregate functions using scalar, repeated and - * Nested data support is incomplete. Drill Alpha supports nested data structures as well as repeated fields. However, - http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/drill-docs/011-contribute.md ---------------------------------------------------------------------- diff --git a/_docs/drill-docs/011-contribute.md b/_docs/drill-docs/011-contribute.md deleted file mode 100644 index 282ab8a..0000000 --- a/_docs/drill-docs/011-contribute.md +++ /dev/null @@ -1,11 +0,0 @@ ---- -title: "Contribute to Drill" -parent: "Apache Drill Documentation" ---- -The Apache Drill community welcomes your support. Please read [Apache Drill -Contribution Guidelines](https://cwiki.apache.org/confluence/display/DRILL/Apa -che+Drill+Contribution+Guidelines) for information about how to contribute to -the project. If you would like to contribute to the project and need some -ideas for what to do, please read [Apache Drill Contribution -Ideas](/confluence/display/DRILL/Apache+Drill+Contribution+Ideas). 
- http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/drill-docs/012-sample-ds.md ---------------------------------------------------------------------- diff --git a/_docs/drill-docs/012-sample-ds.md b/_docs/drill-docs/012-sample-ds.md deleted file mode 100644 index fe63f6b..0000000 --- a/_docs/drill-docs/012-sample-ds.md +++ /dev/null @@ -1,11 +0,0 @@ ---- -title: "Sample Datasets" -parent: "Apache Drill Documentation" ---- -Use any of the following sample datasets provided to test Drill: - - * [AOL Search](/confluence/display/DRILL/AOL+Search) - * [Enron Emails](/confluence/display/DRILL/Enron+Emails) - * [Wikipedia Edit History](/confluence/display/DRILL/Wikipedia+Edit+History) - - http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/drill-docs/013-design.md ---------------------------------------------------------------------- diff --git a/_docs/drill-docs/013-design.md b/_docs/drill-docs/013-design.md deleted file mode 100644 index 57d73c1..0000000 --- a/_docs/drill-docs/013-design.md +++ /dev/null @@ -1,14 +0,0 @@ ---- -title: "Design Docs" -parent: "Apache Drill Documentation" ---- -Review the Apache Drill design docs for early descriptions of Apache Drill -functionality, terms, and goals, and reference the research articles to learn -about Apache Drill's history: - - * [Drill Plan Syntax](/confluence/display/DRILL/Drill+Plan+Syntax) - * [RPC Overview](/confluence/display/DRILL/RPC+Overview) - * [Query Stages](/confluence/display/DRILL/Query+Stages) - * [Useful Research](/confluence/display/DRILL/Useful+Research) - * [Value Vectors](/confluence/display/DRILL/Value+Vectors) - http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/drill-docs/014-progress.md ---------------------------------------------------------------------- diff --git a/_docs/drill-docs/014-progress.md b/_docs/drill-docs/014-progress.md deleted file mode 100644 index 2a1538c..0000000 --- a/_docs/drill-docs/014-progress.md +++ /dev/null @@ -1,9 +0,0 @@ ---- 
-title: "Progress Reports" -parent: "Apache Drill Documentation" ---- -Review the following Apache Drill progress reports for a summary of issues, -progression of the project, summary of mailing list discussions, and events: - - * [2014 Q1 Drill Report](/confluence/display/DRILL/2014+Q1+Drill+Report) - http://git-wip-us.apache.org/repos/asf/drill/blob/d959a210/_docs/drill-docs/015-archived-pages.md ---------------------------------------------------------------------- diff --git a/_docs/drill-docs/015-archived-pages.md b/_docs/drill-docs/015-archived-pages.md deleted file mode 100644 index b2a29c3..0000000 --- a/_docs/drill-docs/015-archived-pages.md +++ /dev/null @@ -1,9 +0,0 @@ ---- -title: "Archived Pages" -parent: "Apache Drill Documentation" ---- -The following pages have been archived: - -* How to Run Drill with Sample Data -* Meet Apache Drill -