Author: buildbot
Date: Sun Nov 9 07:16:18 2014
New Revision: 928467
Log:
Staging update by buildbot for drill
Modified:
websites/staging/drill/trunk/content/ (props changed)
websites/staging/drill/trunk/content/drill/download.html
websites/staging/drill/trunk/content/drill/top-10-reasons-for-using-drill.html
Propchange: websites/staging/drill/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Sun Nov 9 07:16:18 2014
@@ -1 +1 @@
-1637518
+1637630
Modified: websites/staging/drill/trunk/content/drill/download.html
==============================================================================
--- websites/staging/drill/trunk/content/drill/download.html (original)
+++ websites/staging/drill/trunk/content/drill/download.html Sun Nov 9 07:16:18 2014
@@ -67,7 +67,7 @@
<div class="int_text download">
- <h2>The latest release is Drill 0.6.0, released November 7, 2014</h2>
+ <h2>The latest release is Drill 0.6.0, released November 1, 2014</h2>
<br>
<table>
Modified: websites/staging/drill/trunk/content/drill/top-10-reasons-for-using-drill.html
==============================================================================
--- websites/staging/drill/trunk/content/drill/top-10-reasons-for-using-drill.html (original)
+++ websites/staging/drill/trunk/content/drill/top-10-reasons-for-using-drill.html Sun Nov 9 07:16:18 2014
@@ -92,45 +92,39 @@ font-family: Consolas, "Liberation Mono"
<!-- Blog -->
-<p>There are several options available for SQL-on-Hadoop today. What makes
Drill different? </p>
-<p>Here are the top 10 reasons why Drill is a valuable and innovative
technology in your toolset for interactive data exploration on big data</p>
-<div align="center">
-<p><img alt="Apache Drill"
src="https://www.mapr.com/sites/default/files/blogimages/Apache-Drill.png"
style="height:39px; width:551px"></p>
-<p style="margin-left:40px"><img alt="quick and easy ramp up for apache drill"
src="https://www.mapr.com/sites/default/files/blogimages/Quick-Easy-Ramp-Up-2.png"
style="height:329px; width:550px; padding-right:35px"></p>
-</div>
-
-<h2>1. Quick and easy ramp up</h2>
-<p>First and foremost, it takes just minutes to start working with Apache
Drill. Install it on a local Windows or Mac machine and do queries right away -
you don't even need Hadoop.</p><p>Here are three simple steps to run your first
query with Drill.</p>
+<h2>1. Get started in minutes</h2>
+<p>It only takes a couple of minutes to start working with Drill. Untar it on
your Mac or Windows laptop and run a query on a local file. No need to set up
any infrastructure. No need to define schemas. Just point at the data and
drill!</p>
<pre>
-
-// Install, launch SQLLine CLI and query a JSON file on local file system
-$ tar -xvf apache-drill-0.5.0-incubating.tar
-
-$ apache-drill-0.5.0-incubating/bin/sqlline -u jdbc:drill:zk=local
-
-0: jdbc:drill:zk=local> SELECT * FROM cp.`employee.json` limit 5;
+$ tar -xvf apache-drill-0.6.0-incubating.tar.gz
+$ apache-drill-0.6.0-incubating/bin/sqlline -u jdbc:drill:zk=local
+0: jdbc:drill:zk=local> SELECT * FROM dfs.root.`path/to/employee.json` limit 5;
+-------------+------------------+------------+------------+-------------+----------------------+------------+---------------+-----
| employee_id | full_name | first_name | last_name | position_id |
position_title | store_id | department_id | birt
+-------------+------------------+------------+------------+-------------+----------------------+------------+---------------+------+
-
| 1 | Sheri Nowmer | Sheri | Nowmer | 1 |
President | 0 | 1 | 19
| 2 | Derrick Whelply | Derrick | Whelply | 2 | VP
Country Manager | 0 | 1 |
| 4 | Michael Spence | Michael | Spence | 2 | VP
Country Manager | 0 | 1 |
| 5 | Maya Gutierrez | Maya | Gutierrez | 2 | VP
Country Manager | 0 | 1 |
| 6 | Roberta Damstra | Roberta | Damstra | 3 | VP
Information Systems | 0 | 2 |
+-------------+------------------+------------+------------+-------------+----------------------+------------+---------------+-----
-
</pre>
+<h2>2. Schema-free JSON model</h2>
+<p>Drill is the world's first and only distributed SQL engine that doesn't
require schemas. It shares the same schema-free JSON model as MongoDB and
Elasticsearch. Instead of spending weeks or months defining schemas,
transforming data (ETL) and maintaining those schemas, simply point Drill at
your data (file, directory, HBase table, etc.) and run your queries. Drill
automatically understands the structure of the data. Drill's self-service
approach reduces the burden on IT and increases the productivity and agility of
analysts and developers.</p>
+<h2>3. Query complex, semi-structured data in situ</h2><p>Drill's schema-free
JSON model allows you to query complex, semi-structured data in situ. No need
to flatten or transform the data prior to or during query execution. Drill also
provides intuitive extensions to SQL to work with nested data. Here's a simple
query on a JSON file demonstrating how to access nested elements and arrays:</p>
+<pre>
+SELECT * FROM (SELECT t.trans_id,
+ t.trans_info.prod_id[0] AS prod_id,
+ t.trans_info.purch_flag AS purchased
+ FROM `clicks/clicks.json` t) sq
+WHERE sq.prod_id BETWEEN 700 AND 750 AND
+ sq.purchased = 'true'
+ORDER BY sq.prod_id;
+</pre>
-
-
-
-
-
-
-<h2>2. Supports ANSI SQL - as you know it</h2><p>Apache Drill is compatible
with ANSI SQL standards. This means that users don't need to learn a new query
language or know the nuances of "SQL Like" to work with Drill or migrate
existing workloads to Drill. </p><p>Drill supports SQL 2003 syntax and
provides all the key SQL data types (such as DATE, INTERVAL, TIMESTAMP,
VARCHAR, DECIMAL) and query constructs (such as correlated sub-queries, joins
in WHERE clause) to provide a smooth and familiar analytics experience.
</p><p>Here is an example of a TPC-H standard query that runs in Drill "as is".
</p>
+<h2>4. Real SQL - not "SQL-like"</h2>
+<p>Drill supports the standard SQL:2003 syntax. No need to learn a new
"SQL-like" language or struggle with a semi-functional BI tool. Drill supports
many data types including DATE, INTERVAL, TIMESTAMP, VARCHAR and DECIMAL, as
well as complex query constructs such as correlated sub-queries and joins in
WHERE clauses. Here is an example of a TPC-H standard query that runs in Drill
"as is":</p>
<pre>
# TPC-H query 4
SELECT o.o_orderpriority, count(*) AS order_count
@@ -146,62 +140,32 @@ WHERE o.o_orderdate >= date '1996-10-01'
ORDER BY o.o_orderpriority;
</pre>
-<h2>3. Works with your BI tools</h2><p>Apache Drill integrates with the BI/SQL
tools such as Tableau, MicroStrategy, Pentaho and Jaspersoft using JDBC/ODBC
drivers. This means that users can now use same BI/Analytics tools they are
deeply familiar with in order to perform proactive business intelligence using
more raw data, up-to-date data and new types of data available in Hadoop/NoSQL
stores at a significantly low cost and rapid time to market. </p><p>Here is a
quick look at the Drill ODBC Driver DSN UI - Drill explorer - a data
exploration environment to understand Drill data and create views along with a
BI visualization using Drill as a data source. </p><p
style="margin-left:40px"><img alt="MapR Drill ODBC Driver DSN Setup"
src="https://www.mapr.com/sites/default/files/blogimages/MapR-Drill-ODBC-Driver-DSN-Setup.png"
style="height:498px; width:450px"></p><p style="margin-left:40px"><img
alt="data exploration enviroment"
src="https://www.mapr.com/sites/default/files/blogimages
/Data-exploration-enviroment.png" style="height:354px; width:600px"></p><p
style="margin-left:40px"><img alt="Tableau example"
src="https://www.mapr.com/sites/default/files/blogimages/Tableau-example.png"
style="height:583px; width:600px"></p><h2>4. Supports self-describing data with
no ETL</h2><p>Self-describing data is where schema is specified as part of the
data itself. File formats such as Parquet, JSON, Protobuf, XML, Avro and NoSQL
databases are all examples of self-describing data. Some of these data formats
are also dynamic and complex in that every record in the data can have its own
set of columns/attributes and each column can be semi-structured/nested.
</p><p>Think about a JSON document with multiple levels of nesting and
optional/repeated elements at each level or a wide HBase table with 100s-1000s
of columns with varying schema across rows. How about third party data that you
are looking to leverage in BI/Analytics, but you have no control on how schemas
will evolve?
</p><p>Drill supports querying self-describing data without defining and
managing any centralized schema definitions in Hive metastore. Schema is
discovered dynamically on the fly when the queries come in. </p><p>Dynamic
schema discovery with no upfront modeling/schema management means that
companies now can eliminate time delays of weeks/months of ETL before data is
available to users for data exploration. Users can get more
up-to-date/real-time data in order to make informed and timely decisions.
</p><p>Here are a few quick examples on querying files and directories using
Drill. </p>
-<pre>
-//clicks.json is a file and logs is a partitioned directory by year & month on
Hadoop
+<h2>5. Leverage standard BI tools</h2>
+<p>Drill works with standard BI tools. You can keep using the tools you love,
such as Tableau, MicroStrategy, QlikView and Excel. No need to introduce yet
another visualization or dashboard tool. Combine a self-service BI tool with
the only self-service SQL engine to enable true self-service data
exploration.</p>
-0: jdbc:drill:> select * from `clicks/clicks.json` limit 2;
-0: jdbc:drill:> select cust_id, dir1 month_no, count(*) month_count from logs
-where dir0=2014 group by cust_id, dir1 order by cust_id, month_no limit 10;
-</pre>
-
-
-
-
-<h2>5. Handles Complex Data Types</h2><p>Drill comes with a flexible JSON-like
data model to natively query and process complex/multi-structured data. The
data doesn't need to be flattened or transformed either at the design time or
runtime providing high performance for queries on complex data. Drill provides
intuitive extensions to SQL to work with nested data using MAP and ARRAY data
types. </p><p>Here is an example indicating how Drill queries a JSON file and
accesses the nested maps and array fields. </p>
+<h2>6. Interactive queries on Hive tables</h2><p>Apache Drill lets you
leverage your investments in Hive. You can run interactive queries with Drill
on your Hive tables and access all Hive input/output formats (including custom
SerDes). You can join tables associated with different Hive metastores, and you
can join a Hive table with an HBase table or a directory of log files. Here's a
simple query in Drill on a Hive table:</p>
<pre>
-// prod_id is an array field in clicks.json file
-
-select * from (select t.trans_id, t.trans_info.prod_id[0] as prodid,
-t.trans_info.purch_flag as purchased
-from `clicks/clicks.json` t) sq
-where sq.prodid between 700 and 750 and sq.purchased='true' order by sq.prodid;
+SELECT `month`, state, sum(order_total) AS sales
+FROM hive.orders
+GROUP BY `month`, state
+ORDER BY 3 DESC LIMIT 5;
</pre>
-<h2>6. Plays Well with Hive</h2><p>Apache Drill lets you reuse investments
made in existing Hive deployments. You can do queries on Hive tables and access
100+ Hive input/output formats (including custom serdes) with no re-work. Drill
serves as a complement to Hive deployments by offering low latency
queries.</p><p>Here is a sample Hive storage plugin configuration looks like in
Drill, followed by a query on a Hive table. </p>
+<h2>7. Access multiple data sources</h2><p>Drill is designed with
extensibility in mind. It provides out-of-the-box connectivity to file systems
(local or distributed file systems such as S3, HDFS and MapR-FS), HBase and
Hive. You can implement a storage plugin to make Drill work with any other data
source. Drill can combine data from multiple data sources on the fly in a
single query, with no centralized metadata definitions. Here's a query that
combines data from a Hive table, an HBase table (view) and a JSON file:</p>
<pre>
-//Storage plugin configuration for Hive
-hive
-
-{
- "type": "hive",
- "enabled": true,
- "configProps": {
- "hive.metastore.uris": "thrift://localhost:9083",
- "hive.metastore.sasl.enabled": "false"
- }
-}
-
-//Query on a Hive table 'orders'
-0: jdbc:drill:> select `month`, state, sum(order_total) as sales from
hive.orders
-group by `month`, state order by 3 desc limit 5;
-
+SELECT custview.membership, sum(orders.order_total) AS sales
+FROM hive.orders, custview, dfs.`clicks/clicks.json` c
+WHERE orders.cust_id = custview.cust_id AND orders.cust_id = c.user_info.cust_id
+GROUP BY custview.membership
+ORDER BY 2;
</pre>
+<h2>8. User-Defined Functions (UDFs)</h2><p>Drill exposes a simple and
high-performance Java API to build custom functions (UDFs and UDAFs) so that
you can add your own business logic. If you have already built UDFs in Hive,
you can reuse them with Drill with no modifications. Refer to <a
href="https://cwiki.apache.org/confluence/display/DRILL/Develop+Custom+Functions">Developing
Custom Functions</a> for more information.
+</p>
+<h2>9. High performance</h2><p>Drill is designed from the ground up for high
throughput and low latency. It doesn't use a general purpose execution engine
like MapReduce, Tez or Spark. As a result, Drill is able to deliver its
unparalleled flexibility (schema-free JSON model) without compromising
performance. Drill's optimizer leverages rule- and cost-based techniques, as
well as data locality and operator push-down (the ability to push down query
fragments into the back-end data sources). Drill also provides a columnar and
vectorized execution engine, resulting in higher memory and CPU efficiency.</p>
-<h2>7. Works with Hadoop and Beyond</h2><p>Drill is designed with
extensibility in mind. It provides out-of-the-box connectivity to file systems
(local or distributed file systems such as S3, HDFS, MapR-FS), HBase, or Hive.
The storage plugin interface is extensible to other NoSQL stores (such as
Couchbase, Elasticsearch, MongoDB) or relational databases (such as Postgres,
MySQL, etc.) or your own custom store. Drill can also combine data from all
these data sources in a single query on the fly without any central metadata
definitions.</p><p>Here is an example Drill that combines data from Hive, HBase
and JSON. </p>
-
-<pre>
-// Hive table 'orders', HBase view 'custview' and JSON file 'clicks.json' are
joined together
-
-select custview.membership, sum(orders.order_total)
-as sales from hive.orders, custview,
dfs.`/mapr/demo.mapr.com/data/nested/clicks/clicks.json` c
-where orders.cust_id=custview.cust_id and orders.cust_id=c.user_info.cust_id
-group by custview.membership order by 2;
-</pre>
-
-<h2>8. Ease of UDFs</h2><p>Drill exposes an easy and high performance Java API
to build custom functions (UDFs and UDAFs) and extend SQL for the data and the
business logic that is specific to your organization. If you have already built
UDFs in Hive, you can reuse them with Drill with no modifications. Refer to <a
href="https://cwiki.apache.org/confluence/display/DRILL/Develop+Custom+Functions">Developing
Custom Functions</a> for more information. </p><h2>9. Provides low latency
queries</h2><p>Drill is built from the ground up for short and low-latency
queries on large datasets. Drill doesn't use MapReduce; instead it comes with a
distributed SQL MPP engine to execute queries in parallel on a cluster. Any of
the Drillbits (core service in Drill) is capable of receiving requests from
users. The optimizer in Drill is sophisticated and leverages various rule-
based and cost-based techniques, optimization capabilities of the data sources,
along with data locality to determine the most
efficient query plan and then distribute the execution across multiple nodes
in the cluster. Drill also provides a columnar and vectorized execution engine
to offer high memory and CPU efficiencies along with rapid performance for a
wide variety of analytic queries. </p><h2>10. Supports large
datasets</h2><p>Drill is built to scale to big data needs and is not restricted
by memory available on the cluster nodes. For performance, Drill tries to do
query execution in-memory when possible, using an optimistic/pipelined model
and spills to disk only if the working dataset doesn't fit in memory.
</p><p>For more examples on how to use Drill, download <a
href="https://www.mapr.com/products/mapr-sandbox-hadoop/download-sandbox-drill">Apache
Drill sandbox</a> and try out the <a
href="http://doc.mapr.com/display/MapR/Apache+Drill+Tutorial">sandbox
tutorial</a>.
+<h2>10. Scales from a single laptop to a 1000-node cluster</h2><p>Drill is
available as a simple download you can run on your laptop. When you're ready to
analyze larger datasets, simply deploy Drill on your Hadoop cluster (up to 1000
commodity servers). Drill leverages the aggregate memory in the cluster to
execute queries using an optimistic pipelined model, and automatically spills
to disk when the working set doesn't fit in memory.</p>
<!-- Last Line -->
</div>