buildbot Sat, 08 Nov 2014 23:17:32 -0800

Author: buildbot
Date: Sun Nov  9 07:16:18 2014
New Revision: 928467

Log:
Staging update by buildbot for drill


Modified:
    websites/staging/drill/trunk/content/   (props changed)
    websites/staging/drill/trunk/content/drill/download.html
    websites/staging/drill/trunk/content/drill/top-10-reasons-for-using-drill.html

Propchange: websites/staging/drill/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Sun Nov  9 07:16:18 2014
@@ -1 +1 @@
-1637518
+1637630

Modified: websites/staging/drill/trunk/content/drill/download.html
==============================================================================
--- websites/staging/drill/trunk/content/drill/download.html (original)
+++ websites/staging/drill/trunk/content/drill/download.html Sun Nov  9 07:16:18 2014
@@ -67,7 +67,7 @@
         
         <div class="int_text download">
                
-            <h2>The latest release is Drill 0.6.0, released November 7, 
2014</h2>
+            <h2>The latest release is Drill 0.6.0, released November 1, 
2014</h2>
           <br>
             
             <table>

Modified: websites/staging/drill/trunk/content/drill/top-10-reasons-for-using-drill.html
==============================================================================
--- websites/staging/drill/trunk/content/drill/top-10-reasons-for-using-drill.html (original)
+++ websites/staging/drill/trunk/content/drill/top-10-reasons-for-using-drill.html Sun Nov  9 07:16:18 2014
@@ -92,45 +92,39 @@ font-family: Consolas, "Liberation Mono"
            <!-- Blog -->
                                                
 
-<p>There are several options available for SQL-on-Hadoop today. What makes 
Drill different? </p>
-<p>Here are the top 10 reasons why Drill is a valuable and innovative 
technology in your toolset for interactive data exploration on big data</p>
-<div align="center">
-<p><img alt="Apache Drill" 
src="https://www.mapr.com/sites/default/files/blogimages/Apache-Drill.png" 
style="height:39px; width:551px"></p>
-<p style="margin-left:40px"><img alt="quick and easy ramp up for apache drill" 
src="https://www.mapr.com/sites/default/files/blogimages/Quick-Easy-Ramp-Up-2.png"
 style="height:329px; width:550px; padding-right:35px"></p>
-</div>
-
-<h2>1. Quick and easy ramp up</h2>
-<p>First and foremost, it takes just minutes to start working with Apache 
Drill. Install it on a local Windows or Mac machine and do queries right away - 
you don't even need Hadoop.</p><p>Here are three simple steps to run your first 
query with Drill.</p>
 
+<h2>1. Get started in minutes</h2>
+<p>It only takes a couple minutes to start working with Drill. Untar it on 
your Mac or Windows laptop and run a query on a local file. No need to set up 
any infrastructure. No need to define schemas. Just point at the data and 
drill!</p>
 <pre>
- 
-// Install, launch SQLLine CLI and query a JSON file on local file system
-$ tar -xvf apache-drill-0.5.0-incubating.tar  
-                 
-$ apache-drill-0.5.0-incubating/bin/sqlline -u jdbc:drill:zk=local
-
-0: jdbc:drill:zk=local> SELECT * FROM cp.`employee.json` limit 5;
+$ tar -xvf apache-drill-0.6.0-incubating.tar.gz
+$ apache-drill-0.6.0-incubating/bin/sqlline -u jdbc:drill:zk=local
+0: jdbc:drill:zk=local> SELECT * FROM dfs.root.`path/to/employee.json` limit 5;
 +-------------+------------------+------------+------------+-------------+----------------------+------------+---------------+-----
 | employee_id | full_name        | first_name | last_name  | position_id | position_title       |  store_id  | department_id | birt 
 +-------------+------------------+------------+------------+-------------+----------------------+------------+---------------+------+
-
 | 1           | Sheri Nowmer     | Sheri      | Nowmer     | 1           | President            | 0          | 1             | 19   
 | 2           | Derrick Whelply  | Derrick    | Whelply    | 2           | VP Country Manager   | 0          | 1             |
 | 4           | Michael Spence   | Michael    | Spence     | 2           | VP Country Manager   | 0          | 1             |
 | 5           | Maya Gutierrez   | Maya       | Gutierrez  | 2           | VP Country Manager   | 0          | 1             |
 | 6           | Roberta Damstra  | Roberta    | Damstra    | 3           | VP Information Systems | 0        | 2             |
 +-------------+------------------+------------+------------+-------------+----------------------+------------+---------------+-----
- 
 </pre>
+<h2>2. Schema-free JSON model</h2>
+<p>Drill is the world's first and only distributed SQL engine that doesn't 
require schemas. It shares the same schema-free JSON model as MongoDB and 
Elasticsearch. Instead of spending weeks or months defining schemas, 
transforming data (ETL) and maintaining those schemas, simply point Drill at 
your data (file, directory, HBase table, etc.) and run your queries. Drill 
automatically understands the structure of the data. Drill's self-service 
approach reduces the burden on IT and increases the productivity and agility of 
analysts and developers.</p>
 
+<h2>3. Query complex, semi-structured data in-situ</h2><p>Drill's schema-free 
JSON model allows you to query complex, semi-structured data in situ. No need 
to flatten or transform the data prior to or during query execution. Drill also 
provides intuitive extensions to SQL to work with nested data. Here's a simple 
query on a JSON file demonstrating how to access nested elements and arrays:</p>
+<pre>
+SELECT * FROM (SELECT t.trans_id,
+                      t.trans_info.prod_id[0] AS prod_id,
+                      t.trans_info.purch_flag AS purchased
+               FROM `clicks/clicks.json` t) sq
+WHERE sq.prod_id BETWEEN 700 AND 750 AND
+      sq.purchased = 'true'
+ORDER BY sq.prod_id;
+</pre>
 
-
-
-
-
-
-
-<h2>2. Supports ANSI SQL - as you know it</h2><p>Apache Drill is compatible 
with ANSI SQL standards. This means that users don't need to learn a new query 
language or know the nuances of "SQL Like" to work with Drill or migrate 
existing workloads to Drill.  </p><p>Drill supports SQL 2003 syntax and 
provides all the key SQL data types (such as DATE, INTERVAL, TIMESTAMP, 
VARCHAR, DECIMAL) and query constructs (such as correlated sub-queries, joins 
in WHERE clause) to provide a smooth and familiar analytics experience.  
</p><p>Here is an example of a TPC-H standard query that runs in Drill "as is". 
 </p>
+<h2>4. Real SQL - not "SQL-like"</h2>
+<p>Drill supports the standard SQL:2003 syntax. No need to learn a new 
"SQL-like" language or struggle with a semi-functional BI tool. Drill supports 
many data types including DATE, INTERVAL, TIMESTAMP, VARCHAR and DECIMAL, as 
well as complex query constructs such as correlated sub-queries and joins in 
WHERE clauses. Here is an example of a TPC-H standard query that runs in Drill 
"as is":</p>
 <pre>
 # TPC-H query 4
 SELECT  o.o_orderpriority, count(*) AS order_count
@@ -146,62 +140,32 @@ WHERE o.o_orderdate >= date '1996-10-01'
       ORDER BY o.o_orderpriority;
 </pre>
 
-<h2>3. Works with your BI tools</h2><p>Apache Drill integrates with the BI/SQL 
tools such as Tableau, MicroStrategy, Pentaho and Jaspersoft using JDBC/ODBC 
drivers. This means that users can now use same BI/Analytics tools they are 
deeply familiar with in order to perform proactive business intelligence using 
more raw data, up-to-date data and new types of data available in Hadoop/NoSQL 
stores at a significantly low cost and rapid time to market.  </p><p>Here is a 
quick look at the Drill ODBC Driver DSN UI - Drill explorer - a data 
exploration environment to understand Drill data and create views along with a 
BI visualization using Drill as a data source.  </p><p 
style="margin-left:40px"><img alt="MapR Drill ODBC Driver DSN Setup" 
src="https://www.mapr.com/sites/default/files/blogimages/MapR-Drill-ODBC-Driver-DSN-Setup.png"
 style="height:498px; width:450px"></p><p style="margin-left:40px"><img 
alt="data exploration enviroment" 
src="https://www.mapr.com/sites/default/files/blogimages/Data-exploration-enviroment.png" 
style="height:354px; width:600px"></p><p 
style="margin-left:40px"><img alt="Tableau example" 
src="https://www.mapr.com/sites/default/files/blogimages/Tableau-example.png" 
style="height:583px; width:600px"></p><h2>4. Supports self-describing data with 
no ETL</h2><p>Self-describing data is where schema is specified as part of the 
data itself. File formats such as Parquet, JSON, Protobuf, XML, Avro and NoSQL 
databases are all examples of self-describing data. Some of these data formats 
are also dynamic and complex in that every record in the data can have its own 
set of columns/attributes and each column can be semi-structured/nested.  
</p><p>Think about a JSON document with multiple levels of nesting and 
optional/repeated elements at each level or a wide HBase table with 100s-1000s 
of columns with varying schema across rows. How about third party data that you 
are looking to leverage in BI/Analytics, but you have no control on how schemas 
will evolve?
   </p><p>Drill supports querying self-describing data without defining and 
managing any centralized schema definitions in Hive metastore. Schema is 
discovered dynamically on the fly when the queries come in.  </p><p>Dynamic 
schema discovery with no upfront modeling/schema management means that 
companies now can eliminate time delays of weeks/months of ETL before data is 
available to users for data exploration. Users can get more 
up-to-date/real-time data in order to make informed and timely decisions.  
</p><p>Here are a few quick examples on querying files and directories using 
Drill.  </p>
-<pre>
-//clicks.json is a file and logs is a partitioned directory by year & month on 
Hadoop
+<h2>5. Leverage standard BI tools</h2>
+<p>Drill works with standard BI tools. You can keep using the tools you love, 
such as Tableau, MicroStrategy, QlikView and Excel. No need to introduce yet 
another visualization or dashboard tool. Combine a self-service BI tool with 
the only self-service SQL engine to enable true self-service data 
exploration.</p>
 
-0: jdbc:drill:> select * from `clicks/clicks.json` limit 2;
-0: jdbc:drill:> select cust_id, dir1 month_no, count(*) month_count from logs 
-where dir0=2014 group by cust_id, dir1 order by cust_id, month_no limit 10;
-</pre>
-
-
-
-
-<h2>5. Handles Complex Data Types</h2><p>Drill comes with a flexible JSON-like 
data model to natively query and process complex/multi-structured data. The 
data doesn't need to be flattened or transformed either at the design time or 
runtime providing high performance for queries on complex data. Drill provides 
intuitive extensions to SQL to work with nested data using MAP and ARRAY data 
types.  </p><p>Here is an example indicating how Drill queries a JSON file and 
accesses the nested maps and array fields.  </p>
+<h2>6. Interactive queries on Hive tables</h2><p>Apache Drill lets you 
leverage your investments in Hive. You can run interactive queries with Drill 
on your Hive tables and access all Hive input/output formats (including custom 
SerDes). You can join tables associated with different Hive metastores, and you 
can join a Hive table with an HBase table or a directory of log files. Here's a 
simple query in Drill on a Hive table:</p>
 <pre>
-// prod_id is an array field in clicks.json file  
-
-select * from (select t.trans_id, t.trans_info.prod_id[0] as prodid,
-t.trans_info.purch_flag as purchased
-from `clicks/clicks.json` t) sq
-where sq.prodid between 700 and 750 and sq.purchased='true' order by sq.prodid;
+SELECT `month`, state, sum(order_total) AS sales
+FROM hive.orders 
+GROUP BY `month`, state
+ORDER BY 3 DESC LIMIT 5;
 </pre>
 
-<h2>6. Plays Well with Hive</h2><p>Apache Drill lets you reuse investments 
made in existing Hive deployments. You can do queries on Hive tables and access 
100+ Hive input/output formats (including custom serdes) with no re-work. Drill 
serves as a complement to Hive deployments by offering low latency 
queries.</p><p>Here is a sample Hive storage plugin configuration looks like in 
Drill, followed by a query on a Hive table.  </p>
+<h2>7. Access multiple data sources</h2><p>Drill is designed with 
extensibility in mind. It provides out-of-the-box connectivity to file systems 
(local or distributed file systems such as S3, HDFS and MapR-FS), HBase and 
Hive. You can implement a storage plugin to make Drill work with any other data 
source. Drill can combine data from multiple data sources on the fly in a 
single query, with no centralized metadata definitions. Here's a query that 
combines data from a Hive table, an HBase table (view) and a JSON file:</p>
 <pre>
-//Storage plugin configuration for Hive
-hive
-
-{
- "type": "hive",
- "enabled": true,
- "configProps": {
-   "hive.metastore.uris": "thrift://localhost:9083",
-   "hive.metastore.sasl.enabled": "false"
- }
-}
-
-//Query on a Hive table 'orders'
-0: jdbc:drill:> select `month`, state, sum(order_total) as sales from 
hive.orders 
-group by `month`, state order by 3 desc limit 5;
-
+SELECT custview.membership, sum(orders.order_total) AS sales
+FROM hive.orders, custview, dfs.`clicks/clicks.json` c 
+WHERE orders.cust_id = custview.cust_id AND orders.cust_id = 
c.user_info.cust_id 
+GROUP BY custview.membership
+ORDER BY 2;
 </pre>
 
+<h2>8. User-Defined Functions (UDFs)</h2><p>Drill exposes a simple and 
high-performance Java API to build custom functions (UDFs and UDAFs) so that 
you can add your own business logic. If you have already built UDFs in Hive, 
you can reuse them with Drill with no modifications. Refer to <a 
href="https://cwiki.apache.org/confluence/display/DRILL/Develop+Custom+Functions">Developing
 Custom Functions</a> for more information.
+</p>
 
+<h2>9. High performance</h2><p>Drill is designed from the ground up for high 
throughput and low latency. It doesn't use a general purpose execution engine 
like MapReduce, Tez or Spark. As a result, Drill is able to deliver its 
unparalleled flexibility (schema-free JSON model) without compromising 
performance. Drill's optimizer leverages rule- and cost-based techniques, as 
well as data locality and operator push-down (the ability to push down query 
fragments into the back-end data sources). Drill also provides a columnar and 
vectorized execution engine, resulting in higher memory and CPU efficiency.</p>
 
-<h2>7. Works with Hadoop and Beyond</h2><p>Drill is designed with 
extensibility in mind. It provides out-of-the-box connectivity to file systems 
(local or distributed file systems such as S3, HDFS, MapR-FS), HBase, or Hive. 
The storage plugin interface is extensible to other NoSQL stores (such as 
Couchbase, Elasticsearch, MongoDB) or relational databases (such as Postgres, 
MySQL, etc.) or your own custom store. Drill can also combine data from all 
these data sources in a single query on the fly without any central metadata 
definitions.</p><p>Here is an example Drill that combines data from Hive, HBase 
and JSON. </p>
-
-<pre>
-// Hive table 'orders', HBase view 'custview' and JSON file 'clicks.json' are 
joined together
-
-select custview.membership, sum(orders.order_total) 
-as sales from hive.orders, custview, 
dfs.`/mapr/demo.mapr.com/data/nested/clicks/clicks.json` c 
-where orders.cust_id=custview.cust_id and orders.cust_id=c.user_info.cust_id 
-group by custview.membership order by 2;
-</pre>
-
-<h2>8. Ease of UDFs</h2><p>Drill exposes an easy and high performance Java API 
to build custom functions (UDFs and UDAFs) and extend SQL for the data and the 
business logic that is specific to your organization. If you have already built 
UDFs in Hive, you can reuse them with Drill with no modifications. Refer to <a 
href="https://cwiki.apache.org/confluence/display/DRILL/Develop+Custom+Functions">Developing
 Custom Functions</a> for more information.  </p><h2>9. Provides low latency 
queries</h2><p>Drill is built from the ground up for short and low-latency 
queries on large datasets. Drill doesn't use MapReduce; instead it comes with a 
distributed SQL MPP engine to execute queries in parallel on a cluster. Any of 
the Drillbits (core service in Drill) is capable of receiving requests from 
users. The optimizer in Drill is sophisticated and leverages various rule- 
based and cost-based techniques, optimization capabilities of the data sources, 
along with data locality to determine the most
  efficient query plan and then distribute the execution across multiple nodes 
in the cluster. Drill also provides a columnar and vectorized execution engine 
to offer high memory and CPU efficiencies along with rapid performance for a 
wide variety of analytic queries.  </p><h2>10. Supports large 
datasets</h2><p>Drill is built to scale to big data needs and is not restricted 
by memory available on the cluster nodes. For performance, Drill tries to do 
query execution in-memory when possible, using an optimistic/pipelined model 
and spills to disk only if the working dataset doesn't fit in memory.  
</p><p>For more examples on how to use Drill, download  <a 
href="https://www.mapr.com/products/mapr-sandbox-hadoop/download-sandbox-drill">Apache
 Drill sandbox</a>  and try out the  <a 
href="http://doc.mapr.com/display/MapR/Apache+Drill+Tutorial">sandbox 
tutorial</a>. 
+<h2>10. Scales from a single laptop to a 1000-node cluster</h2><p>Drill is 
available as a simple download you can run on your laptop. When you're ready to 
analyze larger datasets, simply deploy Drill on your Hadoop cluster (up to 1000 
commodity servers). Drill leverages the aggregate memory in the cluster to 
execute queries using an optimistic pipelined model, and automatically spills 
to disk when the working set doesn't fit in memory.</p>.
                                                
                                                <!-- Last Line -->
             </div>
