vtlim commented on code in PR #16762:
URL: https://github.com/apache/druid/pull/16762#discussion_r1687264673


##########
docs/tutorials/tutorial-rollup.md:
##########
@@ -24,18 +24,18 @@ sidebar_label: Aggregate data with rollup
   -->
 
 
-Apache Druid can summarize raw data at ingestion time using a process we refer 
to as "rollup". Rollup is a first-level aggregation operation over a selected 
set of columns that reduces the size of stored data.
+Apache Druid® can summarize raw data at ingestion time using a 
process known as "rollup". Rollup is a first-level aggregation operation over a 
selected set of columns that reduces the size of stored data.

Review Comment:
   ```suggestion
   Apache Druid® can summarize raw data at ingestion time using a process known as "rollup." Rollup is a first-level aggregation operation over a selected set of columns that reduces the size of stored data.
   ```



##########
docs/tutorials/tutorial-rollup.md:
##########
@@ -24,18 +24,18 @@ sidebar_label: Aggregate data with rollup
   -->
 
 
-Apache Druid can summarize raw data at ingestion time using a process we refer 
to as "rollup". Rollup is a first-level aggregation operation over a selected 
set of columns that reduces the size of stored data.
+Apache Druid® can summarize raw data at ingestion time using a 
process known as "rollup". Rollup is a first-level aggregation operation over a 
selected set of columns that reduces the size of stored data.
 
-This tutorial will demonstrate the effects of rollup on an example dataset.
+This tutorial demonstrates the effects of rollup on an example dataset.
 
-For this tutorial, we'll assume you've already downloaded Druid as described in
-the [single-machine quickstart](index.md) and have it running on your local 
machine.
+For this tutorial, you should have Druid downloaded as described in
+the [single-machine quickstart](index.md) and have it running on your local 
machine. The examples in the tutorial use the [multi-stage 
query](../multi-stage-query/index.md) (MSQ) task engine to execute SQL 
statements.
 
-It will also be helpful to have finished [Load a 
file](../tutorials/tutorial-batch.md) and [Query 
data](../tutorials/tutorial-query.md) tutorials.
+It is helpful to have finished [Load a file](../tutorials/tutorial-batch.md) 
and [Query data](../tutorials/tutorial-query.md) tutorials.
 
 ## Example data
 
-For this tutorial, we'll use a small sample of network flow event data, 
representing packet and byte counts for traffic from a source to a destination 
IP address that occurred within a particular second.
+For this tutorial, you use a small sample of network flow event data, 
representing packet and byte counts for traffic from a source to a destination 
IP address that occurred within a particular second.

Review Comment:
   ```suggestion
   For this tutorial, you use a small sample of network flow event data, representing IP traffic.
   The data contains packet and byte counts from a source IP address to a destination IP address.
   ```



##########
docs/tutorials/tutorial-rollup.md:
##########
@@ -49,150 +49,101 @@ For this tutorial, we'll use a small sample of network 
flow event data, represen
 {"timestamp":"2018-01-02T21:35:45Z","srcIP":"7.7.7.7", 
"dstIP":"8.8.8.8","packets":12,"bytes":2818}
 ```
 
-A file containing this sample input data is located at 
`quickstart/tutorial/rollup-data.json`.
+The tutorial guides you through how to ingest this data using rollup.
 
-We'll ingest this data using the following ingestion task spec, located at 
`quickstart/tutorial/rollup-index.json`.
+## Load the example data
 
-```json
-{
-  "type" : "index_parallel",
-  "spec" : {
-    "dataSchema" : {
-      "dataSource" : "rollup-tutorial",
-      "dimensionsSpec" : {
-        "dimensions" : [
-          "srcIP",
-          "dstIP"
-        ]
-      },
-      "timestampSpec": {
-        "column": "timestamp",
-        "format": "iso"
-      },
-      "metricsSpec" : [
-        { "type" : "count", "name" : "count" },
-        { "type" : "longSum", "name" : "packets", "fieldName" : "packets" },
-        { "type" : "longSum", "name" : "bytes", "fieldName" : "bytes" }
-      ],
-      "granularitySpec" : {
-        "type" : "uniform",
-        "segmentGranularity" : "week",
-        "queryGranularity" : "minute",
-        "intervals" : ["2018-01-01/2018-01-03"],
-        "rollup" : true
-      }
-    },
-    "ioConfig" : {
-      "type" : "index_parallel",
-      "inputSource" : {
-        "type" : "local",
-        "baseDir" : "quickstart/tutorial",
-        "filter" : "rollup-data.json"
-      },
-      "inputFormat" : {
-        "type" : "json"
-      },
-      "appendToExisting" : false
-    },
-    "tuningConfig" : {
-      "type" : "index_parallel",
-      "partitionsSpec": {
-        "type": "dynamic"
-      },
-      "maxRowsInMemory" : 25000
-    }
-  }
-}
+Load the sample dataset using INSERT and EXTERN functions. The EXTERN function 
lets you read external data or write to an external location.
+
+In the Druid web console, go to the Query view and run the following query:
+
+```sql
+INSERT INTO "rollup_tutorial"
+WITH "inline_data" AS (
+  SELECT *
+  FROM TABLE(EXTERN('{
+    "type":"inline",
+    
"data":"{\"timestamp\":\"2018-01-01T01:01:35Z\",\"srcIP\":\"1.1.1.1\",\"dstIP\":\"2.2.2.2\",\"packets\":20,\"bytes\":9024}\n{\"timestamp\":\"2018-01-01T01:02:14Z\",\"srcIP\":\"1.1.1.1\",\"dstIP\":\"2.2.2.2\",\"packets\":38,\"bytes\":6289}\n{\"timestamp\":\"2018-01-01T01:01:59Z\",\"srcIP\":\"1.1.1.1\",\"dstIP\":\"2.2.2.2\",\"packets\":11,\"bytes\":5780}\n{\"timestamp\":\"2018-01-01T01:01:51Z\",\"srcIP\":\"1.1.1.1\",\"dstIP\":\"2.2.2.2\",\"packets\":255,\"bytes\":21133}\n{\"timestamp\":\"2018-01-01T01:02:29Z\",\"srcIP\":\"1.1.1.1\",\"dstIP\":\"2.2.2.2\",\"packets\":377,\"bytes\":359971}\n{\"timestamp\":\"2018-01-01T01:03:29Z\",\"srcIP\":\"1.1.1.1\",\"dstIP\":\"2.2.2.2\",\"packets\":49,\"bytes\":10204}\n{\"timestamp\":\"2018-01-02T21:33:14Z\",\"srcIP\":\"7.7.7.7\",\"dstIP\":\"8.8.8.8\",\"packets\":38,\"bytes\":6289}\n{\"timestamp\":\"2018-01-02T21:33:45Z\",\"srcIP\":\"7.7.7.7\",\"dstIP\":\"8.8.8.8\",\"packets\":123,\"bytes\":93999}\n{\"timestamp\":\"2018-01-02T21:35:45Z\",\"srcIP\"
 :\"7.7.7.7\",\"dstIP\":\"8.8.8.8\",\"packets\":12,\"bytes\":2818}"}', 
+    '{"type":"json"}')) 
+    EXTEND ("timestamp" VARCHAR, "srcIP" VARCHAR, "dstIP" VARCHAR, "packets" 
BIGINT, "bytes" BIGINT)
+)
+SELECT
+  FLOOR(TIME_PARSE("timestamp") TO MINUTE) AS __time,
+  "srcIP",
+  "dstIP",
+  SUM("bytes") AS "bytes",
+  COUNT(*) AS "count",
+  SUM("packets") AS "packets"
+FROM "inline_data"
+GROUP BY 1, 2, 3
+PARTITIONED BY DAY
 ```
 
-Rollup has been enabled by setting `"rollup" : true` in the `granularitySpec`.
+Note that the query uses the `FLOOR` function to give the `__time` a 
granularity of `MINUTE`. The query defines the dimensions of the rollup by 
grouping columns 1, 2, and 3, which corresponds to the `timestamp`, `srcIP`, 
and `dstIP` columns. The query defines the metrics of the rollup by aggregating 
the `bytes` and `packets` columns.
 
-Note that we have `srcIP` and `dstIP` defined as dimensions, a longSum metric 
is defined for the `packets` and `bytes` columns, and the `queryGranularity` 
has been defined as `minute`.
+After the ingestion completes, you can query the data.
 
-We will see how these definitions are used after we load this data.
-
-## Load the example data
+## Query the example data
 
-From the apache-druid-{{DRUIDVERSION}} package root, run the following command:
+Open a new tab in the Query view and run the following query to see what data 
was ingested:

Review Comment:
   * Specify that this is the web console (you can also add a link to it)
   * Query should be in bold since it's the name of the view
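   Taking both points together, the reworded sentence could read, for example (the link target here is my assumption, pointing at the web console page): "In the Druid [web console](../operations/web-console.md), go to the **Query** view and run the following query:"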



##########
docs/tutorials/tutorial-rollup.md:
##########
@@ -49,150 +49,101 @@ For this tutorial, we'll use a small sample of network 
flow event data, represen
 {"timestamp":"2018-01-02T21:35:45Z","srcIP":"7.7.7.7", 
"dstIP":"8.8.8.8","packets":12,"bytes":2818}
 ```
 
-A file containing this sample input data is located at 
`quickstart/tutorial/rollup-data.json`.
+The tutorial guides you through how to ingest this data using rollup.
 
-We'll ingest this data using the following ingestion task spec, located at 
`quickstart/tutorial/rollup-index.json`.
+## Load the example data
 
-```json
-{
-  "type" : "index_parallel",
-  "spec" : {
-    "dataSchema" : {
-      "dataSource" : "rollup-tutorial",
-      "dimensionsSpec" : {
-        "dimensions" : [
-          "srcIP",
-          "dstIP"
-        ]
-      },
-      "timestampSpec": {
-        "column": "timestamp",
-        "format": "iso"
-      },
-      "metricsSpec" : [
-        { "type" : "count", "name" : "count" },
-        { "type" : "longSum", "name" : "packets", "fieldName" : "packets" },
-        { "type" : "longSum", "name" : "bytes", "fieldName" : "bytes" }
-      ],
-      "granularitySpec" : {
-        "type" : "uniform",
-        "segmentGranularity" : "week",
-        "queryGranularity" : "minute",
-        "intervals" : ["2018-01-01/2018-01-03"],
-        "rollup" : true
-      }
-    },
-    "ioConfig" : {
-      "type" : "index_parallel",
-      "inputSource" : {
-        "type" : "local",
-        "baseDir" : "quickstart/tutorial",
-        "filter" : "rollup-data.json"
-      },
-      "inputFormat" : {
-        "type" : "json"
-      },
-      "appendToExisting" : false
-    },
-    "tuningConfig" : {
-      "type" : "index_parallel",
-      "partitionsSpec": {
-        "type": "dynamic"
-      },
-      "maxRowsInMemory" : 25000
-    }
-  }
-}
+Load the sample dataset using INSERT and EXTERN functions. The EXTERN function 
lets you read external data or write to an external location.

Review Comment:
   A few things:
   * INSERT isn't a function
   * The description for EXTERN isn't applicable to this tutorial
   
   ```suggestion
   Load the sample dataset using the EXTERN function to read data provided inline with the query.
   ```
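   For reference, a minimal sketch of the pattern the suggestion describes, using two made-up JSON rows instead of the tutorial data (the `ts` and `val` column names are hypothetical; the query assumes the MSQ task engine):

   ```sql
   -- EXTERN's first argument is an inline input source whose "data" field holds
   -- newline-delimited JSON rows; the second argument is the input format.
   -- EXTEND declares the schema of the external data.
   SELECT *
   FROM TABLE(EXTERN(
     '{"type":"inline","data":"{\"ts\":\"2018-01-01T00:00:00Z\",\"val\":1}\n{\"ts\":\"2018-01-01T00:01:00Z\",\"val\":2}"}',
     '{"type":"json"}'
   )) EXTEND ("ts" VARCHAR, "val" BIGINT)
   ```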



##########
docs/tutorials/tutorial-rollup.md:
##########
@@ -24,18 +24,18 @@ sidebar_label: Aggregate data with rollup
   -->
 
 
-Apache Druid can summarize raw data at ingestion time using a process we refer 
to as "rollup". Rollup is a first-level aggregation operation over a selected 
set of columns that reduces the size of stored data.
+Apache Druid® can summarize raw data at ingestion time using a 
process known as "rollup". Rollup is a first-level aggregation operation over a 
selected set of columns that reduces the size of stored data.

Review Comment:
   Can you add a link to the actual rollup docs? Either link it from existing text, or add a new sentence ("For more information...")
   https://druid.apache.org/docs/latest/multi-stage-query/concepts#rollup
   https://druid.apache.org/docs/latest/ingestion/rollup
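   For example, a closing sentence along these lines would work (the relative paths are my assumption for how those pages resolve from the tutorials directory): "For more information, see [Rollup](../ingestion/rollup.md) and [Rollup in the MSQ task engine](../multi-stage-query/concepts.md#rollup)."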



##########
docs/tutorials/tutorial-rollup.md:
##########
@@ -49,150 +49,101 @@ For this tutorial, we'll use a small sample of network 
flow event data, represen
 {"timestamp":"2018-01-02T21:35:45Z","srcIP":"7.7.7.7", 
"dstIP":"8.8.8.8","packets":12,"bytes":2818}
 ```
 
-A file containing this sample input data is located at 
`quickstart/tutorial/rollup-data.json`.
+The tutorial guides you through how to ingest this data using rollup.
 
-We'll ingest this data using the following ingestion task spec, located at 
`quickstart/tutorial/rollup-index.json`.
+## Load the example data
 
-```json
-{
-  "type" : "index_parallel",
-  "spec" : {
-    "dataSchema" : {
-      "dataSource" : "rollup-tutorial",
-      "dimensionsSpec" : {
-        "dimensions" : [
-          "srcIP",
-          "dstIP"
-        ]
-      },
-      "timestampSpec": {
-        "column": "timestamp",
-        "format": "iso"
-      },
-      "metricsSpec" : [
-        { "type" : "count", "name" : "count" },
-        { "type" : "longSum", "name" : "packets", "fieldName" : "packets" },
-        { "type" : "longSum", "name" : "bytes", "fieldName" : "bytes" }
-      ],
-      "granularitySpec" : {
-        "type" : "uniform",
-        "segmentGranularity" : "week",
-        "queryGranularity" : "minute",
-        "intervals" : ["2018-01-01/2018-01-03"],
-        "rollup" : true
-      }
-    },
-    "ioConfig" : {
-      "type" : "index_parallel",
-      "inputSource" : {
-        "type" : "local",
-        "baseDir" : "quickstart/tutorial",
-        "filter" : "rollup-data.json"
-      },
-      "inputFormat" : {
-        "type" : "json"
-      },
-      "appendToExisting" : false
-    },
-    "tuningConfig" : {
-      "type" : "index_parallel",
-      "partitionsSpec": {
-        "type": "dynamic"
-      },
-      "maxRowsInMemory" : 25000
-    }
-  }
-}
+Load the sample dataset using INSERT and EXTERN functions. The EXTERN function 
lets you read external data or write to an external location.
+
+In the Druid web console, go to the Query view and run the following query:
+
+```sql
+INSERT INTO "rollup_tutorial"
+WITH "inline_data" AS (
+  SELECT *
+  FROM TABLE(EXTERN('{
+    "type":"inline",
+    
"data":"{\"timestamp\":\"2018-01-01T01:01:35Z\",\"srcIP\":\"1.1.1.1\",\"dstIP\":\"2.2.2.2\",\"packets\":20,\"bytes\":9024}\n{\"timestamp\":\"2018-01-01T01:02:14Z\",\"srcIP\":\"1.1.1.1\",\"dstIP\":\"2.2.2.2\",\"packets\":38,\"bytes\":6289}\n{\"timestamp\":\"2018-01-01T01:01:59Z\",\"srcIP\":\"1.1.1.1\",\"dstIP\":\"2.2.2.2\",\"packets\":11,\"bytes\":5780}\n{\"timestamp\":\"2018-01-01T01:01:51Z\",\"srcIP\":\"1.1.1.1\",\"dstIP\":\"2.2.2.2\",\"packets\":255,\"bytes\":21133}\n{\"timestamp\":\"2018-01-01T01:02:29Z\",\"srcIP\":\"1.1.1.1\",\"dstIP\":\"2.2.2.2\",\"packets\":377,\"bytes\":359971}\n{\"timestamp\":\"2018-01-01T01:03:29Z\",\"srcIP\":\"1.1.1.1\",\"dstIP\":\"2.2.2.2\",\"packets\":49,\"bytes\":10204}\n{\"timestamp\":\"2018-01-02T21:33:14Z\",\"srcIP\":\"7.7.7.7\",\"dstIP\":\"8.8.8.8\",\"packets\":38,\"bytes\":6289}\n{\"timestamp\":\"2018-01-02T21:33:45Z\",\"srcIP\":\"7.7.7.7\",\"dstIP\":\"8.8.8.8\",\"packets\":123,\"bytes\":93999}\n{\"timestamp\":\"2018-01-02T21:35:45Z\",\"srcIP\"
 :\"7.7.7.7\",\"dstIP\":\"8.8.8.8\",\"packets\":12,\"bytes\":2818}"}', 
+    '{"type":"json"}')) 
+    EXTEND ("timestamp" VARCHAR, "srcIP" VARCHAR, "dstIP" VARCHAR, "packets" 
BIGINT, "bytes" BIGINT)
+)
+SELECT
+  FLOOR(TIME_PARSE("timestamp") TO MINUTE) AS __time,
+  "srcIP",
+  "dstIP",
+  SUM("bytes") AS "bytes",
+  COUNT(*) AS "count",
+  SUM("packets") AS "packets"
+FROM "inline_data"
+GROUP BY 1, 2, 3
+PARTITIONED BY DAY
 ```
 
-Rollup has been enabled by setting `"rollup" : true` in the `granularitySpec`.
+Note that the query uses the `FLOOR` function to give the `__time` a 
granularity of `MINUTE`. The query defines the dimensions of the rollup by 
grouping columns 1, 2, and 3, which corresponds to the `timestamp`, `srcIP`, 
and `dstIP` columns. The query defines the metrics of the rollup by aggregating 
the `bytes` and `packets` columns.
 
-Note that we have `srcIP` and `dstIP` defined as dimensions, a longSum metric 
is defined for the `packets` and `bytes` columns, and the `queryGranularity` 
has been defined as `minute`.
+After the ingestion completes, you can query the data.
 
-We will see how these definitions are used after we load this data.
-
-## Load the example data
+## Query the example data
 
-From the apache-druid-{{DRUIDVERSION}} package root, run the following command:
+Open a new tab in the Query view and run the following query to see what data 
was ingested:

Review Comment:
   ```suggestion
   Open a new tab in the Query view. Run the following query to view the ingested data:
   ```



##########
docs/tutorials/tutorial-rollup.md:
##########
@@ -49,150 +49,101 @@ For this tutorial, we'll use a small sample of network 
flow event data, represen
 {"timestamp":"2018-01-02T21:35:45Z","srcIP":"7.7.7.7", 
"dstIP":"8.8.8.8","packets":12,"bytes":2818}
 ```
 
-A file containing this sample input data is located at 
`quickstart/tutorial/rollup-data.json`.
+The tutorial guides you through how to ingest this data using rollup.
 
-We'll ingest this data using the following ingestion task spec, located at 
`quickstart/tutorial/rollup-index.json`.
+## Load the example data
 
-```json
-{
-  "type" : "index_parallel",
-  "spec" : {
-    "dataSchema" : {
-      "dataSource" : "rollup-tutorial",
-      "dimensionsSpec" : {
-        "dimensions" : [
-          "srcIP",
-          "dstIP"
-        ]
-      },
-      "timestampSpec": {
-        "column": "timestamp",
-        "format": "iso"
-      },
-      "metricsSpec" : [
-        { "type" : "count", "name" : "count" },
-        { "type" : "longSum", "name" : "packets", "fieldName" : "packets" },
-        { "type" : "longSum", "name" : "bytes", "fieldName" : "bytes" }
-      ],
-      "granularitySpec" : {
-        "type" : "uniform",
-        "segmentGranularity" : "week",
-        "queryGranularity" : "minute",
-        "intervals" : ["2018-01-01/2018-01-03"],
-        "rollup" : true
-      }
-    },
-    "ioConfig" : {
-      "type" : "index_parallel",
-      "inputSource" : {
-        "type" : "local",
-        "baseDir" : "quickstart/tutorial",
-        "filter" : "rollup-data.json"
-      },
-      "inputFormat" : {
-        "type" : "json"
-      },
-      "appendToExisting" : false
-    },
-    "tuningConfig" : {
-      "type" : "index_parallel",
-      "partitionsSpec": {
-        "type": "dynamic"
-      },
-      "maxRowsInMemory" : 25000
-    }
-  }
-}
+Load the sample dataset using INSERT and EXTERN functions. The EXTERN function 
lets you read external data or write to an external location.
+
+In the Druid web console, go to the Query view and run the following query:
+
+```sql
+INSERT INTO "rollup_tutorial"
+WITH "inline_data" AS (
+  SELECT *
+  FROM TABLE(EXTERN('{
+    "type":"inline",
+    
"data":"{\"timestamp\":\"2018-01-01T01:01:35Z\",\"srcIP\":\"1.1.1.1\",\"dstIP\":\"2.2.2.2\",\"packets\":20,\"bytes\":9024}\n{\"timestamp\":\"2018-01-01T01:02:14Z\",\"srcIP\":\"1.1.1.1\",\"dstIP\":\"2.2.2.2\",\"packets\":38,\"bytes\":6289}\n{\"timestamp\":\"2018-01-01T01:01:59Z\",\"srcIP\":\"1.1.1.1\",\"dstIP\":\"2.2.2.2\",\"packets\":11,\"bytes\":5780}\n{\"timestamp\":\"2018-01-01T01:01:51Z\",\"srcIP\":\"1.1.1.1\",\"dstIP\":\"2.2.2.2\",\"packets\":255,\"bytes\":21133}\n{\"timestamp\":\"2018-01-01T01:02:29Z\",\"srcIP\":\"1.1.1.1\",\"dstIP\":\"2.2.2.2\",\"packets\":377,\"bytes\":359971}\n{\"timestamp\":\"2018-01-01T01:03:29Z\",\"srcIP\":\"1.1.1.1\",\"dstIP\":\"2.2.2.2\",\"packets\":49,\"bytes\":10204}\n{\"timestamp\":\"2018-01-02T21:33:14Z\",\"srcIP\":\"7.7.7.7\",\"dstIP\":\"8.8.8.8\",\"packets\":38,\"bytes\":6289}\n{\"timestamp\":\"2018-01-02T21:33:45Z\",\"srcIP\":\"7.7.7.7\",\"dstIP\":\"8.8.8.8\",\"packets\":123,\"bytes\":93999}\n{\"timestamp\":\"2018-01-02T21:35:45Z\",\"srcIP\"
 :\"7.7.7.7\",\"dstIP\":\"8.8.8.8\",\"packets\":12,\"bytes\":2818}"}', 
+    '{"type":"json"}')) 
+    EXTEND ("timestamp" VARCHAR, "srcIP" VARCHAR, "dstIP" VARCHAR, "packets" 
BIGINT, "bytes" BIGINT)
+)
+SELECT
+  FLOOR(TIME_PARSE("timestamp") TO MINUTE) AS __time,
+  "srcIP",
+  "dstIP",
+  SUM("bytes") AS "bytes",
+  COUNT(*) AS "count",
+  SUM("packets") AS "packets"
+FROM "inline_data"
+GROUP BY 1, 2, 3
+PARTITIONED BY DAY
 ```
 
-Rollup has been enabled by setting `"rollup" : true` in the `granularitySpec`.
+Note that the query uses the `FLOOR` function to give the `__time` a 
granularity of `MINUTE`. The query defines the dimensions of the rollup by 
grouping columns 1, 2, and 3, which corresponds to the `timestamp`, `srcIP`, 
and `dstIP` columns. The query defines the metrics of the rollup by aggregating 
the `bytes` and `packets` columns.

Review Comment:
   See how Druid describes the schema model in terms of dimensions and metrics:
   https://druid.apache.org/docs/latest/ingestion/schema-model
   
   ```suggestion
   Note that the query uses the `FLOOR` function to combine rows based on MINUTE granularity.
   In the query, you group by dimensions, the `timestamp`, `srcIP`, and `dstIP` columns.
   You apply aggregations for the metrics, specifically to sum the `bytes` and `packets` columns and to add a column to count the number of rows that get rolled up.
   ```
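   To make the dimensions/metrics framing concrete, here is the SELECT from the proposed query with each column's role marked in comments (the query itself is unchanged from the diff above):

   ```sql
   SELECT
     FLOOR(TIME_PARSE("timestamp") TO MINUTE) AS __time,  -- time column, floored to minute granularity
     "srcIP",                                             -- dimension
     "dstIP",                                             -- dimension
     SUM("bytes") AS "bytes",                             -- metric
     COUNT(*) AS "count",                                  -- metric: number of input rows rolled into this row
     SUM("packets") AS "packets"                           -- metric
   FROM "inline_data"
   GROUP BY 1, 2, 3
   ```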



##########
docs/tutorials/tutorial-rollup.md:
##########
@@ -24,18 +24,18 @@ sidebar_label: Aggregate data with rollup
   -->
 
 
-Apache Druid can summarize raw data at ingestion time using a process we refer 
to as "rollup". Rollup is a first-level aggregation operation over a selected 
set of columns that reduces the size of stored data.
+Apache Druid® can summarize raw data at ingestion time using a 
process known as "rollup". Rollup is a first-level aggregation operation over a 
selected set of columns that reduces the size of stored data.
 
-This tutorial will demonstrate the effects of rollup on an example dataset.
+This tutorial demonstrates the effects of rollup on an example dataset.
 
-For this tutorial, we'll assume you've already downloaded Druid as described in
-the [single-machine quickstart](index.md) and have it running on your local 
machine.
+For this tutorial, you should have Druid downloaded as described in
+the [single-machine quickstart](index.md) and have it running on your local 
machine. The examples in the tutorial use the [multi-stage 
query](../multi-stage-query/index.md) (MSQ) task engine to execute SQL 
statements.
 
-It will also be helpful to have finished [Load a 
file](../tutorials/tutorial-batch.md) and [Query 
data](../tutorials/tutorial-query.md) tutorials.
+It is helpful to have finished [Load a file](../tutorials/tutorial-batch.md) 
and [Query data](../tutorials/tutorial-query.md) tutorials.

Review Comment:
   Not sure about this sentence, but here's a suggestion
   ```suggestion
   Before proceeding, it's recommended to complete the tutorials to [Load a file](../tutorials/tutorial-batch.md) and [Query data](../tutorials/tutorial-query.md).
   ```



##########
docs/tutorials/tutorial-rollup.md:
##########
@@ -49,150 +49,101 @@ For this tutorial, we'll use a small sample of network 
flow event data, represen
 {"timestamp":"2018-01-02T21:35:45Z","srcIP":"7.7.7.7", 
"dstIP":"8.8.8.8","packets":12,"bytes":2818}
 ```
 
-A file containing this sample input data is located at 
`quickstart/tutorial/rollup-data.json`.
+The tutorial guides you through how to ingest this data using rollup.

Review Comment:
   ```suggestion
   The tutorial demonstrates how to apply rollup at ingestion and shows the effect of rollup at query time.
   ```



##########
docs/tutorials/tutorial-rollup.md:
##########
@@ -49,150 +49,101 @@ For this tutorial, we'll use a small sample of network 
flow event data, represen
 {"timestamp":"2018-01-02T21:35:45Z","srcIP":"7.7.7.7", 
"dstIP":"8.8.8.8","packets":12,"bytes":2818}
 ```
 
-A file containing this sample input data is located at 
`quickstart/tutorial/rollup-data.json`.
+The tutorial guides you through how to ingest this data using rollup.
 
-We'll ingest this data using the following ingestion task spec, located at 
`quickstart/tutorial/rollup-index.json`.
+## Load the example data
 
-```json
-{
-  "type" : "index_parallel",
-  "spec" : {
-    "dataSchema" : {
-      "dataSource" : "rollup-tutorial",
-      "dimensionsSpec" : {
-        "dimensions" : [
-          "srcIP",
-          "dstIP"
-        ]
-      },
-      "timestampSpec": {
-        "column": "timestamp",
-        "format": "iso"
-      },
-      "metricsSpec" : [
-        { "type" : "count", "name" : "count" },
-        { "type" : "longSum", "name" : "packets", "fieldName" : "packets" },
-        { "type" : "longSum", "name" : "bytes", "fieldName" : "bytes" }
-      ],
-      "granularitySpec" : {
-        "type" : "uniform",
-        "segmentGranularity" : "week",
-        "queryGranularity" : "minute",
-        "intervals" : ["2018-01-01/2018-01-03"],
-        "rollup" : true
-      }
-    },
-    "ioConfig" : {
-      "type" : "index_parallel",
-      "inputSource" : {
-        "type" : "local",
-        "baseDir" : "quickstart/tutorial",
-        "filter" : "rollup-data.json"
-      },
-      "inputFormat" : {
-        "type" : "json"
-      },
-      "appendToExisting" : false
-    },
-    "tuningConfig" : {
-      "type" : "index_parallel",
-      "partitionsSpec": {
-        "type": "dynamic"
-      },
-      "maxRowsInMemory" : 25000
-    }
-  }
-}
+Load the sample dataset using INSERT and EXTERN functions. The EXTERN function 
lets you read external data or write to an external location.
+
+In the Druid web console, go to the Query view and run the following query:
+
+```sql
+INSERT INTO "rollup_tutorial"
+WITH "inline_data" AS (
+  SELECT *
+  FROM TABLE(EXTERN('{
+    "type":"inline",
+    
"data":"{\"timestamp\":\"2018-01-01T01:01:35Z\",\"srcIP\":\"1.1.1.1\",\"dstIP\":\"2.2.2.2\",\"packets\":20,\"bytes\":9024}\n{\"timestamp\":\"2018-01-01T01:02:14Z\",\"srcIP\":\"1.1.1.1\",\"dstIP\":\"2.2.2.2\",\"packets\":38,\"bytes\":6289}\n{\"timestamp\":\"2018-01-01T01:01:59Z\",\"srcIP\":\"1.1.1.1\",\"dstIP\":\"2.2.2.2\",\"packets\":11,\"bytes\":5780}\n{\"timestamp\":\"2018-01-01T01:01:51Z\",\"srcIP\":\"1.1.1.1\",\"dstIP\":\"2.2.2.2\",\"packets\":255,\"bytes\":21133}\n{\"timestamp\":\"2018-01-01T01:02:29Z\",\"srcIP\":\"1.1.1.1\",\"dstIP\":\"2.2.2.2\",\"packets\":377,\"bytes\":359971}\n{\"timestamp\":\"2018-01-01T01:03:29Z\",\"srcIP\":\"1.1.1.1\",\"dstIP\":\"2.2.2.2\",\"packets\":49,\"bytes\":10204}\n{\"timestamp\":\"2018-01-02T21:33:14Z\",\"srcIP\":\"7.7.7.7\",\"dstIP\":\"8.8.8.8\",\"packets\":38,\"bytes\":6289}\n{\"timestamp\":\"2018-01-02T21:33:45Z\",\"srcIP\":\"7.7.7.7\",\"dstIP\":\"8.8.8.8\",\"packets\":123,\"bytes\":93999}\n{\"timestamp\":\"2018-01-02T21:35:45Z\",\"srcIP\"
 :\"7.7.7.7\",\"dstIP\":\"8.8.8.8\",\"packets\":12,\"bytes\":2818}"}', 
+    '{"type":"json"}')) 
+    EXTEND ("timestamp" VARCHAR, "srcIP" VARCHAR, "dstIP" VARCHAR, "packets" 
BIGINT, "bytes" BIGINT)
+)
+SELECT
+  FLOOR(TIME_PARSE("timestamp") TO MINUTE) AS __time,
+  "srcIP",
+  "dstIP",
+  SUM("bytes") AS "bytes",
+  COUNT(*) AS "count",
+  SUM("packets") AS "packets"
+FROM "inline_data"
+GROUP BY 1, 2, 3
+PARTITIONED BY DAY
 ```
 
-Rollup has been enabled by setting `"rollup" : true` in the `granularitySpec`.
+Note that the query uses the `FLOOR` function to give the `__time` a 
granularity of `MINUTE`. The query defines the dimensions of the rollup by 
grouping columns 1, 2, and 3, which corresponds to the `timestamp`, `srcIP`, 
and `dstIP` columns. The query defines the metrics of the rollup by aggregating 
the `bytes` and `packets` columns.
 
-Note that we have `srcIP` and `dstIP` defined as dimensions, a longSum metric 
is defined for the `packets` and `bytes` columns, and the `queryGranularity` 
has been defined as `minute`.
+After the ingestion completes, you can query the data.
 
-We will see how these definitions are used after we load this data.
-
-## Load the example data
+## Query the example data
 
-From the apache-druid-{{DRUIDVERSION}} package root, run the following command:
+Open a new tab in the Query view and run the following query to see what data 
was ingested:
 
-```bash
-bin/post-index-task --file quickstart/tutorial/rollup-index.json --url 
http://localhost:8081
+```sql
+SELECT * FROM "rollup_tutorial"
 ```
 
-After the script completes, we will query the data.
+Returns the following:
 
-## Query the example data
+| `__time` | `srcIP` | `dstIP` | `bytes` | `count` | `packets` |
+| -- | -- | -- | -- | -- | -- |
+| `2018-01-01T01:01:00.000Z` | `1.1.1.1` | `2.2.2.2` | `35,937` | `3` | `286` |
+| `2018-01-01T01:02:00.000Z` | `1.1.1.1` | `2.2.2.2` | `366,260` | `2` | `415` 
|
+| `2018-01-01T01:03:00.000Z` | `1.1.1.1` | `2.2.2.2` | `10,204` | `1` | `49` |
+| `2018-01-02T21:33:00.000Z` | `7.7.7.7` | `8.8.8.8` | `100,288` | `2` | `161` 
|
+| `2018-01-02T21:35:00.000Z` | `7.7.7.7` | `8.8.8.8` | `2,818` | `1` | `12` |
 
-Let's run `bin/dsql` and issue a `select * from "rollup-tutorial";` query to 
see what data was ingested.
-
-```bash
-$ bin/dsql
-Welcome to dsql, the command-line client for Druid SQL.
-Type "\h" for help.
-dsql> select * from "rollup-tutorial";
-┌──────────────────────────┬────────┬───────┬─────────┬─────────┬─────────┐
-│ __time                   │ bytes  │ count │ dstIP   │ packets │ srcIP   │
-├──────────────────────────┼────────┼───────┼─────────┼─────────┼─────────┤
-│ 2018-01-01T01:01:00.000Z │  35937 │     3 │ 2.2.2.2 │     286 │ 1.1.1.1 │
-│ 2018-01-01T01:02:00.000Z │ 366260 │     2 │ 2.2.2.2 │     415 │ 1.1.1.1 │
-│ 2018-01-01T01:03:00.000Z │  10204 │     1 │ 2.2.2.2 │      49 │ 1.1.1.1 │
-│ 2018-01-02T21:33:00.000Z │ 100288 │     2 │ 8.8.8.8 │     161 │ 7.7.7.7 │
-│ 2018-01-02T21:35:00.000Z │   2818 │     1 │ 8.8.8.8 │      12 │ 7.7.7.7 │
-└──────────────────────────┴────────┴───────┴─────────┴─────────┴─────────┘
-Retrieved 5 rows in 1.18s.
-
-dsql>
-```
 
-Let's look at the three events in the original input data that occurred during 
`2018-01-01T01:01`:
+Consider the three events in the original input data that occur over the 
course of minute `2018-01-01T01:01`:
 
 ```json
 {"timestamp":"2018-01-01T01:01:35Z","srcIP":"1.1.1.1", 
"dstIP":"2.2.2.2","packets":20,"bytes":9024}
 {"timestamp":"2018-01-01T01:01:51Z","srcIP":"1.1.1.1", 
"dstIP":"2.2.2.2","packets":255,"bytes":21133}
 {"timestamp":"2018-01-01T01:01:59Z","srcIP":"1.1.1.1", 
"dstIP":"2.2.2.2","packets":11,"bytes":5780}
 ```
 
-These three rows have been "rolled up" into the following row:
+Druid combines the three rows into the following during rollup:
 
-```bash
-┌──────────────────────────┬────────┬───────┬─────────┬─────────┬─────────┐
-│ __time                   │ bytes  │ count │ dstIP   │ packets │ srcIP   │
-├──────────────────────────┼────────┼───────┼─────────┼─────────┼─────────┤
-│ 2018-01-01T01:01:00.000Z │  35937 │     3 │ 2.2.2.2 │     286 │ 1.1.1.1 │
-└──────────────────────────┴────────┴───────┴─────────┴─────────┴─────────┘
-```
+| `__time` | `srcIP` | `dstIP` | `bytes` | `count` | `packets` |
+| -- | -- | -- | -- | -- | -- |
+| `2018-01-01T01:01:00.000Z` | `1.1.1.1` | `2.2.2.2` | `35,937` | `3` | `286` |
 
 The input rows have been grouped by the timestamp and dimension columns 
`{timestamp, srcIP, dstIP}` with sum aggregations on the metric columns 
`packets` and `bytes`.
 
-Before the grouping occurs, the timestamps of the original input data are 
bucketed/floored by minute, due to the `"queryGranularity":"minute"` setting in 
the ingestion spec.
+Before the grouping occurs, the timestamps of the original input data are 
bucketed/floored by minute, due to the `FLOOR(TIME_PARSE("timestamp") TO 
MINUTE)` function in the query.
 
-Likewise, these two events that occurred during `2018-01-01T01:02` have been 
rolled up:
+Consider the two events in the original input data that occur over the course 
of minute `2018-01-01T01:02`:

Review Comment:
   The query section is a bit long -- add subheader titles to help break it up for easier navigation
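   For example (names are only illustrative), the section could gain subheadings such as "View the rolled-up data" for the SELECT and its result table, and "Compare the rollup with the raw input" for the walkthrough of the `01:01`, `01:02`, and `01:03` minutes.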



##########
docs/tutorials/tutorial-rollup.md:
##########
@@ -49,150 +49,101 @@ For this tutorial, we'll use a small sample of network 
flow event data, represen
 {"timestamp":"2018-01-02T21:35:45Z","srcIP":"7.7.7.7", 
"dstIP":"8.8.8.8","packets":12,"bytes":2818}
 ```
 
-A file containing this sample input data is located at 
`quickstart/tutorial/rollup-data.json`.
+The tutorial guides you through how to ingest this data using rollup.
 
-We'll ingest this data using the following ingestion task spec, located at 
`quickstart/tutorial/rollup-index.json`.
+## Load the example data
 
-```json
-{
-  "type" : "index_parallel",
-  "spec" : {
-    "dataSchema" : {
-      "dataSource" : "rollup-tutorial",
-      "dimensionsSpec" : {
-        "dimensions" : [
-          "srcIP",
-          "dstIP"
-        ]
-      },
-      "timestampSpec": {
-        "column": "timestamp",
-        "format": "iso"
-      },
-      "metricsSpec" : [
-        { "type" : "count", "name" : "count" },
-        { "type" : "longSum", "name" : "packets", "fieldName" : "packets" },
-        { "type" : "longSum", "name" : "bytes", "fieldName" : "bytes" }
-      ],
-      "granularitySpec" : {
-        "type" : "uniform",
-        "segmentGranularity" : "week",
-        "queryGranularity" : "minute",
-        "intervals" : ["2018-01-01/2018-01-03"],
-        "rollup" : true
-      }
-    },
-    "ioConfig" : {
-      "type" : "index_parallel",
-      "inputSource" : {
-        "type" : "local",
-        "baseDir" : "quickstart/tutorial",
-        "filter" : "rollup-data.json"
-      },
-      "inputFormat" : {
-        "type" : "json"
-      },
-      "appendToExisting" : false
-    },
-    "tuningConfig" : {
-      "type" : "index_parallel",
-      "partitionsSpec": {
-        "type": "dynamic"
-      },
-      "maxRowsInMemory" : 25000
-    }
-  }
-}
+Load the sample dataset using INSERT and EXTERN functions. The EXTERN function 
lets you read external data or write to an external location.
+
+In the Druid web console, go to the Query view and run the following query:
+
+```sql
+INSERT INTO "rollup_tutorial"
+WITH "inline_data" AS (
+  SELECT *
+  FROM TABLE(EXTERN('{
+    "type":"inline",
+    
"data":"{\"timestamp\":\"2018-01-01T01:01:35Z\",\"srcIP\":\"1.1.1.1\",\"dstIP\":\"2.2.2.2\",\"packets\":20,\"bytes\":9024}\n{\"timestamp\":\"2018-01-01T01:02:14Z\",\"srcIP\":\"1.1.1.1\",\"dstIP\":\"2.2.2.2\",\"packets\":38,\"bytes\":6289}\n{\"timestamp\":\"2018-01-01T01:01:59Z\",\"srcIP\":\"1.1.1.1\",\"dstIP\":\"2.2.2.2\",\"packets\":11,\"bytes\":5780}\n{\"timestamp\":\"2018-01-01T01:01:51Z\",\"srcIP\":\"1.1.1.1\",\"dstIP\":\"2.2.2.2\",\"packets\":255,\"bytes\":21133}\n{\"timestamp\":\"2018-01-01T01:02:29Z\",\"srcIP\":\"1.1.1.1\",\"dstIP\":\"2.2.2.2\",\"packets\":377,\"bytes\":359971}\n{\"timestamp\":\"2018-01-01T01:03:29Z\",\"srcIP\":\"1.1.1.1\",\"dstIP\":\"2.2.2.2\",\"packets\":49,\"bytes\":10204}\n{\"timestamp\":\"2018-01-02T21:33:14Z\",\"srcIP\":\"7.7.7.7\",\"dstIP\":\"8.8.8.8\",\"packets\":38,\"bytes\":6289}\n{\"timestamp\":\"2018-01-02T21:33:45Z\",\"srcIP\":\"7.7.7.7\",\"dstIP\":\"8.8.8.8\",\"packets\":123,\"bytes\":93999}\n{\"timestamp\":\"2018-01-02T21:35:45Z\",\"srcIP\"
 :\"7.7.7.7\",\"dstIP\":\"8.8.8.8\",\"packets\":12,\"bytes\":2818}"}', 
+    '{"type":"json"}')) 
+    EXTEND ("timestamp" VARCHAR, "srcIP" VARCHAR, "dstIP" VARCHAR, "packets" 
BIGINT, "bytes" BIGINT)
+)
+SELECT
+  FLOOR(TIME_PARSE("timestamp") TO MINUTE) AS __time,
+  "srcIP",
+  "dstIP",
+  SUM("bytes") AS "bytes",
+  COUNT(*) AS "count",
+  SUM("packets") AS "packets"
+FROM "inline_data"
+GROUP BY 1, 2, 3
+PARTITIONED BY DAY
 ```
 
-Rollup has been enabled by setting `"rollup" : true` in the `granularitySpec`.
+Note that the query uses the `FLOOR` function to give the `__time` a 
granularity of `MINUTE`. The query defines the dimensions of the rollup by 
grouping columns 1, 2, and 3, which corresponds to the `timestamp`, `srcIP`, 
and `dstIP` columns. The query defines the metrics of the rollup by aggregating 
the `bytes` and `packets` columns.
 
-Note that we have `srcIP` and `dstIP` defined as dimensions, a longSum metric 
is defined for the `packets` and `bytes` columns, and the `queryGranularity` 
has been defined as `minute`.
+After the ingestion completes, you can query the data.
 
-We will see how these definitions are used after we load this data.
-
-## Load the example data
+## Query the example data
 
-From the apache-druid-{{DRUIDVERSION}} package root, run the following command:
+Open a new tab in the Query view and run the following query to see what data 
was ingested:
 
-```bash
-bin/post-index-task --file quickstart/tutorial/rollup-index.json --url 
http://localhost:8081
+```sql
+SELECT * FROM "rollup_tutorial"
 ```
 
-After the script completes, we will query the data.
+Returns the following:
 
-## Query the example data
+| `__time` | `srcIP` | `dstIP` | `bytes` | `count` | `packets` |
+| -- | -- | -- | -- | -- | -- |
+| `2018-01-01T01:01:00.000Z` | `1.1.1.1` | `2.2.2.2` | `35,937` | `3` | `286` |
+| `2018-01-01T01:02:00.000Z` | `1.1.1.1` | `2.2.2.2` | `366,260` | `2` | `415` 
|
+| `2018-01-01T01:03:00.000Z` | `1.1.1.1` | `2.2.2.2` | `10,204` | `1` | `49` |
+| `2018-01-02T21:33:00.000Z` | `7.7.7.7` | `8.8.8.8` | `100,288` | `2` | `161` 
|
+| `2018-01-02T21:35:00.000Z` | `7.7.7.7` | `8.8.8.8` | `2,818` | `1` | `12` |
 
-Let's run `bin/dsql` and issue a `select * from "rollup-tutorial";` query to 
see what data was ingested.
-
-```bash
-$ bin/dsql
-Welcome to dsql, the command-line client for Druid SQL.
-Type "\h" for help.
-dsql> select * from "rollup-tutorial";
-┌──────────────────────────┬────────┬───────┬─────────┬─────────┬─────────┐
-│ __time                   │ bytes  │ count │ dstIP   │ packets │ srcIP   │
-├──────────────────────────┼────────┼───────┼─────────┼─────────┼─────────┤
-│ 2018-01-01T01:01:00.000Z │  35937 │     3 │ 2.2.2.2 │     286 │ 1.1.1.1 │
-│ 2018-01-01T01:02:00.000Z │ 366260 │     2 │ 2.2.2.2 │     415 │ 1.1.1.1 │
-│ 2018-01-01T01:03:00.000Z │  10204 │     1 │ 2.2.2.2 │      49 │ 1.1.1.1 │
-│ 2018-01-02T21:33:00.000Z │ 100288 │     2 │ 8.8.8.8 │     161 │ 7.7.7.7 │
-│ 2018-01-02T21:35:00.000Z │   2818 │     1 │ 8.8.8.8 │      12 │ 7.7.7.7 │
-└──────────────────────────┴────────┴───────┴─────────┴─────────┴─────────┘
-Retrieved 5 rows in 1.18s.
-
-dsql>
-```
 
-Let's look at the three events in the original input data that occurred during 
`2018-01-01T01:01`:
+Consider the three events in the original input data that occur over the 
course of minute `2018-01-01T01:01`:
 
 ```json
 {"timestamp":"2018-01-01T01:01:35Z","srcIP":"1.1.1.1", 
"dstIP":"2.2.2.2","packets":20,"bytes":9024}
 {"timestamp":"2018-01-01T01:01:51Z","srcIP":"1.1.1.1", 
"dstIP":"2.2.2.2","packets":255,"bytes":21133}
 {"timestamp":"2018-01-01T01:01:59Z","srcIP":"1.1.1.1", 
"dstIP":"2.2.2.2","packets":11,"bytes":5780}
 ```
 
-These three rows have been "rolled up" into the following row:
+Druid combines the three rows into the following during rollup:
 
-```bash
-┌──────────────────────────┬────────┬───────┬─────────┬─────────┬─────────┐
-│ __time                   │ bytes  │ count │ dstIP   │ packets │ srcIP   │
-├──────────────────────────┼────────┼───────┼─────────┼─────────┼─────────┤
-│ 2018-01-01T01:01:00.000Z │  35937 │     3 │ 2.2.2.2 │     286 │ 1.1.1.1 │
-└──────────────────────────┴────────┴───────┴─────────┴─────────┴─────────┘
-```
+| `__time` | `srcIP` | `dstIP` | `bytes` | `count` | `packets` |
+| -- | -- | -- | -- | -- | -- |
+| `2018-01-01T01:01:00.000Z` | `1.1.1.1` | `2.2.2.2` | `35,937` | `3` | `286` |
 
 The input rows have been grouped by the timestamp and dimension columns 
`{timestamp, srcIP, dstIP}` with sum aggregations on the metric columns 
`packets` and `bytes`.

Review Comment:
   ```suggestion
   The input rows were grouped by the timestamp and dimension columns `{timestamp, srcIP, dstIP}` with sum aggregations on the metric columns `packets` and `bytes`.
   ```



##########
docs/tutorials/tutorial-rollup.md:
##########
@@ -49,150 +49,101 @@ For this tutorial, we'll use a small sample of network 
flow event data, represen
 {"timestamp":"2018-01-02T21:35:45Z","srcIP":"7.7.7.7", 
"dstIP":"8.8.8.8","packets":12,"bytes":2818}
 ```
 
-A file containing this sample input data is located at 
`quickstart/tutorial/rollup-data.json`.
+The tutorial guides you through how to ingest this data using rollup.
 
-We'll ingest this data using the following ingestion task spec, located at 
`quickstart/tutorial/rollup-index.json`.
+## Load the example data
 
-```json
-{
-  "type" : "index_parallel",
-  "spec" : {
-    "dataSchema" : {
-      "dataSource" : "rollup-tutorial",
-      "dimensionsSpec" : {
-        "dimensions" : [
-          "srcIP",
-          "dstIP"
-        ]
-      },
-      "timestampSpec": {
-        "column": "timestamp",
-        "format": "iso"
-      },
-      "metricsSpec" : [
-        { "type" : "count", "name" : "count" },
-        { "type" : "longSum", "name" : "packets", "fieldName" : "packets" },
-        { "type" : "longSum", "name" : "bytes", "fieldName" : "bytes" }
-      ],
-      "granularitySpec" : {
-        "type" : "uniform",
-        "segmentGranularity" : "week",
-        "queryGranularity" : "minute",
-        "intervals" : ["2018-01-01/2018-01-03"],
-        "rollup" : true
-      }
-    },
-    "ioConfig" : {
-      "type" : "index_parallel",
-      "inputSource" : {
-        "type" : "local",
-        "baseDir" : "quickstart/tutorial",
-        "filter" : "rollup-data.json"
-      },
-      "inputFormat" : {
-        "type" : "json"
-      },
-      "appendToExisting" : false
-    },
-    "tuningConfig" : {
-      "type" : "index_parallel",
-      "partitionsSpec": {
-        "type": "dynamic"
-      },
-      "maxRowsInMemory" : 25000
-    }
-  }
-}
+Load the sample dataset using INSERT and EXTERN functions. The EXTERN function 
lets you read external data or write to an external location.
+
+In the Druid web console, go to the Query view and run the following query:
+
+```sql
+INSERT INTO "rollup_tutorial"
+WITH "inline_data" AS (
+  SELECT *
+  FROM TABLE(EXTERN('{
+    "type":"inline",
+    
"data":"{\"timestamp\":\"2018-01-01T01:01:35Z\",\"srcIP\":\"1.1.1.1\",\"dstIP\":\"2.2.2.2\",\"packets\":20,\"bytes\":9024}\n{\"timestamp\":\"2018-01-01T01:02:14Z\",\"srcIP\":\"1.1.1.1\",\"dstIP\":\"2.2.2.2\",\"packets\":38,\"bytes\":6289}\n{\"timestamp\":\"2018-01-01T01:01:59Z\",\"srcIP\":\"1.1.1.1\",\"dstIP\":\"2.2.2.2\",\"packets\":11,\"bytes\":5780}\n{\"timestamp\":\"2018-01-01T01:01:51Z\",\"srcIP\":\"1.1.1.1\",\"dstIP\":\"2.2.2.2\",\"packets\":255,\"bytes\":21133}\n{\"timestamp\":\"2018-01-01T01:02:29Z\",\"srcIP\":\"1.1.1.1\",\"dstIP\":\"2.2.2.2\",\"packets\":377,\"bytes\":359971}\n{\"timestamp\":\"2018-01-01T01:03:29Z\",\"srcIP\":\"1.1.1.1\",\"dstIP\":\"2.2.2.2\",\"packets\":49,\"bytes\":10204}\n{\"timestamp\":\"2018-01-02T21:33:14Z\",\"srcIP\":\"7.7.7.7\",\"dstIP\":\"8.8.8.8\",\"packets\":38,\"bytes\":6289}\n{\"timestamp\":\"2018-01-02T21:33:45Z\",\"srcIP\":\"7.7.7.7\",\"dstIP\":\"8.8.8.8\",\"packets\":123,\"bytes\":93999}\n{\"timestamp\":\"2018-01-02T21:35:45Z\",\"srcIP\"
 :\"7.7.7.7\",\"dstIP\":\"8.8.8.8\",\"packets\":12,\"bytes\":2818}"}', 
+    '{"type":"json"}')) 
+    EXTEND ("timestamp" VARCHAR, "srcIP" VARCHAR, "dstIP" VARCHAR, "packets" 
BIGINT, "bytes" BIGINT)
+)
+SELECT
+  FLOOR(TIME_PARSE("timestamp") TO MINUTE) AS __time,
+  "srcIP",
+  "dstIP",
+  SUM("bytes") AS "bytes",
+  COUNT(*) AS "count",
+  SUM("packets") AS "packets"
+FROM "inline_data"
+GROUP BY 1, 2, 3
+PARTITIONED BY DAY
 ```
 
-Rollup has been enabled by setting `"rollup" : true` in the `granularitySpec`.
+Note that the query uses the `FLOOR` function to give the `__time` a 
granularity of `MINUTE`. The query defines the dimensions of the rollup by 
grouping columns 1, 2, and 3, which corresponds to the `timestamp`, `srcIP`, 
and `dstIP` columns. The query defines the metrics of the rollup by aggregating 
the `bytes` and `packets` columns.
 
-Note that we have `srcIP` and `dstIP` defined as dimensions, a longSum metric 
is defined for the `packets` and `bytes` columns, and the `queryGranularity` 
has been defined as `minute`.
+After the ingestion completes, you can query the data.
 
-We will see how these definitions are used after we load this data.
-
-## Load the example data
+## Query the example data
 
-From the apache-druid-{{DRUIDVERSION}} package root, run the following command:
+Open a new tab in the Query view and run the following query to see what data 
was ingested:
 
-```bash
-bin/post-index-task --file quickstart/tutorial/rollup-index.json --url 
http://localhost:8081
+```sql
+SELECT * FROM "rollup_tutorial"
 ```
 
-After the script completes, we will query the data.
+Returns the following:
 
-## Query the example data
+| `__time` | `srcIP` | `dstIP` | `bytes` | `count` | `packets` |
+| -- | -- | -- | -- | -- | -- |
+| `2018-01-01T01:01:00.000Z` | `1.1.1.1` | `2.2.2.2` | `35,937` | `3` | `286` |
+| `2018-01-01T01:02:00.000Z` | `1.1.1.1` | `2.2.2.2` | `366,260` | `2` | `415` 
|
+| `2018-01-01T01:03:00.000Z` | `1.1.1.1` | `2.2.2.2` | `10,204` | `1` | `49` |
+| `2018-01-02T21:33:00.000Z` | `7.7.7.7` | `8.8.8.8` | `100,288` | `2` | `161` 
|
+| `2018-01-02T21:35:00.000Z` | `7.7.7.7` | `8.8.8.8` | `2,818` | `1` | `12` |
 
-Let's run `bin/dsql` and issue a `select * from "rollup-tutorial";` query to 
see what data was ingested.
-
-```bash
-$ bin/dsql
-Welcome to dsql, the command-line client for Druid SQL.
-Type "\h" for help.
-dsql> select * from "rollup-tutorial";
-┌──────────────────────────┬────────┬───────┬─────────┬─────────┬─────────┐
-│ __time                   │ bytes  │ count │ dstIP   │ packets │ srcIP   │
-├──────────────────────────┼────────┼───────┼─────────┼─────────┼─────────┤
-│ 2018-01-01T01:01:00.000Z │  35937 │     3 │ 2.2.2.2 │     286 │ 1.1.1.1 │
-│ 2018-01-01T01:02:00.000Z │ 366260 │     2 │ 2.2.2.2 │     415 │ 1.1.1.1 │
-│ 2018-01-01T01:03:00.000Z │  10204 │     1 │ 2.2.2.2 │      49 │ 1.1.1.1 │
-│ 2018-01-02T21:33:00.000Z │ 100288 │     2 │ 8.8.8.8 │     161 │ 7.7.7.7 │
-│ 2018-01-02T21:35:00.000Z │   2818 │     1 │ 8.8.8.8 │      12 │ 7.7.7.7 │
-└──────────────────────────┴────────┴───────┴─────────┴─────────┴─────────┘
-Retrieved 5 rows in 1.18s.
-
-dsql>
-```
 
-Let's look at the three events in the original input data that occurred during 
`2018-01-01T01:01`:
+Consider the three events in the original input data that occur over the 
course of minute `2018-01-01T01:01`:
 
 ```json
 {"timestamp":"2018-01-01T01:01:35Z","srcIP":"1.1.1.1", 
"dstIP":"2.2.2.2","packets":20,"bytes":9024}
 {"timestamp":"2018-01-01T01:01:51Z","srcIP":"1.1.1.1", 
"dstIP":"2.2.2.2","packets":255,"bytes":21133}
 {"timestamp":"2018-01-01T01:01:59Z","srcIP":"1.1.1.1", 
"dstIP":"2.2.2.2","packets":11,"bytes":5780}
 ```
 
-These three rows have been "rolled up" into the following row:
+Druid combines the three rows into the following during rollup:
 
-```bash
-┌──────────────────────────┬────────┬───────┬─────────┬─────────┬─────────┐
-│ __time                   │ bytes  │ count │ dstIP   │ packets │ srcIP   │
-├──────────────────────────┼────────┼───────┼─────────┼─────────┼─────────┤
-│ 2018-01-01T01:01:00.000Z │  35937 │     3 │ 2.2.2.2 │     286 │ 1.1.1.1 │
-└──────────────────────────┴────────┴───────┴─────────┴─────────┴─────────┘
-```
+| `__time` | `srcIP` | `dstIP` | `bytes` | `count` | `packets` |
+| -- | -- | -- | -- | -- | -- |
+| `2018-01-01T01:01:00.000Z` | `1.1.1.1` | `2.2.2.2` | `35,937` | `3` | `286` |
 
 The input rows have been grouped by the timestamp and dimension columns 
`{timestamp, srcIP, dstIP}` with sum aggregations on the metric columns 
`packets` and `bytes`.
 
-Before the grouping occurs, the timestamps of the original input data are 
bucketed/floored by minute, due to the `"queryGranularity":"minute"` setting in 
the ingestion spec.
+Before the grouping occurs, the timestamps of the original input data are 
bucketed/floored by minute, due to the `FLOOR(TIME_PARSE("timestamp") TO 
MINUTE)` function in the query.

Review Comment:
   ```suggestion
   Before the grouping occurs, the timestamps of the original input data are bucketed (floored) by minute, due to the `FLOOR(TIME_PARSE("timestamp") TO MINUTE)` expression in the query.
   ```
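   A quick way to see the bucketing, as a standalone expression I'm assuming for illustration (not part of the tutorial):

   ```sql
   -- Each 2018-01-01T01:01:xx input timestamp maps to the same minute bucket.
   SELECT FLOOR(TIME_PARSE('2018-01-01T01:01:35Z') TO MINUTE) AS "floored_time"
   -- returns 2018-01-01T01:01:00.000Z
   ```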



##########
docs/tutorials/tutorial-rollup.md:
##########
@@ -49,150 +49,101 @@ For this tutorial, we'll use a small sample of network 
flow event data, represen
 {"timestamp":"2018-01-02T21:35:45Z","srcIP":"7.7.7.7", 
"dstIP":"8.8.8.8","packets":12,"bytes":2818}
 ```
 
-A file containing this sample input data is located at 
`quickstart/tutorial/rollup-data.json`.
+The tutorial guides you through how to ingest this data using rollup.
 
-We'll ingest this data using the following ingestion task spec, located at 
`quickstart/tutorial/rollup-index.json`.
+## Load the example data
 
-```json
-{
-  "type" : "index_parallel",
-  "spec" : {
-    "dataSchema" : {
-      "dataSource" : "rollup-tutorial",
-      "dimensionsSpec" : {
-        "dimensions" : [
-          "srcIP",
-          "dstIP"
-        ]
-      },
-      "timestampSpec": {
-        "column": "timestamp",
-        "format": "iso"
-      },
-      "metricsSpec" : [
-        { "type" : "count", "name" : "count" },
-        { "type" : "longSum", "name" : "packets", "fieldName" : "packets" },
-        { "type" : "longSum", "name" : "bytes", "fieldName" : "bytes" }
-      ],
-      "granularitySpec" : {
-        "type" : "uniform",
-        "segmentGranularity" : "week",
-        "queryGranularity" : "minute",
-        "intervals" : ["2018-01-01/2018-01-03"],
-        "rollup" : true
-      }
-    },
-    "ioConfig" : {
-      "type" : "index_parallel",
-      "inputSource" : {
-        "type" : "local",
-        "baseDir" : "quickstart/tutorial",
-        "filter" : "rollup-data.json"
-      },
-      "inputFormat" : {
-        "type" : "json"
-      },
-      "appendToExisting" : false
-    },
-    "tuningConfig" : {
-      "type" : "index_parallel",
-      "partitionsSpec": {
-        "type": "dynamic"
-      },
-      "maxRowsInMemory" : 25000
-    }
-  }
-}
+Load the sample dataset using INSERT and EXTERN functions. The EXTERN function 
lets you read external data or write to an external location.
+
+In the Druid web console, go to the Query view and run the following query:
+
+```sql
+INSERT INTO "rollup_tutorial"
+WITH "inline_data" AS (
+  SELECT *
+  FROM TABLE(EXTERN('{
+    "type":"inline",
+    
"data":"{\"timestamp\":\"2018-01-01T01:01:35Z\",\"srcIP\":\"1.1.1.1\",\"dstIP\":\"2.2.2.2\",\"packets\":20,\"bytes\":9024}\n{\"timestamp\":\"2018-01-01T01:02:14Z\",\"srcIP\":\"1.1.1.1\",\"dstIP\":\"2.2.2.2\",\"packets\":38,\"bytes\":6289}\n{\"timestamp\":\"2018-01-01T01:01:59Z\",\"srcIP\":\"1.1.1.1\",\"dstIP\":\"2.2.2.2\",\"packets\":11,\"bytes\":5780}\n{\"timestamp\":\"2018-01-01T01:01:51Z\",\"srcIP\":\"1.1.1.1\",\"dstIP\":\"2.2.2.2\",\"packets\":255,\"bytes\":21133}\n{\"timestamp\":\"2018-01-01T01:02:29Z\",\"srcIP\":\"1.1.1.1\",\"dstIP\":\"2.2.2.2\",\"packets\":377,\"bytes\":359971}\n{\"timestamp\":\"2018-01-01T01:03:29Z\",\"srcIP\":\"1.1.1.1\",\"dstIP\":\"2.2.2.2\",\"packets\":49,\"bytes\":10204}\n{\"timestamp\":\"2018-01-02T21:33:14Z\",\"srcIP\":\"7.7.7.7\",\"dstIP\":\"8.8.8.8\",\"packets\":38,\"bytes\":6289}\n{\"timestamp\":\"2018-01-02T21:33:45Z\",\"srcIP\":\"7.7.7.7\",\"dstIP\":\"8.8.8.8\",\"packets\":123,\"bytes\":93999}\n{\"timestamp\":\"2018-01-02T21:35:45Z\",\"srcIP\"
 :\"7.7.7.7\",\"dstIP\":\"8.8.8.8\",\"packets\":12,\"bytes\":2818}"}', 
+    '{"type":"json"}')) 
+    EXTEND ("timestamp" VARCHAR, "srcIP" VARCHAR, "dstIP" VARCHAR, "packets" 
BIGINT, "bytes" BIGINT)
+)
+SELECT
+  FLOOR(TIME_PARSE("timestamp") TO MINUTE) AS __time,
+  "srcIP",
+  "dstIP",
+  SUM("bytes") AS "bytes",
+  COUNT(*) AS "count",
+  SUM("packets") AS "packets"
+FROM "inline_data"
+GROUP BY 1, 2, 3
+PARTITIONED BY DAY
 ```
 
-Rollup has been enabled by setting `"rollup" : true` in the `granularitySpec`.
+Note that the query uses the `FLOOR` function to give the `__time` a 
granularity of `MINUTE`. The query defines the dimensions of the rollup by 
grouping columns 1, 2, and 3, which corresponds to the `timestamp`, `srcIP`, 
and `dstIP` columns. The query defines the metrics of the rollup by aggregating 
the `bytes` and `packets` columns.
 
-Note that we have `srcIP` and `dstIP` defined as dimensions, a longSum metric 
is defined for the `packets` and `bytes` columns, and the `queryGranularity` 
has been defined as `minute`.
+After the ingestion completes, you can query the data.
 
-We will see how these definitions are used after we load this data.
-
-## Load the example data
+## Query the example data
 
-From the apache-druid-{{DRUIDVERSION}} package root, run the following command:
+Open a new tab in the Query view and run the following query to see what data 
was ingested:
 
-```bash
-bin/post-index-task --file quickstart/tutorial/rollup-index.json --url 
http://localhost:8081
+```sql
+SELECT * FROM "rollup_tutorial"
 ```
 
-After the script completes, we will query the data.
+Returns the following:
 
-## Query the example data
+| `__time` | `srcIP` | `dstIP` | `bytes` | `count` | `packets` |
+| -- | -- | -- | -- | -- | -- |
+| `2018-01-01T01:01:00.000Z` | `1.1.1.1` | `2.2.2.2` | `35,937` | `3` | `286` |
+| `2018-01-01T01:02:00.000Z` | `1.1.1.1` | `2.2.2.2` | `366,260` | `2` | `415` 
|
+| `2018-01-01T01:03:00.000Z` | `1.1.1.1` | `2.2.2.2` | `10,204` | `1` | `49` |
+| `2018-01-02T21:33:00.000Z` | `7.7.7.7` | `8.8.8.8` | `100,288` | `2` | `161` 
|
+| `2018-01-02T21:35:00.000Z` | `7.7.7.7` | `8.8.8.8` | `2,818` | `1` | `12` |
 
-Let's run `bin/dsql` and issue a `select * from "rollup-tutorial";` query to 
see what data was ingested.
-
-```bash
-$ bin/dsql
-Welcome to dsql, the command-line client for Druid SQL.
-Type "\h" for help.
-dsql> select * from "rollup-tutorial";
-┌──────────────────────────┬────────┬───────┬─────────┬─────────┬─────────┐
-│ __time                   │ bytes  │ count │ dstIP   │ packets │ srcIP   │
-├──────────────────────────┼────────┼───────┼─────────┼─────────┼─────────┤
-│ 2018-01-01T01:01:00.000Z │  35937 │     3 │ 2.2.2.2 │     286 │ 1.1.1.1 │
-│ 2018-01-01T01:02:00.000Z │ 366260 │     2 │ 2.2.2.2 │     415 │ 1.1.1.1 │
-│ 2018-01-01T01:03:00.000Z │  10204 │     1 │ 2.2.2.2 │      49 │ 1.1.1.1 │
-│ 2018-01-02T21:33:00.000Z │ 100288 │     2 │ 8.8.8.8 │     161 │ 7.7.7.7 │
-│ 2018-01-02T21:35:00.000Z │   2818 │     1 │ 8.8.8.8 │      12 │ 7.7.7.7 │
-└──────────────────────────┴────────┴───────┴─────────┴─────────┴─────────┘
-Retrieved 5 rows in 1.18s.
-
-dsql>
-```
 
-Let's look at the three events in the original input data that occurred during 
`2018-01-01T01:01`:
+Consider the three events in the original input data that occur over the 
course of minute `2018-01-01T01:01`:
 
 ```json
 {"timestamp":"2018-01-01T01:01:35Z","srcIP":"1.1.1.1", 
"dstIP":"2.2.2.2","packets":20,"bytes":9024}
 {"timestamp":"2018-01-01T01:01:51Z","srcIP":"1.1.1.1", 
"dstIP":"2.2.2.2","packets":255,"bytes":21133}
 {"timestamp":"2018-01-01T01:01:59Z","srcIP":"1.1.1.1", 
"dstIP":"2.2.2.2","packets":11,"bytes":5780}
 ```
 
-These three rows have been "rolled up" into the following row:
+Druid combines the three rows into the following during rollup:
 
-```bash
-┌──────────────────────────┬────────┬───────┬─────────┬─────────┬─────────┐
-│ __time                   │ bytes  │ count │ dstIP   │ packets │ srcIP   │
-├──────────────────────────┼────────┼───────┼─────────┼─────────┼─────────┤
-│ 2018-01-01T01:01:00.000Z │  35937 │     3 │ 2.2.2.2 │     286 │ 1.1.1.1 │
-└──────────────────────────┴────────┴───────┴─────────┴─────────┴─────────┘
-```
+| `__time` | `srcIP` | `dstIP` | `bytes` | `count` | `packets` |
+| -- | -- | -- | -- | -- | -- |
+| `2018-01-01T01:01:00.000Z` | `1.1.1.1` | `2.2.2.2` | `35,937` | `3` | `286` |
 
 The input rows have been grouped by the timestamp and dimension columns 
`{timestamp, srcIP, dstIP}` with sum aggregations on the metric columns 
`packets` and `bytes`.
 
-Before the grouping occurs, the timestamps of the original input data are 
bucketed/floored by minute, due to the `"queryGranularity":"minute"` setting in 
the ingestion spec.
+Before the grouping occurs, the timestamps of the original input data are 
bucketed/floored by minute, due to the `FLOOR(TIME_PARSE("timestamp") TO 
MINUTE)` function in the query.
 
-Likewise, these two events that occurred during `2018-01-01T01:02` have been 
rolled up:
+Consider the two events in the original input data that occur over the course 
of minute `2018-01-01T01:02`:
 
 ```json
 {"timestamp":"2018-01-01T01:02:14Z","srcIP":"1.1.1.1", 
"dstIP":"2.2.2.2","packets":38,"bytes":6289}
 {"timestamp":"2018-01-01T01:02:29Z","srcIP":"1.1.1.1", 
"dstIP":"2.2.2.2","packets":377,"bytes":359971}
 ```
 
-```bash
-┌──────────────────────────┬────────┬───────┬─────────┬─────────┬─────────┐
-│ __time                   │ bytes  │ count │ dstIP   │ packets │ srcIP   │
-├──────────────────────────┼────────┼───────┼─────────┼─────────┼─────────┤
-│ 2018-01-01T01:02:00.000Z │ 366260 │     2 │ 2.2.2.2 │     415 │ 1.1.1.1 │
-└──────────────────────────┴────────┴───────┴─────────┴─────────┴─────────┘
-```
+The rows have been grouped into the following during rollup:
+
+| `__time` | `srcIP` | `dstIP` | `bytes` | `count` | `packets` |
+| -- | -- | -- | -- | -- | -- |
+| `2018-01-01T01:02:00.000Z` | `1.1.1.1` | `2.2.2.2` | `366,260` | `2` | `415` 
|
 
-For the last event recording traffic between 1.1.1.1 and 2.2.2.2, no rollup 
took place, because this was the only event that occurred during 
`2018-01-01T01:03`:
+In the original input data, only one event occurs over the course of minute 
`2018-01-01T01:03`:
 
 ```json
 {"timestamp":"2018-01-01T01:03:29Z","srcIP":"1.1.1.1", 
"dstIP":"2.2.2.2","packets":49,"bytes":10204}
 ```
 
-```bash
-┌──────────────────────────┬────────┬───────┬─────────┬─────────┬─────────┐
-│ __time                   │ bytes  │ count │ dstIP   │ packets │ srcIP   │
-├──────────────────────────┼────────┼───────┼─────────┼─────────┼─────────┤
-│ 2018-01-01T01:03:00.000Z │  10204 │     1 │ 2.2.2.2 │      49 │ 1.1.1.1 │
-└──────────────────────────┴────────┴───────┴─────────┴─────────┴─────────┘
-```
+Therefore no rollup takes place:
+
+| `__time` | `srcIP` | `dstIP` | `bytes` | `count` | `packets` |
+| -- | -- | -- | -- | -- | -- |
+| `2018-01-01T01:03:00.000Z` | `1.1.1.1` | `2.2.2.2` | `10,204` | `1` | `49` |
 
 Note that the `count` metric shows how many rows in the original input data 
contributed to the final "rolled up" row.

Review Comment:
   I think this sentence better belongs somewhere earlier in the doc
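   For context, the arithmetic behind that sentence: the rolled-up row for minute `2018-01-01T01:01` has `count` = 3 because three input rows contributed to it, with `bytes` = 9024 + 21133 + 5780 = 35937 and `packets` = 20 + 255 + 11 = 286.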



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

