This is an automated email from the ASF dual-hosted git repository.
alamb pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git
The following commit(s) were added to refs/heads/main by this push:
new 50b1fac22c Docs: Add query syntax to `COPY` docs (#7388)
50b1fac22c is described below
commit 50b1fac22cc9a56f7751ba4d001243bd2c28c9c9
Author: Andrew Lamb <[email protected]>
AuthorDate: Thu Aug 24 09:40:58 2023 -0400
Docs: Add query syntax to `COPY` docs (#7388)
* Docs: Add query syntax to `COPY` docs
* prettier
* Apply suggestions from code review
Co-authored-by: Seth Paydar <[email protected]>
---------
Co-authored-by: Seth Paydar <[email protected]>
---
docs/source/user-guide/sql/ddl.md | 3 +++
docs/source/user-guide/sql/dml.md | 29 +++++++++++++++++++++++++++--
2 files changed, 30 insertions(+), 2 deletions(-)
diff --git a/docs/source/user-guide/sql/ddl.md b/docs/source/user-guide/sql/ddl.md
index 0dcc4517b5..f566b8342e 100644
--- a/docs/source/user-guide/sql/ddl.md
+++ b/docs/source/user-guide/sql/ddl.md
@@ -19,6 +19,9 @@
# DDL
+DDL stands for "Data Definition Language" and relates to creating and
+modifying catalog objects such as Tables.
+
## CREATE DATABASE
Create catalog with specified name.
diff --git a/docs/source/user-guide/sql/dml.md b/docs/source/user-guide/sql/dml.md
index 26f291c15a..9794fba4aa 100644
--- a/docs/source/user-guide/sql/dml.md
+++ b/docs/source/user-guide/sql/dml.md
@@ -19,14 +19,19 @@
# DML
+DML stands for "Data Manipulation Language" and relates to inserting
+and modifying data in tables.
+
## COPY
-Copy a table to file(s). Supported file formats are `parquet`, `csv`, and `json`.
+Copies the contents of a table or query to file(s). Supported file
+formats are `parquet`, `csv`, and `json`, which can be inferred from the
+filename when writing to a single file.
The `PER_THREAD_OUTPUT` option treats `file_name` as a directory and writes a file per thread within it.
<pre>
-COPY <i><b>table_name</i></b> TO '<i><b>file_name</i></b>' [ ( <i><b>option</i></b> [, ... ] ) ]
+COPY { <i><b>table_name</i></b> | <i><b>query</i></b> } TO '<i><b>file_name</i></b>' [ ( <i><b>option</i></b> [, ... ] ) ]
where <i><b>option</i></b> can be one of:
FORMAT <i><b>format_name</i></b>
@@ -35,6 +40,8 @@ where <i><b>option</i></b> can be one of:
ROW_GROUP_LIMIT_BYTES <i><b>integer</i></b>
</pre>
+Copy the contents of `source_table` to `file_name.json` in JSON format:
+
```sql
> COPY source_table TO 'file_name.json';
+-------+
@@ -42,7 +49,12 @@ where <i><b>option</i></b> can be one of:
+-------+
| 2 |
+-------+
+```
+
+Copy the contents of `source_table` to one or more Parquet formatted
+files in the `dir_name` directory:
+```sql
> COPY source_table TO 'dir_name' (FORMAT parquet, PER_THREAD_OUTPUT true);
+-------+
| count |
@@ -51,6 +63,19 @@ where <i><b>option</i></b> can be one of:
+-------+
```
+Run the query `SELECT * from source ORDER BY time` and write the
+results (maintaining the order) to a parquet file named
+`output.parquet` with a maximum parquet row group size of 10MB:
+
+```sql
+> COPY (SELECT * from source ORDER BY time) TO 'output.parquet' (ROW_GROUP_LIMIT_BYTES 10000000);
++-------+
+| count |
++-------+
+| 2 |
++-------+
+```
+
## INSERT
Insert values into a table.
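
Taken together, the documentation added by this commit covers both forms of `COPY`: copying a whole table, and copying the results of a query. A minimal sketch combining the two (the `events` table, its columns, and the output filenames are hypothetical, used only for illustration):

```sql
-- Hypothetical source table, for illustration only
CREATE TABLE events (ts TIMESTAMP, msg VARCHAR);

-- Copy the whole table to a single file; the CSV format is
-- inferred from the '.csv' extension
COPY events TO 'events.csv';

-- Copy only the results of a query (keeping the ORDER BY) to
-- a Parquet file
COPY (SELECT ts, msg FROM events ORDER BY ts) TO 'sorted.parquet';
```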