This is an automated email from the ASF dual-hosted git repository.
alamb pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git
The following commit(s) were added to refs/heads/main by this push:
new 50b1fac22c Docs: Add query syntax to `COPY` docs (#7388)
50b1fac22c is described below
commit 50b1fac22cc9a56f7751ba4d001243bd2c28c9c9
Author: Andrew Lamb <[email protected]>
AuthorDate: Thu Aug 24 09:40:58 2023 -0400
Docs: Add query syntax to `COPY` docs (#7388)
* Docs: Add query syntax to `COPY` docs
* prettier
* Apply suggestions from code review
Co-authored-by: Seth Paydar <[email protected]>
---------
Co-authored-by: Seth Paydar <[email protected]>
---
docs/source/user-guide/sql/ddl.md | 3 +++
docs/source/user-guide/sql/dml.md | 29 +++++++++++++++++++++++++++--
2 files changed, 30 insertions(+), 2 deletions(-)
diff --git a/docs/source/user-guide/sql/ddl.md b/docs/source/user-guide/sql/ddl.md
index 0dcc4517b5..f566b8342e 100644
--- a/docs/source/user-guide/sql/ddl.md
+++ b/docs/source/user-guide/sql/ddl.md
@@ -19,6 +19,9 @@
# DDL
+DDL stands for "Data Definition Language" and relates to creating and
+modifying catalog objects such as Tables.
+
## CREATE DATABASE
Create catalog with specified name.
diff --git a/docs/source/user-guide/sql/dml.md b/docs/source/user-guide/sql/dml.md
index 26f291c15a..9794fba4aa 100644
--- a/docs/source/user-guide/sql/dml.md
+++ b/docs/source/user-guide/sql/dml.md
@@ -19,14 +19,19 @@
# DML
+DML stands for "Data Manipulation Language" and relates to inserting
+and modifying data in tables.
+
## COPY
-Copy a table to file(s). Supported file formats are `parquet`, `csv`, and `json`.
+Copies the contents of a table or query to file(s). Supported file
+formats are `parquet`, `csv`, and `json`, which can be inferred from the
+filename when writing to a single file.
The `PER_THREAD_OUTPUT` option treats `file_name` as a directory and writes a file per thread within it.
<pre>
-COPY <i><b>table_name</i></b> TO '<i><b>file_name</i></b>' [ ( <i><b>option</i></b> [, ... ] ) ]
+COPY { <i><b>table_name</i></b> | <i><b>query</i></b> } TO '<i><b>file_name</i></b>' [ ( <i><b>option</i></b> [, ... ] ) ]
where <i><b>option</i></b> can be one of:
FORMAT <i><b>format_name</i></b>
@@ -35,6 +40,8 @@ where <i><b>option</i></b> can be one of:
ROW_GROUP_LIMIT_BYTES <i><b>integer</i></b>
</pre>
+Copy the contents of `source_table` to `file_name.json` in JSON format:
+
```sql
> COPY source_table TO 'file_name.json';
+-------+
@@ -42,7 +49,12 @@ where <i><b>option</i></b> can be one of:
+-------+
| 2 |
+-------+
+```
+
+Copy the contents of `source_table` to one or more Parquet formatted
+files in the `dir_name` directory:
+```sql
> COPY source_table TO 'dir_name' (FORMAT parquet, PER_THREAD_OUTPUT true);
+-------+
| count |
@@ -51,6 +63,19 @@ where <i><b>option</i></b> can be one of:
+-------+
```
+Run the query `SELECT * from source ORDER BY time` and write the
+results (maintaining the order) to a parquet file named
+`output.parquet` with a maximum parquet row group size of 10MB:
+
+```sql
+> COPY (SELECT * from source ORDER BY time) TO 'output.parquet' (ROW_GROUP_LIMIT_BYTES 10000000);
++-------+
+| count |
++-------+
+| 2 |
++-------+
+```
+
## INSERT
Insert values into a table.
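
Taken together, the documentation added by this commit covers both forms of `COPY`: copying a whole table, and copying the results of a query. A minimal sketch combining the two (the `events` table, its columns, and the output filenames are hypothetical, used only for illustration):

```sql
-- Hypothetical source table, for illustration only
CREATE TABLE events (ts TIMESTAMP, msg VARCHAR);

-- Copy the whole table to a single file; the CSV format is
-- inferred from the '.csv' extension
COPY events TO 'events.csv';

-- Copy only the results of a query (keeping the ORDER BY) to
-- a Parquet file
COPY (SELECT ts, msg FROM events ORDER BY ts) TO 'sorted.parquet';
```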