Repository: spark
Updated Branches:
  refs/heads/branch-2.0 dbf7f48b6 -> 4e193d3da


[SPARK-15894][SQL][DOC] Update docs for controlling #partitions

## What changes were proposed in this pull request?
Update the docs for two parameters, `spark.sql.files.maxPartitionBytes` and
`spark.sql.files.openCostInBytes`, in the Other Configuration Options section.

## How was this patch tested?
N/A

Author: Takeshi YAMAMURO <linguin....@gmail.com>

Closes #13797 from maropu/SPARK-15894-2.

(cherry picked from commit 41e0ffb19f678e9b1e87f747a5e4e3d44964e39a)
Signed-off-by: Cheng Lian <l...@databricks.com>


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/4e193d3d
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/4e193d3d
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/4e193d3d

Branch: refs/heads/branch-2.0
Commit: 4e193d3daf5bdfb38d7df6da5b7abdd53888ec99
Parents: dbf7f48
Author: Takeshi YAMAMURO <linguin....@gmail.com>
Authored: Tue Jun 21 14:27:16 2016 +0800
Committer: Cheng Lian <l...@databricks.com>
Committed: Tue Jun 21 14:27:31 2016 +0800

----------------------------------------------------------------------
 docs/sql-programming-guide.md | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/4e193d3d/docs/sql-programming-guide.md
----------------------------------------------------------------------
diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md
index 4206f73..ddf8f70 100644
--- a/docs/sql-programming-guide.md
+++ b/docs/sql-programming-guide.md
@@ -2016,6 +2016,23 @@ that these options will be deprecated in future release as more optimizations ar
 <table class="table">
   <tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
   <tr>
+    <td><code>spark.sql.files.maxPartitionBytes</code></td>
+    <td>134217728 (128 MB)</td>
+    <td>
+      The maximum number of bytes to pack into a single partition when reading files.
+    </td>
+  </tr>
+  <tr>
+    <td><code>spark.sql.files.openCostInBytes</code></td>
+    <td>4194304 (4 MB)</td>
+    <td>
+      The estimated cost to open a file, measured by the number of bytes that could be scanned in the same
+      time. This is used when putting multiple files into a partition. It is better to over-estimate;
+      then the partitions with small files will be faster than partitions with bigger files (which are
+      scheduled first).
+    </td>
+  </tr>
+  <tr>
     <td><code>spark.sql.autoBroadcastJoinThreshold</code></td>
     <td>10485760 (10 MB)</td>
     <td>
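
As a hedged illustration that is not part of the committed diff, the Scala sketch below shows one way these two options could be set on a SparkSession before reading a directory of many small files; the application name and input path are hypothetical placeholders.

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch (assumed names/paths): tune file-based partitioning
// before reading a directory that contains many small files.
val spark = SparkSession.builder()
  .appName("partition-tuning-example")  // hypothetical app name
  .getOrCreate()

// Pack at most 64 MB of file data into a single read partition
// (default: 134217728 bytes, i.e. 128 MB).
spark.conf.set("spark.sql.files.maxPartitionBytes", 64L * 1024 * 1024)

// Treat each file open as costing roughly 8 MB worth of scan time
// (default: 4194304 bytes, i.e. 4 MB); over-estimating groups small
// files into fewer, larger partitions.
spark.conf.set("spark.sql.files.openCostInBytes", 8L * 1024 * 1024)

// Reads issued after the settings above pick up the new values.
val df = spark.read.parquet("/path/to/many-small-files")  // hypothetical path
println(df.rdd.getNumPartitions)
```

Both are ordinary `spark.sql.*` configuration keys, so they can equally be supplied via `--conf` on spark-submit or in spark-defaults.conf.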

