[GitHub] spark pull request #22801: [SPARK-25656][SQL][DOC][EXAMPLE] Add a doc and ex...

2018-10-23 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22801#discussion_r227434947
  
--- Diff: examples/src/main/scala/org/apache/spark/examples/sql/SQLDataSourceExample.scala ---
@@ -56,6 +56,13 @@ object SQLDataSourceExample {
   .option("header", "true")
   .load("examples/src/main/resources/people.csv")
 // $example off:manual_load_options_csv$
+// $example on:manual_save_options_orc$
+usersDF.write.format("orc")
+  .option("orc.bloom.filter.columns", "favorite_color")
+  .option("orc.dictionary.key.threshold", "1.0")
+  .option("orc.column.encoding.direct", "name")
+  .save("users_with_options.orc")
--- End diff --

Also, cc @dbtsai.
This doc applies only to Spark 3.0.0, since `orc.column.encoding.direct` has 
been added only to the `master` branch.
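
Since one of the three options is tied to a specific release, it may help to keep the option set in a single place so an entry can be dropped when targeting an older version. The following is only a minimal sketch, not code from this PR: the option names and paths are taken from the diffs in this thread, while `ORC_WRITE_OPTIONS`, `with_orc_options`, and `save_users_with_options` are hypothetical names introduced for illustration. It assumes the caller passes in an existing `SparkSession`.

```python
# Sketch only. Option names and paths come from the diffs in this thread;
# the helper names below are hypothetical and not part of PySpark.
ORC_WRITE_OPTIONS = {
    "orc.bloom.filter.columns": "favorite_color",
    "orc.dictionary.key.threshold": "1.0",
    # Per the comment above, this one is only available on `master`.
    "orc.column.encoding.direct": "name",
}


def with_orc_options(writer, options=ORC_WRITE_OPTIONS):
    """Chain writer.option(key, value) for every entry and return the writer."""
    for key, value in options.items():
        writer = writer.option(key, value)
    return writer


def save_users_with_options(spark, path="users_with_options.orc"):
    """Read the sample ORC file and write it back with the options applied."""
    df = spark.read.orc("examples/src/main/resources/users.orc")
    with_orc_options(df.write.format("orc")).save(path)
```

With a running session, `save_users_with_options(spark)` would perform the same write as the Scala snippet in the diff; removing the `orc.column.encoding.direct` entry from the dict leaves the other two options untouched.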


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22801: [SPARK-25656][SQL][DOC][EXAMPLE] Add a doc and ex...

2018-10-23 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22801#discussion_r227457282
  
--- Diff: examples/src/main/r/RSparkSQLExample.R ---
@@ -118,6 +118,10 @@ df <- read.df("examples/src/main/resources/people.csv", "csv", sep=";", inferSch
 namesAndAges <- select(df, "name", "age")
 # $example off:manual_load_options_csv$
 
+# $example on:manual_save_options_orc$
+df <- read.df("examples/src/main/resources/users.orc", "orc")
+write.orc(df, "users_with_options.orc", orc.bloom.filter.columns="favorite_color", orc.dictionary.key.threshold=1.0, orc.column.encoding.direct="name")
+# $example off:manual_save_options_orc$
--- End diff --

@felixcheung, could you review this PR?


---




[GitHub] spark pull request #22801: [SPARK-25656][SQL][DOC][EXAMPLE] Add a doc and ex...

2018-10-23 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22801#discussion_r227466415
  
--- Diff: examples/src/main/r/RSparkSQLExample.R ---
@@ -118,6 +118,10 @@ df <- read.df("examples/src/main/resources/people.csv", "csv", sep=";", inferSch
 namesAndAges <- select(df, "name", "age")
 # $example off:manual_load_options_csv$
 
+# $example on:manual_save_options_orc$
+df <- read.df("examples/src/main/resources/users.orc", "orc")
+write.orc(df, "users_with_options.orc", orc.bloom.filter.columns="favorite_color", orc.dictionary.key.threshold=1.0, orc.column.encoding.direct="name")
+# $example off:manual_save_options_orc$
--- End diff --

We should put spaces around `=` in the named params
(gosh, same for the csv example above):

`orc.bloom.filter.columns = "favorite_color", orc.dictionary.key.threshold = 1.0, orc.column.encoding.direct = "name")`


---




[GitHub] spark pull request #22801: [SPARK-25656][SQL][DOC][EXAMPLE] Add a doc and ex...

2018-10-23 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22801#discussion_r227466613
  
--- Diff: examples/src/main/python/sql/datasource.py ---
@@ -57,6 +57,15 @@ def basic_datasource_example(spark):
  format="csv", sep=":", inferSchema="true", header="true")
 # $example off:manual_load_options_csv$
 
+# $example on:manual_save_options_orc$
+df = spark.read.orc("examples/src/main/resources/users.orc")
+(df.write.format("orc")
+.option("orc.bloom.filter.columns", "favorite_color")
+.option("orc.dictionary.key.threshold", "1.0")
+.option("orc.column.encoding.direct", 'name')
--- End diff --

Use the same quote style? `"` or `'` for `name`?


---




[GitHub] spark pull request #22801: [SPARK-25656][SQL][DOC][EXAMPLE] Add a doc and ex...

2018-10-23 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22801#discussion_r227478845
  
--- Diff: examples/src/main/r/RSparkSQLExample.R ---
@@ -118,6 +118,10 @@ df <- read.df("examples/src/main/resources/people.csv", "csv", sep=";", inferSch
 namesAndAges <- select(df, "name", "age")
 # $example off:manual_load_options_csv$
 
+# $example on:manual_save_options_orc$
+df <- read.df("examples/src/main/resources/users.orc", "orc")
+write.orc(df, "users_with_options.orc", orc.bloom.filter.columns="favorite_color", orc.dictionary.key.threshold=1.0, orc.column.encoding.direct="name")
+# $example off:manual_save_options_orc$
--- End diff --

Thank you!


---




[GitHub] spark pull request #22801: [SPARK-25656][SQL][DOC][EXAMPLE] Add a doc and ex...

2018-10-23 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22801#discussion_r227478986
  
--- Diff: examples/src/main/python/sql/datasource.py ---
@@ -57,6 +57,15 @@ def basic_datasource_example(spark):
  format="csv", sep=":", inferSchema="true", header="true")
 # $example off:manual_load_options_csv$
 
+# $example on:manual_save_options_orc$
+df = spark.read.orc("examples/src/main/resources/users.orc")
+(df.write.format("orc")
+.option("orc.bloom.filter.columns", "favorite_color")
+.option("orc.dictionary.key.threshold", "1.0")
+.option("orc.column.encoding.direct", 'name')
--- End diff --

Yep!


---




[GitHub] spark pull request #22801: [SPARK-25656][SQL][DOC][EXAMPLE] Add a doc and ex...

2018-10-23 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22801


---
