[GitHub] spark pull request #14082: [SPARK-16381][SQL][SparkR] Update SQL examples an...

2016-07-06 Thread keypointt
GitHub user keypointt opened a pull request:

https://github.com/apache/spark/pull/14082

[SPARK-16381][SQL][SparkR] Update SQL examples and programming guide for R 
language binding

https://issues.apache.org/jira/browse/SPARK-16381

## What changes were proposed in this pull request?

Update SQL examples and programming guide for R language binding.

Here I follow the example at 
https://github.com/apache/spark/compare/master...liancheng:example-snippet-extraction
and create a separate R file to store all the example code.

## How was this patch tested?

Manual test on my local machine.
Screenshot as below:

![screen shot 2016-07-06 at 4 52 25 
pm](https://cloud.githubusercontent.com/assets/3925641/16638180/13925a58-439a-11e6-8d57-8451a63dcae9.png)




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/keypointt/spark SPARK-16381

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14082.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14082


commit e7def7c491c8fb06a73aea2f2e072dbe0e59c1da
Author: Xin Ren 
Date:   2016-07-06T23:45:21Z

[SPARK-16381] move example code to a separate R file




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #14082: [SPARK-16381][SQL][SparkR] Update SQL examples an...

2016-07-07 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14082#discussion_r70001842
  
--- Diff: docs/sql-programming-guide.md ---
@@ -343,50 +336,8 @@ In addition to simple column references and expressions, DataFrames also have a
 
 
 
-{% highlight r %}
-# Create the DataFrame
-df <- read.json("examples/src/main/resources/people.json")
-
-# Show the content of the DataFrame
-showDF(df)
-## age  name
-## null Michael
-## 30   Andy
-## 19   Justin
-
-# Print the schema in a tree format
-printSchema(df)
-## root
-## |-- age: long (nullable = true)
-## |-- name: string (nullable = true)
 
-# Select only the "name" column
-showDF(select(df, "name"))
-## name
-## Michael
-## Andy
-## Justin
-
-# Select everybody, but increment the age by 1
-showDF(select(df, df$name, df$age + 1))
-## name    (age + 1)
-## Michael null
-## Andy    31
-## Justin  20
-
-# Select people older than 21
-showDF(where(df, df$age > 21))
-## age name
-## 30  Andy
-
-# Count people by age
-showDF(count(groupBy(df, "age")))
-## age  count
-## null 1
-## 19   1
-## 30   1
-
-{% endhighlight %}
+{% include_example untyped_transformations r/RSparkSQLExample.R %}
--- End diff --

This is just internal naming, but "untyped_transformations" is a bit odd. 
Shouldn't we call this "dataframe_operations" or something similar?





[GitHub] spark pull request #14082: [SPARK-16381][SQL][SparkR] Update SQL examples an...

2016-07-07 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14082#discussion_r70001993
  
--- Diff: examples/src/main/r/RSparkSQLExample.R ---
@@ -0,0 +1,175 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# $example on:init_session$
+sparkR.session()
+# $example off:init_session$
+
+
+# $example on:create_DataFrames$
+df <- read.json("examples/src/main/resources/people.json")
+
+# Displays the content of the DataFrame
+showDF(df)
--- End diff --

Not new in this PR, but I think `head(df)` would be more intuitive.





[GitHub] spark pull request #14082: [SPARK-16381][SQL][SparkR] Update SQL examples an...

2016-07-07 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14082#discussion_r70002123
  
--- Diff: examples/src/main/r/RSparkSQLExample.R ---
@@ -0,0 +1,175 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# $example on:init_session$
+sparkR.session()
+# $example off:init_session$
+
+
+# $example on:create_DataFrames$
+df <- read.json("examples/src/main/resources/people.json")
+
+# Displays the content of the DataFrame
+showDF(df)
--- End diff --

Ditto for the other instances of `showDF` below.





[GitHub] spark pull request #14082: [SPARK-16381][SQL][SparkR] Update SQL examples an...

2016-07-07 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14082#discussion_r70002444
  
--- Diff: examples/src/main/r/RSparkSQLExample.R ---
@@ -0,0 +1,175 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# $example on:init_session$
+sparkR.session()
+# $example off:init_session$
+
+
+# $example on:create_DataFrames$
+df <- read.json("examples/src/main/resources/people.json")
+
+# Displays the content of the DataFrame
+showDF(df)
+# $example off:create_DataFrames$
+
+
+# $example on:untyped_transformations$
+# Create the DataFrame
+df <- read.json("examples/src/main/resources/people.json")
+
+# Show the content of the DataFrame
+showDF(df)
+## age  name
+## null Michael
+## 30   Andy
+## 19   Justin
+
+# Print the schema in a tree format
+printSchema(df)
+## root
+## |-- age: long (nullable = true)
+## |-- name: string (nullable = true)
+
+# Select only the "name" column
+showDF(select(df, "name"))
+## name
+## Michael
+## Andy
+## Justin
+
+# Select everybody, but increment the age by 1
+showDF(select(df, df$name, df$age + 1))
+## name    (age + 1)
+## Michael null
+## Andy    31
+## Justin  20
+
+# Select people older than 21
+showDF(where(df, df$age > 21))
+## age name
+## 30  Andy
+
+# Count people by age
+showDF(count(groupBy(df, "age")))
+## age  count
+## null 1
+## 19   1
+## 30   1
+# $example off:untyped_transformations$
+
+
+# $example on:sql_query$
+df <- sql("SELECT * FROM table")
--- End diff --

This example line is fine in the docs, but it won't run in an example R file. 
It does illustrate how to run a SQL query, but there is no setup that creates 
a temp view `table` before it is used.
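
One way to make such a snippet runnable on its own is to register a temporary view before querying it. The sketch below is not taken from the PR: the rows and the view name `people` are made up for illustration, and it assumes a local Spark 2.x installation with the SparkR package available.

```r
library(SparkR)
sparkR.session()

# Hypothetical rows, built locally so the snippet does not depend on people.json
people <- createDataFrame(data.frame(
  name = c("Michael", "Andy", "Justin"),
  age = c(29, 30, 19),
  stringsAsFactors = FALSE))

# Register the SparkDataFrame as a temporary view so SQL can refer to it by name
createOrReplaceTempView(people, "people")

# The query now has a table to resolve against
teenagers <- sql("SELECT name FROM people WHERE age >= 13 AND age <= 19")
head(teenagers)
```

Building the data in the script keeps it independent of the working directory; an example file inside the Spark repository could just as well read `people.json` and register that instead.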





[GitHub] spark pull request #14082: [SPARK-16381][SQL][SparkR] Update SQL examples an...

2016-07-07 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14082#discussion_r70002571
  
--- Diff: examples/src/main/r/RSparkSQLExample.R ---
@@ -0,0 +1,175 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# $example on:init_session$
+sparkR.session()
+# $example off:init_session$
+
+
+# $example on:create_DataFrames$
+df <- read.json("examples/src/main/resources/people.json")
+
+# Displays the content of the DataFrame
+showDF(df)
+# $example off:create_DataFrames$
+
+
+# $example on:untyped_transformations$
+# Create the DataFrame
+df <- read.json("examples/src/main/resources/people.json")
+
+# Show the content of the DataFrame
+showDF(df)
+## age  name
+## null Michael
+## 30   Andy
+## 19   Justin
+
+# Print the schema in a tree format
+printSchema(df)
+## root
+## |-- age: long (nullable = true)
+## |-- name: string (nullable = true)
+
+# Select only the "name" column
+showDF(select(df, "name"))
+## name
+## Michael
+## Andy
+## Justin
+
+# Select everybody, but increment the age by 1
+showDF(select(df, df$name, df$age + 1))
+## name    (age + 1)
+## Michael null
+## Andy    31
+## Justin  20
+
+# Select people older than 21
+showDF(where(df, df$age > 21))
+## age name
+## 30  Andy
+
+# Count people by age
+showDF(count(groupBy(df, "age")))
+## age  count
+## null 1
+## 19   1
+## 30   1
+# $example off:untyped_transformations$
+
+
+# $example on:sql_query$
+df <- sql("SELECT * FROM table")
+# $example off:sql_query$
+
+
+# $example on:source_parquet$
+df <- read.df("examples/src/main/resources/users.parquet")
+write.df(select(df, "name", "favorite_color"), "namesAndFavColors.parquet")
+# $example off:source_parquet$
+
+
+# $example on:source_json$
+df <- read.df("examples/src/main/resources/people.json", "json")
+write.df(select(df, "name", "age"), "namesAndAges.parquet", "parquet")
+# $example off:source_json$
+
+
+# $example on:direct_query$
+df <- sql("SELECT * FROM parquet.`examples/src/main/resources/users.parquet`")
+# $example off:direct_query$
+
+
+# $example on:load_programmatically$
+schemaPeople # The SparkDataFrame from the previous example.
+
+# SparkDataFrame can be saved as Parquet files, maintaining the schema information.
+write.parquet(schemaPeople, "people.parquet")
+
+# Read in the Parquet file created above. Parquet files are self-describing so the schema is preserved.
+# The result of loading a parquet file is also a DataFrame.
+parquetFile <- read.parquet("people.parquet")
+
+# Parquet files can also be used to create a temporary view and then used in SQL statements.
+createOrReplaceTempView(parquetFile, "parquetFile")
+teenagers <- sql("SELECT name FROM parquetFile WHERE age >= 13 AND age <= 19")
+schema <- structType(structField("name", "string"))
+teenNames <- dapply(df, function(p) { cbind(paste("Name:", p$name)) }, schema)
+for (teenName in collect(teenNames)$name) {
+  cat(teenName, "\n")
+}
--- End diff --

It might be good to add run output here, like in L133 or below.





[GitHub] spark pull request #14082: [SPARK-16381][SQL][SparkR] Update SQL examples an...

2016-07-07 Thread keypointt
Github user keypointt commented on a diff in the pull request:

https://github.com/apache/spark/pull/14082#discussion_r70007444
  
--- Diff: examples/src/main/r/RSparkSQLExample.R ---
@@ -0,0 +1,175 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# $example on:init_session$
+sparkR.session()
+# $example off:init_session$
+
+
+# $example on:create_DataFrames$
+df <- read.json("examples/src/main/resources/people.json")
+
+# Displays the content of the DataFrame
+showDF(df)
--- End diff --

Do you mean that all `showDF()` calls should be replaced by `head()`? E.g. change 
`showDF(select(df, "name"))` to `head(select(df, "name"))` too?

Or should we keep both `showDF()` and `head()` as examples for the reader?





[GitHub] spark pull request #14082: [SPARK-16381][SQL][SparkR] Update SQL examples an...

2016-07-07 Thread shivaram
Github user shivaram commented on a diff in the pull request:

https://github.com/apache/spark/pull/14082#discussion_r70008001
  
--- Diff: examples/src/main/r/RSparkSQLExample.R ---
@@ -0,0 +1,175 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# $example on:init_session$
+sparkR.session()
+# $example off:init_session$
+
+
+# $example on:create_DataFrames$
+df <- read.json("examples/src/main/resources/people.json")
+
+# Displays the content of the DataFrame
+showDF(df)
--- End diff --

I'd use `head` as the default one for most examples. It feels most natural. 
We can then add one line to the programming guide that reads like "You can also 
use `showDF` to print the first few rows and optionally truncate the printing of 
long values".
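
As a rough illustration of the difference between the two calls, here is a minimal sketch. It is not part of the PR: the data frame and its `bio` column are hypothetical, and it assumes a local Spark 2.x installation with the SparkR package available.

```r
library(SparkR)
sparkR.session()

# Hypothetical data, built locally so the snippet does not depend on people.json
df <- createDataFrame(data.frame(
  name = c("Michael", "Andy", "Justin"),
  bio = c("A fairly long free-text value that is longer than twenty characters",
          "short", "also short"),
  stringsAsFactors = FALSE))

# head() returns the first rows as a local R data.frame
local_rows <- head(df, 2)
print(local_rows)

# showDF() prints rows as a text table; truncate = FALSE keeps long strings intact
showDF(df, numRows = 2, truncate = FALSE)
```

Since `head` hands back an ordinary `data.frame`, its result can be passed on to other R functions, whereas `showDF` is only for printing.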





[GitHub] spark pull request #14082: [SPARK-16381][SQL][SparkR] Update SQL examples an...

2016-07-07 Thread keypointt
Github user keypointt commented on a diff in the pull request:

https://github.com/apache/spark/pull/14082#discussion_r70008864
  
--- Diff: examples/src/main/r/RSparkSQLExample.R ---
@@ -0,0 +1,175 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# $example on:init_session$
+sparkR.session()
+# $example off:init_session$
+
+
+# $example on:create_DataFrames$
+df <- read.json("examples/src/main/resources/people.json")
+
+# Displays the content of the DataFrame
+showDF(df)
--- End diff --

I just ran `showDF()` and it seems this method is not working, while `head()` 
works fine.

Is it a problem with how I built SparkR, using `build/mvn -DskipTests -Psparkr 
-Phive package`?

```
> df <- read.json("examples/src/main/resources/people.json")

> showDF(df)
16/07/07 17:02:54 WARN RBackendHandler: cannot find matching method class org.apache.spark.sql.Dataset.showString. Candidates are:
16/07/07 17:02:54 WARN RBackendHandler: showString(int,int)
16/07/07 17:02:54 ERROR RBackendHandler: showString on 7 failed
Error in invokeJava(isStatic = FALSE, objId$id, methodName, ...) :

> showDF(select(df, "name"))
16/07/07 17:03:38 WARN RBackendHandler: cannot find matching method class org.apache.spark.sql.Dataset.showString. Candidates are:
16/07/07 17:03:38 WARN RBackendHandler: showString(int,int)
16/07/07 17:03:38 ERROR RBackendHandler: showString on 18 failed
Error in invokeJava(isStatic = FALSE, objId$id, methodName, ...) :
```





[GitHub] spark pull request #14082: [SPARK-16381][SQL][SparkR] Update SQL examples an...

2016-07-07 Thread shivaram
Github user shivaram commented on a diff in the pull request:

https://github.com/apache/spark/pull/14082#discussion_r70009456
  
--- Diff: examples/src/main/r/RSparkSQLExample.R ---
@@ -0,0 +1,175 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# $example on:init_session$
+sparkR.session()
+# $example off:init_session$
+
+
+# $example on:create_DataFrames$
+df <- read.json("examples/src/main/resources/people.json")
+
+# Displays the content of the DataFrame
+showDF(df)
--- End diff --

Do the unit tests pass? We have a unit test for `showDF`.





[GitHub] spark pull request #14082: [SPARK-16381][SQL][SparkR] Update SQL examples an...

2016-07-07 Thread keypointt
Github user keypointt commented on a diff in the pull request:

https://github.com/apache/spark/pull/14082#discussion_r70011061
  
--- Diff: examples/src/main/r/RSparkSQLExample.R ---
@@ -0,0 +1,175 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# $example on:init_session$
+sparkR.session()
+# $example off:init_session$
+
+
+# $example on:create_DataFrames$
+df <- read.json("examples/src/main/resources/people.json")
+
+# Displays the content of the DataFrame
+showDF(df)
--- End diff --

Sorry, please ignore the comment above.

I rebuilt with `build/mvn -DskipTests -Psparkr package` and everything 
works.





[GitHub] spark pull request #14082: [SPARK-16381][SQL][SparkR] Update SQL examples an...

2016-07-07 Thread keypointt
Github user keypointt commented on a diff in the pull request:

https://github.com/apache/spark/pull/14082#discussion_r70011133
  
--- Diff: examples/src/main/r/RSparkSQLExample.R ---
@@ -0,0 +1,175 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# $example on:init_session$
+sparkR.session()
+# $example off:init_session$
+
+
+# $example on:create_DataFrames$
+df <- read.json("examples/src/main/resources/people.json")
+
+# Displays the content of the DataFrame
+showDF(df)
--- End diff --

@shivaram your idea is better, vote+1





[GitHub] spark pull request #14082: [SPARK-16381][SQL][SparkR] Update SQL examples an...

2016-07-07 Thread keypointt
Github user keypointt commented on a diff in the pull request:

https://github.com/apache/spark/pull/14082#discussion_r70011796
  
--- Diff: examples/src/main/r/RSparkSQLExample.R ---
@@ -0,0 +1,175 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# $example on:init_session$
+sparkR.session()
+# $example off:init_session$
+
+
+# $example on:create_DataFrames$
+df <- read.json("examples/src/main/resources/people.json")
+
+# Displays the content of the DataFrame
+showDF(df)
+# $example off:create_DataFrames$
+
+
+# $example on:untyped_transformations$
+# Create the DataFrame
+df <- read.json("examples/src/main/resources/people.json")
+
+# Show the content of the DataFrame
+showDF(df)
+## age  name
+## null Michael
+## 30   Andy
+## 19   Justin
+
+# Print the schema in a tree format
+printSchema(df)
+## root
+## |-- age: long (nullable = true)
+## |-- name: string (nullable = true)
+
+# Select only the "name" column
+showDF(select(df, "name"))
+## name
+## Michael
+## Andy
+## Justin
+
+# Select everybody, but increment the age by 1
+showDF(select(df, df$name, df$age + 1))
+## name    (age + 1)
+## Michael null
+## Andy    31
+## Justin  20
+
+# Select people older than 21
+showDF(where(df, df$age > 21))
+## age name
+## 30  Andy
+
+# Count people by age
+showDF(count(groupBy(df, "age")))
+## age  count
+## null 1
+## 19   1
+## 30   1
+# $example off:untyped_transformations$
+
+
+# $example on:sql_query$
+df <- sql("SELECT * FROM table")
--- End diff --

Should I add more here to create the `table`, or just leave it since it's 
only for demonstration purposes?





[GitHub] spark pull request #14082: [SPARK-16381][SQL][SparkR] Update SQL examples an...

2016-07-07 Thread shivaram
Github user shivaram commented on a diff in the pull request:

https://github.com/apache/spark/pull/14082#discussion_r70012365
  
--- Diff: examples/src/main/r/RSparkSQLExample.R ---
@@ -0,0 +1,175 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# $example on:init_session$
+sparkR.session()
+# $example off:init_session$
+
+
+# $example on:create_DataFrames$
+df <- read.json("examples/src/main/resources/people.json")
+
+# Displays the content of the DataFrame
+showDF(df)
+# $example off:create_DataFrames$
+
+
+# $example on:untyped_transformations$
+# Create the DataFrame
+df <- read.json("examples/src/main/resources/people.json")
+
+# Show the content of the DataFrame
+showDF(df)
+## age  name
+## null Michael
+## 30   Andy
+## 19   Justin
+
+# Print the schema in a tree format
+printSchema(df)
+## root
+## |-- age: long (nullable = true)
+## |-- name: string (nullable = true)
+
+# Select only the "name" column
+showDF(select(df, "name"))
+## name
+## Michael
+## Andy
+## Justin
+
+# Select everybody, but increment the age by 1
+showDF(select(df, df$name, df$age + 1))
+## name    (age + 1)
+## Michael null
+## Andy    31
+## Justin  20
+
+# Select people older than 21
+showDF(where(df, df$age > 21))
+## age name
+## 30  Andy
+
+# Count people by age
+showDF(count(groupBy(df, "age")))
+## age  count
+## null 1
+## 19   1
+## 30   1
+# $example off:untyped_transformations$
+
+
+# $example on:sql_query$
+df <- sql("SELECT * FROM table")
--- End diff --

Let's register `df` from above using `createExternalTable` and then run the 
query. We should aim for this R file to be executable on its own.
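
A minimal sketch of how the query could be made runnable on its own. It registers the DataFrame as a temporary view with `createOrReplaceTempView` (the call a later revision of this diff uses) rather than `createExternalTable`. The view name `people` and the `SPARK_HOME`-based path are assumptions made for illustration.

```r
library(SparkR)

# Start (or reuse) a Spark session.
sparkR.session()

# Assumes SPARK_HOME points at a Spark checkout that contains the example data.
path <- file.path(Sys.getenv("SPARK_HOME"),
                  "examples/src/main/resources/people.json")
df <- read.json(path)

# Register the DataFrame under a name so that SQL can refer to it.
# "people" is a hypothetical view name chosen for this sketch.
createOrReplaceTempView(df, "people")

# The query now has something to read from.
adults <- sql("SELECT name, age FROM people WHERE age > 21")
head(adults)
```

With the view registered inside the file, the script no longer depends on a table that was created elsewhere.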





[GitHub] spark pull request #14082: [SPARK-16381][SQL][SparkR] Update SQL examples an...

2016-07-08 Thread shivaram
Github user shivaram commented on a diff in the pull request:

https://github.com/apache/spark/pull/14082#discussion_r70121398
  
--- Diff: examples/src/main/r/RSparkSQLExample.R ---
@@ -0,0 +1,198 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+library(SparkR)
+
+# $example on:init_session$
+sparkR.session()
--- End diff --

The Python code snippet shows how to set appName, options, etc. Could we do 
something similar here? I.e., something like 
```
sparkR.session(appName='MyApp', sparkConfig=list(spark.executor.memory="1g"))
```





[GitHub] spark pull request #14082: [SPARK-16381][SQL][SparkR] Update SQL examples an...

2016-07-08 Thread shivaram
Github user shivaram commented on a diff in the pull request:

https://github.com/apache/spark/pull/14082#discussion_r70121544
  
--- Diff: examples/src/main/r/RSparkSQLExample.R ---
@@ -0,0 +1,198 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+library(SparkR)
+
+# $example on:init_session$
+sparkR.session()
+# $example off:init_session$
+
+
+# $example on:create_DataFrames$
+df <- read.json("examples/src/main/resources/people.json")
+
+# Displays the content of the DataFrame
+head(df)
+
+# Another method to print the first few rows and optionally truncate the printing of long values
+head(df)
--- End diff --

This should be `showDF(df)`
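
For context, a small sketch of the difference between the two calls: `head` collects the first rows into a local R `data.frame`, while `showDF` prints rows and takes `numRows` and `truncate` arguments. The `SPARK_HOME`-based path and the variable name `local_rows` are assumptions for illustration.

```r
library(SparkR)

sparkR.session()

# Assumes SPARK_HOME points at a Spark checkout that contains the example data.
path <- file.path(Sys.getenv("SPARK_HOME"),
                  "examples/src/main/resources/people.json")
df <- read.json(path)

# head() brings the first rows back to the driver as a local R data.frame.
local_rows <- head(df)

# showDF() prints rows; numRows limits how many are printed and
# truncate = FALSE keeps long values from being cut off.
showDF(df, numRows = 2, truncate = FALSE)
```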





[GitHub] spark pull request #14082: [SPARK-16381][SQL][SparkR] Update SQL examples an...

2016-07-08 Thread shivaram
Github user shivaram commented on a diff in the pull request:

https://github.com/apache/spark/pull/14082#discussion_r70121939
  
--- Diff: examples/src/main/r/RSparkSQLExample.R ---
@@ -0,0 +1,198 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+library(SparkR)
+
+# $example on:init_session$
+sparkR.session()
+# $example off:init_session$
+
+
+# $example on:create_DataFrames$
+df <- read.json("examples/src/main/resources/people.json")
+
+# Displays the content of the DataFrame
+head(df)
+
+# Another method to print the first few rows and optionally truncate the printing of long values
+head(df)
+# $example off:create_DataFrames$
+
+
+# $example on:dataframe_operations$
+# Create the DataFrame
+df <- read.json("examples/src/main/resources/people.json")
+
+# Show the content of the DataFrame
+head(df)
+## age  name
+## null Michael
+## 30   Andy
+## 19   Justin
+
+# Print the schema in a tree format
+printSchema(df)
+## root
+## |-- age: long (nullable = true)
+## |-- name: string (nullable = true)
+
+# Select only the "name" column
+head(select(df, "name"))
+## name
+## Michael
+## Andy
+## Justin
+
+# Select everybody, but increment the age by 1
+head(select(df, df$name, df$age + 1))
+## name    (age + 1)
+## Michael null
+## Andy    31
+## Justin  20
+
+# Select people older than 21
+head(where(df, df$age > 21))
+## age name
+## 30  Andy
+
+# Count people by age
+head(count(groupBy(df, "age")))
+## age  count
+## null 1
+## 19   1
+## 30   1
+# $example off:dataframe_operations$
+
+
+# Create a DataFrame from json file
+path <- file.path(Sys.getenv("SPARK_HOME"), "examples/src/main/resources/people.json")
--- End diff --

Can we use the same `df` from before, or do we need to create a new one here?





[GitHub] spark pull request #14082: [SPARK-16381][SQL][SparkR] Update SQL examples an...

2016-07-08 Thread keypointt
Github user keypointt commented on a diff in the pull request:

https://github.com/apache/spark/pull/14082#discussion_r70122235
  
--- Diff: examples/src/main/r/RSparkSQLExample.R ---
@@ -0,0 +1,198 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+library(SparkR)
+
+# $example on:init_session$
+sparkR.session()
--- End diff --

sure I'll add it





[GitHub] spark pull request #14082: [SPARK-16381][SQL][SparkR] Update SQL examples an...

2016-07-08 Thread keypointt
Github user keypointt commented on a diff in the pull request:

https://github.com/apache/spark/pull/14082#discussion_r70122204
  
--- Diff: examples/src/main/r/RSparkSQLExample.R ---
@@ -0,0 +1,198 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+library(SparkR)
+
+# $example on:init_session$
+sparkR.session()
+# $example off:init_session$
+
+
+# $example on:create_DataFrames$
+df <- read.json("examples/src/main/resources/people.json")
+
+# Displays the content of the DataFrame
+head(df)
+
+# Another method to print the first few rows and optionally truncate the printing of long values
+head(df)
--- End diff --

oh sorry...fixing it now





[GitHub] spark pull request #14082: [SPARK-16381][SQL][SparkR] Update SQL examples an...

2016-07-08 Thread shivaram
Github user shivaram commented on a diff in the pull request:

https://github.com/apache/spark/pull/14082#discussion_r70122361
  
--- Diff: examples/src/main/r/RSparkSQLExample.R ---
@@ -0,0 +1,198 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+library(SparkR)
+
+# $example on:init_session$
+sparkR.session()
+# $example off:init_session$
+
+
+# $example on:create_DataFrames$
+df <- read.json("examples/src/main/resources/people.json")
+
+# Displays the content of the DataFrame
+head(df)
+
+# Another method to print the first few rows and optionally truncate the printing of long values
+head(df)
+# $example off:create_DataFrames$
+
+
+# $example on:dataframe_operations$
+# Create the DataFrame
+df <- read.json("examples/src/main/resources/people.json")
+
+# Show the content of the DataFrame
+head(df)
+## age  name
+## null Michael
+## 30   Andy
+## 19   Justin
+
+# Print the schema in a tree format
+printSchema(df)
+## root
+## |-- age: long (nullable = true)
+## |-- name: string (nullable = true)
+
+# Select only the "name" column
+head(select(df, "name"))
+## name
+## Michael
+## Andy
+## Justin
+
+# Select everybody, but increment the age by 1
+head(select(df, df$name, df$age + 1))
+## name    (age + 1)
+## Michael null
+## Andy    31
+## Justin  20
+
+# Select people older than 21
+head(where(df, df$age > 21))
+## age name
+## 30  Andy
+
+# Count people by age
+head(count(groupBy(df, "age")))
+## age  count
+## null 1
+## 19   1
+## 30   1
+# $example off:dataframe_operations$
+
+
+# Create a DataFrame from json file
+path <- file.path(Sys.getenv("SPARK_HOME"), "examples/src/main/resources/people.json")
+peopleDF <- read.json(path)
+# Register this DataFrame as a table.
+createOrReplaceTempView(peopleDF, "table")
+# $example on:sql_query$
+df <- sql("SELECT * FROM table")
+# $example off:sql_query$
+
+
+# $example on:source_parquet$
+df <- read.df("examples/src/main/resources/users.parquet")
+write.df(select(df, "name", "favorite_color"), "namesAndFavColors.parquet")
+# $example off:source_parquet$
+
+
+# $example on:source_json$
+df <- read.df("examples/src/main/resources/people.json", "json")
+write.df(select(df, "name", "age"), "namesAndAges.parquet", "parquet")
--- End diff --

I have a slight preference for being more verbose and writing this as 
`write.df(..., source = "parquet")`.
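
A sketch of what the more verbose form might look like, with the data source passed by name to both `read.df` and `write.df`. The output location under `tempdir()`, the `mode = "overwrite"` setting, and the `SPARK_HOME`-based input path are assumptions added so the snippet can be re-run; they are not part of the diff.

```r
library(SparkR)

sparkR.session()

# Assumes SPARK_HOME points at a Spark checkout that contains the example data.
path <- file.path(Sys.getenv("SPARK_HOME"),
                  "examples/src/main/resources/people.json")

# Naming the argument makes it clear that "json" is the data source.
df <- read.df(path, source = "json")

# Hypothetical output location, chosen so repeated runs do not clash.
out <- file.path(tempdir(), "namesAndAges.parquet")

# Same idea on the write side: source = "parquet" instead of a bare
# positional string.
write.df(select(df, "name", "age"), path = out,
         source = "parquet", mode = "overwrite")
```

Named arguments cost a few characters but spare the reader from having to remember the positional order of `path` and `source`.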





[GitHub] spark pull request #14082: [SPARK-16381][SQL][SparkR] Update SQL examples an...

2016-07-08 Thread shivaram
Github user shivaram commented on a diff in the pull request:

https://github.com/apache/spark/pull/14082#discussion_r70122513
  
--- Diff: examples/src/main/r/RSparkSQLExample.R ---
@@ -0,0 +1,175 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# $example on:init_session$
+sparkR.session()
+# $example off:init_session$
+
+
+# $example on:create_DataFrames$
+df <- read.json("examples/src/main/resources/people.json")
+
+# Displays the content of the DataFrame
+showDF(df)
+# $example off:create_DataFrames$
+
+
+# $example on:untyped_transformations$
+# Create the DataFrame
+df <- read.json("examples/src/main/resources/people.json")
+
+# Show the content of the DataFrame
+showDF(df)
+## age  name
+## null Michael
+## 30   Andy
+## 19   Justin
+
+# Print the schema in a tree format
+printSchema(df)
+## root
+## |-- age: long (nullable = true)
+## |-- name: string (nullable = true)
+
+# Select only the "name" column
+showDF(select(df, "name"))
+## name
+## Michael
+## Andy
+## Justin
+
+# Select everybody, but increment the age by 1
+showDF(select(df, df$name, df$age + 1))
+## name    (age + 1)
+## Michael null
+## Andy    31
+## Justin  20
+
+# Select people older than 21
+showDF(where(df, df$age > 21))
+## age name
+## 30  Andy
+
+# Count people by age
+showDF(count(groupBy(df, "age")))
+## age  count
+## null 1
+## 19   1
+## 30   1
+# $example off:untyped_transformations$
+
+
+# $example on:sql_query$
+df <- sql("SELECT * FROM table")
+# $example off:sql_query$
+
+
+# $example on:source_parquet$
+df <- read.df("examples/src/main/resources/users.parquet")
+write.df(select(df, "name", "favorite_color"), "namesAndFavColors.parquet")
+# $example off:source_parquet$
+
+
+# $example on:source_json$
+df <- read.df("examples/src/main/resources/people.json", "json")
+write.df(select(df, "name", "age"), "namesAndAges.parquet", "parquet")
+# $example off:source_json$
+
+
+# $example on:direct_query$
+df <- sql("SELECT * FROM parquet.`examples/src/main/resources/users.parquet`")
+# $example off:direct_query$
+
+
+# $example on:load_programmatically$
+schemaPeople # The SparkDataFrame from the previous example.
+
+# SparkDataFrame can be saved as Parquet files, maintaining the schema information.
+write.parquet(schemaPeople, "people.parquet")
+
+# Read in the Parquet file created above. Parquet files are self-describing so the schema is preserved.
+# The result of loading a parquet file is also a DataFrame.
+parquetFile <- read.parquet("people.parquet")
+
+# Parquet files can also be used to create a temporary view and then used in SQL statements.
+createOrReplaceTempView(parquetFile, "parquetFile")
+teenagers <- sql("SELECT name FROM parquetFile WHERE age >= 13 AND age <= 19")
+schema <- structType(structField("name", "string"))
+teenNames <- dapply(df, function(p) { cbind(paste("Name:", p$name)) }, schema)
+for (teenName in collect(teenNames)$name) {
+  cat(teenName, "\n")
+}
--- End diff --

+1





[GitHub] spark pull request #14082: [SPARK-16381][SQL][SparkR] Update SQL examples an...

2016-07-08 Thread shivaram
Github user shivaram commented on a diff in the pull request:

https://github.com/apache/spark/pull/14082#discussion_r70124201
  
--- Diff: examples/src/main/r/RSparkSQLExample.R ---
@@ -0,0 +1,198 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+library(SparkR)
+
+# $example on:init_session$
+sparkR.session()
+# $example off:init_session$
+
+
+# $example on:create_DataFrames$
+df <- read.json("examples/src/main/resources/people.json")
+
+# Displays the content of the DataFrame
+head(df)
+
+# Another method to print the first few rows and optionally truncate the printing of long values
+head(df)
+# $example off:create_DataFrames$
+
+
+# $example on:dataframe_operations$
+# Create the DataFrame
+df <- read.json("examples/src/main/resources/people.json")
+
+# Show the content of the DataFrame
+head(df)
+## age  name
+## null Michael
+## 30   Andy
+## 19   Justin
+
+# Print the schema in a tree format
+printSchema(df)
+## root
+## |-- age: long (nullable = true)
+## |-- name: string (nullable = true)
+
+# Select only the "name" column
+head(select(df, "name"))
+## name
+## Michael
+## Andy
+## Justin
+
+# Select everybody, but increment the age by 1
+head(select(df, df$name, df$age + 1))
+## name    (age + 1)
+## Michael null
+## Andy    31
+## Justin  20
+
+# Select people older than 21
+head(where(df, df$age > 21))
+## age name
+## 30  Andy
+
+# Count people by age
+head(count(groupBy(df, "age")))
+## age  count
+## null 1
+## 19   1
+## 30   1
+# $example off:dataframe_operations$
+
+
+# Create a DataFrame from json file
+path <- file.path(Sys.getenv("SPARK_HOME"), "examples/src/main/resources/people.json")
+peopleDF <- read.json(path)
+# Register this DataFrame as a table.
+createOrReplaceTempView(peopleDF, "table")
+# $example on:sql_query$
+df <- sql("SELECT * FROM table")
+# $example off:sql_query$
+
+
+# $example on:source_parquet$
+df <- read.df("examples/src/main/resources/users.parquet")
+write.df(select(df, "name", "favorite_color"), "namesAndFavColors.parquet")
+# $example off:source_parquet$
+
+
+# $example on:source_json$
+df <- read.df("examples/src/main/resources/people.json", "json")
+write.df(select(df, "name", "age"), "namesAndAges.parquet", "parquet")
+# $example off:source_json$
+
+
+# $example on:direct_query$
+df <- sql("SELECT * FROM parquet.`examples/src/main/resources/users.parquet`")
+# $example off:direct_query$
+
+
+# $example on:load_programmatically$
+df <- read.df("examples/src/main/resources/people.json", "json")
+
+# SparkDataFrame can be saved as Parquet files, maintaining the schema information.
+write.parquet(df, "people.parquet")
+
+# Read in the Parquet file created above. Parquet files are self-describing so the schema is preserved.
+# The result of loading a parquet file is also a DataFrame.
+parquetFile <- read.parquet("people.parquet")
+
+# Parquet files can also be used to create a temporary view and then used in SQL statements.
+createOrReplaceTempView(parquetFile, "parquetFile")
+teenagers <- sql("SELECT name FROM parquetFile WHERE age >= 13 AND age <= 19")
+head(teenagers)
+## name
+## 1 Justin
+
--- End diff --

It would be good to add a comment explaining what the following code block 
does. Something like `We can also run custom R-UDFs on Spark DataFrames. 
Here we prefix all the names with "Name:"`
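
A sketch of how the commented block might read, using `dapply` to apply an R function to each partition. This is an illustration rather than the code in the diff: the temporary view name `people`, the `SPARK_HOME`-based path, and the use of `data.frame(...)` as the function's return value are assumptions made here.

```r
library(SparkR)

sparkR.session()

# Assumes SPARK_HOME points at a Spark checkout that contains the example data.
path <- file.path(Sys.getenv("SPARK_HOME"),
                  "examples/src/main/resources/people.json")
people <- read.json(path)

# "people" is a hypothetical view name chosen for this sketch.
createOrReplaceTempView(people, "people")
teenagers <- sql("SELECT name FROM people WHERE age >= 13 AND age <= 19")

# We can also run custom R-UDFs on Spark DataFrames.
# Here we prefix all the names with "Name:".
schema <- structType(structField("name", "string"))
teenNames <- dapply(teenagers, function(p) {
  # p is a local R data.frame holding one partition of the input.
  data.frame(name = paste("Name:", p$name), stringsAsFactors = FALSE)
}, schema)

# Bring the (small) result back to the driver and print each name.
for (teenName in collect(teenNames)$name) {
  cat(teenName, "\n")
}
```

The `schema` argument has to describe the columns the function returns, which is why it lists a single string column called `name`.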



[GitHub] spark pull request #14082: [SPARK-16381][SQL][SparkR] Update SQL examples an...

2016-07-08 Thread keypointt
Github user keypointt commented on a diff in the pull request:

https://github.com/apache/spark/pull/14082#discussion_r70124162
  
--- Diff: examples/src/main/r/RSparkSQLExample.R ---
@@ -0,0 +1,198 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+library(SparkR)
+
+# $example on:init_session$
+sparkR.session()
+# $example off:init_session$
+
+
+# $example on:create_DataFrames$
+df <- read.json("examples/src/main/resources/people.json")
+
+# Displays the content of the DataFrame
+head(df)
+
+# Another method to print the first few rows and optionally truncate the printing of long values
+head(df)
+# $example off:create_DataFrames$
+
+
+# $example on:dataframe_operations$
+# Create the DataFrame
+df <- read.json("examples/src/main/resources/people.json")
+
+# Show the content of the DataFrame
+head(df)
+## age  name
+## null Michael
+## 30   Andy
+## 19   Justin
+
+# Print the schema in a tree format
+printSchema(df)
+## root
+## |-- age: long (nullable = true)
+## |-- name: string (nullable = true)
+
+# Select only the "name" column
+head(select(df, "name"))
+## name
+## Michael
+## Andy
+## Justin
+
+# Select everybody, but increment the age by 1
+head(select(df, df$name, df$age + 1))
+## name    (age + 1)
+## Michael null
+## Andy    31
+## Justin  20
+
+# Select people older than 21
+head(where(df, df$age > 21))
+## age name
+## 30  Andy
+
+# Count people by age
+head(count(groupBy(df, "age")))
+## age  count
+## null 1
+## 19   1
+## 30   1
+# $example off:dataframe_operations$
+
+
+# Create a DataFrame from json file
+path <- file.path(Sys.getenv("SPARK_HOME"), "examples/src/main/resources/people.json")
--- End diff --

We can use the `df` from before; `peopleDF` here is basically the same as the 
earlier `df`.

However, the line right after this one is `df <- sql("SELECT * FROM table")`, so 
I think it's better to use another name in this example for clarity. 

What do you think?





[GitHub] spark pull request #14082: [SPARK-16381][SQL][SparkR] Update SQL examples an...

2016-07-08 Thread keypointt
Github user keypointt commented on a diff in the pull request:

https://github.com/apache/spark/pull/14082#discussion_r70124300
  
--- Diff: examples/src/main/r/RSparkSQLExample.R ---
@@ -0,0 +1,198 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+library(SparkR)
+
+# $example on:init_session$
+sparkR.session()
+# $example off:init_session$
+
+
+# $example on:create_DataFrames$
+df <- read.json("examples/src/main/resources/people.json")
+
+# Displays the content of the DataFrame
+head(df)
+
+# Another method to print the first few rows and optionally truncate the printing of long values
+head(df)
+# $example off:create_DataFrames$
+
+
+# $example on:dataframe_operations$
+# Create the DataFrame
+df <- read.json("examples/src/main/resources/people.json")
+
+# Show the content of the DataFrame
+head(df)
+## age  name
+## null Michael
+## 30   Andy
+## 19   Justin
+
+# Print the schema in a tree format
+printSchema(df)
+## root
+## |-- age: long (nullable = true)
+## |-- name: string (nullable = true)
+
+# Select only the "name" column
+head(select(df, "name"))
+## name
+## Michael
+## Andy
+## Justin
+
+# Select everybody, but increment the age by 1
+head(select(df, df$name, df$age + 1))
+## name    (age + 1)
+## Michael null
+## Andy    31
+## Justin  20
+
+# Select people older than 21
+head(where(df, df$age > 21))
+## age name
+## 30  Andy
+
+# Count people by age
+head(count(groupBy(df, "age")))
+## age  count
+## null 1
+## 19   1
+## 30   1
+# $example off:dataframe_operations$
+
+
+# Create a DataFrame from json file
+path <- file.path(Sys.getenv("SPARK_HOME"), "examples/src/main/resources/people.json")
+peopleDF <- read.json(path)
+# Register this DataFrame as a table.
+createOrReplaceTempView(peopleDF, "table")
+# $example on:sql_query$
+df <- sql("SELECT * FROM table")
+# $example off:sql_query$
+
+
+# $example on:source_parquet$
+df <- read.df("examples/src/main/resources/users.parquet")
+write.df(select(df, "name", "favorite_color"), "namesAndFavColors.parquet")
+# $example off:source_parquet$
+
+
+# $example on:source_json$
+df <- read.df("examples/src/main/resources/people.json", "json")
+write.df(select(df, "name", "age"), "namesAndAges.parquet", "parquet")
--- End diff --

sure, I will do it





[GitHub] spark pull request #14082: [SPARK-16381][SQL][SparkR] Update SQL examples an...

2016-07-08 Thread keypointt
Github user keypointt commented on a diff in the pull request:

https://github.com/apache/spark/pull/14082#discussion_r70124885
  
--- Diff: examples/src/main/r/RSparkSQLExample.R ---
@@ -0,0 +1,198 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+library(SparkR)
+
+# $example on:init_session$
+sparkR.session()
+# $example off:init_session$
+
+
+# $example on:create_DataFrames$
+df <- read.json("examples/src/main/resources/people.json")
+
+# Displays the content of the DataFrame
+head(df)
+
+# Another method to print the first few rows and optionally truncate the printing of long values
+head(df)
+# $example off:create_DataFrames$
+
+
+# $example on:dataframe_operations$
+# Create the DataFrame
+df <- read.json("examples/src/main/resources/people.json")
+
+# Show the content of the DataFrame
+head(df)
+## age  name
+## null Michael
+## 30   Andy
+## 19   Justin
+
+# Print the schema in a tree format
+printSchema(df)
+## root
+## |-- age: long (nullable = true)
+## |-- name: string (nullable = true)
+
+# Select only the "name" column
+head(select(df, "name"))
+## name
+## Michael
+## Andy
+## Justin
+
+# Select everybody, but increment the age by 1
+head(select(df, df$name, df$age + 1))
+## name    (age + 1)
+## Michael null
+## Andy    31
+## Justin  20
+
+# Select people older than 21
+head(where(df, df$age > 21))
+## age name
+## 30  Andy
+
+# Count people by age
+head(count(groupBy(df, "age")))
+## age  count
+## null 1
+## 19   1
+## 30   1
+# $example off:dataframe_operations$
+
+
+# Create a DataFrame from json file
+path <- file.path(Sys.getenv("SPARK_HOME"), "examples/src/main/resources/people.json")
+peopleDF <- read.json(path)
+# Register this DataFrame as a table.
+createOrReplaceTempView(peopleDF, "table")
+# $example on:sql_query$
+df <- sql("SELECT * FROM table")
+# $example off:sql_query$
+
+
+# $example on:source_parquet$
+df <- read.df("examples/src/main/resources/users.parquet")
+write.df(select(df, "name", "favorite_color"), "namesAndFavColors.parquet")
+# $example off:source_parquet$
+
+
+# $example on:source_json$
+df <- read.df("examples/src/main/resources/people.json", "json")
+write.df(select(df, "name", "age"), "namesAndAges.parquet", "parquet")
+# $example off:source_json$
+
+
+# $example on:direct_query$
+df <- sql("SELECT * FROM parquet.`examples/src/main/resources/users.parquet`")
+# $example off:direct_query$
+
+
+# $example on:load_programmatically$
+df <- read.df("examples/src/main/resources/people.json", "json")
+
+# SparkDataFrame can be saved as Parquet files, maintaining the schema information.
+write.parquet(df, "people.parquet")
+
+# Read in the Parquet file created above. Parquet files are self-describing so the schema is preserved.
+# The result of loading a parquet file is also a DataFrame.
+parquetFile <- read.parquet("people.parquet")
+
+# Parquet files can also be used to create a temporary view and then used in SQL statements.
+createOrReplaceTempView(parquetFile, "parquetFile")
+teenagers <- sql("SELECT name FROM parquetFile WHERE age >= 13 AND age <= 19")
+head(teenagers)
+## name
+## 1 Justin
+
--- End diff --

sure





[GitHub] spark pull request #14082: [SPARK-16381][SQL][SparkR] Update SQL examples an...

2016-07-08 Thread shivaram
Github user shivaram commented on a diff in the pull request:

https://github.com/apache/spark/pull/14082#discussion_r70125056
  
--- Diff: examples/src/main/r/RSparkSQLExample.R ---
@@ -0,0 +1,198 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+library(SparkR)
+
+# $example on:init_session$
+sparkR.session()
+# $example off:init_session$
+
+
+# $example on:create_DataFrames$
+df <- read.json("examples/src/main/resources/people.json")
+
+# Displays the content of the DataFrame
+head(df)
+
+# Another method to print the first few rows and optionally truncate the printing of long values
+head(df)
+# $example off:create_DataFrames$
+
+
+# $example on:dataframe_operations$
+# Create the DataFrame
+df <- read.json("examples/src/main/resources/people.json")
+
+# Show the content of the DataFrame
+head(df)
+## age  name
+## null Michael
+## 30   Andy
+## 19   Justin
+
+# Print the schema in a tree format
+printSchema(df)
+## root
+## |-- age: long (nullable = true)
+## |-- name: string (nullable = true)
+
+# Select only the "name" column
+head(select(df, "name"))
+## name
+## Michael
+## Andy
+## Justin
+
+# Select everybody, but increment the age by 1
+head(select(df, df$name, df$age + 1))
+## name    (age + 1)
+## Michael null
+## Andy    31
+## Justin  20
+
+# Select people older than 21
+head(where(df, df$age > 21))
+## age name
+## 30  Andy
+
+# Count people by age
+head(count(groupBy(df, "age")))
+## age  count
+## null 1
+## 19   1
+## 30   1
+# $example off:dataframe_operations$
+
+
+# Create a DataFrame from json file
+path <- file.path(Sys.getenv("SPARK_HOME"), "examples/src/main/resources/people.json")
--- End diff --

Hmm, I see. Since this is a `select *` query, the contents are not changing.
Would something like
```
createOrReplaceTempView(df, "table")
df <- sql("SELECT * FROM table")
```
work?





[GitHub] spark pull request #14082: [SPARK-16381][SQL][SparkR] Update SQL examples an...

2016-07-08 Thread keypointt
Github user keypointt commented on a diff in the pull request:

https://github.com/apache/spark/pull/14082#discussion_r70125600
  
--- Diff: examples/src/main/r/RSparkSQLExample.R ---
@@ -0,0 +1,198 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+library(SparkR)
+
+# $example on:init_session$
+sparkR.session()
+# $example off:init_session$
+
+
+# $example on:create_DataFrames$
+df <- read.json("examples/src/main/resources/people.json")
+
+# Displays the content of the DataFrame
+head(df)
+
+# Another method to print the first few rows and optionally truncate the printing of long values
+head(df)
+# $example off:create_DataFrames$
+
+
+# $example on:dataframe_operations$
+# Create the DataFrame
+df <- read.json("examples/src/main/resources/people.json")
+
+# Show the content of the DataFrame
+head(df)
+## age  name
+## null Michael
+## 30   Andy
+## 19   Justin
+
+# Print the schema in a tree format
+printSchema(df)
+## root
+## |-- age: long (nullable = true)
+## |-- name: string (nullable = true)
+
+# Select only the "name" column
+head(select(df, "name"))
+## name
+## Michael
+## Andy
+## Justin
+
+# Select everybody, but increment the age by 1
+head(select(df, df$name, df$age + 1))
+## name    (age + 1)
+## Michael null
+## Andy    31
+## Justin  20
+
+# Select people older than 21
+head(where(df, df$age > 21))
+## age name
+## 30  Andy
+
+# Count people by age
+head(count(groupBy(df, "age")))
+## age  count
+## null 1
+## 19   1
+## 30   1
+# $example off:dataframe_operations$
+
+
+# Create a DataFrame from json file
+path <- file.path(Sys.getenv("SPARK_HOME"), "examples/src/main/resources/people.json")
--- End diff --

This is great, I'll do it.





[GitHub] spark pull request #14082: [SPARK-16381][SQL][SparkR] Update SQL examples an...

2016-07-08 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14082#discussion_r70151194
  
--- Diff: examples/src/main/r/RSparkSQLExample.R ---
@@ -0,0 +1,197 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+library(SparkR)
+
+# $example on:init_session$
+sparkR.session(appName='MyApp', sparkConfig=list(spark.executor.memory="1g"))
--- End diff --

I think it'd be great if you could run lint-r on this. Typically, our R
style would be something like this:
```
sparkR.session(appName = "MyApp", sparkConfig = list(spark.executor.memory = "1g"))
```
- you might want to use `"` throughout, to be consistent.





[GitHub] spark pull request #14082: [SPARK-16381][SQL][SparkR] Update SQL examples an...

2016-07-08 Thread keypointt
Github user keypointt commented on a diff in the pull request:

https://github.com/apache/spark/pull/14082#discussion_r70153229
  
--- Diff: examples/src/main/r/RSparkSQLExample.R ---
@@ -0,0 +1,197 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+library(SparkR)
+
+# $example on:init_session$
+sparkR.session(appName='MyApp', sparkConfig=list(spark.executor.memory="1g"))
--- End diff --

I just found some inconsistencies in the existing examples, like the ones
below, and I'll follow the style you suggested.

no space: 
https://github.com/keypointt/spark/blob/d5b0b7f111a28c63ca6e501ff0017af64881f0b4/examples/src/main/r/ml.R#L25

with space: 
https://github.com/keypointt/spark/blob/d5b0b7f111a28c63ca6e501ff0017af64881f0b4/examples/src/main/r/ml.R#L34







[GitHub] spark pull request #14082: [SPARK-16381][SQL][SparkR] Update SQL examples an...

2016-07-08 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/14082#discussion_r70154681
  
--- Diff: examples/src/main/r/RSparkSQLExample.R ---
@@ -0,0 +1,197 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+library(SparkR)
+
+# $example on:init_session$
+sparkR.session(appName='MyApp', sparkConfig=list(spark.executor.memory="1g"))
--- End diff --

We should probably update those too.






[GitHub] spark pull request #14082: [SPARK-16381][SQL][SparkR] Update SQL examples an...

2016-07-08 Thread keypointt
Github user keypointt commented on a diff in the pull request:

https://github.com/apache/spark/pull/14082#discussion_r70155854
  
--- Diff: examples/src/main/r/RSparkSQLExample.R ---
@@ -0,0 +1,197 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+library(SparkR)
+
+# $example on:init_session$
+sparkR.session(appName='MyApp', sparkConfig=list(spark.executor.memory="1g"))
--- End diff --

OK, I'll just submit a quick minor patch.





[GitHub] spark pull request #14082: [SPARK-16381][SQL][SparkR] Update SQL examples an...

2016-07-11 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/14082

