[GitHub] spark pull request #18363: [Spark-21123][Docs][Structured Streaming] Options...

2017-06-20 Thread assafmendelson
Github user assafmendelson closed the pull request at:

https://github.com/apache/spark/pull/18363


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18363: [Spark-21123][Docs][Structured Streaming] Options...

2017-06-20 Thread assafmendelson
GitHub user assafmendelson opened a pull request:

https://github.com/apache/spark/pull/18363

[Spark-21123][Docs][Structured Streaming] Options for file stream source 
are in a wrong table - version to fix 2.1

## What changes were proposed in this pull request?

The descriptions of several File Source options for Structured Streaming 
appeared under the File Sink description instead.

This commit continues PR #18342 and targets the documentation fixes for 
Spark version 2.1.

## How was this patch tested?

Built the documentation with `SKIP_API=1 jekyll build` and visually inspected 
the Structured Streaming programming guide.

@zsxwing This is the PR to fix version 2.1 as discussed in PR #18342 


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/assafmendelson/spark spark-21123-for-spark2.1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18363.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18363


commit 595ffd7f029f65f8741d554b81ce0e1f0f66c322
Author: assafmendelson <assaf.mendel...@gmail.com>
Date:   2017-06-20T14:25:12Z

File source options for spark 2.1 appeared under File sink







[GitHub] spark issue #18342: [Spark-21123][Docs][Structured Streaming] Options for fi...

2017-06-20 Thread assafmendelson
Github user assafmendelson commented on the issue:

https://github.com/apache/spark/pull/18342
  
@zsxwing My jira account is assaf.mendelson.





[GitHub] spark pull request #18342: [Spark 21123][Docs][Structured Streaming] Options...

2017-06-18 Thread assafmendelson
GitHub user assafmendelson opened a pull request:

https://github.com/apache/spark/pull/18342

[Spark 21123][Docs][Structured Streaming] Options for file stream source 
are in a wrong table

## What changes were proposed in this pull request?

The descriptions of several File Source options for Structured Streaming 
appeared under the File Sink description instead.

This pull request has two commits: the first fixes the documentation as it 
appeared in Spark 2.1, and the second handles an additional option added in 
Spark 2.2.

## How was this patch tested?

Built the documentation with `SKIP_API=1 jekyll build` and visually inspected 
the Structured Streaming programming guide.

The original documentation was written by @tdas and @lw-lin 


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/assafmendelson/spark spark-21123

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18342.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18342


commit f8044581b35eaa33aa78c29c602fedf1d89c06b8
Author: assafmendelson <assaf.mendel...@gmail.com>
Date:   2017-06-18T06:20:50Z

File source options for spark 2.1 appeared under File sink

commit 13ff475f42f22f4bdee4b982c217feb0c8825d57
Author: assafmendelson <assaf.mendel...@gmail.com>
Date:   2017-06-18T06:23:31Z

Additional File source options for spark 2.2 appeared under File sink







[GitHub] spark pull request #16329: [SPARK-16046][DOCS] Aggregations in the Spark SQL...

2016-12-19 Thread assafmendelson
Github user assafmendelson commented on a diff in the pull request:

https://github.com/apache/spark/pull/16329#discussion_r93025608
  
--- Diff: examples/src/main/scala/org/apache/spark/examples/sql/UserDefinedUntypedAggregation.scala ---
@@ -0,0 +1,97 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.examples.sql
+
+// $example on:untyped_custom_aggregation$
+import org.apache.spark.sql.expressions.MutableAggregationBuffer
+import org.apache.spark.sql.expressions.UserDefinedAggregateFunction
+import org.apache.spark.sql.types._
+import org.apache.spark.sql.Row
+import org.apache.spark.sql.SparkSession
+// $example off:untyped_custom_aggregation$
+
+object UserDefinedUntypedAggregation {
+
+  // $example on:untyped_custom_aggregation$
+  object MyAverage extends UserDefinedAggregateFunction {
+// Data types of input arguments
+def inputSchema: StructType = StructType(StructField("salary", LongType) :: Nil)
--- End diff --

I would go with inputColumn. 
What should be explained more strongly is that this is the schema of the 
aggregate function's input, not of the source DataFrame. Someone might 
otherwise think their original DataFrame needs a column with this name.
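To illustrate the point, here is a small Spark-free sketch (FakeRow and SchemaNameDemo below are hypothetical stand-ins, not Spark API): the name declared in inputSchema belongs to the UDAF's own input schema, and values are read positionally, so the source DataFrame's column does not need to be called "salary".

```scala
// Hypothetical stand-in for Spark's Row: values are read by position,
// not by the name declared in the UDAF's inputSchema.
final case class FakeRow(values: Seq[Any]) {
  def getLong(i: Int): Long = values(i).asInstanceOf[Long]
}

object SchemaNameDemo {
  // The UDAF declares StructField("salary", LongType), but at update() time
  // it only reads input.getLong(0); the name "salary" never has to match a
  // column name in the source DataFrame.
  def readFirstLong(input: FakeRow): Long = input.getLong(0)
}
```

Spark binds whatever column expression is passed to the UDAF call to position 0 of this input row, regardless of its name in the source DataFrame.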





[GitHub] spark pull request #16329: [SPARK-16046][DOCS] Aggregations in the Spark SQL...

2016-12-19 Thread assafmendelson
Github user assafmendelson commented on a diff in the pull request:

https://github.com/apache/spark/pull/16329#discussion_r92990221
  
--- Diff: examples/src/main/scala/org/apache/spark/examples/sql/UserDefinedUntypedAggregation.scala ---
@@ -0,0 +1,97 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.examples.sql
+
+// $example on:untyped_custom_aggregation$
+import org.apache.spark.sql.expressions.MutableAggregationBuffer
+import org.apache.spark.sql.expressions.UserDefinedAggregateFunction
+import org.apache.spark.sql.types._
+import org.apache.spark.sql.Row
+import org.apache.spark.sql.SparkSession
+// $example off:untyped_custom_aggregation$
+
+object UserDefinedUntypedAggregation {
+
+  // $example on:untyped_custom_aggregation$
+  object MyAverage extends UserDefinedAggregateFunction {
+// Data types of input arguments
+def inputSchema: StructType = StructType(StructField("salary", LongType) :: Nil)
+// Data types of values in the aggregation buffer
+def bufferSchema: StructType = {
+  StructType(StructField("sum", LongType) :: StructField("count", LongType) :: Nil)
+}
+// The data type of the returned value
+def dataType: DataType = DoubleType
+// Whether this function always returns the same output on the identical input
+def deterministic: Boolean = true
+// Initializes the given aggregation buffer
+def initialize(buffer: MutableAggregationBuffer): Unit = {
--- End diff --

I believe an explanation of what MutableAggregationBuffer is should be 
added: how to access it, and what it means for it to be mutable (including 
that array and map values remain immutable even though the buffer itself is 
mutable), etc.
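The buffer semantics in question can be sketched without Spark (the class below is a hypothetical illustration of the (sum, count) buffer, not Spark's MutableAggregationBuffer API): the buffer is mutated in place as each input row arrives in update(), partition-local buffers are combined in merge(), and the final value is read in evaluate().

```scala
// Spark-free sketch of the MyAverage buffer life cycle. In the real UDAF,
// sum and count live at buffer(0) and buffer(1) of a MutableAggregationBuffer.
final class AverageBuffer {
  var sum: Long = 0L    // corresponds to buffer(0)
  var count: Long = 0L  // corresponds to buffer(1)

  // initialize(): reset the buffer before aggregation starts
  def initialize(): Unit = { sum = 0L; count = 0L }

  // update(): fold one input value (a salary) into the buffer, in place
  def update(salary: Long): Unit = { sum += salary; count += 1 }

  // merge(): combine a buffer computed on another partition into this one
  def merge(other: AverageBuffer): Unit = { sum += other.sum; count += other.count }

  // evaluate(): produce the final result from the buffer state
  def evaluate: Double = sum.toDouble / count
}
```

The in-place mutation is the key point: update() and merge() return Unit and modify the buffer directly, which is exactly what "mutable" means here.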





[GitHub] spark pull request #16329: [SPARK-16046][DOCS] Aggregations in the Spark SQL...

2016-12-19 Thread assafmendelson
Github user assafmendelson commented on a diff in the pull request:

https://github.com/apache/spark/pull/16329#discussion_r92989437
  
--- Diff: examples/src/main/scala/org/apache/spark/examples/sql/UserDefinedUntypedAggregation.scala ---
@@ -0,0 +1,97 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.examples.sql
+
+// $example on:untyped_custom_aggregation$
+import org.apache.spark.sql.expressions.MutableAggregationBuffer
+import org.apache.spark.sql.expressions.UserDefinedAggregateFunction
+import org.apache.spark.sql.types._
+import org.apache.spark.sql.Row
+import org.apache.spark.sql.SparkSession
+// $example off:untyped_custom_aggregation$
+
+object UserDefinedUntypedAggregation {
+
+  // $example on:untyped_custom_aggregation$
+  object MyAverage extends UserDefinedAggregateFunction {
+// Data types of input arguments
+def inputSchema: StructType = StructType(StructField("salary", LongType) :: Nil)
--- End diff --

Maybe add a little explanation here. For example, when I first saw this I 
tried to figure out where "salary" appears in the rest of the code, since in 
practice the value is only ever accessed by index (input.getLong(0)). 

