[GitHub] spark pull request: [SPARK-15308][SQL] RowEncoder should preserve ...

2016-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13090#issuecomment-218958319
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58539/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-15308][SQL] RowEncoder should preserve ...

2016-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13090#issuecomment-218958318
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-15308][SQL] RowEncoder should preserve ...

2016-05-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13090#issuecomment-218958212
  
**[Test build #58539 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58539/consoleFull)** for PR 13090 at commit [`ddb9ce6`](https://github.com/apache/spark/commit/ddb9ce6692d9b70b7656196191d3572696ee0a12).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14541][SQL] Support IFNULL, NULLIF, NVL...

2016-05-12 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/13084#discussion_r63136060
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala
 ---
@@ -222,6 +222,33 @@ trait Unevaluable extends Expression {
 
 
 /**
+ * An expression that gets replaced at runtime (currently by the 
optimizer) into a different
+ * expression for evaluation. This is mainly used to provide compatibility 
with other databases.
+ * For example, we use this to support "nvl" by replacing it with 
"coalesce".
+ */
+trait RuntimeReplaceable extends Unevaluable {
--- End diff --

Yea.
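
For readers of the archive, here is a minimal, self-contained sketch of the pattern the quoted doc comment describes; the toy types and the `replacement`/`replaceForEvaluation` names below are illustrative, since the trait's actual members are not shown in the quoted diff:

```scala
object RuntimeReplaceableSketch {
  // Toy expression tree, for illustration only.
  sealed trait Expr
  case class Attr(name: String) extends Expr
  case class Coalesce(children: Seq[Expr]) extends Expr

  // An expression that is never evaluated directly; before execution the
  // optimizer swaps it for an equivalent expression.
  trait Replaceable extends Expr {
    def replacement: Expr
  }

  // "nvl(a, b)" exists only for compatibility and evaluates as "coalesce(a, b)".
  case class Nvl(left: Expr, right: Expr) extends Replaceable {
    def replacement: Expr = Coalesce(Seq(left, right))
  }

  // A toy optimizer rule that performs the replacement bottom-up.
  def replaceForEvaluation(e: Expr): Expr = e match {
    case r: Replaceable     => replaceForEvaluation(r.replacement)
    case Coalesce(children) => Coalesce(children.map(replaceForEvaluation))
    case other              => other
  }

  def main(args: Array[String]): Unit = {
    // Prints: Coalesce(List(Attr(a), Attr(b)))
    println(replaceForEvaluation(Nvl(Attr("a"), Attr("b"))))
  }
}
```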





[GitHub] spark pull request: [spark-15212][SQL]CSV file reader when read fi...

2016-05-12 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request:

https://github.com/apache/spark/pull/12987#issuecomment-218958088
  
@WeichenXu123 I tried that with `ignoreLeadingWhiteSpace` and `ignoreTrailingWhiteSpace` and it seems to work fine. I am cautious about saying this because I am not a committer, but personally I would suggest closing this.
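
For reference, a minimal sketch of the two reader options being referred to (the input path is a placeholder):

```scala
import org.apache.spark.sql.SparkSession

object CsvWhitespaceCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("csv-whitespace")
      .master("local[*]")
      .getOrCreate()

    // Trim whitespace around CSV fields at read time instead of post-processing.
    val df = spark.read
      .option("header", "true")
      .option("ignoreLeadingWhiteSpace", "true")
      .option("ignoreTrailingWhiteSpace", "true")
      .csv("/path/to/input.csv") // placeholder path

    df.show()
    spark.stop()
  }
}
```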





[GitHub] spark pull request: [SPARK-15253] [SQL] Support old table schema c...

2016-05-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13073#issuecomment-218957944
  
**[Test build #58545 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58545/consoleFull)** for PR 13073 at commit [`f82f7c6`](https://github.com/apache/spark/commit/f82f7c609d3ba69580c4531f348dd6d87f804c2c).





[GitHub] spark pull request: [SPARK-15310][SQL] Rename HiveTypeCoercion -> ...

2016-05-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13091#issuecomment-218956881
  
**[Test build #58544 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58544/consoleFull)** for PR 13091 at commit [`06aa4ae`](https://github.com/apache/spark/commit/06aa4aefb1fe47f552275edf0417e5644820).





[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12268#issuecomment-218956629
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58538/
Test PASSed.





[GitHub] spark pull request: [SPARK-15267][SQL] Refactor options for JDBC a...

2016-05-12 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/13048#discussion_r63135350
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcOptions.scala ---
@@ -0,0 +1,67 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.orc
+
+import org.apache.hadoop.conf.Configuration
+
+/**
+ * Options for the ORC data source.
+ */
+private[orc] class OrcOptions(
+@transient private val parameters: Map[String, String],
+@transient private val conf: Configuration)
+  extends Serializable {
+
+  import OrcOptions._
+
+  /**
+   * Compression codec to use. By default use the value specified in 
Hadoop configuration.
+   * Acceptable values are defined in [[shortOrcCompressionCodecNames]].
+   */
+  val compressionCodec: String = {
+val default = conf.get(ORC_COMPRESSION, "SNAPPY")
+
+// Because the ORC configuration value in `default` is not guaranteed 
to be the same
+// with keys in `shortOrcCompressionCodecNames` in Spark, this value 
should not be
+// used as the key for `shortOrcCompressionCodecNames` but just a 
return value.
+parameters.get("compression") match {
--- End diff --

Thank you so much.





[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12268#issuecomment-218956628
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-05-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12268#issuecomment-218956508
  
**[Test build #58538 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58538/consoleFull)** for PR 12268 at commit [`cbb1674`](https://github.com/apache/spark/commit/cbb1674ecb4a82bfdb3fed97cdd14adbdd14ffb6).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-15267][SQL] Refactor options for JDBC a...

2016-05-12 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/13048#discussion_r63135275
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcOptions.scala ---
@@ -0,0 +1,67 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.orc
+
+import org.apache.hadoop.conf.Configuration
+
+/**
+ * Options for the ORC data source.
+ */
+private[orc] class OrcOptions(
+@transient private val parameters: Map[String, String],
+@transient private val conf: Configuration)
+  extends Serializable {
+
+  import OrcOptions._
+
+  /**
+   * Compression codec to use. By default use the value specified in 
Hadoop configuration.
+   * Acceptable values are defined in [[shortOrcCompressionCodecNames]].
+   */
+  val compressionCodec: String = {
+val default = conf.get(ORC_COMPRESSION, "SNAPPY")
+
+// Because the ORC configuration value in `default` is not guaranteed 
to be the same
+// with keys in `shortOrcCompressionCodecNames` in Spark, this value 
should not be
+// used as the key for `shortOrcCompressionCodecNames` but just a 
return value.
+parameters.get("compression") match {
--- End diff --

Yea, I think it's best to remove the Hadoop config dependency and just depend on parameters, defaulting to snappy.
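
A rough sketch of that suggestion, reusing the `compression` option and snappy default from the discussion; the short-name map below is illustrative, not the merged code:

```scala
// Resolve the codec purely from the data source options, defaulting to snappy,
// with no Hadoop Configuration lookup. The map contents are illustrative.
val shortOrcCompressionCodecNames: Map[String, String] =
  Map("none" -> "NONE", "snappy" -> "SNAPPY", "zlib" -> "ZLIB", "lzo" -> "LZO")

def resolveOrcCompression(parameters: Map[String, String]): String = {
  val codecName = parameters.getOrElse("compression", "snappy").toLowerCase
  shortOrcCompressionCodecNames.getOrElse(
    codecName,
    throw new IllegalArgumentException(
      s"Codec [$codecName] is not available. " +
        s"Known codecs are ${shortOrcCompressionCodecNames.keys.mkString(", ")}."))
}
```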






[GitHub] spark pull request: [SPARK-15305][ML][DOC]:spark.ml document Bisec...

2016-05-12 Thread wangmiao1981
Github user wangmiao1981 commented on the pull request:

https://github.com/apache/spark/pull/13083#issuecomment-218955997
  
cc @zhengruifeng @yanboliang 





[GitHub] spark pull request: [SPARK-13866][SQL] Handle decimal type in CSV ...

2016-05-12 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/11724





[GitHub] spark pull request: [SPARK-13866][SQL] Handle decimal type in CSV ...

2016-05-12 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/11724#issuecomment-218955594
  
LGTM





[GitHub] spark pull request: [SPARK-13866][SQL] Handle decimal type in CSV ...

2016-05-12 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/11724#issuecomment-218955604
  
Merging this into master and 2.0, thanks!





[GitHub] spark pull request: [SPARK-15236][SQL][SPARK SHELL] Add spark-defa...

2016-05-12 Thread xwu0226
Github user xwu0226 commented on a diff in the pull request:

https://github.com/apache/spark/pull/13088#discussion_r63134583
  
--- Diff: repl/scala-2.11/src/main/scala/org/apache/spark/repl/Main.scala 
---
@@ -88,7 +88,8 @@ object Main extends Logging {
 }
 
 val builder = SparkSession.builder.config(conf)
-if (SparkSession.hiveClassesArePresent) {
+if (conf.getBoolean("spark.user.hive.catalog", true)
--- End diff --

Right now, at the repl/Main.scala level, the way to create a SparkSession with the Hive catalog is to check `SparkSession.hiveClassesArePresent`, which looks for the `HiveSharedState` and `HiveSessionState` classes. If those classes are on the classpath, the repl will always start a SparkSession that uses the Hive catalog. In order for the repl to use InMemoryCatalog, we need to take the else branch of the following code:
```
if (conf.getBoolean("spark.use.hive.catalog", true)
  && SparkSession.hiveClassesArePresent) {
  sparkSession = builder.enableHiveSupport().getOrCreate()
  logInfo("Created Spark session with Hive support")
} else {
  sparkSession = builder.getOrCreate()
  logInfo("Created Spark session")
}
```
I think the default value for `CATALOG_IMPLEMENTATION` is `in-memory`. But maybe I can put the key `spark.sql.catalogImplementation` in spark-defaults.conf?
Thanks!
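
For context, a hedged sketch of that alternative: select the catalog through the existing `spark.sql.catalogImplementation` setting instead of a new flag. The builder usage and config line below are illustrative:

```scala
import org.apache.spark.sql.SparkSession

// Equivalent spark-defaults.conf entry (illustrative):
//   spark.sql.catalogImplementation   in-memory
// or on the command line: --conf spark.sql.catalogImplementation=in-memory
val spark = SparkSession.builder()
  .config("spark.sql.catalogImplementation", "in-memory") // or "hive"
  .getOrCreate()
```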





[GitHub] spark pull request: [SPARK-15267][SQL] Refactor options for JDBC a...

2016-05-12 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/13048#discussion_r63134568
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcOptions.scala ---
@@ -0,0 +1,67 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.orc
+
+import org.apache.hadoop.conf.Configuration
+
+/**
+ * Options for the ORC data source.
+ */
+private[orc] class OrcOptions(
+@transient private val parameters: Map[String, String],
+@transient private val conf: Configuration)
+  extends Serializable {
+
+  import OrcOptions._
+
+  /**
+   * Compression codec to use. By default use the value specified in 
Hadoop configuration.
+   * Acceptable values are defined in [[shortOrcCompressionCodecNames]].
+   */
+  val compressionCodec: String = {
+val default = conf.get(ORC_COMPRESSION, "SNAPPY")
+
+// Because the ORC configuration value in `default` is not guaranteed 
to be the same
+// with keys in `shortOrcCompressionCodecNames` in Spark, this value 
should not be
+// used as the key for `shortOrcCompressionCodecNames` but just a 
return value.
+parameters.get("compression") match {
--- End diff --

Hm.. sorry, I think I got confused. So do you mean to leave this part as it is and make the default compression value the string `SNAPPY`, instead of reading the default value from the Hadoop configuration?





[GitHub] spark pull request: [SPARK-14541][SQL] Support IFNULL, NULLIF, NVL...

2016-05-12 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/13084





[GitHub] spark pull request: [SPARK-14541] [SQL] SQL function: IFNULL, NULL...

2016-05-12 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/12373





[GitHub] spark pull request: [SPARK-14541][SQL] Support IFNULL, NULLIF, NVL...

2016-05-12 Thread yhuai
Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/13084#issuecomment-218953966
  
Merging to master and branch 2.0.





[GitHub] spark pull request: [SPARK-14541][SQL] Support IFNULL, NULLIF, NVL...

2016-05-12 Thread yhuai
Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/13084#issuecomment-218953875
  
LGTM





[GitHub] spark pull request: [SPARK-15267][SQL] Refactor options for JDBC a...

2016-05-12 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/13048#discussion_r63134045
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcOptions.scala ---
@@ -0,0 +1,67 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.orc
+
+import org.apache.hadoop.conf.Configuration
+
+/**
+ * Options for the ORC data source.
+ */
+private[orc] class OrcOptions(
+@transient private val parameters: Map[String, String],
+@transient private val conf: Configuration)
+  extends Serializable {
+
+  import OrcOptions._
+
+  /**
+   * Compression codec to use. By default use the value specified in 
Hadoop configuration.
+   * Acceptable values are defined in [[shortOrcCompressionCodecNames]].
+   */
+  val compressionCodec: String = {
+val default = conf.get(ORC_COMPRESSION, "SNAPPY")
--- End diff --

@rxin Thank you for letting me know. Then I will update this to read `compression` from the parameters.





[GitHub] spark pull request: [SPARK-15267][SQL] Refactor options for JDBC a...

2016-05-12 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/13048#discussion_r63133864
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcOptions.scala ---
@@ -0,0 +1,67 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.orc
+
+import org.apache.hadoop.conf.Configuration
+
+/**
+ * Options for the ORC data source.
+ */
+private[orc] class OrcOptions(
+@transient private val parameters: Map[String, String],
+@transient private val conf: Configuration)
+  extends Serializable {
+
+  import OrcOptions._
+
+  /**
+   * Compression codec to use. By default use the value specified in 
Hadoop configuration.
+   * Acceptable values are defined in [[shortOrcCompressionCodecNames]].
+   */
+  val compressionCodec: String = {
+val default = conf.get(ORC_COMPRESSION, "SNAPPY")
--- End diff --

I think over time it'd be better to switch entirely over to Spark's own configuration and propagate values to the Hadoop Configuration, not the other way around ...
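
A small sketch of that direction, treating the data source option as the source of truth and pushing it into the Hadoop `Configuration` used for the write rather than reading defaults out of it; the `orc.compress` property name is assumed here for illustration:

```scala
import org.apache.hadoop.conf.Configuration

// Resolve from the data source options first, then propagate to Hadoop.
def applyOrcCompression(parameters: Map[String, String], hadoopConf: Configuration): Unit = {
  val codec = parameters.getOrElse("compression", "snappy").toUpperCase
  hadoopConf.set("orc.compress", codec) // assumed property name, for illustration
}
```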






[GitHub] spark pull request: [SPARK-15267][SQL] Refactor options for JDBC a...

2016-05-12 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/13048#discussion_r63133871
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcOptions.scala ---
@@ -0,0 +1,67 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.orc
+
+import org.apache.hadoop.conf.Configuration
+
+/**
+ * Options for the ORC data source.
+ */
+private[orc] class OrcOptions(
+@transient private val parameters: Map[String, String],
+@transient private val conf: Configuration)
+  extends Serializable {
+
+  import OrcOptions._
+
+  /**
+   * Compression codec to use. By default use the value specified in 
Hadoop configuration.
+   * Acceptable values are defined in [[shortOrcCompressionCodecNames]].
+   */
+  val compressionCodec: String = {
+val default = conf.get(ORC_COMPRESSION, "SNAPPY")
--- End diff --

Also, I'd just call it `compression` rather than `orc.compression`?






[GitHub] spark pull request: [SPARK-15267][SQL] Refactor options for JDBC a...

2016-05-12 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/13048#discussion_r63133772
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcOptions.scala ---
@@ -0,0 +1,67 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.orc
+
+import org.apache.hadoop.conf.Configuration
+
+/**
+ * Options for the ORC data source.
+ */
+private[orc] class OrcOptions(
+@transient private val parameters: Map[String, String],
+@transient private val conf: Configuration)
+  extends Serializable {
+
+  import OrcOptions._
+
+  /**
+   * Compression codec to use. By default use the value specified in 
Hadoop configuration.
+   * Acceptable values are defined in [[shortOrcCompressionCodecNames]].
+   */
+  val compressionCodec: String = {
+val default = conf.get(ORC_COMPRESSION, "SNAPPY")
--- End diff --

Ah... Isn't it still possible to set the Hadoop configuration via `spark.sessionState.newHadoopConf()` though? I thought it was safe to use this.





[GitHub] spark pull request: [SPARK-15309] Bump master to version 2.1.0-SNA...

2016-05-12 Thread lresende
Github user lresende closed the pull request at:

https://github.com/apache/spark/pull/13092





[GitHub] spark pull request: [SPARK-15311] [SQL] Disallow DML on Non-tempor...

2016-05-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13093#issuecomment-218952559
  
**[Test build #58543 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58543/consoleFull)** for PR 13093 at commit [`5c3b116`](https://github.com/apache/spark/commit/5c3b11621ead9402602b6eaf7531990c34d3ed31).





[GitHub] spark pull request: [SPARK-14541][SQL] Support IFNULL, NULLIF, NVL...

2016-05-12 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/13084#discussion_r63133598
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala
 ---
@@ -222,6 +222,33 @@ trait Unevaluable extends Expression {
 
 
 /**
+ * An expression that gets replaced at runtime (currently by the 
optimizer) into a different
+ * expression for evaluation. This is mainly used to provide compatibility 
with other databases.
+ * For example, we use this to support "nvl" by replacing it with 
"coalesce".
+ */
+trait RuntimeReplaceable extends Unevaluable {
--- End diff --

This is very cool!





[GitHub] spark pull request: [SPARK-15236][SQL][SPARK SHELL] Add spark-defa...

2016-05-12 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/13088#discussion_r63133610
  
--- Diff: repl/scala-2.11/src/main/scala/org/apache/spark/repl/Main.scala 
---
@@ -88,7 +88,8 @@ object Main extends Logging {
 }
 
 val builder = SparkSession.builder.config(conf)
-if (SparkSession.hiveClassesArePresent) {
+if (conf.getBoolean("spark.user.hive.catalog", true)
--- End diff --

Why do we need this? Isn't `enableHiveSupport` the same as setting the config `spark.sql.catalogImplementation`?





[GitHub] spark pull request: [SPARK-15309] Bump master to version 2.1.0-SNA...

2016-05-12 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/13092#issuecomment-218952186
  
It's better to do this once we have a 2.0 Maven artifact published, I think. Otherwise MiMa is going to have duplicates, and as we make API tweaks during the QA period we will have to fight conflicts. Can you close this?






[GitHub] spark pull request: [SPARK-15309] Bump master to version 2.1.0-SNA...

2016-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13092#issuecomment-218952073
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58541/
Test FAILed.





[GitHub] spark pull request: [SPARK-15311] [SQL] Disallow DML on Non-tempor...

2016-05-12 Thread gatorsmile
GitHub user gatorsmile opened a pull request:

https://github.com/apache/spark/pull/13093

[SPARK-15311] [SQL] Disallow DML on Non-temporary Tables when Using 
In-Memory Catalog

## What changes were proposed in this pull request?
So far, when using the in-memory catalog, we allow DDL operations on non-temporary tables. However, the corresponding DML operations are not supported. This PR issues appropriate exceptions in that case.

Another option is to disallow users from doing DDL operations on non-temporary tables. @rxin @andrewor14 @yhuai @cloud-fan @liancheng Let me know if you want to do it that way. If so, we would have to disable a lot of existing test cases.
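
Purely as an illustration of the intended behavior (not taken from the PR's tests), the scenario is roughly:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .config("spark.sql.catalogImplementation", "in-memory")
  .getOrCreate()

// DDL on a non-temporary table is allowed with the in-memory catalog...
spark.sql("CREATE TABLE t (a INT, b STRING)")
// ...but the corresponding DML is not supported, which this PR surfaces as an exception.
spark.sql("INSERT INTO TABLE t SELECT 1, 'x'")
```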

## How was this patch tested?
Added test cases in DDLSuite.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gatorsmile/spark selectAfterCreate

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13093.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13093


commit 5c3b11621ead9402602b6eaf7531990c34d3ed31
Author: gatorsmile 
Date:   2016-05-13T04:47:59Z

issue exceptions for non-temporary tables when the catalog is using 
InMemoryCatalog







[GitHub] spark pull request: [SPARK-15309] Bump master to version 2.1.0-SNA...

2016-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13092#issuecomment-218952070
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-15309] Bump master to version 2.1.0-SNA...

2016-05-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13092#issuecomment-218952061
  
**[Test build #58541 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58541/consoleFull)** for PR 13092 at commit [`92dddf5`](https://github.com/apache/spark/commit/92dddf53c04e07f9f78d1bc2b9c1c3cda4c86eff).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-15310][SQL] Rename HiveTypeCoercion -> ...

2016-05-12 Thread yhuai
Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/13091#issuecomment-218952009
  
LGTM





[GitHub] spark pull request: [SPARK-13866][SQL] Handle decimal type in CSV ...

2016-05-12 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/11724#issuecomment-218951716
  
cc @davies can you review this?






[GitHub] spark pull request: [SPARK-15309] Bump master to version 2.1.0-SNA...

2016-05-12 Thread lresende
Github user lresende commented on the pull request:

https://github.com/apache/spark/pull/13092#issuecomment-218951567
  
@srowen Please review.





[GitHub] spark pull request: [SPARK-15310][SQL] Rename HiveTypeCoercion -> ...

2016-05-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13091#issuecomment-218951119
  
**[Test build #58542 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58542/consoleFull)** for PR 13091 at commit [`7424af4`](https://github.com/apache/spark/commit/7424af4bcfbad5a9490321b08425739dfaa2ca67).





[GitHub] spark pull request: [SPARK-15309] Bump master to version 2.1.0-SNA...

2016-05-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13092#issuecomment-218951117
  
**[Test build #58541 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58541/consoleFull)** for PR 13092 at commit [`92dddf5`](https://github.com/apache/spark/commit/92dddf53c04e07f9f78d1bc2b9c1c3cda4c86eff).





[GitHub] spark pull request: [SPARK-15309] Bump master to version 2.1.0-SNA...

2016-05-12 Thread lresende
GitHub user lresende opened a pull request:

https://github.com/apache/spark/pull/13092

[SPARK-15309] Bump master to version 2.1.0-SNAPSHOT

## What changes were proposed in this pull request?

Update the POM artifact version to 2.1.0-SNAPSHOT to avoid any conflicts with the 2.0.0-SNAPSHOT branch.

## How was this patch tested?

Regular build.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lresende/spark SPARK-15309

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13092.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13092


commit 92dddf53c04e07f9f78d1bc2b9c1c3cda4c86eff
Author: Luciano Resende 
Date:   2016-05-13T04:40:16Z

[SPARK-15309] Bump master to version 2.1.0-SNAPSHOT







[GitHub] spark pull request: [TRIVIAL][Doc] SparkSession class doc example ...

2016-05-12 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/13086#issuecomment-218950905
  
Maybe we should add `()` to the `builder` function instead? In that case the code will work in Java too.
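
A small sketch of the point, assuming `builder` is declared with an empty parameter list as suggested; the parenthesized call compiles in Scala and is the only form available to Java callers, so the class-doc example would then work in both languages:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()  // the same call shape works from Java:
  .appName("doc-example")           //   SparkSession.builder().appName("doc-example").getOrCreate();
  .getOrCreate()
```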






[GitHub] spark pull request: [SPARK-15310][SQL] Rename HiveTypeCoercion -> ...

2016-05-12 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/13091#issuecomment-218950766
  
cc @yhuai for review; talked with @marmbrus offline already.






[GitHub] spark pull request: [SPARK-15310][SQL] Rename HiveTypeCoercion -> ...

2016-05-12 Thread rxin
GitHub user rxin opened a pull request:

https://github.com/apache/spark/pull/13091

[SPARK-15310][SQL] Rename HiveTypeCoercion -> TypeCoercion

## What changes were proposed in this pull request?
We originally designed the type coercion rules to match Hive, but over time 
we have diverged. It does not make sense to call it HiveTypeCoercion anymore. 
This patch renames it TypeCoercion.

## How was this patch tested?
Updated unit tests to reflect the rename.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rxin/spark SPARK-15310

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13091.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13091


commit 7424af4bcfbad5a9490321b08425739dfaa2ca67
Author: Reynold Xin 
Date:   2016-05-13T04:42:29Z

[SPARK-15310][SQL] Rename HiveTypeCoercion -> TypeCoercion







[GitHub] spark pull request: [SPARK-14939][SQL] Add FoldablePropagation opt...

2016-05-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12719#issuecomment-218950690
  
**[Test build #58540 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58540/consoleFull)** for PR 12719 at commit [`2d56bc2`](https://github.com/apache/spark/commit/2d56bc27b1091ce37103f7427332841c2b996003).





[GitHub] spark pull request: [MINOR] Fix Typos

2016-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13078#issuecomment-218950522
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58537/
Test PASSed.





[GitHub] spark pull request: [MINOR] Fix Typos

2016-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13078#issuecomment-218950521
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-14939][SQL] Add FoldablePropagation opt...

2016-05-12 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the pull request:

https://github.com/apache/spark/pull/12719#issuecomment-218950477
  
Rebase to resolve conflicts.





[GitHub] spark pull request: [SPARK-15267][SQL] Refactor options for JDBC a...

2016-05-12 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/13048#discussion_r63132718
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcOptions.scala ---
@@ -0,0 +1,67 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.orc
+
+import org.apache.hadoop.conf.Configuration
+
+/**
+ * Options for the ORC data source.
+ */
+private[orc] class OrcOptions(
+@transient private val parameters: Map[String, String],
+@transient private val conf: Configuration)
+  extends Serializable {
+
+  import OrcOptions._
+
+  /**
+   * Compression codec to use. By default use the value specified in 
Hadoop configuration.
+   * Acceptable values are defined in [[shortOrcCompressionCodecNames]].
+   */
+  val compressionCodec: String = {
+val default = conf.get(ORC_COMPRESSION, "SNAPPY")
--- End diff --

Hmm, why are we reading from the Hadoop configuration? Shouldn't we just read from the parameters?





[GitHub] spark pull request: [MINOR] Fix Typos

2016-05-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13078#issuecomment-218950439
  
**[Test build #58537 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58537/consoleFull)**
 for PR 13078 at commit 
[`9b2a5aa`](https://github.com/apache/spark/commit/9b2a5aa9f88cbb35a4843298a7c295297b4ce378).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-15306][SQL] Move object expressions int...

2016-05-12 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/13085





[GitHub] spark pull request: [SPARK-14541][SQL] Support IFNULL, NULLIF, NVL...

2016-05-12 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/13084#issuecomment-218949992
  
cc @hvanhovell wanna review this?






[GitHub] spark pull request: [SPARK-15306][SQL] Move object expressions int...

2016-05-12 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/13085#issuecomment-218950017
  
Merging in master/2.0.






[GitHub] spark pull request: [SPARK-15308][SQL] RowEncoder should preserve ...

2016-05-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13090#issuecomment-218949812
  
**[Test build #58539 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58539/consoleFull)**
 for PR 13090 at commit 
[`ddb9ce6`](https://github.com/apache/spark/commit/ddb9ce6692d9b70b7656196191d3572696ee0a12).





[GitHub] spark pull request: [SPARK-15308][SQL] RowEncoder should preserve ...

2016-05-12 Thread ueshin
GitHub user ueshin opened a pull request:

https://github.com/apache/spark/pull/13090

[SPARK-15308][SQL] RowEncoder should preserve nested column name.

## What changes were proposed in this pull request?

The following code generates a wrong schema:

```
val schema = new StructType().add(
  "struct",
  new StructType()
.add("i", IntegerType, nullable = false)
.add(
  "s",
  new StructType().add("int", IntegerType, nullable = false),
  nullable = false),
  nullable = false)
val ds = sqlContext.range(10).map(l => Row(l, Row(l)))(RowEncoder(schema))
ds.printSchema()
```

This should print as follows:

```
root
 |-- struct: struct (nullable = false)
 ||-- i: integer (nullable = false)
 ||-- s: struct (nullable = false)
 |||-- int: integer (nullable = false)
```

but the result is:

```
 |-- struct: struct (nullable = false)
 ||-- col1: integer (nullable = false)
 ||-- col2: struct (nullable = false)
 |||-- col1: integer (nullable = false)
```

This PR fixes `RowEncoder` to preserve nested column names.

## How was this patch tested?

Existing tests, plus a new test that checks whether `RowEncoder` preserves nested 
column names.
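
As a rough illustration (not the exact test added in this PR), one way to check the fix is to assert that the encoder's schema round-trips the user-specified nested field names:

```scala
// Sketch only: the encoder built from the schema should expose the same nested
// field names ("i", "s", "int") rather than positional names like "col1"/"col2".
import org.apache.spark.sql.catalyst.encoders.RowEncoder
import org.apache.spark.sql.types._

val schema = new StructType().add(
  "struct",
  new StructType()
    .add("i", IntegerType, nullable = false)
    .add("s", new StructType().add("int", IntegerType, nullable = false), nullable = false),
  nullable = false)

val encoder = RowEncoder(schema)
assert(encoder.schema == schema)
```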


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ueshin/apache-spark issues/SPARK-15308

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13090.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13090


commit d4ae79775f2ba0ebd0dd65b52076930244c2be96
Author: Takuya UESHIN 
Date:   2016-05-13T04:02:41Z

Add a test to check if RowEncoder preserves nested column name.

commit ddb9ce6692d9b70b7656196191d3572696ee0a12
Author: Takuya UESHIN 
Date:   2016-05-13T04:08:00Z

Fix RowEncoder to preserve nested column name.







[GitHub] spark pull request: [SPARK-15267][SQL] Refactor options for JDBC a...

2016-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13048#issuecomment-218949638
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-15267][SQL] Refactor options for JDBC a...

2016-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13048#issuecomment-218949640
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58536/
Test PASSed.





[GitHub] spark pull request: [SPARK-15267][SQL] Refactor options for JDBC a...

2016-05-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13048#issuecomment-218949541
  
**[Test build #58536 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58536/consoleFull)**
 for PR 13048 at commit 
[`6fd5f0d`](https://github.com/apache/spark/commit/6fd5f0d03b967d4b91f925859ef9ee7a01b98dfd).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14615][ML] Use the new ML Vector and Ma...

2016-05-12 Thread dbtsai
Github user dbtsai commented on the pull request:

https://github.com/apache/spark/pull/12627#issuecomment-218949085
  
@viirya Can you try to merge my code with yours and see if the Python 
tests pass? Thanks.





[GitHub] spark pull request: [SPARK-15282][SQL] PushDownPredicate should no...

2016-05-12 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the pull request:

https://github.com/apache/spark/pull/13087#issuecomment-218949073
  
Hi, @liancheng and @cloud-fan.
This PR is similar to your commit, `[SPARK-13473][SQL] Don't push 
predicate through project with nondeterministic field(s)`.
This PR prevents pushing a predicate through a project that contains a UDF 
expression.
Could you review this PR?
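
For reference, a rough Scala sketch of the kind of guard being described, assuming it lives in a predicate-pushdown style rule; this is not the actual diff of this PR, and the rule and helper names are made up:

```scala
// Hypothetical sketch: skip pushing a Filter below a Project whose project list
// contains a ScalaUDF, so the UDF is evaluated only once per output row.
import org.apache.spark.sql.catalyst.expressions.ScalaUDF
import org.apache.spark.sql.catalyst.plans.logical.{Filter, LogicalPlan, Project}
import org.apache.spark.sql.catalyst.rules.Rule

object SkipPushdownThroughUDF extends Rule[LogicalPlan] {
  private def hasUDF(p: Project): Boolean =
    p.projectList.exists(_.find(_.isInstanceOf[ScalaUDF]).isDefined)

  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    // Keep the Filter above the Project when a UDF column is computed below it;
    // the usual pushdown rewrite (substituting aliases into the condition) is elided.
    case f @ Filter(_, p: Project) if hasUDF(p) => f
  }
}
```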





[GitHub] spark pull request: [SPARK-15282][SQL] PushDownPredicate should no...

2016-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13087#issuecomment-218948867
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58535/
Test PASSed.





[GitHub] spark pull request: [SPARK-15282][SQL] PushDownPredicate should no...

2016-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13087#issuecomment-218948865
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-15282][SQL] PushDownPredicate should no...

2016-05-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13087#issuecomment-218948774
  
**[Test build #58535 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58535/consoleFull)**
 for PR 13087 at commit 
[`ab5bc4b`](https://github.com/apache/spark/commit/ab5bc4b77ef2fc40684f680cde296f7d91e29aa2).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-05-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12268#issuecomment-218948489
  
**[Test build #58538 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58538/consoleFull)**
 for PR 12268 at commit 
[`cbb1674`](https://github.com/apache/spark/commit/cbb1674ecb4a82bfdb3fed97cdd14adbdd14ffb6).





[GitHub] spark pull request: [SPARK-15282][SQL] PushDownPredicate should no...

2016-05-12 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the pull request:

https://github.com/apache/spark/pull/13087#issuecomment-218948335
  
The reported error scenario is the following.
```scala
scala> val df = sc.parallelize(Seq(("a", "b"), ("a1", 
"b1"))).toDF("old","old1")
scala> val udfFunc = udf((s: String) => {println(s"running udf($s)"); s })
scala> val newDF = df.withColumn("new", udfFunc(df("old")))
scala> val filteredOnNewColumnDF = newDF.filter("new <> 'a1'")
scala> filteredOnNewColumnDF.show
running udf(a)
running udf(a)
running udf(a1)
+---+----+---+
|old|old1|new|
+---+----+---+
|  a|   b|  a|
+---+----+---+
```
With this PR, the result is the following.
```scala
scala> filteredOnNewColumnDF.show
running udf(a1)
running udf(a)
+---+----+---+
|old|old1|new|
+---+----+---+
|  a|   b|  a|
+---+----+---+
```





[GitHub] spark pull request: [SPARK-14269][SCHEDULER] Eliminate unnecessary...

2016-05-12 Thread zzcclp
Github user zzcclp commented on the pull request:

https://github.com/apache/spark/pull/12060#issuecomment-218947787
  
@markhamstra, thanks for your explanation.





[GitHub] spark pull request: [SPARK-13866][SQL] Handle decimal type in CSV ...

2016-05-12 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request:

https://github.com/apache/spark/pull/11724#issuecomment-218946776
  
@rxin Do you mind if I ask for a quick look again?





[GitHub] spark pull request: [SPARK-10216][SQL] Avoid creating empty files ...

2016-05-12 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request:

https://github.com/apache/spark/pull/12855#issuecomment-218946730
  
ping @marmbrus 





[GitHub] spark pull request: [SPARK-15198][SQL] Support for pushing down fi...

2016-05-12 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request:

https://github.com/apache/spark/pull/12972#issuecomment-218946661
  
Please excuse my ping, @liancheng





[GitHub] spark pull request: [SPARK-14269][SCHEDULER] Eliminate unnecessary...

2016-05-12 Thread markhamstra
Github user markhamstra commented on the pull request:

https://github.com/apache/spark/pull/12060#issuecomment-218945792
  
@zzcclp Not likely.  This PR shouldn't produce any different results, but 
rather produces the same results faster.  We're typically very conservative 
with patch-level releases, so the optimization work for this PR will almost 
certainly only appear in the Spark 2.x series.  That's not too far off.





[GitHub] spark pull request: [SPARK-15143][SPARK-15144][SQL] Add CSV tests ...

2016-05-12 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request:

https://github.com/apache/spark/pull/12921#issuecomment-218945091
  
Please excuse my ping @rxin @falaki 





[GitHub] spark pull request: [SPARK-15267][SQL] Refactor options for JDBC a...

2016-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13048#issuecomment-218943216
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58534/
Test PASSed.





[GitHub] spark pull request: [SPARK-15267][SQL] Refactor options for JDBC a...

2016-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13048#issuecomment-218943213
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-15267][SQL] Refactor options for JDBC a...

2016-05-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13048#issuecomment-218943100
  
**[Test build #58534 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58534/consoleFull)**
 for PR 13048 at commit 
[`0f86ce6`](https://github.com/apache/spark/commit/0f86ce60d69ddc97ced64faa3e354338bb3bbf35).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [MINOR] Fix Typos

2016-05-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13078#issuecomment-218942530
  
**[Test build #58537 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58537/consoleFull)**
 for PR 13078 at commit 
[`9b2a5aa`](https://github.com/apache/spark/commit/9b2a5aa9f88cbb35a4843298a7c295297b4ce378).





[GitHub] spark pull request: [SPARK-14269][SCHEDULER] Eliminate unnecessary...

2016-05-12 Thread zzcclp
Github user zzcclp commented on the pull request:

https://github.com/apache/spark/pull/12060#issuecomment-218942375
  
Good PR. Is it planned to be merged into branch-1.6?





[GitHub] spark pull request: [SPARK-12922][SparkR][WIP] Implement gapply() ...

2016-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12836#issuecomment-218942100
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-12922][SparkR][WIP] Implement gapply() ...

2016-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12836#issuecomment-218942102
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58533/
Test PASSed.





[GitHub] spark pull request: [SPARK-12922][SparkR][WIP] Implement gapply() ...

2016-05-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12836#issuecomment-218942004
  
**[Test build #58533 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58533/consoleFull)**
 for PR 12836 at commit 
[`da7bb2b`](https://github.com/apache/spark/commit/da7bb2be9206ce452429072432cf565f35c8763f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [MINOR] Fix Typos

2016-05-12 Thread zhengruifeng
Github user zhengruifeng commented on a diff in the pull request:

https://github.com/apache/spark/pull/13078#discussion_r63128336
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/ann/Layer.scala ---
@@ -64,8 +64,9 @@ private[ann] trait Layer extends Serializable {
* @return the layer model
*/
   def createModel(initialWeights: BDV[Double]): LayerModel
+
   /**
-   * Returns the instance of the layer with random generated weights
+   * Returns the instance of the layer with random generated weights.
--- End diff --

This method already has `@return` text.





[GitHub] spark pull request: [MINOR] Fix Typos

2016-05-12 Thread zhengruifeng
Github user zhengruifeng commented on a diff in the pull request:

https://github.com/apache/spark/pull/13078#discussion_r63128009
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/ann/BreezeUtil.scala ---
@@ -55,7 +55,7 @@ private[ann] object BreezeUtil {
* @param y y
*/
   def dgemv(alpha: Double, a: BDM[Double], x: BDV[Double], beta: Double, 
y: BDV[Double]): Unit = {
-require(a.cols == x.length, "A & b Dimension mismatch!")
+require(a.cols == x.length, "A & x Dimension mismatch!")
--- End diff --

Right. I will rename the matrix args to uppercase and add the missing check.
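
For illustration, a hedged sketch of the renamed uppercase matrix argument plus the extra row-dimension check. The real `BreezeUtil.dgemv` delegates to native BLAS; plain Breeze operations are used here only to keep the sketch self-contained.

```scala
// Sketch only: check A.rows == y.length in addition to the existing A.cols == x.length.
import breeze.linalg.{DenseMatrix => BDM, DenseVector => BDV}

def dgemv(alpha: Double, A: BDM[Double], x: BDV[Double], beta: Double, y: BDV[Double]): Unit = {
  require(A.cols == x.length, "A & x Dimension mismatch!")
  require(A.rows == y.length, "A & y Dimension mismatch!")
  // y := alpha * A * x + beta * y
  y := (A * x) * alpha + y * beta
}
```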





[GitHub] spark pull request: [SPARK-15267][SQL] Refactor options for JDBC a...

2016-05-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13048#issuecomment-218941040
  
**[Test build #58536 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58536/consoleFull)**
 for PR 13048 at commit 
[`6fd5f0d`](https://github.com/apache/spark/commit/6fd5f0d03b967d4b91f925859ef9ee7a01b98dfd).





[GitHub] spark pull request: [SPARK-15282][SQL] PushDownPredicate should no...

2016-05-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13087#issuecomment-218940532
  
**[Test build #58535 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58535/consoleFull)**
 for PR 13087 at commit 
[`ab5bc4b`](https://github.com/apache/spark/commit/ab5bc4b77ef2fc40684f680cde296f7d91e29aa2).





[GitHub] spark pull request: Branch 2.0

2016-05-12 Thread zhengruifeng
Github user zhengruifeng commented on the pull request:

https://github.com/apache/spark/pull/13089#issuecomment-218940318
  
@ahnqirage please close it





[GitHub] spark pull request: [SPARK-15282][SQL] UDF function is not always d...

2016-05-12 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the pull request:

https://github.com/apache/spark/pull/13087#issuecomment-218939725
  
There are several cases which assume a UDF is deterministic, so this would be a 
big change for users. I'll revert the change on ScalaUDF and update this PR to 
change the optimizer so it does not duplicate the UDF expression.





[GitHub] spark pull request: [SPARK-14906][ML] Move VectorUDT and MatrixUDT...

2016-05-12 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/12870#issuecomment-218939778
  
@mengxr Currently I moved only Vector/Matrix and their UDTs to `pyspark.ml`, 
and made the `pyspark.ml` and `pyspark.mllib` code use the moved `ml.linalg` 
Vector/Matrix. Other than that, `pyspark.mllib` is untouched.

Do you mean that we want to keep the `pyspark.mllib` code using 
`pyspark.mllib.linalg` and let the `pyspark.ml` code use the new `pyspark.ml.linalg`?






[GitHub] spark pull request: [SPARK-14434][ML]:User guide doc and examples ...

2016-05-12 Thread zhengruifeng
Github user zhengruifeng commented on a diff in the pull request:

https://github.com/apache/spark/pull/12788#discussion_r63126914
  
--- Diff: 
examples/src/main/scala/org/apache/spark/examples/ml/GaussianMixtureExample.scala
 ---
@@ -0,0 +1,64 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.examples.ml
+
+// scalastyle:off println
+
+import org.apache.spark.{SparkConf, SparkContext}
+// $example on$
+import org.apache.spark.ml.clustering.{GaussianMixture, 
GaussianMixtureModel}
--- End diff --

`GaussianMixtureModel` is not used





[GitHub] spark pull request: [SPARK-14434][ML]:User guide doc and examples ...

2016-05-12 Thread zhengruifeng
Github user zhengruifeng commented on a diff in the pull request:

https://github.com/apache/spark/pull/12788#discussion_r63126939
  
--- Diff: 
examples/src/main/scala/org/apache/spark/examples/ml/GaussianMixtureExample.scala
 ---
@@ -0,0 +1,64 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.examples.ml
+
+// scalastyle:off println
+
+import org.apache.spark.{SparkConf, SparkContext}
--- End diff --

`{SparkConf, SparkContext}` are not used





[GitHub] spark pull request: [SPARK-14434][ML]:User guide doc and examples ...

2016-05-12 Thread zhengruifeng
Github user zhengruifeng commented on a diff in the pull request:

https://github.com/apache/spark/pull/12788#discussion_r63126885
  
--- Diff: examples/src/main/python/ml/gaussian_mixture_example.py ---
@@ -0,0 +1,55 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from __future__ import print_function
+
+# $example on$
+from pyspark.ml.clustering import GaussianMixture, GaussianMixtureModel
--- End diff --

`GaussianMixtureModel` is not used





[GitHub] spark pull request: [SPARK-14615][ML] Use the new ML Vector and Ma...

2016-05-12 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/12627#issuecomment-218938821
  
#12870 moved the PySpark Vector/Matrix to the ml package and matches the PySpark 
Vector/Matrix against the old Scala Vector/Matrix code. Since it doesn't touch the 
Scala ML/MLlib code (which still uses the old Vector/Matrix), it can pass the tests.





[GitHub] spark pull request: [SPARK-14615][ML] Use the new ML Vector and Ma...

2016-05-12 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/12627#issuecomment-218938505
  
@dbtsai Looks like you only let the ML code use the new Vector and Matrix. 
However, we can't have ML using the new Vector/Matrix while MLlib uses the old 
Vector/Matrix; the matching between PySpark and Scala will fail.





[GitHub] spark pull request: [SPARK-14434][ML]:User guide doc and examples ...

2016-05-12 Thread zhengruifeng
Github user zhengruifeng commented on a diff in the pull request:

https://github.com/apache/spark/pull/12788#discussion_r63125817
  
--- Diff: 
examples/src/main/scala/org/apache/spark/examples/ml/GaussianMixtureExample.scala
 ---
@@ -0,0 +1,64 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.examples.ml
+
+// scalastyle:off println
+
+import org.apache.spark.{SparkConf, SparkContext}
+// $example on$
+import org.apache.spark.ml.clustering.{GaussianMixture, 
GaussianMixtureModel}
+import org.apache.spark.sql.SparkSession
+// $example off$
+
+/**
+ * An example demonstrating Gaussian Mixture Model (GMM).
+ * Run with
+ * {{{
+ * bin/run-example ml.GaussianMixtureExample
+ * }}}
+ */
+object GaussianMixtureExample {
+  def main(args: Array[String]): Unit = {
+// Creates a SparkSession
+val spark = 
SparkSession.builder.appName(s"${this.getClass.getSimpleName}").getOrCreate()
+
+// $example on$
+// Load data
+val dataset = 
spark.read.format("libsvm").load("data/mllib/sample_kmeans_data.txt")
+
+// Trains Gaussian Mixture Model
+val gmm = new GaussianMixture()
+  .setK(2)
+  .setFeaturesCol("features")
--- End diff --

Keep the args in line with the Scala one.





[GitHub] spark pull request: [SPARK-14434][ML]:User guide doc and examples ...

2016-05-12 Thread zhengruifeng
Github user zhengruifeng commented on a diff in the pull request:

https://github.com/apache/spark/pull/12788#discussion_r63126132
  
--- Diff: 
examples/src/main/java/org/apache/spark/examples/ml/JavaGaussianMixtureExample.java
 ---
@@ -0,0 +1,64 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.examples.ml;
+
+// $example on$
+import org.apache.spark.ml.clustering.GaussianMixture;
+import org.apache.spark.ml.clustering.GaussianMixtureModel;
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Row;
+// $example off$
+import org.apache.spark.sql.SparkSession;
+
+
+/**
+ * An example demonstrating a Gaussian Mixture Model.
+ * Run with
+ * 
+ * bin/run-example ml.JavaGaussianMixtureExample
+ * 
+ */
+public class JavaGaussianMixtureExample {
+
+  public static void main(String[] args) {
+
+// Parses the arguments
+SparkSession spark = SparkSession
+.builder()
+.appName("JavaGaussianMixtureExample")
+.getOrCreate();
+
+// $example on$
+// Load data
+Dataset dataset = 
spark.read().format("libsvm").load("data/mllib/sample_kmeans_data.txt");
+
+// Trains a GaussianMixture model
+GaussianMixture gmm = new GaussianMixture()
+  .setK(2);
+GaussianMixtureModel model = gmm.fit(dataset);
+
+// Output the parameters of the mixture model
+for (int j = 0; j < model.getK(); j++) {
--- End diff --

nit: `j` -> `i` to keep in line with the Scala example





[GitHub] spark pull request: [SPARK-14541][SQL] Support IFNULL, NULLIF, NVL...

2016-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13084#issuecomment-218937650
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58532/
Test PASSed.





[GitHub] spark pull request: [SPARK-14541][SQL] Support IFNULL, NULLIF, NVL...

2016-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13084#issuecomment-218937648
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-14541][SQL] Support IFNULL, NULLIF, NVL...

2016-05-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13084#issuecomment-218937533
  
**[Test build #58532 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58532/consoleFull)**
 for PR 13084 at commit 
[`3248fb5`](https://github.com/apache/spark/commit/3248fb5f10e1ae44328450587044c29eeef21d62).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14434][ML]:User guide doc and examples ...

2016-05-12 Thread zhengruifeng
Github user zhengruifeng commented on a diff in the pull request:

https://github.com/apache/spark/pull/12788#discussion_r63125992
  
--- Diff: examples/src/main/python/ml/gaussian_mixture_example.py ---
@@ -0,0 +1,55 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from __future__ import print_function
+
+# $example on$
+from pyspark.ml.clustering import GaussianMixture, GaussianMixtureModel
+# $example off$
+from pyspark.sql import SparkSession
+
+"""
+A simple example demonstrating a Gaussian Mixture Model (GMM).
+Run with:
+  bin/spark-submit examples/src/main/python/ml/gaussian_mixture_example.py
+"""
+
+if __name__ == "__main__":
+spark = SparkSession\
+.builder\
+.appName("PythonGuassianMixtureExample")\
+.getOrCreate()
+
+# $example on$
+# load data
--- End diff --

Loads





[GitHub] spark pull request: [SPARK-14434][ML]:User guide doc and examples ...

2016-05-12 Thread zhengruifeng
Github user zhengruifeng commented on a diff in the pull request:

https://github.com/apache/spark/pull/12788#discussion_r63125902
  
--- Diff: 
examples/src/main/scala/org/apache/spark/examples/ml/GaussianMixtureExample.scala
 ---
@@ -0,0 +1,64 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.examples.ml
+
+// scalastyle:off println
+
+import org.apache.spark.{SparkConf, SparkContext}
+// $example on$
+import org.apache.spark.ml.clustering.{GaussianMixture, 
GaussianMixtureModel}
+import org.apache.spark.sql.SparkSession
+// $example off$
+
+/**
+ * An example demonstrating Gaussian Mixture Model (GMM).
+ * Run with
+ * {{{
+ * bin/run-example ml.GaussianMixtureExample
+ * }}}
+ */
+object GaussianMixtureExample {
+  def main(args: Array[String]): Unit = {
+// Creates a SparkSession
+val spark = 
SparkSession.builder.appName(s"${this.getClass.getSimpleName}").getOrCreate()
+
+// $example on$
+// Load data
+val dataset = 
spark.read.format("libsvm").load("data/mllib/sample_kmeans_data.txt")
+
+// Trains Gaussian Mixture Model
+val gmm = new GaussianMixture()
+  .setK(2)
+  .setFeaturesCol("features")
+  .setPredictionCol("prediction")
+  .setTol(0.0001)
+  .setMaxIter(10)
+  .setSeed(10)
+val model = gmm.fit(dataset)
+
+// output parameters of max-likelihood model
--- End diff --

`max-likelihood model` -> `mixture model`





[GitHub] spark pull request: [SPARK-14434][ML]:User guide doc and examples ...

2016-05-12 Thread zhengruifeng
Github user zhengruifeng commented on a diff in the pull request:

https://github.com/apache/spark/pull/12788#discussion_r63125785
  
--- Diff: 
examples/src/main/scala/org/apache/spark/examples/ml/GaussianMixtureExample.scala
 ---
@@ -0,0 +1,64 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.examples.ml
+
+// scalastyle:off println
+
+import org.apache.spark.{SparkConf, SparkContext}
+// $example on$
+import org.apache.spark.ml.clustering.{GaussianMixture, 
GaussianMixtureModel}
+import org.apache.spark.sql.SparkSession
+// $example off$
+
+/**
+ * An example demonstrating Gaussian Mixture Model (GMM).
+ * Run with
+ * {{{
+ * bin/run-example ml.GaussianMixtureExample
+ * }}}
+ */
+object GaussianMixtureExample {
+  def main(args: Array[String]): Unit = {
+// Creates a SparkSession
+val spark = 
SparkSession.builder.appName(s"${this.getClass.getSimpleName}").getOrCreate()
+
+// $example on$
+// Load data
--- End diff --

Loads





[GitHub] spark pull request: [SPARK-14434][ML]:User guide doc and examples ...

2016-05-12 Thread zhengruifeng
Github user zhengruifeng commented on a diff in the pull request:

https://github.com/apache/spark/pull/12788#discussion_r63125752
  
--- Diff: examples/src/main/python/ml/gaussian_mixture_example.py ---
@@ -0,0 +1,55 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from __future__ import print_function
+
+# $example on$
+from pyspark.ml.clustering import GaussianMixture, GaussianMixtureModel
+# $example off$
+from pyspark.sql import SparkSession
+
+"""
+A simple example demonstrating a Gaussian Mixture Model (GMM).
+Run with:
+  bin/spark-submit examples/src/main/python/ml/gaussian_mixture_example.py
+"""
+
+if __name__ == "__main__":
+spark = SparkSession\
+.builder\
+.appName("PythonGuassianMixtureExample")\
+.getOrCreate()
+
+# $example on$
+# load data
+dataset = 
spark.read.format("libsvm").load("data/mllib/sample_kmeans_data.txt")
+
+gmm = GaussianMixture().setK(2).setSeed(10).setFeaturesCol("features")
+model = gmm.fit(dataset)
+
+print("Gaussians: ")
+model.gaussiansDF.show()
+
+transformed = model.transform(dataset).select("prediction")
--- End diff --

Keep the output in line with the Scala example.




