[GitHub] spark pull request: [SPARK-15292] [ML] ML 2.0 QA: Scala APIs audit...

2016-05-16 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13076#issuecomment-219626509
  
**[Test build #58668 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58668/consoleFull)** for PR 13076 at commit [`13e7a5b`](https://github.com/apache/spark/commit/13e7a5be72b65a288280bdfc5faa02fc24807f1a).





[GitHub] spark pull request: [SPARK-15312] [SQL] Detect Duplicate Key in Pa...

2016-05-16 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13095#issuecomment-219626498
  
**[Test build #58667 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58667/consoleFull)** for PR 13095 at commit [`06f89d5`](https://github.com/apache/spark/commit/06f89d511233373e3d3bca1cb424da8f77976f7d).





[GitHub] spark pull request: [SPARK-15292] [ML] ML 2.0 QA: Scala APIs audit...

2016-05-16 Thread MLnick
Github user MLnick commented on the pull request:

https://github.com/apache/spark/pull/13076#issuecomment-219626330
  
Yeah, I agree - let's add that in the next release.
On Tue, 17 May 2016 at 07:37, Yanbo Liang  wrote:

> @avulanov  Thanks for your comments. Providing rawPredictions makes
> sense, but I think it's a new feature and we should do it in the next
> release cycle. I can work on it if no one has started. We are in code
> freeze for 2.0, and only API audits, bug fixes, and documentation updates
> should be merged.






[GitHub] spark pull request: [SPARK-14204] [SQL] register driverClass rathe...

2016-05-16 Thread zzcclp
Github user zzcclp commented on the pull request:

https://github.com/apache/spark/pull/12000#issuecomment-219626090
  
@mchalek @JoshRosen Is there any progress on this? I hit the same error with
Spark 1.6.1, but it worked once I added the "driver=com.mysql.jdbc.Driver"
property to the properties passed to write.jdbc. This PR looks like a
reasonable change because it would remove the need to set the "driver"
property explicitly.





[GitHub] spark pull request: [SPARK-14906][ML] Copy linalg in PySpark to ne...

2016-05-16 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/13099#issuecomment-219624730
  
@viirya I made one pass and sent you a PR at 
https://github.com/viirya/spark-1/pull/5. Please take a look and merge it if it 
looks good to you. Thanks!





[GitHub] spark pull request: [SPARK-10216][SQL] Avoid creating empty files ...

2016-05-16 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12855#issuecomment-219624016
  
**[Test build #58666 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58666/consoleFull)** for PR 12855 at commit [`5f780a7`](https://github.com/apache/spark/commit/5f780a7ba60b2f0518897a9a369d27455cab47cb).





[GitHub] spark pull request: [SPARK-15292] [ML] ML 2.0 QA: Scala APIs audit...

2016-05-16 Thread yanboliang
Github user yanboliang commented on the pull request:

https://github.com/apache/spark/pull/13076#issuecomment-219623827
  
@avulanov Thanks for your comments. Providing ```rawPredictions``` makes
sense, but I think it's a new feature and we should do it in the next release
cycle. I can work on it if no one has started. We are in code freeze for 2.0,
and only API audits, bug fixes, and documentation updates should be merged.





[GitHub] spark pull request: [SPARK-10216][SQL] Avoid creating empty files ...

2016-05-16 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/12855#discussion_r63466408
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertIntoHiveTableSuite.scala ---
@@ -216,6 +215,33 @@ class InsertIntoHiveTableSuite extends QueryTest with TestHiveSingleton with Bef
     sql("DROP TABLE hiveTableWithStructValue")
   }
 
+  test("SPARK-10216: Avoid empty files during overwrite into Hive table with group by query") {
+    val testDataset = hiveContext.sparkContext.parallelize(
+      (1 to 2).map(i => TestData(i, i.toString))).toDF()
+    testDataset.registerTempTable("testDataset")
+
+    val tmpDir = Utils.createTempDir()
+    sql(
+      s"""
+         |CREATE TABLE table1(key int,value string)
+         |location '${tmpDir.toURI.toString}'
+       """.stripMargin)
+    sql(
+      """
+        |INSERT OVERWRITE TABLE table1
+        |SELECT count(key), value FROM testDataset GROUP BY value
--- End diff --

Ah, yes. Thank you.





[GitHub] spark pull request: [SPARK-12922][SparkR][WIP] Implement gapply() ...

2016-05-16 Thread sun-rui
Github user sun-rui commented on the pull request:

https://github.com/apache/spark/pull/12836#issuecomment-219621230
  
@felixcheung, could you explain your idea in more detail? I am not sure I
understand it. This PR aims to allow applying an R function to groups, while
agg is predefined functionality that involves no user-supplied code.





[GitHub] spark pull request: [SPARK-15342][SQL][PySpark] PySpark test for n...

2016-05-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13134#issuecomment-219620522
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58665/
Test PASSed.





[GitHub] spark pull request: [SPARK-15342][SQL][PySpark] PySpark test for n...

2016-05-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13134#issuecomment-219620521
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-15342][SQL][PySpark] PySpark test for n...

2016-05-16 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13134#issuecomment-219620469
  
**[Test build #58665 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58665/consoleFull)** for PR 13134 at commit [`97fb7f5`](https://github.com/apache/spark/commit/97fb7f555cffc96f140e483c455e366265de5459).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-15165][SQL] Codegen can break because t...

2016-05-16 Thread sarutak
Github user sarutak commented on a diff in the pull request:

https://github.com/apache/spark/pull/12939#discussion_r63464908
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/package.scala ---
@@ -162,7 +162,18 @@ package object util {
   def toCommentSafeString(str: String): String = {
     val len = math.min(str.length, 128)
     val suffix = if (str.length > len) "..." else ""
-    str.substring(0, len).replace("*/", "\\*\\/").replace("\\u", "u") + suffix
--- End diff --

Thanks for the advice.
I think "\u" should be escaped too otherwise, the compilation will fail 
when invalid unicode characters, like `\u002X` or `\u001`, are in literals.
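To illustrate the failure mode, here is a hedged sketch of one way to escape it (not necessarily the exact change in this PR): doubling the backslash so that javac never sees a bare `\u`, which it rejects when not followed by four hex digits, even inside comments.

```scala
def toCommentSafeString(str: String): String = {
  val len = math.min(str.length, 128)
  val suffix = if (str.length > len) "..." else ""
  val safe = str.substring(0, len)
    .replace("*/", "\\*\\/")   // keep "*/" from closing the comment early
    .replace("\\u", "\\\\u")   // "\u002X" becomes the harmless "\\u002X"
  safe + suffix
}
```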





[GitHub] spark pull request: [SPARK-12922][SparkR][WIP] Implement gapply() ...

2016-05-16 Thread felixcheung
Github user felixcheung commented on the pull request:

https://github.com/apache/spark/pull/12836#issuecomment-219619796
  
My apologies if this was discussed (this was a long PR!) - isn't it rather
restrictive to allow grouping only by a list of columns? Would it be more
useful or flexible to also support "aggregated grouped data"?

https://spark.apache.org/docs/latest/sparkr.html#grouping-aggregation

https://docs.cloud.databricks.com/docs/latest/databricks_guide/10%20SparkR/1%20Functions/agg.html






[GitHub] spark pull request: [SPARK-14603] [SQL] [FOLLOWUP] Verification of...

2016-05-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13126#issuecomment-219619023
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-14603] [SQL] [FOLLOWUP] Verification of...

2016-05-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13126#issuecomment-219619025
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58661/
Test PASSed.





[GitHub] spark pull request: [SPARK-14603] [SQL] [FOLLOWUP] Verification of...

2016-05-16 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13126#issuecomment-219618912
  
**[Test build #58661 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58661/consoleFull)** for PR 13126 at commit [`dc884a6`](https://github.com/apache/spark/commit/dc884a6b967b5cdddbeb407a52d474b84528a8e5).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-15342][SQL][PySpark] PySpark test for n...

2016-05-16 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13134#issuecomment-219618868
  
**[Test build #58665 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58665/consoleFull)** for PR 13134 at commit [`97fb7f5`](https://github.com/apache/spark/commit/97fb7f555cffc96f140e483c455e366265de5459).





[GitHub] spark pull request: [SPARK-15337] [SPARK-15338] [SQL] Enable Run-t...

2016-05-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13128#issuecomment-219618619
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-15337] [SPARK-15338] [SQL] Enable Run-t...

2016-05-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13128#issuecomment-219618620
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58663/
Test FAILed.





[GitHub] spark pull request: [SPARK-15337] [SPARK-15338] [SQL] Enable Run-t...

2016-05-16 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13128#issuecomment-219618589
  
**[Test build #58663 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58663/consoleFull)** for PR 13128 at commit [`372b492`](https://github.com/apache/spark/commit/372b492d3d96070f270f88b8304137a2ebd1a440).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-15342][SQL][PySpark] PySpark test for n...

2016-05-16 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/13134#discussion_r63464031
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -1036,8 +1036,15 @@ def test_access_column(self):
         self.assertRaises(TypeError, lambda: df[{}])
 
     def test_column_name_with_non_ascii(self):
-        df = self.spark.createDataFrame([(1,)], ["数量"])
-        self.assertEqual(StructType([StructField("数量", LongType(), True)]), df.schema)
+        if sys.version >= '3':
--- End diff --

The usage in other places looks good.





[GitHub] spark pull request: [SPARK-15330] [SQL] Implement Reset Command

2016-05-16 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13121#issuecomment-219618421
  
**[Test build #58664 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58664/consoleFull)** for PR 13121 at commit [`bdeb165`](https://github.com/apache/spark/commit/bdeb1658ce3171e1b1779df84bb2739df8993f60).





[GitHub] spark pull request: [SPARK-14434][ML]:User guide doc and examples ...

2016-05-16 Thread wangmiao1981
Github user wangmiao1981 commented on the pull request:

https://github.com/apache/spark/pull/12788#issuecomment-219617189
  
@MLnick I made the changes accordingly. Thanks!





[GitHub] spark pull request: [SPARK-10216][SQL] Avoid creating empty files ...

2016-05-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12855#issuecomment-219616350
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58662/
Test FAILed.





[GitHub] spark pull request: [SPARK-10216][SQL] Avoid creating empty files ...

2016-05-16 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12855#issuecomment-219616331
  
**[Test build #58662 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58662/consoleFull)** for PR 12855 at commit [`24e16b7`](https://github.com/apache/spark/commit/24e16b794f61dc629bbeced274cbcab02d044bbc).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-10216][SQL] Avoid creating empty files ...

2016-05-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12855#issuecomment-219616349
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-15337] [SPARK-15338] [SQL] Enable Run-t...

2016-05-16 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13128#issuecomment-219615459
  
**[Test build #58663 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58663/consoleFull)** for PR 13128 at commit [`372b492`](https://github.com/apache/spark/commit/372b492d3d96070f270f88b8304137a2ebd1a440).





[GitHub] spark pull request: [SPARK-10216][SQL] Avoid creating empty files ...

2016-05-16 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/12855#discussion_r63462283
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/sources/HadoopFsRelationTest.scala ---
@@ -879,6 +879,24 @@ abstract class HadoopFsRelationTest extends QueryTest with SQLTestUtils with Tes
       }
     }
   }
+
+  test("SPARK-10216: Avoid empty files during overwriting with group by query") {
+    withTempPath { path =>
+      val df = sqlContext.range(0, 5)
+      val groupedDF = df.groupBy("id").count()
+      groupedDF.write
+        .format(dataSourceName)
+        .mode(SaveMode.Overwrite)
+        .save(path.getCanonicalPath)
--- End diff --

Same as https://github.com/apache/spark/pull/12855/files#r63462262





[GitHub] spark pull request: [SPARK-10216][SQL] Avoid creating empty files ...

2016-05-16 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/12855#discussion_r63462262
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertIntoHiveTableSuite.scala ---
@@ -216,6 +215,33 @@ class InsertIntoHiveTableSuite extends QueryTest with TestHiveSingleton with Bef
     sql("DROP TABLE hiveTableWithStructValue")
   }
 
+  test("SPARK-10216: Avoid empty files during overwrite into Hive table with group by query") {
+    val testDataset = hiveContext.sparkContext.parallelize(
+      (1 to 2).map(i => TestData(i, i.toString))).toDF()
+    testDataset.registerTempTable("testDataset")
+
+    val tmpDir = Utils.createTempDir()
+    sql(
+      s"""
+         |CREATE TABLE table1(key int,value string)
+         |location '${tmpDir.toURI.toString}'
+       """.stripMargin)
+    sql(
+      """
+        |INSERT OVERWRITE TABLE table1
+        |SELECT count(key), value FROM testDataset GROUP BY value
--- End diff --

It seems you want to explicitly control the number of shuffle partitions?
Otherwise, this test will not test anything if the number of shuffle
partitions happens to be set to 2.
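For example, a hedged sketch of pinning the partition count in the test (`withSQLConf` comes from `SQLTestUtils`; the key is the standard "spark.sql.shuffle.partitions" setting):

```scala
// Force more shuffle partitions than distinct GROUP BY values, so some
// reducers are guaranteed to produce zero rows and the "no empty files"
// behavior is actually exercised.
withSQLConf("spark.sql.shuffle.partitions" -> "10") {
  sql(
    """
      |INSERT OVERWRITE TABLE table1
      |SELECT count(key), value FROM testDataset GROUP BY value
    """.stripMargin)
}
```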





[GitHub] spark pull request: [SPARK-10216][SQL] Avoid creating empty files ...

2016-05-16 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12855#issuecomment-219614959
  
**[Test build #58662 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58662/consoleFull)** for PR 12855 at commit [`24e16b7`](https://github.com/apache/spark/commit/24e16b794f61dc629bbeced274cbcab02d044bbc).





[GitHub] spark pull request: [SPARK-10216][SQL] Avoid creating empty files ...

2016-05-16 Thread yhuai
Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/12855#issuecomment-219614807
  
test this please





[GitHub] spark pull request: [SPARK-15292] [ML] ML 2.0 QA: Scala APIs audit...

2016-05-16 Thread avulanov
Github user avulanov commented on the pull request:

https://github.com/apache/spark/pull/13076#issuecomment-219613838
  
@yanboliang Changing `weights` to `initialWeights` makes perfect sense, thank
you. There have been a few requests for MLP to provide `rawPredictions`.
Since you are working on the API, do you think we can add this too?





[GitHub] spark pull request: [SPARK-13850] Force the sorter to Spill when n...

2016-05-16 Thread yhuai
Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/13107#issuecomment-219612852
  
What is the root cause? Can you also add a regression test?





[GitHub] spark pull request: [SPARK-14603] [SQL] [FOLLOWUP] Verification of...

2016-05-16 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13126#issuecomment-219610450
  
**[Test build #58661 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58661/consoleFull)** for PR 13126 at commit [`dc884a6`](https://github.com/apache/spark/commit/dc884a6b967b5cdddbeb407a52d474b84528a8e5).





[GitHub] spark pull request: [SPARK-15337] [SPARK-15338] [SQL] Enable Run-t...

2016-05-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13128#issuecomment-219609218
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58660/
Test FAILed.





[GitHub] spark pull request: [SPARK-15337] [SPARK-15338] [SQL] Enable Run-t...

2016-05-16 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13128#issuecomment-219609183
  
**[Test build #58660 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58660/consoleFull)** for PR 13128 at commit [`52161e4`](https://github.com/apache/spark/commit/52161e4e402a99bad5cca8b42a804df9b5b80596).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-15337] [SPARK-15338] [SQL] Enable Run-t...

2016-05-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13128#issuecomment-219609216
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-15290] [BUILD] Move annotations, like @...

2016-05-16 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/13074#issuecomment-219608475
  
Actually, Josh is on vacation. Definitely go ahead. I looked at it and it
seemed OK, but I'm not a build expert.






[GitHub] spark pull request: [SPARK-14269][SCHEDULER] Eliminate unnecessary...

2016-05-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12060#issuecomment-219608365
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-14269][SCHEDULER] Eliminate unnecessary...

2016-05-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12060#issuecomment-219608366
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58659/
Test PASSed.





[GitHub] spark pull request: [SPARK-14269][SCHEDULER] Eliminate unnecessary...

2016-05-16 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12060#issuecomment-219608257
  
**[Test build #58659 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58659/consoleFull)** for PR 12060 at commit [`4eb8c05`](https://github.com/apache/spark/commit/4eb8c05841d14c9ecb306669508d6d0eab7543f5).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-15334][SQL] HiveClient facade not compa...

2016-05-16 Thread yhuai
Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/13127#issuecomment-219606817
  
I have finished my review. Thank you for working on this!





[GitHub] spark pull request: [SPARK-15334][SQL] HiveClient facade not compa...

2016-05-16 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/13127#discussion_r63458236
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala ---
@@ -130,109 +124,343 @@ class VersionsSuite extends SparkFunSuite with Logging {
     test(s"$version: create client") {
       client = null
       System.gc() // Hack to avoid SEGV on some JVM versions.
+      val hadoopConf = new Configuration();
+      hadoopConf.set("test", "success")
       client =
         IsolatedClientLoader.forVersion(
           hiveMetastoreVersion = version,
           hadoopVersion = VersionInfo.getVersion,
           sparkConf = sparkConf,
-          hadoopConf = new Configuration(),
+          hadoopConf,
           config = buildConf(),
           ivyPath = ivyPath).createClient()
     }
 
+    def table(database: String, tableName: String): CatalogTable = {
+      CatalogTable(
+        identifier = TableIdentifier(tableName, Some(database)),
+        tableType = CatalogTableType.MANAGED,
+        schema = Seq(CatalogColumn("key", "int")),
+        storage = CatalogStorageFormat(
+          locationUri = None,
+          inputFormat = Some(classOf[org.apache.hadoop.mapred.TextInputFormat].getName),
+          outputFormat = Some(
+            classOf[org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat[_, _]].getName),
+          serde = Some(classOf[org.apache.hadoop.hive.serde2.`lazy`.LazySimpleSerDe].getName()),
+          compressed = false,
+          serdeProperties = Map.empty
+        ))
+    }
+
+    ///////////////////////////////////////////////////////////////////////////
+    // Database related API
+    ///////////////////////////////////////////////////////////////////////////
+
+    val tempDatabasePath = Utils.createTempDir().getCanonicalPath
+
     test(s"$version: createDatabase") {
-      val db = CatalogDatabase("default", "desc", "loc", Map())
-      client.createDatabase(db, ignoreIfExists = true)
+      val defaultDB = CatalogDatabase("default", "desc", "loc", Map())
+      client.createDatabase(defaultDB, ignoreIfExists = true)
+      val tempDB = CatalogDatabase(
+        "temporary", description = "test create", tempDatabasePath, Map())
+      client.createDatabase(tempDB, ignoreIfExists = true)
+    }
+
+    test(s"$version: setCurrentDatabase") {
+      client.setCurrentDatabase("default")
+    }
+
+    test(s"$version: getDatabase") {
+      // No exception should be thrown
+      client.getDatabase("default")
+    }
+
+    test(s"$version: getDatabaseOption") {
+      assert(client.getDatabaseOption("default").isDefined)
+      assert(client.getDatabaseOption("nonexist") == None)
     }
 
+    test(s"$version: listDatabases") {
+      assert(client.listDatabases("defau.*") == Seq("default"))
+    }
+
+    test(s"$version: alterDatabase") {
+      val database = client.getDatabase("temporary").copy(properties = Map("flag" -> "true"))
+      client.alterDatabase(database)
+      assert(client.getDatabase("temporary").properties.contains("flag"))
+    }
+
+    test(s"$version: dropDatabase") {
+      assert(client.getDatabaseOption("temporary").isDefined)
+      client.dropDatabase("temporary", ignoreIfNotExists = false, cascade = true)
+      assert(client.getDatabaseOption("temporary").isEmpty)
+    }
+
+    ///////////////////////////////////////////////////////////////////////////
+    // Table related API
+    ///////////////////////////////////////////////////////////////////////////
+
     test(s"$version: createTable") {
-      val table =
-        CatalogTable(
-          identifier = TableIdentifier("src", Some("default")),
-          tableType = CatalogTableType.MANAGED,
-          schema = Seq(CatalogColumn("key", "int")),
-          storage = CatalogStorageFormat(
-            locationUri = None,
-            inputFormat = Some(classOf[org.apache.hadoop.mapred.TextInputFormat].getName),
-            outputFormat = Some(
-              classOf[org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat[_, _]].getName),
-            serde = Some(classOf[org.apache.hadoop.hive.serde2.`lazy`.LazySimpleSerDe].getName()),
-            compressed = false,
-            serdeProperties = Map.empty
-          ))
-
-      client.createTable(table, ignoreIfExists = false)
+      client.createTable(table("default", tableName = "src"), ignoreIfExists = false)
+      // Creates a temporary table
+      client.createTable(table("default", "temporary"), ignoreIfExists = false)
+    }
+

[GitHub] spark pull request: [SPARK-15334][SQL] HiveClient facade not compa...

2016-05-16 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/13127#discussion_r63458112
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala ---
@@ -130,109 +124,343 @@ class VersionsSuite extends SparkFunSuite with Logging {

[GitHub] spark pull request: [SPARK-15334][SQL] HiveClient facade not compa...

2016-05-16 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/13127#discussion_r63458151
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala ---
@@ -130,109 +124,343 @@ class VersionsSuite extends SparkFunSuite with Logging {

[GitHub] spark pull request: [SPARK-15337] [SPARK-15338] [SQL] Enable Run-t...

2016-05-16 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13128#issuecomment-219605418
  
**[Test build #58660 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58660/consoleFull)** for PR 13128 at commit [`52161e4`](https://github.com/apache/spark/commit/52161e4e402a99bad5cca8b42a804df9b5b80596).





[GitHub] spark pull request: [SPARK-15247][SQL] Set the default number of p...

2016-05-16 Thread maropu
Github user maropu commented on the pull request:

https://github.com/apache/spark/pull/13137#issuecomment-219603543
  
@liancheng This is a trivial fix - could you take a look?





[GitHub] spark pull request: [SPARK-15340][SQL]Limit the size of the map us...

2016-05-16 Thread DoingDone9
Github user DoingDone9 commented on a diff in the pull request:

https://github.com/apache/spark/pull/13130#discussion_r63453073
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala ---
@@ -363,7 +363,7 @@ private[spark] object HadoopRDD extends Logging {
 
   def containsCachedMetadata(key: String): Boolean = SparkEnv.get.hadoopJobMetadata.containsKey(key)
 
-  private def putCachedMetadata(key: String, value: Any): Unit =
+  private def putCachedMetadata(key: String, value: Object): Unit =
--- End diff --

CacheBuilder.newBuilder() returns CacheBuilder<`Object`, `Object`>, so I can
only get a map[String, `Object`]. @HyukjinKwon
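For context, a minimal sketch of the typing constraint (Guava on the classpath assumed; the size bound is illustrative of what this PR adds):

```scala
import com.google.common.cache.CacheBuilder

object MetadataCacheSketch {
  // CacheBuilder.newBuilder() is typed CacheBuilder[Object, Object], so the
  // map built from it stores java.lang.Object values. Scala's Any is not a
  // subtype of java.lang.Object, hence `value: Object` rather than `value: Any`.
  private val cache = CacheBuilder.newBuilder()
    .softValues()
    .maximumSize(1000) // illustrative bound on the metadata cache
    .build[String, Object]()
  private val hadoopJobMetadata = cache.asMap()

  def putCachedMetadata(key: String, value: Object): Unit =
    hadoopJobMetadata.put(key, value)

  def containsCachedMetadata(key: String): Boolean =
    hadoopJobMetadata.containsKey(key)
}
```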





[GitHub] spark pull request: [SPARK-15292] [ML] ML 2.0 QA: Scala APIs audit...

2016-05-16 Thread jkbradley
Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/13076#issuecomment-219595234
  
Thanks for these fixes!





[GitHub] spark pull request: [SPARK-15292] [ML] ML 2.0 QA: Scala APIs audit...

2016-05-16 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/13076#discussion_r63452937
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/MultilayerPerceptronClassifier.scala ---
@@ -65,22 +63,23 @@ private[ml] trait MultilayerPerceptronParams extends PredictorParams
     "it is adjusted to the size of this data. Recommended size is between 10 and 1000",
     ParamValidators.gt(0))
 
-  /** @group getParam */
+  /** @group expertGetParam */
   final def getBlockSize: Int = $(blockSize)
 
   /**
-   * Allows setting the solver: minibatch gradient descent (gd) or l-bfgs.
-   * l-bfgs is the default one.
+   * The solver algorithm for optimization.
+   * Supported options: "gd" (minibatch gradient descent) or "l-bfgs".
+   * Default: "l-bfgs"
    *
    * @group expertParam
    */
   final val solver: Param[String] = new Param[String](this, "solver",
-    " Allows setting the solver: minibatch gradient descent (gd) or l-bfgs. " +
-      " l-bfgs is the default one.",
-    ParamValidators.inArray[String](Array("gd", "l-bfgs")))
+    "The solver algorithm for optimization. Supported options: " +
+      s"${MultilayerPerceptronClassifier.supportedSolvers.mkString(", ")}. (Default l-bfgs)",
+    ParamValidators.inArray[String](MultilayerPerceptronClassifier.supportedSolvers))
 
-  /** @group getParam */
-  final def getOptimizer: String = $(solver)
+  /** @group expertGetParam */
+  final def getSolver: String = $(solver)
 
   /**
    * Model weights. Can be returned either after training or after explicit setting
--- End diff --

I like this approach.
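
For reference, a dependency-free sketch of the pattern this diff moves toward: keep the supported values in one companion-object constant and validate setters against it (illustrative names, not the actual spark.ml classes):

```scala
object SolverParams {
  val LBFGS = "l-bfgs"
  val GD = "gd"
  val supportedSolvers: Array[String] = Array(LBFGS, GD)
}

class SolverParams(private var solver: String = SolverParams.LBFGS) {
  // Mirrors an inArray-style validator: reject anything outside the constant list.
  def setSolver(value: String): this.type = {
    require(SolverParams.supportedSolvers.contains(value),
      s"Supported solvers: ${SolverParams.supportedSolvers.mkString(", ")}; got $value.")
    solver = value
    this
  }

  def getSolver: String = solver
}
```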



[GitHub] spark pull request: [SPARK-15292] [ML] ML 2.0 QA: Scala APIs audit...

2016-05-16 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/13076#discussion_r63452941
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/classification/MultilayerPerceptronClassifier.scala
 ---
@@ -204,16 +238,26 @@ class MultilayerPerceptronClassifier @Since("1.5.0") (
 val labels = myLayers.last
 val lpData = extractLabeledPoints(dataset)
 val data = lpData.map(lp => LabelConverter.encodeLabeledPoint(lp, 
labels))
-val topology = FeedForwardTopology.multiLayerPerceptron(myLayers, true)
+val topology = FeedForwardTopology.multiLayerPerceptron(myLayers, 
softmaxOnTop = true)
 val trainer = new FeedForwardTrainer(topology, myLayers(0), 
myLayers.last)
-if (isDefined(weights)) {
-  trainer.setWeights($(weights))
+if (isDefined(initialWeights)) {
+  trainer.setWeights($(initialWeights))
 } else {
   trainer.setSeed($(seed))
 }
-trainer.LBFGSOptimizer
-  .setConvergenceTol($(tol))
-  .setNumIterations($(maxIter))
+if ($(solver) == MultilayerPerceptronClassifier.LBFGS) {
+  trainer.LBFGSOptimizer
+.setConvergenceTol($(tol))
+.setNumIterations($(maxIter))
+} else if ($(solver) == MultilayerPerceptronClassifier.GD) {
+  trainer.SGDOptimizer
+.setNumIterations($(maxIter))
+.setConvergenceTol($(tol))
+.setStepSize($(stepSize))
+} else {
+  throw new UnsupportedOperationException(
--- End diff --

Use `IllegalArgumentException` here instead: an unsupported solver value is an 
invalid argument rather than an unsupported operation.
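
A minimal, self-contained sketch of the suggested change (illustrative names, not the actual Spark code):

```scala
object SolverSelection {
  val supportedSolvers: Array[String] = Array("l-bfgs", "gd")

  // A bad solver name is a caller error, so IllegalArgumentException is the
  // better fit than UnsupportedOperationException.
  def requireSupported(solver: String): String =
    if (supportedSolvers.contains(solver)) solver
    else throw new IllegalArgumentException(
      s"The solver $solver is not supported. " +
        s"Supported solvers: ${supportedSolvers.mkString(", ")}.")
}
```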



[GitHub] spark pull request: [SPARK-15292] [ML] ML 2.0 QA: Scala APIs audit...

2016-05-16 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/13076#discussion_r63452938
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/classification/MultilayerPerceptronClassifier.scala
 ---
@@ -181,12 +206,21 @@ class MultilayerPerceptronClassifier @Since("1.5.0") (
   def setSeed(value: Long): this.type = set(seed, value)
 
   /**
-   * Sets the model weights.
+   * Sets the value of param [[initialWeights]].
*
-   * @group expertParam
+   * @group expertSetParam
+   */
+  @Since("2.0.0")
+  def setInitialWeights(value: Vector): this.type = set(initialWeights, 
value)
+
+  /**
+   * Sets the value of param [[stepSize]].
--- End diff --

State that this param only applies if solver = "gd".



[GitHub] spark pull request: [SPARK-14752] [SQL] fix kryo ordering serializ...

2016-05-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13141#issuecomment-219594712
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58658/
Test PASSed.



[GitHub] spark pull request: [SPARK-14752] [SQL] fix kryo ordering serializ...

2016-05-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13141#issuecomment-219594710
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-14752] [SQL] fix kryo ordering serializ...

2016-05-16 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13141#issuecomment-219594613
  
**[Test build #58658 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58658/consoleFull)**
 for PR 13141 at commit 
[`66f0e6c`](https://github.com/apache/spark/commit/66f0e6c352bae9e65eadada19b1cfead8b06b3aa).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-14269][SCHEDULER] Eliminate unnecessary...

2016-05-16 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12060#issuecomment-219594426
  
**[Test build #58659 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58659/consoleFull)**
 for PR 12060 at commit 
[`4eb8c05`](https://github.com/apache/spark/commit/4eb8c05841d14c9ecb306669508d6d0eab7543f5).



[GitHub] spark pull request: [SPARK-14269][SCHEDULER] Eliminate unnecessary...

2016-05-16 Thread ueshin
Github user ueshin commented on the pull request:

https://github.com/apache/spark/pull/12060#issuecomment-219594429
  
@kayousterhout Thank you for your review.
I updated the comments and pushed.



[GitHub] spark pull request: [SPARK-15351][SQL] RowEncoder should support a...

2016-05-16 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/13138#issuecomment-219593532
  
LGTM



[GitHub] spark pull request: [SPARK-15334][SQL] HiveClient facade not compa...

2016-05-16 Thread clockfly
Github user clockfly commented on a diff in the pull request:

https://github.com/apache/spark/pull/13127#discussion_r63451732
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala ---
@@ -130,109 +124,343 @@ class VersionsSuite extends SparkFunSuite with 
Logging {
 test(s"$version: create client") {
   client = null
   System.gc() // Hack to avoid SEGV on some JVM versions.
+  val hadoopConf = new Configuration();
+  hadoopConf.set("test", "success")
   client =
 IsolatedClientLoader.forVersion(
   hiveMetastoreVersion = version,
   hadoopVersion = VersionInfo.getVersion,
   sparkConf = sparkConf,
-  hadoopConf = new Configuration(),
+  hadoopConf,
   config = buildConf(),
   ivyPath = ivyPath).createClient()
 }
 
+def table(database: String, tableName: String): CatalogTable = {
+  CatalogTable(
+identifier = TableIdentifier(tableName, Some(database)),
+tableType = CatalogTableType.MANAGED,
+schema = Seq(CatalogColumn("key", "int")),
+storage = CatalogStorageFormat(
+  locationUri = None,
+  inputFormat = 
Some(classOf[org.apache.hadoop.mapred.TextInputFormat].getName),
+  outputFormat = Some(
+
classOf[org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat[_, 
_]].getName),
+  serde = 
Some(classOf[org.apache.hadoop.hive.serde2.`lazy`.LazySimpleSerDe].getName()),
+  compressed = false,
+  serdeProperties = Map.empty
+))
+}
+
+
///
+// Database related API
+
///
+
+val tempDatabasePath = Utils.createTempDir().getCanonicalPath
+
 test(s"$version: createDatabase") {
-  val db = CatalogDatabase("default", "desc", "loc", Map())
-  client.createDatabase(db, ignoreIfExists = true)
+  val defaultDB = CatalogDatabase("default", "desc", "loc", Map())
+  client.createDatabase(defaultDB, ignoreIfExists = true)
+  val tempDB = CatalogDatabase(
+"temporary", description = "test create", tempDatabasePath, Map())
+  client.createDatabase(tempDB, ignoreIfExists = true)
+}
+
+test(s"$version: setCurrentDatabase") {
+  client.setCurrentDatabase("default")
+}
+
+test(s"$version: getDatabase") {
+  // No exception should be thrown
+  client.getDatabase("default")
+}
+
+test(s"$version: getDatabaseOption") {
+  assert(client.getDatabaseOption("default").isDefined)
+  assert(client.getDatabaseOption("nonexist") == None)
 }
 
+test(s"$version: listDatabases") {
+  assert(client.listDatabases("defau.*") == Seq("default"))
+}
+
+test(s"$version: alterDatabase") {
+  val database = client.getDatabase("temporary").copy(properties = 
Map("flag" -> "true"))
+  client.alterDatabase(database)
+  assert(client.getDatabase("temporary").properties.contains("flag"))
+}
+
+test(s"$version: dropDatabase") {
+  assert(client.getDatabaseOption("temporary").isDefined)
+  client.dropDatabase("temporary", ignoreIfNotExists = false, cascade 
= true)
+  assert(client.getDatabaseOption("temporary").isEmpty)
+}
+
+
///
+// Table related API
+
///
+
 test(s"$version: createTable") {
-  val table =
-CatalogTable(
-  identifier = TableIdentifier("src", Some("default")),
-  tableType = CatalogTableType.MANAGED,
-  schema = Seq(CatalogColumn("key", "int")),
-  storage = CatalogStorageFormat(
-locationUri = None,
-inputFormat = 
Some(classOf[org.apache.hadoop.mapred.TextInputFormat].getName),
-outputFormat = Some(
-  
classOf[org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat[_, 
_]].getName),
-serde = 
Some(classOf[org.apache.hadoop.hive.serde2.`lazy`.LazySimpleSerDe].getName()),
-compressed = false,
-serdeProperties = Map.empty
-  ))
-
-  client.createTable(table, ignoreIfExists = false)
+  client.createTable(table("default", tableName = "src"), 
ignoreIfExists = false)
+  // Creates a temporary table
+  client.createTable(table("default", "temporary"), ignoreIfExists = 
false)
+}
+
  

[GitHub] spark pull request: [SPARK-15247][SQL] Set the default number of p...

2016-05-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13137#issuecomment-219592753
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58657/
Test PASSed.



[GitHub] spark pull request: [SPARK-15247][SQL] Set the default number of p...

2016-05-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13137#issuecomment-219592752
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-15247][SQL] Set the default number of p...

2016-05-16 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13137#issuecomment-219592621
  
**[Test build #58657 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58657/consoleFull)**
 for PR 13137 at commit 
[`4aafb37`](https://github.com/apache/spark/commit/4aafb3789bb67020be50d86611d58c43baafafec).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-15334][SQL] HiveClient facade not compa...

2016-05-16 Thread clockfly
Github user clockfly commented on a diff in the pull request:

https://github.com/apache/spark/pull/13127#discussion_r63451609
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
@@ -265,6 +343,47 @@ private[client] class Shim_v0_12 extends Shim with 
Logging {
 dropIndexMethod.invoke(hive, dbName, tableName, indexName, true: 
JBoolean)
   }
 
+  // Hive 0.12 doesn't support permanent functions. Here we use a 
in-memory map to simulate the
+  // permanent function registry.
+  private var functionMap = Map.empty[String, CatalogFunction]
+  private val functionDDLNotSupported = "Hive 0.12 doesn't support 
creating permanent functions. " +
+"Please use Hive 0.13 or higher."
--- End diff --

OK



[GitHub] spark pull request: [SPARK-15351][SQL] RowEncoder should support a...

2016-05-16 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13138#discussion_r63451076
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/encoders/RowEncoderSuite.scala
 ---
@@ -185,6 +185,20 @@ class RowEncoderSuite extends SparkFunSuite {
 assert(encoder.serializer.head.nullable == false)
   }
 
+  test("RowEncoder should support array as the external type for 
ArrayType") {
--- End diff --

We have `encodeDecodeTest` in `RowEncoderSuite`, but this is a special case. 
Ideally, every type that allows more than one external type needs an individual 
test, i.e. decimal type, struct type, and array type.



[GitHub] spark pull request: [SPARK-15351][SQL] RowEncoder should support a...

2016-05-16 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13138#discussion_r63450908
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/GenericArrayData.scala
 ---
@@ -37,6 +37,11 @@ class GenericArrayData(val array: Array[Any]) extends 
ArrayData {
   def this(primitiveArray: Array[Byte]) = this(primitiveArray.toSeq)
   def this(primitiveArray: Array[Boolean]) = this(primitiveArray.toSeq)
 
+  def this(seqOrArray: Any) = this(seqOrArray match {
--- End diff --

We don't. For `RowEncoder`, we only have the schema when we build it, with no 
type info. Actually there are three special cases overall: for array type we 
support both Seq and Array; for decimal type we support both the Java/Scala 
`BigDecimal` and `Decimal`; for struct type we support both `Row` and 
`Product`. They all use runtime reflection.
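
A minimal sketch of that runtime dispatch for the array case, assuming only that a Seq or an Array may arrive as the external value (illustrative, not the actual encoder code):

```scala
object ExternalArrayInput {
  // The static type is Any because the encoder only knows the schema; the
  // concrete external type (Seq vs Array) is discovered at runtime.
  def normalize(seqOrArray: Any): Seq[Any] = seqOrArray match {
    case seq: Seq[_]     => seq
    case array: Array[_] => array.toSeq
    case other =>
      throw new IllegalArgumentException(s"Unexpected external type: $other")
  }
}
```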



[GitHub] spark pull request: [SPARK-15334][SQL] HiveClient facade not compa...

2016-05-16 Thread clockfly
Github user clockfly commented on a diff in the pull request:

https://github.com/apache/spark/pull/13127#discussion_r63450897
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala ---
@@ -130,109 +124,343 @@ class VersionsSuite extends SparkFunSuite with 
Logging {
 test(s"$version: create client") {
   client = null
   System.gc() // Hack to avoid SEGV on some JVM versions.
+  val hadoopConf = new Configuration();
+  hadoopConf.set("test", "success")
   client =
 IsolatedClientLoader.forVersion(
   hiveMetastoreVersion = version,
   hadoopVersion = VersionInfo.getVersion,
   sparkConf = sparkConf,
-  hadoopConf = new Configuration(),
+  hadoopConf,
   config = buildConf(),
   ivyPath = ivyPath).createClient()
 }
 
+def table(database: String, tableName: String): CatalogTable = {
+  CatalogTable(
+identifier = TableIdentifier(tableName, Some(database)),
+tableType = CatalogTableType.MANAGED,
+schema = Seq(CatalogColumn("key", "int")),
+storage = CatalogStorageFormat(
+  locationUri = None,
+  inputFormat = 
Some(classOf[org.apache.hadoop.mapred.TextInputFormat].getName),
+  outputFormat = Some(
+
classOf[org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat[_, 
_]].getName),
+  serde = 
Some(classOf[org.apache.hadoop.hive.serde2.`lazy`.LazySimpleSerDe].getName()),
+  compressed = false,
+  serdeProperties = Map.empty
+))
+}
+
+
///
+// Database related API
+
///
+
+val tempDatabasePath = Utils.createTempDir().getCanonicalPath
+
 test(s"$version: createDatabase") {
-  val db = CatalogDatabase("default", "desc", "loc", Map())
-  client.createDatabase(db, ignoreIfExists = true)
+  val defaultDB = CatalogDatabase("default", "desc", "loc", Map())
+  client.createDatabase(defaultDB, ignoreIfExists = true)
+  val tempDB = CatalogDatabase(
+"temporary", description = "test create", tempDatabasePath, Map())
+  client.createDatabase(tempDB, ignoreIfExists = true)
+}
+
+test(s"$version: setCurrentDatabase") {
+  client.setCurrentDatabase("default")
+}
+
+test(s"$version: getDatabase") {
+  // No exception should be thrown
+  client.getDatabase("default")
+}
+
+test(s"$version: getDatabaseOption") {
+  assert(client.getDatabaseOption("default").isDefined)
+  assert(client.getDatabaseOption("nonexist") == None)
 }
 
+test(s"$version: listDatabases") {
+  assert(client.listDatabases("defau.*") == Seq("default"))
+}
+
+test(s"$version: alterDatabase") {
+  val database = client.getDatabase("temporary").copy(properties = 
Map("flag" -> "true"))
+  client.alterDatabase(database)
+  assert(client.getDatabase("temporary").properties.contains("flag"))
+}
+
+test(s"$version: dropDatabase") {
+  assert(client.getDatabaseOption("temporary").isDefined)
+  client.dropDatabase("temporary", ignoreIfNotExists = false, cascade 
= true)
+  assert(client.getDatabaseOption("temporary").isEmpty)
+}
+
+
///
+// Table related API
+
///
+
 test(s"$version: createTable") {
-  val table =
-CatalogTable(
-  identifier = TableIdentifier("src", Some("default")),
-  tableType = CatalogTableType.MANAGED,
-  schema = Seq(CatalogColumn("key", "int")),
-  storage = CatalogStorageFormat(
-locationUri = None,
-inputFormat = 
Some(classOf[org.apache.hadoop.mapred.TextInputFormat].getName),
-outputFormat = Some(
-  
classOf[org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat[_, 
_]].getName),
-serde = 
Some(classOf[org.apache.hadoop.hive.serde2.`lazy`.LazySimpleSerDe].getName()),
-compressed = false,
-serdeProperties = Map.empty
-  ))
-
-  client.createTable(table, ignoreIfExists = false)
+  client.createTable(table("default", tableName = "src"), 
ignoreIfExists = false)
+  // Creates a temporary table
+  client.createTable(table("default", "temporary"), ignoreIfExists = 
false)
--- End diff --

[GitHub] spark pull request: [SPARK-15315][SQL] Adding error check to the C...

2016-05-16 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/13105#discussion_r63450323
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/DefaultSource.scala
 ---
@@ -172,4 +173,15 @@ class DefaultSource extends FileFormat with 
DataSourceRegister {
 .mapPartitions(_.map(pair => new String(pair._2.getBytes, 0, 
pair._2.getLength, charset)))
 }
   }
+
+  private def verifySchema(schema: StructType): Unit = {
+schema.foreach { field =>
+  field.dataType match {
+case _: ArrayType | _: MapType | _: StructType =>
+  throw new AnalysisException(
--- End diff --

Just for a note, I am less sure of `AnalysisException`. It seems JSON uses 
`sys.error()`, and the CSV data source uses `UnsupportedOperationException`, 
`RuntimeException`, etc. Maybe we need to make the exceptions consistent 
later.

It seems `AnalysisException` is 

>Thrown when a query fails to analyze, usually because the query itself is 
invalid.

So, I think it would be nicer to be consistent with the existing CSV behaviour 
or the JSON one. FYI, CSV throws `RuntimeException` when it fails to read due 
to types.



[GitHub] spark pull request: [SPARK-15315][SQL] Adding error check to the C...

2016-05-16 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request:

https://github.com/apache/spark/pull/13105#issuecomment-219588474
  
Just for a note, the JSON data source does this via 
[`JacksonGenerator.apply()`](https://github.com/apache/spark/blob/d6dc12ef0146ae409834c78737c116050961f350/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JacksonGenerator.scala#L36-L86)
 which is equivalent to 
[`CSVRelation.rowToString()`](https://github.com/apache/spark/blob/d6dc12ef0146ae409834c78737c116050961f350/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVRelation.scala#L159-L165).
 However, JSON does the check there because it needs to traverse the schema 
as a tree to verify whether each type is supported, so the check happens while 
writing each row.

CSV does not need to traverse a tree: it only has to look at the root types, 
because it does not support nested types at all. So the schema can be checked 
up front, and faster.

Therefore, it seems reasonable to me.
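
A minimal sketch of that root-level check, assuming Spark SQL's public type classes; the exception type is the open question above, so the one used here is just a placeholder:

```scala
import org.apache.spark.sql.types._

object CsvSchemaCheck {
  // CSV rows are flat, so scanning only the top-level fields is enough to
  // reject every complex (nested) data type before any row is written.
  def verifySchema(schema: StructType): Unit = {
    schema.foreach { field =>
      field.dataType match {
        case _: ArrayType | _: MapType | _: StructType =>
          throw new UnsupportedOperationException( // placeholder exception type
            s"CSV data source does not support ${field.dataType.simpleString} type.")
        case _ => // atomic types are supported
      }
    }
  }
}
```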



[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...

2016-05-16 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request:

https://github.com/apache/spark/pull/11996#discussion_r63449411
  
--- Diff: 
core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala ---
@@ -789,6 +791,51 @@ class TaskSetManagerSuite extends SparkFunSuite with 
LocalSparkContext with Logg
 assert(TaskLocation("executor_host1_3") === 
ExecutorCacheTaskLocation("host1", "3"))
   }
 
+  test("Kill other task attempts when one attempt belonging to the same 
task succeeds") {
+sc = new SparkContext("local", "test")
+val sched = new FakeTaskScheduler(sc, ("exec1", "host1"), ("exec2", 
"host2"))
+val taskSet = FakeTask.createTaskSet(4)
+val manager = new TaskSetManager(sched, taskSet, MAX_TASK_FAILURES)
+val accumUpdatesByTask: Array[Seq[AccumulableInfo]] = 
taskSet.tasks.map { task =>
+  task.initialAccumulators.map { a => a.toInfo(Some(0L), None) }
+}
+// Offer resources for 4 tasks to start
+for ((k, v) <- List(
+"exec1" -> "host1",
+"exec1" -> "host1",
+"exec2" -> "host2",
+"exec2" -> "host2")) {
+  val taskOption = manager.resourceOffer(k, v, NO_PREF)
+  assert(taskOption.isDefined)
+  val task = taskOption.get
+  assert(task.executorId === k)
+}
+assert(sched.startedTasks.toSet === Set(0, 1, 2, 3))
+// Complete the 3 tasks and leave 1 task in running
+for (id <- Set(0, 1, 2)) {
+  manager.handleSuccessfulTask(id, createTaskResult(id, 
accumUpdatesByTask(id)))
+  assert(sched.endedTasks(id) === Success)
+}
+
+// Wait for the threshold time to start speculative attempt for the 
running task
+Thread.sleep(100)
+val speculation = manager.checkSpeculatableTasks
+assert(speculation === true)
+// Offer resource to start the speculative attempt for the running task
+val taskOption5 = manager.resourceOffer("exec1", "host1", NO_PREF)
+assert(taskOption5.isDefined)
+val task5 = taskOption5.get
+assert(task5.taskId === 4)
+assert(task5.executorId === "exec1")
+assert(task5.attemptNumber === 1)
+sched.backend = mock(classOf[SchedulerBackend])
+// Complete the speculative attempt for the running task
+manager.handleSuccessfulTask(4, createTaskResult(3, 
accumUpdatesByTask(3)))
+assert(sched.endedTasks(3) === Success)
--- End diff --

Why isn't this Killed? My understanding is that the task set finishing 
should trigger this call: 
https://github.com/devaraj-kavali/spark/blob/ba9ffab65f9f003af3a27671b8610525c2e38d84/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala#L733
 which will cause the FakeDagScheduler to set sched.endedTasks(3) to Killed. 
Or does that happen *before* the successful attempt happens, so the successful 
one overrides it? (In any case, can you add a comment describing this?)



[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...

2016-05-16 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request:

https://github.com/apache/spark/pull/11996#discussion_r63448950
  
--- Diff: 
core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala ---
@@ -789,6 +791,51 @@ class TaskSetManagerSuite extends SparkFunSuite with 
LocalSparkContext with Logg
 assert(TaskLocation("executor_host1_3") === 
ExecutorCacheTaskLocation("host1", "3"))
   }
 
+  test("Kill other task attempts when one attempt belonging to the same 
task succeeds") {
+sc = new SparkContext("local", "test")
+val sched = new FakeTaskScheduler(sc, ("exec1", "host1"), ("exec2", 
"host2"))
+val taskSet = FakeTask.createTaskSet(4)
+val manager = new TaskSetManager(sched, taskSet, MAX_TASK_FAILURES)
+val accumUpdatesByTask: Array[Seq[AccumulableInfo]] = 
taskSet.tasks.map { task =>
+  task.initialAccumulators.map { a => a.toInfo(Some(0L), None) }
+}
+// Offer resources for 4 tasks to start
+for ((k, v) <- List(
+"exec1" -> "host1",
+"exec1" -> "host1",
+"exec2" -> "host2",
+"exec2" -> "host2")) {
+  val taskOption = manager.resourceOffer(k, v, NO_PREF)
+  assert(taskOption.isDefined)
+  val task = taskOption.get
+  assert(task.executorId === k)
+}
+assert(sched.startedTasks.toSet === Set(0, 1, 2, 3))
+// Complete the 3 tasks and leave 1 task in running
+for (id <- Set(0, 1, 2)) {
+  manager.handleSuccessfulTask(id, createTaskResult(id, 
accumUpdatesByTask(id)))
+  assert(sched.endedTasks(id) === Success)
+}
+
+// Wait for the threshold time to start speculative attempt for the 
running task
+Thread.sleep(100)
+val speculation = manager.checkSpeculatableTasks
+assert(speculation === true)
--- End diff --

Here, just do `assert(manager.checkSpeculatableTasks)`? (Since `===` isn't 
helpful for boolean values, where it's obvious what the expected / actual 
values were.)



[GitHub] spark pull request: [SPARK-15315][SQL] Adding error check to the C...

2016-05-16 Thread sureshthalamati
Github user sureshthalamati commented on a diff in the pull request:

https://github.com/apache/spark/pull/13105#discussion_r63448886
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/DefaultSource.scala
 ---
@@ -172,4 +173,15 @@ class DefaultSource extends FileFormat with 
DataSourceRegister {
 .mapPartitions(_.map(pair => new String(pair._2.getBytes, 0, 
pair._2.getLength, charset)))
 }
   }
+
+  private def verifySchema(schema: StructType): Unit = {
+schema.foreach {
+  field => field.dataType match {
--- End diff --

Thank you for the feedback. Fixed it. 



[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...

2016-05-16 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request:

https://github.com/apache/spark/pull/11996#discussion_r63448866
  
--- Diff: 
core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala ---
@@ -789,6 +791,51 @@ class TaskSetManagerSuite extends SparkFunSuite with 
LocalSparkContext with Logg
 assert(TaskLocation("executor_host1_3") === 
ExecutorCacheTaskLocation("host1", "3"))
   }
 
+  test("Kill other task attempts when one attempt belonging to the same 
task succeeds") {
+sc = new SparkContext("local", "test")
+val sched = new FakeTaskScheduler(sc, ("exec1", "host1"), ("exec2", 
"host2"))
+val taskSet = FakeTask.createTaskSet(4)
+val manager = new TaskSetManager(sched, taskSet, MAX_TASK_FAILURES)
+val accumUpdatesByTask: Array[Seq[AccumulableInfo]] = 
taskSet.tasks.map { task =>
+  task.initialAccumulators.map { a => a.toInfo(Some(0L), None) }
+}
+// Offer resources for 4 tasks to start
+for ((k, v) <- List(
+"exec1" -> "host1",
+"exec1" -> "host1",
+"exec2" -> "host2",
+"exec2" -> "host2")) {
+  val taskOption = manager.resourceOffer(k, v, NO_PREF)
+  assert(taskOption.isDefined)
+  val task = taskOption.get
+  assert(task.executorId === k)
+}
+assert(sched.startedTasks.toSet === Set(0, 1, 2, 3))
+// Complete the 3 tasks and leave 1 task in running
+for (id <- Set(0, 1, 2)) {
+  manager.handleSuccessfulTask(id, createTaskResult(id, 
accumUpdatesByTask(id)))
+  assert(sched.endedTasks(id) === Success)
+}
+
+// Wait for the threshold time to start speculative attempt for the 
running task
+Thread.sleep(100)
--- End diff --

What about adding an argument to `checkSpeculatableTasks` to control the 
magic 100 value, something like

`checkSpeculatableTasks(minTimeToSpeculation: Int = 100)`

and then we can set it to 0 for tests? In general it's nice to avoid adding 
sleeps to the (already slow) tests.
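
A minimal sketch of the suggested API shape, under the assumption that speculation is decided by comparing elapsed time against a threshold (illustrative names, not Spark's actual implementation):

```scala
object Speculation {
  final case class RunningAttempt(taskId: Long, launchTimeMs: Long)

  // Threading the threshold through as a default parameter keeps the 100 ms
  // value in production while letting tests pass 0 and skip Thread.sleep.
  def checkSpeculatableTasks(
      running: Seq[RunningAttempt],
      minTimeToSpeculation: Int = 100): Boolean = {
    val now = System.currentTimeMillis()
    running.exists(t => now - t.launchTimeMs >= minTimeToSpeculation)
  }
}
```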



[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...

2016-05-16 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request:

https://github.com/apache/spark/pull/11996#discussion_r63448080
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -620,6 +620,14 @@ private[spark] class TaskSetManager(
 // Note: "result.value()" only deserializes the value when it's called 
at the first time, so
 // here "result.value()" just returns the value and won't block other 
threads.
 sched.dagScheduler.taskEnded(tasks(index), Success, result.value(), 
result.accumUpdates, info)
+// Kill other task attempts if any as the one attempt succeeded
+for (attemptInfo <- taskAttempts(index) if attemptInfo.attemptNumber 
!= info.attemptNumber
--- End diff --

I think you don't need the middle condition here 
(`attemptInfo.attemptNumber != info.attemptNumber`): this attempt is no longer 
running, since markSuccessful() was called above, so the last condition will 
already fail?



[GitHub] spark pull request: [SPARK-15315][SQL] Adding error check to the C...

2016-05-16 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/13105#discussion_r63447722
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/DefaultSource.scala
 ---
@@ -172,4 +173,15 @@ class DefaultSource extends FileFormat with 
DataSourceRegister {
 .mapPartitions(_.map(pair => new String(pair._2.getBytes, 0, 
pair._2.getLength, charset)))
 }
   }
+
+  private def verifySchema(schema: StructType): Unit = {
+schema.foreach {
+  field => field.dataType match {
--- End diff --

Oh, I should have said this clearly. Here is a good reference: 
https://github.com/databricks/scala-style-guide#pattern-matching.

I think

```scala
schema.foreach { field =>
  field.dataType match { 
...
```
would be nicer.



[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...

2016-05-16 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request:

https://github.com/apache/spark/pull/11996#discussion_r63447725
  
--- Diff: core/src/main/scala/org/apache/spark/ui/UIUtils.scala ---
@@ -337,6 +337,16 @@ private[spark] object UIUtils extends Logging {
   failed: Int,
   skipped: Int,
   total: Int): Seq[Node] = {
+makeProgressBar(started, completed, failed, skipped, 0, total)
--- End diff --

Can you use a named param here (`killed = 0`)?



[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...

2016-05-16 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request:

https://github.com/apache/spark/pull/11996#discussion_r63447507
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskInfo.scala ---
@@ -71,11 +73,16 @@ class TaskInfo(
 failed = true
   }
 
+  private[spark] def markKilled(time: Long = System.currentTimeMillis) {
--- End diff --

Can you consolidate this and the two methods above into a single 
markFinished method that accepts a TaskState and a time? And then that method 
can handle matching the TaskState to the appropriate value of killed / failed.
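
A minimal sketch of the consolidation being proposed, assuming a TaskState-style enum; the names are illustrative, not the actual Spark code:

```scala
object TaskState extends Enumeration {
  val FINISHED, FAILED, KILLED = Value
}

class TaskInfo {
  var finishTime: Long = 0L
  var failed: Boolean = false
  var killed: Boolean = false

  // One entry point replaces markSuccessful / markFailed / markKilled:
  // the TaskState argument decides which flag (if any) gets set.
  def markFinished(state: TaskState.Value,
                   time: Long = System.currentTimeMillis()): Unit = {
    finishTime = time
    state match {
      case TaskState.FAILED => failed = true
      case TaskState.KILLED => killed = true
      case _ => // FINISHED: success, no extra flag
    }
  }
}
```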



[GitHub] spark pull request: [SPARK-15315][SQL] Adding error check to the C...

2016-05-16 Thread sureshthalamati
Github user sureshthalamati commented on a diff in the pull request:

https://github.com/apache/spark/pull/13105#discussion_r63447396
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/DefaultSource.scala
 ---
@@ -172,4 +173,13 @@ class DefaultSource extends FileFormat with 
DataSourceRegister {
 .mapPartitions(_.map(pair => new String(pair._2.getBytes, 0, 
pair._2.getLength, charset)))
 }
   }
+
+  private def verifySchema(schema: StructType): Unit = {
+schema.foreach(field => field.dataType match {
--- End diff --

Thank you @HyukjinKwon for reviewing the PR. Addressed your comment. 



[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...

2016-05-16 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request:

https://github.com/apache/spark/pull/11996#discussion_r63446802
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -620,6 +620,14 @@ private[spark] class TaskSetManager(
 // Note: "result.value()" only deserializes the value when it's called 
at the first time, so
 // here "result.value()" just returns the value and won't block other 
threads.
 sched.dagScheduler.taskEnded(tasks(index), Success, result.value(), 
result.accumUpdates, info)
+// Kill other task attempts if any as the one attempt succeeded
--- End diff --

Can you change this to "Kill any other attempts for the same task (since 
those are unnecessary now that one attempt completed successfully)."



[GitHub] spark pull request: [SPARK-14906][ML] Move VectorUDT and MatrixUDT...

2016-05-16 Thread dbtsai
Github user dbtsai commented on the pull request:

https://github.com/apache/spark/pull/12870#issuecomment-219583049
  
Ping @mengxr 



[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...

2016-05-16 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request:

https://github.com/apache/spark/pull/11996#discussion_r63446363
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -620,6 +620,14 @@ private[spark] class TaskSetManager(
 // Note: "result.value()" only deserializes the value when it's called 
at the first time, so
 // here "result.value()" just returns the value and won't block other 
threads.
 sched.dagScheduler.taskEnded(tasks(index), Success, result.value(), 
result.accumUpdates, info)
+// Kill other task attempts if any as the one attempt succeeded
+for (attemptInfo <- taskAttempts(index) if attemptInfo.attemptNumber 
!= info.attemptNumber
+&& attemptInfo.running) {
+  logInfo("Killing attempt " + attemptInfo.attemptNumber + " for task 
" + attemptInfo.id +
--- End diff --

Can you use string interpolation to make this more concise / readable? 
(e.g., s"Killing attempt ${attemptInfo.attemptNumber} for task 
${attemptInfo.id} in stage " ...



[GitHub] spark pull request: [SPARK-14752][SQL] LazilyGenerateOrdering thro...

2016-05-16 Thread bomeng
Github user bomeng commented on the pull request:

https://github.com/apache/spark/pull/12661#issuecomment-219582450
  
Since this one has been here for more than 10 days, I've provided another 
approach with a new test case. Please take a look. Thanks.
[PR for SPARK-14752](https://github.com/apache/spark/pull/13141)



[GitHub] spark pull request: [SPARK-14269][SCHEDULER] Eliminate unnecessary...

2016-05-16 Thread kayousterhout
Github user kayousterhout commented on the pull request:

https://github.com/apache/spark/pull/12060#issuecomment-219582446
  
This LGTM with the small comment changes I suggested. @markhamstra any 
objections to this? Mark / @rxin thoughts on merging it into the 2.0 branch?



[GitHub] spark pull request: [SPARK-14752] [SQL] fix kryo ordering serializ...

2016-05-16 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13141#issuecomment-219582225
  
**[Test build #58658 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58658/consoleFull)**
 for PR 13141 at commit 
[`66f0e6c`](https://github.com/apache/spark/commit/66f0e6c352bae9e65eadada19b1cfead8b06b3aa).



[GitHub] spark pull request: [SPARK-15247][SQL] Set the default number of p...

2016-05-16 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13137#issuecomment-219580597
  
**[Test build #58657 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58657/consoleFull)**
 for PR 13137 at commit 
[`4aafb37`](https://github.com/apache/spark/commit/4aafb3789bb67020be50d86611d58c43baafafec).



[GitHub] spark pull request: [SPARK-14752] [SQL] fix kryo ordering serializ...

2016-05-16 Thread bomeng
GitHub user bomeng opened a pull request:

https://github.com/apache/spark/pull/13141

[SPARK-14752] [SQL] fix kryo ordering serialization

## What changes were proposed in this pull request?

When using Kryo as the serializer, we get a `NullPointerException` for queries 
with `ORDER BY`.
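
For context, a minimal sketch of the configuration that exercises this path, assuming the standard `spark.serializer` setting (illustrative):

```scala
import org.apache.spark.SparkConf

object KryoOrderByRepro {
  // Switching the serializer to Kryo routes ordering objects through Kryo
  // serialization, the path that previously hit the NullPointerException.
  val conf: SparkConf = new SparkConf()
    .setAppName("kryo-order-by-repro")
    .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
}
```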

## How was this patch tested?

I've added a new test case to HashedRelationSuite.scala, since this issue 
is related to SPARK-14521.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bomeng/spark SPARK-14752

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13141.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13141


commit 66f0e6c352bae9e65eadada19b1cfead8b06b3aa
Author: bomeng 
Date:   2016-05-16T23:42:22Z

fix kryo serialization





[GitHub] spark pull request: [SPARK-15280] [Input/Output] Refactored OrcOut...

2016-05-16 Thread seyfe
Github user seyfe commented on the pull request:

https://github.com/apache/spark/pull/13066#issuecomment-219581242
  
Hi @liancheng and @yhuai,

Would you mind looking at this pull request? I would appreciate that.

Thanks,
Ergin



[GitHub] spark pull request: [SPARK-15334][SQL] HiveClient facade not compa...

2016-05-16 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/13127#discussion_r63445044
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala ---
@@ -130,109 +124,343 @@ class VersionsSuite extends SparkFunSuite with 
Logging {
 test(s"$version: create client") {
   client = null
   System.gc() // Hack to avoid SEGV on some JVM versions.
+  val hadoopConf = new Configuration();
+  hadoopConf.set("test", "success")
   client =
 IsolatedClientLoader.forVersion(
   hiveMetastoreVersion = version,
   hadoopVersion = VersionInfo.getVersion,
   sparkConf = sparkConf,
-  hadoopConf = new Configuration(),
+  hadoopConf,
   config = buildConf(),
   ivyPath = ivyPath).createClient()
 }
 
+def table(database: String, tableName: String): CatalogTable = {
+  CatalogTable(
+identifier = TableIdentifier(tableName, Some(database)),
+tableType = CatalogTableType.MANAGED,
+schema = Seq(CatalogColumn("key", "int")),
+storage = CatalogStorageFormat(
+  locationUri = None,
+  inputFormat = 
Some(classOf[org.apache.hadoop.mapred.TextInputFormat].getName),
+  outputFormat = Some(
+
classOf[org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat[_, 
_]].getName),
+  serde = 
Some(classOf[org.apache.hadoop.hive.serde2.`lazy`.LazySimpleSerDe].getName()),
+  compressed = false,
+  serdeProperties = Map.empty
+))
+}
+
+
///
+// Database related API
+
///
+
+val tempDatabasePath = Utils.createTempDir().getCanonicalPath
+
 test(s"$version: createDatabase") {
-  val db = CatalogDatabase("default", "desc", "loc", Map())
-  client.createDatabase(db, ignoreIfExists = true)
+  val defaultDB = CatalogDatabase("default", "desc", "loc", Map())
+  client.createDatabase(defaultDB, ignoreIfExists = true)
+  val tempDB = CatalogDatabase(
+"temporary", description = "test create", tempDatabasePath, Map())
+  client.createDatabase(tempDB, ignoreIfExists = true)
+}
+
+test(s"$version: setCurrentDatabase") {
+  client.setCurrentDatabase("default")
+}
+
+test(s"$version: getDatabase") {
+  // No exception should be thrown
+  client.getDatabase("default")
+}
+
+test(s"$version: getDatabaseOption") {
+  assert(client.getDatabaseOption("default").isDefined)
+  assert(client.getDatabaseOption("nonexist") == None)
 }
 
+test(s"$version: listDatabases") {
+  assert(client.listDatabases("defau.*") == Seq("default"))
+}
+
+test(s"$version: alterDatabase") {
+  val database = client.getDatabase("temporary").copy(properties = 
Map("flag" -> "true"))
+  client.alterDatabase(database)
+  assert(client.getDatabase("temporary").properties.contains("flag"))
+}
+
+test(s"$version: dropDatabase") {
+  assert(client.getDatabaseOption("temporary").isDefined)
+  client.dropDatabase("temporary", ignoreIfNotExists = false, cascade 
= true)
+  assert(client.getDatabaseOption("temporary").isEmpty)
+}
+
+
///
+// Table related API
+
///
+
 test(s"$version: createTable") {
-  val table =
-CatalogTable(
-  identifier = TableIdentifier("src", Some("default")),
-  tableType = CatalogTableType.MANAGED,
-  schema = Seq(CatalogColumn("key", "int")),
-  storage = CatalogStorageFormat(
-locationUri = None,
-inputFormat = 
Some(classOf[org.apache.hadoop.mapred.TextInputFormat].getName),
-outputFormat = Some(
-  
classOf[org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat[_, 
_]].getName),
-serde = 
Some(classOf[org.apache.hadoop.hive.serde2.`lazy`.LazySimpleSerDe].getName()),
-compressed = false,
-serdeProperties = Map.empty
-  ))
-
-  client.createTable(table, ignoreIfExists = false)
+  client.createTable(table("default", tableName = "src"), 
ignoreIfExists = false)
+  // Creates a temporary table
+  client.createTable(table("default", "temporary"), ignoreIfExists = 
false)
+}
+

[GitHub] spark pull request: [SPARK-15334][SQL] HiveClient facade not compa...

2016-05-16 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/13127#discussion_r63444035
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala ---
@@ -130,109 +124,343 @@ class VersionsSuite extends SparkFunSuite with Logging {
     test(s"$version: create client") {
       client = null
       System.gc() // Hack to avoid SEGV on some JVM versions.
+      val hadoopConf = new Configuration();
+      hadoopConf.set("test", "success")
       client =
         IsolatedClientLoader.forVersion(
           hiveMetastoreVersion = version,
           hadoopVersion = VersionInfo.getVersion,
           sparkConf = sparkConf,
-          hadoopConf = new Configuration(),
+          hadoopConf,
           config = buildConf(),
           ivyPath = ivyPath).createClient()
     }
 
+    def table(database: String, tableName: String): CatalogTable = {
+      CatalogTable(
+        identifier = TableIdentifier(tableName, Some(database)),
+        tableType = CatalogTableType.MANAGED,
+        schema = Seq(CatalogColumn("key", "int")),
+        storage = CatalogStorageFormat(
+          locationUri = None,
+          inputFormat = Some(classOf[org.apache.hadoop.mapred.TextInputFormat].getName),
+          outputFormat = Some(
+            classOf[org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat[_, _]].getName),
+          serde = Some(classOf[org.apache.hadoop.hive.serde2.`lazy`.LazySimpleSerDe].getName()),
+          compressed = false,
+          serdeProperties = Map.empty
+        ))
+    }
+
+    ///////////////////////////////////////////////////////////////////////////
+    // Database related API
+    ///////////////////////////////////////////////////////////////////////////
+
+    val tempDatabasePath = Utils.createTempDir().getCanonicalPath
+
     test(s"$version: createDatabase") {
-      val db = CatalogDatabase("default", "desc", "loc", Map())
-      client.createDatabase(db, ignoreIfExists = true)
+      val defaultDB = CatalogDatabase("default", "desc", "loc", Map())
+      client.createDatabase(defaultDB, ignoreIfExists = true)
+      val tempDB = CatalogDatabase(
+        "temporary", description = "test create", tempDatabasePath, Map())
+      client.createDatabase(tempDB, ignoreIfExists = true)
+    }
+
+    test(s"$version: setCurrentDatabase") {
+      client.setCurrentDatabase("default")
+    }
+
+    test(s"$version: getDatabase") {
+      // No exception should be thrown
+      client.getDatabase("default")
+    }
+
+    test(s"$version: getDatabaseOption") {
+      assert(client.getDatabaseOption("default").isDefined)
+      assert(client.getDatabaseOption("nonexist") == None)
     }
 
+    test(s"$version: listDatabases") {
+      assert(client.listDatabases("defau.*") == Seq("default"))
+    }
+
+    test(s"$version: alterDatabase") {
+      val database = client.getDatabase("temporary").copy(properties = Map("flag" -> "true"))
+      client.alterDatabase(database)
+      assert(client.getDatabase("temporary").properties.contains("flag"))
+    }
+
+    test(s"$version: dropDatabase") {
+      assert(client.getDatabaseOption("temporary").isDefined)
+      client.dropDatabase("temporary", ignoreIfNotExists = false, cascade = true)
+      assert(client.getDatabaseOption("temporary").isEmpty)
+    }
+
+    ///////////////////////////////////////////////////////////////////////////
+    // Table related API
+    ///////////////////////////////////////////////////////////////////////////
+
     test(s"$version: createTable") {
-      val table =
-        CatalogTable(
-          identifier = TableIdentifier("src", Some("default")),
-          tableType = CatalogTableType.MANAGED,
-          schema = Seq(CatalogColumn("key", "int")),
-          storage = CatalogStorageFormat(
-            locationUri = None,
-            inputFormat = Some(classOf[org.apache.hadoop.mapred.TextInputFormat].getName),
-            outputFormat = Some(
-              classOf[org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat[_, _]].getName),
-            serde = Some(classOf[org.apache.hadoop.hive.serde2.`lazy`.LazySimpleSerDe].getName()),
-            compressed = false,
-            serdeProperties = Map.empty
-          ))
-
-      client.createTable(table, ignoreIfExists = false)
+      client.createTable(table("default", tableName = "src"), ignoreIfExists = false)
+      // Creates a temporary table
+      client.createTable(table("default", "temporary"), ignoreIfExists = false)
--- End diff --
  

[GitHub] spark pull request: [SPARK-15319][SPARKR][DOCS] Fix SparkR doc lay...

2016-05-16 Thread felixcheung
Github user felixcheung commented on the pull request:

https://github.com/apache/spark/pull/13109#issuecomment-219578322
  
That's fine; I don't know the history of putting the stats column functions into one Rd page, though.
I agree it is fine to group functions by name, with the `corr` DataFrame function and the `corr` column function going to the same Rd page.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14269][SCHEDULER] Eliminate unnecessary...

2016-05-16 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request:

https://github.com/apache/spark/pull/12060#discussion_r63443356
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
@@ -752,23 +751,20 @@ class DAGScheduler(
         submitStage(stage)
       }
     }
-    submitWaitingStages()
   }
 
   /**
    * Check for waiting stages which are now eligible for resubmission.
-   * Ordinarily run on every iteration of the event loop.
+   * Ordinarily run after the parent stage completed successfully.
    */
-  private def submitWaitingStages() {
-    // TODO: We might want to run this less often, when we are sure that something has become
-    // runnable that wasn't before.
+  private def submitWaitingChildStages(parent: Stage) {
     logTrace("Checking for newly runnable parent stages")
--- End diff --

Can you update this to say s"Checking if any dependencies of $parent are now runnable"?
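
For reference, the suggested message would simply replace the logTrace call at the top of the renamed method (a sketch showing only the requested wording in place, not the merged code):

    logTrace(s"Checking if any dependencies of $parent are now runnable")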


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14269][SCHEDULER] Eliminate unnecessary...

2016-05-16 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request:

https://github.com/apache/spark/pull/12060#discussion_r63443272
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
@@ -752,23 +751,20 @@ class DAGScheduler(
         submitStage(stage)
       }
     }
-    submitWaitingStages()
   }
 
   /**
    * Check for waiting stages which are now eligible for resubmission.
-   * Ordinarily run on every iteration of the event loop.
+   * Ordinarily run after the parent stage completed successfully.
--- End diff --

Can you update this to say something like:

"Submits stages that depend on the given parent stage. Called when the 
parent stage completes successfully."
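
Taken together with the logTrace suggestion above, the renamed method could end up looking roughly like this (a sketch only: `waitingStages` and `submitStage` are existing DAGScheduler members, and the body shown is assumed rather than quoted from the PR):

    /**
     * Submits stages that depend on the given parent stage. Called when the
     * parent stage completes successfully.
     */
    private def submitWaitingChildStages(parent: Stage): Unit = {
      logTrace(s"Checking if any dependencies of $parent are now runnable")
      // Pull out only the waiting stages that list this stage as a parent...
      val childStages = waitingStages.filter(_.parents.contains(parent)).toArray
      waitingStages --= childStages
      // ...and try to submit each of them, oldest job first.
      for (stage <- childStages.sortBy(_.firstJobId)) {
        submitStage(stage)
      }
    }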


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14269][SCHEDULER] Eliminate unnecessary...

2016-05-16 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request:

https://github.com/apache/spark/pull/12060#discussion_r63443044
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
@@ -1357,7 +1345,6 @@ class DAGScheduler(
           logDebug("Additional executor lost message for " + execId +
             "(epoch " + currentEpoch + ")")
         }
--- End diff --

This appears to be a non-issue, because we handle lost shuffle output 
separately, when we get a FetchFailure from a task that tries to fetch the 
output.
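
For context, the FetchFailure path referred to here already takes care of the lost output, along these lines (a simplified sketch of the handler; the names follow the DAGScheduler but the details are elided and assumed):

    case FetchFailed(bmAddress, shuffleId, mapId, _, _) =>
      // The reduce task could not read the map output, so mark that output as lost...
      mapOutputTracker.unregisterMapOutput(shuffleId, mapId, bmAddress)
      // ...and queue both the map stage and the stage that failed to fetch from it
      // for resubmission, so the executor-lost path needs no extra handling here.
      failedStages += failedStage
      failedStages += mapStage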


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14269][SCHEDULER] Eliminate unnecessary...

2016-05-16 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request:

https://github.com/apache/spark/pull/12060#discussion_r63442702
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
@@ -1357,7 +1345,6 @@ class DAGScheduler(
           logDebug("Additional executor lost message for " + execId +
             "(epoch " + currentEpoch + ")")
         }
--- End diff --

Is it necessary to submit some newly-waiting stages here (e.g., if shuffle 
output was lost for a map stage, so now that map stage needs to be re-run)?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14906][ML] Copy linalg in PySpark to ne...

2016-05-16 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/13099#issuecomment-219575757
  
I'm making a pass now.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-7424] [ML] ML ClassificationModel shoul...

2016-05-16 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/12066#discussion_r63440571
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/Classifier.scala ---
@@ -179,7 +230,7 @@ abstract class ClassificationModel[FeaturesType, M <: ClassificationModel[FeaturesType, M]]
         }
         predictUDF(col(getFeaturesCol))
       }
-      outputData = outputData.withColumn(getPredictionCol, predUDF)
+      outputData = outputData.withColumn(getPredictionCol, predUDF, predictionMetadata)
--- End diff --

The problem is that the name may be different: `transform` can be given a new output column name, so we may need to update the name in the metadata.
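
One way to keep the two in sync would be to rebuild the metadata from the current output column name before attaching it, e.g. (a hypothetical sketch using the ml.attribute API, not the PR's actual fix; `numClasses` is assumed to be in scope on the model):

    import org.apache.spark.ml.attribute.NominalAttribute

    // Rebuild the prediction column metadata so its name matches whatever
    // output column name the model is configured with at transform time.
    val predictionMetadata = NominalAttribute.defaultAttr
      .withName(getPredictionCol)
      .withNumValues(numClasses)
      .toMetadata()
    outputData = outputData.withColumn(getPredictionCol, predUDF, predictionMetadata)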


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-15112][SQL] Allows query plan schema an...

2016-05-16 Thread tedyu
Github user tedyu commented on a diff in the pull request:

https://github.com/apache/spark/pull/12952#discussion_r63439615
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -163,15 +164,17 @@ object EliminateSerialization extends Rule[LogicalPlan] {
   def apply(plan: LogicalPlan): LogicalPlan = plan transform {
     case d @ DeserializeToObject(_, _, s: SerializeFromObject)
         if d.outputObjectType == s.inputObjectType =>
-      // A workaround for SPARK-14803. Remove this after it is fixed.
-      if (d.outputObjectType.isInstanceOf[ObjectType] &&
-          d.outputObjectType.asInstanceOf[ObjectType].cls == classOf[org.apache.spark.sql.Row]) {
-        s.child
-      } else {
-        // Adds an extra Project here, to preserve the output expr id of `DeserializeToObject`.
-        val objAttr = Alias(s.child.output.head, "obj")(exprId = d.output.head.exprId)
-        Project(objAttr :: Nil, s.child)
+      d.outputObjectType match {
+        // A workaround for SPARK-14803. Remove this after it is fixed.
+        case ObjectType(cls) if cls == classOf[Row] =>
--- End diff --

Now that SPARK-14803 is fixed, this workaround can be dropped.

Should I open a PR?
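
With the workaround gone, the rule would presumably collapse back to just the expr-id-preserving Project from the removed branch, along these lines (a sketch of the simplification based on the diff above, not an actual patch):

    case d @ DeserializeToObject(_, _, s: SerializeFromObject)
        if d.outputObjectType == s.inputObjectType =>
      // Add an extra Project to preserve the output expr id of `DeserializeToObject`.
      val objAttr = Alias(s.child.output.head, "obj")(exprId = d.output.head.exprId)
      Project(objAttr :: Nil, s.child)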


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-15334][SQL] HiveClient facade not compa...

2016-05-16 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/13127#discussion_r63438981
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
@@ -265,6 +343,47 @@ private[client] class Shim_v0_12 extends Shim with Logging {
     dropIndexMethod.invoke(hive, dbName, tableName, indexName, true: JBoolean)
   }
 
+  // Hive 0.12 doesn't support permanent functions. Here we use a in-memory map to simulate the
+  // permanent function registry.
+  private var functionMap = Map.empty[String, CatalogFunction]
+  private val functionDDLNotSupported = "Hive 0.12 doesn't support creating permanent functions. " +
+    "Please use Hive 0.13 or higher."
--- End diff --

For Hive 0.12, let's throw an exception if users want to create a permanent 
function. 
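
In Shim_v0_12 that could look roughly like the following (a sketch only; the override name and signature are assumed from the surrounding shim code rather than taken from the PR):

    override def createFunction(hive: Hive, db: String, func: CatalogFunction): Unit = {
      // Fail fast instead of silently simulating the registry in memory.
      throw new AnalysisException(
        "Hive 0.12 doesn't support creating permanent functions. " +
          "Please use Hive 0.13 or higher.")
    }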


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-15334][SQL] HiveClient facade not compa...

2016-05-16 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/13127#discussion_r63438911
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
@@ -208,7 +286,7 @@ private[client] class Shim_v0_12 extends Shim with Logging {
       predicates: Seq[Expression]): Seq[Partition] = {
     // getPartitionsByFilter() doesn't support binary comparison ops in Hive 0.12.
     // See HIVE-4888.
-    logDebug("Hive 0.12 doesn't support predicate pushdown to metastore. " +
+    logWarning("Hive 0.12 doesn't support predicate pushdown to metastore. " +
--- End diff --

For this method, it is allowed to have false positives. Let's update the doc of this method to say so. Also, let's revert this change back to logDebug (just to avoid letting users think that this warning indicates an error).
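
In other words, the 0.12 shim can keep fetching everything and rely on Spark to re-apply the predicates, something like (a sketch of the fallback; `getAllPartitions` is assumed from the shim trait):

    override def getPartitionsByFilter(
        hive: Hive,
        table: Table,
        predicates: Seq[Expression]): Seq[Partition] = {
      // Hive 0.12 can't push these predicates down (HIVE-4888), so return all
      // partitions; callers filter again on the Spark side, which is why false
      // positives from this method are harmless.
      logDebug("Hive 0.12 doesn't support predicate pushdown to metastore. " +
        "All partitions will be fetched and filtered on the Spark side.")
      getAllPartitions(hive, table)
    }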


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-15031][EXAMPLES][FOLLOW-UP] Make Python...

2016-05-16 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request:

https://github.com/apache/spark/pull/13135#issuecomment-219569749
  
@yanboliang Thank you so much for taking a close look and for the detailed explanation!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


