date:20150827

[GitHub] spark pull request: [SPARK-10327][SQL] Cache Table is not working ...

2015-08-27 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8494#issuecomment-135637877
  
  [Test build #41720 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41720/console) for PR 8494 at commit [`bfd40d9`](https://github.com/apache/spark/commit/bfd40d999b6530bc04fc03ea6591c0093e10e534).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-10260][ML] Add @Since annotation to ml....

2015-08-27 Thread mengxr

Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/8455#discussion_r38171523
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala 
---
@@ -30,6 +30,7 @@ import org.apache.spark.sql.{DataFrame, Row}
 
 
 /**
+ * :: Experimental ::
--- End diff --

This is also necessary.



[GitHub] spark pull request: [SPARK-10260][ML] Add @Since annotation to ml....

2015-08-27 Thread mengxr

Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/8455#issuecomment-135637836
  
LGTM except the comment above. I'll merge it after 1.5.



[GitHub] spark pull request: [SPARK-10327][SQL] Cache Table is not working ...

2015-08-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8494#issuecomment-135638024
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-10323] [SQL] fix nullability of In/InSe...

2015-08-27 Thread rxin

Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/8492#issuecomment-135640246
  
I'd follow postgres here.



[GitHub] spark pull request: [SPARK-8057][Core]Call TaskAttemptContext.getT...

2015-08-27 Thread zsxwing

Github user zsxwing commented on the pull request:

https://github.com/apache/spark/pull/6599#issuecomment-135598700
  
> I think that we should also backport this to branch-1.4.

+1 since we fix it in 1.5.0. Just confirmed this one didn't have conflicts with branch-1.4.



[GitHub] spark pull request: [SPARK-9986][SPARK-9991][SPARK-9993][SQL]Creat...

2015-08-27 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8464#issuecomment-135603859
  
  [Test build #41717 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41717/consoleFull) for PR 8464 at commit [`7dcd502`](https://github.com/apache/spark/commit/7dcd502fc7278978fab5a233f4a81fefcca8bf72).



[GitHub] spark pull request: [SPARK-9986][SPARK-9991][SPARK-9993][SQL]Creat...

2015-08-27 Thread zsxwing

Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/8464#discussion_r38166499
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/local/LimitNode.scala ---
@@ -0,0 +1,45 @@
+/*
+* Licensed to the Apache Software Foundation (ASF) under one or more
+* contributor license agreements.  See the NOTICE file distributed with
+* this work for additional information regarding copyright ownership.
+* The ASF licenses this file to You under the Apache License, Version 2.0
+* (the "License"); you may not use this file except in compliance with
+* the License.  You may obtain a copy of the License at
+*
+*    http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+
+package org.apache.spark.sql.execution.local
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.Attribute
+
+
+case class LimitNode(limit: Int, child: LocalNode) extends UnaryLocalNode {
--- End diff --

I think we still need `filter` or `map` for these iterator trees. @rxin, is
there anything I misunderstand about the `LocalNode` design?
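
A minimal sketch of what `filter` and `map` nodes over such an iterator tree might look like. The `Node` trait below is a hypothetical, simplified stand-in and not the actual `LocalNode` API from the pull request; the names `SeqNode`, `FilterNode`, `MapNode` and `collect` are likewise assumptions made for illustration.

```scala
object IteratorTreeSketch {
  // Hypothetical, simplified pull-based node; NOT the actual LocalNode API.
  trait Node[A] {
    def open(): Unit
    def next(): Boolean // advance; true if a row is available
    def fetch(): A      // current row, valid after next() returned true
    def close(): Unit
  }

  // Leaf node backed by an in-memory sequence.
  final class SeqNode[A](data: Seq[A]) extends Node[A] {
    private var it: Iterator[A] = Iterator.empty
    private var current: Option[A] = None
    def open(): Unit = { it = data.iterator }
    def next(): Boolean = {
      current = if (it.hasNext) Some(it.next()) else None
      current.isDefined
    }
    def fetch(): A = current.get
    def close(): Unit = ()
  }

  // Skips child rows until one satisfies the predicate.
  final class FilterNode[A](p: A => Boolean, child: Node[A]) extends Node[A] {
    def open(): Unit = child.open()
    def next(): Boolean = {
      var found = false
      while (!found && child.next()) found = p(child.fetch())
      found
    }
    def fetch(): A = child.fetch()
    def close(): Unit = child.close()
  }

  // Applies f to each child row as it is fetched.
  final class MapNode[A, B](f: A => B, child: Node[A]) extends Node[B] {
    def open(): Unit = child.open()
    def next(): Boolean = child.next()
    def fetch(): B = f(child.fetch())
    def close(): Unit = child.close()
  }

  // Drains a node into a list, closing it afterwards.
  def collect[A](node: Node[A]): List[A] = {
    node.open()
    val buf = List.newBuilder[A]
    try { while (node.next()) buf += node.fetch() } finally node.close()
    buf.result()
  }
}
```

Composing `MapNode` over `FilterNode` over `SeqNode` gives a small tree that is pulled row by row from the root.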



[GitHub] spark pull request: [SPARK-8505][SparkR] Add settings to kick `lin...

2015-08-27 Thread shivaram

Github user shivaram commented on the pull request:

https://github.com/apache/spark/pull/7883#issuecomment-135615906
  
Alright, I'm going to merge this, as it's better to do so before more breaking
style changes get in. I will watch Jenkins for the next couple of hours to make
sure things are fine.



[GitHub] spark pull request: [SPARK-8813][SQL] Combine files when there're ...

2015-08-27 Thread chenghao-intel

Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/8125#discussion_r38167285
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/sources/CombineSmallFile.scala ---
@@ -0,0 +1,43 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.sources
+
+import org.apache.hadoop.fs.{FileStatus, FileSystem, Path}
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.SQLContext
+
+object CombineSmallFile {
+  def combineWithFiles[T](rdd: RDD[T], sqlContext: SQLContext, inputFiles: Array[FileStatus])
+  : RDD[T] = {
+    if (sqlContext.conf.combineSmallFile) {
+      val totalLen = inputFiles.map { file =>
+        if (file.isDir) 0L else file.getLen
+      }.sum
+      val numPartitions = (totalLen / sqlContext.conf.splitSize + 1).toInt
+      rdd.coalesce(numPartitions)
--- End diff --

I think this is a very hacky way to solve the problem. We cannot tell how the
data source will be split; even for Hadoop, the split size is just a hint, so
using it to compute the number of partitions is probably too risky for a
generic data processing framework.

Also, `RDD.coalesce` combines the splits in an arbitrary way, which will
probably cause data skew, as we will most likely combine large partitions into
a single task.

IMO, we should investigate in depth how Hive combines small partitions using
`CombineHiveInputFormat` or `HiveInputFormat`, which seems to have a strategy
that selects partitions according to the input format while also keeping them
balanced.
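
The skew concern can be made concrete with a small sketch. It assumes file sizes are known up front and packs them greedily (largest first, each into the currently lightest group). `packBySize` and `packAdjacent` are hypothetical helpers for illustration only; this is neither what `RDD.coalesce` does internally nor Hive's actual `CombineHiveInputFormat` strategy.

```scala
object SizeAwarePacking {
  // Greedy "largest first" packing: each file goes to the group
  // whose total size is currently the smallest.
  def packBySize(sizes: Seq[Long], numGroups: Int): Vector[Vector[Long]] = {
    require(numGroups > 0, "numGroups must be positive")
    val empty = Vector.fill(numGroups)(Vector.empty[Long])
    sizes.sortBy(-_).foldLeft(empty) { (groups, size) =>
      val lightest = groups.indices.minBy(i => groups(i).sum)
      groups.updated(lightest, groups(lightest) :+ size)
    }
  }

  // Size-blind grouping of adjacent files, for comparison.
  def packAdjacent(sizes: Seq[Long], numGroups: Int): Vector[Vector[Long]] = {
    require(numGroups > 0, "numGroups must be positive")
    val perGroup = math.max(1, math.ceil(sizes.size.toDouble / numGroups).toInt)
    sizes.grouped(perGroup).map(_.toVector).toVector
  }
}
```

For sizes 100, 90, 5, 5, 5, 5 packed into two groups, the size-blind grouping yields totals of 195 and 15, while the size-aware packing yields 105 and 105.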



[GitHub] spark pull request: [SPARK-8952] [SPARKR] - Wrap normalizePath cal...

2015-08-27 Thread shivaram

Github user shivaram commented on the pull request:

https://github.com/apache/spark/pull/8343#issuecomment-135616119
  
Thanks @lresende -- LGTM



[GitHub] spark pull request: [SPARK-8952] [SPARKR] - Wrap normalizePath cal...

2015-08-27 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8343#issuecomment-135620132
  
  [Test build #41718 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41718/console) for PR 8343 at commit [`472c767`](https://github.com/apache/spark/commit/472c76714c25b909e281d8079b7ead6c152d4512).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-10327][SQL] Cache Table is not working ...

2015-08-27 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8494#issuecomment-135620007
  
  [Test build #41720 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41720/consoleFull) for PR 8494 at commit [`bfd40d9`](https://github.com/apache/spark/commit/bfd40d999b6530bc04fc03ea6591c0093e10e534).



[GitHub] spark pull request: [SPARKR] [SPARK-10328] Fix generic for na.omit

2015-08-27 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8495#issuecomment-135621721
  
  [Test build #41722 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41722/consoleFull) for PR 8495 at commit [`4758a87`](https://github.com/apache/spark/commit/4758a87ea3b74914ffd2870e1a736472944c4a04).



[GitHub] spark pull request: [SPARKR] [SPARK-10328] Fix generic for na.omit

2015-08-27 Thread yu-iskw

Github user yu-iskw commented on the pull request:

https://github.com/apache/spark/pull/8495#issuecomment-135624869
  
LGTM



[GitHub] spark pull request: [SPARK-10326] [yarn] Fix app submission on win...

2015-08-27 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8493#issuecomment-135629824
  
  [Test build #41716 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41716/console) for PR 8493 at commit [`a14dba5`](https://github.com/apache/spark/commit/a14dba5233526f844a68d77c5d765d98b0534e2a).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-10326] [yarn] Fix app submission on win...

2015-08-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8493#issuecomment-135629856
  
Merged build finished. Test FAILed.



[GitHub] spark pull request: [SPARK-10326] [yarn] Fix app submission on win...

2015-08-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8493#issuecomment-135629857
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41716/
Test FAILed.



[GitHub] spark pull request: [SPARK-10082] [MLlib] Validate i, j in apply D...

2015-08-27 Thread mengxr

Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/8271#issuecomment-135638595
  
I think this requires a micro-benchmark. I want to see the overhead of the
two additional checks. We can also test a single `require` statement that
contains both checks.
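
One rough way to compare the two variants is sketched below. It assumes a plain column-major `Array[Double]` as a stand-in for the matrix; the names `RequireBench`, `applyTwoRequires`, `applyOneRequire` and `applyUnchecked` are hypothetical. The naive `System.nanoTime` loop is only indicative, and a dedicated harness such as JMH would give more trustworthy numbers.

```scala
object RequireBench {
  val numRows = 100
  val numCols = 100
  // Column-major storage: element (i, j) lives at index i + numRows * j.
  val values: Array[Double] = Array.tabulate(numRows * numCols)(_.toDouble)

  // Variant A: one require per index.
  def applyTwoRequires(i: Int, j: Int): Double = {
    require(i >= 0 && i < numRows, s"Row index $i out of range [0, $numRows)")
    require(j >= 0 && j < numCols, s"Column index $j out of range [0, $numCols)")
    values(i + numRows * j)
  }

  // Variant B: a single require covering both indices.
  def applyOneRequire(i: Int, j: Int): Double = {
    require(i >= 0 && i < numRows && j >= 0 && j < numCols,
      s"Index ($i, $j) out of range for a $numRows x $numCols matrix")
    values(i + numRows * j)
  }

  // Baseline without any checks.
  def applyUnchecked(i: Int, j: Int): Double = values(i + numRows * j)

  // Runs `iters` full sweeps over the matrix; returns (elapsed ns, checksum).
  def time(iters: Int)(f: (Int, Int) => Double): (Long, Double) = {
    var sum = 0.0
    val start = System.nanoTime()
    var n = 0
    while (n < iters) {
      var j = 0
      while (j < numCols) {
        var i = 0
        while (i < numRows) { sum += f(i, j); i += 1 }
        j += 1
      }
      n += 1
    }
    (System.nanoTime() - start, sum)
  }

  def report(name: String, f: (Int, Int) => Double): Unit = {
    time(200)(f) // warm-up so the JIT has a chance to compile the loop
    val (ns, _) = time(2000)(f)
    println(f"$name%-14s ${ns / 1e6}%.1f ms")
  }

  def main(args: Array[String]): Unit = {
    report("unchecked", applyUnchecked)
    report("two requires", applyTwoRequires)
    report("one require", applyOneRequire)
  }
}
```

The message argument of `require` is passed by name, so the interpolated strings are only built when a check fails and should not distort the comparison.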



[GitHub] spark pull request: [SPARK-9741][SQL] Approximate Count Distinct u...

2015-08-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8362#issuecomment-135639349
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-9741][SQL] Approximate Count Distinct u...

2015-08-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8362#issuecomment-135639351
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41719/
Test PASSed.



[GitHub] spark pull request: [SPARK-SQL] [MINOR] Fixes some typos in HiveCo...

2015-08-27 Thread rxin

Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/8481#issuecomment-135639367
  
Merging this in master & branch-1.5. Thanks.




[GitHub] spark pull request: [SPARK-8952] [SPARKR] - Wrap normalizePath cal...

2015-08-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8343#issuecomment-135595941
  
Merged build started.



[GitHub] spark pull request: [SPARK-8952] [SPARKR] - Wrap normalizePath cal...

2015-08-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8343#issuecomment-135595930
  
 Merged build triggered.



[GitHub] spark pull request: [SPARK-10326] [yarn] Fix app submission on win...

2015-08-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8493#issuecomment-135601511
  
Merged build started.



[GitHub] spark pull request: [SPARK-10326] [yarn] Fix app submission on win...

2015-08-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8493#issuecomment-135601503
  
 Merged build triggered.



[GitHub] spark pull request: [SPARK-10327][SQL] Cache Table is not working ...

2015-08-27 Thread chenghao-intel

Github user chenghao-intel commented on the pull request:

https://github.com/apache/spark/pull/8494#issuecomment-135618510
  
cc @marmbrus 



[GitHub] spark pull request: [SPARK-10260][ML] Add @Since annotation to ml....

2015-08-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8455#issuecomment-135637446
  
 Merged build triggered.



[GitHub] spark pull request: [SPARK-9679][ML][PYSPARK] Add Python API for S...

2015-08-27 Thread mengxr

Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/8118#discussion_r38171484
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/feature/StopWordsRemover.scala ---
@@ -29,14 +29,14 @@ import org.apache.spark.sql.types.{ArrayType, 
StringType, StructField, StructTyp
 /**
  * stop words list
  */
-private object StopWords {
+protected[spark] object StopWords {
--- End diff --

`private[spark]` should be the same but appears more often



[GitHub] spark pull request: [SPARK-10323] [SQL] fix nullability of In/InSe...

2015-08-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8492#issuecomment-135637490
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41725/
Test FAILed.



[GitHub] spark pull request: [SPARK-10260][ML] Add @Since annotation to ml....

2015-08-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8455#issuecomment-135637464
  
Merged build started.



[GitHub] spark pull request: [SPARK-10323] [SQL] fix nullability of In/InSe...

2015-08-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8492#issuecomment-135637488
  
Merged build finished. Test FAILed.



[GitHub] spark pull request: [SPARK-10260][ML] Add @Since annotation to ml....

2015-08-27 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8455#issuecomment-135637651
  
  [Test build #41728 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41728/consoleFull) for PR 8455 at commit [`2c0a4d0`](https://github.com/apache/spark/commit/2c0a4d0e2cd6da8371bab064c83e8e155aa5183f).



[GitHub] spark pull request: [SPARK-9986][SPARK-9991][SPARK-9993][SQL]Creat...

2015-08-27 Thread zsxwing

Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/8464#discussion_r38164639
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/local/LimitNode.scala ---
@@ -0,0 +1,45 @@
+/*
+* Licensed to the Apache Software Foundation (ASF) under one or more
+* contributor license agreements.  See the NOTICE file distributed with
+* this work for additional information regarding copyright ownership.
+* The ASF licenses this file to You under the Apache License, Version 2.0
+* (the "License"); you may not use this file except in compliance with
+* the License.  You may obtain a copy of the License at
+*
+*    http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+
+package org.apache.spark.sql.execution.local
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.Attribute
+
+
+case class LimitNode(limit: Int, child: LocalNode) extends UnaryLocalNode {
+
+  private[this] var count = 0
+
+  override def output: Seq[Attribute] = child.output
+
+  override def open(): Unit = child.open()
--- End diff --

LocalNode cannot be reused, just like Iterator. So it's not necessary to 
reset it.
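
To illustrate the point with plain Scala iterators: `LimitIterator` below is a hypothetical stand-in for the limit node in the diff (it is not the PR's code). Like any `Iterator`, it is consumed once, so its counter never needs to be reset.

```scala
object SingleUseSketch {
  // Hypothetical stand-in for a limit node, written as a plain Iterator.
  final class LimitIterator[A](limit: Int, child: Iterator[A]) extends Iterator[A] {
    private var count = 0 // never reset: the iterator is single-use

    def hasNext: Boolean = count < limit && child.hasNext

    def next(): A = {
      if (!hasNext) throw new NoSuchElementException("limit reached or child exhausted")
      count += 1
      child.next()
    }
  }
}
```

Once such an iterator has been drained, `hasNext` stays false; a caller that needs the rows again builds a new instance over a fresh child instead of resetting the old one.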



[GitHub] spark pull request: SPARK-9545, SPARK-9547: Use Maven in PRB if ti...

2015-08-27 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7878#issuecomment-135604744
  
  [Test build #41711 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41711/console)
 for   PR 7878 at commit 
[`cf58c49`](https://github.com/apache/spark/commit/cf58c49c3be31c8e33639ba68eca16398f98c7f6).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: SPARK-9545, SPARK-9547: Use Maven in PRB if ti...

2015-08-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7878#issuecomment-135604771
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41711/
Test FAILed.



[GitHub] spark pull request: SPARK-9545, SPARK-9547: Use Maven in PRB if ti...

2015-08-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7878#issuecomment-135604767
  
Merged build finished. Test FAILed.



[GitHub] spark pull request: [SPARK-8952] [SPARKR] - Wrap normalizePath cal...

2015-08-27 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8343#issuecomment-135617402
  
  [Test build #41718 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41718/consoleFull)
 for   PR 8343 at commit 
[`472c767`](https://github.com/apache/spark/commit/472c76714c25b909e281d8079b7ead6c152d4512).



[GitHub] spark pull request: [SPARKR] [SPARK-10328] Fix generic for na.omit

2015-08-27 Thread shivaram

Github user shivaram commented on the pull request:

https://github.com/apache/spark/pull/8495#issuecomment-135620933
  
@yu-iskw I also found a minor bug in lint-r, which I just fixed. Please let 
me know if that looks good. With this change, lint-r passes on my machine.



[GitHub] spark pull request: [SPARK-10260][ML] Add @Since annotation to ml....

2015-08-27 Thread mengxr

Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/8455#discussion_r38169722
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala 
---
@@ -30,8 +30,11 @@ import org.apache.spark.sql.{DataFrame, Row}
 
 
 /**
+ * :: Experimental ::
  * Common params for KMeans and KMeansModel
  */
+@Since("1.5.0")
+@Experimental
--- End diff --

Neither `Since` nor `Experimental` is necessary here because this is package 
private.



[GitHub] spark pull request: [SPARK-10323] [SQL] fix nullability of In/InSe...

2015-08-27 Thread davies

Github user davies commented on the pull request:

https://github.com/apache/spark/pull/8492#issuecomment-135634760
  
From PostgreSQL:
```
If the array expression yields a null array, the result of ANY will be 
null. If the left-hand expression yields null, the result of ANY is ordinarily 
null (though a non-strict comparison operator could possibly yield a different 
result). Also, if the right-hand array contains any null elements and no true 
comparison result is obtained, the result of ANY will be null, not false 
(again, assuming a strict comparison operator). This is in accordance with 
SQL's normal rules for Boolean combinations of null values.
```

PostgreSQL's behavior is more consistent, so I'd like to follow it.

cc @rxin @marmbrus 
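
The null rules quoted above can be sketched in a few lines. This is an illustration under stated assumptions, not the implementation in the pull request: `sqlIn` is a hypothetical helper, SQL `NULL` is modelled as `None`, the comparison is assumed to be strict equality, and returning null for a null left-hand side regardless of the list contents is an assumption of the sketch.

```scala
object NullableInSketch {
  // SQL three-valued logic encoded as Option[Boolean]: None stands for NULL.
  def sqlIn[A](value: Option[A], list: Seq[Option[A]]): Option[Boolean] =
    value match {
      // A null left-hand side yields null (strict comparison assumed).
      case None => None
      case Some(v) =>
        if (list.contains(Some(v))) Some(true) // a true comparison wins
        else if (list.contains(None)) None     // no match, but a null element
        else Some(false)                       // no match and no nulls
    }
}
```

For example, `sqlIn(Some(2), Seq(Some(1), None))` evaluates to `None` rather than `Some(false)`, which is the case that makes the expression nullable even when the left-hand side is not.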




[GitHub] spark pull request: SPARK-9545, SPARK-9547: Use Maven in PRB if ti...

2015-08-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7878#issuecomment-135635298
  
 Merged build triggered.



[GitHub] spark pull request: SPARK-9545, SPARK-9547: Use Maven in PRB if ti...

2015-08-27 Thread pwendell

Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/7878#issuecomment-135635256
  
Jenkins, test this please.



[GitHub] spark pull request: [Core] whitespace fixes in RangePartitioner

2015-08-27 Thread srowen

Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/8480#issuecomment-135366372
  
Yes, I think that commit was to pass style checks though. I assume this 
doesn't fail anything? I mean, I don't mind just merging this, but in my 
personal opinion I'd like to lightly push back on very small non-functional 
changes. 



[GitHub] spark pull request: [SPARK-10257][MLlib] Removes Guava from all sp...

2015-08-27 Thread srowen

Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/8451#issuecomment-135366571
  
The other sibling PRs look good and I can merge them. This looks good after 
the `Strings` change.



[GitHub] spark pull request: [SPARK-10256][ML] Removes guava dependency fro...

2015-08-27 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/8447



[GitHub] spark pull request: [SPARK-10254][ML]Removes Guava dependencies in...

2015-08-27 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/8445



[GitHub] spark pull request: [SPARK-10255][ML] Removes Guava dependencies f...

2015-08-27 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/8446



[GitHub] spark pull request: [SPARK-9986][SPARK-9991][SPARK-9993][SQL]Creat...

2015-08-27 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8464#issuecomment-135368814
  
  [Test build #41676 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41676/console)
 for   PR 8464 at commit 
[`62b8d24`](https://github.com/apache/spark/commit/62b8d2411d5f3be1460f68c02e6af6e3ab10fdce).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class LimitNode(limit: Int, child: LocalNode) extends UnaryLocalNode `
  * `case class UnionNode(children: Seq[LocalNode]) extends LocalNode `




[GitHub] spark pull request: [SPARK-9986][SPARK-9991][SPARK-9993][SQL]Creat...

2015-08-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8464#issuecomment-135368942
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-9986][SPARK-9991][SPARK-9993][SQL]Creat...

2015-08-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8464#issuecomment-135368946
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41676/
Test PASSed.



[GitHub] spark pull request: [SPARK-9613] [HOTFIX] Fix usage of JavaConvert...

2015-08-27 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8479#issuecomment-135369577
  
  [Test build #1696 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1696/console)
 for   PR 8479 at commit 
[`b6c17e7`](https://github.com/apache/spark/commit/b6c17e7daad09096e0bed94e677226b61d349bc1).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class LogisticRegressionModel @Since("1.3.0") (`
  * `class SVMModel @Since("1.1.0") (`
  * `class GaussianMixtureModel @Since("1.3.0") (`
  * `class KMeansModel @Since("1.1.0") (@Since("1.0.0") val clusterCenters: Array[Vector])`
  * `class PowerIterationClusteringModel @Since("1.3.0") (`
  * `class StreamingKMeansModel @Since("1.2.0") (`
  * `class StreamingKMeans @Since("1.2.0") (`
  * `class BinaryClassificationMetrics @Since("1.3.0") (`
  * `class MulticlassMetrics @Since("1.1.0") (predictionAndLabels: RDD[(Double, Double)]) `
  * `class MultilabelMetrics @Since("1.2.0") (predictionAndLabels: RDD[(Array[Double], Array[Double])]) `
  * `class RegressionMetrics @Since("1.2.0") (`
  * `class ChiSqSelectorModel @Since("1.3.0") (`
  * `class ChiSqSelector @Since("1.3.0") (`
  * `class ElementwiseProduct @Since("1.4.0") (`
  * `class IDF @Since("1.2.0") (@Since("1.2.0") val minDocFreq: Int) `
  * `class Normalizer @Since("1.1.0") (p: Double) extends VectorTransformer `
  * `class PCA @Since("1.4.0") (@Since("1.4.0") val k: Int) `
  * `class StandardScaler @Since("1.1.0") (withMean: Boolean, withStd: Boolean) extends Logging `
  * `class StandardScalerModel @Since("1.3.0") (`
  * `class FPGrowthModel[Item: ClassTag] @Since("1.3.0") (`
  * `  class FreqItemset[Item] @Since("1.3.0") (`
  * `  class FreqSequence[Item] @Since("1.5.0") (`
  * `class PrefixSpanModel[Item] @Since("1.5.0") (`
  * `class DenseMatrix @Since("1.3.0") (`
  * `class SparseMatrix @Since("1.3.0") (`
  * `class DenseVector @Since("1.0.0") (`
  * `class SparseVector @Since("1.0.0") (`
  * `class BlockMatrix @Since("1.3.0") (`
  * `class CoordinateMatrix @Since("1.0.0") (`
  * `class IndexedRowMatrix @Since("1.0.0") (`
  * `class RowMatrix @Since("1.0.0") (`
  * `class PoissonGenerator @Since("1.1.0") (`
  * `class ExponentialGenerator @Since("1.3.0") (`
  * `class GammaGenerator @Since("1.3.0") (`
  * `class LogNormalGenerator @Since("1.3.0") (`
  * `case class Rating @Since("0.8.0") (`
  * `class MatrixFactorizationModel @Since("0.8.0") (`
  * `abstract class GeneralizedLinearModel @Since("1.0.0") (`
  * `class IsotonicRegressionModel @Since("1.3.0") (`
  * `case class LabeledPoint @Since("1.0.0") (`
  * `class LassoModel @Since("1.1.0") (`
  * `class LinearRegressionModel @Since("1.1.0") (`
  * `class RidgeRegressionModel @Since("1.1.0") (`
  * `class MultivariateGaussian @Since("1.3.0") (`
  * `case class BoostingStrategy @Since("1.4.0") (`
  * `class Strategy @Since("1.3.0") (`
  * `class DecisionTreeModel @Since("1.0.0") (`
  * `class Node @Since("1.2.0") (`
  * `class Predict @Since("1.2.0") (`
  * `class RandomForestModel @Since("1.2.0") (`
  * `class GradientBoostedTreesModel @Since("1.2.0") (`
  * `abstract class SetOperation(left: LogicalPlan, right: LogicalPlan) extends BinaryNode `
  * `case class Union(left: LogicalPlan, right: LogicalPlan) extends SetOperation(left, right) `
  * `case class Intersect(left: LogicalPlan, right: LogicalPlan) extends SetOperation(left, right)`
  * `case class Except(left: LogicalPlan, right: LogicalPlan) extends SetOperation(left, right)`




[GitHub] spark pull request: [SPARK-9986][SPARK-9991][SPARK-9993][SQL]Creat...

2015-08-27 Thread zsxwing

Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/8464#discussion_r38080354
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/local/LocalNodeTest.scala
 ---
@@ -0,0 +1,192 @@
+/*
+* Licensed to the Apache Software Foundation (ASF) under one or more
+* contributor license agreements.  See the NOTICE file distributed with
+* this work for additional information regarding copyright ownership.
+* The ASF licenses this file to You under the Apache License, Version 2.0
+* (the "License"); you may not use this file except in compliance with
+* the License.  You may obtain a copy of the License at
+*
+*    http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+
+package org.apache.spark.sql.execution.local
+
+import scala.util.control.NonFatal
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.sql.catalyst.{CatalystTypeConverters, InternalRow}
+import org.apache.spark.sql.catalyst.util._
+import org.apache.spark.sql.{DataFrame, Row}
+import org.apache.spark.sql.types.StructType
+
+class LocalNodeTest extends SparkFunSuite {
+
+  /**
+   * Runs the LocalNode and makes sure the answer matches the expected result.
+   * @param input the input data to be used.
+   * @param nodeFunction a function which accepts the input LocalNode and uses it to instantiate
+   *                     the local physical operator that's being tested.
+   * @param expectedAnswer the expected result in a [[Seq]] of [[Row]]s.
+   * @param sortAnswers if true, the answers will be sorted by their toString representations prior
+   *                    to being compared.
+   */
+  protected def checkAnswer(
+      input: DataFrame,
+      nodeFunction: LocalNode => LocalNode,
+      expectedAnswer: Seq[Row],
+      sortAnswers: Boolean = true): Unit = {
+    doCheckAnswer(
+      input :: Nil,
+      nodes => nodeFunction(nodes.head),
+      expectedAnswer,
+      sortAnswers)
+  }
+
+  /**
+   * Runs the LocalNode and makes sure the answer matches the expected result.
+   * @param left the left input data to be used.
+   * @param right the right input data to be used.
+   * @param nodeFunction a function which accepts the input LocalNode and uses it to instantiate
+   *                     the local physical operator that's being tested.
+   * @param expectedAnswer the expected result in a [[Seq]] of [[Row]]s.
+   * @param sortAnswers if true, the answers will be sorted by their toString representations prior
+   *                    to being compared.
+   */
+  protected def checkAnswer2(
+      left: DataFrame,
+      right: DataFrame,
+      nodeFunction: (LocalNode, LocalNode) => LocalNode,
+      expectedAnswer: Seq[Row],
+      sortAnswers: Boolean = true): Unit = {
+    doCheckAnswer(
+      left :: right :: Nil,
+      nodes => nodeFunction(nodes(0), nodes(1)),
+      expectedAnswer,
+      sortAnswers)
+  }
+
+  /**
+   * Runs the `LocalNode`s and makes sure the answer matches the expected result.
+   * @param input the input data to be used.
+   * @param nodeFunction a function which accepts a sequence of input `LocalNode`s and uses them to
+   *                     instantiate the local physical operator that's being tested.
+   * @param expectedAnswer the expected result in a [[Seq]] of [[Row]]s.
+   * @param sortAnswers if true, the answers will be sorted by their toString representations prior
+   *                    to being compared.
+   */
+  protected def doCheckAnswer(
+    input: Seq[DataFrame],
+    nodeFunction: Seq[LocalNode] => LocalNode,
+    expectedAnswer: Seq[Row],
+    sortAnswers: Boolean = true): Unit = {
+    LocalNodeTest.checkAnswer(
+      input.map(dataFrameToSeqScanNode), nodeFunction, expectedAnswer, sortAnswers) match {
+      case Some(errorMessage) => fail(errorMessage)
+      case None =>
+    }
+  }
+
+  protected def dataFrameToSeqScanNode(df: DataFrame): SeqScanNode = {
+    val output = df.queryExecution.sparkPlan.output
+    val converter =
+      CatalystTypeConverters.createToCatalystConverter(StructType.fromAttributes(output))
+    new SeqScanNode(
+      output,
+      df.collect().map(r => converter(r).asInstanceOf[InternalRow]))
--- End diff --

Cool. Fixed it.
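
The `sortAnswers` behaviour described in the doc comments of the quoted diff can be sketched without Spark on the classpath. The names here are hypothetical: `SimpleRow` stands in for Spark's `Row`, and `compareAnswers` only mirrors the `Option[String]` result shape that the diff matches on with `case Some(errorMessage)`; it is not the helper from the pull request.

```scala
object CheckAnswerSketch {
  // Hypothetical simplified row type; the test in the diff uses Spark's Row.
  type SimpleRow = Seq[Any]

  // Returns None when the answers match, or Some(message) describing the mismatch.
  def compareAnswers(
      actual: Seq[SimpleRow],
      expected: Seq[SimpleRow],
      sortAnswers: Boolean = true): Option[String] = {
    // Sorting by toString gives an order-insensitive comparison without
    // requiring an Ordering for arbitrary row contents.
    def prepare(rows: Seq[SimpleRow]): Seq[SimpleRow] =
      if (sortAnswers) rows.sortBy(_.toString) else rows

    val (a, e) = (prepare(actual), prepare(expected))
    if (a == e) None
    else Some(s"Results do not match.\nExpected: $e\nActual: $a")
  }
}
```

With `sortAnswers = false` the comparison becomes order-sensitive, which is what an operator with a defined output order (a sort, for instance) would presumably want.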



[GitHub] spark pull request: [SPARK-SQL] [MINOR] Fixes typo in non-public H...

2015-08-27 Thread liancheng

GitHub user liancheng opened a pull request:

https://github.com/apache/spark/pull/8481

[SPARK-SQL] [MINOR] Fixes typo in non-public HiveContext method name



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/liancheng/spark hive-context-typo

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/8481.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #8481


commit ff85c949087fb9e64858f918c5c831672ae562ed
Author: Cheng Lian l...@databricks.com
Date:   2015-08-27T10:03:39Z

Fixes typo in non-public HiveContext method name





[GitHub] spark pull request: [SPARK-9986][SPARK-9991][SPARK-9993][SQL]Creat...

2015-08-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8464#issuecomment-135370380
  
Merged build started.



[GitHub] spark pull request: [SPARK-9986][SPARK-9991][SPARK-9993][SQL]Creat...

2015-08-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8464#issuecomment-135370354
  
 Merged build triggered.



[GitHub] spark pull request: [SPARK-SQL] [MINOR] Fixes typo in non-public H...

2015-08-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8481#issuecomment-135370363
  
Merged build started.



[GitHub] spark pull request: [SPARK-SQL] [MINOR] Fixes typo in non-public H...

2015-08-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8481#issuecomment-135370350
  
 Merged build triggered.



[GitHub] spark pull request: [SPARK-9613] [HOTFIX] Fix usage of JavaConvert...

2015-08-27 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/8479



[GitHub] spark pull request: [SPARK-9986][SPARK-9991][SPARK-9993][SQL]Creat...

2015-08-27 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8464#issuecomment-135371382
  
  [Test build #41681 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41681/consoleFull)
 for   PR 8464 at commit 
[`22e7bc0`](https://github.com/apache/spark/commit/22e7bc0b9882b637bb06ee39a66d3ece789042fa).



[GitHub] spark pull request: [SPARK-2365] Add IndexedRDD, an efficient upda...

2015-08-27 Thread zerosign

Github user zerosign commented on the pull request:

https://github.com/apache/spark/pull/1297#issuecomment-135373104
  
Hi Ankur, 

Is there any update on this pull request?



[GitHub] spark pull request: [SPARK-9170][SQL] Instead of StandardStructObj...

2015-08-27 Thread chenghao-intel

Github user chenghao-intel commented on the pull request:

https://github.com/apache/spark/pull/7520#issuecomment-135304152
  
The normalization is done in Hive's `SemanticAnalyzer`, not by 
`StructObjectInspector` or `OrcStructObjectInspector`. I've checked with Hive: 
it works well even when the ORC column names are in upper case. The only thing 
I am not sure about is column pruning and predicate pushdown, since Hive's 
`explain extended select xx` doesn't seem to give that information. Maybe 
@zhzhan can comment on this.



[GitHub] spark pull request: [SPARK-10251][CORE] some common types are not ...

2015-08-27 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/8465


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9170][SQL] Instead of StandardStructObj...

2015-08-27 Thread chenghao-intel

Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/7520#discussion_r38065116
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcRelation.scala ---
@@ -253,7 +260,7 @@ private[orc] case class OrcTableScan(
     maybeStructOI.map { soi =>
       val (fieldRefs, fieldOrdinals) = nonPartitionKeyAttrs.map {
         case (attr, ordinal) =>
-          soi.getStructFieldRef(attr.name.toLowerCase) -> ordinal
--- End diff --

If we don't do the normalization, is this the only place we need to change? 
I ask because both `StructObjectInspector` and `OrcStructObjectInspector` serve 
the same purpose.



[GitHub] spark pull request: [SPARK-9170][SQL] Instead of StandardStructObj...

2015-08-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7520#issuecomment-135308659
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-9170][SQL] Instead of StandardStructObj...

2015-08-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7520#issuecomment-135308665
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41671/
Test PASSed.



[GitHub] spark pull request: [SPARK-9170][SQL] Instead of StandardStructObj...

2015-08-27 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7520#issuecomment-135308292
  
  [Test build #41671 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41671/console)
 for   PR 7520 at commit 
[`055cd09`](https://github.com/apache/spark/commit/055cd09a09fff47cf43578a19ac78b77610231ce).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-10251][CORE] some common types are not ...

2015-08-27 Thread rxin

Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/8465#issuecomment-135304194
  
Thanks - I've merged this.




[GitHub] spark pull request: [SPARK-SQL] [MINOR] Fixes some typos in HiveCo...

2015-08-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8481#issuecomment-135373483
  
Merged build started.



[GitHub] spark pull request: [SPARK-SQL] [MINOR] Fixes some typos in HiveCo...

2015-08-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8481#issuecomment-135373539
  
Merged build finished. Test FAILed.



[GitHub] spark pull request: [SPARK-SQL] [MINOR] Fixes some typos in HiveCo...

2015-08-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8481#issuecomment-135373540
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41680/
Test FAILed.



[GitHub] spark pull request: [SPARK-SQL] [MINOR] Fixes some typos in HiveCo...

2015-08-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8481#issuecomment-135373462
  
 Merged build triggered.



[GitHub] spark pull request: [SPARK-9170][SQL] User-provided columns should...

2015-08-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7520#issuecomment-135374072
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-9170][SQL] User-provided columns should...

2015-08-27 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7520#issuecomment-135373958
  
  [Test build #41677 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41677/console)
 for   PR 7520 at commit 
[`a389746`](https://github.com/apache/spark/commit/a38974647ac75a359ae7495af39b93152a437d72).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-10049][SPARKR][WIP] Support collecting ...

2015-08-27 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8458#issuecomment-135376204
  
  [Test build #41683 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41683/consoleFull)
 for   PR 8458 at commit 
[`02c64eb`](https://github.com/apache/spark/commit/02c64eb93b75d9ac0e2a12d8dd5a8c1ed5d143f2).



[GitHub] spark pull request: SPARK-10314 [CORE]RDD persist to OFF_HEAP tach...

2015-08-27 Thread romansew

GitHub user romansew opened a pull request:

https://github.com/apache/spark/pull/8482

SPARK-10314 [CORE]RDD persist to OFF_HEAP tachyon got block rdd_x_x n…

SPARK-10314 [CORE]RDD persist to OFF_HEAP tachyon got block rdd_x_x not 
found exception when parallelism is big than data split size

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jd-ode/spark branch-1.4

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/8482.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #8482


commit 1da81ab8cd6e7e45e1b2d03352ecbbb1635f644c
Author: wangxiaoyu8 wangxiao...@jd.com
Date:   2015-08-27T10:41:15Z

SPARK-10314 [CORE]RDD persist to OFF_HEAP tachyon got block rdd_x_x not 
found exception when parallelism is big than data split size





[GitHub] spark pull request: SPARK-10314 [CORE]RDD persist to OFF_HEAP tach...

2015-08-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8482#issuecomment-135378312
  
Can one of the admins verify this patch?



[GitHub] spark pull request: [SPARK-10315] remove spark.akka.failure-detect...

2015-08-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8483#issuecomment-135384552
  
Merged build started.



[GitHub] spark pull request: [SPARK-10315] remove spark.akka.failure-detect...

2015-08-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8483#issuecomment-135384520
  
 Merged build triggered.



[GitHub] spark pull request: [SPARK-10065][SQL] Avoid triple copying of var...

2015-08-27 Thread viirya

Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/8484#issuecomment-135387576
  
/cc @JoshRosen @davies 



[GitHub] spark pull request: [SPARK-10065][SQL] Avoid triple copying of var...

2015-08-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8484#issuecomment-135388498
  
Merged build started.



[GitHub] spark pull request: [SPARK-9170][SQL] User-provided columns should...

2015-08-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7520#issuecomment-135374076
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41677/
Test PASSed.



[GitHub] spark pull request: [SPARK-SQL] [MINOR] Fixes some typos in HiveCo...

2015-08-27 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8481#issuecomment-135374158
  
  [Test build #41682 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41682/consoleFull)
 for   PR 8481 at commit 
[`2b414e4`](https://github.com/apache/spark/commit/2b414e4b4c8ecb9183d8497c5d5cc1c16bcde470).



[GitHub] spark pull request: [SPARK-10049][SPARKR][WIP] Support collecting ...

2015-08-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8458#issuecomment-135375355
  
 Merged build triggered.



[GitHub] spark pull request: [SPARK-10049][SPARKR][WIP] Support collecting ...

2015-08-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8458#issuecomment-135375375
  
Merged build started.



[GitHub] spark pull request: [SPARK-10049][SPARKR][WIP] Support collecting ...

2015-08-27 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8458#issuecomment-135376437
  
  [Test build #41683 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41683/console)
 for   PR 8458 at commit 
[`02c64eb`](https://github.com/apache/spark/commit/02c64eb93b75d9ac0e2a12d8dd5a8c1ed5d143f2).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-10049][SPARKR][WIP] Support collecting ...

2015-08-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8458#issuecomment-135376440
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41683/
Test FAILed.



[GitHub] spark pull request: [SPARK-10049][SPARKR][WIP] Support collecting ...

2015-08-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8458#issuecomment-135376439
  
Merged build finished. Test FAILed.



[GitHub] spark pull request: REMOVE spark.akka.failure-detector.threshold

2015-08-27 Thread CodingCat

GitHub user CodingCat opened a pull request:

https://github.com/apache/spark/pull/8483

REMOVE spark.akka.failure-detector.threshold



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/CodingCat/spark SPARK_10315

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/8483.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #8483


commit 70c8f7be4b3d080aa29ae9b37e1e45c6a204bb9c
Author: CodingCat zhunans...@gmail.com
Date:   2015-08-27T11:01:50Z

REMOVE spark.akka.failure-detector.threshold





[GitHub] spark pull request: [SPARK-10315] remove document on spark.akka.fa...

2015-08-27 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8483#issuecomment-135386761
  
  [Test build #41684 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41684/consoleFull)
 for   PR 8483 at commit 
[`70c8f7b`](https://github.com/apache/spark/commit/70c8f7be4b3d080aa29ae9b37e1e45c6a204bb9c).



[GitHub] spark pull request: [SPARK-10065][SQL] Avoid triple copying of var...

2015-08-27 Thread viirya

GitHub user viirya opened a pull request:

https://github.com/apache/spark/pull/8484

[SPARK-10065][SQL] Avoid triple copying of var-length objects in Array in 
tungsten projection

JIRA: https://issues.apache.org/jira/browse/SPARK-10065

Currently we do unnecessary copying of objects in the array. We should 
avoid them.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/viirya/spark-1 avoid-triple-obj-copying

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/8484.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #8484


commit 9e118c228e57b5a78dd1c370f261cd40a42ec1d3
Author: Liang-Chi Hsieh vii...@appier.com
Date:   2015-08-27T11:06:29Z

Avoid triple copying of var-length objects in Array in tungsten projection.





[GitHub] spark pull request: [SPARK-10315] remove document on spark.akka.fa...

2015-08-27 Thread CodingCat

Github user CodingCat commented on the pull request:

https://github.com/apache/spark/pull/8483#issuecomment-135388747
  
@srowen , mind taking a look ?



[GitHub] spark pull request: [SPARK-10065][SQL] Avoid triple copying of var...

2015-08-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8484#issuecomment-135388439
  
 Merged build triggered.



[GitHub] spark pull request: [SPARK-10065][SQL] Avoid triple copying of var...

2015-08-27 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8484#issuecomment-135389447
  
  [Test build #41685 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41685/consoleFull)
 for   PR 8484 at commit 
[`9e118c2`](https://github.com/apache/spark/commit/9e118c228e57b5a78dd1c370f261cd40a42ec1d3).



[GitHub] spark pull request: [SPARK-10315] remove document on spark.akka.fa...

2015-08-27 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8483#issuecomment-135392719
  
  [Test build #41684 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41684/console)
 for   PR 8483 at commit 
[`70c8f7b`](https://github.com/apache/spark/commit/70c8f7be4b3d080aa29ae9b37e1e45c6a204bb9c).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-10315] remove document on spark.akka.fa...

2015-08-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8483#issuecomment-135392859
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-9170][SQL] User-provided columns should...

2015-08-27 Thread liancheng

Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/7520#issuecomment-135392980
  
One thing to note is that the case sensitivity of Spark SQL is configurable 
([see here] [1]). So I don't think we should make `StructType` completely case 
insensitive (even if case preserving).

If I understand this issue correctly, the root problem is that our current 
approach to writing schema information into physical ORC files isn't case 
preserving.  As suggested by @chenghao-intel, when saving a DataFrame as a 
Hive metastore table using ORC, Spark SQL 1.5 now saves it in a 
Hive-compatible way, so that the data can be read back using Hive.  This 
implies that changes made in this PR should also be compatible with Hive.  
After investigating Hive's behavior for a while, I found some interesting 
things.

The snippets below were executed against Hive 1.2.1 (with a PostgreSQL 
metastore) and Spark SQL 1.5-SNAPSHOT ([revision 05c] [2]).  First, let's 
prepare a Hive ORC table:

```
hive> CREATE TABLE orc_test STORED AS ORC AS SELECT 1 AS CoL;
...
hive> SELECT col FROM orc_test;
OK
1
Time taken: 0.056 seconds, Fetched: 1 row(s)
hive> SELECT COL FROM orc_test;
OK
1
Time taken: 0.056 seconds, Fetched: 1 row(s)
hive> DESC orc_test;
OK
col int
Time taken: 0.047 seconds, Fetched: 1 row(s)
```

So Hive is neither case sensitive nor case preserving.  We can further 
confirm this by checking the metastore table `COLUMNS_V2`:

```
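metastore_hive121> SELECT * FROM COLUMNS_V2
+---------+-----------+---------------+-------------+---------------+
|   CD_ID |   COMMENT | COLUMN_NAME   | TYPE_NAME   |   INTEGER_IDX |
|---------+-----------+---------------+-------------+---------------|
|      22 |      null | col           | int         |             0 |
+---------+-----------+---------------+-------------+---------------+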
```

(I cleared my local Hive warehouse, so the only column record here is the 
one created above.)

Now let's read the physical ORC files directly using Spark:

```
scala> sqlContext.read.orc("hdfs://localhost:9000/user/hive/warehouse_hive121/orc_test").printSchema()
root
 |-- _col0: integer (nullable = true)

scala> sqlContext.read.orc("hdfs://localhost:9000/user/hive/warehouse_hive121/orc_test").show()
+-----+
|_col0|
+-----+
|    1|
+-----+
```

Huh? Why is it `_col0` instead of `col`?  Let's inspect the physical ORC 
file written by Hive:

```
$ hive --orcfiledump /user/hive/warehouse_hive121/orc_test/00_0

Structure for /user/hive/warehouse_hive121/orc_test/00_0
File Version: 0.12 with HIVE_8732
15/08/27 19:07:15 INFO orc.ReaderImpl: Reading ORC rows from 
/user/hive/warehouse_hive121/orc_test/00_0 with {include: null, offset: 0, 
length: 9223372036854775807}
15/08/27 19:07:15 INFO orc.RecordReaderFactory: Schema is not specified on 
read. Using file schema.
Rows: 1
Compression: ZLIB
Compression size: 262144
Type: struct<_col0:int>  !!!
...
```

Surprise!  So, when writing ORC files, *Hive doesn't even preserve the 
column names*.
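
To illustrate what this means for anyone reading such files directly, here is a minimal sketch (not Spark's or Hive's actual code) that maps positional physical names back to metastore column names by ordinal. `PositionalNames` and `resolve` are hypothetical names, and the `_colN` naming pattern is an assumption based only on the dump above.

```scala
// Hypothetical helper, for illustration only.
// Assumes Hive-written ORC files name their fields _col0, _col1, ...
// and that the metastore lists columns in the same order.
object PositionalNames {
  private val Positional = "_col(\\d+)".r

  // Replace each positional name with the metastore name at that ordinal.
  // Names that are not positional, or that point past the end of the
  // metastore column list, are returned unchanged.
  def resolve(physical: Seq[String], metastore: Seq[String]): Seq[String] =
    physical.map {
      case Positional(idx) if idx.length <= 9 && idx.toInt < metastore.length =>
        metastore(idx.toInt)
      case other => other
    }
}
```

With the table above, `PositionalNames.resolve(Seq("_col0"), Seq("col"))` would yield `Seq("col")`, while a file whose physical schema already carries real names passes through untouched.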

Conclusions:

1.  Making `StructType` completely case insensitive is unacceptable.
1.  Concrete column names written into ORC files by Spark SQL don't affect 
interoperability with Hive.
1.  It would be good for Spark SQL to be case preserving when writing ORC 
files.

And I think this is what this PR should aim for.
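
To make the distinction between "case insensitive" and "case preserving" concrete, here is a minimal sketch under stated assumptions. It is not how `StructType` is implemented; `Field`, `CasePreservingSchema`, and `resolve` are hypothetical names, and the `caseSensitive` flag merely stands in for the configurable setting mentioned above.

```scala
import java.util.Locale

// Hypothetical types, for illustration only.
final case class Field(name: String, dataType: String)

final class CasePreservingSchema(fields: Seq[Field], caseSensitive: Boolean) {
  // Key used for lookups only; the stored Field keeps its original spelling.
  private def key(name: String): String =
    if (caseSensitive) name else name.toLowerCase(Locale.ROOT)

  private val byKey: Map[String, Seq[(Field, Int)]] =
    fields.zipWithIndex.groupBy { case (f, _) => key(f.name) }

  // Returns the field (with its original name) and its ordinal,
  // or an error message when the name is unknown or ambiguous.
  def resolve(name: String): Either[String, (Field, Int)] =
    byKey.getOrElse(key(name), Nil) match {
      case Seq(single) => Right(single)
      case Seq()       => Left(s"no such column: $name")
      case many        =>
        Left(s"ambiguous column $name: " + many.map(_._1.name).mkString(", "))
    }
}
```

In case-insensitive mode, resolving `col` against a schema containing `CoL` finds the field and still reports its name as `CoL`, which is the spelling a writer would then put into the file. In case-sensitive mode the same lookup fails, so the schema type itself never has to be case insensitive.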

[1]: 
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala#L247-L249
[2]: 
https://github.com/apache/spark/commit/bb1640529725c6c38103b95af004f8bd905c



[GitHub] spark pull request: [SPARK-10315] remove document on spark.akka.fa...

2015-08-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8483#issuecomment-135392861
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41684/
Test PASSed.



[GitHub] spark pull request: [SPARK-10065][SQL] Avoid triple copying of var...

2015-08-27 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8484#issuecomment-135394838
  
  [Test build #41685 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41685/console)
 for   PR 8484 at commit 
[`9e118c2`](https://github.com/apache/spark/commit/9e118c228e57b5a78dd1c370f261cd40a42ec1d3).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-10065][SQL] Avoid triple copying of var...

2015-08-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8484#issuecomment-135394871
  
Merged build finished. Test FAILed.


