[GitHub] spark issue #14659: [SPARK-16757] Set up Spark caller context to HDFS
Github user Sherry302 commented on the issue: https://github.com/apache/spark/pull/14659 Hi, @steveloughran Thank you very much for the comments. I have created a Hadoop JIRA [HADOOP-13527](https://issues.apache.org/jira/browse/HADOOP-13527) and attached the patch; could you please review it? I am unable to assign the JIRA to myself; could you please add me to the "contributor" role in Hadoop? Thanks again. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14738: [MINOR][ML]Add expert param support to SharedParamsCodeG...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14738 Merged build finished. Test PASSed.
[GitHub] spark issue #14738: [MINOR][ML]Add expert param support to SharedParamsCodeG...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14738 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64159/ Test PASSed.
[GitHub] spark issue #14738: [MINOR][ML]Add expert param support to SharedParamsCodeG...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14738 **[Test build #64159 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64159/consoleFull)** for PR 14738 at commit [`ba6d731`](https://github.com/apache/spark/commit/ba6d73116f92385fc4d0d9fed8aaf3aab7e5a6a4). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14738: [MINOR][ML]Add expert param support to SharedParamsCodeG...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14738 **[Test build #64159 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64159/consoleFull)** for PR 14738 at commit [`ba6d731`](https://github.com/apache/spark/commit/ba6d73116f92385fc4d0d9fed8aaf3aab7e5a6a4).
[GitHub] spark pull request #14738: [MINOR][ML]Add expert param support to SharedPara...
GitHub user hqzizania opened a pull request: https://github.com/apache/spark/pull/14738 [MINOR][ML] Add expert param support to SharedParamsCodeGen ## What changes were proposed in this pull request? Add expert param support to SharedParamsCodeGen, where aggregationDepth, an expert param, is added. You can merge this pull request into a Git repository by running: $ git pull https://github.com/hqzizania/spark SPARK-17090-minor Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14738.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14738 commit 6c2c514073a05d578e4ca1bb5c120506b58ce72d Author: hqzizania Date: 2016-08-19T17:47:24Z add aggregationDepth to SharedParamsCodeGen commit cc37a89308ab7c4064f84b6248d0d6888ba9e64f Author: hqzizania Date: 2016-08-21T03:19:21Z Merge remote-tracking branch 'origin/master'
[GitHub] spark pull request #14625: [SPARK-17045] [SQL] Build/move Join-related test ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14625#discussion_r75588865 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala --- @@ -245,6 +245,10 @@ class SQLQueryTestSuite extends QueryTest with SharedSQLContext { (1 to 100).map(i => (i, i.toString)).toDF("key", "value").createOrReplaceTempView("testdata") +Seq((1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2)) --- End diff -- To be honest, it is hard to write test data, especially when we want very few rows in each data set.
[GitHub] spark pull request #14625: [SPARK-17045] [SQL] Build/move Join-related test ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14625#discussion_r75588856 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala --- @@ -245,6 +245,10 @@ class SQLQueryTestSuite extends QueryTest with SharedSQLContext { (1 to 100).map(i => (i, i.toString)).toDF("key", "value").createOrReplaceTempView("testdata") +Seq((1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2)) --- End diff -- The major differences are the data: they have different data distributions. For example, `testData` does not have duplicate key values, but `testData2` has fewer rows and duplicate key values. `src1` has nulls but `src` does not. Your concern is valid. We should change the names; otherwise, it is hard to understand the reasons.
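The data-distribution differences described above can be illustrated with a small self-contained sketch. The `testData`/`testData2` rows mirror the values quoted in the diff; in the real suite they are registered as temp views, while here plain sequences stand in:

```scala
// Stand-ins for the pre-loaded test tables discussed above.
val testData = (1 to 100).map(i => (i, i.toString)) // one row per key
val testData2 = Seq((1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2)) // repeated keys

// testData has no duplicate keys; testData2 has several rows per key,
// which exercises join paths that must produce multiple matches per key.
val hasDuplicateKeys1 = testData.map(_._1).distinct.size < testData.size
val hasDuplicateKeys2 = testData2.map(_._1).distinct.size < testData2.size
```

This is why a single key-value table cannot replace both: the duplicate-key case drives different join behavior than the unique-key case.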
[GitHub] spark issue #14712: [SPARK-17072] [SQL] support table-level statistics gener...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/14712 Spark SQL already has its own metastore: `InMemoryCatalog`. And we do have an abstraction for the metastore: `ExternalCatalog`. We have 2 targets here: 1. add table statistics in Spark SQL 2. Spark SQL and Hive should recognize table statistics from each other. I think target 1 is more important, and we do need an implementation that does not depend on Hive features. > Actually, we desperately need spark sql to have its own metastore, because we need to persist statistics like histograms which AFAIK hive metastore doesn't support. We store table statistics in table properties, so why would the Hive metastore not support them? Do you mean Hive can't recognize them? But I think that's OK; we should not limit our table statistics to what Hive supports.
[GitHub] spark pull request #14625: [SPARK-17045] [SQL] Build/move Join-related test ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14625#discussion_r75588814 --- Diff: sql/core/src/test/resources/sql-tests/inputs/join.sql --- @@ -0,0 +1,225 @@ +-- join nested table expressions (auto_join0.q) --- End diff -- : ) That is for helping reviewers know the origins of the queries. If you think we do not care, we can remove it.
[GitHub] spark pull request #14601: [SPARK-13979][Core] Killed executor is re spawned...
Github user agsachin commented on a diff in the pull request: https://github.com/apache/spark/pull/14601#discussion_r75588799 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala --- @@ -107,6 +107,14 @@ class SparkHadoopUtil extends Logging { if (key.startsWith("spark.hadoop.")) { hadoopConf.set(key.substring("spark.hadoop.".length), value) } + // Copy any "fs.swift2d.foo=bar" properties into conf as "fs.swift2d.foo=bar" --- End diff -- That's a nice suggestion; we could add configs for Azure as well. I am not familiar with Azure; do you have any sample code I could use to understand, run, and test it?
[GitHub] spark pull request #14601: [SPARK-13979][Core] Killed executor is re spawned...
Github user agsachin commented on a diff in the pull request: https://github.com/apache/spark/pull/14601#discussion_r75588790 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala --- @@ -102,11 +102,19 @@ class SparkHadoopUtil extends Logging { hadoopConf.set("fs.s3n.awsSecretAccessKey", accessKey) hadoopConf.set("fs.s3a.secret.key", accessKey) } - // Copy any "spark.hadoop.foo=bar" system properties into conf as "foo=bar" conf.getAll.foreach { case (key, value) => +// Copy any "spark.hadoop.foo=bar" system properties into conf as "foo=bar" if (key.startsWith("spark.hadoop.")) { hadoopConf.set(key.substring("spark.hadoop.".length), value) } + // Copy any "fs.swift2d.foo=bar" properties into conf as "fs.swift2d.foo=bar" +else if (key.startsWith("fs.swift2d")){ + hadoopConf.set(key, value) --- End diff -- I had added this because I was using https://github.com/SparkTC/stocator. Now I have updated it for `hadoop-openstack` as well.
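The copy logic under discussion can be sketched in a self-contained way. Here a mutable map stands in for the Hadoop `Configuration`, and the property names and values are made up for illustration:

```scala
import scala.collection.mutable

// Stand-in for a Hadoop Configuration object.
val hadoopConf = mutable.Map[String, String]()

// Hypothetical Spark conf entries.
val sparkConfEntries = Seq(
  "spark.hadoop.fs.defaultFS" -> "hdfs://nn:8020",
  "fs.swift2d.service.demo.auth.url" -> "https://auth.example.com",
  "spark.app.name" -> "demo")

sparkConfEntries.foreach { case (key, value) =>
  if (key.startsWith("spark.hadoop.")) {
    // "spark.hadoop.foo=bar" is copied into the Hadoop conf as "foo=bar"
    hadoopConf(key.substring("spark.hadoop.".length)) = value
  } else if (key.startsWith("fs.swift2d")) {
    // swift2d keys are copied through unchanged, as the diff proposes
    hadoopConf(key) = value
  }
  // everything else (e.g. spark.app.name) is not propagated
}
```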
[GitHub] spark issue #14737: [Spark-17171][WEB UI] DAG will list all partitions in th...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14737 **[Test build #64158 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64158/consoleFull)** for PR 14737 at commit [`595453f`](https://github.com/apache/spark/commit/595453fbb2ccdd4009821724adefb829a13890c7).
[GitHub] spark pull request #14737: [Spark-17171][WEB UI] DAG will list all partition...
GitHub user cenyuhai opened a pull request: https://github.com/apache/spark/pull/14737 [Spark-17171][WEB UI] DAG will list all partitions in the graph ## What changes were proposed in this pull request? The DAG view lists all partitions in the graph; it is too slow and makes it hard to see the whole graph. Usually we don't want to see all partitions; we just want to see the relations of the DAG graph, so I show only 2 root nodes for RDDs. Before this PR, the DAG graph looks like [dag1.png](https://issues.apache.org/jira/secure/attachment/12824702/dag1.png); after this PR, the DAG graph looks like [dag2.png](https://issues.apache.org/jira/secure/attachment/12824703/dag2.png) You can merge this pull request into a Git repository by running: $ git pull https://github.com/cenyuhai/spark SPARK-17171 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14737.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14737 commit 7991d7622260bc8e65ee9b934d376df2597c9a11 Author: cenyuhai Date: 2016-08-20T15:44:38Z Just show 2 root partitions for a stage commit 869eaaf23f79eefbc6a8ff7a7b9efbc4a9f8c6b7 Author: 岑玉海 <261810...@qq.com> Date: 2016-08-21T03:55:04Z Merge pull request #8 from apache/master merge latest code to my fork commit 595453fbb2ccdd4009821724adefb829a13890c7 Author: cenyuhai Date: 2016-08-21T04:06:06Z Merge remote-tracking branch 'remotes/origin/master' into SPARK-17171
[GitHub] spark issue #14712: [SPARK-17072] [SQL] support table-level statistics gener...
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/14712 I suggest in the current stage, we still follow hive's convention. When spark sql has its own metastore, we can bridge between these two metastores by a mapping between two different sets of names/data structures, and then provide a config for users to declare their preference.
[GitHub] spark pull request #14717: [SPARK-17090][ML]Make tree aggregation level in l...
Github user hqzizania commented on a diff in the pull request: https://github.com/apache/spark/pull/14717#discussion_r75588359 --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/shared/sharedParams.scala --- @@ -389,4 +389,21 @@ private[ml] trait HasSolver extends Params { /** @group getParam */ final def getSolver: String = $(solver) } + +/** + * Trait for shared param aggregationDepth (default: 2). + */ +private[ml] trait HasAggregationDepth extends Params { + + /** + * Param for suggested depth for treeAggregate (>= 2). + * @group param --- End diff -- OK, thanks
[GitHub] spark pull request #14717: [SPARK-17090][ML]Make tree aggregation level in l...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/14717#discussion_r75588252 --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/shared/sharedParams.scala --- @@ -389,4 +389,21 @@ private[ml] trait HasSolver extends Params { /** @group getParam */ final def getSolver: String = $(solver) } + +/** + * Trait for shared param aggregationDepth (default: 2). + */ +private[ml] trait HasAggregationDepth extends Params { + + /** + * Param for suggested depth for treeAggregate (>= 2). + * @group param --- End diff -- This is very small. You can just submit a PR with minor in title without going through the JIRA.
[GitHub] spark issue #14682: [SPARK-17104][SQL] LogicalRelation.newInstance should fo...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/14682 @cloud-fan Thank you for review.
[GitHub] spark pull request #14712: [SPARK-17072] [SQL] support table-level statistic...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/14712#discussion_r75588230 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzeTableCommand.scala --- @@ -33,7 +34,7 @@ import org.apache.spark.sql.catalyst.catalog.{CatalogRelation, CatalogTable} * Right now, it only supports Hive tables and it only updates the size of a Hive table * in the Hive metastore. */ -case class AnalyzeTableCommand(tableName: String) extends RunnableCommand { +case class AnalyzeTableCommand(tableName: String, noscan: Boolean = true) extends RunnableCommand { override def run(sparkSession: SparkSession): Seq[Row] = { --- End diff -- Not related to this PR, but it looks like `AnalyzeTableCommand` doesn't handle the possible `NoSuchTableException` thrown by `sessionState.catalog.lookupRelation`. It would be better to handle it and provide an error message.
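The suggested error handling could look roughly like the following self-contained sketch; `NoSuchTableException` and `lookupRelation` here are simplified stand-ins for the Spark internals, not the real signatures:

```scala
// Hypothetical stand-in for Spark's NoSuchTableException.
case class NoSuchTableException(table: String)
  extends Exception(s"Table or view not found: $table")

// Stand-in for sessionState.catalog.lookupRelation.
def lookupRelation(table: String): String =
  if (table == "src") "relation" else throw NoSuchTableException(table)

// Wrap the lookup so ANALYZE TABLE reports a clear error instead of
// letting a raw NoSuchTableException escape.
def analyzeTable(table: String): Either[String, String] =
  try Right(lookupRelation(table))
  catch {
    case NoSuchTableException(t) =>
      Left(s"ANALYZE TABLE failed: table or view '$t' does not exist")
  }
```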
[GitHub] spark issue #14712: [SPARK-17072] [SQL] support table-level statistics gener...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/14712 If it is a Hive table, I think we should respect Hive's statistics.
[GitHub] spark issue #14719: [SPARK-17154][SQL] Wrong result can be returned or Analy...
Github user sarutak commented on the issue: https://github.com/apache/spark/pull/14719 @cloud-fan Of course. I'll write a design doc soon.
[GitHub] spark pull request #14625: [SPARK-17045] [SQL] Build/move Join-related test ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14625#discussion_r75588146 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala --- @@ -245,6 +245,10 @@ class SQLQueryTestSuite extends QueryTest with SharedSQLContext { (1 to 100).map(i => (i, i.toString)).toDF("key", "value").createOrReplaceTempView("testdata") +Seq((1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2)) --- End diff -- Previously we had 3 pre-loaded tables: `testdata`, `arraydata`, `mapdata`, which are a key-value table, an array-type table, and a map-type table. For the new join tests, I think only `lowerCaseData`, `upperCaseData`, `srcpart` make sense; why can't we use `testdata` for `testData2`, `src` and `src2`? They are all key-value tables.
[GitHub] spark pull request #14625: [SPARK-17045] [SQL] Build/move Join-related test ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14625#discussion_r75588118 --- Diff: sql/core/src/test/resources/sql-tests/inputs/join.sql --- @@ -0,0 +1,225 @@ +-- join nested table expressions (auto_join0.q) --- End diff -- Do we need to reference the Hive `.q` file? I think the Hive golden file tests will be removed eventually.
[GitHub] spark issue #14717: [SPARK-17090][ML]Make tree aggregation level in linear/l...
Github user hqzizania commented on the issue: https://github.com/apache/spark/pull/14717 Thanks for the reviews :)
[GitHub] spark issue #14719: [SPARK-17154][SQL] Wrong result can be returned or Analy...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/14719 It's really a hard problem and we have discussed it many times but can't reach a consensus. Do you mind sending a design doc first so that it's easier for other people to review and discuss? Thanks!
[GitHub] spark pull request #14717: [SPARK-17090][ML]Make tree aggregation level in l...
Github user hqzizania commented on a diff in the pull request: https://github.com/apache/spark/pull/14717#discussion_r75588057 --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/shared/sharedParams.scala --- @@ -389,4 +389,21 @@ private[ml] trait HasSolver extends Params { /** @group getParam */ final def getSolver: String = $(solver) } + +/** + * Trait for shared param aggregationDepth (default: 2). + */ +private[ml] trait HasAggregationDepth extends Params { + + /** + * Param for suggested depth for treeAggregate (>= 2). + * @group param --- End diff -- Could it be done in the task (https://issues.apache.org/jira/browse/SPARK-17169) ?
[GitHub] spark pull request #14717: [SPARK-17090][ML]Make tree aggregation level in l...
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/14717#discussion_r75587929 --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/shared/sharedParams.scala --- @@ -389,4 +389,21 @@ private[ml] trait HasSolver extends Params { /** @group getParam */ final def getSolver: String = $(solver) } + +/** + * Trait for shared param aggregationDepth (default: 2). + */ +private[ml] trait HasAggregationDepth extends Params { + + /** + * Param for suggested depth for treeAggregate (>= 2). + * @group param --- End diff -- these should be `@group expertParam` and `@group getExpertParam` shouldn't they? Not a big deal, but we may want to fix this before it's forgotten. We'd need to modify the codegen file.
[GitHub] spark pull request #14717: [SPARK-17090][ML]Make tree aggregation level in l...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/14717
[GitHub] spark issue #14717: [SPARK-17090][ML]Make tree aggregation level in linear/l...
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/14717 LGTM. Merge into master. Thanks.
[GitHub] spark pull request #14717: [SPARK-17090][ML]Make tree aggregation level in l...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/14717#discussion_r75587723 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -256,6 +256,17 @@ class LogisticRegression @Since("1.2.0") ( @Since("1.5.0") override def getThresholds: Array[Double] = super.getThresholds + /** + * Suggested depth for treeAggregate (>= 2). + * If the dimensions of features or the number of partitions are large, + * this param could be adjusted to a larger size. --- End diff -- larger value.
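To illustrate why a larger value can help: `treeAggregate`'s depth controls how many rounds of partial merging happen before the final result, so no single reduce step has to combine every partition at once. The following toy model (plain Scala, not the Spark API) mimics the idea by combining per-partition partial sums in rounds of a fixed fan-in:

```scala
// Combine per-partition partials in rounds of `fanIn` until one value
// remains, like the multi-level merge a deeper treeAggregate performs.
def treeCombine(partials: Seq[Int], fanIn: Int): Int =
  if (partials.size <= 1) partials.headOption.getOrElse(0)
  else treeCombine(partials.grouped(fanIn).map(_.sum).toSeq, fanIn)

val partials = Seq(1, 2, 3, 4, 5, 6, 7, 8) // one partial sum per partition

// fanIn = 2 takes 3 rounds; fanIn = 8 merges everything in one round,
// as a flat aggregate (depth effectively 1) would.
val total = treeCombine(partials, 2)
```

The result is the same either way; only the merge topology (and hence the load on the driver/reducers) changes.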
[GitHub] spark pull request #14717: [SPARK-17090][ML]Make tree aggregation level in l...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/14717#discussion_r75587709 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -48,7 +48,7 @@ import org.apache.spark.storage.StorageLevel */ private[classification] trait LogisticRegressionParams extends ProbabilisticClassifierParams with HasRegParam with HasElasticNetParam with HasMaxIter with HasFitIntercept with HasTol - with HasStandardization with HasWeightCol with HasThreshold { + with HasStandardization with HasWeightCol with HasThreshold with HasAggregationDepth{ --- End diff -- space before `{`
[GitHub] spark issue #14723: [SQL][WIP][Test] Supports object-based aggregation funct...
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/14723 Can you create a JIRA?
[GitHub] spark pull request #14723: [SQL][WIP][Test] Supports object-based aggregatio...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/14723#discussion_r75586776 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/AggregateWithObjectAggregateBufferSuite.scala --- @@ -0,0 +1,156 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.sql + +import org.apache.spark.sql.AggregateWithObjectAggregateBufferSuite.MaxWithObjectAggregateBuffer +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.expressions.{AttributeReference, Expression, GenericMutableRow, MutableRow, UnsafeRow} +import org.apache.spark.sql.catalyst.expressions.aggregate.{ImperativeAggregate, WithObjectAggregateBuffer} +import org.apache.spark.sql.execution.aggregate.{SortAggregateExec} +import org.apache.spark.sql.functions._ +import org.apache.spark.sql.test.SharedSQLContext +import org.apache.spark.sql.types.{AbstractDataType, DataType, IntegerType, StructType} + +class AggregateWithObjectAggregateBufferSuite extends QueryTest with SharedSQLContext { + + import testImplicits._ + + private val data = Seq((1, 0), (3, 1), (2, 0), (6, 3), (3, 1), (4, 1), (5, 0)) + + + test("aggregate with object aggregate buffer, should not use HashAggregate") { +val df = data.toDF("a", "b") +val max = new MaxWithObjectAggregateBuffer($"a".expr) + +// Always use SortAggregateExec instead of HashAggregateExec for planning even if the aggregate +// buffer attributes are mutable fields (every field can be mutated inline like int, long...) 
+val allFieldsMutable = max.aggBufferSchema.map(_.dataType).forall(UnsafeRow.isMutable) +val sparkPlan = df.select(Column(max.toAggregateExpression())).queryExecution.sparkPlan +assert(allFieldsMutable == true && sparkPlan.isInstanceOf[SortAggregateExec]) + } + + test("aggregate with object aggregate buffer, no group by") { +val df = data.toDF("a", "b").coalesce(2) +checkAnswer( + df.select(objectAggregateMax($"a"), count($"a"), objectAggregateMax($"b"), count($"b")), + Seq(Row(6, 7, 3, 7)) +) + } + + test("aggregate with object aggregate buffer, with group by") { +val df = data.toDF("a", "b").coalesce(2) +checkAnswer( + df.groupBy($"b").agg(objectAggregateMax($"a"), count($"a"), objectAggregateMax($"a")), + Seq( +Row(0, 5, 3, 5), +Row(1, 4, 3, 4), +Row(3, 6, 1, 6) + ) +) + } + + test("aggregate with object aggregate buffer, empty inputs, no group by") { +val empty = Seq.empty[(Int, Int)].toDF("a", "b") +checkAnswer( + empty.select(objectAggregateMax($"a"), count($"a"), objectAggregateMax($"b"), count($"b")), + Seq(Row(Int.MinValue, 0, Int.MinValue, 0))) + } + + test("aggregate with object aggregate buffer, empty inputs, with group by") { +val empty = Seq.empty[(Int, Int)].toDF("a", "b") +checkAnswer( + empty.groupBy($"b").agg(objectAggregateMax($"a"), count($"a"), objectAggregateMax($"a")), + Seq.empty[Row]) + } + + private def objectAggregateMax(column: Column): Column = { +val max = MaxWithObjectAggregateBuffer(column.expr) +Column(max.toAggregateExpression()) + } +} + +object AggregateWithObjectAggregateBufferSuite { --- End diff -- (we do not need to put the example class inside this object.) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. 
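The suite quoted above exercises an aggregate whose buffer holds a live Java object while a group is being processed. Since the `WithObjectAggregateBuffer` API is still WIP, the lifecycle it documents (initialize, update/merge, then serialize in place before the buffer can be spilled or shuffled) can be illustrated framework-free. Every name in this sketch (`MaxState`, `update`, `merge`, `serialize`) is hypothetical and only mirrors that flow, not the actual trait:

```scala
// Mutable object held in the aggregation buffer during a group's lifetime.
final class MaxState(var max: Int = Int.MinValue)

object ObjectBufferSketch {
  // update: fold one input row into the live buffer object.
  def update(state: MaxState, input: Int): MaxState = {
    state.max = math.max(state.max, input); state
  }

  // merge: combine two partial buffers (the reducer-side step).
  def merge(a: MaxState, b: MaxState): MaxState = {
    a.max = math.max(a.max, b.max); a
  }

  // Stand-in for serializeObjectAggregationBufferInPlace: convert the live
  // object to a storable value once the group is finished.
  def serialize(state: MaxState): Int = state.max

  def main(args: Array[String]): Unit = {
    // Same data as the quoted suite: (value, groupKey) pairs.
    val data = Seq((1, 0), (3, 1), (2, 0), (6, 3), (3, 1), (4, 1), (5, 0))
    val perGroup = data.groupBy(_._2).map { case (key, rows) =>
      val state = rows.map(_._1).foldLeft(new MaxState())(update)
      key -> serialize(state)
    }
    // max per group: 0 -> 5, 1 -> 4, 3 -> 6, matching the suite's expected Rows
    println(perGroup.toSeq.sortBy(_._1))
  }
}
```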
[GitHub] spark pull request #14723: [SQL][WIP][Test] Supports object-based aggregatio...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/14723#discussion_r75586764 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/interfaces.scala --- @@ -389,3 +389,89 @@ abstract class DeclarativeAggregate def right: AttributeReference = inputAggBufferAttributes(aggBufferAttributes.indexOf(a)) } } + +/** + * This traits allow user to define an AggregateFunction which can store **arbitrary** Java objects + * in Aggregation buffer during aggregation of each key group. This trait must be mixed with + * class ImperativeAggregate. + * + * Here is how it works in a typical aggregation flow (Partial mode aggregate at Mapper side, and + * Final mode aggregate at Reducer side). + * + * Stage 1: Partial aggregate at Mapper side: + * + * 1. Upon calling method `initialize(aggBuffer: MutableRow)`, user stores an arbitrary empty + *object, object A for example, in aggBuffer. The object A will be used to store the + *accumulated aggregation result. + * 1. Upon calling method `update(mutableAggBuffer: MutableRow, inputRow: InternalRow)` in + *current group (group by key), user extracts object A from mutableAggBuffer, and then updates + *object A with current inputRow. After updating, object A is stored back to mutableAggBuffer. + * 1. After processing all rows of current group, the framework will call method + *`serializeObjectAggregationBufferInPlace(aggregationBuffer: MutableRow)` to serialize object A + *to a serializable format in place. + * 1. The framework may spill the aggregationBuffer to disk if there is not enough memory. + *It is safe since we have already convert aggregationBuffer to serializable format. + * 1. Spark framework moves on to next group, until all groups have been + *processed. + * + * Shuffling exchange data to Reducer tasks... + * + * Stage 2: Final mode aggregate at Reducer side: + * + * 1. 
Upon calling method `initialize(aggBuffer: MutableRow)`, user stores a new empty object A1 + *in aggBuffer. The object A1 will be used to store the accumulated aggregation result. + * 1. Upon calling method `merge(mutableAggBuffer: MutableRow, inputAggBuffer: InternalRow)`, user + *extracts object A1 from mutableAggBuffer, and extracts object A2 from inputAggBuffer. then + *user needs to merge A1, and A2, and stores the merged result back to mutableAggBuffer. + * 1. After processing all inputAggBuffer of current group (group by key), the Spark framework will + *call method `serializeObjectAggregationBufferInPlace(aggregationBuffer: MutableRow)` to + *serialize object A1 to a serializable format in place. + * 1. The Spark framework may spill the aggregationBuffer to disk if there is not enough memory. + *It is safe since we have already convert aggregationBuffer to serializable format. + * 1. Spark framework moves on to next group, until all groups have been processed. + */ +trait WithObjectAggregateBuffer { + this: ImperativeAggregate => + + /** + * Serializes and in-place replaces the object stored in Aggregation buffer. The framework + * calls this method every time after finishing updating/merging one group (group by key). + * + * aggregationBuffer before serialization: + * + * The object stored in aggregationBuffer can be **arbitrary** Java objects defined by user. --- End diff -- Seems we want to mention that the data type is `ObjectType`? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14723: [SQL][WIP][Test] Supports object-based aggregatio...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/14723#discussion_r75586760 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/interfaces.scala --- @@ -389,3 +389,89 @@ abstract class DeclarativeAggregate def right: AttributeReference = inputAggBufferAttributes(aggBufferAttributes.indexOf(a)) } } + +/** + * This traits allow user to define an AggregateFunction which can store **arbitrary** Java objects + * in Aggregation buffer during aggregation of each key group. This trait must be mixed with + * class ImperativeAggregate. + * + * Here is how it works in a typical aggregation flow (Partial mode aggregate at Mapper side, and + * Final mode aggregate at Reducer side). + * + * Stage 1: Partial aggregate at Mapper side: + * + * 1. Upon calling method `initialize(aggBuffer: MutableRow)`, user stores an arbitrary empty + *object, object A for example, in aggBuffer. The object A will be used to store the + *accumulated aggregation result. + * 1. Upon calling method `update(mutableAggBuffer: MutableRow, inputRow: InternalRow)` in + *current group (group by key), user extracts object A from mutableAggBuffer, and then updates + *object A with current inputRow. After updating, object A is stored back to mutableAggBuffer. + * 1. After processing all rows of current group, the framework will call method + *`serializeObjectAggregationBufferInPlace(aggregationBuffer: MutableRow)` to serialize object A + *to a serializable format in place. + * 1. The framework may spill the aggregationBuffer to disk if there is not enough memory. + *It is safe since we have already convert aggregationBuffer to serializable format. + * 1. Spark framework moves on to next group, until all groups have been + *processed. + * + * Shuffling exchange data to Reducer tasks... + * + * Stage 2: Final mode aggregate at Reducer side: + * + * 1. 
Upon calling method `initialize(aggBuffer: MutableRow)`, user stores a new empty object A1 + *in aggBuffer. The object A1 will be used to store the accumulated aggregation result. + * 1. Upon calling method `merge(mutableAggBuffer: MutableRow, inputAggBuffer: InternalRow)`, user + *extracts object A1 from mutableAggBuffer, and extracts object A2 from inputAggBuffer. then + *user needs to merge A1, and A2, and stores the merged result back to mutableAggBuffer. + * 1. After processing all inputAggBuffer of current group (group by key), the Spark framework will + *call method `serializeObjectAggregationBufferInPlace(aggregationBuffer: MutableRow)` to + *serialize object A1 to a serializable format in place. + * 1. The Spark framework may spill the aggregationBuffer to disk if there is not enough memory. + *It is safe since we have already convert aggregationBuffer to serializable format. + * 1. Spark framework moves on to next group, until all groups have been processed. + */ +trait WithObjectAggregateBuffer { + this: ImperativeAggregate => + + /** + * Serializes and in-place replaces the object stored in Aggregation buffer. The framework + * calls this method every time after finishing updating/merging one group (group by key). + * + * aggregationBuffer before serialization: + * + * The object stored in aggregationBuffer can be **arbitrary** Java objects defined by user. 
+ * + * aggregationBuffer after serialization: + * + * The object's type must be one of: + * + * - Null + * - Boolean + * - Byte + * - Short + * - Int + * - Long + * - Float + * - Double + * - Array[Byte] + * - org.apache.spark.sql.types.Decimal + * - org.apache.spark.unsafe.types.UTF8String + * - org.apache.spark.unsafe.types.CalendarInterval + * - org.apache.spark.sql.catalyst.util.MapData + * - org.apache.spark.sql.catalyst.util.ArrayData + * - org.apache.spark.sql.catalyst.InternalRow + * + * Code example: + * + * {{{ + * override def serializeObjectAggregationBufferInPlace(aggregationBuffer: MutableRow): Unit = { + * val obj = buffer.get(mutableAggBufferOffset, ObjectType(classOf[A])).asInstanceOf[A] + * // Convert the obj to bytes, which is a serializable format. + * buffer(mutableAggBufferOffset) = toBytes(obj) --- End diff -- I am not sure it is the best example. At here, we are showing that the value of a field can be an java object or an byte array. I guess a more general question for this method will be if this approach work for all "supported" serialized types (e.g. the serialized
[GitHub] spark pull request #14723: [SQL][WIP][Test] Supports object-based aggregatio...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/14723#discussion_r75586661 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/interfaces.scala --- @@ -389,3 +389,89 @@ abstract class DeclarativeAggregate def right: AttributeReference = inputAggBufferAttributes(aggBufferAttributes.indexOf(a)) } } + +/** + * This traits allow user to define an AggregateFunction which can store **arbitrary** Java objects + * in Aggregation buffer during aggregation of each key group. This trait must be mixed with + * class ImperativeAggregate. + * + * Here is how it works in a typical aggregation flow (Partial mode aggregate at Mapper side, and + * Final mode aggregate at Reducer side). + * + * Stage 1: Partial aggregate at Mapper side: + * + * 1. Upon calling method `initialize(aggBuffer: MutableRow)`, user stores an arbitrary empty + *object, object A for example, in aggBuffer. The object A will be used to store the + *accumulated aggregation result. + * 1. Upon calling method `update(mutableAggBuffer: MutableRow, inputRow: InternalRow)` in + *current group (group by key), user extracts object A from mutableAggBuffer, and then updates + *object A with current inputRow. After updating, object A is stored back to mutableAggBuffer. + * 1. After processing all rows of current group, the framework will call method + *`serializeObjectAggregationBufferInPlace(aggregationBuffer: MutableRow)` to serialize object A + *to a serializable format in place. + * 1. The framework may spill the aggregationBuffer to disk if there is not enough memory. + *It is safe since we have already convert aggregationBuffer to serializable format. + * 1. Spark framework moves on to next group, until all groups have been + *processed. + * + * Shuffling exchange data to Reducer tasks... + * + * Stage 2: Final mode aggregate at Reducer side: + * + * 1. 
Upon calling method `initialize(aggBuffer: MutableRow)`, user stores a new empty object A1 + *in aggBuffer. The object A1 will be used to store the accumulated aggregation result. + * 1. Upon calling method `merge(mutableAggBuffer: MutableRow, inputAggBuffer: InternalRow)`, user + *extracts object A1 from mutableAggBuffer, and extracts object A2 from inputAggBuffer. then + *user needs to merge A1, and A2, and stores the merged result back to mutableAggBuffer. + * 1. After processing all inputAggBuffer of current group (group by key), the Spark framework will + *call method `serializeObjectAggregationBufferInPlace(aggregationBuffer: MutableRow)` to + *serialize object A1 to a serializable format in place. + * 1. The Spark framework may spill the aggregationBuffer to disk if there is not enough memory. + *It is safe since we have already convert aggregationBuffer to serializable format. + * 1. Spark framework moves on to next group, until all groups have been processed. + */ +trait WithObjectAggregateBuffer { + this: ImperativeAggregate => + + /** + * Serializes and in-place replaces the object stored in Aggregation buffer. The framework + * calls this method every time after finishing updating/merging one group (group by key). + * + * aggregationBuffer before serialization: + * + * The object stored in aggregationBuffer can be **arbitrary** Java objects defined by user. + * + * aggregationBuffer after serialization: + * + * The object's type must be one of: --- End diff -- How about we rephrase this part? We mentioned that we can use `arbitrary` java objects. But, here we are saying that `The object's type must be one of:`. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. 
[GitHub] spark pull request #14723: [SQL][WIP][Test] Supports object-based aggregatio...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/14723#discussion_r75586622 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/interfaces.scala --- @@ -389,3 +389,89 @@ abstract class DeclarativeAggregate def right: AttributeReference = inputAggBufferAttributes(aggBufferAttributes.indexOf(a)) } } + +/** + * This traits allow user to define an AggregateFunction which can store **arbitrary** Java objects + * in Aggregation buffer during aggregation of each key group. This trait must be mixed with + * class ImperativeAggregate. --- End diff -- I think at here, we need to emphasize that the buffer is an internal buffer because we will emit this buffer as the result of an aggregate operator.
[GitHub] spark issue #14674: [SPARK-17002][CORE]: Document that spark.ssl.protocol. i...
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/14674 @srowen Do you have any suggestions on our discussion above? Thanks!
[GitHub] spark pull request #14723: [SQL][WIP][Test] Supports object-based aggregatio...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/14723#discussion_r75586350 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/interfaces.scala --- @@ -389,3 +389,89 @@ abstract class DeclarativeAggregate def right: AttributeReference = inputAggBufferAttributes(aggBufferAttributes.indexOf(a)) } } + +/** + * This traits allow user to define an AggregateFunction which can store **arbitrary** Java objects + * in Aggregation buffer during aggregation of each key group. This trait must be mixed with + * class ImperativeAggregate. + * + * Here is how it works in a typical aggregation flow (Partial mode aggregate at Mapper side, and + * Final mode aggregate at Reducer side). + * + * Stage 1: Partial aggregate at Mapper side: + * + * 1. Upon calling method `initialize(aggBuffer: MutableRow)`, user stores an arbitrary empty + *object, object A for example, in aggBuffer. The object A will be used to store the + *accumulated aggregation result. + * 1. Upon calling method `update(mutableAggBuffer: MutableRow, inputRow: InternalRow)` in + *current group (group by key), user extracts object A from mutableAggBuffer, and then updates + *object A with current inputRow. After updating, object A is stored back to mutableAggBuffer. + * 1. After processing all rows of current group, the framework will call method + *`serializeObjectAggregationBufferInPlace(aggregationBuffer: MutableRow)` to serialize object A + *to a serializable format in place. + * 1. The framework may spill the aggregationBuffer to disk if there is not enough memory. + *It is safe since we have already convert aggregationBuffer to serializable format. + * 1. Spark framework moves on to next group, until all groups have been + *processed. + * + * Shuffling exchange data to Reducer tasks... + * + * Stage 2: Final mode aggregate at Reducer side: + * + * 1. 
Upon calling method `initialize(aggBuffer: MutableRow)`, user stores a new empty object A1 + *in aggBuffer. The object A1 will be used to store the accumulated aggregation result. + * 1. Upon calling method `merge(mutableAggBuffer: MutableRow, inputAggBuffer: InternalRow)`, user + *extracts object A1 from mutableAggBuffer, and extracts object A2 from inputAggBuffer. then + *user needs to merge A1, and A2, and stores the merged result back to mutableAggBuffer. + * 1. After processing all inputAggBuffer of current group (group by key), the Spark framework will + *call method `serializeObjectAggregationBufferInPlace(aggregationBuffer: MutableRow)` to + *serialize object A1 to a serializable format in place. + * 1. The Spark framework may spill the aggregationBuffer to disk if there is not enough memory. + *It is safe since we have already convert aggregationBuffer to serializable format. + * 1. Spark framework moves on to next group, until all groups have been processed. + */ +trait WithObjectAggregateBuffer { + this: ImperativeAggregate => --- End diff -- oh, seems this trait will be still an java `interface`. But, I think in general, we do not really need to have this line. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14723: [SQL][WIP][Test] Supports object-based aggregatio...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/14723#discussion_r75586238 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/interfaces.scala --- @@ -389,3 +389,89 @@ abstract class DeclarativeAggregate def right: AttributeReference = inputAggBufferAttributes(aggBufferAttributes.indexOf(a)) } } + +/** + * This traits allow user to define an AggregateFunction which can store **arbitrary** Java objects --- End diff -- I think it is better to remove `allow users` because it is not exposed to end-users for defining UDAFs.
[GitHub] spark pull request #14723: [SQL][WIP][Test] Supports object-based aggregatio...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/14723#discussion_r75586232 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/interfaces.scala --- @@ -389,3 +389,89 @@ abstract class DeclarativeAggregate def right: AttributeReference = inputAggBufferAttributes(aggBufferAttributes.indexOf(a)) } } + +/** + * This traits allow user to define an AggregateFunction which can store **arbitrary** Java objects + * in Aggregation buffer during aggregation of each key group. This trait must be mixed with + * class ImperativeAggregate. + * + * Here is how it works in a typical aggregation flow (Partial mode aggregate at Mapper side, and + * Final mode aggregate at Reducer side). + * + * Stage 1: Partial aggregate at Mapper side: + * + * 1. Upon calling method `initialize(aggBuffer: MutableRow)`, user stores an arbitrary empty + *object, object A for example, in aggBuffer. The object A will be used to store the + *accumulated aggregation result. + * 1. Upon calling method `update(mutableAggBuffer: MutableRow, inputRow: InternalRow)` in + *current group (group by key), user extracts object A from mutableAggBuffer, and then updates + *object A with current inputRow. After updating, object A is stored back to mutableAggBuffer. + * 1. After processing all rows of current group, the framework will call method + *`serializeObjectAggregationBufferInPlace(aggregationBuffer: MutableRow)` to serialize object A + *to a serializable format in place. + * 1. The framework may spill the aggregationBuffer to disk if there is not enough memory. + *It is safe since we have already convert aggregationBuffer to serializable format. + * 1. Spark framework moves on to next group, until all groups have been + *processed. + * + * Shuffling exchange data to Reducer tasks... + * + * Stage 2: Final mode aggregate at Reducer side: + * + * 1. 
Upon calling method `initialize(aggBuffer: MutableRow)`, user stores a new empty object A1 + *in aggBuffer. The object A1 will be used to store the accumulated aggregation result. + * 1. Upon calling method `merge(mutableAggBuffer: MutableRow, inputAggBuffer: InternalRow)`, user + *extracts object A1 from mutableAggBuffer, and extracts object A2 from inputAggBuffer. then + *user needs to merge A1, and A2, and stores the merged result back to mutableAggBuffer. + * 1. After processing all inputAggBuffer of current group (group by key), the Spark framework will + *call method `serializeObjectAggregationBufferInPlace(aggregationBuffer: MutableRow)` to + *serialize object A1 to a serializable format in place. + * 1. The Spark framework may spill the aggregationBuffer to disk if there is not enough memory. + *It is safe since we have already convert aggregationBuffer to serializable format. + * 1. Spark framework moves on to next group, until all groups have been processed. + */ +trait WithObjectAggregateBuffer { + this: ImperativeAggregate => --- End diff -- I guess having this line will make this trait hard to be used in Java. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14723: [SQL][WIP][Test] Supports object-based aggregatio...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/14723#discussion_r75586233 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/interfaces.scala --- @@ -389,3 +389,89 @@ abstract class DeclarativeAggregate def right: AttributeReference = inputAggBufferAttributes(aggBufferAttributes.indexOf(a)) } } + +/** + * This traits allow user to define an AggregateFunction which can store **arbitrary** Java objects --- End diff -- `This trait allows an AggregateFunction to use ...`
[GitHub] spark pull request #14723: [SQL][WIP][Test] Supports object-based aggregatio...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/14723#discussion_r75586183 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/interfaces.scala --- @@ -389,3 +389,89 @@ abstract class DeclarativeAggregate def right: AttributeReference = inputAggBufferAttributes(aggBufferAttributes.indexOf(a)) } } + +/** + * This traits allow user to define an AggregateFunction which can store **arbitrary** Java objects + * in Aggregation buffer during aggregation of each key group. This trait must be mixed with + * class ImperativeAggregate. + * + * Here is how it works in a typical aggregation flow (Partial mode aggregate at Mapper side, and + * Final mode aggregate at Reducer side). + * + * Stage 1: Partial aggregate at Mapper side: + * + * 1. Upon calling method `initialize(aggBuffer: MutableRow)`, user stores an arbitrary empty + *object, object A for example, in aggBuffer. The object A will be used to store the + *accumulated aggregation result. + * 1. Upon calling method `update(mutableAggBuffer: MutableRow, inputRow: InternalRow)` in + *current group (group by key), user extracts object A from mutableAggBuffer, and then updates + *object A with current inputRow. After updating, object A is stored back to mutableAggBuffer. + * 1. After processing all rows of current group, the framework will call method + *`serializeObjectAggregationBufferInPlace(aggregationBuffer: MutableRow)` to serialize object A + *to a serializable format in place. + * 1. The framework may spill the aggregationBuffer to disk if there is not enough memory. + *It is safe since we have already convert aggregationBuffer to serializable format. + * 1. Spark framework moves on to next group, until all groups have been + *processed. + * + * Shuffling exchange data to Reducer tasks... + * + * Stage 2: Final mode aggregate at Reducer side: + * + * 1. 
Upon calling method `initialize(aggBuffer: MutableRow)`, user stores a new empty object A1 + *in aggBuffer. The object A1 will be used to store the accumulated aggregation result. + * 1. Upon calling method `merge(mutableAggBuffer: MutableRow, inputAggBuffer: InternalRow)`, user + *extracts object A1 from mutableAggBuffer, and extracts object A2 from inputAggBuffer. then + *user needs to merge A1, and A2, and stores the merged result back to mutableAggBuffer. + * 1. After processing all inputAggBuffer of current group (group by key), the Spark framework will + *call method `serializeObjectAggregationBufferInPlace(aggregationBuffer: MutableRow)` to + *serialize object A1 to a serializable format in place. + * 1. The Spark framework may spill the aggregationBuffer to disk if there is not enough memory. + *It is safe since we have already convert aggregationBuffer to serializable format. + * 1. Spark framework moves on to next group, until all groups have been processed. + */ +trait WithObjectAggregateBuffer { + this: ImperativeAggregate => --- End diff -- Semes we do not really need this line. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/duplicated a...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/14705 Thanks. Reviewing each change, I think we need this PR (14705) and PR #14734 in 2.0.1 - so maybe only a few lines of conflicts.
[GitHub] spark issue #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/duplicated a...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/14705 Yeah so we can do a couple of things. One is we try to cherry-pick this PR to branch-2.0 and then fix all the merge conflicts that are thrown. I think that should handle cases where the method doesn't exist in 2.0 etc. The other option is to create a new PR that is targeted at branch-2.0 (i.e. the cherry-pick / merge can be done as a part of development) and then we can review, merge it. Let me know if you or @junyangq want to try the second option -- If not I can try the first one and see how many conflicts there are.
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14731 Merged build finished. Test PASSed.
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14731 **[Test build #64156 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64156/consoleFull)** for PR 14731 at commit [`b08e3c9`](https://github.com/apache/spark/commit/b08e3c9937a63a08b274a1491ea7064168646f1d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14731 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64156/ Test PASSed.
[GitHub] spark issue #14735: [SPARK-17173][SPARKR] R MLlib refactor, cleanup, reforma...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14735 Merged build finished. Test PASSed.
[GitHub] spark issue #14735: [SPARK-17173][SPARKR] R MLlib refactor, cleanup, reforma...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14735 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64157/ Test PASSed.
[GitHub] spark issue #14735: [SPARK-17173][SPARKR] R MLlib refactor, cleanup, reforma...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14735 **[Test build #64157 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64157/consoleFull)** for PR 14735 at commit [`30815e0`](https://github.com/apache/spark/commit/30815e067a37175e0f5d4539c80db6b0ec6cc159). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/duplicated a...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/14705 I think a subset of this should go to 2.0.1 as well (as a requirement to fix the warning for CRAN in 2.0.x), but it's a non-trivial port: the mllib isoreg changes are new in 2.1.0 only. What's the best way to proceed?
[GitHub] spark issue #14735: [SPARK-17173][SPARKR] R MLlib refactor, cleanup, reforma...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/14735 This also tightens the signatures for mllib by removing the previously unused `...`:
```
"summary", signature(object = "GeneralizedLinearRegressionModel")
print.summary.GeneralizedLinearRegressionModel
"summary", signature(object = "NaiveBayesModel")
"summary", signature(object = "IsotonicRegressionModel")
"fitted", signature(object = "KMeansModel")
"summary", signature(object = "KMeansModel")
"spark.naiveBayes", signature(data = "SparkDataFrame", formula = "formula"
"summary", signature(object = "GaussianMixtureModel")
```
[GitHub] spark issue #14735: [MINOR][SPARKR] R MLlib refactor, cleanup, reformat, fix...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14735 **[Test build #64157 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64157/consoleFull)** for PR 14735 at commit [`30815e0`](https://github.com/apache/spark/commit/30815e067a37175e0f5d4539c80db6b0ec6cc159).
[GitHub] spark issue #14719: [SPARK-17154][SQL] Wrong result can be returned or Analy...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14719 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64155/ Test PASSed.
[GitHub] spark issue #14719: [SPARK-17154][SQL] Wrong result can be returned or Analy...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14719 Merged build finished. Test PASSed.
[GitHub] spark issue #14719: [SPARK-17154][SQL] Wrong result can be returned or Analy...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14719 **[Test build #64155 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64155/consoleFull)** for PR 14719 at commit [`9ddc9d8`](https://github.com/apache/spark/commit/9ddc9d858fc3d5b269a8a762b356a545f70646d6). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #13428: [SPARK-12666][CORE] SparkSubmit packages fix for when 'd...
Github user JoshRosen commented on the issue: https://github.com/apache/spark/pull/13428 Merged to master and branch-2.0.
[GitHub] spark pull request #13428: [SPARK-12666][CORE] SparkSubmit packages fix for ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/13428
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14731 **[Test build #64156 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64156/consoleFull)** for PR 14731 at commit [`b08e3c9`](https://github.com/apache/spark/commit/b08e3c9937a63a08b274a1491ea7064168646f1d).
[GitHub] spark pull request #14601: [SPARK-13979][Core] Killed executor is re spawned...
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/14601#discussion_r75584298 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala --- @@ -107,6 +107,14 @@ class SparkHadoopUtil extends Logging { if (key.startsWith("spark.hadoop.")) { hadoopConf.set(key.substring("spark.hadoop.".length), value) } + // Copy any "fs.swift2d.foo=bar" properties into conf as "fs.swift2d.foo=bar" --- End diff -- may want to add `fs.wasb` for azure on Hadoop 2.7+
[GitHub] spark issue #12695: [SPARK-14914] Normalize Paths/URIs for windows.
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/12695 As #13868 does adopt `org.apache.hadoop.fs.Path`, I don't see this patch being needed, though it may highlight some places where the new code may need applying.
[GitHub] spark issue #12695: [SPARK-14914] Normalize Paths/URIs for windows.
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/12695 If you are working with Windows paths, Hadoop's Path class contains the code to do this, stabilised and addressing the corner cases.
[GitHub] spark pull request #14601: [SPARK-13979][Core] Killed executor is re spawned...
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/14601#discussion_r75584303 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala --- @@ -102,11 +102,19 @@ class SparkHadoopUtil extends Logging { hadoopConf.set("fs.s3n.awsSecretAccessKey", accessKey) hadoopConf.set("fs.s3a.secret.key", accessKey) } - // Copy any "spark.hadoop.foo=bar" system properties into conf as "foo=bar" conf.getAll.foreach { case (key, value) => +// Copy any "spark.hadoop.foo=bar" system properties into conf as "foo=bar" if (key.startsWith("spark.hadoop.")) { hadoopConf.set(key.substring("spark.hadoop.".length), value) } + // Copy any "fs.swift2d.foo=bar" properties into conf as "fs.swift2d.foo=bar" +else if (key.startsWith("fs.swift2d")){ + hadoopConf.set(key, value) --- End diff -- What's `swift2d`? It's not the swift client in `hadoop-openstack`, which is `fs.swift`
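The property-copying logic under discussion in this diff is simple to state in isolation. Here is a self-contained sketch of the idea (the object and method names are hypothetical, not Spark's actual `SparkHadoopUtil` code): `spark.hadoop.foo=bar` entries are copied into the Hadoop configuration with the prefix stripped, while filesystem-scheme keys are copied verbatim.

```scala
// Sketch only: illustrates the prefix handling debated in the review above.
object HadoopConfSketch {
  // Given Spark conf entries, produce the Hadoop-side properties:
  //  - "spark.hadoop.foo" -> "foo" (prefix stripped)
  //  - keys matching a filesystem prefix (e.g. "fs.swift.") copied unchanged
  def hadoopProps(sparkConf: Map[String, String],
                  fsPrefixes: Seq[String] = Seq("fs.swift.")): Map[String, String] =
    sparkConf.collect {
      case (k, v) if k.startsWith("spark.hadoop.") =>
        k.stripPrefix("spark.hadoop.") -> v
      case (k, v) if fsPrefixes.exists(p => k.startsWith(p)) =>
        k -> v
    }
}
```

The reviewer's question stands regardless of the mechanics: the interesting decision is which `fs.*` prefixes (if any) deserve verbatim pass-through, since `hadoop-openstack` uses `fs.swift`, not `fs.swift2d`.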
[GitHub] spark issue #14718: [SPARK-16711] YarnShuffleService doesn't re-init properl...
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14718 Moving the jackson/leveldb dependencies isn't going to create problems on the YARN shuffle classpath, is it? Given the versions aren't changing, I'm not too worried; I just want to make sure.
[GitHub] spark issue #14736: [SPARK-17024][SQL] Weird behaviour of the DataFrame when...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14736 Can one of the admins verify this patch?
[GitHub] spark issue #14659: [SPARK-16757] Set up Spark caller context to HDFS
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14659 That CallerContext doesn't list Spark as one of the users in its LimitedPrivate scope. Add a Hadoop patch there and I'll get it in. This avoids arguments later when someone breaks the API, and is especially important when using reflection, as it's harder to detect when the class is being used.
[GitHub] spark pull request #14736: [SPARK-17024][SQL] Weird behaviour of the DataFra...
GitHub user izeigerman opened a pull request: https://github.com/apache/spark/pull/14736 [SPARK-17024][SQL] Weird behaviour of the DataFrame when a column name contains dots.

## What changes were proposed in this pull request?

Spark SQL doesn't support field names that contain dots. It's not only about queries like `select`, but about any manipulation of the dataset. Here is a dataset example:
```
field1,field1.some,field2,field3.some
"field1","field1.some","field2","field3.some"
```
And a code snippet:
```
scala> spark.sqlContext.read.format("csv").option("header", "true").option("inferSchema", "true").load("/tmp/test.csv").collect
```
The result of this operation:
```
org.apache.spark.sql.AnalysisException: Can't extract value from field1#0;
at org.apache.spark.sql.catalyst.expressions.ExtractValue$.apply(complexTypeExtractors.scala:73)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$3.apply(LogicalPlan.scala:253)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$3.apply(LogicalPlan.scala:252)
at scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:124)
at scala.collection.immutable.List.foldLeft(List.scala:84)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolve(LogicalPlan.scala:252)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveQuoted(LogicalPlan.scala:168)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolve$1.apply(LogicalPlan.scala:130)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolve$1.apply(LogicalPlan.scala:129)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
...
```
The following code fails with the same error:
```
scala> val df = spark.sqlContext.read.format("csv").option("header", "true").option("inferSchema", "true").load("/tmp/test.csv")
df: org.apache.spark.sql.DataFrame = [field1: string, field1.some: string ... 2 more fields]
scala> df.select("field1", "`field1.some`", "field2", "`field3.some`").collect
```
This patch makes `LogicalPlan` treat a dot-separated string as an attribute's name when nested-field resolution fails.

## How was this patch tested?

Tested with the mentioned CSV file in `CSVSuite` (not committed). I'm not sure where exactly I should put a test for this; `LogicalPlanSuite` doesn't look like the appropriate place.

You can merge this pull request into a Git repository by running:
$ git pull https://github.com/izeigerman/spark iaroslav/spark-17024
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14736.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14736
commit 6059dfce21c071f4022ab6a17316a85748f0729e
Author: Iaroslav Zeigerman
Date: 2016-08-20T19:18:24Z
fix attribute resolution for the Logical Plan in case when attributes contain dots in their names.
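The resolution fallback the PR describes can be illustrated with a toy resolver. This is a hypothetical helper, not the actual `LogicalPlan.resolve` code: it first tries to read `a.b` as column `a` with nested field `b`, and only if that fails treats the whole dotted string as one column name, which is the behavior the patch adds.

```scala
// Toy model of the fallback: `columns` is the set of top-level column names.
object DottedNameSketch {
  def resolveColumn(name: String, columns: Set[String]): Option[String] = {
    val root = name.takeWhile(_ != '.')
    if (name.contains('.') && columns.contains(root))
      Some(root)                       // resolved as nested-field access on `root`
    else if (columns.contains(name))
      Some(name)                       // fallback: whole dotted string is the column
    else
      None                             // unresolved
  }
}
```

Note the ordering also shows why the case in the bug report is awkward: when both `field1` and `field1.some` exist as top-level columns, nested-field resolution on `field1` wins unless the name is explicitly quoted.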
[GitHub] spark issue #14038: [SPARK-16317][SQL] Add a new interface to filter files i...
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14038 Path filtering in Hadoop FS calls on anything other than filename is very suboptimal; in #14731 you can see where the filtering has been postponed until after the listing, when the full `FileStatus` entry list has been returned. As filtering is the last operation in the various listFiles calls, there's no penalty to doing the filtering after the results come in. In `FileSystem.globStatus()` the filtering takes place after the glob match, but during the scan... a larger list will be built and returned, but that is all. I think a new filter should be executed after these operations, taking the `FileStatus` object; this provides a superset of the filtering possible within the Hadoop calls (timestamp, filetype, ...), with no performance penalty. It's more flexible than the simple `accept(path)`, and will guarantee that nobody using the API will implement a suboptimal filter. Consider also taking a predicate `FileStatus => Boolean`, rather than requiring callers to implement new classes; it can be fed straight into `Iterator.filter()`. I note you are making extensive use of `listLeafFiles`; that's a potentially inefficient implementation against object stores. Keep using it; I'll patch it to use `FileSystem.listFiles(path, true)` for in-FS recursion and O(files/5000) listing against S3A in Hadoop 2.8, and eventually Azure and Swift.
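The suggestion above, filtering on a `FileStatus => Boolean` predicate after the listing returns, can be sketched in a few lines. The `FileStatus` case class below is a stand-in for `org.apache.hadoop.fs.FileStatus` so the sketch is self-contained; the function name is illustrative.

```scala
// Stand-in for Hadoop's FileStatus with the fields used here.
case class FileStatus(path: String, modificationTime: Long, isDirectory: Boolean)

object ListingFilterSketch {
  // Filter after the listing has returned: the predicate sees the full FileStatus,
  // so it can use timestamp, file type, etc., with no extra round trips to the
  // (object) store, and it plugs straight into the standard collection filter.
  def listFiltered(listing: Seq[FileStatus],
                   pred: FileStatus => Boolean): Seq[FileStatus] =
    listing.filter(pred)
}

// e.g. only plain files modified after some threshold:
// ListingFilterSketch.listFiltered(statuses,
//   s => !s.isDirectory && s.modificationTime > threshold)
```

The design point is that a plain function predicate composes (callers can `&&` conditions) without anyone subclassing a `PathFilter`-style interface, and since it runs after the RPC, no filter choice can make the remote listing slower.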
[GitHub] spark pull request #14731: [SPARK-17159] [streaming]: optimise check for new...
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/14731#discussion_r75584026 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/dstream/FileInputDStream.scala --- @@ -293,8 +290,8 @@ class FileInputDStream[K, V, F <: NewInputFormat[K, V]]( } /** Get file mod time from cache or fetch it from the file system */ - private def getFileModTime(path: Path) = { -fileToModTime.getOrElseUpdate(path.toString, fs.getFileStatus(path).getModificationTime()) + private def getFileModTime(fs: FileStatus) = { --- End diff -- yes, I was just being minimal about the changes. Inlining is easy.
[GitHub] spark pull request #14731: [SPARK-17159] [streaming]: optimise check for new...
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/14731#discussion_r75584030 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/dstream/FileInputDStream.scala --- @@ -241,16 +233,21 @@ class FileInputDStream[K, V, F <: NewInputFormat[K, V]]( * The files with mod time T+5 are not remembered and cannot be ignored (since, t+5 > t+1). * Hence they can get selected as new files again. To prevent this, files whose mod time is more * than current batch time are not considered. + * @param fs file status + * @param currentTime time of the batch + * @param modTimeIgnoreThreshold the ignore threshold + * @return true if the file has been modified within the batch window */ - private def isNewFile(path: Path, currentTime: Long, modTimeIgnoreThreshold: Long): Boolean = { + private def isNewFile(fs: FileStatus, currentTime: Long, modTimeIgnoreThreshold: Long): Boolean = { --- End diff -- I'll fix this.
[GitHub] spark issue #14732: [SPARK-16320] [DOC] Document G1 heap region's effect on ...
Github user petermaxlee commented on the issue: https://github.com/apache/spark/pull/14732 Looks good!
[GitHub] spark pull request #14731: [SPARK-17159] [streaming]: optimise check for new...
Github user petermaxlee commented on a diff in the pull request: https://github.com/apache/spark/pull/14731#discussion_r75583457 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/dstream/FileInputDStream.scala --- @@ -293,8 +290,8 @@ class FileInputDStream[K, V, F <: NewInputFormat[K, V]]( } /** Get file mod time from cache or fetch it from the file system */ - private def getFileModTime(path: Path) = { -fileToModTime.getOrElseUpdate(path.toString, fs.getFileStatus(path).getModificationTime()) + private def getFileModTime(fs: FileStatus) = { --- End diff -- should we just remove this function now?
[GitHub] spark issue #14735: [MINOR][SPARKR] R MLlib refactor, cleanup, reformat, fix...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/14735 This seems a big enough change that it might be good to have a JIRA for it?
[GitHub] spark pull request #14731: [SPARK-17159] [streaming]: optimise check for new...
Github user petermaxlee commented on a diff in the pull request: https://github.com/apache/spark/pull/14731#discussion_r75583446 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/dstream/FileInputDStream.scala --- @@ -241,16 +233,21 @@ class FileInputDStream[K, V, F <: NewInputFormat[K, V]]( * The files with mod time T+5 are not remembered and cannot be ignored (since, t+5 > t+1). * Hence they can get selected as new files again. To prevent this, files whose mod time is more * than current batch time are not considered. + * @param fs file status + * @param currentTime time of the batch + * @param modTimeIgnoreThreshold the ignore threshold + * @return true if the file has been modified within the batch window */ - private def isNewFile(path: Path, currentTime: Long, modTimeIgnoreThreshold: Long): Boolean = { + private def isNewFile(fs: FileStatus, currentTime: Long, modTimeIgnoreThreshold: Long): Boolean = { --- End diff -- also, `fs` is pretty confusing, because in this context it is often used to refer to a FileSystem. We should pick a different name.
[GitHub] spark pull request #14731: [SPARK-17159] [streaming]: optimise check for new...
Github user petermaxlee commented on a diff in the pull request: https://github.com/apache/spark/pull/14731#discussion_r75583436 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/dstream/FileInputDStream.scala --- @@ -241,16 +233,21 @@ class FileInputDStream[K, V, F <: NewInputFormat[K, V]]( * The files with mod time T+5 are not remembered and cannot be ignored (since, t+5 > t+1). * Hence they can get selected as new files again. To prevent this, files whose mod time is more * than current batch time are not considered. + * @param fs file status + * @param currentTime time of the batch + * @param modTimeIgnoreThreshold the ignore threshold + * @return true if the file has been modified within the batch window */ - private def isNewFile(path: Path, currentTime: Long, modTimeIgnoreThreshold: Long): Boolean = { + private def isNewFile(fs: FileStatus, currentTime: Long, modTimeIgnoreThreshold: Long): Boolean = { --- End diff -- indent is wrong here
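The two review comments above concern `FileInputDStream`'s modification-time window check. As the Scaladoc in the diff explains, a file counts as new only when its mod time falls strictly above the ignore threshold and at or below the current batch time; files older than the threshold were already processed (or are too old to track), and files newer than the batch time are deferred so they are not selected twice. A minimal Python sketch of that logic (hypothetical function and parameter names; the actual implementation is the Scala `isNewFile` shown in the diff):

```python
def is_new_file(mod_time, current_batch_time, mod_time_ignore_threshold):
    """Return True if the file's modification time falls inside the
    current batch window: (mod_time_ignore_threshold, current_batch_time].
    """
    if mod_time <= mod_time_ignore_threshold:
        return False  # already processed, or too old to remember
    if mod_time > current_batch_time:
        return False  # defer to a later batch to avoid double selection
    return True
```

For example, with a batch time of 10 and an ignore threshold of 3, a file modified at time 5 is selected, one modified at time 3 is skipped as already seen, and one modified at time 11 is deferred to the next batch.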
[GitHub] spark issue #14719: [SPARK-17154][SQL] Wrong result can be returned or Analy...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14719 **[Test build #64155 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64155/consoleFull)** for PR 14719 at commit [`9ddc9d8`](https://github.com/apache/spark/commit/9ddc9d858fc3d5b269a8a762b356a545f70646d6).
[GitHub] spark issue #14735: [MINOR][SPARKR] R MLlib refactor, cleanup, reformat, fix...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14735 Merged build finished. Test FAILed.
[GitHub] spark issue #14735: [MINOR][SPARKR] R MLlib refactor, cleanup, reformat, fix...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14735 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64154/ Test FAILed.
[GitHub] spark issue #14735: [MINOR][SPARKR] R MLlib refactor, cleanup, reformat, fix...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14735 **[Test build #64154 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64154/consoleFull)** for PR 14735 at commit [`1ef18d6`](https://github.com/apache/spark/commit/1ef18d6abfe854c95e0323a406065d9ee4f11c15). * This patch **fails SparkR unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14155: [SPARK-16498][SQL] move hive hack for data source table ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14155 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64149/ Test PASSed.
[GitHub] spark issue #14155: [SPARK-16498][SQL] move hive hack for data source table ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14155 Merged build finished. Test PASSed.
[GitHub] spark issue #14155: [SPARK-16498][SQL] move hive hack for data source table ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14155 **[Test build #64149 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64149/consoleFull)** for PR 14155 at commit [`38b838a`](https://github.com/apache/spark/commit/38b838a9d27d5e11bad5f5e7040fe2d6d2e56216). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14734: [SPARK-16508][SPARKR] small doc updates
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14734 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64153/ Test PASSed.
[GitHub] spark issue #14734: [SPARK-16508][SPARKR] small doc updates
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14734 Merged build finished. Test PASSed.
[GitHub] spark issue #14734: [SPARK-16508][SPARKR] small doc updates
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14734 **[Test build #64153 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64153/consoleFull)** for PR 14734 at commit [`4b6c42e`](https://github.com/apache/spark/commit/4b6c42ec1861cb3e48e85d83c22caccb910532ce). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14734: [SPARK-16508][SPARKR] small doc updates
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14734 Merged build finished. Test PASSed.
[GitHub] spark issue #14734: [SPARK-16508][SPARKR] small doc updates
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14734 **[Test build #64152 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64152/consoleFull)** for PR 14734 at commit [`341a2f8`](https://github.com/apache/spark/commit/341a2f8b85d584e9715605c9689d4c77b53483a2). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14734: [SPARK-16508][SPARKR] small doc updates
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14734 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64152/ Test PASSed.
[GitHub] spark issue #14735: [MINOR][SPARKR] R MLlib refactor, cleanup, reformat, fix...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14735 **[Test build #64151 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64151/consoleFull)** for PR 14735 at commit [`3ea30bb`](https://github.com/apache/spark/commit/3ea30bb3b5d22626f6de6e0699504180f267dfdc). * This patch **fails SparkR unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14735: [MINOR][SPARKR] R MLlib refactor, cleanup, reformat, fix...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14735 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64151/ Test FAILed.
[GitHub] spark issue #14735: [MINOR][SPARKR] R MLlib refactor, cleanup, reformat, fix...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14735 Merged build finished. Test FAILed.
[GitHub] spark issue #14735: [MINOR][SPARKR] R MLlib refactor, cleanup, reformat, fix...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14735 **[Test build #64154 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64154/consoleFull)** for PR 14735 at commit [`1ef18d6`](https://github.com/apache/spark/commit/1ef18d6abfe854c95e0323a406065d9ee4f11c15).
[GitHub] spark issue #14719: [SPARK-17154][SQL] Wrong result can be returned or Analy...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14719 Merged build finished. Test FAILed.
[GitHub] spark issue #14719: [SPARK-17154][SQL] Wrong result can be returned or Analy...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14719 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64148/ Test FAILed.
[GitHub] spark issue #14719: [SPARK-17154][SQL] Wrong result can be returned or Analy...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14719 **[Test build #64148 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64148/consoleFull)** for PR 14719 at commit [`48a0775`](https://github.com/apache/spark/commit/48a0775e80cc91340cb0754c62b35868f319cf44). * This patch **fails SparkR unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14734: [SPARK-16508][SPARKR] small doc updates
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14734 **[Test build #64153 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64153/consoleFull)** for PR 14734 at commit [`4b6c42e`](https://github.com/apache/spark/commit/4b6c42ec1861cb3e48e85d83c22caccb910532ce).
[GitHub] spark pull request #14734: [SPARK-16508][SPARKR] small doc updates
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14734#discussion_r75581656 --- Diff: R/pkg/R/DataFrame.R --- @@ -2880,7 +2880,7 @@ setMethod("fillna", #' #' @param x a SparkDataFrame. #' @param row.names NULL or a character vector giving the row names for the data frame. --- End diff -- Updated a few places where we reference `NULL` literally. There are more "null"s in the DataFrame and column function documentation, but those are in a somewhat gray area - JVM `null` is mapped to R `NA` (and not to `NULL`) - and we should look into the best way to name functions and document them.
[GitHub] spark issue #14719: [SPARK-17154][SQL] Wrong result can be returned or Analy...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14719 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64146/ Test FAILed.
[GitHub] spark issue #14719: [SPARK-17154][SQL] Wrong result can be returned or Analy...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14719 Merged build finished. Test FAILed.