[GitHub] spark issue #13634: [SPARK-15913][CORE] Dispatcher.stopped should be enclose...

2016-06-12 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/13634
  
Hi, @vanzin .
Could you review this when you have some time?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13634: [SPARK-15913][CORE] Dispatcher.stopped should be ...

2016-06-12 Thread dongjoon-hyun
GitHub user dongjoon-hyun opened a pull request:

https://github.com/apache/spark/pull/13634

[SPARK-15913][CORE] Dispatcher.stopped should be enclosed by synchronized 
block.

## What changes were proposed in this pull request?

`Dispatcher.stopped` is guarded by `this`, but it is read without 
synchronization in the `postMessage` function. This PR fixes that and also 
makes the exception message more accurate.

## How was this patch tested?

Pass the existing Jenkins tests.
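The race being fixed can be sketched in a few lines. The class and method names below are simplified, hypothetical stand-ins, not Spark's actual `Dispatcher`; the point is that a flag guarded by `this` must be read and written under the same lock:

```scala
// Hypothetical, simplified stand-in for Spark's Dispatcher: the `stopped`
// flag is guarded by `this`, so every read must happen inside synchronized,
// not just the writes.
class MiniDispatcher {
  private var stopped = false                       // guarded by `this`
  private val queue = scala.collection.mutable.Queue.empty[String]

  def stop(): Unit = synchronized { stopped = true }

  // Check-and-enqueue under the same lock, so a concurrent stop() cannot
  // slip in between the stopped check and the enqueue.
  def postMessage(msg: String): Either[String, Unit] = synchronized {
    if (stopped) Left(s"Could not post $msg: dispatcher has been stopped.")
    else { queue.enqueue(msg); Right(()) }
  }

  def pendingCount: Int = synchronized { queue.size }
}
```

Without the `synchronized` around the check in `postMessage`, a `stop()` on another thread could land between the check and the enqueue, exactly the window this PR closes.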

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dongjoon-hyun/spark SPARK-15913

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13634.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13634


commit 75a5254371374faf66f166e1b2683d3f9803cb8e
Author: Dongjoon Hyun 
Date:   2016-06-13T05:53:47Z

[SPARK-15913][CORE] Dispatcher.stopped should be enclosed by synchronized 
block.







[GitHub] spark issue #13623: [SPARK-15895][SQL] Filters out metadata files while doin...

2016-06-12 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/13623
  
In this file, PartitioningAwareFileCatalog.scala, there are multiple places 
where we filter out files that start with an underscore. Should we also filter 
out files that start with a dot in those places?
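For reference, the filter being discussed amounts to a predicate like the following (a hypothetical helper, not the actual Spark code); extending it from underscores to dots would cover hidden files such as `.DS_Store` alongside markers like `_SUCCESS`:

```scala
// Hypothetical predicate: treat files whose names start with "_" or "."
// as metadata/hidden files and keep only real data files.
def isDataFile(fileName: String): Boolean =
  !fileName.startsWith("_") && !fileName.startsWith(".")
```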






[GitHub] spark pull request #13585: [SPARK-15859][SQL] Optimize the partition pruning...

2016-06-12 Thread yangw1234
Github user yangw1234 commented on a diff in the pull request:

https://github.com/apache/spark/pull/13585#discussion_r66742485
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala
 ---
@@ -92,6 +92,36 @@ object PhysicalOperation extends PredicateHelper {
   .map(Alias(_, a.name)(a.exprId, a.qualifier, isGenerated = 
a.isGenerated)).getOrElse(a)
 }
   }
+
+  /**
+   * Drop the non-partition key expression in the disjunctions, to 
optimize the partition pruning.
+   * For instances: (We assume part1 & part2 are the partition keys)
+   * (part1 == 1 and a > 3) or (part2 == 2 and a < 5)  ==> (part1 == 1 or part2 == 2)
+   * (part1 == 1 and a > 3) or (a < 100) => None
+   * (a > 100 && b < 100) or (part1 = 10) => None
+   * (a > 100 && b < 100 and part1 = 10) or (part1 == 2) => (part1 = 10 or 
part1 == 2)
+   * @param predicate disjunctions
+   * @param partitionKeyIds partition keys in attribute set
+   * @return
+   */
+  def partitionPrunningFromDisjunction(
+predicate: Expression, partitionKeyIds: AttributeSet): 
Option[Expression] = {
+// ignore the pure non-partition key expression in conjunction of the 
expression tree
+val additionalPartPredicate = predicate transformUp {
+  case a @ And(left, right) if a.deterministic &&
+left.references.intersect(partitionKeyIds).isEmpty => right
+  case a @ And(left, right) if a.deterministic &&
+right.references.intersect(partitionKeyIds).isEmpty => left
--- End diff --

The problem is here. Imagine a record with `a = 2` in `partition = 1`. Such a 
record satisfies the expression above (`!(partition = 1 && a > 3)`), but if we 
simply drop `a > 3` and push `!(partition = 1)` down to the table scan, 
`partition = 1` will be discarded and the record won't appear in the result. 

The test case passed only because the `BooleanSimplification` optimizer rule 
transforms `!(partition = 1 && a > 3)` into `!(partition = 1) || (a <= 3)`, 
and that expression is dropped entirely by your 
`partitionPrunningFromDisjunction`, in which case `partition = 1` is not 
discarded.
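The counterexample can be checked with plain booleans; for a record with `partition = 1` and `a = 2`:

```scala
// Record from the counterexample: it lives in partition 1 and has a = 2.
val partition = 1
val a = 2

// Original filter !(partition = 1 && a > 3): the record satisfies it.
val originalFilter = !(partition == 1 && a > 3)

// Naively dropping the non-partition conjunct `a > 3` inside the negation
// yields the pushed-down filter !(partition = 1), which the record's
// partition fails, so the partition would be pruned and the row lost.
val naivelyPushedDown = !(partition == 1)
```

`originalFilter` is true while `naivelyPushedDown` is false, which is exactly the unsoundness described above.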





[GitHub] spark issue #13633: [SPARK-15912] [SQL] Replace getPartitionsByFilter by get...

2016-06-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13633
  
**[Test build #60382 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60382/consoleFull)**
 for PR 13633 at commit 
[`77c8808`](https://github.com/apache/spark/commit/77c8808e5325b04c54d5f7d7c043ad34a8b09477).





[GitHub] spark pull request #13633: [SPARK-15912] [SQL] Replace getPartitionsByFilter...

2016-06-12 Thread gatorsmile
GitHub user gatorsmile opened a pull request:

https://github.com/apache/spark/pull/13633

[SPARK-15912] [SQL] Replace getPartitionsByFilter by getPartitions in 
inputFiles of MetastoreRelation

## What changes were proposed in this pull request?
`inputFiles` always returns the files of all the partitions. Thus, the 
implementation of `inputFiles` in `MetastoreRelation` does not need to call 
`getPartitionsByFilter`; instead, it should call `getPartitions`.

No test case covers the `inputFiles` API in `MetastoreRelation`. This PR also 
adds the missing test cases.

## How was this patch tested?
See above.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gatorsmile/spark testCase4InputFiles

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13633.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13633


commit 77c8808e5325b04c54d5f7d7c043ad34a8b09477
Author: gatorsmile 
Date:   2016-06-13T05:43:51Z

fix.







[GitHub] spark issue #13631: [SPARK-15911][SQL] Remove the additional Project to be c...

2016-06-12 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/13631
  
@cloud-fan There is a test ("Detect table partitioning with correct 
partition order") in `InsertIntoHiveTableSuite` that is dedicated to testing 
`insertInto` with this column re-ordering. What do you think we should do 
about it? Remove it?





[GitHub] spark pull request #13585: [SPARK-15859][SQL] Optimize the partition pruning...

2016-06-12 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/13585#discussion_r66741771
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala ---
@@ -65,15 +65,20 @@ private[hive] trait HiveStrategies {
 // hive table scan operator to be used for partition pruning.
 val partitionKeyIds = AttributeSet(relation.partitionKeys)
 val (pruningPredicates, otherPredicates) = predicates.partition { 
predicate =>
-  !predicate.references.isEmpty &&
+  predicate.references.nonEmpty &&
   predicate.references.subsetOf(partitionKeyIds)
 }
+val additionalPartPredicates =
+  PhysicalOperation.partitionPrunningFromDisjunction(
+otherPredicates.foldLeft[Expression](Literal(true))(And(_, 
_)), partitionKeyIds)
 
 pruneFilterProject(
   projectList,
   otherPredicates,
   identity[Seq[Expression]],
-  HiveTableScanExec(_, relation, pruningPredicates)(sparkSession)) 
:: Nil
+HiveTableScanExec(_,
+relation,
+pruningPredicates ++ additionalPartPredicates)(sparkSession)) 
:: Nil
--- End diff --

Sorry, @clockfly, I am not sure what you mean; this PR is not designed to 
depend on the Optimizer (CNF). Could you please give a more concrete example 
if there is a bug?





[GitHub] spark pull request #13394: [SPARK-15490][R][DOC] SparkR 2.0 QA: New R APIs a...

2016-06-12 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/13394#discussion_r66741569
  
--- Diff: R/pkg/R/mllib.R ---
@@ -218,9 +222,9 @@ setMethod("predict", signature(object = 
"GeneralizedLinearRegressionModel"),
 return(dataFrame(callJMethod(object@jobj, "transform", 
newData@sdf)))
   })
 
-#' Make predictions from a naive Bayes model
+#' predict
--- End diff --

Same here: keep longer title





[GitHub] spark pull request #13394: [SPARK-15490][R][DOC] SparkR 2.0 QA: New R APIs a...

2016-06-12 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/13394#discussion_r66741577
  
--- Diff: R/pkg/R/mllib.R ---
@@ -582,9 +586,9 @@ setMethod("summary", signature(object = 
"AFTSurvivalRegressionModel"),
 return(list(coefficients = coefficients))
   })
 
-#' Make predictions from an AFT survival regression model
+#' predict
--- End diff --

ditto: keep long title





[GitHub] spark pull request #13394: [SPARK-15490][R][DOC] SparkR 2.0 QA: New R APIs a...

2016-06-12 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/13394#discussion_r66741575
  
--- Diff: R/pkg/R/mllib.R ---
@@ -357,9 +361,9 @@ setMethod("summary", signature(object = "KMeansModel"),
cluster = cluster, is.loaded = is.loaded))
   })
 
-#' Make predictions from a k-means model
+#' predict
--- End diff --

ditto: keep long title





[GitHub] spark pull request #13394: [SPARK-15490][R][DOC] SparkR 2.0 QA: New R APIs a...

2016-06-12 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/13394#discussion_r66741568
  
--- Diff: R/pkg/R/mllib.R ---
@@ -197,7 +201,7 @@ print.summary.GeneralizedLinearRegressionModel <- 
function(x, ...) {
   invisible(x)
   }
 
-#' Make predictions from a generalized linear model
+#' predict
--- End diff --

No need for this change.  We can keep the longer title.





[GitHub] spark pull request #13394: [SPARK-15490][R][DOC] SparkR 2.0 QA: New R APIs a...

2016-06-12 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/13394#discussion_r66741560
  
--- Diff: R/pkg/R/column.R ---
@@ -170,6 +172,8 @@ setMethod("between", signature(x = "Column"),
 }
   })
 
+#' cast
+#'
 #' Casts the column to a different data type.
--- End diff --

This can remain the title, right?





[GitHub] spark pull request #13394: [SPARK-15490][R][DOC] SparkR 2.0 QA: New R APIs a...

2016-06-12 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/13394#discussion_r66741566
  
--- Diff: R/pkg/R/functions.R ---
@@ -249,10 +249,7 @@ col <- function(x) {
 #'
 #' Returns a Column based on the given column name.
 #'
-#' @rdname col
-#' @name column
 #' @family normal_funcs
-#' @export
--- End diff --

This function is exported, right?  It's ```col``` which is not exported.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13413: [SPARK-15663][SQL] SparkSession.catalog.listFunctions sh...

2016-06-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13413
  
Merged build finished. Test PASSed.





[GitHub] spark issue #13413: [SPARK-15663][SQL] SparkSession.catalog.listFunctions sh...

2016-06-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13413
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60379/
Test PASSed.





[GitHub] spark issue #13413: [SPARK-15663][SQL] SparkSession.catalog.listFunctions sh...

2016-06-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13413
  
**[Test build #60379 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60379/consoleFull)**
 for PR 13413 at commit 
[`535d27c`](https://github.com/apache/spark/commit/535d27c45a1dd62ccb35616bf25e8363b625).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class CollectionAccumulator[T] extends AccumulatorV2[T, 
java.util.List[T]] `
  * `class LibSVMFileFormat extends TextBasedFileFormat with 
DataSourceRegister `
  * `public static class Prefix `
  * `abstract class ForeachWriter[T] extends Serializable `
  * `abstract class SparkStrategy extends GenericStrategy[SparkPlan] `
  * `class CSVFileFormat extends TextBasedFileFormat with 
DataSourceRegister `
  * `case class RefreshResource(path: String)`
  * `abstract class TextBasedFileFormat extends FileFormat `
  * `class JsonFileFormat extends TextBasedFileFormat with 
DataSourceRegister `
  * `class TextFileFormat extends TextBasedFileFormat with 
DataSourceRegister `
  * `class ForeachSink[T : Encoder](writer: ForeachWriter[T]) extends Sink 
with Serializable `





[GitHub] spark pull request #13585: [SPARK-15859][SQL] Optimize the partition pruning...

2016-06-12 Thread clockfly
Github user clockfly commented on a diff in the pull request:

https://github.com/apache/spark/pull/13585#discussion_r66741254
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala ---
@@ -65,15 +65,20 @@ private[hive] trait HiveStrategies {
 // hive table scan operator to be used for partition pruning.
 val partitionKeyIds = AttributeSet(relation.partitionKeys)
 val (pruningPredicates, otherPredicates) = predicates.partition { 
predicate =>
-  !predicate.references.isEmpty &&
+  predicate.references.nonEmpty &&
   predicate.references.subsetOf(partitionKeyIds)
 }
+val additionalPartPredicates =
+  PhysicalOperation.partitionPrunningFromDisjunction(
+otherPredicates.foldLeft[Expression](Literal(true))(And(_, 
_)), partitionKeyIds)
 
 pruneFilterProject(
   projectList,
   otherPredicates,
   identity[Seq[Expression]],
-  HiveTableScanExec(_, relation, pruningPredicates)(sparkSession)) 
:: Nil
+HiveTableScanExec(_,
+relation,
+pruningPredicates ++ additionalPartPredicates)(sparkSession)) 
:: Nil
--- End diff --

Sure, we understand that `additionalPartPredicates` is the partition 
filter. But we cannot assume `BooleanSimplification` will push every NOT 
operator down to the leaf expressions, as `BooleanSimplification` is an 
"optimizer" rule, which can be skipped if the maximum number of iterations is 
exceeded during optimization.
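For intuition, the De Morgan push-down that `BooleanSimplification` performs can be sketched on a toy expression tree (hypothetical types, not Catalyst's actual classes); the concern above is that pruning correctness should not depend on this rewrite having run:

```scala
// Toy expression tree and a De Morgan-style rewrite that pushes NOT down
// to the leaves, mimicking what BooleanSimplification does in Catalyst.
sealed trait Expr
case class Leaf(name: String) extends Expr
case class Not(child: Expr) extends Expr
case class And(left: Expr, right: Expr) extends Expr
case class Or(left: Expr, right: Expr) extends Expr

def pushNotDown(e: Expr): Expr = e match {
  case Not(And(l, r)) => Or(pushNotDown(Not(l)), pushNotDown(Not(r)))
  case Not(Or(l, r))  => And(pushNotDown(Not(l)), pushNotDown(Not(r)))
  case Not(Not(x))    => pushNotDown(x)
  case And(l, r)      => And(pushNotDown(l), pushNotDown(r))
  case Or(l, r)       => Or(pushNotDown(l), pushNotDown(r))
  case other          => other
}
```

If this rewrite is skipped, a `Not(And(...))` can reach the pruning logic unrewritten, which is the scenario the comment warns about.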





[GitHub] spark issue #13631: [SPARK-15911][SQL] Remove the additional Project to be c...

2016-06-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13631
  
Merged build finished. Test FAILed.





[GitHub] spark issue #13631: [SPARK-15911][SQL] Remove the additional Project to be c...

2016-06-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13631
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60380/
Test FAILed.





[GitHub] spark issue #13631: [SPARK-15911][SQL] Remove the additional Project to be c...

2016-06-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13631
  
**[Test build #60380 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60380/consoleFull)**
 for PR 13631 at commit 
[`5f4455a`](https://github.com/apache/spark/commit/5f4455ae3400302c4f3cb019419dbdada4edf5c9).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #13623: [SPARK-15895][SQL] Filters out metadata files while doin...

2016-06-12 Thread clockfly
Github user clockfly commented on the issue:

https://github.com/apache/spark/pull/13623
  
Looks good! +1





[GitHub] spark pull request #13604: [SPARK-15898][SQL] DataFrameReader.text should re...

2016-06-12 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/13604





[GitHub] spark issue #13604: [SPARK-15898][SQL] DataFrameReader.text should return Da...

2016-06-12 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/13604
  
Merging in master/2.0.






[GitHub] spark pull request #13629: [SPARK-15370][SQL] Fix count bug

2016-06-12 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/13629





[GitHub] spark issue #13629: [SPARK-15370][SQL] Fix count bug

2016-06-12 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/13629
  
Thanks - merging in master/2.0.






[GitHub] spark issue #13632: [SPARK-15910][SQL] Check schema consistency when using K...

2016-06-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13632
  
**[Test build #60381 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60381/consoleFull)**
 for PR 13632 at commit 
[`50bb48d`](https://github.com/apache/spark/commit/50bb48d6f2a59e2e88fe68699fceac308153e08a).





[GitHub] spark pull request #13632: [SPARK-15910][SQL] Check schema consistency when ...

2016-06-12 Thread clockfly
GitHub user clockfly opened a pull request:

https://github.com/apache/spark/pull/13632

[SPARK-15910][SQL] Check schema consistency when using Kryo encoder to 
convert DataFrame to Dataset

## What changes were proposed in this pull request?

This PR enforces a schema check when converting a DataFrame to a Dataset 
using the Kryo encoder. For example:

**Before the change:**

The schema is NOT checked when converting a DataFrame to a Dataset using the 
Kryo encoder.
```
scala> case class B(b: Int)
scala> implicit val encoder = Encoders.kryo[B]
scala> val df = Seq((1)).toDF("b")
scala> val ds = df.as[B] // Schema compatibility is NOT checked
```

**After the change:**

An `AnalysisException` is reported since the schema is NOT compatible.
```
scala> val ds = Seq((1)).toDF("b").as[B]
org.apache.spark.sql.AnalysisException: cannot resolve 'CAST(`b` AS 
BINARY)' due to data type mismatch: cannot cast IntegerType to BinaryType;
...
```

## How was this patch tested?

Unit test.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/clockfly/spark spark-15910

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13632.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13632


commit 50bb48d6f2a59e2e88fe68699fceac308153e08a
Author: Sean Zhong 
Date:   2016-06-13T04:01:57Z

SPARK-15910: Check schema







[GitHub] spark issue #13631: [SPARK-15911][SQL] Remove the additional Project to be c...

2016-06-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13631
  
**[Test build #60380 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60380/consoleFull)**
 for PR 13631 at commit 
[`5f4455a`](https://github.com/apache/spark/commit/5f4455ae3400302c4f3cb019419dbdada4edf5c9).





[GitHub] spark pull request #13631: [SPARK-15911][SQL] Remove the additional Project ...

2016-06-12 Thread viirya
GitHub user viirya opened a pull request:

https://github.com/apache/spark/pull/13631

[SPARK-15911][SQL] Remove the additional Project to be consistent with SQL

## What changes were proposed in this pull request?

Currently, in `DataFrameWriter`'s `insertInto` and in the `ResolveRelations` rule of 
`Analyzer`, we add an additional Project to adjust column ordering. However, this 
resolution should use column ordering, not column names. This is how Hive handles 
dynamic partitions.

## How was this patch tested?
Existing tests.
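Position-based (ordinal) resolution, as opposed to name-based, can be sketched with a toy helper outside Spark; the object and method names below are illustrative, not Spark's API:

```scala
// Toy illustration of position-based column resolution: values are matched
// to the target table's columns by ordinal, ignoring source column names.
// This mirrors how Hive resolves INSERT INTO (including dynamic partitions).
object PositionalInsert {
  def resolveByPosition(targetCols: Seq[String], values: Seq[Any]): Map[String, Any] = {
    require(targetCols.length == values.length, "column count mismatch")
    targetCols.zip(values).toMap
  }

  def main(args: Array[String]): Unit = {
    // A source row produced in the order (b, a) still binds its first value
    // to the first target column "a": only the position counts.
    println(resolveByPosition(Seq("a", "b"), Seq(2, 1)))
  }
}
```

If the source columns were produced in a different order than the target schema, a name-based scheme would reshuffle them, while the positional scheme above keeps them where they are.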




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/viirya/spark-1 inserttable

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13631.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13631


commit 5f4455ae3400302c4f3cb019419dbdada4edf5c9
Author: Liang-Chi Hsieh 
Date:   2016-06-13T04:00:40Z

Remove the additional Project to be consistent with SQL.







[GitHub] spark issue #13585: [SPARK-15859][SQL] Optimize the partition pruning within...

2016-06-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13585
  
Merged build finished. Test PASSed.





[GitHub] spark issue #13585: [SPARK-15859][SQL] Optimize the partition pruning within...

2016-06-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13585
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60376/
Test PASSed.





[GitHub] spark issue #13585: [SPARK-15859][SQL] Optimize the partition pruning within...

2016-06-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13585
  
**[Test build #60376 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60376/consoleFull)**
 for PR 13585 at commit 
[`79f7acb`](https://github.com/apache/spark/commit/79f7acbb660c2c398e21a36a7b92f316b7e5037f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #13413: [SPARK-15663][SQL] SparkSession.catalog.listFunctions sh...

2016-06-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13413
  
**[Test build #60379 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60379/consoleFull)**
 for PR 13413 at commit 
[`535d27c`](https://github.com/apache/spark/commit/535d27c45a1dd62ccb35616bf25e8363b625).





[GitHub] spark issue #10706: [SPARK-12543] [SPARK-4226] [SQL] Subquery in expression

2016-06-12 Thread hvanhovell
Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/10706
  
@kamalcoursera you could use a predicate scalar subquery here, i.e.:
```sql
select runon as runon,
       case
         when (select max(true) from sqltesttable b
               where b.key = a.key and group = 'vowels') then 'vowels'
         else 'consonants'
       end as group,
       key as key,
       someint as someint
from sqltesttable a;
```





[GitHub] spark pull request #13630: [SPARK-15892][ML] Change spark to sqlContext in t...

2016-06-12 Thread HyukjinKwon
Github user HyukjinKwon closed the pull request at:

https://github.com/apache/spark/pull/13630





[GitHub] spark issue #13630: [SPARK-15892][ML] Change spark to sqlContext in the test...

2016-06-12 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/13630
  
It seems this should be reverted first. Closing this (see 
https://github.com/apache/spark/pull/13619).





[GitHub] spark issue #13619: [SPARK-15892][ML] Incorrectly merged AFTAggregator with ...

2016-06-12 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/13619
  
Yes, it seems it is still failing. I can change my PR to revert this if he 
is busy right now; otherwise, I will close mine.





[GitHub] spark issue #13619: [SPARK-15892][ML] Incorrectly merged AFTAggregator with ...

2016-06-12 Thread zzcclp
Github user zzcclp commented on the issue:

https://github.com/apache/spark/pull/13619
  
@HyukjinKwon OK, but this PR should be reverted first.
@jkbradley could you revert this PR for branch-1.6?





[GitHub] spark issue #13558: [SPARK-15820][PySpark][SQL]Add Catalog.refreshTable into...

2016-06-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13558
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60378/
Test FAILed.





[GitHub] spark issue #13558: [SPARK-15820][PySpark][SQL]Add Catalog.refreshTable into...

2016-06-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13558
  
**[Test build #60378 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60378/consoleFull)**
 for PR 13558 at commit 
[`8d87c0f`](https://github.com/apache/spark/commit/8d87c0f2bd9140928915f835fd7d21b178422c69).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #13558: [SPARK-15820][PySpark][SQL]Add Catalog.refreshTable into...

2016-06-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13558
  
Merged build finished. Test FAILed.





[GitHub] spark issue #13630: [SPARK-15892][ML] Change spark to sqlContext in the test...

2016-06-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13630
  
Merged build finished. Test FAILed.





[GitHub] spark issue #13630: [SPARK-15892][ML] Change spark to sqlContext in the test...

2016-06-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13630
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60377/
Test FAILed.





[GitHub] spark issue #13630: [SPARK-15892][ML] Change spark to sqlContext in the test...

2016-06-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13630
  
**[Test build #60377 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60377/consoleFull)**
 for PR 13630 at commit 
[`022`](https://github.com/apache/spark/commit/022d48d52127fb3cab804a78eed9ff253b76).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #13558: [SPARK-15820][PySpark][SQL]Add Catalog.refreshTable into...

2016-06-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13558
  
**[Test build #60378 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60378/consoleFull)**
 for PR 13558 at commit 
[`8d87c0f`](https://github.com/apache/spark/commit/8d87c0f2bd9140928915f835fd7d21b178422c69).





[GitHub] spark issue #13558: [SPARK-15820][PySpark][SQL]Add Catalog.refreshTable into...

2016-06-12 Thread WeichenXu123
Github user WeichenXu123 commented on the issue:

https://github.com/apache/spark/pull/13558
  
Jenkins, test this please.





[GitHub] spark issue #13619: [SPARK-15892][ML] Incorrectly merged AFTAggregator with ...

2016-06-12 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/13619
  
@zzcclp I made a PR against branch-1.6 here, 
https://github.com/apache/spark/pull/13630. Thank you for pointing this out 
quickly.





[GitHub] spark issue #13630: [SPARK-15892][ML] Change spark to sqlContext in the test...

2016-06-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13630
  
**[Test build #60377 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60377/consoleFull)**
 for PR 13630 at commit 
[`022`](https://github.com/apache/spark/commit/022d48d52127fb3cab804a78eed9ff253b76).





[GitHub] spark issue #13630: [SPARK-15892][ML] Change spark to sqlContext in the test...

2016-06-12 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/13630
  
cc @jkbradley and @zzcclp 





[GitHub] spark pull request #13630: [SPARK-15892][ML] Change spark to sqlContext in t...

2016-06-12 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request:

https://github.com/apache/spark/pull/13630

[SPARK-15892][ML] Change spark to sqlContext in the test in 
AFTSurvivalRegressionSuite for 1.6

## What changes were proposed in this pull request?

https://github.com/apache/spark/pull/13619 was merged into master, 
branch-2.0 and 1.6 as well but the unit test uses `spark`.

So, this PR changes `spark` to `sqlContext` in the unit tests.

It seems builds are failing due to this.

## How was this patch tested?

Jenkins tests.




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HyukjinKwon/spark SPARK-15892-1.6

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13630.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13630


commit 022d48d52127fb3cab804a78eed9ff253b76
Author: hyukjinkwon 
Date:   2016-06-13T02:04:39Z

Change spark to sqlContext for 1.6







[GitHub] spark issue #10706: [SPARK-12543] [SPARK-4226] [SQL] Subquery in expression

2016-06-12 Thread kamalcoursera
Github user kamalcoursera commented on the issue:

https://github.com/apache/spark/pull/10706
  
Thank you! Are there alternative options to use instead of a predicate subquery? 
Is an amendment planned for 2.0?

```sql
select runon as runon,
       case
         when key in (select key from sqltesttable where group = 'vowels') then 'vowels'
         else 'consonants'
       end as group,
       key as key,
       someint as someint
from sqltesttable;
```





[GitHub] spark pull request #13585: [SPARK-15859][SQL] Optimize the partition pruning...

2016-06-12 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/13585#discussion_r66733226
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala ---
@@ -65,15 +65,20 @@ private[hive] trait HiveStrategies {
 // hive table scan operator to be used for partition pruning.
 val partitionKeyIds = AttributeSet(relation.partitionKeys)
 val (pruningPredicates, otherPredicates) = predicates.partition { 
predicate =>
-  !predicate.references.isEmpty &&
+  predicate.references.nonEmpty &&
   predicate.references.subsetOf(partitionKeyIds)
 }
+val additionalPartPredicates =
+  PhysicalOperation.partitionPrunningFromDisjunction(
+otherPredicates.foldLeft[Expression](Literal(true))(And(_, 
_)), partitionKeyIds)
 
 pruneFilterProject(
   projectList,
   otherPredicates,
   identity[Seq[Expression]],
-  HiveTableScanExec(_, relation, pruningPredicates)(sparkSession)) 
:: Nil
+HiveTableScanExec(_,
+relation,
+pruningPredicates ++ additionalPartPredicates)(sparkSession)) 
:: Nil
--- End diff --

@yangw1234 @liancheng @clockfly 
`pruningPredicates ++ additionalPartPredicates` is the partition filter, 
and the original filter still needs to be applied after the partitions are pruned.
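The idea of deriving an extra partition-only predicate from a disjunction, while still applying the original filter afterwards, can be sketched on a toy expression tree. The names below are illustrative only, not Spark's Catalyst API:

```scala
// Toy expression tree for sketching partition pruning from a disjunction.
sealed trait Expr
case class Attr(name: String) extends Expr
case class Pred(attr: Attr, op: String, value: Any) extends Expr
case class And(left: Expr, right: Expr) extends Expr
case class Or(left: Expr, right: Expr) extends Expr

object PartitionPruning {
  // For a filter like (p1 AND x) OR (p2 AND y), derive (p1 OR p2): a predicate
  // over partition columns only that is implied by the original filter. The
  // original filter must still run on the rows of the surviving partitions.
  def partitionPruningFromDisjunction(e: Expr, partCols: Set[String]): Option[Expr] = {
    def conjuncts(e: Expr): Seq[Expr] = e match {
      case And(l, r) => conjuncts(l) ++ conjuncts(r)
      case other     => Seq(other)
    }
    def partOnly(e: Expr): Boolean = e match {
      case Attr(n)       => partCols(n)
      case Pred(a, _, _) => partCols(a.name)
      case And(l, r)     => partOnly(l) && partOnly(r)
      case Or(l, r)      => partOnly(l) && partOnly(r)
    }
    e match {
      case Or(l, r) =>
        // Both branches must yield a partition-only predicate; otherwise the
        // disjunction gives us nothing safe to prune with.
        for {
          lp <- partitionPruningFromDisjunction(l, partCols)
          rp <- partitionPruningFromDisjunction(r, partCols)
        } yield Or(lp, rp)
      case other =>
        conjuncts(other).filter(partOnly).reduceOption(And(_, _))
    }
  }
}
```

For `(p = 1 AND x = 2) OR p = 3` with partition column `p`, this yields `p = 1 OR p = 3`; if either branch carries no partition-only conjunct, the result is `None` and no extra pruning is possible.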





[GitHub] spark issue #13585: [SPARK-15859][SQL] Optimize the partition pruning within...

2016-06-12 Thread chenghao-intel
Github user chenghao-intel commented on the issue:

https://github.com/apache/spark/pull/13585
  
Updated with a more meaningful function name and added more unit tests.





[GitHub] spark issue #13585: [SPARK-15859][SQL] Optimize the partition pruning within...

2016-06-12 Thread chenghao-intel
Github user chenghao-intel commented on the issue:

https://github.com/apache/spark/pull/13585
  
cc @liancheng 





[GitHub] spark pull request #12836: [SPARK-12922][SparkR][WIP] Implement gapply() on ...

2016-06-12 Thread sun-rui
Github user sun-rui commented on a diff in the pull request:

https://github.com/apache/spark/pull/12836#discussion_r66733080
  
--- Diff: core/src/main/scala/org/apache/spark/api/r/RRunner.scala ---
@@ -40,7 +40,8 @@ private[spark] class RRunner[U](
 broadcastVars: Array[Broadcast[Object]],
 numPartitions: Int = -1,
 isDataFrame: Boolean = false,
-colNames: Array[String] = null)
+colNames: Array[String] = null,
+mode: Int = 0)
--- End diff --

It is better to define an enumeration for the mode instead of hard-coding integers, for 
example:
```scala
private[sql] object RRunnerModes {
  val RDD = 0
  val DATAFRAME_DAPPLY = 1
  val DATAFRAME_GAPPLY = 2
}
```





[GitHub] spark pull request #13619: [SPARK-15892][ML] Incorrectly merged AFTAggregato...

2016-06-12 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/13619#discussion_r66732958
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/regression/AFTSurvivalRegressionSuite.scala
 ---
@@ -390,6 +390,18 @@ class AFTSurvivalRegressionSuite
 testEstimatorAndModelReadWrite(aft, datasetMultivariate,
   AFTSurvivalRegressionSuite.allParamSettings, checkModelData)
   }
+
+  test("SPARK-15892: Incorrectly merged AFTAggregator with zero total 
count") {
+// This `dataset` will contain an empty partition because it has two 
rows but
+// the parallelism is bigger than that. Because the issue was about 
`AFTAggregator`s
+// being merged incorrectly when it has an empty partition, running 
the codes below
+// should not throw an exception.
+val dataset = spark.createDataFrame(
--- End diff --

Oh, it seems this is merged into branch-1.6 too. Yes, it should be 
`sqlContext` for branch-1.6.





[GitHub] spark issue #13585: [SPARK-15859][SQL] Optimize the partition pruning within...

2016-06-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13585
  
**[Test build #60376 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60376/consoleFull)**
 for PR 13585 at commit 
[`79f7acb`](https://github.com/apache/spark/commit/79f7acbb660c2c398e21a36a7b92f316b7e5037f).





[GitHub] spark pull request #13619: [SPARK-15892][ML] Incorrectly merged AFTAggregato...

2016-06-12 Thread zzcclp
Github user zzcclp commented on a diff in the pull request:

https://github.com/apache/spark/pull/13619#discussion_r66732911
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/regression/AFTSurvivalRegressionSuite.scala
 ---
@@ -390,6 +390,18 @@ class AFTSurvivalRegressionSuite
 testEstimatorAndModelReadWrite(aft, datasetMultivariate,
   AFTSurvivalRegressionSuite.allParamSettings, checkModelData)
   }
+
+  test("SPARK-15892: Incorrectly merged AFTAggregator with zero total 
count") {
+// This `dataset` will contain an empty partition because it has two 
rows but
+// the parallelism is bigger than that. Because the issue was about 
`AFTAggregator`s
+// being merged incorrectly when it has an empty partition, running 
the codes below
+// should not throw an exception.
+val dataset = spark.createDataFrame(
--- End diff --

With branch-2.0 it is OK; I think this PR should not be merged into 
branch-1.6 directly.





[GitHub] spark pull request #13619: [SPARK-15892][ML] Incorrectly merged AFTAggregato...

2016-06-12 Thread zzcclp
Github user zzcclp commented on a diff in the pull request:

https://github.com/apache/spark/pull/13619#discussion_r66732825
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/regression/AFTSurvivalRegressionSuite.scala
 ---
@@ -390,6 +390,18 @@ class AFTSurvivalRegressionSuite
 testEstimatorAndModelReadWrite(aft, datasetMultivariate,
   AFTSurvivalRegressionSuite.allParamSettings, checkModelData)
   }
+
+  test("SPARK-15892: Incorrectly merged AFTAggregator with zero total 
count") {
+// This `dataset` will contain an empty partition because it has two 
rows but
+// the parallelism is bigger than that. Because the issue was about 
`AFTAggregator`s
+// being merged incorrectly when it has an empty partition, running 
the codes below
+// should not throw an exception.
+val dataset = spark.createDataFrame(
--- End diff --

I compiled it against branch-1.6.





[GitHub] spark issue #13629: [SPARK-15370][SQL] Fix count bug

2016-06-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13629
  
Merged build finished. Test PASSed.





[GitHub] spark issue #13629: [SPARK-15370][SQL] Fix count bug

2016-06-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13629
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60375/
Test PASSed.





[GitHub] spark pull request #6983: [SPARK-6785][SQL] fix DateTimeUtils for dates befo...

2016-06-12 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/6983#discussion_r66732804
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala
 ---
@@ -48,4 +49,41 @@ class DateTimeUtilsSuite extends SparkFunSuite {
 val t2 = DateTimeUtils.toJavaTimestamp(DateTimeUtils.fromJulianDay(d1, 
ns1))
 assert(t.equals(t2))
   }
+
+  test("SPARK-6785: java date conversion before and after epoch") {
+def checkFromToJavaDate(d1: Date): Unit = {
+  val d2 = DateTimeUtils.toJavaDate(DateTimeUtils.fromJavaDate(d1))
+  assert(d2.toString === d1.toString)
--- End diff --

Shouldn't it also be the case that `d1 === d2`?
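One way to see why `d2.toString === d1.toString` is a weaker check than `d1 === d2`: `java.sql.Date.toString` drops the time-of-day, while `equals` compares the underlying millisecond value. A minimal standalone sketch (using `LocalDate` truncation as a simplified stand-in for `DateTimeUtils`' day arithmetic, so timezone subtleties are glossed over):

```scala
import java.sql.Date

object DateRoundtripSketch extends App {
  // d1 carries the current time-of-day in its millisecond value.
  val d1 = new Date(System.currentTimeMillis())

  // A date -> days -> date roundtrip truncates to midnight of the same day.
  val d2 = Date.valueOf(d1.toLocalDate)

  // Same calendar day, so the string forms agree...
  assert(d2.toString == d1.toString)
  // ...but the millisecond values usually differ, so d1 == d2 is normally
  // false unless d1 happened to fall exactly on midnight.
}
```

So asserting `d1 === d2` would only pass if the conversion preserved sub-day precision, which a days-based representation cannot do.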





[GitHub] spark pull request #13619: [SPARK-15892][ML] Incorrectly merged AFTAggregato...

2016-06-12 Thread zzcclp
Github user zzcclp commented on a diff in the pull request:

https://github.com/apache/spark/pull/13619#discussion_r66732808
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/regression/AFTSurvivalRegressionSuite.scala
 ---
@@ -390,6 +390,18 @@ class AFTSurvivalRegressionSuite
 testEstimatorAndModelReadWrite(aft, datasetMultivariate,
   AFTSurvivalRegressionSuite.allParamSettings, checkModelData)
   }
+
+  test("SPARK-15892: Incorrectly merged AFTAggregator with zero total 
count") {
+// This `dataset` will contain an empty partition because it has two 
rows but
+// the parallelism is bigger than that. Because the issue was about 
`AFTAggregator`s
+// being merged incorrectly when it has an empty partition, running 
the codes below
+// should not throw an exception.
+val dataset = spark.createDataFrame(
--- End diff --

@HyukjinKwon the value `spark` is not found here; it should be `sqlContext`, 
right? 





[GitHub] spark issue #13629: [SPARK-15370][SQL] Fix count bug

2016-06-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13629
  
**[Test build #60375 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60375/consoleFull)**
 for PR 13629 at commit 
[`30dd0bd`](https://github.com/apache/spark/commit/30dd0bd7d560151085e53667fcc4f6a8895844ed).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #13571: [SPARK-15369][WIP][RFC][PySpark][SQL] Expose potential t...

2016-06-12 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/13571
  
So this is a WIP of what this could look like, but I'd really like your 
thoughts on the draft @davies - do you think this is heading in the right 
direction, given the performance numbers from the benchmark?





[GitHub] spark pull request #12836: [SPARK-12922][SparkR][WIP] Implement gapply() on ...

2016-06-12 Thread NarineK
Github user NarineK commented on a diff in the pull request:

https://github.com/apache/spark/pull/12836#discussion_r66732763
  
--- Diff: R/pkg/R/group.R ---
@@ -142,3 +142,58 @@ createMethods <- function() {
 }
 
 createMethods()
+
+#' gapply
+#'
+#' Applies a R function to each group in the input GroupedData
+#'
+#' @param x a GroupedData
+#' @param func A function to be applied to each group partition specified 
by GroupedData.
+#' The function `func` takes as argument a key - grouping 
columns and
+#' a data frame - a local R data.frame.
+#' The output of `func` is a local R data.frame.
+#' @param schema The schema of the resulting SparkDataFrame after the 
function is applied.
+#'   It must match the output of func.
+#' @return a SparkDataFrame
+#' @rdname gapply
+#' @name gapply
+#' @examples
+#' \dontrun{
+#' Computes the arithmetic mean of the second column by grouping
+#' on the first and third columns. Output the grouping values and the 
average.
+#'
+#' df <- createDataFrame (
+#' list(list(1L, 1, "1", 0.1), list(1L, 2, "1", 0.2), list(3L, 3, "3", 
0.3)),
+#'   c("a", "b", "c", "d"))
+#'
+#' schema <-  structType(structField("a", "integer"), structField("c", 
"string"),
+#'   structField("avg", "double"))
+#' df1 <- gapply(
+#'   df,
+#'   list("a", "c"),
+#'   function(key, x) {
+#' y <- data.frame(key, mean(x$b), stringsAsFactors = FALSE)
+#'   },
+#' schema)
+#' collect(df1)
+#'
+#' Result
+#' --
+#' a c avg
+#' 3 3 3.0
+#' 1 1 1.5
+#' }
+setMethod("gapply",
+  signature(x = "GroupedData"),
+  function(x, func, schema) {
+packageNamesArr <- serialize(.sparkREnv[[".packages"]],
+ connection = NULL)
+broadcastArr <- lapply(ls(.broadcastNames),
+  function(name) { get(name, .broadcastNames) 
})
+sdf <- callJMethod(x@sgd, "flatMapGroupsInR",
+ serialize(cleanClosure(func), connection = NULL),
+ packageNamesArr,
+ broadcastArr,
+ if (is.null(schema)) { schema } else { schema$jobj })
--- End diff --

Thanks, I set an assertion. We cannot do it exactly like dapply by forcing a 
schema, because gapply for GroupedData is slightly different from 
DataFrame's gapply.





[GitHub] spark issue #13604: [SPARK-15898][SQL] DataFrameReader.text should return Da...

2016-06-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13604
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60374/
Test PASSed.





[GitHub] spark issue #13604: [SPARK-15898][SQL] DataFrameReader.text should return Da...

2016-06-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13604
  
Merged build finished. Test PASSed.





[GitHub] spark issue #13604: [SPARK-15898][SQL] DataFrameReader.text should return Da...

2016-06-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13604
  
**[Test build #60374 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60374/consoleFull)**
 for PR 13604 at commit 
[`50538b7`](https://github.com/apache/spark/commit/50538b7c0952f6954a19402423e12349037b130c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class CollectionAccumulator[T] extends AccumulatorV2[T, 
java.util.List[T]] `
  * `class LibSVMFileFormat extends TextBasedFileFormat with 
DataSourceRegister `
  * `public static class Prefix `
  * `abstract class SparkStrategy extends GenericStrategy[SparkPlan] `
  * `class CSVFileFormat extends TextBasedFileFormat with 
DataSourceRegister `
  * `case class RefreshResource(path: String)`
  * `abstract class TextBasedFileFormat extends FileFormat `
  * `class JsonFileFormat extends TextBasedFileFormat with 
DataSourceRegister `
  * `class TextFileFormat extends TextBasedFileFormat with 
DataSourceRegister `





[GitHub] spark issue #13623: [SPARK-15895][SQL] Filters out metadata files while doin...

2016-06-12 Thread ssimeonov
Github user ssimeonov commented on the issue:

https://github.com/apache/spark/pull/13623
  
👍 





[GitHub] spark issue #13628: [SPARK-15907] [SQL] Issue Exceptions when Not Enough Inp...

2016-06-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13628
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60373/
Test PASSed.





[GitHub] spark issue #13628: [SPARK-15907] [SQL] Issue Exceptions when Not Enough Inp...

2016-06-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13628
  
Merged build finished. Test PASSed.





[GitHub] spark issue #13628: [SPARK-15907] [SQL] Issue Exceptions when Not Enough Inp...

2016-06-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13628
  
**[Test build #60373 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60373/consoleFull)**
 for PR 13628 at commit 
[`4c49112`](https://github.com/apache/spark/commit/4c4911226136bd797ed17955e795615f9c145de8).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #13496: [SPARK-15753][SQL] Move Analyzer stuff to Analyze...

2016-06-12 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/13496#discussion_r66731642
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -452,6 +452,17 @@ class Analyzer(
 
 def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
   case i @ InsertIntoTable(u: UnresolvedRelation, parts, child, _, _) 
if child.resolved =>
+// A partitioned relation's schema can be different from the input 
logicalPlan, since
+// partition columns are all moved after data columns. We Project 
to adjust the ordering.
+val input = if (parts.nonEmpty) {
+  val (inputPartCols, inputDataCols) = child.output.partition { 
attr =>
+parts.contains(attr.name)
+  }
+  Project(inputDataCols ++ inputPartCols, child)
+} else {
+  child
+}
--- End diff --
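The rewrite quoted above can be sketched with plain collections (hypothetical column names standing in for `child.output` attributes): `partition` splits the input columns into partition and data columns, and the `Project` list puts data columns first.

```scala
// Input attributes may arrive with data and partition columns interleaved;
// the partitioned relation expects data columns first, partition columns last.
val childOutput = Seq("c1", "p1", "c2", "p2") // hypothetical attribute names
val parts = Set("p1", "p2")                   // partition column names

val (inputPartCols, inputDataCols) = childOutput.partition(c => parts.contains(c))
val projectList = inputDataCols ++ inputPartCols

assert(projectList == Seq("c1", "c2", "p1", "p2"))
```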

@cloud-fan ok. I will do it.





[GitHub] spark pull request #13542: [SPARK-15730][SQL][WIP] Respect the --hiveconf in...

2016-06-12 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/13542#discussion_r66731077
  
--- Diff: 
sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/CliSuite.scala
 ---
@@ -91,6 +91,8 @@ class CliSuite extends SparkFunSuite with 
BeforeAndAfterAll with Logging {
  |  --hiveconf ${ConfVars.METASTORECONNECTURLKEY}=$jdbcUrl
  |  --hiveconf ${ConfVars.METASTOREWAREHOUSE}=$warehousePath
  |  --hiveconf ${ConfVars.SCRATCHDIR}=$scratchDirPath
+ |  --hiveconf conf1=conftest
+ |  --hiveconf conf2=1
--- End diff --

Yes, it works; that's the intention, right?

But it seems the code below in `SparkSQLCliDriver` will not work as we 
expect.
```scala
  if (key != "javax.jdo.option.ConnectionURL") {
conf.set(key, value)
sessionState.getOverriddenConfigurations.put(key, value)
  }
```

Why do we have to ignore the connection URL?





[GitHub] spark issue #13629: [SPARK-15370][SQL] Fix count bug

2016-06-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13629
  
**[Test build #60375 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60375/consoleFull)**
 for PR 13629 at commit 
[`30dd0bd`](https://github.com/apache/spark/commit/30dd0bd7d560151085e53667fcc4f6a8895844ed).





[GitHub] spark pull request #13629: [SPARK-15370][SQL] Fix count bug

2016-06-12 Thread hvanhovell
GitHub user hvanhovell opened a pull request:

https://github.com/apache/spark/pull/13629

[SPARK-15370][SQL] Fix count bug

## What changes were proposed in this pull request?
This pull request fixes the COUNT bug in the 
`RewriteCorrelatedScalarSubquery` rule.

After this change, the rule tests the expression at the root of the 
correlated subquery to determine whether the expression returns `NULL` on empty 
input. If the expression does not return `NULL`, the rule generates additional 
logic in the `Project` operator above the rewritten subquery. This additional 
logic intercepts `NULL` values coming from the outer join and replaces them 
with the value that the subquery's expression would return on empty input.
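The COUNT bug being fixed can be illustrated with plain collections (hypothetical data, with `Option` modeling SQL NULL), without a running Spark cluster:

```scala
// For each t1 key, count matching t2 rows; an empty match must yield 0, never NULL.
val t1 = Seq(1, 2, 3)
val t2 = Seq(1, 1) // keys 2 and 3 have no matches

// Correct scalar-subquery semantics: COUNT over empty input is 0.
val correct = t1.map(k => k -> t2.count(_ == k))
assert(correct == Seq(1 -> 2, 2 -> 0, 3 -> 0))

// A naive outer-join rewrite models missing groups as None (SQL NULL).
val grouped = t2.groupBy(identity).map { case (k, v) => k -> v.size }
val naive = t1.map(k => k -> grouped.get(k))
assert(naive == Seq(1 -> Some(2), 2 -> None, 3 -> None))

// The fixed rule adds a Project above the join that maps those NULLs back to
// the expression's value on empty input (for COUNT, 0).
val fixed = naive.map { case (k, v) => k -> v.getOrElse(0) }
assert(fixed == correct)
```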

This PR takes over https://github.com/apache/spark/pull/13155, and only 
fixes an issue with `Literal` construction and some style. All credit 
should go to @frreiss.

## How was this patch tested?
Added regression tests to cover all branches of the updated rule (see 
changes to `SubquerySuite`).
Ran all existing automated regression tests after merging with latest trunk.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hvanhovell/spark SPARK-15370-cleanup

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13629.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13629


commit 3b1649105869c72ccb16f86732e04829aaae0e93
Author: frreiss 
Date:   2016-05-16T17:58:00Z

Commit before merge.

commit 58df60d5468e53c4b6fc41a1d7c896abfb01cdd1
Author: frreiss 
Date:   2016-05-16T17:58:21Z

Merge branch 'master' of https://github.com/apache/spark

commit 910cbf54e2300a57640e017610c204da2d462964
Author: frreiss 
Date:   2016-05-16T20:46:55Z

Merge branch 'master' of https://github.com/apache/spark

commit 76d9f4528b8536d1e5680279ab76b9e26dd3a873
Author: frreiss 
Date:   2016-05-17T14:52:46Z

Merge branch 'master' of https://github.com/apache/spark

commit 1615d560310a59b08a4c03677dd53eb3b9b49e06
Author: frreiss 
Date:   2016-05-20T02:01:33Z

Second version of the updated rewrite

commit 1b4ba5ed629d9b1e72d919d89b3592f7b29f3f3c
Author: frreiss 
Date:   2016-05-20T14:57:24Z

Merge branch 'master' of https://github.com/apache/spark

commit fb7cb4304ba02815a79278d1d5d6d194fe8db25c
Author: frreiss 
Date:   2016-05-24T18:11:54Z

Merge branch 'master' of https://github.com/apache/spark

commit 8cd2877179dded4557c8da92e5b16011637289b0
Author: frreiss 
Date:   2016-06-10T05:02:47Z

Addressing additional corner cases and review comments.

commit e5c592032b5604a8f8f10326ecd10ade22b5dc43
Author: Herman van Hovell 
Date:   2016-06-12T23:43:30Z

Style fixes

commit 39f7e043c0abbe27823499699877e986f6fa2eb7
Author: Herman van Hovell 
Date:   2016-06-12T23:43:32Z

Merge remote-tracking branch 'apache-github/master' into SPARK-15370-cleanup

commit 30dd0bd7d560151085e53667fcc4f6a8895844ed
Author: Herman van Hovell 
Date:   2016-06-12T23:57:18Z

Some simplification







[GitHub] spark issue #12313: [SPARK-14543] [SQL] Improve InsertIntoTable column resol...

2016-06-12 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/12313
  
If this is too large to merge into 2.0, could @rdblue deliver a small fix 
for catching illegal user inputs? Thanks!





[GitHub] spark issue #13628: [SPARK-15907] [SQL] Issue Exceptions when Not Enough Inp...

2016-06-12 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/13628
  
There is a better fix in https://github.com/apache/spark/pull/12313. Let me 
close this one.





[GitHub] spark pull request #13628: [SPARK-15907] [SQL] Issue Exceptions when Not Eno...

2016-06-12 Thread gatorsmile
Github user gatorsmile closed the pull request at:

https://github.com/apache/spark/pull/13628





[GitHub] spark pull request #13496: [SPARK-15753][SQL] Move Analyzer stuff to Analyze...

2016-06-12 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13496#discussion_r66730126
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -452,6 +452,17 @@ class Analyzer(
 
 def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
   case i @ InsertIntoTable(u: UnresolvedRelation, parts, child, _, _) 
if child.resolved =>
+// A partitioned relation's schema can be different from the input 
logicalPlan, since
+// partition columns are all moved after data columns. We Project 
to adjust the ordering.
+val input = if (parts.nonEmpty) {
+  val (inputPartCols, inputDataCols) = child.output.partition { 
attr =>
+parts.contains(attr.name)
+  }
+  Project(inputDataCols ++ inputPartCols, child)
+} else {
+  child
+}
--- End diff --

@gatorsmile  good catch! The reason we have `insertInto` is to have a SQL 
INSERT INTO version in `DataFrameWriter`. We should use `saveAsTable` if we 
need by-name resolution.

I have reverted this PR. @viirya, do you mind opening a new PR to also 
remove this logic in `insertInto`, to make it consistent with the SQL 
version? Thanks!





[GitHub] spark issue #13626: [SPARK-15370][SQL] Revert PR "Update RewriteCorrelatedSu...

2016-06-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13626
  
Merged build finished. Test PASSed.





[GitHub] spark issue #13626: [SPARK-15370][SQL] Revert PR "Update RewriteCorrelatedSu...

2016-06-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13626
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60372/
Test PASSed.





[GitHub] spark issue #13626: [SPARK-15370][SQL] Revert PR "Update RewriteCorrelatedSu...

2016-06-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13626
  
**[Test build #60372 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60372/consoleFull)**
 for PR 13626 at commit 
[`ebef12a`](https://github.com/apache/spark/commit/ebef12ad77084ff40db8601cd269f67778de293a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #13604: [SPARK-15898][SQL] DataFrameReader.text should return Da...

2016-06-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13604
  
**[Test build #60374 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60374/consoleFull)**
 for PR 13604 at commit 
[`50538b7`](https://github.com/apache/spark/commit/50538b7c0952f6954a19402423e12349037b130c).





[GitHub] spark issue #13628: [SPARK-15907] [SQL] Issue Exceptions when Not Enough Inp...

2016-06-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13628
  
**[Test build #60373 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60373/consoleFull)**
 for PR 13628 at commit 
[`4c49112`](https://github.com/apache/spark/commit/4c4911226136bd797ed17955e795615f9c145de8).





[GitHub] spark issue #13614: Support Stata-like tabulation of values in a single colu...

2016-06-12 Thread shafiquejamal
Github user shafiquejamal commented on the issue:

https://github.com/apache/spark/pull/13614
  
Okay, I'll close this and open a new PR with the title formatted correctly. Thanks.





[GitHub] spark pull request #13628: [SPARK-15907] [SQL] Issue Exceptions when Not Eno...

2016-06-12 Thread gatorsmile
GitHub user gatorsmile opened a pull request:

https://github.com/apache/spark/pull/13628

[SPARK-15907] [SQL] Issue Exceptions when Not Enough Input Columns for 
Dynamic Partitioning

## What changes were proposed in this pull request?
```SQL
CREATE TABLE table_with_partition(c1 string)
PARTITIONED by (p1 string,p2 string)

INSERT OVERWRITE TABLE table_with_partition
partition (p1='a',p2) IF NOT EXISTS
SELECT 'blarr3'
```

In the above example, we do not have enough input columns for dynamic 
partitioning. The first input column is already taken as a data column. This PR 
issues an exception in this scenario.
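The column accounting behind the new check can be sketched in plain Scala (hypothetical names; `None` marks a dynamic partition column): with `PARTITION (p1='a', p2)`, only `p2` is dynamic, so the SELECT must supply the table's data columns plus one extra column for `p2`.

```scala
// Table: data column c1; partition spec: p1 static ('a'), p2 dynamic.
val dataCols = Seq("c1")
val partitionSpec = Map("p1" -> Some("a"), "p2" -> None) // None => dynamic
val dynamicParts = partitionSpec.count(_._2.isEmpty)

val expectedInputCols = dataCols.size + dynamicParts // c1 plus p2 => 2
val providedCols = 1                                 // SELECT 'blarr3'

assert(expectedInputCols == 2)
assert(providedCols < expectedInputCols) // the scenario that should now raise
```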

## How was this patch tested?
Added a test case and fixed an existing test case

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gatorsmile/spark dynamicPartitioningException

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13628.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13628


commit 4c4911226136bd797ed17955e795615f9c145de8
Author: gatorsmile 
Date:   2016-06-12T23:35:07Z

bug fix.







[GitHub] spark issue #13596: [SPARK-15870][SQL] DataFrame can't execute after uncache...

2016-06-12 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/13596
  
thanks, merging to master and 2.0!





[GitHub] spark pull request #13596: [SPARK-15870][SQL] DataFrame can't execute after ...

2016-06-12 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/13596





[GitHub] spark pull request #13496: [SPARK-15753][SQL] Move Analyzer stuff to Analyze...

2016-06-12 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/13496#discussion_r66729871
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -452,6 +452,17 @@ class Analyzer(
 
 def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
   case i @ InsertIntoTable(u: UnresolvedRelation, parts, child, _, _) if child.resolved =>
+// A partitioned relation's schema can be different from the input logicalPlan, since
+// partition columns are all moved after data columns. We Project to adjust the ordering.
+val input = if (parts.nonEmpty) {
+  val (inputPartCols, inputDataCols) = child.output.partition { attr =>
+parts.contains(attr.name)
+  }
+  Project(inputDataCols ++ inputPartCols, child)
+} else {
+  child
+}
--- End diff --

If we use name-based resolution, we also need to check that all the input
columns have the expected partitioning names; otherwise, the result will not
be predictable.
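The reordering in the quoted diff can be modeled without Catalyst; a minimal sketch, using plain strings in place of attributes and assuming name-based matching as in the diff:

```python
# Output columns whose names appear in `parts` are moved after the data
# columns, mirroring Project(inputDataCols ++ inputPartCols, child).
def reorder_for_insert(child_output, parts):
    if not parts:
        return list(child_output)
    part_cols = [c for c in child_output if c in parts]
    data_cols = [c for c in child_output if c not in parts]
    return data_cols + part_cols

# A query producing (p2, c1, p1) against a table partitioned by (p1, p2)
# becomes (c1, p2, p1).
print(reorder_for_insert(["p2", "c1", "p1"], {"p1", "p2"}))  # ['c1', 'p2', 'p1']
```

Note that the partition columns keep their *input* order rather than the table's declared order, which illustrates the name-vs-position ambiguity debated in this thread.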





[GitHub] spark pull request #13413: [SPARK-15663][SQL] SparkSession.catalog.listFunct...

2016-06-12 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/13413#discussion_r66729858
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala ---
@@ -89,6 +89,10 @@ class SimpleFunctionRegistry extends FunctionRegistry {
 functionBuilders.iterator.map(_._1).toList.sorted
   }
 
+  private[catalyst] def functionSet(): Set[String] = synchronized {
+functionBuilders.iterator.map(_._1).toSet
--- End diff --

Seems we are still creating the set every time we call
`FunctionRegistry.builtin.functionSet`. Can you create this set once in the
`FunctionRegistry` object?
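The suggestion, computing the set once when the singleton is created instead of on every call, can be sketched as follows (stand-in names; the real `SimpleFunctionRegistry` is Scala and stores `FunctionBuilder` values):

```python
import threading

class SimpleRegistry:
    def __init__(self):
        self._lock = threading.Lock()
        self._builders = {}

    def register(self, name, builder):
        with self._lock:  # mirrors Scala's `synchronized`
            self._builders[name] = builder

    def function_set(self):
        with self._lock:  # rebuilt on every call -- the reviewer's complaint
            return frozenset(self._builders)

class Builtin:
    registry = SimpleRegistry()
    for _n in ("abs", "concat", "substr"):
        registry.register(_n, _n)
    # Computed once, at class-creation time -- the reviewer's suggestion.
    function_set = registry.function_set()

print("abs" in Builtin.function_set)  # True
```

Callers doing membership checks then pay the set-construction cost only once, at singleton initialization, instead of per lookup.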







[GitHub] spark pull request #13496: [SPARK-15753][SQL] Move Analyzer stuff to Analyze...

2016-06-12 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/13496#discussion_r66729833
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -452,6 +452,17 @@ class Analyzer(
 
 def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
   case i @ InsertIntoTable(u: UnresolvedRelation, parts, child, _, _) if child.resolved =>
+// A partitioned relation's schema can be different from the input logicalPlan, since
+// partition columns are all moved after data columns. We Project to adjust the ordering.
+val input = if (parts.nonEmpty) {
+  val (inputPartCols, inputDataCols) = child.output.partition { attr =>
+parts.contains(attr.name)
+  }
+  Project(inputDataCols ++ inputPartCols, child)
+} else {
+  child
+}
--- End diff --

Looks like Hive uses ordering, not names, to pick up dynamic partition columns.
I am not sure we want to completely follow this Hive behavior;
DataFrameWriter's insertInto doesn't follow it, and the rule in Analyzer
doesn't completely follow it either.

@liancheng @rxin @cloud-fan What do you think? Do you think we should 
change current behavior to follow Hive? 






[GitHub] spark issue #13627: [SPARK-15906][MLlib][WIP] Add complementary naive bayes ...

2016-06-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13627
  
Can one of the admins verify this patch?





[GitHub] spark pull request #13627: [SPARK-15906][MLlib][WIP] Add complementary naive...

2016-06-12 Thread tilumi
GitHub user tilumi opened a pull request:

https://github.com/apache/spark/pull/13627

[SPARK-15906][MLlib][WIP] Add complementary naive bayes algorithm

## What changes were proposed in this pull request?

Add `ComplementaryNaiveBayes.scala` in package 
`org.apache.spark.mllib.classification` in MLlib module


## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration 
tests, manual tests)


(If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)
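For context, the weight estimate that distinguishes complementary naive Bayes from standard NB (Rennie et al., 2003) can be sketched as follows; this is the textbook formulation, not necessarily what the patch implements:

```python
# For class c, smoothed weights are estimated from the term counts of every
# *other* class:
#   theta_ci = (alpha + sum_{d != c} count_di) / (alpha*V + sum_{d != c} total_d)
import math

def complement_weights(counts_by_class, alpha=1.0):
    """counts_by_class: {class: [term counts]}; returns {class: [log weights]}."""
    vocab = len(next(iter(counts_by_class.values())))
    weights = {}
    for c in counts_by_class:
        comp = [v for d, v in counts_by_class.items() if d != c]
        term_totals = [sum(v[i] for v in comp) for i in range(vocab)]
        denom = alpha * vocab + sum(term_totals)
        weights[c] = [math.log((alpha + t) / denom) for t in term_totals]
    return weights

w = complement_weights({"a": [2.0, 0.0], "b": [0.0, 2.0]})
print(w["a"])  # class "a" is weighted by class "b"'s counts
```

Using complement counts makes the estimate less sensitive to skewed class sizes, which is the usual motivation for adding CNB alongside plain naive Bayes.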



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tilumi/spark add_complementary_navie_bayes

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13627.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13627


commit 4f67cd12364a830fa579e443c716dd09a9f13f8a
Author: Lucas Yang 
Date:   2016-06-11T04:37:55Z

extract data aggregation part in run as a method

commit d9f6191676c5c2253dcc6983e4418bcb67cf02b9
Author: Lucas Yang 
Date:   2016-06-11T04:38:18Z

add complementary naive bayes algorithm

commit 0f02643db606944e2f919ebeaae427efb45515b7
Author: Lucas Yang 
Date:   2016-06-12T23:03:09Z

add Since annotation







[GitHub] spark pull request #13496: [SPARK-15753][SQL] Move Analyzer stuff to Analyze...

2016-06-12 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/13496#discussion_r66729456
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -452,6 +452,17 @@ class Analyzer(
 
 def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
   case i @ InsertIntoTable(u: UnresolvedRelation, parts, child, _, _) if child.resolved =>
+// A partitioned relation's schema can be different from the input logicalPlan, since
+// partition columns are all moved after data columns. We Project to adjust the ordering.
+val input = if (parts.nonEmpty) {
+  val (inputPartCols, inputDataCols) = child.output.partition { attr =>
+parts.contains(attr.name)
+  }
+  Project(inputDataCols ++ inputPartCols, child)
+} else {
+  child
+}
--- End diff --

If there are multiple dynamic partitioning columns, name-based reordering
becomes risky. Some partitioning columns might not have names/aliases, right?





[GitHub] spark pull request #13496: [SPARK-15753][SQL] Move Analyzer stuff to Analyze...

2016-06-12 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/13496#discussion_r66729419
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -452,6 +452,17 @@ class Analyzer(
 
 def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
   case i @ InsertIntoTable(u: UnresolvedRelation, parts, child, _, _) if child.resolved =>
+// A partitioned relation's schema can be different from the input logicalPlan, since
+// partition columns are all moved after data columns. We Project to adjust the ordering.
+val input = if (parts.nonEmpty) {
+  val (inputPartCols, inputDataCols) = child.output.partition { attr =>
+parts.contains(attr.name)
+  }
+  Project(inputDataCols ++ inputPartCols, child)
+} else {
+  child
+}
--- End diff --

The names/aliases of input columns are not used to determine whether they are
partitioning columns or data columns.





[GitHub] spark pull request #13496: [SPARK-15753][SQL] Move Analyzer stuff to Analyze...

2016-06-12 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/13496#discussion_r66729375
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -452,6 +452,17 @@ class Analyzer(
 
 def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
   case i @ InsertIntoTable(u: UnresolvedRelation, parts, child, _, _) if child.resolved =>
+// A partitioned relation's schema can be different from the input logicalPlan, since
+// partition columns are all moved after data columns. We Project to adjust the ordering.
+val input = if (parts.nonEmpty) {
+  val (inputPartCols, inputDataCols) = child.output.partition { attr =>
+parts.contains(attr.name)
+  }
+  Project(inputDataCols ++ inputPartCols, child)
+} else {
+  child
+}
--- End diff --

@gatorsmile Your example confuses me. According to the spec you cited, the
dynamic partition columns should come last, but you put them first?





[GitHub] spark issue #13570: [SPARK-15832][SQL] Embedded IN/EXISTS predicate subquery...

2016-06-12 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/13570
  
@hvanhovell @jkbradley  Could you add @ioana-delaney to the whitelist? 
Thanks!




