date:20170202

[GitHub] spark issue #16785: [SPARK-19443][SQL] The function to generate constraints ...

2017-02-02 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16785
  
**[Test build #72303 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72303/testReport)** for PR 16785 at commit [`b4e514a`](https://github.com/apache/spark/commit/b4e514ade7ea478055db448bbf66f7a88caf3a86).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16775: [SPARK-19433][ML] Periodic checkout datasets for long ml...

2017-02-02 Thread viirya

Github user viirya commented on the issue:

https://github.com/apache/spark/pull/16775
  
For the issue reported on the mailing list, I found the root cause of the 
significant performance difference between 1.6 and the current branch. The fix 
is in #16785.

However, I think this patch is still useful, so I will keep it open for a while 
for reviewers.



[GitHub] spark pull request #16785: [SPARK-19443][SQL] The function to generate const...

2017-02-02 Thread viirya

GitHub user viirya opened a pull request:

https://github.com/apache/spark/pull/16785

[SPARK-19443][SQL] The function to generate constraints takes too long when 
the query plan grows continuously

## What changes were proposed in this pull request?

This issue was originally reported and discussed at 
http://apache-spark-developers-list.1001551.n3.nabble.com/SQL-ML-Pipeline-performance-regression-between-1-6-and-2-x-tc20803.html#a20821

When running an ML `Pipeline` with many stages, the `Dataset` is updated 
iteratively and the query plan grows continuously. As the plan grows, fit and 
transform take longer and longer to finish.

The example code is shown in the Benchmark section.

Specifically, the time spent preparing the optimized plan in the current branch 
(74294 ms) is much higher than in 1.6 (292 ms). Most of that time is spent 
generating the query plan's constraints during a few optimization rules.

`getAliasedConstraints` was found to be the function that accounts for most of 
the running time.

This patch rewrites `getAliasedConstraints`. After this patch, the time to 
prepare the optimized plan drops significantly, from 74294 ms to 2573 ms.
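
To give a feel for why this cost climbs as the plan grows, here is a simplified plain-Scala model. It is not Spark's code: the `Constraint` types, `substitute` and `addAlias` are hypothetical stand-ins, and the model assumes that each new alias re-derives constraints by substituting the alias into every existing constraint and then adds one equality between the source column and the alias.

```scala
// Toy model only: these types and helpers are illustrative, not Spark classes.
sealed trait Constraint
final case class IsNotNull(col: String) extends Constraint
final case class EqNullSafe(left: String, right: String) extends Constraint

object ConstraintGrowth {
  /** Rewrites every reference to `from` inside a constraint into `to`. */
  def substitute(c: Constraint, from: String, to: String): Constraint = c match {
    case IsNotNull(x) => IsNotNull(if (x == from) to else x)
    case EqNullSafe(l, r) =>
      EqNullSafe(if (l == from) to else l, if (r == from) to else r)
  }

  /** Adds the constraints implied by `source AS alias` to an existing set. */
  def addAlias(cs: Set[Constraint], source: String, alias: String): Set[Constraint] =
    cs ++ cs.map(substitute(_, source, alias)) + EqNullSafe(source, alias)

  /** Constraint-set sizes after aliasing "x0" as x1, x2, ..., x<n>. */
  def sizes(n: Int): Seq[Int] = {
    val start: Set[Constraint] = Set(IsNotNull("x0"))
    (1 to n).scanLeft(start)((cs, i) => addAlias(cs, "x0", s"x$i")).map(_.size)
  }

  def main(args: Array[String]): Unit =
    println(sizes(5).mkString(", ")) // prints 1, 3, 6, 10, 15, 21
}
```

In this model the set grows quadratically with the number of aliases (861 constraints after 40 aliases of the same column), so a plan that recomputes such sets in many nodes pays that cost repeatedly.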

### Benchmark

Run the following code locally.

import org.apache.spark.ml.{Pipeline, PipelineStage}
import org.apache.spark.ml.feature.{OneHotEncoder, StringIndexer, VectorAssembler}

val df = (1 to 40).foldLeft(Seq((1, "foo"), (2, "bar"), (3, "baz")).toDF("id", "x0"))((df, i) => df.withColumn(s"x$i", $"x0"))

val indexers = df.columns.tail.map(c => new StringIndexer()
  .setInputCol(c)
  .setOutputCol(s"${c}_indexed")
  .setHandleInvalid("skip"))

val encoders = indexers.map(indexer => new OneHotEncoder()
  .setInputCol(indexer.getOutputCol)
  .setOutputCol(s"${indexer.getOutputCol}_encoded")
  .setDropLast(true))

val stages: Array[PipelineStage] = indexers ++ encoders
val pipeline = new Pipeline().setStages(stages)

pipeline.fit(df).transform(df).show
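
Timings like the ones quoted in this description can be approximated with a small helper such as the one below. This is a sketch, not necessarily how the numbers above were taken: `Timing.timeMs` is a hypothetical helper, and the Spark usage is left as a comment because it needs a running session. `Dataset.queryExecution.optimizedPlan` is the lazily computed optimized logical plan.

```scala
object Timing {
  /** Evaluates `body` once and returns its result with the elapsed wall-clock milliseconds. */
  def timeMs[A](body: => A): (A, Long) = {
    val start = System.nanoTime()
    val result = body
    (result, (System.nanoTime() - start) / 1000000L)
  }

  def main(args: Array[String]): Unit = {
    val (sum, elapsed) = timeMs((1L to 1000000L).sum)
    println(s"sum=$sum took $elapsed ms")
    // In spark-shell, after running the benchmark (assumed usage):
    //   val transformed = pipeline.fit(df).transform(df)
    //   val (_, ms) = Timing.timeMs(transformed.queryExecution.optimizedPlan)
  }
}
```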


## How was this patch tested?

Jenkins tests.

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/viirya/spark-1 improve-constraints-generation

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16785.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16785


commit b4e514ade7ea478055db448bbf66f7a88caf3a86
Author: Liang-Chi Hsieh 
Date:   2017-02-03T07:08:47Z

Improve the code to generate constraints.





[GitHub] spark pull request #16766: [SPARK-19426][SQL] Custom coalesce for Dataset

2017-02-02 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16766#discussion_r99286995
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala ---
@@ -117,6 +134,34 @@ class DatasetSuite extends QueryTest with SharedSQLContext {
       data: _*)
   }
 
+  test("coalesce, custom") {
+
+    val maxSplitSize = 512
+    // Similar to the implementation of `test("custom RDD coalescer")` from [[RDDSuite]] we first
+    // write out to disk, to ensure that our splits are in fact [[FileSplit]] instances.
+    val data = (1 to 1000).map(i => ClassData(i.toString, i))
+    data.toDS().repartition(10).write.format("csv").save(path.toString)
+
+    val ds = spark.read.format("csv").load(path.toString).as[ClassData]
+    val coalescedDataSet =
+      ds.coalesce(2, partitionCoalescer = Option(new SizeBasedCoalescer(maxSplitSize)))
+
+    assert(coalescedDataSet.rdd.partitions.length <= 10)
+
+    var totalPartitionCount = 0L
+    coalescedDataSet.rdd.partitions.foreach(partition => {
+      var splitSizeSum = 0L
+      partition.asInstanceOf[CoalescedRDDPartition].parents.foreach(partition => {
+        val split = partition.asInstanceOf[HadoopPartition].inputSplit.value.asInstanceOf[FileSplit]
+        splitSizeSum += split.getLength
+        totalPartitionCount += 1
+      })
+      assert(splitSizeSum <= maxSplitSize)
+    })
+    assert(totalPartitionCount == 10)
+
--- End diff --

Nit: Remove this empty line.
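
For context on what the quoted test asserts: a size-based coalescer groups parent partitions so that the total split size in each group stays within a limit. A minimal plain-Scala sketch of that idea follows. `SizeBasedGrouping.group` is a hypothetical helper that works on split sizes only; it is not the `SizeBasedCoalescer` used by the test, and the greedy in-order packing is an assumption.

```scala
object SizeBasedGrouping {
  /**
   * Greedily packs split sizes, in order, into groups whose total stays within `maxSize`.
   * A single split larger than `maxSize` still gets a group of its own.
   */
  def group(sizes: Seq[Long], maxSize: Long): Vector[Vector[Long]] =
    sizes.foldLeft(Vector.empty[Vector[Long]]) { (groups, size) =>
      groups.lastOption match {
        // The current group still has room: append to it.
        case Some(last) if last.sum + size <= maxSize => groups.init :+ (last :+ size)
        // Otherwise start a new group with this split.
        case _ => groups :+ Vector(size)
      }
    }

  def main(args: Array[String]): Unit =
    println(group(Seq(200L, 200L, 200L, 100L, 500L, 50L), 512L))
}
```

With a limit of 512, the six sizes above end up in four groups, and every input split appears in exactly one group, which mirrors the two properties the test checks.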



[GitHub] spark pull request #16766: [SPARK-19426][SQL] Custom coalesce for Dataset

2017-02-02 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16766#discussion_r99286957
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala ---
@@ -17,24 +17,41 @@
 
 package org.apache.spark.sql
 
-import java.io.{Externalizable, ObjectInput, ObjectOutput}
+import java.io.{Externalizable, File, ObjectInput, ObjectOutput}
 import java.sql.{Date, Timestamp}
 
+import org.apache.hadoop.mapred.FileSplit
+import org.scalatest.BeforeAndAfter
+
+import org.apache.spark.rdd.{CoalescedRDDPartition, HadoopPartition, SizeBasedCoalescer}
 import org.apache.spark.sql.catalyst.encoders.{OuterScopes, RowEncoder}
 import org.apache.spark.sql.catalyst.util.sideBySide
-import org.apache.spark.sql.execution.{LogicalRDD, RDDScanExec, SortExec}
+import org.apache.spark.sql.execution.{LogicalRDD, RDDScanExec}
 import org.apache.spark.sql.execution.exchange.{BroadcastExchangeExec, ShuffleExchange}
 import org.apache.spark.sql.execution.streaming.MemoryStream
 import org.apache.spark.sql.functions._
 import org.apache.spark.sql.test.SharedSQLContext
 import org.apache.spark.sql.types._
+import org.apache.spark.util.Utils
 
 case class TestDataPoint(x: Int, y: Double, s: String, t: TestDataPoint2)
 case class TestDataPoint2(x: Int, s: String)
 
-class DatasetSuite extends QueryTest with SharedSQLContext {
+class DatasetSuite extends QueryTest with SharedSQLContext with BeforeAndAfter {
   import testImplicits._
 
+  private var path: File = null
+
+  override def beforeAll(): Unit = {
+    super.beforeAll()
+    path = Utils.createTempDir()
+    path.delete()
+  }
+
+  after {
+    Utils.deleteRecursively(path)
+  }
--- End diff --

There is no need to do this if you use `withTempPath`. 
[This](https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala#L247-L265)
 is an example.
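
The loan pattern behind a helper like `withTempPath` can be sketched in plain Scala as below. This is an illustrative reimplementation, not Spark's test utility: `TempPathSketch` and its methods are hypothetical names, and only `java.io` and `java.nio` calls are used. It assumes the helper should hand the body a path that does not exist yet and remove whatever was written there afterwards.

```scala
import java.io.File
import java.nio.file.Files

object TempPathSketch {
  /** Deletes `file` and, if it is a directory, everything beneath it. */
  def deleteRecursively(file: File): Unit = {
    if (file.isDirectory) {
      Option(file.listFiles()).getOrElse(Array.empty[File]).foreach(deleteRecursively)
    }
    file.delete()
  }

  /** Hands `f` a fresh path that does not exist yet and cleans it up afterwards. */
  def withTempPath[A](f: File => A): A = {
    val path = Files.createTempDirectory("sketch").toFile
    path.delete() // writers that refuse to overwrite need a non-existent target
    try f(path) finally deleteRecursively(path)
  }
}
```

A test body would then receive the path as a parameter, for example `TempPathSketch.withTempPath { path => ... }`, so no suite-level `var`, `beforeAll` or `after` block is needed.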



[GitHub] spark pull request #16766: [SPARK-19426][SQL] Custom coalesce for Dataset

2017-02-02 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16766#discussion_r99286805
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala ---
@@ -117,6 +134,34 @@ class DatasetSuite extends QueryTest with SharedSQLContext {
       data: _*)
   }
 
+  test("coalesce, custom") {
+
+    val maxSplitSize = 512
+    // Similar to the implementation of `test("custom RDD coalescer")` from [[RDDSuite]] we first
+    // write out to disk, to ensure that our splits are in fact [[FileSplit]] instances.
+    val data = (1 to 1000).map(i => ClassData(i.toString, i))
+    data.toDS().repartition(10).write.format("csv").save(path.toString)
--- End diff --

Use `withTempPath` to generate the path?



[GitHub] spark pull request #16779: [SPARK-19437] Rectify spark executor id in Heartb...

2017-02-02 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/16779



[GitHub] spark pull request #16766: [SPARK-19426][SQL] Custom coalesce for Dataset

2017-02-02 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16766#discussion_r99286475
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala ---
@@ -117,6 +134,34 @@ class DatasetSuite extends QueryTest with SharedSQLContext {
       data: _*)
   }
 
+  test("coalesce, custom") {
+
+    val maxSplitSize = 512
+    // Similar to the implementation of `test("custom RDD coalescer")` from [[RDDSuite]] we first
+    // write out to disk, to ensure that our splits are in fact [[FileSplit]] instances.
+    val data = (1 to 1000).map(i => ClassData(i.toString, i))
+    data.toDS().repartition(10).write.format("csv").save(path.toString)
+
+    val ds = spark.read.format("csv").load(path.toString).as[ClassData]
--- End diff --

```
cannot resolve '`a`' given input columns: [_c0, _c1];
```



[GitHub] spark issue #16779: [SPARK-19437] Rectify spark executor id in HeartbeatRece...

2017-02-02 Thread zsxwing

Github user zsxwing commented on the issue:

https://github.com/apache/spark/pull/16779
  
Thanks! Merging to master.



[GitHub] spark pull request #16766: [SPARK-19426][SQL] Custom coalesce for Dataset

2017-02-02 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16766#discussion_r99286218
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala ---
@@ -497,7 +496,9 @@ case class UnionExec(children: Seq[SparkPlan]) extends SparkPlan {
  * if you go from 1000 partitions to 100 partitions, there will not be a shuffle, instead each of
  * the 100 new partitions will claim 10 of the current partitions.
  */
-case class CoalesceExec(numPartitions: Int, child: SparkPlan) extends UnaryExecNode {
+case class CoalesceExec(numPartitions: Int, child: SparkPlan,
+partitionCoalescer: Option[PartitionCoalescer]
+   ) extends UnaryExecNode {
--- End diff --

```
case class CoalesceExec(
    numPartitions: Int,
    child: SparkPlan,
    partitionCoalescer: Option[PartitionCoalescer]) extends UnaryExecNode {
```



[GitHub] spark pull request #16766: [SPARK-19426][SQL] Custom coalesce for Dataset

2017-02-02 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16766#discussion_r99286066
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala ---
@@ -19,9 +19,8 @@ package org.apache.spark.sql.execution
 
 import scala.concurrent.{ExecutionContext, Future}
 import scala.concurrent.duration.Duration
-
--- End diff --

Add it back?



[GitHub] spark issue #16784: [SPARK-19382][ML]:Test sparse vectors in LinearSVCSuite

2017-02-02 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16784
  
**[Test build #72302 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72302/testReport)** for PR 16784 at commit [`fc1f7d1`](https://github.com/apache/spark/commit/fc1f7d10134638dfe5130eb19784852207acebd5).



[GitHub] spark pull request #16766: [SPARK-19426][SQL] Custom coalesce for Dataset

2017-02-02 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16766#discussion_r99285902
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---
@@ -823,6 +825,17 @@ case class Repartition(numPartitions: Int, shuffle: Boolean, child: LogicalPlan)
 }
 
 /**
+ * Returns a new RDD that has exactly `numPartitions` partitions.
--- End diff --

This description is not right. 



[GitHub] spark pull request #16766: [SPARK-19426][SQL] Custom coalesce for Dataset

2017-02-02 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16766#discussion_r99285925
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---
@@ -823,6 +825,17 @@ case class Repartition(numPartitions: Int, shuffle: Boolean, child: LogicalPlan)
 }
 
 /**
+ * Returns a new RDD that has exactly `numPartitions` partitions.
+ */
+case class CoalesceLogical(numPartitions: Int, partitionCoalescer: Option[PartitionCoalescer],
+child: LogicalPlan)
+  extends UnaryNode {
--- End diff --

```Scala
case class PartitionCoalesce(
    numPartitions: Int,
    partitionCoalescer: Option[PartitionCoalescer],
    child: LogicalPlan) extends UnaryNode {
```



[GitHub] spark pull request #16766: [SPARK-19426][SQL] Custom coalesce for Dataset

2017-02-02 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16766#discussion_r99285876
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---
@@ -823,6 +825,17 @@ case class Repartition(numPartitions: Int, shuffle: Boolean, child: LogicalPlan)
 }
 
 /**
+ * Returns a new RDD that has exactly `numPartitions` partitions.
+ */
+case class CoalesceLogical(numPartitions: Int, partitionCoalescer: Option[PartitionCoalescer],
--- End diff --

The name still looks inconsistent with the others. How about 
`PartitionCoalesce`?



[GitHub] spark pull request #16784: [SPARK-19382][ML]:Test sparse vectors in LinearSV...

2017-02-02 Thread wangmiao1981

GitHub user wangmiao1981 opened a pull request:

https://github.com/apache/spark/pull/16784

[SPARK-19382][ML]:Test sparse vectors in LinearSVCSuite

## What changes were proposed in this pull request?

Add unit tests that exercise SparseVector.

We can't add a test case that mixes DenseVector and SparseVector, as discussed in 
JIRA SPARK-19382.

 def merge(other: MultivariateOnlineSummarizer): this.type = {
   if (this.totalWeightSum != 0.0 && other.totalWeightSum != 0.0) {
     require(n == other.n, s"Dimensions mismatch when merging with another summarizer. " +
       s"Expecting $n but got ${other.n}.")


## How was this patch tested?

Unit tests

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wangmiao1981/spark bk

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16784.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16784


commit 85336d1ecec425906356c09b5aa347288f7282bc
Author: wm...@hotmail.com 
Date:   2017-01-31T23:10:09Z

unit test backup

commit fc1f7d10134638dfe5130eb19784852207acebd5
Author: wm...@hotmail.com 
Date:   2017-02-03T07:06:55Z

add SparseVector test





[GitHub] spark pull request #16766: [SPARK-19426][SQL] Custom coalesce for Dataset

2017-02-02 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16766#discussion_r99285447
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---
@@ -17,7 +17,9 @@
 
 package org.apache.spark.sql.catalyst.plans.logical
 
+import org.apache.spark.rdd.PartitionCoalescer
 import org.apache.spark.sql.catalyst.{CatalystConf, TableIdentifier}
+import scala.collection.mutable.ArrayBuffer
--- End diff --

Is this import unused?



[GitHub] spark pull request #16766: [SPARK-19426][SQL] Custom coalesce for Dataset

2017-02-02 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16766#discussion_r99284849
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala ---
@@ -497,7 +496,9 @@ case class UnionExec(children: Seq[SparkPlan]) extends SparkPlan {
  * if you go from 1000 partitions to 100 partitions, there will not be a shuffle, instead each of
  * the 100 new partitions will claim 10 of the current partitions.
  */
-case class CoalesceExec(numPartitions: Int, child: SparkPlan) extends UnaryExecNode {
+case class CoalesceExec(numPartitions: Int, child: SparkPlan,
+partitionCoalescer: Option[PartitionCoalescer]
+   ) extends UnaryExecNode {
--- End diff --

The same indent issue here. 



[GitHub] spark pull request #16766: [SPARK-19426][SQL] Custom coalesce for Dataset

2017-02-02 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16766#discussion_r99284809
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -2437,9 +2435,12 @@ class Dataset[T] private[sql](
    * @group typedrel
    * @since 1.6.0
    */
-  def coalesce(numPartitions: Int): Dataset[T] = withTypedPlan {
-    Repartition(numPartitions, shuffle = false, logicalPlan)
-  }
+  def coalesce(numPartitions: Int, partitionCoalescer: Option[PartitionCoalescer]): Dataset[T] =
+    withTypedPlan {
+      CoalesceLogical(numPartitions, partitionCoalescer, logicalPlan)
+    }
+
+  def coalesce(numPartitions: Int): Dataset[T] = coalesce(numPartitions, None)
--- End diff --

Please also add a function description, as we did for the other 
functions in Dataset.scala.



[GitHub] spark issue #16765: [SPARK-19425][SQL] Make ExtractEquiJoinKeys support UDT ...

2017-02-02 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/16765
  
LGTM



[GitHub] spark pull request #16777: [SPARK-19435][SQL] Type coercion between ArrayTyp...

2017-02-02 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16777#discussion_r99283834
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala ---
@@ -101,24 +101,13 @@ object TypeCoercion {
     case _ => None
   }
 
-  /** Similar to [[findTightestCommonType]], but can promote all the way to StringType. */
-  def findTightestCommonTypeToString(left: DataType, right: DataType): Option[DataType] = {
-    findTightestCommonTypeOfTwo(left, right).orElse((left, right) match {
-      case (StringType, t2: AtomicType) if t2 != BinaryType && t2 != BooleanType => Some(StringType)
-      case (t1: AtomicType, StringType) if t1 != BinaryType && t1 != BooleanType => Some(StringType)
-      case _ => None
-    })
-  }
-
   /**
-   * Find the tightest common type of a set of types by continuously applying
-   * `findTightestCommonTypeOfTwo` on these types.
+   * Promotes all the way to StringType.
    */
-  private def findTightestCommonType(types: Seq[DataType]): Option[DataType] = {
--- End diff --

This makes the PR harder for reviewers to read. Could you submit a 
separate PR for the code cleanup? Thanks!
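
For readers following the diff, the promotion rule in the removed `findTightestCommonTypeToString` helper can be modeled in plain Scala as below. This is a toy sketch: the `DType` hierarchy and `tightest` are hypothetical stand-ins for Spark's `DataType` and `findTightestCommonTypeOfTwo`, every type in the model is treated as atomic, and the only widening it knows is integer to double.

```scala
object PromotionSketch {
  // Toy type universe; not Spark's DataType hierarchy.
  sealed trait DType
  case object IntT extends DType
  case object DoubleT extends DType
  case object StringT extends DType
  case object BinaryT extends DType
  case object BooleanT extends DType

  /** Stand-in for the pairwise rule: identical types, or Int widened to Double. */
  def tightest(l: DType, r: DType): Option[DType] = (l, r) match {
    case (a, b) if a == b => Some(a)
    case (IntT, DoubleT) | (DoubleT, IntT) => Some(DoubleT)
    case _ => None
  }

  /** Falls back to StringT when one side is a string and the other is not binary or boolean. */
  def tightestToString(l: DType, r: DType): Option[DType] =
    tightest(l, r).orElse((l, r) match {
      case (StringT, t) if t != BinaryT && t != BooleanT => Some(StringT)
      case (t, StringT) if t != BinaryT && t != BooleanT => Some(StringT)
      case _ => None
    })
}
```

Here `tightestToString(StringT, IntT)` yields `Some(StringT)`, while `tightestToString(StringT, BooleanT)` yields `None`.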



[GitHub] spark issue #16138: [SPARK-16609] Add to_date/to_timestamp with format funct...

2017-02-02 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16138
  
**[Test build #72301 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72301/testReport)** for PR 16138 at commit [`a2d0221`](https://github.com/apache/spark/commit/a2d0221501eebc18e8520a58e1e1cd6bd80a02c9).



[GitHub] spark pull request #16138: [SPARK-16609] Add to_date/to_timestamp with forma...

2017-02-02 Thread anabranch

Github user anabranch commented on a diff in the pull request:

https://github.com/apache/spark/pull/16138#discussion_r99283708
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala ---
@@ -1047,6 +1048,64 @@ case class ToDate(child: Expression) extends UnaryExpression with ImplicitCastIn
 }
 
 /**
+ * Parses a column to a date based on the given format.
+ */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(date_str, fmt) - Parses the `left` expression with the `fmt` expression. Returns null with invalid input.",
+  extended = """
+    Examples:
+      > SELECT _FUNC_('2016-12-31', 'yyyy-MM-dd');
+       2016-12-31
+  """)
+// scalastyle:on line.size.limit
+case class ParseToDate(left: Expression, format: Expression, child: Expression)
+  extends RuntimeReplaceable {
+
+  def this(left: Expression, format: Expression) = {
+    this(left, format, Cast(Cast(new UnixTimestamp(left, format), TimestampType), DateType))
+  }
+
+  def this(left: Expression) = {
+    // RuntimeReplaceable forces the signature, the second value
+    // is ignored completely
+    this(left, Literal(""), ToDate(left))
+  }
+
+  override def flatArguments: Iterator[Any] = Iterator(left, format)
+  override def sql: String = s"$prettyName(${left.sql}, ${format.sql})"
--- End diff --

Fixed!
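
The "returns null with invalid input" behaviour described in the usage string can be illustrated outside Spark. The sketch below uses `java.time` and models SQL null as `None`; `ParseToDateSketch` and `parseToDate` are hypothetical names, and this is not how `ParseToDate` is implemented (the diff builds it from `UnixTimestamp` and `Cast`). The pattern `yyyy-MM-dd` is used as an example format.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import scala.util.Try

object ParseToDateSketch {
  /** Parses `s` with pattern `fmt`; malformed input or a bad pattern yields None. */
  def parseToDate(s: String, fmt: String): Option[LocalDate] =
    Try(LocalDate.parse(s, DateTimeFormatter.ofPattern(fmt))).toOption

  def main(args: Array[String]): Unit = {
    println(parseToDate("2016-12-31", "yyyy-MM-dd")) // Some(2016-12-31)
    println(parseToDate("31/12/2016", "yyyy-MM-dd")) // None
  }
}
```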



[GitHub] spark issue #16779: [SPARK-19437] Rectify spark executor id in HeartbeatRece...

2017-02-02 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16779
  
Merged build finished. Test PASSed.



[GitHub] spark issue #16779: [SPARK-19437] Rectify spark executor id in HeartbeatRece...

2017-02-02 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16779
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72297/
Test PASSed.



[GitHub] spark issue #16779: [SPARK-19437] Rectify spark executor id in HeartbeatRece...

2017-02-02 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16779
  
**[Test build #72297 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72297/testReport)**
 for PR 16779 at commit 
[`a9bc3f4`](https://github.com/apache/spark/commit/a9bc3f47b9cd08f309c00c159bc0e1e6a6c6e763).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-02-02 Thread actuaryzhang

Github user actuaryzhang commented on the issue:

https://github.com/apache/spark/pull/16740
  
@seth @imatiach-msft Let me know if any other changes are needed. 
Thanks very much for your review! 



[GitHub] spark issue #16765: [SPARK-19425][SQL] Make ExtractEquiJoinKeys support UDT ...

2017-02-02 Thread viirya

Github user viirya commented on the issue:

https://github.com/apache/spark/pull/16765
  
@gatorsmile Updated. Thanks.



[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-02-02 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16740
  
Merged build finished. Test PASSed.



[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-02-02 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16740
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72299/
Test PASSed.



[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-02-02 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16740
  
**[Test build #72299 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72299/testReport)**
 for PR 16740 at commit 
[`b57af08`](https://github.com/apache/spark/commit/b57af08f792a59438452a3cef070e16ef51316b5).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #16783: [SPARK-19441] [SQL] Remove IN type coercion from Promote...

2017-02-02 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16783
  
**[Test build #72300 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72300/testReport)**
 for PR 16783 at commit 
[`127a114`](https://github.com/apache/spark/commit/127a114801197e0927a5484a9fdb7b8ee93db22b).



[GitHub] spark pull request #16783: [SPARK-19441] [SQL] Remove IN type coercion from ...

2017-02-02 Thread gatorsmile

GitHub user gatorsmile opened a pull request:

https://github.com/apache/spark/pull/16783

[SPARK-19441] [SQL] Remove IN type coercion from PromoteStrings

### What changes were proposed in this pull request?
The removed code is unreachable because `InConversion` already resolves 
the type coercion issues. 

### How was this patch tested?
N/A

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gatorsmile/spark typeCoercionIn

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16783.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16783


commit bd2d1e7ac3e995331c1eea2630c32c3c4f32
Author: gatorsmile 
Date:   2017-02-03T04:40:43Z

Merge remote-tracking branch 'upstream/master' into typeCoercionIn

commit 127a114801197e0927a5484a9fdb7b8ee93db22b
Author: gatorsmile 
Date:   2017-02-03T04:50:24Z

fix.





[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-02-02 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16740
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72298/
Test PASSed.



[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-02-02 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16740
  
Merged build finished. Test PASSed.



[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-02-02 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16740
  
**[Test build #72298 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72298/testReport)**
 for PR 16740 at commit 
[`931f7ec`](https://github.com/apache/spark/commit/931f7ecceff7a0cb0c1870af7e69d38454078c52).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #14702: [SPARK-15694] Implement ScriptTransformation in sql/core...

2017-02-02 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/14702
  
I will try to review it in the next few days. Thanks for working on it! 



[GitHub] spark issue #16765: [SPARK-19425][SQL] Make df.except work for UDT

2017-02-02 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/16765
  
Could you update the PR description and title? This PR fixes three 
scenarios:
- `except` on two Datasets with UDT
- `intersect` on two Datasets with UDT
- `Join` with join conditions that use `<=>` on UDT columns



[GitHub] spark pull request #16722: [SPARK-9478][ML][MLlib] Add sample weights to dec...

2017-02-02 Thread imatiach-msft

Github user imatiach-msft commented on a diff in the pull request:

https://github.com/apache/spark/pull/16722#discussion_r99279786
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/classification/DecisionTreeClassifierSuite.scala
 ---
@@ -351,6 +370,36 @@ class DecisionTreeClassifierSuite
 dt.fit(df)
   }
 
+  test("training with sample weights") {
+val df = linearMulticlassDataset
+val numClasses = 3
+val predEquals = (x: Double, y: Double) => x == y
+// (impurity, maxDepth)
+val testParams = Seq(
+  ("gini", 10),
+  ("entropy", 10),
+  ("gini", 5)
+)
+for ((impurity, maxDepth) <- testParams) {
+  val estimator = new DecisionTreeClassifier()
+.setMaxDepth(maxDepth)
+.setSeed(seed)
+.setMinWeightFractionPerNode(0.049)
--- End diff --

Maybe also add a test to validate that an invalid minWeightFraction 
throws an exception.
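
A minimal sketch of the kind of check suggested above, written without Spark so it runs standalone. The helper `validate` and the accepted range [0.0, 0.5) are assumptions for illustration and are not taken from the PR; in the actual suite the equivalent would presumably wrap the setter call in `intercept[IllegalArgumentException]`.

```scala
import scala.util.Try

object MinWeightFractionCheck {
  // Hypothetical validator; the range [0.0, 0.5) is an assumption, not from the PR.
  def validate(value: Double): Double = {
    require(value >= 0.0 && value < 0.5,
      s"minWeightFractionPerNode must be in [0.0, 0.5), got $value")
    value
  }

  def main(args: Array[String]): Unit = {
    // A value inside the assumed range is accepted.
    println(Try(validate(0.049)).isSuccess)
    // Values outside the assumed range are rejected via IllegalArgumentException.
    println(Try(validate(-0.1)).isFailure)
    println(Try(validate(0.5)).isFailure)
  }
}
```

`require` throws `IllegalArgumentException`, which is the exception type the existing negative-label tests in this PR already intercept.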



[GitHub] spark pull request #16722: [SPARK-9478][ML][MLlib] Add sample weights to dec...

2017-02-02 Thread imatiach-msft

Github user imatiach-msft commented on a diff in the pull request:

https://github.com/apache/spark/pull/16722#discussion_r99279066
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/mllib/tree/ImpuritySuite.scala ---
@@ -18,23 +18,62 @@
 package org.apache.spark.mllib.tree
 
 import org.apache.spark.SparkFunSuite
-import org.apache.spark.mllib.tree.impurity.{EntropyAggregator, 
GiniAggregator}
+import org.apache.spark.ml.util.TestingUtils._
+import org.apache.spark.mllib.tree.impurity._
 
 /**
  * Test suites for [[GiniAggregator]] and [[EntropyAggregator]].
  */
 class ImpuritySuite extends SparkFunSuite {
+
+  private val seed = 42
+
   test("Gini impurity does not support negative labels") {
 val gini = new GiniAggregator(2)
 intercept[IllegalArgumentException] {
-  gini.update(Array(0.0, 1.0, 2.0), 0, -1, 0.0)
+  gini.update(Array(0.0, 1.0, 2.0), 0, -1, 3, 0.0)
 }
   }
 
   test("Entropy does not support negative labels") {
 val entropy = new EntropyAggregator(2)
 intercept[IllegalArgumentException] {
-  entropy.update(Array(0.0, 1.0, 2.0), 0, -1, 0.0)
+  entropy.update(Array(0.0, 1.0, 2.0), 0, -1, 3, 0.0)
+}
+  }
+
+  test("Classification impurities are insensitive to scaling") {
+val rng = new scala.util.Random(seed)
+val weightedCounts = Array.fill(5)(rng.nextDouble())
+val smallWeightedCounts = weightedCounts.map(_ * 0.0001)
+val largeWeightedCounts = weightedCounts.map(_ * 1)
+Seq(Gini, Entropy).foreach { impurity =>
+  val impurity1 = impurity.calculate(weightedCounts, 
weightedCounts.sum)
+  assert(impurity.calculate(smallWeightedCounts, 
smallWeightedCounts.sum)
+~== impurity1 relTol 0.005)
+  assert(impurity.calculate(largeWeightedCounts, 
largeWeightedCounts.sum)
+~== impurity1 relTol 0.005)
 }
   }
+  test("Regression impurities are insensitive to scaling") {
+def computeStats(samples: Seq[Double], weights: Seq[Double]): (Double, 
Double, Double) = {
+  samples.zip(weights).foldLeft((0.0, 0.0, 0.0)) { case ((wn, wy, 
wyy), (y, w)) =>
+(wn + w, wy + w * y, wyy + w * y * y)
+  }
+}
+val rng = new scala.util.Random(seed)
+val samples = Array.fill(10)(rng.nextDouble())
+val _weights = Array.fill(10)(rng.nextDouble())
+val smallWeights = _weights.map(_ * 0.0001)
+val largeWeights = _weights.map(_ * 1)
+val (count, sum, sumSquared) = computeStats(samples, _weights)
+Seq(Variance).foreach { impurity =>
+  val impurity1 = impurity.calculate(count, sum, sumSquared)
+  val (smallCount, smallSum, smallSumSquared) = computeStats(samples, 
smallWeights)
+  val (largeCount, largeSum, largeSumSquared) = computeStats(samples, 
largeWeights)
+  assert(impurity.calculate(smallCount, smallSum, smallSumSquared) ~== 
impurity1 relTol 0.005)
+  assert(impurity.calculate(largeCount, largeSum, largeSumSquared) ~== 
impurity1 relTol 0.005)
--- End diff --

these are really nice tests
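
For readers following along, the property these tests check can be sketched without Spark: Gini impurity and entropy depend only on class proportions, so multiplying every weighted count by the same positive factor should leave them unchanged up to rounding. The object and function names below are illustrative, and the scale factors are arbitrary choices rather than the ones used in the PR.

```scala
object ImpurityScaling {
  // Gini impurity from (possibly weighted) class counts: 1 - sum_i (c_i / total)^2
  def gini(counts: Array[Double]): Double = {
    val total = counts.sum
    if (total == 0.0) 0.0
    else 1.0 - counts.map { c => val p = c / total; p * p }.sum
  }

  // Entropy in bits; classes with zero count contribute nothing.
  def entropy(counts: Array[Double]): Double = {
    val total = counts.sum
    if (total == 0.0) 0.0
    else counts.filter(_ > 0.0).map { c =>
      val p = c / total
      -p * math.log(p) / math.log(2.0)
    }.sum
  }

  // Relative-tolerance comparison, similar in spirit to `~== ... relTol`.
  def relClose(a: Double, b: Double, relTol: Double): Boolean =
    math.abs(a - b) <= relTol * math.max(math.abs(a), math.abs(b))

  def main(args: Array[String]): Unit = {
    val counts = Array(0.2, 0.5, 0.3)
    val small = counts.map(_ * 0.001)
    val large = counts.map(_ * 1000.0)
    println(relClose(gini(counts), gini(small), 1e-9))
    println(relClose(gini(counts), gini(large), 1e-9))
    println(relClose(entropy(counts), entropy(large), 1e-9))
  }
}
```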



[GitHub] spark pull request #16722: [SPARK-9478][ML][MLlib] Add sample weights to dec...

2017-02-02 Thread imatiach-msft

Github user imatiach-msft commented on a diff in the pull request:

https://github.com/apache/spark/pull/16722#discussion_r99278975
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/util/MLTestingUtils.scala ---
@@ -281,10 +283,26 @@ object MLTestingUtils extends SparkFunSuite {
   estimator: E with HasWeightCol,
   modelEquals: (M, M) => Unit): Unit = {
 estimator.set(estimator.weightCol, "weight")
-val models = Seq(0.001, 1.0, 1000.0).map { w =>
+val models = Seq(0.01, 1.0, 1000.0).map { w =>
--- End diff --

was there a specific reason to change the weight here?



[GitHub] spark pull request #16722: [SPARK-9478][ML][MLlib] Add sample weights to dec...

2017-02-02 Thread imatiach-msft

Github user imatiach-msft commented on a diff in the pull request:

https://github.com/apache/spark/pull/16722#discussion_r99278910
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/tree/impl/TreeTests.scala ---
@@ -124,8 +129,8 @@ private[ml] object TreeTests extends SparkFunSuite {
*   make mistakes such as creating loops of Nodes.
*/
   private def checkEqual(a: Node, b: Node): Unit = {
-assert(a.prediction === b.prediction)
-assert(a.impurity === b.impurity)
+assert(a.prediction ~== b.prediction absTol 1e-8)
+assert(a.impurity ~== b.impurity absTol 1e-8)
--- End diff --

can the tolerances be moved to a constant?
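
One way this could look, as a standalone sketch rather than the actual change: the constant name `NodeTolerance` and the helper `approxEqual` are hypothetical, and the real test would keep using the `~== ... absTol` syntax from `TestingUtils` with the shared constant.

```scala
object ToleranceSketch {
  // Hypothetical shared constant replacing the repeated literal 1e-8.
  val NodeTolerance: Double = 1e-8

  // Absolute-tolerance comparison that defaults to the shared constant.
  def approxEqual(a: Double, b: Double, absTol: Double = NodeTolerance): Boolean =
    math.abs(a - b) <= absTol

  def main(args: Array[String]): Unit = {
    println(approxEqual(0.25, 0.25 + 1e-9)) // within tolerance
    println(approxEqual(0.25, 0.25 + 1e-6)) // outside tolerance
  }
}
```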



[GitHub] spark pull request #16138: [SPARK-16609] Add to_date/to_timestamp with forma...

2017-02-02 Thread anabranch

Github user anabranch commented on a diff in the pull request:

https://github.com/apache/spark/pull/16138#discussion_r99278789
  
--- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R ---
@@ -1177,6 +1177,9 @@ test_that("column functions", {
   c17 <- cov(c, c1) + cov("c", "c1") + covar_samp(c, c1) + covar_samp("c", 
"c1")
   c18 <- covar_pop(c, c1) + covar_pop("c", "c1")
   c19 <- spark_partition_id()
+  c20 <- to_timestamp(c) + trim(c) + unbase64(c) + unhex(c) + upper(c)
+  c21 <- to_timestamp(c, "") + trim(c) + unbase64(c) + unhex(c) + 
upper(c)
+  c22 <- to_date(c, "") + trim(c) + unbase64(c) + unhex(c) + upper(c)
--- End diff --

fixed



[GitHub] spark pull request #16138: [SPARK-16609] Add to_date/to_timestamp with forma...

2017-02-02 Thread anabranch

Github user anabranch commented on a diff in the pull request:

https://github.com/apache/spark/pull/16138#discussion_r99278746
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
 ---
@@ -1047,6 +1048,64 @@ case class ToDate(child: Expression) extends 
UnaryExpression with ImplicitCastIn
 }
 
 /**
+ * Parses a column to a date based on the given format.
+ */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(date_str, fmt) - Parses the `left` expression with the 
`fmt` expression. Returns null with invalid input.",
+  extended = """
+Examples:
+  > SELECT _FUNC_('2016-12-31', 'yyyy-MM-dd');
+   2016-12-31
+  """)
+// scalastyle:on line.size.limit
+case class ParseToDate(left: Expression, format: Expression, child: 
Expression)
+  extends RuntimeReplaceable {
+
+  def this(left: Expression, format: Expression) = {
+this(left, format, Cast(Cast(new UnixTimestamp(left, format), 
TimestampType), DateType))
+  }
+
+  def this(left: Expression) = {
+// RuntimeReplaceable forces the signature, the second value
+// is ignored completely
+this(left, Literal(""), ToDate(left))
+  }
+
+  override def flatArguments: Iterator[Any] = Iterator(left, format)
+  override def sql: String = s"$prettyName(${left.sql}, ${format.sql})"
+
+  override def prettyName: String = "to_date"
+  override def dataType: DataType = DateType
--- End diff --

Removed.



[GitHub] spark pull request #16138: [SPARK-16609] Add to_date/to_timestamp with forma...

2017-02-02 Thread anabranch

Github user anabranch commented on a diff in the pull request:

https://github.com/apache/spark/pull/16138#discussion_r99278738
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
 ---
@@ -1047,6 +1048,64 @@ case class ToDate(child: Expression) extends 
UnaryExpression with ImplicitCastIn
 }
 
 /**
+ * Parses a column to a date based on the given format.
+ */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(date_str, fmt) - Parses the `left` expression with the 
`fmt` expression. Returns null with invalid input.",
+  extended = """
+Examples:
+  > SELECT _FUNC_('2016-12-31', 'yyyy-MM-dd');
+   2016-12-31
+  """)
+// scalastyle:on line.size.limit
+case class ParseToDate(left: Expression, format: Expression, child: 
Expression)
--- End diff --

I don't really understand this feedback. This is how I saw other 
`RuntimeReplaceable` expressions created.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #16138: [SPARK-16609] Add to_date/to_timestamp with forma...

2017-02-02 Thread anabranch

Github user anabranch commented on a diff in the pull request:

https://github.com/apache/spark/pull/16138#discussion_r99278624
  
--- Diff: R/pkg/R/functions.R ---
@@ -1746,7 +1750,7 @@ setMethod("toRadians",
 #' to_date(df$c)
 #' to_date(df$c, 'yyyy-MM-dd')
 #' }
-#' @note to_date(Column, format) since 2.2.0
+#' @note to_date(Column) since 1.5
--- End diff --

fixed.



[GitHub] spark pull request #16722: [SPARK-9478][ML][MLlib] Add sample weights to dec...

2017-02-02 Thread imatiach-msft

Github user imatiach-msft commented on a diff in the pull request:

https://github.com/apache/spark/pull/16722#discussion_r99278201
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/classification/DecisionTreeClassifierSuite.scala
 ---
@@ -58,6 +62,20 @@ class DecisionTreeClassifierSuite
 categoricalDataPointsForMulticlassForOrderedFeaturesRDD = 
sc.parallelize(
   
OldDecisionTreeSuite.generateCategoricalDataPointsForMulticlassForOrderedFeatures())
   .map(_.asML)
+linearMulticlassDataset = {
+  val nPoints = 100
+  val coefficients = Array(
+-0.57997, 0.912083, -0.371077,
+-0.16624, -0.84355, -0.048509)
+
+  val xMean = Array(5.843, 3.057)
+  val xVariance = Array(0.6856, 0.1899)
+
+  val testData = 
LogisticRegressionSuite.generateMultinomialLogisticInput(
+coefficients, xMean, xVariance, addIntercept = true, nPoints, 42)
--- End diff --

pass in seed instead of 42 here (at the end)



[GitHub] spark pull request #16722: [SPARK-9478][ML][MLlib] Add sample weights to dec...

2017-02-02 Thread imatiach-msft

Github user imatiach-msft commented on a diff in the pull request:

https://github.com/apache/spark/pull/16722#discussion_r99278110
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/tree/impurity/Variance.scala ---
@@ -70,17 +70,24 @@ object Variance extends Impurity {
  * Note: Instances of this class do not hold the data; they operate on 
views of the data.
  */
 private[spark] class VarianceAggregator()
-  extends ImpurityAggregator(statsSize = 3) with Serializable {
+  extends ImpurityAggregator(statsSize = 4) with Serializable {
 
   /**
* Update stats for one (node, feature, bin) with the given label.
* @param allStats  Flat stats array, with stats for this (node, 
feature, bin) contiguous.
* @param offsetStart index of stats for this (node, feature, bin).
*/
-  def update(allStats: Array[Double], offset: Int, label: Double, 
instanceWeight: Double): Unit = {
+  def update(
+  allStats: Array[Double],
+  offset: Int,
+  label: Double,
+  numSamples: Int,
+  sampleWeight: Double): Unit = {
+val instanceWeight = numSamples * sampleWeight
 allStats(offset) += instanceWeight
 allStats(offset + 1) += instanceWeight * label
 allStats(offset + 2) += instanceWeight * label * label
+allStats(offset + 3) += numSamples
--- End diff --

Could the statistics that this computes be added to either the class 
documentation or this method (the former preferred)?
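
For reference, a standalone sketch of the four statistics as they appear in the quoted diff: weighted count, weighted sum, weighted sum of squares, and raw sample count. The object name and the `variance` helper are illustrative only; the variance formula used is the standard E[y^2] - E[y]^2 over the weighted statistics and is an assumption about how they are consumed.

```scala
object WeightedVarianceStats {
  // Assumed layout, following the diff:
  //   stats(0) = sum of instance weights
  //   stats(1) = sum of weight * label
  //   stats(2) = sum of weight * label^2
  //   stats(3) = raw (unweighted) sample count
  def update(stats: Array[Double], label: Double, numSamples: Int, sampleWeight: Double): Unit = {
    val instanceWeight = numSamples * sampleWeight
    stats(0) += instanceWeight
    stats(1) += instanceWeight * label
    stats(2) += instanceWeight * label * label
    stats(3) += numSamples
  }

  // Weighted variance from the first three statistics.
  def variance(stats: Array[Double]): Double = {
    val count = stats(0)
    if (count == 0.0) 0.0
    else {
      val mean = stats(1) / count
      stats(2) / count - mean * mean
    }
  }

  def main(args: Array[String]): Unit = {
    val stats = new Array[Double](4)
    update(stats, label = 1.0, numSamples = 1, sampleWeight = 2.0)
    update(stats, label = 3.0, numSamples = 1, sampleWeight = 2.0)
    println(stats.mkString(", "))
    println(variance(stats))
  }
}
```

Note that scaling every `sampleWeight` by the same factor changes the first three entries proportionally but leaves both the variance and the raw count in `stats(3)` unchanged.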



[GitHub] spark pull request #16740: [SPARK-19400][ML] Allow GLM to handle intercept o...

2017-02-02 Thread actuaryzhang

Github user actuaryzhang commented on a diff in the pull request:

https://github.com/apache/spark/pull/16740#discussion_r99277921
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/regression/GeneralizedLinearRegressionSuite.scala
 ---
@@ -743,6 +744,48 @@ class GeneralizedLinearRegressionSuite
 }
   }
 
+  test("generalized linear regression: intercept only") {
+/*
+  R code:
+  y <- c(17, 19, 23, 29)
+  w <- c(1, 2, 3, 4)
+  model1 <- glm(y ~ 1, family = poisson)
+  model2 <- glm(y ~ 1, family = poisson, weights = w)
+  as.vector(c(coef(model1), coef(model2)))
+  [1] 3.091042 3.178054
+ */
+
+val dataset = Seq(
+  Instance(17.0, 1.0, Vectors.zeros(0)),
+  Instance(19.0, 2.0, Vectors.zeros(0)),
+  Instance(23.0, 3.0, Vectors.zeros(0)),
+  Instance(29.0, 4.0, Vectors.zeros(0))
+).toDF()
+
+val expected = Seq(3.091, 3.178)
+
+import GeneralizedLinearRegression._
+
+var idx = 0
+for (useWeight <- Seq(false, true)) {
+  val trainer = new GeneralizedLinearRegression().setFamily("poisson")
+  if (useWeight) trainer.setWeightCol("weight")
+  val model = trainer.fit(dataset)
+  val actual = model.intercept
+  assert(actual ~== expected(idx) absTol 1E-3, "Model mismatch: 
intercept only GLM with " +
+s"useWeight = $useWeight.")
+  assert(model.coefficients === new DenseVector(Array.empty[Double]))
+
+  idx += 1
+}
+
+// throw exception for empty model
+val trainer = new GeneralizedLinearRegression().setFitIntercept(false)
+intercept[SparkException] {
--- End diff --

Done
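
As a sanity check on the R reference values quoted in the test: for a Poisson GLM with the log link and only an intercept, the maximum-likelihood intercept is the log of the (weighted) mean of the response. The sketch below is standalone and does not use Spark; the object and function names are illustrative.

```scala
object InterceptOnlyPoisson {
  // Closed-form intercept for an intercept-only Poisson GLM with log link:
  // log(sum(w_i * y_i) / sum(w_i)).
  def intercept(y: Seq[Double], w: Seq[Double]): Double = {
    require(y.nonEmpty && y.length == w.length, "y and w must be non-empty and of equal length")
    val weightedMean = y.zip(w).map { case (yi, wi) => yi * wi }.sum / w.sum
    math.log(weightedMean)
  }

  def main(args: Array[String]): Unit = {
    val y = Seq(17.0, 19.0, 23.0, 29.0)
    println(intercept(y, Seq(1.0, 1.0, 1.0, 1.0))) // log(22), unweighted
    println(intercept(y, Seq(1.0, 2.0, 3.0, 4.0))) // log(24), weighted
  }
}
```

The two values agree with the `3.091042` and `3.178054` reported by R in the test comment.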



[GitHub] spark pull request #16722: [SPARK-9478][ML][MLlib] Add sample weights to dec...

2017-02-02 Thread imatiach-msft

Github user imatiach-msft commented on a diff in the pull request:

https://github.com/apache/spark/pull/16722#discussion_r99277799
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/tree/impurity/Gini.scala ---
@@ -80,23 +80,29 @@ object Gini extends Impurity {
  * @param numClasses  Number of classes for label.
  */
 private[spark] class GiniAggregator(numClasses: Int)
-  extends ImpurityAggregator(numClasses) with Serializable {
+  extends ImpurityAggregator(numClasses + 1) with Serializable {
 
   /**
* Update stats for one (node, feature, bin) with the given label.
* @param allStats  Flat stats array, with stats for this (node, 
feature, bin) contiguous.
* @param offsetStart index of stats for this (node, feature, bin).
*/
-  def update(allStats: Array[Double], offset: Int, label: Double, 
instanceWeight: Double): Unit = {
-if (label >= statsSize) {
+  def update(
+  allStats: Array[Double],
+  offset: Int,
+  label: Double,
+  numSamples: Int,
+  sampleWeight: Double): Unit = {
+if (label >= numClasses) {
   throw new IllegalArgumentException(s"GiniAggregator given label 
$label" +
--- End diff --

Not related to this code review, but it seems a bit strange that each of 
these ImpurityAggregators has the same bounds checks for the label. I would 
have preferred the abstract base class to implement them instead, although it 
is nice to have a more specific error message.
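
One way to read this suggestion, as a minimal sketch rather than actual Spark code: the base class owns the bounds check, and each subclass only contributes its name so the error message stays specific. `AggregatorSketch`, `validateLabel`, `aggregatorName`, and `GiniSketch` are hypothetical names and are not part of Spark's API.

```scala
// Hypothetical sketch: shared label validation hoisted into the base class.
// None of these names exist in Spark; they only illustrate the idea.
abstract class AggregatorSketch(val numClasses: Int) {
  // Subclasses supply their own name so the error message stays specific.
  protected def aggregatorName: String

  // Shared bounds check, written once instead of once per subclass.
  protected final def validateLabel(label: Double): Unit = {
    if (label >= numClasses) {
      throw new IllegalArgumentException(
        s"$aggregatorName given label $label but requires label < numClasses (= $numClasses).")
    }
    if (label < 0) {
      throw new IllegalArgumentException(
        s"$aggregatorName given label $label but requires label to be non-negative.")
    }
  }

  def update(allStats: Array[Double], offset: Int, label: Double, weight: Double): Unit
}

class GiniSketch(numClasses: Int) extends AggregatorSketch(numClasses) {
  protected def aggregatorName: String = "GiniSketch"

  def update(allStats: Array[Double], offset: Int, label: Double, weight: Double): Unit = {
    validateLabel(label)
    // Accumulate the weight into the slot for this label's class.
    allStats(offset + label.toInt) += weight
  }
}
```

The trade-off is the one noted in the comment: the check lives in one place, and the per-class name keeps the message as specific as before.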



[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-02-02 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16740
  
**[Test build #72299 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72299/testReport)**
 for PR 16740 at commit 
[`b57af08`](https://github.com/apache/spark/commit/b57af08f792a59438452a3cef070e16ef51316b5).



[GitHub] spark pull request #16740: [SPARK-19400][ML] Allow GLM to handle intercept o...

2017-02-02 Thread imatiach-msft

Github user imatiach-msft commented on a diff in the pull request:

https://github.com/apache/spark/pull/16740#discussion_r99276472
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/regression/GeneralizedLinearRegressionSuite.scala
 ---
@@ -743,6 +744,48 @@ class GeneralizedLinearRegressionSuite
 }
   }
 
+  test("generalized linear regression: intercept only") {
+/*
+  R code:
+  y <- c(17, 19, 23, 29)
+  w <- c(1, 2, 3, 4)
+  model1 <- glm(y ~ 1, family = poisson)
+  model2 <- glm(y ~ 1, family = poisson, weights = w)
+  as.vector(c(coef(model1), coef(model2)))
+  [1] 3.091042 3.178054
+ */
+
+val dataset = Seq(
+  Instance(17.0, 1.0, Vectors.zeros(0)),
+  Instance(19.0, 2.0, Vectors.zeros(0)),
+  Instance(23.0, 3.0, Vectors.zeros(0)),
+  Instance(29.0, 4.0, Vectors.zeros(0))
+).toDF()
+
+val expected = Seq(3.091, 3.178)
+
+import GeneralizedLinearRegression._
+
+var idx = 0
+for (useWeight <- Seq(false, true)) {
+  val trainer = new GeneralizedLinearRegression().setFamily("poisson")
+  if (useWeight) trainer.setWeightCol("weight")
+  val model = trainer.fit(dataset)
+  val actual = model.intercept
+  assert(actual ~== expected(idx) absTol 1E-3, "Model mismatch: intercept only GLM with " +
+s"useWeight = $useWeight.")
+  assert(model.coefficients === new DenseVector(Array.empty[Double]))
+
+  idx += 1
+}
+
+// throw exception for empty model
+val trainer = new GeneralizedLinearRegression().setFitIntercept(false)
+intercept[SparkException] {
--- End diff --

Thank you for adding the test. Could you also please wrap it in `withClue` to 
verify the message contents, e.g.:

    withClue("Specified model is empty with neither intercept nor feature") {
      intercept[SparkException] {
        trainer.fit(dataset)
      }
    }
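
A hedged note on this pattern: ScalaTest's `withClue` prepends the given clue to the message of a failure raised inside its block; it does not compare the clue with the exception's own message. `intercept` returns the caught exception, so the message can be asserted on directly. The sketch below imitates that in plain Scala so it runs without Spark or ScalaTest. `fitSketch` and `interceptSketch` are hypothetical stand-ins, and `IllegalStateException` is used in place of `SparkException`.

```scala
import scala.reflect.ClassTag

object MessageCheckSketch {
  // Stand-in for trainer.fit: mirrors the guard quoted in this PR.
  def fitSketch(numFeatures: Int, fitIntercept: Boolean): Unit = {
    if (numFeatures == 0 && !fitIntercept) {
      throw new IllegalStateException(
        "Specified model is empty with neither intercept nor feature.")
    }
  }

  // Stand-in for ScalaTest's intercept: returns the expected exception and
  // fails if nothing, or an exception of another type, is thrown.
  def interceptSketch[T <: Throwable](body: => Any)(implicit ct: ClassTag[T]): T = {
    val caught: Option[Throwable] =
      try { body; None } catch { case t: Throwable => Some(t) }
    caught match {
      case Some(t) if ct.runtimeClass.isInstance(t) => t.asInstanceOf[T]
      case Some(other) => throw new AssertionError(s"Unexpected exception: $other")
      case None => throw new AssertionError("Expected an exception, but none was thrown")
    }
  }
}
```

With the real ScalaTest API the same idea would read `val e = intercept[SparkException] { trainer.fit(dataset) }` followed by an assertion on `e.getMessage`.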



[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-02-02 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16740
  
**[Test build #72298 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72298/testReport)**
 for PR 16740 at commit 
[`931f7ec`](https://github.com/apache/spark/commit/931f7ecceff7a0cb0c1870af7e69d38454078c52).



[GitHub] spark pull request #16740: [SPARK-19400][ML] Allow GLM to handle intercept o...

2017-02-02 Thread actuaryzhang

Github user actuaryzhang commented on a diff in the pull request:

https://github.com/apache/spark/pull/16740#discussion_r99276315
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala
 ---
@@ -335,6 +335,11 @@ class GeneralizedLinearRegression @Since("2.0.0") (@Since("2.0.0") override val
   throw new SparkException(msg)
 }
 
+if (numFeatures == 0 && !$(fitIntercept)) {
+  val msg = "Specified model is empty with neither intercept nor feature."
+  throw new SparkException(msg)
--- End diff --

@imatiach-msft Test added. Thanks!



[GitHub] spark pull request #16722: [SPARK-9478][ML][MLlib] Add sample weights to dec...

2017-02-02 Thread imatiach-msft

Github user imatiach-msft commented on a diff in the pull request:

https://github.com/apache/spark/pull/16722#discussion_r99275923
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/tree/impurity/Entropy.scala ---
@@ -83,23 +83,29 @@ object Entropy extends Impurity {
  * @param numClasses  Number of classes for label.
  */
 private[spark] class EntropyAggregator(numClasses: Int)
-  extends ImpurityAggregator(numClasses) with Serializable {
+  extends ImpurityAggregator(numClasses + 1) with Serializable {
--- End diff --

I guess it is because the number of "stats" increases by one since we are 
adding the weight, if I understand correctly.
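
A minimal sketch of that reading, under an assumption the thread does not confirm: the first `numClasses` slots of each (node, feature, bin) block hold weighted per-class totals, and the one extra slot holds the unweighted sample count. `StatsLayoutSketch` and `updateStats` are hypothetical names, not Spark's methods.

```scala
object StatsLayoutSketch {
  // Assumed layout of one (node, feature, bin) block of size numClasses + 1:
  //   slots 0 until numClasses : weighted count per class
  //   slot  numClasses         : raw (unweighted) sample count
  def updateStats(
      allStats: Array[Double],
      offset: Int,
      numClasses: Int,
      label: Double,
      numSamples: Int,
      sampleWeight: Double): Unit = {
    require(label >= 0 && label < numClasses, s"label $label out of range")
    // Weighted contribution goes to the slot of the label's class.
    allStats(offset + label.toInt) += numSamples * sampleWeight
    // The extra trailing slot tracks how many raw samples were seen.
    allStats(offset + numClasses) += numSamples
  }
}
```

Under this layout the label check has to compare against `numClasses` rather than the block size, which would be consistent with the diff changing `label >= statsSize` to `label >= numClasses`.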



[GitHub] spark pull request #16722: [SPARK-9478][ML][MLlib] Add sample weights to dec...

2017-02-02 Thread imatiach-msft

Github user imatiach-msft commented on a diff in the pull request:

https://github.com/apache/spark/pull/16722#discussion_r99275700
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/tree/impurity/Impurity.scala ---
@@ -79,7 +79,12 @@ private[spark] abstract class ImpurityAggregator(val statsSize: Int) extends Ser
* @param allStats  Flat stats array, with stats for this (node, feature, bin) contiguous.
* @param offset  Start index of stats for this (node, feature, bin).
*/
-  def update(allStats: Array[Double], offset: Int, label: Double, instanceWeight: Double): Unit
+  def update(
+  allStats: Array[Double],
+  offset: Int,
+  label: Double,
+  numSamples: Int,
+  sampleWeight: Double): Unit
--- End diff --

Should `numSamples` and `sampleWeight` be added to the doc comment here?



[GitHub] spark pull request #16722: [SPARK-9478][ML][MLlib] Add sample weights to dec...

2017-02-02 Thread imatiach-msft

Github user imatiach-msft commented on a diff in the pull request:

https://github.com/apache/spark/pull/16722#discussion_r99274459
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/tree/impurity/Entropy.scala ---
@@ -83,23 +83,29 @@ object Entropy extends Impurity {
  * @param numClasses  Number of classes for label.
  */
 private[spark] class EntropyAggregator(numClasses: Int)
-  extends ImpurityAggregator(numClasses) with Serializable {
+  extends ImpurityAggregator(numClasses + 1) with Serializable {
--- End diff --

Sorry, I am trying to follow this part of the code: why do we pass 
(numClasses + 1) to the ImpurityAggregator?



[GitHub] spark pull request #16733: [SPARK-19392][SQL] Fix the bug that throws an exc...

2017-02-02 Thread maropu

Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/16733#discussion_r99273860
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/jdbc/OracleDialect.scala ---
@@ -29,7 +29,12 @@ private case object OracleDialect extends JdbcDialect {
   override def getCatalystType(
--- End diff --

Okay, I'll do that!



[GitHub] spark pull request #16733: [SPARK-19392][SQL] Fix the bug that throws an exc...

2017-02-02 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16733#discussion_r99273809
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/jdbc/OracleDialect.scala ---
@@ -29,7 +29,12 @@ private case object OracleDialect extends JdbcDialect {
   override def getCatalystType(
--- End diff --

Can you ask him to retry it in Spark 2.1? I am not sure whether he is using 
Apache Spark or a Spark distribution released by another vendor.



[GitHub] spark pull request #16740: [SPARK-19400][ML] Allow GLM to handle intercept o...

2017-02-02 Thread imatiach-msft

Github user imatiach-msft commented on a diff in the pull request:

https://github.com/apache/spark/pull/16740#discussion_r99273642
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala
 ---
@@ -335,6 +335,11 @@ class GeneralizedLinearRegression @Since("2.0.0") (@Since("2.0.0") override val
   throw new SparkException(msg)
 }
 
+if (numFeatures == 0 && !$(fitIntercept)) {
+  val msg = "Specified model is empty with neither intercept nor feature."
+  throw new SparkException(msg)
--- End diff --

suggestion: please add a test to validate this case



[GitHub] spark issue #16765: [SPARK-19425][SQL] Make df.except work for UDT

2017-02-02 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16765
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72295/
Test PASSed.



[GitHub] spark issue #16765: [SPARK-19425][SQL] Make df.except work for UDT

2017-02-02 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16765
  
Merged build finished. Test PASSed.



[GitHub] spark issue #16765: [SPARK-19425][SQL] Make df.except work for UDT

2017-02-02 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16765
  
**[Test build #72295 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72295/testReport)**
 for PR 16765 at commit 
[`ac3c3bf`](https://github.com/apache/spark/commit/ac3c3bfa270dda077bf89db926c38b9946c4738e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #16779: [SPARK-19437] Rectify spark executor id in HeartbeatRece...

2017-02-02 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16779
  
**[Test build #72297 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72297/testReport)**
 for PR 16779 at commit 
[`a9bc3f4`](https://github.com/apache/spark/commit/a9bc3f47b9cd08f309c00c159bc0e1e6a6c6e763).



[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-02-02 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16740
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72296/
Test PASSed.



[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-02-02 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16740
  
Merged build finished. Test PASSed.



[GitHub] spark issue #16779: [SPARK-19437] Rectify spark executor id in HeartbeatRece...

2017-02-02 Thread zsxwing

Github user zsxwing commented on the issue:

https://github.com/apache/spark/pull/16779
  
ok to test



[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-02-02 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16740
  
**[Test build #72296 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72296/testReport)**
 for PR 16740 at commit 
[`3a0a2af`](https://github.com/apache/spark/commit/3a0a2aff5a7b09cb0e1db7ec2e756e55b561eace).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #16739: [SPARK-19399][SPARKR] Add R coalesce API for DataFrame a...

2017-02-02 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/16739
  
: ) This might be caused by the optimizer rule `CollapseRepartition`. Can 
you output the plan by `explain(true)`?



[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...

2017-02-02 Thread WeichenXu123

Github user WeichenXu123 commented on the issue:

https://github.com/apache/spark/pull/15435
  
@sethah 

If I merge MulticlassLogisticRegressionSummary into 
LogisticRegressionSummary, then, under the hierarchy as currently designed, 
it becomes:

class LogisticRegressionSummary extends MulticlassSummary with 
LogisticRegressionSummary
class LogisticRegressionTrainingSummary extends LogisticRegressionSummary 
with 

**Note that LogisticRegressionTrainingSummary must now become a class, not 
a trait: once MulticlassLogisticRegressionSummary is merged into 
LogisticRegressionSummary, it has to be a class.**

Now consider `BinaryLogisticRegressionSummary`:

class BinaryLogisticRegressionSummary extends LogisticRegressionSummary
class BinaryLogisticRegressionTrainingSummary extends 
BinaryLogisticRegressionSummary

**A new problem then occurs: BinaryLogisticRegressionTrainingSummary cannot 
extend LogisticRegressionTrainingSummary, because 
`LogisticRegressionTrainingSummary` has changed into a class, not a trait.**

**Because BinaryLogisticRegressionTrainingSummary cannot extend 
LogisticRegressionTrainingSummary, more API breaks follow, such as in `def 
summary`.**

So these problems are troublesome, because they cause so many API breaks.
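
The constraint behind this comes from Scala's single class inheritance: a class may extend one class but mix in any number of traits. A stripped-down sketch with hypothetical names (`Summary`, `TrainingSummary`, `BinarySummary`, `BinaryTrainingSummary`), which are not the real Spark types:

```scala
// Hypothetical, simplified hierarchy; not the actual Spark classes.
trait Summary { def numClasses: Int }

// While TrainingSummary is a trait, it can be mixed into any Summary subclass.
trait TrainingSummary extends Summary { def objectiveHistory: Array[Double] }

class BinarySummary extends Summary { def numClasses: Int = 2 }

// Compiles: one superclass (BinarySummary) plus one mixed-in trait.
class BinaryTrainingSummary(val objectiveHistory: Array[Double])
  extends BinarySummary with TrainingSummary

// If TrainingSummary were declared as a class instead, the declaration above
// would be rejected by the compiler, because only traits may follow `with`.
```

This is why turning the training summary into a class forces a choice between the binary hierarchy and the training hierarchy.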



[GitHub] spark pull request #16740: [SPARK-19400][ML] Allow GLM to handle intercept o...

2017-02-02 Thread sethah

Github user sethah commented on a diff in the pull request:

https://github.com/apache/spark/pull/16740#discussion_r99269075
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/regression/GeneralizedLinearRegressionSuite.scala
 ---
@@ -743,6 +743,55 @@ class GeneralizedLinearRegressionSuite
 }
   }
 
+  test("generalized linear regression: intercept only") {
+/*
+  R code:
+  y <- c(17, 19, 23, 29)
+  w <- c(1, 2, 3, 4)
+  model1 <- glm(y ~ 1, family = poisson)
+  model2 <- glm(y ~ 1, family = poisson, weights = w)
+  as.vector(c(coef(model1), coef(model2)))
+  [1] 3.091042 3.178054
+ */
+
+val dataset = Seq(
+  Instance(17.0, 1.0, Vectors.zeros(0)),
+  Instance(19.0, 2.0, Vectors.zeros(0)),
+  Instance(23.0, 3.0, Vectors.zeros(0)),
+  Instance(29.0, 4.0, Vectors.zeros(0))
+).toDF()
+
+val expected = Seq(3.091, 3.178)
+
+import GeneralizedLinearRegression._
+
+var idx = 0
+for (useWeight <- Seq(false, true)) {
+  val trainer = new GeneralizedLinearRegression().setFamily("poisson")
+.setLinkPredictionCol("linkPrediction")
+  if (useWeight) trainer.setWeightCol("weight")
+  val model = trainer.fit(dataset)
+  val actual = model.intercept
+  assert(actual ~== expected(idx) absTol 1E-3, "Model mismatch: intercept only GLM with " +
+s"useWeight = $useWeight.")
+  assert(model.coefficients === new DenseVector(Array.empty[Double]))
+
+  val familyLink = FamilyAndLink(trainer)
+  model.transform(dataset).select("features", "prediction", "linkPrediction").collect()
+.foreach {
+  case Row(features: DenseVector, prediction1: Double, linkPrediction1: Double) =>
+val eta = BLAS.dot(features, model.coefficients) + model.intercept
+val prediction2 = familyLink.fitted(eta)
--- End diff --

That was fast! :)



[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-02-02 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16740
  
**[Test build #72296 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72296/testReport)**
 for PR 16740 at commit 
[`3a0a2af`](https://github.com/apache/spark/commit/3a0a2aff5a7b09cb0e1db7ec2e756e55b561eace).



[GitHub] spark pull request #16740: [SPARK-19400][ML] Allow GLM to handle intercept o...

2017-02-02 Thread actuaryzhang

Github user actuaryzhang commented on a diff in the pull request:

https://github.com/apache/spark/pull/16740#discussion_r99269006
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/regression/GeneralizedLinearRegressionSuite.scala
 ---
@@ -743,6 +743,55 @@ class GeneralizedLinearRegressionSuite
 }
   }
 
+  test("generalized linear regression: intercept only") {
+/*
+  R code:
+  y <- c(17, 19, 23, 29)
+  w <- c(1, 2, 3, 4)
+  model1 <- glm(y ~ 1, family = poisson)
+  model2 <- glm(y ~ 1, family = poisson, weights = w)
+  as.vector(c(coef(model1), coef(model2)))
+  [1] 3.091042 3.178054
+ */
+
+val dataset = Seq(
+  Instance(17.0, 1.0, Vectors.zeros(0)),
+  Instance(19.0, 2.0, Vectors.zeros(0)),
+  Instance(23.0, 3.0, Vectors.zeros(0)),
+  Instance(29.0, 4.0, Vectors.zeros(0))
+).toDF()
+
+val expected = Seq(3.091, 3.178)
+
+import GeneralizedLinearRegression._
+
+var idx = 0
+for (useWeight <- Seq(false, true)) {
+  val trainer = new GeneralizedLinearRegression().setFamily("poisson")
+.setLinkPredictionCol("linkPrediction")
+  if (useWeight) trainer.setWeightCol("weight")
+  val model = trainer.fit(dataset)
+  val actual = model.intercept
+  assert(actual ~== expected(idx) absTol 1E-3, "Model mismatch: intercept only GLM with " +
+s"useWeight = $useWeight.")
+  assert(model.coefficients === new DenseVector(Array.empty[Double]))
+
+  val familyLink = FamilyAndLink(trainer)
+  model.transform(dataset).select("features", "prediction", "linkPrediction").collect()
+.foreach {
+  case Row(features: DenseVector, prediction1: Double, linkPrediction1: Double) =>
+val eta = BLAS.dot(features, model.coefficients) + model.intercept
+val prediction2 = familyLink.fitted(eta)
--- End diff --

@sethah Agree. Removed this. Thanks!



[GitHub] spark issue #16779: [SPARK-19437] Rectify spark executor id in HeartbeatRece...

2017-02-02 Thread jinxing64

Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/16779
  
@zsxwing 
Thanks a lot for reviewing this. Not sure why the test doesn't start 
automatically.



[GitHub] spark pull request #16740: [SPARK-19400][ML] Allow GLM to handle intercept o...

2017-02-02 Thread sethah

Github user sethah commented on a diff in the pull request:

https://github.com/apache/spark/pull/16740#discussion_r99268773
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/regression/GeneralizedLinearRegressionSuite.scala
 ---
@@ -743,6 +743,55 @@ class GeneralizedLinearRegressionSuite
 }
   }
 
+  test("generalized linear regression: intercept only") {
+/*
+  R code:
+  y <- c(17, 19, 23, 29)
+  w <- c(1, 2, 3, 4)
+  model1 <- glm(y ~ 1, family = poisson)
+  model2 <- glm(y ~ 1, family = poisson, weights = w)
+  as.vector(c(coef(model1), coef(model2)))
+  [1] 3.091042 3.178054
+ */
+
+val dataset = Seq(
+  Instance(17.0, 1.0, Vectors.zeros(0)),
+  Instance(19.0, 2.0, Vectors.zeros(0)),
+  Instance(23.0, 3.0, Vectors.zeros(0)),
+  Instance(29.0, 4.0, Vectors.zeros(0))
+).toDF()
+
+val expected = Seq(3.091, 3.178)
+
+import GeneralizedLinearRegression._
+
+var idx = 0
+for (useWeight <- Seq(false, true)) {
+  val trainer = new GeneralizedLinearRegression().setFamily("poisson")
+.setLinkPredictionCol("linkPrediction")
+  if (useWeight) trainer.setWeightCol("weight")
+  val model = trainer.fit(dataset)
+  val actual = model.intercept
+  assert(actual ~== expected(idx) absTol 1E-3, "Model mismatch: intercept only GLM with " +
+s"useWeight = $useWeight.")
+  assert(model.coefficients === new DenseVector(Array.empty[Double]))
+
+  val familyLink = FamilyAndLink(trainer)
+  model.transform(dataset).select("features", "prediction", "linkPrediction").collect()
+.foreach {
+  case Row(features: DenseVector, prediction1: Double, linkPrediction1: Double) =>
+val eta = BLAS.dot(features, model.coefficients) + model.intercept
+val prediction2 = familyLink.fitted(eta)
--- End diff --

I don't think we need to test this. This is essentially checking the 
correctness of the prediction mechanism, regardless of the "intercept-only" 
part. The prediction mechanism is tested elsewhere. 
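
For reference, the expected intercepts in the quoted R snippet can be checked by hand. For an intercept-only Poisson GLM with the log link, the maximum-likelihood intercept is the log of the (weighted) mean of the labels. The sketch below is plain Scala, independent of Spark; `InterceptOnlyPoissonSketch` and `interceptOnlyPoisson` are hypothetical names.

```scala
object InterceptOnlyPoissonSketch {
  // Closed-form intercept for y ~ 1 with family = poisson (log link):
  //   log( sum(w_i * y_i) / sum(w_i) )
  def interceptOnlyPoisson(y: Seq[Double], w: Seq[Double]): Double = {
    require(y.nonEmpty && y.length == w.length, "y and w must be non-empty and aligned")
    val weightedSum = y.zip(w).map { case (yi, wi) => yi * wi }.sum
    math.log(weightedSum / w.sum)
  }
}
```

With y = (17, 19, 23, 29), unit weights give log(88 / 4) = log(22) ≈ 3.091, and weights (1, 2, 3, 4) give log(240 / 10) = log(24) ≈ 3.178, which agrees with the R output quoted in the test.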



[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...

2017-02-02 Thread WeichenXu123

Github user WeichenXu123 commented on the issue:

https://github.com/apache/spark/pull/15435
  
sethah 

About this issue:
Why is there a one-to-one overlap between MulticlassClassificationSummary 
and LogisticRegressionSummary, and MulticlassLogisticRegressionSummary inherits 
from them both?

If I merge MulticlassLogisticRegressionSummary into 
LogisticRegressionSummary to remove the one-to-one overlap between 
MulticlassClassificationSummary and LogisticRegressionSummary, it will cause 
**more API breaks**: BinaryLogisticRegressionTrainingSummary would no longer 
be able to extend LogisticRegressionTrainingSummary, and other public APIs 
such as `def summary` would break.
You can try making the change and compiling the code to see the problem.
Maybe there is a better way, but I haven't thought of one yet.



[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-02-02 Thread actuaryzhang

Github user actuaryzhang commented on the issue:

https://github.com/apache/spark/pull/16740
  
@srowen would you please take a look and merge this if all is good? Thanks.



[GitHub] spark pull request #16780: [SPARK-19438] Both reading and updating executorD...

2017-02-02 Thread jinxing64

Github user jinxing64 closed the pull request at:

https://github.com/apache/spark/pull/16780



[GitHub] spark issue #16780: [SPARK-19438] Both reading and updating executorDataMap ...

2017-02-02 Thread jinxing64

Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/16780
  
Thanks a lot for looking into this~ @zsxwing 
You are right. My understanding of this was incorrect. 
`CoarseGrainedSchedulerBackend: DriverEndpoint` is a `ThreadSafeRpcEndpoint`, 
so concurrent message processing is disabled.
I'll close this PR.
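
For readers following along, here is a minimal sketch of why serial message handling makes extra locking unnecessary. It is not Spark's `RpcEndpoint` API: `SerialEndpoint`, `send`, `snapshotAndStop`, and the `executorCores` map are hypothetical names, and a single-threaded executor stands in for the endpoint's inbox.

```scala
import java.util.concurrent.{Callable, Executors}
import scala.collection.mutable

// Hypothetical stand-in for an endpoint that handles one message at a time.
final class SerialEndpoint {
  // One worker thread, so receive() is never invoked concurrently.
  private val inbox = Executors.newSingleThreadExecutor()
  // Plain unsynchronized map: it is only touched from the inbox thread.
  private val executorCores = mutable.HashMap.empty[String, Int]

  private def receive(executorId: String, cores: Int): Unit = {
    executorCores.update(executorId, executorCores.getOrElse(executorId, 0) + cores)
  }

  // Any thread may call send(); the work itself runs on the inbox thread.
  def send(executorId: String, cores: Int): Unit =
    inbox.execute(new Runnable {
      override def run(): Unit = receive(executorId, cores)
    })

  // Reads the state on the inbox thread (after all queued messages), then stops it.
  def snapshotAndStop(): Map[String, Int] = {
    val result = inbox.submit(new Callable[Map[String, Int]] {
      override def call(): Map[String, Int] = executorCores.toMap
    }).get()
    inbox.shutdown()
    result
  }
}

object SerialEndpointDemo {
  // Four sender threads each post 1000 updates for the same executor id.
  def run(): Map[String, Int] = {
    val endpoint = new SerialEndpoint
    val senders = (1 to 4).map { _ =>
      new Thread(new Runnable {
        override def run(): Unit = (1 to 1000).foreach(_ => endpoint.send("exec-1", 1))
      })
    }
    senders.foreach(_.start())
    senders.foreach(_.join())
    endpoint.snapshotAndStop()
  }
}
```

Because every read and write of the map happens on the single inbox thread, no update is lost even though the senders run concurrently.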



[GitHub] spark issue #15264: [SPARK-17477][SQL] SparkSQL cannot handle schema evoluti...

2017-02-02 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/15264
  
Yep, I will try to get back to this after finishing up a few issues I am 
currently working on. I just realised that it will take me a bit of time to 
proceed (I noticed it needs a more careful touch). Please feel free to take 
it over if anyone is interested. Otherwise, let me try to proceed even if it 
takes a while.



[GitHub] spark pull request #16607: [SPARK-19247][ML] Save large word2vec models

2017-02-02 Thread jkbradley

Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/16607#discussion_r99263532
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala ---
@@ -302,16 +302,36 @@ class Word2VecModel private[ml] (
 @Since("1.6.0")
 object Word2VecModel extends MLReadable[Word2VecModel] {
 
+  private case class Data(word: String, vector: Array[Float])
+
   private[Word2VecModel]
   class Word2VecModelWriter(instance: Word2VecModel) extends MLWriter {
 
-private case class Data(wordIndex: Map[String, Int], wordVectors: Seq[Float])
-
 override protected def saveImpl(path: String): Unit = {
   DefaultParamsWriter.saveMetadata(instance, path, sc)
-  val data = Data(instance.wordVectors.wordIndex, instance.wordVectors.wordVectors.toSeq)
+
+  val wordVectors = instance.wordVectors.getVectors
+  val dataArray = wordVectors.toSeq.map { case (word, vector) => Data(word, vector) }.toArray
--- End diff --

No need to convert back to an Array
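
A minimal sketch of the suggestion, assuming the next step accepts any `Seq` of case-class rows (for example, `SparkSession.createDataFrame` has an overload that takes a `Seq` of `Product` instances). The `Data` class below is a local stand-in for the one in the diff, and `WordVectorRows.toRows` is a hypothetical helper name.

```scala
// Local stand-in for the Data case class shown in the diff.
final case class Data(word: String, vector: Array[Float])

object WordVectorRows {
  // The input mirrors the Map[String, Array[Float]] shape of getVectors in the diff.
  // The mapped Seq is returned directly; no trailing .toArray is needed.
  def toRows(wordVectors: Map[String, Array[Float]]): Seq[Data] =
    wordVectors.toSeq.map { case (word, vector) => Data(word, vector) }
}
```

Dropping the conversion avoids one extra copy of the row collection, which matters when the vocabulary is large.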



[GitHub] spark pull request #16607: [SPARK-19247][ML] Save large word2vec models

2017-02-02 Thread jkbradley

Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/16607#discussion_r99263525
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala ---
@@ -320,14 +340,29 @@ object Word2VecModel extends MLReadable[Word2VecModel] {
 private val className = classOf[Word2VecModel].getName
 
 override def load(path: String): Word2VecModel = {
+  val spark = sparkSession
+  import spark.implicits._
+
   val metadata = DefaultParamsReader.loadMetadata(path, sc, className)
+  val (major, minor) = VersionUtils.majorMinorVersion(metadata.sparkVersion)
+
   val dataPath = new Path(path, "data").toString
-  val data = sparkSession.read.parquet(dataPath)
-.select("wordIndex", "wordVectors")
-.head()
-  val wordIndex = data.getAs[Map[String, Int]](0)
-  val wordVectors = data.getAs[Seq[Float]](1).toArray
-  val oldModel = new feature.Word2VecModel(wordIndex, wordVectors)
+
+  val oldModel = if (major.toInt < 2 || (major.toInt == 2 && minor.toInt < 2)) {
--- End diff --

major, minor are already Ints
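
A minimal sketch of the check without the redundant `.toInt` calls. `VersionCheck.majorMinor` is a hypothetical local stand-in that returns `(Int, Int)`, not Spark's `VersionUtils`, and `usesLegacyFormat` is an illustrative name.

```scala
object VersionCheck {
  private val MajorMinor = """^(\d+)\.(\d+)(\..*)?$""".r

  // Hypothetical helper: parses "2.1.0" into (2, 1), already as Ints.
  def majorMinor(version: String): (Int, Int) = version match {
    case MajorMinor(major, minor, _) => (major.toInt, minor.toInt)
    case _ => throw new IllegalArgumentException(s"Cannot parse version: $version")
  }

  // True when the metadata was written by a release older than 2.2.
  def usesLegacyFormat(version: String): Boolean = {
    val (major, minor) = majorMinor(version)
    // major and minor are Ints here, so they compare directly.
    major < 2 || (major == 2 && minor < 2)
  }
}
```

With this sketch, `VersionCheck.usesLegacyFormat("2.1.0")` is true and `VersionCheck.usesLegacyFormat("2.2.0")` is false.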



[GitHub] spark pull request #16607: [SPARK-19247][ML] Save large word2vec models

2017-02-02 Thread jkbradley

Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/16607#discussion_r99259617
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala ---
@@ -18,10 +18,9 @@
 package org.apache.spark.ml.feature
 
 import org.apache.hadoop.fs.Path
-
--- End diff --

Keep newline between non-spark and spark imports



[GitHub] spark issue #16686: [SPARK-18682][SS] Batch Source for Kafka

2017-02-02 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16686
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72294/
Test PASSed.



[GitHub] spark issue #16686: [SPARK-18682][SS] Batch Source for Kafka

2017-02-02 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16686
  
**[Test build #72294 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72294/testReport)**
 for PR 16686 at commit 
[`5b48fc6`](https://github.com/apache/spark/commit/5b48fc65ac08e8ed4a09edd0d346990d40d042e0).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #16686: [SPARK-18682][SS] Batch Source for Kafka

2017-02-02 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16686
  
Merged build finished. Test PASSed.



[GitHub] spark pull request #16699: [SPARK-18710][ML] Add offset in GLM

2017-02-02 Thread sethah

Github user sethah commented on a diff in the pull request:

https://github.com/apache/spark/pull/16699#discussion_r99263111
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/GeneralizedLinearRegressionSuite.scala ---
@@ -743,6 +743,84 @@ class GeneralizedLinearRegressionSuite
 }
   }
 
+  test("generalized linear regression with offset") {
+/*
+  R code:
+  library(statmod)
+  df <- as.data.frame(matrix(c(
+1.0, 1.0, 2.0, 0.0, 5.0,
+2.0, 2.0, 0.5, 1.0, 2.0,
+1.0, 3.0, 1.0, 2.0, 1.0,
+2.0, 4.0, 0.0, 3.0, 3.0), 4, 5, byrow = TRUE))
+  families <- list(gaussian, poisson, Gamma, tweedie(1.5))
+  f1 <- V1 ~ -1 + V4 + V5
+  f2 <- V1 ~ V4 + V5
+  for (f in c(f1, f2)) {
+for (fam in families) {
+  model <- glm(f, df, family = fam, weights = V2, offset = V3)
+  print(as.vector(coef(model)))
+}
+  }
+
+  [1] 0.535040431 0.005390836
+  [1]  0.1968355 -0.2061711
+  [1]  0.307996 -0.153579
+  [1]  0.32166185 -0.09698986
+  [1] -0.880  0.7342857  0.1714286
+  [1] -1.9991044  0.7247511  0.1424392
+  [1] -0.27378146  0.31599396 -0.06204946
+  [1] -0.17118812  0.31200361 -0.02541656
+*/
+val dataset = Seq(
+  OffsetInstance(1.0, 1.0, 2.0, Vectors.dense(0.0, 5.0)),
+  OffsetInstance(2.0, 2.0, 0.5, Vectors.dense(1.0, 2.0)),
+  OffsetInstance(1.0, 3.0, 1.0, Vectors.dense(2.0, 1.0)),
+  OffsetInstance(2.0, 4.0, 0.0, Vectors.dense(3.0, 3.0))
+).toDF()
+
+val expected = Seq(
+  Vectors.dense(0.0, 0.535040431, 0.005390836),
+  Vectors.dense(0.0, 0.1968355, -0.2061711),
+  Vectors.dense(0.0, 0.307996, -0.153579),
+  Vectors.dense(0.0, 0.32166185, -0.09698986),
+  Vectors.dense(-0.88, 0.7342857, 0.1714286),
+  Vectors.dense(-1.9991044, 0.7247511, 0.1424392),
+  Vectors.dense(-0.27378146, 0.31599396, -0.06204946),
+  Vectors.dense(-0.17118812, 0.31200361, -0.02541656))
+
+import GeneralizedLinearRegression._
+
+var idx = 0
+for (fitIntercept <- Seq(false, true)) {
+  for (family <- Seq("gaussian", "poisson", "gamma", "tweedie")) {
+var trainer = new GeneralizedLinearRegression().setFamily(family)
+  .setFitIntercept(fitIntercept).setOffsetCol("offset")
+  .setWeightCol("weight").setLinkPredictionCol("linkPrediction")
+if (family == "tweedie") trainer = trainer.setVariancePower(1.5)
+val model = trainer.fit(dataset)
+val actual = Vectors.dense(model.intercept, model.coefficients(0), model.coefficients(1))
+assert(actual ~= expected(idx) absTol 1e-4, s"Model mismatch: GLM with family = $family," +
--- End diff --

We need to be checking more than just the coefficients. For example, the 
computation of the null deviance does not match R, since the null model 
computation does not consider the offsets.

Actually, I think we ought to just incorporate offsets into all of the 
other tests, which will make sure offsets are exhaustively tested. This has 
been done before e.g. https://github.com/apache/spark/pull/15488, and it _is_ a 
real pain, but it's probably the best way. I'd be open to other arguments 
though.
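
To illustrate why the null deviance changes once offsets are considered, here is a minimal sketch restricted to the Gaussian family with identity link, where the intercept-only fit has a closed form. The `(label, weight, offset)` triples mirror the rows in the test above; `NullModelOffset` and its methods are hypothetical names, and other families would need an iterative fit instead of this formula.

```scala
object NullModelOffset {
  // (label, weight, offset) triples taken from the dataset in the test above.
  val rows: Seq[(Double, Double, Double)] = Seq(
    (1.0, 1.0, 2.0), (2.0, 2.0, 0.5), (1.0, 3.0, 1.0), (2.0, 4.0, 0.0))

  // Weighted least-squares intercept of y ~ 1 + offset(o):
  // b = sum(w * (y - o)) / sum(w). With useOffset = false the offset is ignored.
  def nullIntercept(data: Seq[(Double, Double, Double)], useOffset: Boolean): Double = {
    val weightedSum = data.map { case (y, w, o) => w * (y - (if (useOffset) o else 0.0)) }.sum
    weightedSum / data.map(_._2).sum
  }

  // Gaussian deviance of the intercept-only model: sum(w * (y - o - b)^2).
  def nullDeviance(data: Seq[(Double, Double, Double)], useOffset: Boolean): Double = {
    val b = nullIntercept(data, useOffset)
    data.map { case (y, w, o) =>
      val residual = y - (if (useOffset) o else 0.0) - b
      w * residual * residual
    }.sum
  }
}
```

On these four rows the intercept-only fit gives an intercept of 1.0 and a deviance of 11.5 when offsets are used, versus 1.6 and 2.4 when they are ignored, so a test that only compares coefficients would not catch the difference.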



[GitHub] spark issue #14702: [SPARK-15694] Implement ScriptTransformation in sql/core...

2017-02-02 Thread tejasapatil

Github user tejasapatil commented on the issue:

https://github.com/apache/spark/pull/14702
  
Can anyone please review this PR?



[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-02-02 Thread sethah

Github user sethah commented on the issue:

https://github.com/apache/spark/pull/16740
  
Ok, yeah, let's go with this fix now then. It seems both R and statsmodels 
run a fit to compute the null model. Thanks for following up on that!



[GitHub] spark issue #16765: [SPARK-19425][SQL] Make df.except work for UDT

2017-02-02 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16765
  
**[Test build #72295 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72295/testReport)**
 for PR 16765 at commit 
[`ac3c3bf`](https://github.com/apache/spark/commit/ac3c3bfa270dda077bf89db926c38b9946c4738e).



[GitHub] spark pull request #16733: [SPARK-19392][SQL] Fix the bug that throws an exc...

2017-02-02 Thread maropu

Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/16733#discussion_r99261637
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/OracleDialect.scala ---
@@ -29,7 +29,12 @@ private case object OracleDialect extends JdbcDialect {
   override def getCatalystType(
--- End diff --

I looked over the previous releases, and it seems `scale` is always set 
there. So I'm not sure why the exception in the report happens. What do you 
think? Is it okay to close this?



[GitHub] spark issue #16686: [SPARK-18682][SS] Batch Source for Kafka

2017-02-02 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16686
  
**[Test build #72294 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72294/testReport)**
 for PR 16686 at commit 
[`5b48fc6`](https://github.com/apache/spark/commit/5b48fc65ac08e8ed4a09edd0d346990d40d042e0).



[GitHub] spark issue #16607: [SPARK-19247][ML] Save large word2vec models

2017-02-02 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16607
  
Merged build finished. Test FAILed.



[GitHub] spark issue #16607: [SPARK-19247][ML] Save large word2vec models

2017-02-02 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16607
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72293/
Test FAILed.



[GitHub] spark issue #16607: [SPARK-19247][ML] Save large word2vec models

2017-02-02 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16607
  
**[Test build #72293 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72293/testReport)**
 for PR 16607 at commit 
[`9b5e928`](https://github.com/apache/spark/commit/9b5e9288699012b2e5d9b347191fd3d141b31d7d).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #16607: [SPARK-19247][ML] Save large word2vec models

2017-02-02 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16607
  
**[Test build #72293 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72293/testReport)**
 for PR 16607 at commit 
[`9b5e928`](https://github.com/apache/spark/commit/9b5e9288699012b2e5d9b347191fd3d141b31d7d).



[GitHub] spark issue #16607: [SPARK-19247][ML] Save large word2vec models

2017-02-02 Thread jkbradley

Github user jkbradley commented on the issue:

https://github.com/apache/spark/pull/16607
  
ok to test



[GitHub] spark issue #16607: [SPARK-19247][ML] Save large word2vec models

2017-02-02 Thread jkbradley

Github user jkbradley commented on the issue:

https://github.com/apache/spark/pull/16607
  
Sorry for the delay; will take a look now!



[GitHub] spark issue #14412: [SPARK-15355] [CORE] Proactive block replication

2017-02-02 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14412
  
Merged build finished. Test PASSed.



[GitHub] spark issue #14412: [SPARK-15355] [CORE] Proactive block replication

2017-02-02 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14412
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72291/
Test PASSed.


