[GitHub] spark pull request #20235: [Spark-22887][ML][TESTS][WIP] ML test for Structu...

2018-04-09 Thread smurakozi
Github user smurakozi commented on a diff in the pull request:

https://github.com/apache/spark/pull/20235#discussion_r180124798
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/fpm/FPGrowthSuite.scala ---
@@ -34,86 +35,122 @@ class FPGrowthSuite extends SparkFunSuite with MLlibTestSparkContext with Defaul
   }
 
   test("FPGrowth fit and transform with different data types") {
-Array(IntegerType, StringType, ShortType, LongType, ByteType).foreach { dt =>
-  val data = dataset.withColumn("items", col("items").cast(ArrayType(dt)))
-  val model = new FPGrowth().setMinSupport(0.5).fit(data)
-  val generatedRules = model.setMinConfidence(0.5).associationRules
-  val expectedRules = spark.createDataFrame(Seq(
-(Array("2"), Array("1"), 1.0),
-(Array("1"), Array("2"), 0.75)
-  )).toDF("antecedent", "consequent", "confidence")
-.withColumn("antecedent", col("antecedent").cast(ArrayType(dt)))
-.withColumn("consequent", col("consequent").cast(ArrayType(dt)))
-  assert(expectedRules.sort("antecedent").rdd.collect().sameElements(
-generatedRules.sort("antecedent").rdd.collect()))
-
-  val transformed = model.transform(data)
-  val expectedTransformed = spark.createDataFrame(Seq(
-(0, Array("1", "2"), Array.emptyIntArray),
-(0, Array("1", "2"), Array.emptyIntArray),
-(0, Array("1", "2"), Array.emptyIntArray),
-(0, Array("1", "3"), Array(2))
-  )).toDF("id", "items", "prediction")
-.withColumn("items", col("items").cast(ArrayType(dt)))
-.withColumn("prediction", col("prediction").cast(ArrayType(dt)))
-  assert(expectedTransformed.collect().toSet.equals(
-transformed.collect().toSet))
+  class DataTypeWithEncoder[A](val a: DataType)
+  (implicit val encoder: Encoder[(Int, Array[A], Array[A])])
+
+  Array(
+new DataTypeWithEncoder[Int](IntegerType),
+new DataTypeWithEncoder[String](StringType),
+new DataTypeWithEncoder[Short](ShortType),
+new DataTypeWithEncoder[Long](LongType)
+// , new DataTypeWithEncoder[Byte](ByteType)
+// TODO: using ByteType produces error, as Array[Byte] is handled as Binary
+// cannot resolve 'CAST(`items` AS BINARY)' due to data type mismatch:
+// cannot cast array to binary;
+  ).foreach { dt => {
+val data = dataset.withColumn("items", col("items").cast(ArrayType(dt.a)))
+val model = new FPGrowth().setMinSupport(0.5).fit(data)
+val generatedRules = model.setMinConfidence(0.5).associationRules
+val expectedRules = Seq(
+  (Array("2"), Array("1"), 1.0),
+  (Array("1"), Array("2"), 0.75)
+).toDF("antecedent", "consequent", "confidence")
+  .withColumn("antecedent", col("antecedent").cast(ArrayType(dt.a)))
+  .withColumn("consequent", col("consequent").cast(ArrayType(dt.a)))
+assert(expectedRules.sort("antecedent").rdd.collect().sameElements(
+  generatedRules.sort("antecedent").rdd.collect()))
+
+val expectedTransformed = Seq(
+  (0, Array("1", "2"), Array.emptyIntArray),
--- End diff --

done


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20235: [Spark-22887][ML][TESTS][WIP] ML test for Structu...

2018-04-09 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20235#discussion_r180027926
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/fpm/FPGrowthSuite.scala ---
@@ -34,86 +35,122 @@ class FPGrowthSuite extends SparkFunSuite with MLlibTestSparkContext with Defaul
   }
 
   test("FPGrowth fit and transform with different data types") {
-Array(IntegerType, StringType, ShortType, LongType, ByteType).foreach { dt =>
-  val data = dataset.withColumn("items", col("items").cast(ArrayType(dt)))
-  val model = new FPGrowth().setMinSupport(0.5).fit(data)
-  val generatedRules = model.setMinConfidence(0.5).associationRules
-  val expectedRules = spark.createDataFrame(Seq(
-(Array("2"), Array("1"), 1.0),
-(Array("1"), Array("2"), 0.75)
-  )).toDF("antecedent", "consequent", "confidence")
-.withColumn("antecedent", col("antecedent").cast(ArrayType(dt)))
-.withColumn("consequent", col("consequent").cast(ArrayType(dt)))
-  assert(expectedRules.sort("antecedent").rdd.collect().sameElements(
-generatedRules.sort("antecedent").rdd.collect()))
-
-  val transformed = model.transform(data)
-  val expectedTransformed = spark.createDataFrame(Seq(
-(0, Array("1", "2"), Array.emptyIntArray),
-(0, Array("1", "2"), Array.emptyIntArray),
-(0, Array("1", "2"), Array.emptyIntArray),
-(0, Array("1", "3"), Array(2))
-  )).toDF("id", "items", "prediction")
-.withColumn("items", col("items").cast(ArrayType(dt)))
-.withColumn("prediction", col("prediction").cast(ArrayType(dt)))
-  assert(expectedTransformed.collect().toSet.equals(
-transformed.collect().toSet))
+  class DataTypeWithEncoder[A](val a: DataType)
+  (implicit val encoder: Encoder[(Int, Array[A], Array[A])])
+
+  Array(
+new DataTypeWithEncoder[Int](IntegerType),
+new DataTypeWithEncoder[String](StringType),
+new DataTypeWithEncoder[Short](ShortType),
+new DataTypeWithEncoder[Long](LongType)
+// , new DataTypeWithEncoder[Byte](ByteType)
+// TODO: using ByteType produces error, as Array[Byte] is handled as Binary
+// cannot resolve 'CAST(`items` AS BINARY)' due to data type mismatch:
+// cannot cast array to binary;
+  ).foreach { dt => {
+val data = dataset.withColumn("items", col("items").cast(ArrayType(dt.a)))
+val model = new FPGrowth().setMinSupport(0.5).fit(data)
+val generatedRules = model.setMinConfidence(0.5).associationRules
+val expectedRules = Seq(
+  (Array("2"), Array("1"), 1.0),
+  (Array("1"), Array("2"), 0.75)
+).toDF("antecedent", "consequent", "confidence")
+  .withColumn("antecedent", col("antecedent").cast(ArrayType(dt.a)))
+  .withColumn("consequent", col("consequent").cast(ArrayType(dt.a)))
+assert(expectedRules.sort("antecedent").rdd.collect().sameElements(
+  generatedRules.sort("antecedent").rdd.collect()))
+
+val expectedTransformed = Seq(
+  (0, Array("1", "2"), Array.emptyIntArray),
--- End diff --

I think the "id" column should contain the values 0, 1, 2, 3.
As it stands, the id column is useless, so we can remove it.
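The expected values in the quoted test (rules 2 => 1 with confidence 1.0 and 1 => 2 with confidence 0.75, an empty prediction for [1, 2] and [2] for [1, 3]) can be checked by hand. The following is a brute-force sketch in plain Scala, not Spark's FP-growth implementation: it assumes the model is fitted on the four transactions that appear in the `items` column of `expectedTransformed`, and the object and method names are hypothetical. It enumerates every itemset, keeps those meeting `minSupport`, derives rules with a single-item consequent above `minConfidence`, and predicts the consequents of matching rules that are not already in a transaction. It also reproduces the counts 4, 3 and 3 expected by the `FPGrowth getFreqItems` test at minSupport 0.7.

```scala
// Brute-force check of the numbers expected by the test. This is NOT Spark's
// FP-growth algorithm, only exhaustive enumeration over a tiny dataset.
object FPGrowthByHand {
  // Assumption: the fitted transactions are the `items` rows of expectedTransformed.
  val transactions: Seq[Set[String]] =
    Seq(Set("1", "2"), Set("1", "2"), Set("1", "2"), Set("1", "3"))

  // Number of transactions that contain every item of `itemset`.
  def count(itemset: Set[String]): Int =
    transactions.count(t => itemset.subsetOf(t))

  // Every non-empty itemset whose support (fraction of transactions) is >= minSupport.
  def freqItemsets(minSupport: Double): Map[Set[String], Int] = {
    val items: Set[String] = transactions.flatten.toSet
    items.subsets()
      .filter(_.nonEmpty)
      .map(s => s -> count(s))
      .filter { case (_, c) => c.toDouble / transactions.size >= minSupport }
      .toMap
  }

  // Rules (antecedent => single consequent, confidence) with confidence >= minConfidence.
  // Antecedents of frequent itemsets are frequent too, so freq(antecedent) is defined.
  def rules(minSupport: Double, minConfidence: Double): Seq[(Set[String], String, Double)] = {
    val freq = freqItemsets(minSupport)
    for {
      (itemset, c) <- freq.toSeq
      if itemset.size > 1
      consequent <- itemset.toSeq
      antecedent = itemset - consequent
      confidence = c.toDouble / freq(antecedent)
      if confidence >= minConfidence
    } yield (antecedent, consequent, confidence)
  }

  // Consequents of all applicable rules that are not already in the transaction.
  def predict(items: Set[String], rs: Seq[(Set[String], String, Double)]): Set[String] =
    rs.collect { case (a, c, _) if a.subsetOf(items) && !items.contains(c) => c }.toSet
}
```

Since the predictions depend only on `items`, the `id` column plays no part in this computation.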





[GitHub] spark pull request #20235: [Spark-22887][ML][TESTS][WIP] ML test for Structu...

2018-01-29 Thread smurakozi
Github user smurakozi commented on a diff in the pull request:

https://github.com/apache/spark/pull/20235#discussion_r164427189
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/fpm/FPGrowthSuite.scala ---
@@ -34,86 +35,122 @@ class FPGrowthSuite extends SparkFunSuite with MLlibTestSparkContext with Defaul
   }
 
   test("FPGrowth fit and transform with different data types") {
-Array(IntegerType, StringType, ShortType, LongType, ByteType).foreach { dt =>
-  val data = dataset.withColumn("items", col("items").cast(ArrayType(dt)))
-  val model = new FPGrowth().setMinSupport(0.5).fit(data)
-  val generatedRules = model.setMinConfidence(0.5).associationRules
-  val expectedRules = spark.createDataFrame(Seq(
-(Array("2"), Array("1"), 1.0),
-(Array("1"), Array("2"), 0.75)
-  )).toDF("antecedent", "consequent", "confidence")
-.withColumn("antecedent", col("antecedent").cast(ArrayType(dt)))
-.withColumn("consequent", col("consequent").cast(ArrayType(dt)))
-  assert(expectedRules.sort("antecedent").rdd.collect().sameElements(
-generatedRules.sort("antecedent").rdd.collect()))
-
-  val transformed = model.transform(data)
-  val expectedTransformed = spark.createDataFrame(Seq(
-(0, Array("1", "2"), Array.emptyIntArray),
-(0, Array("1", "2"), Array.emptyIntArray),
-(0, Array("1", "2"), Array.emptyIntArray),
-(0, Array("1", "3"), Array(2))
-  )).toDF("id", "items", "prediction")
-.withColumn("items", col("items").cast(ArrayType(dt)))
-.withColumn("prediction", col("prediction").cast(ArrayType(dt)))
-  assert(expectedTransformed.collect().toSet.equals(
-transformed.collect().toSet))
+  class DataTypeWithEncoder[A](val a: DataType)
+  (implicit val encoder: Encoder[(Int, Array[A], Array[A])])
+
+  Array(
+new DataTypeWithEncoder[Int](IntegerType),
+new DataTypeWithEncoder[String](StringType),
+new DataTypeWithEncoder[Short](ShortType),
+new DataTypeWithEncoder[Long](LongType)
+// , new DataTypeWithEncoder[Byte](ByteType)
+// TODO: using ByteType produces error, as Array[Byte] is handled as Binary
+// cannot resolve 'CAST(`items` AS BINARY)' due to data type mismatch:
+// cannot cast array to binary;
+  ).foreach { dt => {
+val data = dataset.withColumn("items", col("items").cast(ArrayType(dt.a)))
+val model = new FPGrowth().setMinSupport(0.5).fit(data)
+val generatedRules = model.setMinConfidence(0.5).associationRules
+val expectedRules = Seq(
+  (Array("2"), Array("1"), 1.0),
+  (Array("1"), Array("2"), 0.75)
+).toDF("antecedent", "consequent", "confidence")
+  .withColumn("antecedent", col("antecedent").cast(ArrayType(dt.a)))
+  .withColumn("consequent", col("consequent").cast(ArrayType(dt.a)))
+assert(expectedRules.sort("antecedent").rdd.collect().sameElements(
+  generatedRules.sort("antecedent").rdd.collect()))
+
+val expectedTransformed = Seq(
+  (0, Array("1", "2"), Array.emptyIntArray),
+  (0, Array("1", "2"), Array.emptyIntArray),
+  (0, Array("1", "2"), Array.emptyIntArray),
+  (0, Array("1", "3"), Array(2))
+).toDF("id", "items", "expected")
+  .withColumn("items", col("items").cast(ArrayType(dt.a)))
+  .withColumn("expected", col("expected").cast(ArrayType(dt.a)))
+
+testTransformer(expectedTransformed, model,
+  "expected", "prediction") {
+  case Row(expected, prediction) => assert(expected === prediction,
+s"Expected $expected but found $prediction for data type $dt")
+}(dt.encoder)
+  }
 }
   }
 
   test("FPGrowth getFreqItems") {
 val model = new FPGrowth().setMinSupport(0.7).fit(dataset)
-val expectedFreq = spark.createDataFrame(Seq(
+val expectedFreq = Seq(
   (Array("1"), 4L),
   (Array("2"), 3L),
   (Array("1", "2"), 3L),
   (Array("2", "1"), 3L) // duplicate as the items sequence is not guaranteed
-)).toDF("items", "expectedFreq")
+).toDF("items", "expectedFreq")
 val freqItems = model.freqItemsets
 
 val checkDF = freqItems.join(expectedFreq, "items")
 assert(checkDF.count() == 3 && checkDF.filter(col("freq") === col("expectedFreq")).count() == 3)
   }
 
   test("FPGrowth getFreqItems with Null") {
-val df = spark.createDataFrame(Seq(
+val df = Seq(
   (1, Array("1", "2", "3", "5")),
   (2, Array("1", 

[GitHub] spark pull request #20235: [Spark-22887][ML][TESTS][WIP] ML test for Structu...

2018-01-23 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request:

https://github.com/apache/spark/pull/20235#discussion_r163241569
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/fpm/FPGrowthSuite.scala ---
@@ -34,86 +35,122 @@ class FPGrowthSuite extends SparkFunSuite with MLlibTestSparkContext with Defaul
   }
 
   test("FPGrowth fit and transform with different data types") {
-Array(IntegerType, StringType, ShortType, LongType, ByteType).foreach { dt =>
-  val data = dataset.withColumn("items", col("items").cast(ArrayType(dt)))
-  val model = new FPGrowth().setMinSupport(0.5).fit(data)
-  val generatedRules = model.setMinConfidence(0.5).associationRules
-  val expectedRules = spark.createDataFrame(Seq(
-(Array("2"), Array("1"), 1.0),
-(Array("1"), Array("2"), 0.75)
-  )).toDF("antecedent", "consequent", "confidence")
-.withColumn("antecedent", col("antecedent").cast(ArrayType(dt)))
-.withColumn("consequent", col("consequent").cast(ArrayType(dt)))
-  assert(expectedRules.sort("antecedent").rdd.collect().sameElements(
-generatedRules.sort("antecedent").rdd.collect()))
-
-  val transformed = model.transform(data)
-  val expectedTransformed = spark.createDataFrame(Seq(
-(0, Array("1", "2"), Array.emptyIntArray),
-(0, Array("1", "2"), Array.emptyIntArray),
-(0, Array("1", "2"), Array.emptyIntArray),
-(0, Array("1", "3"), Array(2))
-  )).toDF("id", "items", "prediction")
-.withColumn("items", col("items").cast(ArrayType(dt)))
-.withColumn("prediction", col("prediction").cast(ArrayType(dt)))
-  assert(expectedTransformed.collect().toSet.equals(
-transformed.collect().toSet))
+  class DataTypeWithEncoder[A](val a: DataType)
+  (implicit val encoder: Encoder[(Int, Array[A], Array[A])])
+
+  Array(
+new DataTypeWithEncoder[Int](IntegerType),
+new DataTypeWithEncoder[String](StringType),
+new DataTypeWithEncoder[Short](ShortType),
+new DataTypeWithEncoder[Long](LongType)
+// , new DataTypeWithEncoder[Byte](ByteType)
+// TODO: using ByteType produces error, as Array[Byte] is handled as Binary
+// cannot resolve 'CAST(`items` AS BINARY)' due to data type mismatch:
+// cannot cast array to binary;
+  ).foreach { dt => {
+val data = dataset.withColumn("items", col("items").cast(ArrayType(dt.a)))
+val model = new FPGrowth().setMinSupport(0.5).fit(data)
+val generatedRules = model.setMinConfidence(0.5).associationRules
+val expectedRules = Seq(
+  (Array("2"), Array("1"), 1.0),
+  (Array("1"), Array("2"), 0.75)
+).toDF("antecedent", "consequent", "confidence")
+  .withColumn("antecedent", col("antecedent").cast(ArrayType(dt.a)))
+  .withColumn("consequent", col("consequent").cast(ArrayType(dt.a)))
+assert(expectedRules.sort("antecedent").rdd.collect().sameElements(
+  generatedRules.sort("antecedent").rdd.collect()))
+
+val expectedTransformed = Seq(
+  (0, Array("1", "2"), Array.emptyIntArray),
+  (0, Array("1", "2"), Array.emptyIntArray),
+  (0, Array("1", "2"), Array.emptyIntArray),
+  (0, Array("1", "3"), Array(2))
+).toDF("id", "items", "expected")
+  .withColumn("items", col("items").cast(ArrayType(dt.a)))
+  .withColumn("expected", col("expected").cast(ArrayType(dt.a)))
+
+testTransformer(expectedTransformed, model,
+  "expected", "prediction") {
+  case Row(expected, prediction) => assert(expected === prediction,
+s"Expected $expected but found $prediction for data type $dt")
+}(dt.encoder)
+  }
 }
   }
 
   test("FPGrowth getFreqItems") {
 val model = new FPGrowth().setMinSupport(0.7).fit(dataset)
-val expectedFreq = spark.createDataFrame(Seq(
+val expectedFreq = Seq(
   (Array("1"), 4L),
   (Array("2"), 3L),
   (Array("1", "2"), 3L),
   (Array("2", "1"), 3L) // duplicate as the items sequence is not guaranteed
-)).toDF("items", "expectedFreq")
+).toDF("items", "expectedFreq")
 val freqItems = model.freqItemsets
 
 val checkDF = freqItems.join(expectedFreq, "items")
 assert(checkDF.count() == 3 && checkDF.filter(col("freq") === col("expectedFreq")).count() == 3)
   }
 
   test("FPGrowth getFreqItems with Null") {
-val df = spark.createDataFrame(Seq(
+val df = Seq(
   (1, Array("1", "2", "3", "5")),
   (2, Array("1", 

[GitHub] spark pull request #20235: [Spark-22887][ML][TESTS][WIP] ML test for Structu...

2018-01-11 Thread smurakozi
Github user smurakozi commented on a diff in the pull request:

https://github.com/apache/spark/pull/20235#discussion_r160980529
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/fpm/FPGrowthSuite.scala ---
@@ -34,86 +35,122 @@ class FPGrowthSuite extends SparkFunSuite with MLlibTestSparkContext with Defaul
   }
 
   test("FPGrowth fit and transform with different data types") {
-Array(IntegerType, StringType, ShortType, LongType, ByteType).foreach { dt =>
-  val data = dataset.withColumn("items", col("items").cast(ArrayType(dt)))
-  val model = new FPGrowth().setMinSupport(0.5).fit(data)
-  val generatedRules = model.setMinConfidence(0.5).associationRules
-  val expectedRules = spark.createDataFrame(Seq(
-(Array("2"), Array("1"), 1.0),
-(Array("1"), Array("2"), 0.75)
-  )).toDF("antecedent", "consequent", "confidence")
-.withColumn("antecedent", col("antecedent").cast(ArrayType(dt)))
-.withColumn("consequent", col("consequent").cast(ArrayType(dt)))
-  assert(expectedRules.sort("antecedent").rdd.collect().sameElements(
-generatedRules.sort("antecedent").rdd.collect()))
-
-  val transformed = model.transform(data)
-  val expectedTransformed = spark.createDataFrame(Seq(
-(0, Array("1", "2"), Array.emptyIntArray),
-(0, Array("1", "2"), Array.emptyIntArray),
-(0, Array("1", "2"), Array.emptyIntArray),
-(0, Array("1", "3"), Array(2))
-  )).toDF("id", "items", "prediction")
-.withColumn("items", col("items").cast(ArrayType(dt)))
-.withColumn("prediction", col("prediction").cast(ArrayType(dt)))
-  assert(expectedTransformed.collect().toSet.equals(
-transformed.collect().toSet))
+  class DataTypeWithEncoder[A](val a: DataType)
+  (implicit val encoder: Encoder[(Int, Array[A], Array[A])])
--- End diff --

Done, thx.





[GitHub] spark pull request #20235: [Spark-22887][ML][TESTS][WIP] ML test for Structu...

2018-01-11 Thread attilapiros
Github user attilapiros commented on a diff in the pull request:

https://github.com/apache/spark/pull/20235#discussion_r160979218
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/fpm/FPGrowthSuite.scala ---
@@ -34,86 +35,122 @@ class FPGrowthSuite extends SparkFunSuite with MLlibTestSparkContext with Defaul
   }
 
   test("FPGrowth fit and transform with different data types") {
-Array(IntegerType, StringType, ShortType, LongType, ByteType).foreach { dt =>
-  val data = dataset.withColumn("items", col("items").cast(ArrayType(dt)))
-  val model = new FPGrowth().setMinSupport(0.5).fit(data)
-  val generatedRules = model.setMinConfidence(0.5).associationRules
-  val expectedRules = spark.createDataFrame(Seq(
-(Array("2"), Array("1"), 1.0),
-(Array("1"), Array("2"), 0.75)
-  )).toDF("antecedent", "consequent", "confidence")
-.withColumn("antecedent", col("antecedent").cast(ArrayType(dt)))
-.withColumn("consequent", col("consequent").cast(ArrayType(dt)))
-  assert(expectedRules.sort("antecedent").rdd.collect().sameElements(
-generatedRules.sort("antecedent").rdd.collect()))
-
-  val transformed = model.transform(data)
-  val expectedTransformed = spark.createDataFrame(Seq(
-(0, Array("1", "2"), Array.emptyIntArray),
-(0, Array("1", "2"), Array.emptyIntArray),
-(0, Array("1", "2"), Array.emptyIntArray),
-(0, Array("1", "3"), Array(2))
-  )).toDF("id", "items", "prediction")
-.withColumn("items", col("items").cast(ArrayType(dt)))
-.withColumn("prediction", col("prediction").cast(ArrayType(dt)))
-  assert(expectedTransformed.collect().toSet.equals(
-transformed.collect().toSet))
+  class DataTypeWithEncoder[A](val a: DataType)
+  (implicit val encoder: Encoder[(Int, Array[A], Array[A])])
--- End diff --

In DataTypeWithEncoder, I would suggest renaming the val "a" to "dataType".





[GitHub] spark pull request #20235: [Spark-22887][ML][TESTS][WIP] ML test for Structu...

2018-01-11 Thread smurakozi
Github user smurakozi commented on a diff in the pull request:

https://github.com/apache/spark/pull/20235#discussion_r160969767
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/fpm/FPGrowthSuite.scala ---
@@ -34,86 +35,122 @@ class FPGrowthSuite extends SparkFunSuite with MLlibTestSparkContext with Defaul
   }
 
   test("FPGrowth fit and transform with different data types") {
-Array(IntegerType, StringType, ShortType, LongType, ByteType).foreach { dt =>
-  val data = dataset.withColumn("items", col("items").cast(ArrayType(dt)))
-  val model = new FPGrowth().setMinSupport(0.5).fit(data)
-  val generatedRules = model.setMinConfidence(0.5).associationRules
-  val expectedRules = spark.createDataFrame(Seq(
-(Array("2"), Array("1"), 1.0),
-(Array("1"), Array("2"), 0.75)
-  )).toDF("antecedent", "consequent", "confidence")
-.withColumn("antecedent", col("antecedent").cast(ArrayType(dt)))
-.withColumn("consequent", col("consequent").cast(ArrayType(dt)))
-  assert(expectedRules.sort("antecedent").rdd.collect().sameElements(
-generatedRules.sort("antecedent").rdd.collect()))
-
-  val transformed = model.transform(data)
-  val expectedTransformed = spark.createDataFrame(Seq(
-(0, Array("1", "2"), Array.emptyIntArray),
-(0, Array("1", "2"), Array.emptyIntArray),
-(0, Array("1", "2"), Array.emptyIntArray),
-(0, Array("1", "3"), Array(2))
-  )).toDF("id", "items", "prediction")
-.withColumn("items", col("items").cast(ArrayType(dt)))
-.withColumn("prediction", col("prediction").cast(ArrayType(dt)))
-  assert(expectedTransformed.collect().toSet.equals(
-transformed.collect().toSet))
+  class DataTypeWithEncoder[A](val a: DataType)
+  (implicit val encoder: Encoder[(Int, Array[A], Array[A])])
--- End diff --

This class is needed for two purposes:
1. To connect the Scala element types with their corresponding DataType.
Note: this information is already available in AtomicType as InternalType, but it's not accessible, and using it from this test doesn't justify making it public.
2. To get the proper encoder to the testTransformer method. As the data types are put into an array, dt is inferred to be of their common parent type, and implicit search can find the encoders only for concrete types.
For a similar reason, we need to use the type of the final encoder. If we had only the encoder for A, implicit search would not be able to construct the one for Array[A]: we have implicit encoders for Array[Int], Array[Short], ..., but not for an arbitrary A that has an encoder.
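The implicit-resolution issue described above can be reproduced without Spark. The sketch below uses a hypothetical type class `Enc` as a stand-in for Spark's `Encoder`, a hypothetical `TypeWithEnc` mirroring `DataTypeWithEncoder`, and a hypothetical `check` method standing in for `testTransformer`. Each `TypeWithEnc` resolves its row encoder while `A` is still concrete and stores it in a field, so the caller can pass it explicitly after the elements have been widened to a common type, as the test does with `(dt.encoder)`.

```scala
// Hypothetical stand-in for Spark's Encoder type class.
trait Enc[T] { def describe: String }

object Enc {
  // Instances exist only for concrete row types; there is deliberately no
  // generic instance that builds Enc[(Int, Array[A], Array[A])] from Enc[A].
  implicit val intRows: Enc[(Int, Array[Int], Array[Int])] =
    new Enc[(Int, Array[Int], Array[Int])] { def describe = "int rows" }
  implicit val stringRows: Enc[(Int, Array[String], Array[String])] =
    new Enc[(Int, Array[String], Array[String])] { def describe = "string rows" }
}

// Mirrors DataTypeWithEncoder: the implicit is resolved at construction time,
// where A is still a concrete type, and kept in a field.
class TypeWithEnc[A](val name: String)(implicit val enc: Enc[(Int, Array[A], Array[A])])

object EncoderCapture {
  // Stand-in for testTransformer, which receives its encoder implicitly.
  def check[T](label: String)(implicit enc: Enc[T]): String =
    s"$label -> ${enc.describe}"

  // The element type is widened here, so A is no longer known statically.
  val cases: Seq[TypeWithEnc[_]] =
    Seq(new TypeWithEnc[Int]("int"), new TypeWithEnc[String]("string"))

  // Without the explicit argument nothing would tie T to dt's element type,
  // so implicit search could not pick the right instance; the captured
  // encoder is passed explicitly instead.
  val results: Seq[String] = cases.map(dt => check(dt.name)(dt.enc))
}
```

The design choice is the same as in the test: resolve the implicit early, at a point where the concrete type is known, and carry it along as a value.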





[GitHub] spark pull request #20235: [Spark-22887][ML][TESTS][WIP] ML test for Structu...

2018-01-11 Thread smurakozi
GitHub user smurakozi opened a pull request:

https://github.com/apache/spark/pull/20235

[Spark-22887][ML][TESTS][WIP] ML test for StructuredStreaming: spark.ml.fpm

## What changes were proposed in this pull request?

Converting the FPGrowth tests to also check the code with Structured Streaming, using the ML testing infrastructure implemented in SPARK-22882.

Note: this is a WIP; the test with Array[Byte] is not yet working due to data type issues (Array[Byte] vs. Binary).
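The Array[Byte] problem arises because Spark's Scala reflection maps `Array[Byte]` to `BinaryType` rather than to `ArrayType(ByteType)`, which is consistent with the `CAST(... AS BINARY)` error quoted in the TODO comment of the diff. The toy type-class model below is hypothetical and is not Spark's implementation, but it has the same shape: a more specific instance for `Array[Byte]` takes priority over the generic array instance, so the byte case resolves to "binary" rather than to an array of bytes. The names `SqlType`, `LowPrioritySqlTypes` and `of` are illustrative only.

```scala
// Toy model (hypothetical, not Spark code) mapping Scala types to SQL type names.
trait SqlType[T] { def name: String }

trait LowPrioritySqlTypes {
  // Generic rule: an array of T is "array<T>". It lives in a parent trait so
  // that instances defined directly in the companion object take precedence.
  implicit def arrayOf[T](implicit elem: SqlType[T]): SqlType[Array[T]] =
    new SqlType[Array[T]] { def name = s"array<${elem.name}>" }
}

object SqlType extends LowPrioritySqlTypes {
  implicit val intType: SqlType[Int] = new SqlType[Int] { def name = "int" }
  implicit val byteType: SqlType[Byte] = new SqlType[Byte] { def name = "tinyint" }
  // Special case, analogous to Spark treating Array[Byte] as BinaryType.
  implicit val binaryType: SqlType[Array[Byte]] =
    new SqlType[Array[Byte]] { def name = "binary" }

  def of[T](implicit t: SqlType[T]): String = t.name
}
```

For example, `SqlType.of[Array[Int]]` yields "array<int>", while `SqlType.of[Array[Byte]]` yields "binary" even though `arrayOf[Byte]` would also apply.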

## How was this patch tested?
N/A

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/smurakozi/spark SPARK-22887

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20235.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20235


commit 331129556003bcf6e4bab6559e80e46ac0858706
Author: Sandor Murakozi 
Date:   2018-01-05T12:41:53Z

test 'FPGrowthModel setMinConfidence should affect rules generation and transform' is converted to use testTransformer

commit 93aff2c999eee4a88f7f4a3c32d6c7b601a918ac
Author: Sandor Murakozi 
Date:   2018-01-08T13:14:38Z

Test 'FPGrowth fit and transform with different data types' works with streaming, except for Byte

commit 8b0b00070a21bd47537a7c3ad580e2af38a481bd
Author: Sandor Murakozi 
Date:   2018-01-11T11:28:46Z

All tests use testTransformer.
Test with Array[Byte] is missing.

commit af61845ab6acfa82c4411bce3ab4a20afebd0aa3
Author: Sandor Murakozi 
Date:   2018-01-11T11:49:27Z

Unintentional changes in 93aff2c999 are reverted



