[GitHub] spark pull request #16546: [WIP][SQL] Put check in ExpressionEncoder.fromRow...

2017-02-22 Thread viirya
Github user viirya closed the pull request at:

https://github.com/apache/spark/pull/16546


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16546: [WIP][SQL] Put check in ExpressionEncoder.fromRow...

2017-01-12 Thread michalsenkyr
Github user michalsenkyr commented on a diff in the pull request:

https://github.com/apache/spark/pull/16546#discussion_r95927063
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala ---
@@ -120,6 +120,32 @@ object ScalaReflection extends ScalaReflection {
   }

   /**
+   * Returns the element type for Seq[_] and its subclass.
+   */
+  def getElementTypeForSeq(t: `Type`): Option[`Type`] = {
+    if (!(t <:< localTypeOf[Seq[_]])) {
+      return None
+    }
+    val TypeRef(_, _, elementTypeList) = t
--- End diff --

This would probably be better solved by using `match` and `typeParams`.
Something like this:

```scala
t match {
  case TypeRef(_, _, Seq(elementType)) => Some(elementType)
  case _ =>
    t.baseClasses.find { c =>
      val cType = c.asClass.toType
      cType <:< localTypeOf[Seq[_]] && cType.typeParams.nonEmpty
    }.map(t.baseType(_).typeParams.head)
}
```

Also not sure whether types with more than one type parameter are handled 
correctly.
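For reference, a self-contained sketch of this approach with runtime reflection (an illustration, not the PR's actual code; note that `typeArgs` on the base type yields the concrete element type, whereas `typeParams` would yield the formal type-parameter symbol):

```scala
import scala.reflect.runtime.universe._

// Resolve the element type of a Seq[_] or any of its subclasses,
// including types like Range that carry no type arguments themselves.
def elementTypeForSeq(t: Type): Option[Type] = t match {
  case TypeRef(_, _, List(elementType)) if t <:< typeOf[Seq[_]] =>
    Some(elementType)
  case _ if t <:< typeOf[Seq[_]] =>
    // Walk up to the first Seq ancestor that declares a type parameter
    // and read the concrete argument off the base type.
    t.baseClasses
      .find(c => c.asClass.toType <:< typeOf[Seq[_]] && c.asClass.typeParams.nonEmpty)
      .map(c => t.baseType(c).typeArgs.head)
  case _ => None
}
```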





[GitHub] spark pull request #16546: [WIP][SQL] Put check in ExpressionEncoder.fromRow...

2017-01-10 Thread viirya
GitHub user viirya opened a pull request:

https://github.com/apache/spark/pull/16546

[WIP][SQL] Put check in ExpressionEncoder.fromRow to ensure we can convert 
deserialized object to required type

## What changes were proposed in this pull request?

Two problems are addressed in this patch.

1. Serializing a subclass of `Seq[_]` that doesn't carry an element type

Currently, in `ScalaReflection.serializerFor`, we try to serialize all
subtypes of `Seq[_]`. But for `Range`, which is a `Seq[Int]` whose type
carries no element type argument, `serializerFor` fails with a cryptic
message:

    scala.MatchError: scala.collection.immutable.Range.Inclusive (of class scala.reflect.internal.Types$ClassNoArgsTypeRef)
      at org.apache.spark.sql.catalyst.ScalaReflection$.org$apache$spark$sql$catalyst$ScalaReflection$$serializerFor(ScalaReflection.scala:520)
      at org.apache.spark.sql.catalyst.ScalaReflection$.serializerFor(ScalaReflection.scala:463)
      at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.apply(ExpressionEncoder.scala:71)

This patch fixes this by also handling types whose element type is not 
directly available from their type arguments.
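The missing type arguments are easy to see with runtime reflection (a small illustration, assuming `scala.reflect.runtime.universe`):

```scala
import scala.reflect.runtime.universe._

// Range extends Seq[Int], but the Range type itself carries no type
// arguments (it is a ClassNoArgsTypeRef), which is what trips the
// pattern match in serializerFor.
val rangeType = typeOf[scala.collection.immutable.Range]
assert(rangeType.typeArgs.isEmpty)

// The element type is still recoverable through the Seq base type:
val elementType = rangeType.baseType(typeOf[Seq[_]].typeSymbol).typeArgs.head
assert(elementType =:= typeOf[Int])
```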

2. The encoder can't deserialize an internal row to the required type

We serialize objects that share a common superclass such as `Seq[_]` into a 
common internal representation. But when we want to deserialize that internal 
data back into the original objects, we run into the problem of instantiating 
the right concrete type.

For example, data serialized from a `Seq[_]` is deserialized into a 
`WrappedArray`. That works when the data really was a `Seq[_]`. But if we 
serialize data of a subclass of `Seq[_]` (for example `Range`) that is not 
assignable from `WrappedArray`, a runtime error occurs when converting the 
deserialized data to the required subclass of `Seq[_]`.
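To make the failure concrete (a minimal sketch, not Spark code; `Vector` stands in here for the `WrappedArray` the deserializer actually produces):

```scala
// The deserializer always builds one concrete Seq implementation.
val deserialized: Seq[Int] = Vector(1, 2, 3)

// Range is not assignable from that implementation, so the final cast
// to the requested subclass fails at runtime with a ClassCastException.
assert(!classOf[Range].isAssignableFrom(deserialized.getClass))
assert(scala.util.Try(deserialized.asInstanceOf[Range]).isFailure)
```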

Short of explicitly writing a deserialization rule for each subclass of 
`Seq[_]`, I think the feasible solution is to check whether we can convert the 
deserialized data to the required type. This patch puts that check into 
`ExpressionEncoder.fromRow`. When the requirement is not met, we show a 
reasonable message to users.
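A hypothetical sketch of such a check (illustrative names, not the patch's actual code):

```scala
// Verify the deserialized object is compatible with the requested class
// before casting, and fail with a readable message otherwise.
def checkedCast[T](deserialized: Any, required: Class[T]): T = {
  if (!required.isInstance(deserialized)) {
    throw new UnsupportedOperationException(
      s"Cannot convert deserialized object of type ${deserialized.getClass.getName} " +
      s"to required type ${required.getName}")
  }
  required.cast(deserialized)
}
```

For example, `checkedCast(Vector(1, 2, 3), classOf[Range])` would raise the descriptive error instead of an opaque `ClassCastException`.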

## How was this patch tested?

Jenkins tests.

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/viirya/spark-1 encoder-range

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16546.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16546


commit 190fb6222d84991000f91735952579b1e0686a61
Author: Liang-Chi Hsieh 
Date:   2017-01-11T03:38:44Z

Put check in ExpressionEncoder.fromRow to ensure we can convert 
deserialized object to required type.



