GitHub user viirya opened a pull request:
https://github.com/apache/spark/pull/16546
[WIP][SQL] Put check in ExpressionEncoder.fromRow to ensure we can convert
deserialized object to required type
## What changes were proposed in this pull request?
Two problems are addressed in this patch.
1. Serializing a subclass of `Seq[_]` that has no element type
Currently, `ScalaReflection.serializerFor` tries to serialize all subtypes of `Seq[_]`. But for `Range`, which is a `Seq[Int]` whose type carries no element type argument, `serializerFor` fails with a cryptic error:
```
scala.MatchError: scala.collection.immutable.Range.Inclusive (of class scala.reflect.internal.Types$ClassNoArgsTypeRef)
    at org.apache.spark.sql.catalyst.ScalaReflection$.org$apache$spark$sql$catalyst$ScalaReflection$$serializerFor(ScalaReflection.scala:520)
    at org.apache.spark.sql.catalyst.ScalaReflection$.serializerFor(ScalaReflection.scala:463)
    at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.apply(ExpressionEncoder.scala:71)
```
This patch fixes this by also handling `Seq` subtypes whose type has no element type argument.
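To illustrate the root cause outside Spark, here is a standalone sketch using Scala runtime reflection: `Range` is a `Seq[Int]` at runtime, but its type exposes no type arguments, so a pattern match that expects to extract an element type from the `Seq` type finds nothing.

```scala
import scala.reflect.runtime.universe._

object ElementTypeDemo extends App {
  // Seq[Int] exposes its element type as a type argument...
  println(typeOf[Seq[Int]].typeArgs)                                      // List(Int)

  // ...but Range's own type has no type arguments at all, even though
  // Range is a subtype of Seq[Int]. A match that destructures the type
  // to get the element type therefore fails for Range.
  println(typeOf[scala.collection.immutable.Range].typeArgs)              // List()
  println(typeOf[scala.collection.immutable.Range] <:< typeOf[Seq[Int]])  // true
}
```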
2. Encoder can't deserialize internal row to required type
We serialize objects through a common superclass such as `Seq[_]` into a common internal representation. But when we deserialize that internal data back into the original objects, we run into the problem of instantiating the right type.
For example, data serialized from a `Seq[_]` is deserialized into a `WrappedArray`. This works when the declared type is `Seq[_]` itself. But if we serialize a subclass of `Seq[_]` (for example `Range`) that is not assignable from `WrappedArray`, a runtime error occurs when the deserialized data is converted to the required subclass of `Seq[_]`.
Short of explicitly writing a deserialization rule for every subclass of `Seq[_]`, I think the feasible solution is to check whether the deserialized data can be converted to the required type. This patch puts that check into `ExpressionEncoder.fromRow`. When the requirement is not met, we show a reasonable message to users.
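The mismatch, and the kind of check proposed, can be sketched in plain Scala. Note that `checkConvertible` below is a hypothetical illustration of the idea, not the actual code added to `ExpressionEncoder.fromRow`:

```scala
object ConvertibilityCheck extends App {
  // The generic Seq[_] deserializer materializes a wrapped array
  // (WrappedArray on the Scala 2.11/2.12 Spark used at the time).
  val deserialized: Seq[Int] = Array(1, 2, 3).toSeq

  // That is fine when the required type is Seq[_] itself...
  println(classOf[Seq[_]].isAssignableFrom(deserialized.getClass))  // true

  // ...but not when the required type is a subclass like Range,
  // which the wrapped array can never be cast to.
  println(classOf[Range].isAssignableFrom(deserialized.getClass))   // false

  // Hypothetical sketch of the kind of check fromRow could perform,
  // replacing an opaque ClassCastException with a readable message.
  def checkConvertible(obj: Any, required: Class[_]): Unit =
    if (!required.isAssignableFrom(obj.getClass))
      throw new RuntimeException(
        s"Cannot convert deserialized ${obj.getClass.getName} " +
          s"to required type ${required.getName}")

  checkConvertible(deserialized, classOf[Seq[_]])  // passes silently
}
```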
## How was this patch tested?
Jenkins tests.
Please review http://spark.apache.org/contributing.html before opening a
pull request.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/viirya/spark-1 encoder-range
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/16546.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #16546
commit 190fb6222d84991000f91735952579b1e0686a61
Author: Liang-Chi Hsieh
Date: 2017-01-11T03:38:44Z
Put check in ExpressionEncoder.fromRow to ensure we can convert
deserialized object to required type.