Through the DataFrame API, users should never see UTF8String.

Expression (and any class in the catalyst package) is considered internal
and so uses the internal representation of various types; which type we
use there is not stable across releases.

Is there a reason you aren't defining a UDF instead?
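If it helps, here's a minimal sketch in Scala of what that could look
like (the DataFrame `df` and its "name" column are hypothetical):

    import org.apache.spark.sql.functions.{udf, col}

    // Inside a UDF the function body sees java.lang.String, never
    // UTF8String; Spark converts to and from its internal
    // representation at the boundary of the public API.
    val toUpper = udf((s: String) => if (s == null) null else s.toUpperCase)

    // Hypothetical DataFrame `df` with a string column named "name".
    val result = df.withColumn("upper_name", toUpper(col("name")))

Because this stays on the stable public API, internal changes like the
1.4 switch to UTF8String shouldn't affect your code.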

On Thu, Jun 11, 2015 at 8:08 PM, zsampson <zsamp...@palantir.com> wrote:

> I'm hoping for some clarity about when to expect String vs UTF8String when
> using the Java DataFrames API.
>
> In upgrading to Spark 1.4, I'm dealing with a lot of errors where what was
> once a String is now a UTF8String. The comments in the file and the related
> commit message indicate that maybe it should be internal to Spark SQL's
> implementation.
>
> However, when I add a column containing a custom subclass of Expression,
> the row passed to the eval method contains instances of UTF8String. Ditto
> for AggregateFunction.update. Is this expected? If so, how do I know, in
> general, when I need to handle UTF8String objects?
