We are trying to dynamically create the query, with columns coming from different places. We can overcome this with a few more lines of code, but it would be nice for us to pass the `alias` along (given that we can do so for all the rest of the frame operations).
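In the meantime, the dynamic case above can be handled by carrying the intended name alongside the unaliased `Column` and applying `.as` only inside `select`, where aliasing is known to work. A minimal sketch (the column names and the second column are made up for illustration, assuming a `df` with columns "a" and "b"):

```scala
import org.apache.spark.sql.{Column, functions}

// Collect (name, column) pairs from wherever the pieces are built;
// no .as(...) is applied yet, so nothing tries to compute dataType.
val namedCols: Seq[(String, Column)] = Seq(
  "arrayCol" -> functions.array(df("a"), df("b")),
  "sumCol"   -> (df("a") + df("b"))  // hypothetical second column
)

// Apply the aliases only at select time, which works in 1.6.0:
val result = df.select(namedCols.map { case (name, c) => c.as(name) }: _*)
```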
Created JIRA here: https://issues.apache.org/jira/browse/SPARK-13253

Thanks for the help.

On Tue, Feb 9, 2016 at 5:29 PM Ted Yu <yuzhih...@gmail.com> wrote:

> What's your plan of using the arrayCol ?
> It would be part of some query, right ?
>
> On Tue, Feb 9, 2016 at 2:27 PM, Rakesh Chalasani <vnit.rak...@gmail.com> wrote:
>
>> Do you mean using "alias" instead of "as"? Unfortunately, that didn't help
>>
>> > val arrayCol = functions.array(df("a"), df("b")).alias("arrayCol")
>>
>> still throws the error.
>>
>> Surprisingly, doing the same thing inside a select works,
>>
>> > df.select(functions.array(df("a"), df("b")).as("arrayCol")).show()
>>
>> +--------+
>> |arrayCol|
>> +--------+
>> |  [0, 1]|
>> |  [1, 2]|
>> |  [2, 3]|
>> |  [3, 4]|
>> |  [4, 5]|
>> |  [5, 6]|
>> |  [6, 7]|
>> |  [7, 8]|
>> |  [8, 9]|
>> | [9, 10]|
>> +--------+
>>
>> On Tue, Feb 9, 2016 at 4:52 PM Ted Yu <yuzhih...@gmail.com> wrote:
>>
>>> How about changing the last line to:
>>>
>>> scala> val df2 = df.select(functions.array(df("a"), df("b")).alias("arrayCol"))
>>> df2: org.apache.spark.sql.DataFrame = [arrayCol: array<int>]
>>>
>>> scala> df2.show()
>>> +--------+
>>> |arrayCol|
>>> +--------+
>>> |  [0, 1]|
>>> |  [1, 2]|
>>> |  [2, 3]|
>>> |  [3, 4]|
>>> |  [4, 5]|
>>> |  [5, 6]|
>>> |  [6, 7]|
>>> |  [7, 8]|
>>> |  [8, 9]|
>>> | [9, 10]|
>>> +--------+
>>>
>>> FYI
>>>
>>> On Tue, Feb 9, 2016 at 1:38 PM, Rakesh Chalasani <vnit.rak...@gmail.com> wrote:
>>>
>>>> Sorry, didn't realize the mail didn't show the code. Using Spark
>>>> release 1.6.0
>>>>
>>>> Below is an example to reproduce it.
>>>>
>>>> import org.apache.spark.sql.SQLContext
>>>> val sqlContext = new SQLContext(sparkContext)
>>>> import sqlContext.implicits._
>>>> import org.apache.spark.sql.functions
>>>>
>>>> case class Test(a: Int, b: Int)
>>>> val data = sparkContext.parallelize(Array.range(0, 10).map(x => Test(x, x + 1)))
>>>> val df = data.toDF()
>>>> val arrayCol = functions.array(df("a"), df("b")).as("arrayCol")
>>>>
>>>> this throws the following exception:
>>>>
>>>> java.lang.UnsupportedOperationException
>>>>     at org.apache.spark.sql.catalyst.expressions.PrettyAttribute.nullable(namedExpressions.scala:289)
>>>>     at org.apache.spark.sql.catalyst.expressions.CreateArray$$anonfun$dataType$3.apply(complexTypeCreator.scala:40)
>>>>     at org.apache.spark.sql.catalyst.expressions.CreateArray$$anonfun$dataType$3.apply(complexTypeCreator.scala:40)
>>>>     at scala.collection.IndexedSeqOptimized$$anonfun$exists$1.apply(IndexedSeqOptimized.scala:40)
>>>>     at scala.collection.IndexedSeqOptimized$$anonfun$exists$1.apply(IndexedSeqOptimized.scala:40)
>>>>     at scala.collection.IndexedSeqOptimized$class.segmentLength(IndexedSeqOptimized.scala:189)
>>>>     at scala.collection.mutable.ArrayBuffer.segmentLength(ArrayBuffer.scala:47)
>>>>     at scala.collection.GenSeqLike$class.prefixLength(GenSeqLike.scala:92)
>>>>     at scala.collection.AbstractSeq.prefixLength(Seq.scala:40)
>>>>     at scala.collection.IndexedSeqOptimized$class.exists(IndexedSeqOptimized.scala:40)
>>>>     at scala.collection.mutable.ArrayBuffer.exists(ArrayBuffer.scala:47)
>>>>     at org.apache.spark.sql.catalyst.expressions.CreateArray.dataType(complexTypeCreator.scala:40)
>>>>     at org.apache.spark.sql.catalyst.expressions.Alias.dataType(namedExpressions.scala:136)
>>>>     at org.apache.spark.sql.catalyst.expressions.NamedExpression$class.typeSuffix(namedExpressions.scala:84)
>>>>     at org.apache.spark.sql.catalyst.expressions.Alias.typeSuffix(namedExpressions.scala:120)
>>>>     at org.apache.spark.sql.catalyst.expressions.Alias.toString(namedExpressions.scala:155)
>>>>     at org.apache.spark.sql.catalyst.expressions.Expression.prettyString(Expression.scala:207)
>>>>     at org.apache.spark.sql.Column.toString(Column.scala:138)
>>>>     at java.lang.String.valueOf(String.java:2994)
>>>>     at scala.runtime.ScalaRunTime$.stringOf(ScalaRunTime.scala:331)
>>>>     at scala.runtime.ScalaRunTime$.replStringOf(ScalaRunTime.scala:337)
>>>>     at .<init>(<console>:20)
>>>>     at .<clinit>(<console>)
>>>>     at $print(<console>)
>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>     at java.lang.reflect.Method.invoke(Method.java:497)
>>>>     at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
>>>>     at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
>>>>     at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
>>>>     at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
>>>>     at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
>>>>
>>>> On Tue, Feb 9, 2016 at 4:23 PM Ted Yu <yuzhih...@gmail.com> wrote:
>>>>
>>>>> Do you mind pastebin'ning code snippet and exception one more time - I
>>>>> couldn't see them in your original email.
>>>>>
>>>>> Which Spark release are you using ?
>>>>>
>>>>> On Tue, Feb 9, 2016 at 11:55 AM, rakeshchalasani <vnit.rak...@gmail.com> wrote:
>>>>>
>>>>>> Hi All:
>>>>>>
>>>>>> I am getting an "UnsupportedOperationException" when trying to alias an
>>>>>> array column. The issue seems to be at the "CreateArray" expression ->
>>>>>> dataType, which checks for nullability of its children, while aliasing is
>>>>>> creating a PrettyAttribute that does not implement nullability.
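
Reading the stack trace above bottom-up supports this analysis: the failure originates in `ScalaRunTime.replStringOf` -> `Column.toString` -> `Alias.dataType` -> `CreateArray.dataType`, i.e. it is the REPL pretty-printing the returned `Column` that asks each child of `CreateArray` for `nullable`, and the `PrettyAttribute` created for display purposes throws there. A sketch of the kind of change this suggests on the Catalyst side (hypothetical, abridged to the relevant member only; the actual patch for SPARK-13253 may well differ):

```scala
// namedExpressions.scala (sketch): PrettyAttribute exists only for
// pretty-printing, so answering nullable conservatively instead of
// throwing would let CreateArray.dataType succeed during toString.
case class PrettyAttribute(name: String) extends Attribute {
  // ... other members unchanged ...
  override def nullable: Boolean = true  // was: throw new UnsupportedOperationException
}
```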
>>>>>>
>>>>>> Below is an example to reproduce it.
>>>>>>
>>>>>> this throws the following exception:
>>>>>>
>>>>>> --
>>>>>> View this message in context:
>>>>>> http://apache-spark-developers-list.1001551.n3.nabble.com/Error-aliasing-an-array-column-tp16288.html
>>>>>> Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>>>>>> For additional commands, e-mail: dev-h...@spark.apache.org