We are trying to dynamically create the query, with columns coming from different places. We can overcome this with a few more lines of code, but it would be nice for us to pass the `alias` along (given that we can do so for all the rest of the frame operations).
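In the meantime, the dynamic case above can be handled by carrying the intended name alongside the unaliased `Column` and applying `.as` only inside `select`, where aliasing is known to work. A minimal sketch (the column names and the second column are made up for illustration, assuming a `df` with columns "a" and "b"):

```scala
import org.apache.spark.sql.{Column, functions}

// Collect (name, column) pairs from wherever the pieces are built;
// no .as(...) is applied yet, so nothing tries to compute dataType.
val namedCols: Seq[(String, Column)] = Seq(
  "arrayCol" -> functions.array(df("a"), df("b")),
  "sumCol"   -> (df("a") + df("b"))  // hypothetical second column
)

// Apply the aliases only at select time, which works in 1.6.0:
val result = df.select(namedCols.map { case (name, c) => c.as(name) }: _*)
```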
Created JIRA here: https://issues.apache.org/jira/browse/SPARK-13253

Thanks for the help.

On Tue, Feb 9, 2016 at 5:29 PM Ted Yu <yuzhih...@gmail.com> wrote:

> What's your plan of using the arrayCol ?
> It would be part of some query, right ?
>
> On Tue, Feb 9, 2016 at 2:27 PM, Rakesh Chalasani <vnit.rak...@gmail.com> wrote:
>
>> Do you mean using "alias" instead of "as"? Unfortunately, that didn't help
>>
>> > val arrayCol = functions.array(df("a"), df("b")).alias("arrayCol")
>>
>> still throws the error.
>>
>> Surprisingly, doing the same thing inside a select works,
>>
>> > df.select(functions.array(df("a"), df("b")).as("arrayCol")).show()
>>
>> +--------+
>> |arrayCol|
>> +--------+
>> |  [0, 1]|
>> |  [1, 2]|
>> |  [2, 3]|
>> |  [3, 4]|
>> |  [4, 5]|
>> |  [5, 6]|
>> |  [6, 7]|
>> |  [7, 8]|
>> |  [8, 9]|
>> | [9, 10]|
>> +--------+
>>
>> On Tue, Feb 9, 2016 at 4:52 PM Ted Yu <yuzhih...@gmail.com> wrote:
>>
>>> How about changing the last line to:
>>>
>>> scala> val df2 = df.select(functions.array(df("a"), df("b")).alias("arrayCol"))
>>> df2: org.apache.spark.sql.DataFrame = [arrayCol: array<int>]
>>>
>>> scala> df2.show()
>>> +--------+
>>> |arrayCol|
>>> +--------+
>>> |  [0, 1]|
>>> |  [1, 2]|
>>> |  [2, 3]|
>>> |  [3, 4]|
>>> |  [4, 5]|
>>> |  [5, 6]|
>>> |  [6, 7]|
>>> |  [7, 8]|
>>> |  [8, 9]|
>>> | [9, 10]|
>>> +--------+
>>>
>>> FYI
>>>
>>> On Tue, Feb 9, 2016 at 1:38 PM, Rakesh Chalasani <vnit.rak...@gmail.com> wrote:
>>>
>>>> Sorry, didn't realize the mail didn't show the code. Using Spark
>>>> release 1.6.0
>>>>
>>>> Below is an example to reproduce it.
>>>>
>>>> import org.apache.spark.sql.SQLContext
>>>> val sqlContext = new SQLContext(sparkContext)
>>>> import sqlContext.implicits._
>>>> import org.apache.spark.sql.functions
>>>>
>>>> case class Test(a: Int, b: Int)
>>>> val data = sparkContext.parallelize(Array.range(0, 10).map(x => Test(x, x + 1)))
>>>> val df = data.toDF()
>>>> val arrayCol = functions.array(df("a"), df("b")).as("arrayCol")
>>>>
>>>> this throws the following exception:
>>>>
>>>> java.lang.UnsupportedOperationException
>>>>     at org.apache.spark.sql.catalyst.expressions.PrettyAttribute.nullable(namedExpressions.scala:289)
>>>>     at org.apache.spark.sql.catalyst.expressions.CreateArray$$anonfun$dataType$3.apply(complexTypeCreator.scala:40)
>>>>     at org.apache.spark.sql.catalyst.expressions.CreateArray$$anonfun$dataType$3.apply(complexTypeCreator.scala:40)
>>>>     at scala.collection.IndexedSeqOptimized$$anonfun$exists$1.apply(IndexedSeqOptimized.scala:40)
>>>>     at scala.collection.IndexedSeqOptimized$$anonfun$exists$1.apply(IndexedSeqOptimized.scala:40)
>>>>     at scala.collection.IndexedSeqOptimized$class.segmentLength(IndexedSeqOptimized.scala:189)
>>>>     at scala.collection.mutable.ArrayBuffer.segmentLength(ArrayBuffer.scala:47)
>>>>     at scala.collection.GenSeqLike$class.prefixLength(GenSeqLike.scala:92)
>>>>     at scala.collection.AbstractSeq.prefixLength(Seq.scala:40)
>>>>     at scala.collection.IndexedSeqOptimized$class.exists(IndexedSeqOptimized.scala:40)
>>>>     at scala.collection.mutable.ArrayBuffer.exists(ArrayBuffer.scala:47)
>>>>     at org.apache.spark.sql.catalyst.expressions.CreateArray.dataType(complexTypeCreator.scala:40)
>>>>     at org.apache.spark.sql.catalyst.expressions.Alias.dataType(namedExpressions.scala:136)
>>>>     at org.apache.spark.sql.catalyst.expressions.NamedExpression$class.typeSuffix(namedExpressions.scala:84)
>>>>     at org.apache.spark.sql.catalyst.expressions.Alias.typeSuffix(namedExpressions.scala:120)
>>>>     at org.apache.spark.sql.catalyst.expressions.Alias.toString(namedExpressions.scala:155)
>>>>     at org.apache.spark.sql.catalyst.expressions.Expression.prettyString(Expression.scala:207)
>>>>     at org.apache.spark.sql.Column.toString(Column.scala:138)
>>>>     at java.lang.String.valueOf(String.java:2994)
>>>>     at scala.runtime.ScalaRunTime$.stringOf(ScalaRunTime.scala:331)
>>>>     at scala.runtime.ScalaRunTime$.replStringOf(ScalaRunTime.scala:337)
>>>>     at .<init>(<console>:20)
>>>>     at .<clinit>(<console>)
>>>>     at $print(<console>)
>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>     at java.lang.reflect.Method.invoke(Method.java:497)
>>>>     at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
>>>>     at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
>>>>     at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
>>>>     at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
>>>>     at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
>>>>
>>>> On Tue, Feb 9, 2016 at 4:23 PM Ted Yu <yuzhih...@gmail.com> wrote:
>>>>
>>>>> Do you mind pastebin'ning code snippet and exception one more time - I
>>>>> couldn't see them in your original email.
>>>>>
>>>>> Which Spark release are you using ?
>>>>>
>>>>> On Tue, Feb 9, 2016 at 11:55 AM, rakeshchalasani <vnit.rak...@gmail.com> wrote:
>>>>>
>>>>>> Hi All:
>>>>>>
>>>>>> I am getting an "UnsupportedOperationException" when trying to alias an
>>>>>> array column. The issue seems to be at the "CreateArray" expression ->
>>>>>> dataType, which checks for nullability of its children, while aliasing is
>>>>>> creating a PrettyAttribute that does not implement nullability.
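
Reading the stack trace above bottom-up supports this analysis: the failure originates in `ScalaRunTime.replStringOf` -> `Column.toString` -> `Alias.dataType` -> `CreateArray.dataType`, i.e. it is the REPL pretty-printing the returned `Column` that asks each child of `CreateArray` for `nullable`, and the `PrettyAttribute` created for display purposes throws there. A sketch of the kind of change this suggests on the Catalyst side (hypothetical, abridged to the relevant member only; the actual patch for SPARK-13253 may well differ):

```scala
// namedExpressions.scala (sketch): PrettyAttribute exists only for
// pretty-printing, so answering nullable conservatively instead of
// throwing would let CreateArray.dataType succeed during toString.
case class PrettyAttribute(name: String) extends Attribute {
  // ... other members unchanged ...
  override def nullable: Boolean = true  // was: throw new UnsupportedOperationException
}
```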
>>>>>>
>>>>>> Below is an example to reproduce it.
>>>>>>
>>>>>> this throws the following exception:
>>>>>>
>>>>>> --
>>>>>> View this message in context:
>>>>>> http://apache-spark-developers-list.1001551.n3.nabble.com/Error-aliasing-an-array-column-tp16288.html
>>>>>> Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>>>>>> For additional commands, e-mail: dev-h...@spark.apache.org