Re: array_contains in package org.apache.spark.sql.functions

2018-06-14 Thread 刘崇光
Hello Takuya,

Thanks for your message. I will file the JIRA and submit a PR.

Best regards,
Chongguang

On Thu, Jun 14, 2018 at 11:25 PM, Takuya UESHIN wrote:

> Hi Chongguang,
>
> Thanks for the report!
>
> That makes sense, and the proposition should work; alternatively, we could
> add an overload like `def array_contains(column: Column, value: Column)`.
> Other functions, such as `array_position` and `element_at`, may be in the
> same situation.
>
> Could you file a JIRA, and submit a PR if possible?
> We can discuss the issue further there.
>
> Btw, I guess you can use `expr("array_contains(columnA, columnB)")` as a
> workaround.
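>
> For illustration, a minimal sketch of that workaround (the DataFrame and
> column names here are hypothetical; assumes a SparkSession with
> spark.implicits._ in scope, e.g. in spark-shell):
>
>   import org.apache.spark.sql.functions.expr
>
>   val df = Seq(
>     (Seq("a", "b"), "a"),
>     (Seq("x", "y"), "z")
>   ).toDF("columnA", "columnB")
>
>   // The SQL parser resolves both arguments as expressions,
>   // so no Literal is involved:
>   df.select(expr("array_contains(columnA, columnB)")).show()
>   // true for the first row, false for the second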
>
> Thanks.
>
>
> On Thu, Jun 14, 2018 at 2:15 AM, 刘崇光 wrote:
>
>>
>> -- Forwarded message --
>> From: 刘崇光 
>> Date: Thu, Jun 14, 2018 at 11:08 AM
>> Subject: array_contains in package org.apache.spark.sql.functions
>> To: user@spark.apache.org
>>
>>
>> Hello all,
>>
>> I ran into a use case in a project with Spark SQL and want to share some
>> thoughts about the function array_contains.
>>
>> Say I have a DataFrame containing two columns: column A of type "Array of
>> String" and column B of type "String". I want to determine whether the
>> value of column B is contained in the value of column A, without using a
>> UDF of course.
>> The function array_contains naturally came to mind:
>>
>> def array_contains(column: Column, value: Any): Column = withExpr {
>>   ArrayContains(column.expr, Literal(value))
>> }
>>
>> However, the function takes column B and wraps it in a Literal, which
>> yields a runtime exception: RuntimeException("Unsupported literal type "
>> + v.getClass + " " + v).
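>>
>> For example, a minimal reproduction (hypothetical data; assumes a
>> SparkSession with spark.implicits._ in scope, e.g. in spark-shell):
>>
>>   import org.apache.spark.sql.functions.array_contains
>>
>>   val df = Seq((Seq("a", "b"), "a")).toDF("ColumnA", "ColumnB")
>>   // Compiles, but Literal(...) rejects the Column argument and throws
>>   // java.lang.RuntimeException: Unsupported literal type
>>   // class org.apache.spark.sql.Column ... as soon as it is called:
>>   df.select(array_contains(df("ColumnA"), df("ColumnB")))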
>>
>> Then, after discussing with my friends, we found a solution without using
>> a UDF:
>>
>> new Column(ArrayContains(df("ColumnA").expr, df("ColumnB").expr))
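>>
>> In context, a minimal sketch of this workaround (same hypothetical df as
>> above; note that ArrayContains comes from Spark's internal catalyst
>> package, so this relies on a non-public API):
>>
>>   import org.apache.spark.sql.Column
>>   import org.apache.spark.sql.catalyst.expressions.ArrayContains
>>
>>   // Builds the expression directly from both columns' expr trees,
>>   // yielding a boolean column: true where ColumnB occurs in ColumnA.
>>   df.select(new Column(ArrayContains(df("ColumnA").expr, df("ColumnB").expr)))
>>     .show()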
>>
>>
>> With this solution in hand, I would like to make the function a bit more
>> powerful, like this:
>>
>> def array_contains(column: Column, value: Any): Column = withExpr {
>>   value match {
>>     case c: Column => ArrayContains(column.expr, c.expr)
>>     case _         => ArrayContains(column.expr, Literal(value))
>>   }
>> }
>>
>>
>> It pattern-matches to detect whether value is a Column. If so, it uses
>> the column's .expr; otherwise it behaves as it used to.
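>>
>> With that overload in place, the original use case would read naturally
>> (proposed behavior, not what Spark does today; same hypothetical df):
>>
>>   df.select(array_contains(df("ColumnA"), df("ColumnB")))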
>>
>> Any suggestion or opinion on the proposition?
>>
>>
>> Kind regards,
>> Chongguang LIU
>>
>>
>
>
> --
> Takuya UESHIN
> Tokyo, Japan
>
> http://twitter.com/ueshin
>

