Re: Re: the issue about the + in column,can we support the string please?

2018-04-01 Thread 1427357...@qq.com
Hi,
I checked the code.
It seems hard to change.
In the current code, string + int is translated to double + double.
If I change string + int to string + string, it will be incompatible with
older versions.

Does anyone have a better idea for this issue?
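
A minimal sketch of how the coercion described above can be inspected, assuming a Spark 2.x build and a local SparkSession (the session setup, the sample data, and the column name plusOne are illustrative assumptions, not from the original report). The analyzed plan printed by explain(true) should show both operands of + cast to double, which is why a non-numeric string ends up as null while a numeric string is added as a number. Newer Spark versions, or ANSI mode, may behave differently.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Assumption: a local session for illustration; in spark-shell, reuse the existing `spark`.
val spark = SparkSession.builder().master("local[1]").appName("plus-coercion").getOrCreate()
import spark.implicits._

// Hypothetical data: one non-numeric string and one numeric string.
val df = Seq("mycat", "10").toDF("name")

// `+` on a string column is resolved as numeric addition, not concatenation.
val withPlus = df.withColumn("plusOne", col("name") + 1)

// The analyzed plan is where the casts inserted by type coercion become visible.
withPlus.explain(true)
withPlus.show()
```

Existing queries may rely on the numeric-string case, which is the compatibility concern with changing what + means for strings.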



1427357...@qq.com
 
From: Shmuel Blitz
Date: 2018-03-26 17:17
To: 1427357...@qq.com
CC: spark users; dev
Subject: Re: Re: the issue about the + in column,can we support the string 
please?
I agree.

Just pointed out the option, in case you missed it.

Cheers,
Shmuel

On Mon, Mar 26, 2018 at 10:57 AM, 1427357...@qq.com <1427357...@qq.com> wrote:
Hi,

Using concat is one way to do it.
But + is more intuitive and easier to understand.



1427357...@qq.com
 
From: Shmuel Blitz
Date: 2018-03-26 15:31
To: 1427357...@qq.com
CC: spark users; dev
Subject: Re: the issue about the + in column,can we support the string please?
Hi,

You can get the same result with:

import org.apache.spark.sql.Row
import org.apache.spark.sql.functions._
import sqlContext.implicits._
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Explicit schema: one string column and one integer column
val schema = StructType(Array(
  StructField("name", StringType),
  StructField("age", IntegerType)))

val lst = List(Row("Shmuel", 13), Row("Blitz", 23))
val rdd = sc.parallelize(lst)

val df = sqlContext.createDataFrame(rdd, schema)

// concat appends the literal "abc" to the string column
df.withColumn("newName", concat($"name", lit("abc"))).show()
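
If the operator form matters more than the function call, one user-side option that leaves Spark itself unchanged is a small extension method that delegates to concat. This is only a sketch: the object name StringColumnOps, the class ConcatOps, and the +++ operator are hypothetical names chosen for illustration, not part of Spark, and the local session setup is an assumption.

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.{concat, lit}

// Hypothetical helper, not part of Spark: adds a `+++` operator that delegates to concat.
object StringColumnOps {
  implicit class ConcatOps(val left: Column) {
    def +++(right: String): Column = concat(left, lit(right))
    def +++(right: Column): Column = concat(left, right)
  }
}

import StringColumnOps._

// Assumption: a local session for illustration; in spark-shell, reuse the existing `spark`.
val spark = SparkSession.builder().master("local[1]").appName("concat-op").getOrCreate()
import spark.implicits._

// Sample rows mirroring the table from the original question.
val df = Seq((1, "leader us", 1), (3, "mycat", 1)).toDF("id", "name", "sharding_id")

// Reads like an operator, but is plain concat underneath.
val renamed = df.withColumn("newName", $"name" +++ "abc")
renamed.show()
```

Using a distinct operator rather than overloading + keeps the existing numeric meaning of + intact, so queries that depend on it are unaffected.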

On Mon, Mar 26, 2018 at 6:36 AM, 1427357...@qq.com <1427357...@qq.com> wrote:
Hi all,

I have a table like below:

+---+---------+-----------+
| id|     name|sharding_id|
+---+---------+-----------+
|  1|leader us|          1|
|  3|    mycat|          1|
+---+---------+-----------+

My schema is:
root
 |-- id: integer (nullable = false)
 |-- name: string (nullable = true)
 |-- sharding_id: integer (nullable = false)

I want to add a new column named newName. The new column is based on "name",
with "abc" appended to it. My code looks like:

stud_scoreDF.withColumn("newName", stud_scoreDF.col("name") + "abc").show()

When I run the code, I get this result:
+---+---------+-----------+-------+
| id|     name|sharding_id|newName|
+---+---------+-----------+-------+
|  1|leader us|          1|   null|
|  3|    mycat|          1|   null|
+---+---------+-----------+-------+


I checked the code; the key part is in arithmetic.scala, line 165.
It looks like:

override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = dataType match {
  case dt: DecimalType =>
    defineCodeGen(ctx, ev, (eval1, eval2) => s"$eval1.$$plus($eval2)")
  case ByteType | ShortType =>
    defineCodeGen(ctx, ev,
      (eval1, eval2) => s"(${ctx.javaType(dataType)})($eval1 $symbol $eval2)")
  case CalendarIntervalType =>
    defineCodeGen(ctx, ev, (eval1, eval2) => s"$eval1.add($eval2)")
  case _ =>
    defineCodeGen(ctx, ev, (eval1, eval2) => s"$eval1 $symbol $eval2")
}

My question is:
Can we add a case for StringType in this class to support string concatenation?





1427357...@qq.com



-- 
Shmuel Blitz 
Big Data Developer 
Email: shmuel.bl...@similarweb.com 
www.similarweb.com 



Re: Re: the issue about the + in column,can we support the string please?

2018-03-26 Thread Shmuel Blitz
I agree.

Just pointed out the option, in case you missed it.

Cheers,
Shmuel



-- 
Shmuel Blitz
Big Data Developer
Email: shmuel.bl...@similarweb.com
www.similarweb.com


Re: Re: the issue about the + in column,can we support the string please?

2018-03-26 Thread 1427357...@qq.com
Hi,

Using concat is one way to do it.
But + is more intuitive and easier to understand.



1427357...@qq.com
 