Re: Spark 2.2 With Column usage

2019-06-11 Thread Jacek Laskowski
Hi,

Why are you doing the following two lines?

.select("id",lit(referenceFiltered))
.selectexpr(
"id"
)

What are you trying to achieve? What's lit and what's referenceFiltered?
What's the difference between select and selectexpr? Please start at
http://spark.apache.org/docs/latest/sql-programming-guide.html and then hop onto
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.package
to get to know the Spark API better. I'm sure you'll quickly find the answer(s).
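
For what it's worth, a minimal sketch of the select vs selectExpr difference. This uses a throwaway local session and made-up data, not your datasets:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[1]").appName("select-vs-selectExpr").getOrCreate()
import spark.implicits._

val df = Seq((1L, "a"), (2L, "b")).toDF("id", "name")

// select projects existing columns by name (or as Column objects)
df.select("id").show()

// selectExpr parses full SQL expressions, so it can also compute, cast and alias
df.selectExpr("id", "id * 10 AS id_x10").show()
```

Note the method is selectExpr (capital E); there is no selectexpr in the Dataset API.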

Pozdrawiam,
Jacek Laskowski

https://about.me/JacekLaskowski
The Internals of Spark SQL https://bit.ly/spark-sql-internals
The Internals of Spark Structured Streaming
https://bit.ly/spark-structured-streaming
The Internals of Apache Kafka https://bit.ly/apache-kafka-internals
Follow me at https://twitter.com/jaceklaskowski



On Sat, Jun 8, 2019 at 12:53 PM anbutech  wrote:

> Thanks Jacek Laskowski Sir, but I didn't get the point here.
>
> Please advise: is the below what you are expecting?
>
> dataset1.as("t1")
>
> .join(dataset3.as("t2"),
>
> col("t1.col1") === col("t2.col1"), JOINTYPE.Inner)
>
> .join(dataset4.as("t3"), col("t3.col1") === col("t1.col1"),
>
> JOINTYPE.Inner)
> .select("id",lit(referenceFiltered))
> .selectexpr(
> "id"
> )
>
>
>
> --
> Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
>
> -
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
>


Re: Spark 2.2 With Column usage

2019-06-08 Thread anbutech
Thanks Jacek Laskowski Sir, but I didn't get the point here.

Please advise: is the below what you are expecting?

dataset1.as("t1")

.join(dataset3.as("t2"),

col("t1.col1") === col("t2.col1"), JOINTYPE.Inner)

.join(dataset4.as("t3"), col("t3.col1") === col("t1.col1"),

JOINTYPE.Inner)
.select("id", lit(referenceFiltered))
.selectexpr(
"id"
)






Re: Spark 2.2 With Column usage

2019-06-08 Thread Jacek Laskowski
Hi,

> val referenceFiltered = dataset2.filter(_.dataDate ==
> date).filter(_.someColumn).select("id").toString
> .withColumn("new_column",lit(referenceFiltered))

That won't work since lit is a function (adapter) to convert Scala values
to Catalyst expressions.

Unless I'm mistaken, in your case, what you really need is to replace
`withColumn` with `select("id")` itself and you're done.

While writing this I realized I'm only repeating what you already have,
which leaves me confused about what you're actually trying to achieve.
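
If referenceFiltered is meant to be a single scalar pulled out of dataset2, the usual pattern is to collect that one value to the driver first and only then hand the plain Scala value to lit. A rough sketch with made-up data (the column names and values are placeholders, not your schema):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.lit

val spark = SparkSession.builder().master("local[1]").appName("lit-demo").getOrCreate()
import spark.implicits._

// stand-in for dataset2
val dataset2 = Seq((124567L, "2019-06-08"), (999L, "2019-01-01")).toDF("id", "dataDate")

// collect the single id as a plain Scala value on the driver...
val referenceFiltered: Long = dataset2
  .filter($"dataDate" === "2019-06-08")
  .select("id")
  .as[Long]
  .head()

// ...and only then can lit() wrap it as a literal repeated on every row
val out = Seq((1L, "x"), (2L, "y")).toDF("rowId", "payload")
  .withColumn("new_column", lit(referenceFiltered))
out.show()
```

Calling .toString on a Dataset (as in your snippet) only gives you the schema string like "[id: bigint]", never the data, which is why you saw "id: BigInt" in the output.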

Pozdrawiam,
Jacek Laskowski

https://about.me/JacekLaskowski
The Internals of Spark SQL https://bit.ly/spark-sql-internals
The Internals of Spark Structured Streaming
https://bit.ly/spark-structured-streaming
The Internals of Apache Kafka https://bit.ly/apache-kafka-internals
Follow me at https://twitter.com/jaceklaskowski



On Sat, Jun 8, 2019 at 6:05 AM anbutech  wrote:

> Hi Sir,
>
> Could you please advise how to fix the below issue with withColumn in
> Spark 2.2 / Scala 2.11 joins?
>
> def processing(spark:SparkSession,
>
> dataset1:Dataset[Reference],
>
> dataset2:Dataset[DataCore],
>
> dataset3:Dataset[ThirdPartyData] ,
>
> dataset4:Dataset[OtherData],
>
> date:String):Dataset[DataMerge] = {
>
> val referenceFiltered = dataset2.filter(_.dataDate ==
> date).filter(_.someColumn).select("id").toString
>
> dataset1.as("t1")
>
> .join(dataset3.as("t2"),
>
> col("t1.col1") === col("t2.col1"), JOINTYPE.Inner)
>
> .join(dataset4.as("t3"), col("t3.col1") === col("t1.col1"),
>
> JOINTYPE.Inner)
>
> .withColumn("new_column",lit(referenceFiltered))
>
> .selectexpr(
>
> "id", ---> want to get this value
>
> "column1,
>
> "column2,
>
> "column3",
>
> "column4" )
>
> }
>
> How do I get the String value, let's say the value "124567"
> ("referenceFiltered"), inside the withColumn?
>
> I'm getting the withColumn output as "id: BigInt". I want to get the same
> value for all the records.
>
> Note:
>
> I have been asked not to use a cross join in the code. Is there any other
> way to fix this issue?
>
>
>
>
>


Spark 2.2 With Column usage

2019-06-07 Thread anbutech
Hi Sir,

Could you please advise how to fix the below issue with withColumn in
Spark 2.2 / Scala 2.11 joins?

def processing(spark:SparkSession,

dataset1:Dataset[Reference],

dataset2:Dataset[DataCore],

dataset3:Dataset[ThirdPartyData] ,

dataset4:Dataset[OtherData],

date:String):Dataset[DataMerge] = {

val referenceFiltered = dataset2.filter(_.dataDate ==
date).filter(_.someColumn).select("id").toString

dataset1.as("t1")

.join(dataset3.as("t2"),

col("t1.col1") === col("t2.col1"), JOINTYPE.Inner)

.join(dataset4.as("t3"), col("t3.col1") === col("t1.col1"),

JOINTYPE.Inner)

.withColumn("new_column",lit(referenceFiltered))

.selectexpr(

"id", ---> want to get this value

"column1,

"column2,

"column3",

"column4" )

}

How do I get the String value, let's say the value "124567"
("referenceFiltered"), inside the withColumn?

I'm getting the withColumn output as "id: BigInt". I want to get the same
value for all the records.

Note:

I have been asked not to use a cross join in the code. Is there any other
way to fix this issue?


