Re: Cache not getting cleaned.

2020-11-21 Thread Amit Sharma
Please find attached a screenshot showing no active tasks, yet memory is
still used.

[image: image.png]

On Sat, Nov 21, 2020 at 4:25 PM Amit Sharma  wrote:

> I am using df.cache and also unpersisting it, but when I check the Spark
> UI storage tab I still see cached memory usage. Do I need to do anything
> else?
>
> Also, in the executors tab of the Spark UI, memory used/total memory
> always shows some used memory for each executor. If there are no requests
> on the streaming job, shouldn't the usage be 0?
>
> Thanks
> Amit
>
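A minimal sketch of the blocking variant (unpersist on a DataFrame is
asynchronous by default, so the Storage tab can lag until the blocks are
actually freed):

>>> df = spark.range(100).cache()
>>> df.count()                    # an action materializes the cache
100
>>> df.unpersist(blocking=True)   # wait until the blocks are removed
DataFrame[id: bigint]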


Re: Spark SQL check timestamp with other table and update a column.

2020-11-21 Thread Khalid Mammadov

Hi,

I am not sure if you were writing pseudo-code or real code, but there were 
a few issues in the SQL: a "." instead of "," before coalesce, a missing 
JOIN keyword after "left", and b.col1 compared against itself in the join 
condition.


I have reproduced your example in the Spark REPL; everything worked as 
expected and the result is the one you need.


Please see below full code:

## Spark 3.0.0

>>> a = spark.read.csv("tab1", sep="|", header=True, schema="col1 TIMESTAMP, col2 INT, col3 STRING")

>>> a.printSchema
<bound method DataFrame.printSchema of DataFrame[col1: timestamp, col2: int, col3: string]>
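(As an aside, printSchema is referenced here without parentheses, so the
REPL only echoes the bound method; calling a.printSchema() prints the tree:)

>>> a.printSchema()
root
 |-- col1: timestamp (nullable = true)
 |-- col2: integer (nullable = true)
 |-- col3: string (nullable = true)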


>>> a.show()
+--------------------+----+----+
|                col1|col2|col3|
+--------------------+----+----+
|2020-11-17 20:50:...|   1|null|
+--------------------+----+----+

>>> b = spark.read.csv("tab2", sep="|", header=True, schema="col1 TIMESTAMP, col2 INT, col3 STRING")

>>> b.printSchema
<bound method DataFrame.printSchema of DataFrame[col1: timestamp, col2: int, col3: string]>


>>> b.show()
+--------------------+----+----+
|                col1|col2|col3|
+--------------------+----+----+
|2020-11-17 21:19:...|   1| win|
|2020-11-17 20:49:...|   1| win|
|2020-11-17 20:19:...|   1| Win|
+--------------------+----+----+

>>> a.createOrReplaceTempView("table1")
>>> b.createOrReplaceTempView("table2")

>>> res = spark.sql("""
... select a.col1,a.col2, b.col1, b.col2, coalesce(a.col3,b.col3) as col3
... from table1 a join table2 b
... on (a.col2 = b.col2) and (a.col1 < b.col1)
... """)
>>> res.show()
+--------------------+----+--------------------+----+----+
|                col1|col2|                col1|col2|col3|
+--------------------+----+--------------------+----+----+
|2020-11-17T20:50:...|   1|2020-11-17T21:19:...|   1| win|
+--------------------+----+--------------------+----+----+
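For completeness, a sketch of the same join written with the DataFrame API
instead of SQL (reusing a and b from above):

>>> from pyspark.sql import functions as F
>>> res2 = (a.alias("a")
...     .join(b.alias("b"), (F.col("a.col2") == F.col("b.col2")) &
...                         (F.col("a.col1") < F.col("b.col1")))
...     .select("a.col1", "a.col2", "b.col1", "b.col2",
...             F.coalesce("a.col3", "b.col3").alias("col3")))
>>> res2.show()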

Regards,

Khalid

On 19/11/2020 05:13, anbutech wrote:

select a.col1,a.col2.coalesce(a.col3,b.col3) as col3
from table1 a left table2 b
on (a.col2=b.col2) and (b.col1 < b.col1)


Cache not getting cleaned.

2020-11-21 Thread Amit Sharma
I am using df.cache and also unpersisting it, but when I check the Spark
UI storage tab I still see cached memory usage. Do I need to do anything
else?

Also, in the executors tab of the Spark UI, memory used/total memory
always shows some used memory for each executor. If there are no requests
on the streaming job, shouldn't the usage be 0?

Thanks
Amit
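
(If the intent is to drop everything cached in the session at once,
spark.catalog.clearCache() is a blunter alternative to per-DataFrame
unpersist; a one-line sketch:)

>>> spark.catalog.clearCache()   # drops all cached tables/DataFrames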