What is DataFilters and while joining why is the filter isnotnull[joinKey] applied twice

2023-01-31 Thread Nitin Siwach
PySpark version: 3.1.3 *Question 1:* What is DataFilters in the Spark physical plan? How is it different from PushedFilters? *Question 2:* When joining two datasets, why is the filter isnotnull applied twice on the joining key column? In the physical plan, it is once applied as a PushedFilter and then

Fwd: [Spark Standalone Mode] How to read from kerberised HDFS in spark standalone mode

2023-01-31 Thread Wei Yan
Glad to hear that! I hope it helps anyone else facing the same problem. -- Forwarded message - From: Bansal, Jaimita Date: Wed, Feb 1, 2023, 03:15 Subject: RE: [Spark Standalone Mode] How to read from kerberised HDFS in spark standalone mode To: Wei Yan Cc: Chittajallu, Rajiv ,

[Spark/deeplyR] how come spark is caching tables read through jdbc connection from oracle, even when memory=false is chosen

2023-01-31 Thread Joris Billen
This question is related to using Spark and deeplyR. We load a lot of data from Oracle into dataframes through a JDBC connection: dfX <- spark_read_jdbc(spConn, "myconnection", options = list( url = urlDEVdb, driver = "oracle.jdbc.OracleDriver",