Re: Re: A question about broadcast nest loop join

2019-10-23 Thread Wenchen Fan
Ah sorry, I made a mistake. "Spark can only pick BroadcastNestedLoopJoin to implement left/right join" should be "left/right non-equal join". On Thu, Oct 24, 2019 at 6:32 AM zhangliyun wrote: > > Hi Herman: > I guess what you mentioned before > ``` > if you are OK with slightly different

Re:Re: A question about broadcast nest loop join

2019-10-23 Thread zhangliyun
Hi Herman: I guess what you mentioned before ``` if you are OK with slightly different NULL semantics then you could use NOT EXISTS(subquery). The latter should perform a lot better. ``` means that a NULL key1 of the left table will be retained if NULL key2 is not found in the right table ( join

Re:Re: A question about broadcast nest loop join

2019-10-23 Thread zhangliyun
Hi all: From Google, I know that Spark can only pick BroadcastNestedLoopJoin to implement left/right join. But why, in the following case, does BroadcastNestedLoopJoin become SortMergeJoin when I set spark.sql.autoBroadcastJoinThreshold=-1; {code} set

Delete checkpointed data for a single dataset?

2019-10-23 Thread Isabelle Phan
Hello, In a non-streaming application, I am using the checkpoint feature to truncate the lineage of complex datasets. At the end of the job, the checkpointed data, which is stored in HDFS, is deleted. I am looking for a way to delete the unused checkpointed data earlier than the end of the job.

RE: [External Sender] Spark Executor pod not getting created on kubernetes cluster

2019-10-23 Thread manishgupta88
Thanks Abhisehk, I was able to resolve the issue. I was building an assembly jar that contained some unwanted Spring and Netty classes, which is why I was getting that exception. Regards, Manish Gupta

Spark executor pods not getting killed after task completion

2019-10-23 Thread manish gupta
Hi, I am trying to run spark-submit on Kubernetes. I get the desired results, in that the driver and executors are launched as per the given configuration and my job runs successfully. *But even after job completion the Spark driver pod is always in Running state and

Re: A question about broadcast nest loop join

2019-10-23 Thread Wenchen Fan
I haven't looked into your query yet, just want to let you know that: Spark can only pick BroadcastNestedLoopJoin to implement left/right join. If the table is very big, then OOM happens. Maybe there is an algorithm to implement left/right join in a distributed environment without broadcast, but

Re: driver crashes - need to find out why driver keeps crashing

2019-10-23 Thread Akshay Bhardwaj
Hi, Were you able to check the executor logs for this? If executors are running in separate JVMs/machines, they will have separate log files from the driver. If the OOME is due to concatenation of the large string, it may be reported in the executor logs first. How are you running this spark

A question about broadcast nest loop join

2019-10-23 Thread zhangliyun
Hi all: I want to ask a question about broadcast nested loop join. From Google, I know that left outer/semi join and right outer/semi join will use broadcast nested loop join, and that in some cases, when the input data is very small, it is suitable to use. So here how to define the input data very