Hello Sedona community,

I am running a geospatial Sedona cluster on Amazon EMR. Specifically, my 
cluster is based on Spark 3.3.0 and Sedona 1.3.1-incubating.

My cluster has 10 executor nodes, each running two executors based on my 
configuration. I am using this cluster to run a large st_contains join 
between two datasets in GeoParquet format.

The issue I have been experiencing is that the vast majority of the executors 
complete their tasks within about 2 minutes. However, one or two executors are 
stuck on the last few jobs. From the stderr logs on the Spark History Web UI, 
this is the final state of the problematic executors:

23/04/19 05:37:10 INFO DynamicIndexLookupJudgement: [73, PID=2] [Streaming shapes] Reached a milestone: 1600001
23/04/19 06:20:08 INFO DynamicIndexLookupJudgement: [73, PID=2] [Streaming shapes] Reached a milestone: 1700001
23/04/19 07:02:13 INFO DynamicIndexLookupJudgement: [73, PID=2] [Streaming shapes] Reached a milestone: 1800001
23/04/19 07:45:25 INFO DynamicIndexLookupJudgement: [73, PID=2] [Streaming shapes] Reached a milestone: 1900001
23/04/19 08:29:24 INFO DynamicIndexLookupJudgement: [73, PID=2] [Streaming shapes] Reached a milestone: 2000001
23/04/19 09:14:25 INFO DynamicIndexLookupJudgement: [73, PID=2] [Streaming shapes] Reached a milestone: 2100001
23/04/19 09:56:18 INFO DynamicIndexLookupJudgement: [73, PID=2] [Streaming shapes] Reached a milestone: 2200001
23/04/19 10:38:24 INFO DynamicIndexLookupJudgement: [73, PID=2] [Streaming shapes] Reached a milestone: 2300001
23/04/19 11:21:53 INFO DynamicIndexLookupJudgement: [73, PID=2] [Streaming shapes] Reached a milestone: 2400001
23/04/19 12:05:49 INFO DynamicIndexLookupJudgement: [73, PID=2] [Streaming shapes] Reached a milestone: 2500001
23/04/19 12:53:34 INFO DynamicIndexLookupJudgement: [73, PID=2] [Streaming shapes] Reached a milestone: 2600001
23/04/19 13:36:55 INFO DynamicIndexLookupJudgement: [73, PID=2] [Streaming shapes] Reached a milestone: 2700001
23/04/19 14:19:18 INFO DynamicIndexLookupJudgement: [73, PID=2] [Streaming shapes] Reached a milestone: 2800001
23/04/19 15:04:06 INFO DynamicIndexLookupJudgement: [73, PID=2] [Streaming shapes] Reached a milestone: 2900001
23/04/19 16:08:55 INFO DynamicIndexLookupJudgement: [73, PID=2] [Streaming shapes] Reached a milestone: 3000001
23/04/19 16:51:07 INFO DynamicIndexLookupJudgement: [73, PID=2] [Streaming shapes] Reached a milestone: 3100001
23/04/19 17:36:03 INFO DynamicIndexLookupJudgement: [73, PID=2] [Streaming shapes] Reached a milestone: 3200001
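
To put a number on how slowly this one task is progressing, here is a rough 
sketch that parses the first and last milestone lines from the log above and 
computes the streaming rate. The helper names (parse_milestones, throughput) 
are only illustrative and are not part of Sedona or Spark; the timestamp 
format is assumed to be the yy/MM/dd HH:mm:ss layout visible in the log.

```python
import re
from datetime import datetime

# First and last milestone lines copied from the executor stderr above.
# The intermediate lines can be pasted in as well; only the ends are used.
LOG = """\
23/04/19 05:37:10 INFO DynamicIndexLookupJudgement: [73, PID=2] [Streaming shapes] Reached a milestone: 1600001
23/04/19 17:36:03 INFO DynamicIndexLookupJudgement: [73, PID=2] [Streaming shapes] Reached a milestone: 3200001
"""

# Assumed layout: "<yy/MM/dd HH:mm:ss> ... Reached a milestone: <count>"
PATTERN = re.compile(
    r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) .*Reached a milestone: (\d+)$"
)


def parse_milestones(text):
    """Return a list of (timestamp, shape_count) pairs found in the log text."""
    found = []
    for line in text.splitlines():
        match = PATTERN.match(line.strip())
        if match:
            stamp = datetime.strptime(match.group(1), "%y/%m/%d %H:%M:%S")
            found.append((stamp, int(match.group(2))))
    return found


def throughput(milestones):
    """Shapes streamed per second between the first and last milestone."""
    (t0, n0), (t1, n1) = milestones[0], milestones[-1]
    return (n1 - n0) / (t1 - t0).total_seconds()


milestones = parse_milestones(LOG)
rate = throughput(milestones)
minutes_per_100k = 100_000 / rate / 60
print(f"{rate:.1f} shapes/s, {minutes_per_100k:.1f} min per 100,000 shapes")
```

By this estimate the task spends roughly 45 minutes on each batch of 100,000 
shapes, compared with about 2 minutes for an entire task on the other 
executors.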

I would greatly appreciate any comments on what the above logs indicate, and 
what needs to be done with the Spark SQL query, the input datasets, or the 
Spark/Sedona configuration to alleviate this issue. Thank you very much!


