Hi,
Can you try increasing the number of cores per task, so that fewer tasks run
concurrently on each executor and each task gets a larger share of the
executor memory?
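One way to express that (a sketch with hypothetical numbers and a placeholder jar name): setting `spark.task.cpus` equal to `--executor-cores` would run one task per executor, giving that task the whole executor heap.

```shell
# Hypothetical sizing: with executor-cores == spark.task.cpus, each executor
# runs a single task at a time, so that task sees the full 15g heap.
spark-submit \
  --executor-memory 15g \
  --executor-cores 5 \
  --conf spark.task.cpus=5 \
  your-job.jar
```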
I do not understand what the XML is, what data it contains, or what problem
you are trying to solve by writing UDFs to parse XML. So maybe we are not
Really depends on what your UDF is doing. You could read 2 GB of XML and end
up with much more than that in memory as a DOM representation.
Remember 15GB of executor memory is shared across tasks.
You need to get a handle on how much memory your code uses in the first place
before you can reason about whether that's
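As a back-of-the-envelope check (hypothetical numbers, not from the thread), the heap a task can count on is roughly the executor heap divided by the number of tasks running on it concurrently:

```scala
// Rough arithmetic for the heap share per task: an executor's heap is
// shared by all tasks running on it at the same time.
object MemoryPerTask {
  def memoryPerTaskMb(executorMemoryMb: Int,
                      executorCores: Int,
                      taskCpus: Int): Int = {
    val concurrentTasks = executorCores / taskCpus
    executorMemoryMb / concurrentTasks
  }

  def main(args: Array[String]): Unit = {
    // Example: 15 GB executor heap, 5 cores, 1 cpu per task
    println(MemoryPerTask.memoryPerTaskMb(15 * 1024, 5, 1))
  }
}
```

So a 15 GB executor running 5 one-core tasks leaves each task only about 3 GB, before any Spark overhead.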
Thanks for your quick response.
For some reason I can't use spark-xml (a schema-related issue).
I've tried reducing the number of tasks per executor by increasing the number
of executors, but it still throws the same error.
I can't understand why even 15 GB of executor memory is not sufficient
to
The "executor memory used" figure shows cached data, not JVM usage. You're
running out of memory somewhere, likely in your UDF, which probably parses
massive XML docs as a DOM first or something. Use more memory, run fewer
tasks per executor, or consider spark-xml if you are really just parsing
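For reference, reading the files with spark-xml instead of a hand-written UDF might look like this sketch (it assumes the `com.databricks:spark-xml` package is on the classpath; the path and the `rowTag` value `"record"` are hypothetical placeholders for your repeating element):

```scala
// Sketch: let spark-xml split the files into rows instead of parsing
// whole documents inside a UDF. Requires the spark-xml package.
val xmlDf = spark.read
  .format("com.databricks.spark.xml")
  .option("rowTag", "record")      // hypothetical repeating element
  .load("path/to/xml/files")
```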
I'm doing some complex operations inside spark UDF (parsing huge XML).
Dataframe:
| value                 |
|-----------------------|
| Content of XML File 1 |
| Content of XML File 2 |
| Content of XML File N |
val parsed = df.select(UDF_to_parse_xml(col("value")))
UDF looks something like:
val XMLelements: Array[MyClass1] =
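The UDF body is cut off above, but if it materializes each document as a DOM before extracting elements, a pull parser keeps memory roughly flat regardless of document size. A minimal sketch using the JDK's built-in StAX API (the element name `item` is a hypothetical stand-in for whatever `MyClass1` is built from):

```scala
import java.io.StringReader
import javax.xml.stream.{XMLInputFactory, XMLStreamConstants}

// Sketch: stream over XML events instead of building a DOM, so only the
// current event is held in memory, not the whole document tree.
object StreamingXmlCount {
  def countElements(xml: String, name: String): Int = {
    val reader = XMLInputFactory.newInstance()
      .createXMLStreamReader(new StringReader(xml))
    var count = 0
    try {
      while (reader.hasNext) {
        if (reader.next() == XMLStreamConstants.START_ELEMENT &&
            reader.getLocalName == name) count += 1
      }
    } finally reader.close()
    count
  }
}
```

The same event loop could build `MyClass1` instances incrementally instead of counting, which avoids the DOM blow-up the reply above warns about.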