Hello,

we write a lot of data processing pipelines for Spark using PySpark,
and we cover them with a lot of integration tests.

In our enterprise environment, many people run Windows PCs, and we
notice that builds are much slower on Windows because of the
integration tests, compared with the same builds on Mac (dev PCs) or
Linux (our CI servers run Linux).

We cannot easily identify what is causing the slowdown, but most of
the time seems to be spent in PySpark communicating with Spark on the JVM.

Any pointers/clues to where to look for more information?
Of course, direct help with the matter is more than welcome as well.

Kind regards,
Wim
