I have a Spark job that reads data from a database. After increasing the submit parameter `--driver-memory 25g`, the job works without a problem locally, but not in the prod environment, because the prod master does not have enough capacity.

So I have a few questions:

- Which functions, such as `collect()`, would cause the data to be sent back to 
the driver program?
  My job so far only uses `as`, `filter`, and `map`.

- Is it possible to write data (in Parquet format, for instance) to HDFS 
directly from the executors? If so, how can I do that (any code snippet, 
reference documentation, or even a keyword to search for would help, since I 
can't find anything with e.g. `spark direct executor hdfs write`)? A rough 
sketch of what I have in mind is below.
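For context, here is a minimal sketch of what I am trying to do, assuming a JDBC source; the connection URL, table name, and HDFS output path are placeholders I made up:

```scala
import org.apache.spark.sql.SparkSession

object WriteToHdfs {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("write-parquet-example")
      .getOrCreate()

    // Read from the database via JDBC; URL and table name are hypothetical.
    val df = spark.read
      .format("jdbc")
      .option("url", "jdbc:postgresql://db-host:5432/mydb")
      .option("dbtable", "my_table")
      .load()

    // Transformations like filter run on the executors and do not
    // move the full dataset to the driver.
    val result = df.filter("amount > 0")

    // write.parquet is carried out by the executors: each task writes its
    // own partition file to HDFS, and only small task metadata goes back to
    // the driver, unlike collect(), which pulls all rows into driver memory.
    result.write
      .mode("overwrite")
      .parquet("hdfs:///user/me/output/my_table_parquet")  // hypothetical path

    spark.stop()
  }
}
```

My understanding is that this kind of write never routes the data through the driver, but please correct me if that is wrong.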

Thanks
