only an answer to one of your questions:
> What about log statements in the partition processing functions? Will
> their log statements get logged into a file residing on a given 'slave'
> machine, or will Spark capture this log output and divert it into the
> log file of the driver's machine?

They get logged to files on the remote nodes. You can view the logs for each executor through the UI. If you are using Spark on YARN, you can grab all the logs with "yarn logs".
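
For the YARN case, here is a rough sketch of what that could look like. It assumes log aggregation is enabled on the cluster (yarn.log-aggregation-enable) and that the application has already finished. The application ID is a made-up placeholder, so substitute the one your ResourceManager reports:

```shell
# Hypothetical application ID, for illustration only.
APP_ID="application_1400000000000_0001"

# Fetch the aggregated container logs (driver and executors) for the
# application and save them to a local file. Skipped with a message if
# the yarn CLI is not available on this machine.
if command -v yarn >/dev/null 2>&1; then
  yarn logs -applicationId "$APP_ID" > "${APP_ID}.log"
else
  echo "yarn CLI not found on PATH" >&2
fi
```

The output is grouped by container, so anything the partition functions wrote on the executors appears under the corresponding executor container.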