Hi All,


I would like to gain some understanding of the questions listed below:



1. When processing a large file with Apache Spark, e.g.
sc.textFile("somefile.xml"), does Spark split the file for parallel
processing across executors, or is it processed as a single chunk on a
single executor? (A minimal sketch of what I am running is below.)

2. When using DataFrames with the implicit XMLContext from the Databricks
spark-xml package, is there any prebuilt optimization for processing such
large files? (See the second sketch below.)
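For question 1, this is roughly what I am running (a sketch only; the
file name is a placeholder, and I am checking the partition count to see
whether the file actually gets split):

    val rdd = sc.textFile("somefile.xml")      // plain-text, line-oriented records
    // If this prints > 1, the file was split into multiple partitions:
    println(rdd.getNumPartitions)
    // The split count can also be hinted explicitly:
    val rdd8 = sc.textFile("somefile.xml", 8)  // minPartitions = 8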
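For question 2, this is the DataFrame path I have in mind (again a
sketch, assuming the com.databricks:spark-xml artifact is on the
classpath, and using a hypothetical "book" row tag for my XML):

    import org.apache.spark.sql.SQLContext

    val sqlContext = new SQLContext(sc)
    // Read the XML into a DataFrame, one row per <book> element:
    val df = sqlContext.read
      .format("com.databricks.spark.xml")
      .option("rowTag", "book")
      .load("somefile.xml")
    df.printSchema()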



Any help would be much appreciated!



For reference, I have also asked this on Stack Overflow:
http://stackoverflow.com/questions/39305310/does-spark-process-large-file-in-the-single-worker



Regards,

Somasundaram S
