Hello Spark community,

Has any work been done on porting TeraSort (TeraGen/TeraSort/TeraValidate) from Hadoop to Spark on EC2/EMR? I am looking for guidance on lessons learned from this or similar efforts. We are benchmarking some of the newer EC2 instances to determine how to optimize in-memory processing with Spark on them, for AWS customers looking to move their data processing workloads to Spark.
Any guidance the community can provide on this effort is greatly appreciated!

Thanks,
Dario Rivera
Solutions Architect
Cell: 571-205-2731
Email: dar...@amazon.com