Hello

I am new to Apache Spark and am looking for close guidance or
collaboration on my Spark project, which has the following main components:

1. Writing scripts for automated setup of a multi-node Apache Spark cluster
with the Hadoop Distributed File System (HDFS). This is required since I don’t
have a fixed set of machines for my Spark experiments and therefore need a
quick, automated way to do the entire setup (see the first sketch below the
list).

2. Writing scripts for simple SQL queries that read input from HDFS, run the
queries on the multi-node Spark cluster, and store the output back in HDFS
(second sketch below).

3. Generating detailed profiling results, such as latency and shuffled data
size, for every task/operator in the SQL query, and plotting graphs of these
metrics (third sketch below).
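
For (1), here is a minimal sketch of the kind of setup script I have in
mind, assuming a standalone Spark cluster, passwordless SSH between the
nodes, and Spark/Hadoop already unpacked and configured at fixed paths on
every machine (the hostnames and paths below are placeholders):

    #!/usr/bin/env python3
    # Bring up HDFS and a standalone Spark cluster from one control machine.
    # Assumes passwordless SSH and pre-configured Spark/Hadoop installs.
    import subprocess

    MASTER = "master"                      # placeholder hostnames
    WORKERS = ["worker1", "worker2"]
    SPARK_HOME = "/opt/spark"
    HADOOP_HOME = "/opt/hadoop"

    def ssh(host, cmd):
        # Run a command on a remote node; raise if it fails.
        subprocess.run(["ssh", host, cmd], check=True)

    # Format the NameNode once, then start HDFS.
    ssh(MASTER, f"{HADOOP_HOME}/bin/hdfs namenode -format -nonInteractive")
    ssh(MASTER, f"{HADOOP_HOME}/sbin/start-dfs.sh")

    # Start the Spark master, then one worker daemon per machine.
    ssh(MASTER, f"{SPARK_HOME}/sbin/start-master.sh")
    for w in WORKERS:
        ssh(w, f"{SPARK_HOME}/sbin/start-worker.sh spark://{MASTER}:7077")

(start-worker.sh is the Spark 3.x name; older releases call it
start-slave.sh. A fuller script would also template out spark-env.sh, the
workers file, and the Hadoop XML configs before starting anything.)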
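
For (2), the per-query scripts could be as small as this PySpark sketch
(the HDFS paths, file format, and query are placeholders):

    from pyspark.sql import SparkSession

    IN_PATH = "hdfs://master:9000/data/input"      # placeholder paths
    OUT_PATH = "hdfs://master:9000/data/output"

    spark = SparkSession.builder.appName("simple-sql-on-hdfs").getOrCreate()

    # Read input from HDFS and register it as a SQL-queryable view.
    df = spark.read.parquet(IN_PATH)
    df.createOrReplaceTempView("events")

    # Run the SQL query; Spark distributes the work across the cluster.
    result = spark.sql("SELECT key, COUNT(*) AS cnt FROM events GROUP BY key")

    # Store the output back in HDFS.
    result.write.mode("overwrite").parquet(OUT_PATH)
    spark.stop()

This would be launched against the cluster with something like
spark-submit --master spark://master:7077 simple_sql.py.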
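
For (3), one way to collect the metrics is Spark's monitoring REST API
plus matplotlib; the sketch below is stage-level (executor run time and
shuffle write bytes per stage). True per-operator metrics for a SQL query
are shown in the UI's SQL tab and would take more work to extract. The
driver host/port are placeholders:

    import requests                    # pip install requests
    import matplotlib.pyplot as plt    # pip install matplotlib

    # Default REST API of a live driver; completed applications go
    # through the history server instead.
    BASE = "http://master:4040/api/v1"

    app_id = requests.get(f"{BASE}/applications").json()[0]["id"]
    stages = requests.get(f"{BASE}/applications/{app_id}/stages").json()

    labels = [s["name"].split()[0] for s in stages]
    latency_ms = [s["executorRunTime"] for s in stages]
    shuffle_mb = [s["shuffleWriteBytes"] / 2**20 for s in stages]

    # One bar chart per metric, one bar per stage.
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
    ax1.bar(range(len(stages)), latency_ms, tick_label=labels)
    ax1.set_ylabel("executor run time (ms)")
    ax2.bar(range(len(stages)), shuffle_mb, tick_label=labels)
    ax2.set_ylabel("shuffle write (MiB)")
    fig.tight_layout()
    fig.savefig("stage_profile.png")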

Please let me know if anyone is interested. Happy to discuss in more detail.

Thanks
Dhruv
dh...@umn.edu

--------------------------------------------------
Dhruv Kumar
PhD Candidate
Computer Science and Engineering
University of Minnesota
www.dhruvkumar.me
