I am not sure what you are comparing here. You would need to provide additional details, such as the algorithms and functionality supported by your framework. For instance, Spark has built-in fault tolerance and is a generic framework, which is an advantage for development and operations but may be a disadvantage for performance in certain use cases. Another concern is the SDN, which could be configured in a way that favors either your approach or Spark. I would not use it for a generic performance comparison, unless it is your company's production network and you want to compare performance only for your company.
I doubt that focusing only on performance for a framework makes scientific sense. Your approach sounds too simple to be of scientific value, and more suited to unscientific marketing purposes. That being said, it could be that you did not provide all the details.

> On 01 Mar 2016, at 06:25, yasincelik <yasinceli...@gmail.com> wrote:
>
> Hello,
>
> I am working on a project as part of my research. The system I am working
> on is basically an in-memory computing system. I want to compare its
> performance with Spark. Here is how I conduct the experiments. For my
> project: I have a software-defined network (SDN) that allows HPC
> applications to share data, such as sending and receiving messages
> through this network. For example, in a word count application, a master
> reads a 10GB text file from the hard drive, slices it into small chunks,
> and distributes the chunks. Each worker fetches some chunks, processes
> them, and sends them back to the SDN. Then the master collects the
> results.
>
> To compare with Spark, I run a word count application. I run Spark in
> standalone mode. I do not use any cluster manager. There is no
> pre-installed HDFS. I use PBS to reserve nodes, which gives me a list of
> nodes. Then I simply run Spark on these nodes. Here is the command to run
> Spark:
>
> ~/SPARK/bin/spark-submit --class word.JavaWordCount --num-executors 1 spark.jar ~/data.txt > ~/wc
>
> Technically, these experiments are run under the same conditions: read
> the file, cut it into small chunks, distribute the chunks, process the
> chunks, collect the results. Do you think this is a reasonable
> comparison? Can someone make this claim: "Well, Spark is designed to work
> on top of HDFS, in which the data is already stored in the nodes, and
> Spark jobs are submitted to these nodes to take advantage of data
> locality"?
>
> Any comment is appreciated.
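The pipeline the question describes (master slices the input into chunks, workers produce partial counts, master merges them) can be sketched locally in plain Python. This is only an illustration of the described steps, not the poster's actual system: the chunk size, the whitespace-boundary splitting, and all function names here are assumptions.

```python
from collections import Counter

def split_into_chunks(text, chunk_size=1024):
    """Master step: slice the input into roughly chunk_size-byte pieces.

    Splits on whitespace boundaries so no word is cut in half (an
    assumption; the real system may split on raw bytes instead)."""
    chunks, current, size = [], [], 0
    for word in text.split():
        current.append(word)
        size += len(word) + 1
        if size >= chunk_size:
            chunks.append(" ".join(current))
            current, size = [], 0
    if current:
        chunks.append(" ".join(current))
    return chunks

def worker_count(chunk):
    """Worker step: produce a partial word count for one chunk."""
    return Counter(chunk.split())

def master_collect(partials):
    """Master step: merge the partial counts from all workers."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

# Tiny end-to-end run of the three steps.
chunks = split_into_chunks("to be or not to be", chunk_size=8)
result = master_collect(worker_count(c) for c in chunks)
print(result["to"])  # prints 2
```

In a fair comparison, the timing for Spark and for the custom system should cover the same span of these steps (including the initial read and the final collect), since Spark's scheduling and serialization overheads fall inside that span.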
> Thanks
>
> --
> View this message in context:
> http://apache-spark-developers-list.1001551.n3.nabble.com/Spark-performance-comparison-for-research-tp16498.html
> Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org
---------------------------------------------------------------------