Will Slider help our use case?

John Lilley Fri, 27 Jun 2014 08:41:35 -0700

Our software doesn't use MapReduce. It is a pure YARN application that is 
basically a peer to MapReduce. There are a lot of reasons for this decision, 
but the main one is that we have a large code base that already executes data 
transformations in a single-server environment, and we wanted to produce a 
product without rewriting huge swaths of code. Given that, our software takes 
care of many things usually delegated to MapReduce, including distributed 
sort/partition (i.e. "the shuffle"). However, MapReduce has a special place in 
the ecosystem, in that it creates an auxiliary service to handle the 
distribution of shuffle data to reducers. It doesn't look like third-party apps 
have an easy time installing aux services. The JARs for any such service must 
be in Hadoop's classpath on all nodes at startup, creating both a management 
issue and a trust/security issue. Currently our software places temporary data 
into HDFS for this purpose, but we've found that HDFS has a huge overhead in 
terms of performance and file handles, even at low replication. We would like 
to replace our use of HDFS with a lighter-weight service that manages temp 
files and distributes their data.
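
For context, this is roughly how the MapReduce shuffle handler is registered as 
an auxiliary service in yarn-site.xml on a stock Hadoop 2.x cluster (older 
releases may use a different service name). A third-party service would need 
analogous entries under its own name, plus its JAR on every NodeManager's 
classpath, which is the management burden described above.

```xml
<!-- yarn-site.xml on each NodeManager: registers the MapReduce shuffle
     handler as an auxiliary service. The NodeManager loads the class at
     startup, so it must already be on the classpath. -->
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
```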


Is the Slider project something that can address our needs?

John Lilley
