Hi Marcelo,

Thanks for the reference, again. I looked at your code - really great work! I had to replace my Spark distribution to use it, though, as I could not figure out how to build it separately.
The repository that I linked to does not require rebuilding Spark and can be used with the current distribution, which is preferable in my case.

Kind regards,
Ivan

On Wed, 19 Jul 2017 at 4:44 AM, Ivan Sadikov <ivan.sadi...@gmail.com> wrote:

> Thanks for JIRA ticket reference! Frankly, I was aware of this work, but
> didn't know that there was an API for storage implementation.
>
> Will try exploring that as well, thanks!
>
> On Wed, 19 Jul 2017 at 4:18 AM, Marcelo Vanzin <van...@cloudera.com>
> wrote:
>
>> See SPARK-18085. That has much of the same goals re: SHS resource
>> usage, and also provides a (currently non-public) API where you could
>> just create a MongoDB implementation if you want.
>>
>> On Tue, Jul 18, 2017 at 12:56 AM, Ivan Sadikov <ivan.sadi...@gmail.com>
>> wrote:
>> > Hello everyone!
>> >
>> > I have been working on Spark history server that uses MongoDB as a
>> > datastore for processed events to iterate on idea that Spree project
>> > uses for Spark UI. Project was originally designed to improve on
>> > standalone history server with reduced memory footprint.
>> >
>> > Project lives here: https://github.com/lightcopy/history-server
>> >
>> > These are just very early days of the project, sort of pre-alpha (some
>> > features are missing, and metrics in some failed jobs cases are
>> > questionable). Code is being tested on several 8gb and 2gb logs and
>> > aims to lower resource usage since we run history server together with
>> > several other systems.
>> >
>> > Would greatly appreciate any feedback on repository (issues/pull
>> > requests/suggestions/etc.). Thanks a lot!
>> >
>> >
>> > Cheers,
>> >
>> > Ivan
>> >
>>
>>
>>
>> --
>> Marcelo
>>