Hi Dmitry,

Yes, Tachyon can help with your use case. You can read and write to Tachyon via its filesystem API (http://tachyon-project.org/documentation/File-System-API.html). There is a native Java API as well as a Hadoop-compatible API. Spark can also interact with Tachyon through the Hadoop-compatible API, so Spark jobs can read input files from Tachyon and write output files to Tachyon.
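As a rough sketch of that pattern (the master address `tachyon://localhost:19998` and the paths under `/models` and `/input` are placeholders, not anything from your setup), a Spark job can treat a `tachyon://` URI like any other Hadoop-compatible filesystem, load the resource once in the driver, and broadcast it to the executors:

```scala
// Sketch only: assumes a Tachyon master on localhost:19998 and
// hypothetical file paths; adjust host, port, and paths for your cluster.
import org.apache.spark.{SparkConf, SparkContext}

object TachyonResourceExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("tachyon-example"))

    // Read the resource file from Tachyon via the Hadoop-compatible API.
    val dictionary = sc
      .textFile("tachyon://localhost:19998/models/dictionary.txt")
      .collect()
      .toSet

    // Broadcast it from the driver so each executor holds one shared copy
    // instead of deserializing it per task.
    val dictBroadcast = sc.broadcast(dictionary)

    // Use the broadcast value in a transformation.
    val input = sc.textFile("tachyon://localhost:19998/input/data.txt")
    val matched = input.filter(line => dictBroadcast.value.contains(line))

    // Write the output back to Tachyon through the same API.
    matched.saveAsTextFile("tachyon://localhost:19998/output/matched")

    sc.stop()
  }
}
```

The same code works unchanged against HDFS if you swap the `tachyon://` URIs for `hdfs://` ones, since both go through Hadoop's FileSystem abstraction.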
I hope that helps,
Gene

On Tue, Jan 12, 2016 at 4:26 AM, Dmitry Goldenberg <dgoldenberg...@gmail.com> wrote:

> I'd guess that if the resources are broadcast Spark would put them into
> Tachyon...
>
> On Jan 12, 2016, at 7:04 AM, Dmitry Goldenberg <dgoldenberg...@gmail.com>
> wrote:
>
> Would it make sense to load them into Tachyon and read and broadcast them
> from there since Tachyon is already a part of the Spark stack?
>
> If so I wonder if I could do that Tachyon read/write via a Spark API?
>
> On Jan 12, 2016, at 2:21 AM, Sabarish Sasidharan <
> sabarish.sasidha...@manthan.com> wrote:
>
> One option could be to store them as blobs in a cache like Redis and then
> read + broadcast them from the driver. Or you could store them in HDFS and
> read + broadcast from the driver.
>
> Regards
> Sab
>
> On Tue, Jan 12, 2016 at 1:44 AM, Dmitry Goldenberg <
> dgoldenberg...@gmail.com> wrote:
>
>> We have a bunch of Spark jobs deployed and a few large resource files,
>> such as a dictionary for lookups or a statistical model.
>>
>> Right now, these are deployed as part of the Spark jobs, which will
>> eventually make the mongo-jars too bloated for deployments.
>>
>> What are some of the best practices to consider for maintaining and
>> sharing large resource files like these?
>>
>> Thanks.
>
> --
> Architect - Big Data
> Ph: +91 99805 99458
>
> Manthan Systems | *Company of the year - Analytics (2014 Frost and
> Sullivan India ICT)*