As for reading the tar itself: since we're looking for an implementation of o.a.h.FileSystem, a pointer to LocalFileSystem should work, though I haven't tried it.
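A sketch of what that might look like in the job configuration (untested, as noted above; the package path is illustrative):

```
# Hadoop maps the file:// scheme to o.a.h.fs.LocalFileSystem by default,
# so pointing the package path at the local filesystem may just work.
yarn.package.path=file:///path/to/my-job-package.tar.gz
```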
In terms of loading extra files (akin to Hadoop's DistributedCache, I imagine?), there's no direct support for this at the moment. What I'd do is have each task fetch the files (hosted, for instance, on a small local HTTP server) during the initialization phase, by implementing InitableStreamTask. For more direct support, we could consider adding something to the configuration for resources to fetch during startup, or, as part of another idea I've been kicking around, we could introduce user code that runs on the AM before task startup to grab these files and host them locally.

-Jakob

On Fri, Mar 14, 2014 at 9:13 AM, Anh Thu Vu <[email protected]> wrote:
> Hmm, I think samza does support reading the tar from the local FS too. If
> so, I just have the first question about allowing extra optional resources.
>
> Casey
>
>
> On Fri, Mar 14, 2014 at 4:02 PM, Anh Thu Vu <[email protected]> wrote:
>
> > Hi guys,
> >
> > I have an extra (data) file that is required for my tasks in each
> > container. This file is created on-the-fly and not included in my
> > pre-created tar file. So I wonder if samza currently supports this
> > functionality (letting the user specify extra resources to include when
> > launching a task).
> >
> > If not, what do you guys think about adding this with a patch?
> >
> > Lastly, if I'm not wrong, samza currently only supports either HTTP or
> > HDFS as the filesystem to read the tar file from. I think it would be
> > nice to be able to pass the tar file from the local filesystem of the
> > launching node. What do you think?
> >
> > Casey
> >
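The init-time fetch suggested above could be sketched roughly like this. This is a hypothetical helper using only the JDK; the class name, config key, and URL are made up, and in a real job the fetch would be invoked from the task's init hook:

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper for pulling an extra data file over HTTP when a
// task initializes. Names and paths here are illustrative, not a Samza API.
public class ExtraFileFetcher {

  // Derive a local file name from the last path segment of the URL.
  static String fileNameFromUrl(String url) {
    String name = url.substring(url.lastIndexOf('/') + 1);
    return name.isEmpty() ? "download" : name;
  }

  // Download the resource into destDir and return the local path.
  static Path fetch(String url, Path destDir) throws IOException {
    Path dest = destDir.resolve(fileNameFromUrl(url));
    try (InputStream in = new URL(url).openStream()) {
      Files.copy(in, dest, StandardCopyOption.REPLACE_EXISTING);
    }
    return dest;
  }

  // In a Samza task, this would be called from the init method of a task
  // implementing the initable interface mentioned above, e.g. (sketch):
  //
  //   public void init(Config config, TaskContext context) throws Exception {
  //     Path local = ExtraFileFetcher.fetch(
  //         config.get("my.extra.file.url"),  // hypothetical config key
  //         Paths.get(System.getProperty("java.io.tmpdir")));
  //     // ... load `local` into the task's in-memory state ...
  //   }
}
```

The file would then be fetched once per task at container startup, before any messages are processed, which is the closest analogue to DistributedCache available without framework changes.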
