Hey Casey,

YARN (and therefore Samza) supports deployment from the local filesystem. This is how hello-samza works: its yarn.package.path is specified with a file:/// URI.
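For illustration, a job config that points at a locally built package could contain a line like the one below. The path is a made-up placeholder, not the value hello-samza actually uses.

```
# Hypothetical path; substitute the location of your own job tarball.
yarn.package.path=file:///path/to/my-job-dist.tar.gz
```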
As Jakob said, we don't explicitly support extra resources right now, but an HTTP or HDFS path should suffice to pull the file down. You could implement InitableTask and pull down the data you need in the init() method.

Longer term, one of the things we've talked about is the concept of a "global stream", which would be a stream whose partitions are all read by every TaskInstance. This would allow you to put any data you need into a stream and just define the stream as "global" (or some better name). Your TaskInstances would then all fully read the stream (thereby bootstrapping all the data they need) before processing any other messages. This doesn't exist yet, but it's the long-term direction for the type of use case you're describing.

Cheers,
Chris

On 3/14/14 11:24 AM, "Jakob Homan" <[email protected]> wrote:

>As far as reading the tar itself, since we're looking for an implementation
>of o.a.h.FileSystem, a pointer to LocalFileSystem should work, though I've
>not tried it.
>
>In terms of loading extra files (akin to Hadoop's DistributedCache, I
>imagine?), there's no direct support for this at the moment. What I'd do
>is have each task bring in the files (hosted for instance on a small local
>http server?) during the initialization phase (obtained by implementing
>InitableStreamTask).
>
>For more direct support, we could consider adding something into the
>configuration of resources to obtain during startup or, as part of another
>idea I've been kicking around, we could introduce some user code to be run
>on the AM before task startup that could grab these files and host them
>locally.
>
>-Jakob
>
>
>On Fri, Mar 14, 2014 at 9:13 AM, Anh Thu Vu <[email protected]> wrote:
>
>> Hmm, I think samza does support reading the tar from local FS too. If so,
>> I just have the first question about allowing extra optional resources
>>
>> Casey
>>
>>
>> On Fri, Mar 14, 2014 at 4:02 PM, Anh Thu Vu <[email protected]> wrote:
>>
>> > Hi guys,
>> >
>> > I have an extra (data) file that is required for my tasks in each
>> > container. This file is created on-the-fly and not included in my
>> > pre-created tar file. So I wonder if samza currently supports this
>> > functionality (to let user specify the extra resources to include when
>> > launching a task).
>> >
>> > If not, what do you guys think about having this with a patch?
>> >
>> > Lastly, if I'm not wrong, samza currently only supports either HTTP or
>> > HDFS as the filesystem to read the tar file from. I think it would be
>> > nice to be able to pass the tar file from the local filesystem of the
>> > launching node. What do you think?
>> >
>> > Casey
>> >
>>
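
A rough sketch of the "pull the file down in init()" suggestion above, using only the JDK so it stands alone. The class name ResourceFetcher and its fetch method are hypothetical helpers, not part of Samza. It handles http:// and file:// URIs; an hdfs:// path would instead need Hadoop's FileSystem API, which is not shown here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper (not a Samza class): copies a remote or local
// resource to a path on the container's local disk.
final class ResourceFetcher {
    private ResourceFetcher() {}

    // Downloads sourceUri (e.g. http://... or file:///...) into dest,
    // overwriting any existing file, and returns dest.
    static Path fetch(String sourceUri, Path dest) throws IOException {
        try (InputStream in = URI.create(sourceUri).toURL().openStream()) {
            Files.copy(in, dest, StandardCopyOption.REPLACE_EXISTING);
        }
        return dest;
    }
}
```

A task implementing InitableTask could call something like ResourceFetcher.fetch(...) from its init() method, reading the source URI from the job config under a key of your own choosing, so that every container has the data file locally before it processes its first message.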
