> Hi,
>
> In our application Hive is used as a database, i.e. a result set from a
> select query is consumed outside of the Hadoop cluster.
>
> The consumption process is not Hadoop friendly, in that it is network bound
> rather than CPU/disk bound.
>
> I'm in the process of converting the Hive query into a Pig query to see if
> it reads better.
>
> What I'm stuck at is extracting the content of a specific alias (its dump)
> from all the other stuff being logged, so that I can trigger further
> processing on it.
>
> STREAM <alias> THROUGH <cmd> seems to be one way to trigger a process; it's
> just that it doesn't seem suitable for the kind of process we are looking
> at, because the <cmd> gets run inside the Hadoop cluster.
>
> Any thoughts?
>
> J
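>
> P.S. In case it helps, here is roughly what I was sketching with STREAM.
> The input path, schema, and script name below are just placeholders, not
> our real pipeline:
>
>     -- load the source data (path and schema are placeholders)
>     raw = LOAD '/data/input' USING PigStorage('\t') AS (id:chararray, val:int);
>
>     -- define the external command; SHIP sends the script to the task nodes,
>     -- so it executes inside the cluster rather than on the client
>     DEFINE my_cmd `process.sh` SHIP('process.sh');
>
>     -- stream the alias through the command -- this is the part that runs
>     -- in the cluster, which is the problem for our network-bound consumer
>     result = STREAM raw THROUGH my_cmd;
>
>     STORE result INTO '/data/output';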
