How do I limit queries made to my Hadoop Cluster

2016-12-02 Thread Boudreau, Carl
Dear Hadoop Expert,

This is my first post to this group, and I am new to Hadoop, so if this is not 
the correct list, please excuse me.  If there is a better group, please let me 
know by replying directly to me.

I have a challenge before me.  In my Hadoop system I have data from three 
companies, called ABC, XYZ, and 123.  Because of a business need, all the 
records from these three companies are in the same data store.  The records are 
randomly mixed, so one record could be from ABC and the next could be from 
XYZ or 123.  When I query my Hadoop system for all records that have the 
last name Boudreau for data-analysis work, I get all 3000 records that 
have the last name Boudreau.

However, I also have a contract with ABC that says I cannot aggregate their 
records, so I need a way to apply these contract rules when the data is 
queried.  Please note: I have given 20 other developers access to my Hadoop 
system, but I am responsible for managing the contractual obligations to my 
customers.

What is the best way to go about this?

Do I write a plug-in for, or modify, YARN to have it check my contract rules 
before returning a dataset?  Or do I write a plug-in for each and every 
gateway application, such as Pig, Elasticsearch, MapR, etc. (about 10 
applications have access to my Hadoop system)?

What are other options?

I have Hadoop installed, configured, and running on my local machine.  I have 
also downloaded the source code onto my machine, and I am able to dig into it 
and compile it.
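(To make the requirement concrete: whatever enforcement layer is chosen, the contract rule amounts to a per-company filter applied at query time. The sketch below is only an illustration of that idea; the field names and the policy set are invented for this example and are not part of any Hadoop API.)

```python
# Hypothetical per-company policy filter (illustration only).
# Companies whose contracts forbid aggregation of their records:
BLOCKED_FOR_AGGREGATION = {"ABC"}

def filter_for_aggregation(records):
    """Drop records from companies whose contracts forbid aggregation."""
    return [r for r in records if r["company"] not in BLOCKED_FOR_AGGREGATION]

rows = [
    {"company": "ABC", "last_name": "Boudreau"},
    {"company": "XYZ", "last_name": "Boudreau"},
    {"company": "123", "last_name": "Boudreau"},
]

# An aggregate query would only ever see the XYZ and 123 records.
print(filter_for_aggregation(rows))
```

In practice this kind of rule is usually enforced centrally (for example with a row-level authorization layer such as Apache Ranger, or by exposing only filtered Hive views to the 20 developers) rather than by patching YARN or every gateway application individually.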

Regards Carl



Re: Hadoop 2.6 Timeline/history server: Killed/Complete Containers

2016-12-02 Thread Destin Ashwin
unsubscribe.

-- 
Thanks,
Destin Ashwin
Cell: +91 914.573.4711
Email: destinash...@gmail.com


Hadoop 2.6 Timeline/history server: Killed/Complete Containers

2016-12-02 Thread AJAY GUPTA
I have a Hadoop 2.6 setup on my laptop, and my workplace also has a Hadoop 2.6
cluster. I have started the timeline server in both locations, and I am seeing
different behaviour on the two setups.

*Application:* I am writing a REST API to fetch all killed containers for
an application.
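
(For reference, the Application History Server in Hadoop 2.6 exposes container reports over a REST endpoint; the sketch below only builds the request URL. The host name and the application/attempt/container IDs are placeholders, and 8188 is the default timeline-service web port, so adjust all of these for your cluster.)

```python
# Hedged sketch: fetching a container report from the YARN Application
# History Server REST API. All identifiers below are placeholders.
import json
from urllib.request import urlopen

def container_url(host, app_id, attempt_id, container_id):
    # 8188 is the default yarn.timeline-service.webapp.address port.
    return (f"http://{host}:8188/ws/v1/applicationhistory/apps/{app_id}"
            f"/appattempts/{attempt_id}/containers/{container_id}")

url = container_url("timeline-host",
                    "application_1480000000000_0001",
                    "appattempt_1480000000000_0001_000001",
                    "container_1480000000000_0001_01_000002")
# report = json.load(urlopen(url))  # uncomment against a live cluster
print(url)
```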

*On the company cluster:* Assume I have started an application and one of
its containers gets killed. I try to fetch information for this killed
container using the history client (while the application is running), but
the information is not available from the history server. It becomes
available only after the application itself ends; the entire application's
information is sent to the history server only once the application is
killed.

*On the local laptop setup:* The killed-container information is available at
the history server even while the application is running.

Am I missing some configuration setting on the company cluster that is
causing this different behaviour?
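
(One configuration difference worth checking, though this is only a guess and not confirmed in the thread: whether the ResourceManager is publishing system metrics to the timeline server at all on the company cluster. In Hadoop 2.6 that is controlled in yarn-site.xml by settings along these lines:)

```xml
<property>
  <name>yarn.timeline-service.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.system-metrics-publisher.enabled</name>
  <value>true</value>
</property>
```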