How do I limit queries made to my Hadoop cluster?
Dear Hadoop experts,

This is my first post to this group, and I am new to Hadoop, so please excuse me if this is not the correct list. If there is a better group, please let me know by replying to me directly.

I have a challenge before me. My Hadoop system holds data from three companies called ABC, XYZ, and 123. Because of a business need, all the records from these three companies are in the same data store. The records are randomly mixed, so one record could be from ABC and the next from XYZ or 123. When I query my Hadoop system for all records that have the last name Boudreau, for data-analytics work, I get all 3,000 records with that last name. However, I also have a contract with ABC that says I cannot aggregate their records, so I need a way to apply these contract rules when the data is queried. Please note: I have given 20 other developers access to my Hadoop system, but I am responsible for managing the contractual obligations to my customers.

What is the best way to go about this?
- Can or do I write a plug-in for, or modify, YARN so that it checks my contract rules before returning a dataset?
- Can or do I write a plug-in for each and every gateway application, such as Pig, Elasticsearch, MapR, etc. (about 10 applications have access to my Hadoop system)?
- What are the other options?

I have Hadoop installed, configured, and running on my local machine. I also have the source code downloaded, and I am able to dig into it and compile it.

Regards,
Carl
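In practice, a check like this is usually enforced in one central place (for example, a policy engine such as Apache Ranger, or a shared access layer all 10 gateway applications go through) rather than by patching YARN or each tool separately. The core idea of server-side, per-company record filtering can be sketched in plain Java. This is a minimal sketch; the class and policy names (`PolicyFilter`, `Record`, `AGGREGATION_RESTRICTED`) are hypothetical illustrations, not part of any Hadoop API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a contract-rule filter applied to query results
// before they are returned to analysts. All names here are illustrative.
public class PolicyFilter {

    // A record tagged with the company it originated from.
    public static class Record {
        public final String company;
        public final String lastName;

        public Record(String company, String lastName) {
            this.company = company;
            this.lastName = lastName;
        }
    }

    // Companies whose contracts forbid aggregating their records.
    private static final List<String> AGGREGATION_RESTRICTED = List.of("ABC");

    // Drop records from restricted companies before the result set
    // leaves the cluster-side access layer.
    public static List<Record> applyContractRules(List<Record> results) {
        List<Record> allowed = new ArrayList<>();
        for (Record r : results) {
            if (!AGGREGATION_RESTRICTED.contains(r.company)) {
                allowed.add(r);
            }
        }
        return allowed;
    }

    public static void main(String[] args) {
        List<Record> results = List.of(
            new Record("ABC", "Boudreau"),
            new Record("XYZ", "Boudreau"),
            new Record("123", "Boudreau"));
        // Only the XYZ and 123 records survive the contract check.
        System.out.println(applyContractRules(results).size());
    }
}
```

The design point is that the filter runs once, server-side, so the 20 developers and 10 gateway applications cannot bypass it, which is why a single choke point tends to be preferable to per-application plug-ins.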
Hadoop 2.6 Timeline/history server: Killed/Complete Containers
I have a Hadoop 2.6 setup on my laptop, and my workplace also has a Hadoop 2.6 cluster. I have started the timeline server in both locations, and I am seeing different behaviour on the two setups.

*Application:* I am writing a REST API to fetch all killed containers for an application.

*On the company cluster:* Assume I have started an application, and one of its containers gets killed. I try to fetch information for this killed container using the history client while the application is still running; however, this information is not available from the history server. It becomes available only after the application finishes. The entire application's information is sent to the history server only after the application is killed.

*On my local laptop setup:* The killed-container information is available at the history server even while the application is running.

Am I missing some configuration setting on the company cluster that is causing this different behaviour?
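One setting worth comparing between the two clusters (a hedged suggestion, not a confirmed diagnosis): in Hadoop 2.6 the ResourceManager publishes application and container events to the timeline server as they happen only when its system metrics publisher is enabled, and that flag defaults to false. A `yarn-site.xml` fragment to check on both setups:

```xml
<!-- ResourceManager pushes app/container lifecycle events to the
     timeline server while the application is running (default: false). -->
<property>
  <name>yarn.resourcemanager.system-metrics-publisher.enabled</name>
  <value>true</value>
</property>

<!-- The timeline service itself must also be enabled. -->
<property>
  <name>yarn.timeline-service.enabled</name>
  <value>true</value>
</property>
```

If the laptop has the publisher enabled and the company cluster does not, that would match the observed difference: without it, container history typically only appears once the application completes.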