Re: Improvement for Chukwa Agent and Collector

Ariel Rabkin Mon, 09 Aug 2010 18:00:20 -0700

Proposal overall sounds useful.  I like versioning.

--Ari


On Mon, Aug 9, 2010 at 5:03 PM, Eric Yang <[email protected]> wrote:
> I'd like to keep at least /v1/ in the URL to identify the API version, just
> to be safe in case we change the URLs in the future.  Having / and /tool
> point to an informational UI makes sense.
>
> Regards,
> Eric
>
> On 8/9/10 2:59 PM, "Bill Graham" <[email protected]> wrote:
>
>> I generally feel that all params should be able to be passed either entirely
>> in the body or entirely in the URI, regardless of which ones are
>> required/optional (with the exception of the asset id, which typically is in
>> the path regardless). I vote for passing them all in the body as a json blob
>> in this case (if Content-Type is set to application/json, that is).
>>
>> Thinking more about the base path to the API that I proposed, perhaps the
>> /v1.0 in the URL is overkill. I could go for removing that part. The /rest
>> path has value to me though, because I could see keeping '/' or '/tool'
>> to potentially point to an HTML summary page or mini-UI at some point.
>>
>>
>>
>> On Mon, Aug 9, 2010 at 2:42 PM, Eric Yang <[email protected]> wrote:
>>> Hi Bill,
>>>
>>> I like your design better.  +1 on the revised version.  Since RecordType
>>> and Adaptor are required parameters, would it make sense to put them in
>>> the path for POST?
>>>
>>> Regards,
>>> Eric
>>>
>>> On 8/9/10 11:33 AM, "Bill Graham" <[email protected]> wrote:
>>>
>>>> I agree that we should implement the features you suggest. I've been
>>>> thinking about a REST API for the agents lately, as I'd also like to be
>>>> able to expose statistics to help with monitoring. Something similar to
>>>> what the collector does, so you can attach monitoring to a URL and see if
>>>> the average data rate suddenly drops.
>>>>
>>>> Regarding the proposed API protocol, I think we should use POST, GET and
>>>> DELETE to create, fetch and remove adaptors, similar to how you propose,
>>>> but the identifier in the REST resource should be the adaptor id, not the
>>>> filename. This is more RESTful since the adaptor is the thing being
>>>> accessed, not the file. Also, you could have more than one adaptor on a
>>>> given file, and some adaptors (e.g., JMSAdaptor) don't have a file
>>>> associated with them.
>>>>
>>>> I propose something like this:
>>>>
>>>> - Add Adaptor:
>>>>
>>>> POST /rest/v1.0/adaptor HTTP/1.0
>>>> Accept: text/plain
>>>> Content-Type: application/json
>>>> { "RecordType" : "jvm", "Cluster": "demo", adaptor configs including
>>>> offset, other tags ... }
>>>>
>>>> Returns: adaptor metadata including id
>>>>
>>>> - Get Adaptor fcb0fe44e9dd6d2283962cb0e3b4ea0f:
>>>>
>>>> GET /rest/v1.0/adaptor/fcb0fe44e9dd6d2283962cb0e3b4ea0f HTTP/1.0
>>>>
>>>> - Remove Adaptor fcb0fe44e9dd6d2283962cb0e3b4ea0f:
>>>>
>>>> DELETE /rest/v1.0/adaptor/fcb0fe44e9dd6d2283962cb0e3b4ea0f HTTP/1.0
>>>>
>>>> - List all adaptors:
>>>> GET /rest/v1.0/adaptor HTTP/1.0
>>>>
>>>> - Help
>>>> GET /rest/v1.0/help HTTP/1.0
>>>>
>>>> - Statistics for all adaptors
>>>> GET /rest/v1.0/adaptorStats HTTP/1.0
>>>>
>>>> - Statistics for a single adaptor
>>>> GET /rest/v1.0/adaptorStats/fcb0fe44e9dd6d2283962cb0e3b4ea0f HTTP/1.0
>>>>
>>>> Thoughts?
>>>>
>>>> thanks,
>>>> Bill
>>>>
>>>> On Mon, Aug 9, 2010 at 10:01 AM, Eric Yang <[email protected]> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>>  Chukwa Agent has a custom command protocol (port 9093).  The current
>>>>> protocol is not easy to modify to implement security-related features such
>>>>> as authentication and authorization.  I would like to propose that we use
>>>>> a REST-like web service protocol to improve security and be more aligned
>>>>> with web standards.  Let's go through the use cases of the Chukwa Agent
>>>>> command protocol:
>>>>>
>>>>> Start an adaptor:
>>>>>
>>>>> Current command: Add
>>>>>
>>>>>
>>>>> org.apache.hadoop.chukwa.datacollection.adaptor.filetailer.CharFileTailingAdaptorUTF8NewLineEscaped
>>>>> /tmp/chukwa/var/log/metrics/chukwa-hdfs-jvm-1271121726962.log 0
>>>>>
>>>>> Proposed:
>>>>> POST /tmp/chukwa/var/log/metrics/chukwa-hdfs-jvm-1271121726962.log HTTP/1.0
>>>>> Accept: chukwa/UTF8NewLineEscaped (optional)
>>>>> Offset: 0 (optional)
>>>>> Content-Type: application/json
>>>>> { "RecordType" : "jvm", "Cluster": "demo", other tags ... }
>>>>>
>>>>> List adaptors:
>>>>>
>>>>> Current command: List
>>>>>
>>>>> Proposed:
>>>>> GET / HTTP/1.0
>>>>> Accept: text/html
>>>>> Get list of information about all streaming adaptors
>>>>>
>>>>> HEAD /tmp/chukwa/var/log/metrics/chukwa-hdfs-jvm-1271121726962.log HTTP/1.0
>>>>> or
>>>>> HEAD /adaptor_fcb0fe44e9dd6d2283962cb0e3b4ea0f HTTP/1.0
>>>>> Get information about the streaming adaptor only.
>>>>>
>>>>> Stop adaptors:
>>>>>
>>>>> Current command: Stop adaptor_fcb0fe44e9dd6d2283962cb0e3b4ea0f
>>>>>
>>>>> Proposed:
>>>>> DELETE /tmp/chukwa/var/log/metrics/chukwa-hdfs-jvm-1271121726962.log HTTP/1.0
>>>>> or
>>>>> DELETE /adaptor_fcb0fe44e9dd6d2283962cb0e3b4ea0f HTTP/1.0
>>>>> Delete the adaptor
>>>>>
>>>>> Help:
>>>>> Current command: Help
>>>>>
>>>>> Proposed:
>>>>> GET /help HTTP/1.0
>>>>> Accept: text/html
>>>>>
>>>>> With this modification, we can support encryption and Basic/Digest
>>>>> Authentication from existing libraries without reinventing the wheel.  If
>>>>> the community is ok with this change, I would like to propose the next
>>>>> improvement:
>>>>>
>>>>> Chukwa Agent and collectors are two different feature sets, but there
>>>>> shouldn't be any roadblock to building a switch that toggles a machine
>>>>> between the two responsibilities.  For example, a Chukwa agent machine
>>>>> can flip a switch to join the collector pool and continue to stream data
>>>>> from itself.  With this improvement, it becomes easier to dynamically
>>>>> create a bigger data collection pipeline on the fly.  Both systems use the
>>>>> same communication protocol, hence they are easier to manage.  In the
>>>>> future, we can add additional commands like TRACE /config/reload to
>>>>> reload configuration, and tap into ZooKeeper for managing data flow in
>>>>> centralized configuration management.
>>>>>
>>>>> Any thoughts?
>>>>>
>>>>> Regards,
>>>>> Eric
>>>>>
>>>>>
>>>>
>>>
>>
>>
>
>
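
As a rough illustration of the adaptor routes Bill proposes in the quoted
message (POST to create, GET to fetch or list, DELETE to remove, all keyed by
adaptor id rather than filename), here is a minimal in-memory sketch in Python.
It is not Chukwa code: the AdaptorRegistry class, the handle() method, the
status codes, the RecordType check, and the way ids are generated are all
assumptions made for the example. Only the /rest/v1.0/adaptor path and the
JSON body shape come from the thread.

```python
import json
import uuid

BASE = "/rest/v1.0/adaptor"  # base path taken from the quoted proposal


class AdaptorRegistry:
    """Toy in-memory stand-in for an agent's adaptor table (hypothetical)."""

    def __init__(self):
        self._adaptors = {}  # adaptor id -> config dict

    def handle(self, method, path, body=None):
        """Dispatch one request and return (status_code, payload)."""
        if not (path == BASE or path.startswith(BASE + "/")):
            return 404, {"error": "unknown resource"}
        adaptor_id = path[len(BASE):].strip("/")

        if method == "POST" and not adaptor_id:
            try:
                config = json.loads(body or "")
            except ValueError:
                return 400, {"error": "body must be JSON"}
            # Assumption: RecordType is treated as the required field.
            if not isinstance(config, dict) or "RecordType" not in config:
                return 400, {"error": "RecordType is required"}
            new_id = uuid.uuid4().hex  # assumption: any unique id would do
            self._adaptors[new_id] = config
            return 201, {**config, "id": new_id}

        if method == "GET" and not adaptor_id:
            # List all adaptors.
            return 200, [{**c, "id": i} for i, c in self._adaptors.items()]

        if method == "GET":
            if adaptor_id not in self._adaptors:
                return 404, {"error": "no such adaptor"}
            return 200, {**self._adaptors[adaptor_id], "id": adaptor_id}

        if method == "DELETE" and adaptor_id:
            if self._adaptors.pop(adaptor_id, None) is None:
                return 404, {"error": "no such adaptor"}
            return 200, {"id": adaptor_id, "removed": True}

        return 405, {"error": "method not allowed"}


if __name__ == "__main__":
    registry = AdaptorRegistry()
    body = json.dumps({"RecordType": "jvm", "Cluster": "demo"})
    status, created = registry.handle("POST", BASE, body)
    print(status, created)
    print(registry.handle("GET", BASE + "/" + created["id"]))
    print(registry.handle("DELETE", BASE + "/" + created["id"]))
```

Keying every route on the adaptor id keeps the sketch independent of file
paths, which matches the point that some adaptors have no file at all.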



-- 
Ari Rabkin [email protected]
UC Berkeley Computer Science Department
