[jira] [Commented] (NIFI-4428) Implement PutDruid Processor and Controller

ASF GitHub Bot (JIRA) Thu, 21 Dec 2017 08:57:36 -0800

    [ https://issues.apache.org/jira/browse/NIFI-4428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16300272#comment-16300272 ]


ASF GitHub Bot commented on NIFI-4428:
--------------------------------------

Github user pvillard31 commented on the issue:

    https://github.com/apache/nifi/pull/2310
  
    OK... So I've been doing some tests and it's working as I'd expect. My only remark concerns performance: the ingestion rate is very low (and a lot of events end up "dropped"), but that might be because I'm running everything locally (?).
    
    For anyone interested in trying this PR: I followed the quick start [here](http://druid.io/docs/0.11.0/tutorials/quickstart.html) and used this [workflow](https://gist.github.com/pvillard31/29956e9d7292551f9e8328bb62cbeb6c). (You'll need to update the ExecuteProcess processor so it points to the correct path of the quickstart script that generates the data.)
    
    Once data is ingested into Druid, I issue a topN query to get the top 25 servers from the metrics being ingested:
    
    ````json
    {
      "queryType" : "topN",
      "dataSource" : "metrics",
      "intervals" : ["2017-01-01/2017-12-31"],
      "granularity" : "all",
      "dimension" : "server",
      "metric" : "count",
      "threshold" : 25,
      "aggregations" : [
        {
          "type" : "longSum",
          "name" : "count",
          "fieldName" : "count"
        }
      ]
    }
    ````
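For anyone who prefers scripting the query instead of using curl, here is a minimal Python sketch that builds the same topN query and wraps it in a POST request to the Broker. It assumes the quickstart Broker at `http://localhost:8082/druid/v2/?pretty` (the URL used in the curl command); the helper names `build_topn_query` and `make_request` are hypothetical, and only the standard library is used.

```python
import json
import urllib.request

# Assumed quickstart Broker endpoint (same URL as the curl command).
BROKER_URL = "http://localhost:8082/druid/v2/?pretty"


def build_topn_query(threshold: int = 25) -> dict:
    """Hypothetical helper: the topN query above as a Python dict."""
    return {
        "queryType": "topN",
        "dataSource": "metrics",
        "intervals": ["2017-01-01/2017-12-31"],
        "granularity": "all",
        "dimension": "server",
        "metric": "count",
        "threshold": threshold,
        "aggregations": [
            {"type": "longSum", "name": "count", "fieldName": "count"}
        ],
    }


def make_request(query: dict, url: str = BROKER_URL) -> urllib.request.Request:
    """Hypothetical helper: build (but do not send) a JSON POST request."""
    return urllib.request.Request(
        url,
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Only works against a running Druid Broker.
    with urllib.request.urlopen(make_request(build_topn_query())) as resp:
        print(json.loads(resp.read()))
```

Building the request separately from sending it keeps the query easy to inspect or log before it hits the Broker.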
    
    Result:
    
    ````shell
    $ curl -L -H'Content-Type: application/json' -XPOST --data-binary @quickstart/metrics.json http://localhost:8082/druid/v2/?pretty
    [ {
      "timestamp" : "2017-12-21T16:28:00.000Z",
      "result" : [ {
        "count" : 2518,
        "server" : "www5.example.com"
      }, {
        "count" : 2505,
        "server" : "www1.example.com"
      }, {
        "count" : 2494,
        "server" : "www2.example.com"
      }, {
        "count" : 2467,
        "server" : "www4.example.com"
      }, {
        "count" : 2466,
        "server" : "www3.example.com"
      } ]
    } ]
    ````
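A topN response is a list of `{"timestamp", "result"}` buckets, and with `"granularity" : "all"` there is a single bucket. The following is a small Python sketch (the function name `top_servers` is hypothetical) that flattens such a response into a `{server: count}` mapping in ranked order, summing counts if a server appears in more than one bucket.

```python
from typing import Dict, List


def top_servers(response: List[dict]) -> Dict[str, int]:
    """Flatten a topN response into {server: count}.

    Dict insertion order follows the order in which servers first appear,
    i.e. the rank order of the first bucket. Counts are summed across
    buckets (there is only one bucket when granularity is "all").
    """
    ranked: Dict[str, int] = {}
    for bucket in response:
        for row in bucket["result"]:
            server = row["server"]
            ranked[server] = ranked.get(server, 0) + row["count"]
    return ranked


if __name__ == "__main__":
    sample = [{
        "timestamp": "2017-12-21T16:28:00.000Z",
        "result": [
            {"count": 2518, "server": "www5.example.com"},
            {"count": 2505, "server": "www1.example.com"},
        ],
    }]
    for server, count in top_servers(sample).items():
        print(server, count)
```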



> Implement PutDruid Processor and Controller
> -------------------------------------------
>
>                 Key: NIFI-4428
>                 URL: https://issues.apache.org/jira/browse/NIFI-4428
>             Project: Apache NiFi
>          Issue Type: New Feature
>    Affects Versions: 1.3.0
>            Reporter: Vadim Vaks
>            Assignee: Matt Burgess
>
> Implement a PutDruid Processor and Controller using the Tranquility API. This
> will enable NiFi to index the contents of flow files in Druid. The implementation
> should also be able to handle late-arriving data (the event timestamp points to
> a Druid indexing task that has already closed because its segment granularity
> interval and grace window period have expired). Late-arriving data is typically
> dropped. NiFi should allow late-arriving data to be diverted to a FAILED or
> DROPPED relationship. That would allow late-arriving data to be stored on HDFS
> or S3 until a re-indexing task can merge it into the correct segment in deep
> storage.
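The late-arrival rule described above can be sketched as follows, under a simplified model: an event belongs to the UTC-aligned segment of length `granularity` that contains its timestamp, and the indexing task for that segment is assumed to close once `segment end + window_period` has passed. This is an illustration only, not Tranquility's actual logic (for example, it ignores events too far in the future), and `segment_bounds`, `route_event`, and the `"success"`/`"dropped"` relationship names are hypothetical.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def segment_bounds(ts: datetime, granularity: timedelta) -> tuple[datetime, datetime]:
    """Return the [start, end) segment interval containing ts (UTC-aligned buckets)."""
    buckets = (ts - EPOCH) // granularity  # timedelta // timedelta -> int
    start = EPOCH + buckets * granularity
    return start, start + granularity


def route_event(ts: datetime, now: datetime,
                granularity: timedelta, window_period: timedelta) -> str:
    """Hypothetical routing: 'dropped' if the event's segment task has closed."""
    _, end = segment_bounds(ts, granularity)
    if now > end + window_period:
        return "dropped"  # late: divert (e.g. to HDFS/S3 for later re-indexing)
    return "success"


if __name__ == "__main__":
    hour, grace = timedelta(hours=1), timedelta(minutes=10)
    event = datetime(2017, 12, 21, 16, 28, tzinfo=timezone.utc)
    for now in (datetime(2017, 12, 21, 17, 5, tzinfo=timezone.utc),
                datetime(2017, 12, 21, 17, 11, tzinfo=timezone.utc)):
        print(now.isoformat(), route_event(event, now, hour, grace))
```

With hourly segments and a 10-minute window, an event stamped 16:28 is still accepted at 17:05 but would be routed to the drop path at 17:11.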



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
