Re: Question on setting up nifi flow

2016-04-28 Thread Susheel Kumar
Thanks Pierre, Simon and Bryan. Let me take a look and come back with a few
more questions.



Re: Question on setting up nifi flow

2016-04-28 Thread Simon Ball
GetMongo is an ingest-only processor, so it cannot accept an input flow file. It
also only has a success relationship.

A solution to this would be to use NiFi’s own deduplication.

One flow would seed values in the distributed cache by using GetMongo to pull
the IDs and PutDistributedMapCache to store them in NiFi’s cache.

The main ingest flow would then use UpdateAttribute to create a hash.value
attribute matching the values inserted into the cache -> DetectDuplicate -> flow to
PutMongo (use the Upsert property) -success-> PutSolrContentStream
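
For anyone wiring this up later, here is a rough sketch of the two flows. This is
only an illustration from memory: the exact property names, the ${hash.value}
attribute and the seed-flow details should be checked against your NiFi version,
and both cache processors need a DistributedMapCacheClientService pointing at a
DistributedMapCacheServer.

    Seeding flow (run once or on a schedule):
      GetMongo (Projection: { "_id": 1 })
        -> EvaluateJsonPath (Destination: flowfile-attribute, hash.value = $._id)
        -> PutDistributedMapCache (Cache Entry Identifier: ${hash.value})

    Main ingest flow:
      ... HandleHttpRequest and validation ...
        -> UpdateAttribute (set hash.value to the same unique ID used when seeding)
        -> DetectDuplicate (Cache Entry Identifier: ${hash.value})
             duplicate / non-duplicate relationships tell you update vs. new request
        -> PutMongo (Mode: update, Upsert: true)   (upsert covers both cases)
        -> (success) -> PutSolrContentStream and/or PutFile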

Simon



Re: Question on setting up nifi flow

2016-04-28 Thread Bryan Bende
Hi Susheel,

In addition to what Pierre mentioned, if you are interested in an example
of using HandleHttpRequest/Response, there is a template in this repository:

https://github.com/hortonworks-gallery/nifi-templates

The template is HttpExecuteLsCommand.xml and shows how to build a web
service in NiFi that performs a directory listing.
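
For anyone reading this in the archive without pulling the template: the general
pattern (a sketch from memory, so verify the property names against the
HandleHttpRequest/HandleHttpResponse docs) is that both processors reference the
same HTTP Context Map controller service, which is what ties the eventual
response back to the original request. The port below is just an example value.

    StandardHttpContextMap   (controller service, shared by both processors)

    HandleHttpRequest (Listening Port: 8011, HTTP Context Map: StandardHttpContextMap)
      -> ... whatever processing the web service does ...
      -> HandleHttpResponse (HTTP Status Code: 200, HTTP Context Map: StandardHttpContextMap)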

-Bryan




Re: Question on setting up nifi flow

2016-04-28 Thread Pierre Villard
Hi Susheel,

1. HandleHttpRequest
2. RouteOnAttribute + HandleHttpResponse in case of errors detected in
headers
3. Depending on what you want, there are a lot of options to handle JSON
data (EvaluateJsonPath will probably be useful)
4. GetMongo (I think it will route to success in case there is an entry,
and to failure if there is no record, but this has to be checked; otherwise
an additional processor will do the job of checking the result of the request).
5. & 6. PutMongo + PutFile (if local folder) + PutSolrContentStream (if you
want NiFi to write to Solr itself).

Depending on the details, this could be slightly different, but I think it
gives a good idea of the minimal set of processors you would need.
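
Spelling that list out as one possible end-to-end wiring (illustrative only: the
attribute names, header, JsonPaths and routing expressions below are invented for
the example, and the exact flow will depend on the details above):

    HandleHttpRequest
      -> RouteOnAttribute
           (e.g. property "missing.auth" = ${http.headers.Authorization:isEmpty()})
           missing.auth -> ReplaceText (build a JSON error body)
                           -> HandleHttpResponse (HTTP Status Code: 400)
           unmatched    -> continue
      -> EvaluateJsonPath (Destination: flowfile-attribute,
           e.g. record.id = $.id, record.ts = $.timestamp)
      -> RouteOnAttribute (route flow files with missing/empty attributes to the error branch)
      -> new-vs-update check (GetMongo does not accept an incoming flow file, so see
         Simon's DistributedMapCache approach elsewhere in this thread)
      -> PutMongo (Mode: insert or update accordingly)
      -> PutFile (Directory: the output folder) and/or PutSolrContentStream
      -> HandleHttpResponse (HTTP Status Code: 200)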

HTH,
Pierre




Question on setting up nifi flow

2016-04-28 Thread Susheel Kumar
Hi,

After attending the meetup in NYC, I realize NiFi can be used for the data
flow use case I have. Can someone please share the steps/processors
necessary for the use case below.


   1. Receive JSON on an HTTP REST endpoint
   2. Parse the HTTP headers and validate them. Return error codes & messages as
   JSON in the response in case of validation failures
   3. Parse the request JSON, perform various validations (missing data in
   fields), massage some data, add some data
   4. Check whether the request JSON's unique ID is present in MongoDB and compare
   timestamps to determine whether this is an update request or a new request
   5. If it is a new request, an entry is made in Mongo and then JSON files are
   written to an output folder for another process to pick up and submit to Solr.
   6. If it is an update request, the Mongo record is updated and JSON files are
   written to the output folder


I understand that something like the HandleHttpRequest processor can be used
for receiving the HTTP request, and then PutSolrContentStream for writing to
Solr, but I am not clear on what processors would be used for validation etc.
in steps 2 through 5 above.

Appreciate your input.

Thanks,
Susheel