Re: InvokeHTTP Remote URL vs Trusted Hostname - ExpLang issue

2019-02-12 Thread Andy LoPresto
Hi Ara,

To follow up, I just opened a Jira [1] to remove that property entirely. It was 
created in a legacy environment and isn’t a good solution anymore. The TLS 
certificates should be properly configured (we can help with that if you need 
it), but Trusted Hostname isn’t a secure behavior, as Bryan mentioned. There 
was an existing Jira for adding Expression Language support to that property, 
which I have closed as “Won’t Fix” [2]. 

[1] https://issues.apache.org/jira/browse/NIFI-6019 

[2] https://issues.apache.org/jira/browse/NIFI-3435

Andy LoPresto
alopre...@apache.org
alopresto.apa...@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

> On Feb 12, 2019, at 7:31 AM, Bryan Bende  wrote:
> 
> Hello,
> 
> It looks like InvokeHTTP creates an instance of the OkHttp client in
> the onScheduled method, which is called when the processor is started,
> and when it creates the client it specifies a hostname verifier that
> always accepts whatever the trusted hostname is. So the issue is that
> if Trusted Hostname were to support EL from flow file attributes, you
> could no longer create the client instance in onScheduled; you would
> have to create it lazily per flow file, with some type of cache from
> trusted hostname to client instance, making the logic of the processor
> a bit more complex.
> 
> I suspect the expectation was that Trusted Hostname would be used
> very sparingly, since it is really a bit of a hack to bypass a proper
> TLS configuration, so maybe it was not expected that you would need
> to specify many different trusted hostnames, but I'm only guessing.
> 
> I think it would be very easy to allow a comma-separated list, or
> possibly a regex. You would just have to modify the hostname verifier
> here:
> 
> https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/InvokeHTTP.java#L1213-L1230
> 
> - Bryan
> 
> On Tue, Feb 12, 2019 at 10:06 AM ara m.  wrote:
>> 
>> In the InvokeHTTP processor, the Remote URL property supports Expression
>> Language but the Trusted Hostname property does not. I can't use any form
>> of wildcard, such as *.*.my.expected.domain.com, and I can't use
>> comma-separated values.
>> 
>> This is a huge problem: domain name differences cause errors when we pass
>> down varying Remote URLs while the Trusted Hostname stays the same. For
>> one property I can use a variable; the other is stuck.
>> Why was it implemented this way? Was it an oversight? What is the
>> workaround: create a custom processor and import all the required
>> libraries, or modify the NiFi processor itself and rebuild the NiFi jars?
>> What is your recommendation? Thank you in advance.
>> 
>> 
>> 



Custom Processor - TestRunner Out of Memory

2019-02-12 Thread Shawn Weeks
With the NiFi TestRunner class for a processor, is there a way to have it write 
the processor's output stream to disk so that it isn't trying to hold the whole 
thing in a ByteArrayOutputStream? I've got a test case that uses a rather large 
test file to verify some edge cases, and I can't figure out how to test it. I'm 
working off of the NiFi 1.5 processor Maven archetype.

Thanks
Shawn Weeks


Re: InvokeHTTP Remote URL vs Trusted Hostname - ExpLang issue

2019-02-12 Thread Bryan Bende
Hello,

It looks like InvokeHTTP creates an instance of the OkHttp client in
the onScheduled method, which is called when the processor is started,
and when it creates the client it specifies a hostname verifier that
always accepts whatever the trusted hostname is. So the issue is that
if Trusted Hostname were to support EL from flow file attributes, you
could no longer create the client instance in onScheduled; you would
have to create it lazily per flow file, with some type of cache from
trusted hostname to client instance, making the logic of the processor
a bit more complex.
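
The lazy, per-hostname client creation described here could be sketched like this (a hypothetical illustration only; `HttpClientStub` stands in for the real OkHttp client, and nothing below is InvokeHTTP's actual implementation):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a "cache from trusted hostname to client
// instance": the client for a given hostname is created lazily on
// first use and reused for subsequent flow files.
public class PerHostnameClientCache {

    private final Map<String, HttpClientStub> clients = new ConcurrentHashMap<>();

    public HttpClientStub clientFor(String trustedHostname) {
        // computeIfAbsent builds the client only once per hostname,
        // even when called concurrently from multiple threads.
        return clients.computeIfAbsent(trustedHostname, HttpClientStub::new);
    }

    // Stand-in for the real HTTP client, which would be configured
    // with a verifier for this specific trusted hostname.
    public static class HttpClientStub {
        final String trustedHostname;

        HttpClientStub(String trustedHostname) {
            this.trustedHostname = trustedHostname;
        }
    }
}
```

The trade-off is exactly the added complexity mentioned above: the processor can no longer build a single client in onScheduled, and the cache needs a lifecycle (e.g. eviction) of its own.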

I suspect the expectation was that Trusted Hostname would be used
very sparingly, since it is really a bit of a hack to bypass a proper
TLS configuration, so maybe it was not expected that you would need
to specify many different trusted hostnames, but I'm only guessing.

I think it would be very easy to allow a comma-separated list, or
possibly a regex. You would just have to modify the hostname verifier
here:

https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/InvokeHTTP.java#L1213-L1230
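
The comma-separated variant suggested above could look something like the following verifier (a hypothetical sketch, not NiFi's code; the class name and the comma-separated convention are assumptions, and it uses only the standard javax.net.ssl interface):

```java
import javax.net.ssl.HostnameVerifier;
import javax.net.ssl.SSLSession;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical verifier that trusts a comma-separated list of
// hostnames instead of the single value the property supports today.
public class MultiTrustedHostnameVerifier implements HostnameVerifier {

    private final Set<String> trustedHostnames = new HashSet<>();

    public MultiTrustedHostnameVerifier(String commaSeparatedHostnames) {
        // Split on commas, trimming whitespace and normalizing case
        // so "a.example.com, B.Example.Com" trusts both hosts.
        for (String host : commaSeparatedHostnames.split(",")) {
            trustedHostnames.add(host.trim().toLowerCase(Locale.ROOT));
        }
    }

    @Override
    public boolean verify(String hostname, SSLSession session) {
        // Accept only hostnames that appear in the configured list.
        return trustedHostnames.contains(hostname.toLowerCase(Locale.ROOT));
    }
}
```

A regex-based variant would be the same shape, with the set lookup replaced by a precompiled Pattern match. Either way this still bypasses proper certificate hostname verification, so the caveat about it being a hack stands.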

- Bryan

On Tue, Feb 12, 2019 at 10:06 AM ara m.  wrote:
>
> In the InvokeHTTP processor, the Remote URL property supports Expression
> Language but the Trusted Hostname property does not. I can't use any form
> of wildcard, such as *.*.my.expected.domain.com, and I can't use
> comma-separated values.
>
> This is a huge problem: domain name differences cause errors when we pass
> down varying Remote URLs while the Trusted Hostname stays the same. For
> one property I can use a variable; the other is stuck.
> Why was it implemented this way? Was it an oversight? What is the
> workaround: create a custom processor and import all the required
> libraries, or modify the NiFi processor itself and rebuild the NiFi jars?
> What is your recommendation? Thank you in advance.
>
>
>


Re: PutSQL benchmarking ?

2019-02-12 Thread l vic
Would it be possible to work around this by passing "upsert" as an attribute
on the flow file? If so, where can I find some examples of using
PutDatabaseRecord with a RecordReader to extract and save a JSON array?
Thank you

On Thu, Feb 7, 2019 at 1:03 PM Matt Burgess  wrote:

> Yeah that's a gap that needs filling. I'm hopefully wrapping up some
> stuff shortly, and would like to take a crack at upsert for PDR.
>
> Regards,
> Matt
>
> On Thu, Feb 7, 2019 at 12:54 PM l vic  wrote:
> >
> > Sorry, I realize I do indeed perform record splitting; the problem with
> > PutDatabaseRecord is that it doesn't seem to recognize "upsert".
> >
> > On Wed, Feb 6, 2019 at 4:10 PM Matt Burgess  wrote:
> >>
> >> If you don't do record splitting, how are you getting SQL to send to
> >> PutSQL? Can you describe your flow (processors, e.g.)?
> >>
> >> Thanks,
> >> Matt
> >>
> >> On Wed, Feb 6, 2019 at 3:41 PM l vic  wrote:
> >> >
> >> > Hi Matt,
> >> > No, I don't do record splitting. The data looks like
> >> > { "attr1":"val1",...[{}]}, where the "parent" data is saved into one
> >> > record in a "parent" table and the array data is saved into multiple
> >> > records in a "child" table...
> >> > What's "lineage duration"?
> >> > Event Duration
> >> > < 1ms
> >> > Lineage Duration
> >> > 00:00:00.070
> >> >
> >> > On Wed, Feb 6, 2019 at 2:59 PM Matt Burgess  wrote:
> >> >>
> >> >> In your flow, what does the data look like? Are you splitting it into
> >> >> individual records, then converting to SQL (probably via JSON) and
> >> >> calling PutSQL? If so, that's not going to be very performant; the
> >> >> PutDatabaseRecord processor combines all that together so you can
> >> >> leave your data in its original state (i.e. many records in one flow
> >> >> file). For benchmarking PutDatabaseRecord (PDR), you could provide
> >> >> sample data via GenerateFlowFile, run a few through PDR, and check the
> >> >> provenance events for fields such as durationMillis or calculations
> >> >> like (timestampMillis - lineageStart).
> >> >>
> >> >> Regards,
> >> >> Matt
> >> >>
> >> >> On Wed, Feb 6, 2019 at 2:07 PM l vic  wrote:
> >> >> >
> >> >> > I have performance issues with PutSQL in my flow... Is there some way
> >> >> > to benchmark the time required to write a certain number of records
> >> >> > to a table from GenerateFlowFile?
> >> >> > Thank you,
>
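
The calculation Matt suggests can be sketched numerically. This is a hypothetical helper (the class name is invented) that turns the difference between a provenance event's timestamp and the flow file's lineage start, both in epoch milliseconds, into the HH:mm:ss.SSS form NiFi displays for Lineage Duration (e.g. 00:00:00.070):

```java
import java.time.Duration;

// Hypothetical sketch: lineage duration is the provenance event's
// timestamp minus the flow file's lineage start date, formatted the
// way NiFi's UI shows it.
public class LineageDuration {

    public static String format(long eventTimestampMillis, long lineageStartMillis) {
        Duration d = Duration.ofMillis(eventTimestampMillis - lineageStartMillis);
        return String.format("%02d:%02d:%02d.%03d",
                d.toHours(), d.toMinutes() % 60, d.getSeconds() % 60, d.toMillis() % 1000);
    }
}
```

Averaging this value (or durationMillis) over a batch of flow files generated by GenerateFlowFile gives a rough per-record throughput figure for PutSQL versus PutDatabaseRecord.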


InvokeHTTP Remote URL vs Trusted Hostname - ExpLang issue

2019-02-12 Thread ara m.
In the InvokeHTTP processor, the Remote URL property supports Expression
Language but the Trusted Hostname property does not. I can't use any form
of wildcard, such as *.*.my.expected.domain.com, and I can't use
comma-separated values.

This is a huge problem: domain name differences cause errors when we pass
down varying Remote URLs while the Trusted Hostname stays the same. For
one property I can use a variable; the other is stuck.
Why was it implemented this way? Was it an oversight? What is the
workaround: create a custom processor and import all the required
libraries, or modify the NiFi processor itself and rebuild the NiFi jars?
What is your recommendation? Thank you in advance.



--
Sent from: http://apache-nifi-users-list.2361937.n4.nabble.com/


Re: Is the DistributedMapCacheService a single point of failure?

2019-02-12 Thread James Srinivasan
(you can make it slightly less yucky by persisting the cache to shared
storage so you don't lose the contents when another node starts up,
but you do have to manually poke the clients)

On Tue, 12 Feb 2019 at 14:06, Bryan Bende  wrote:
>
> As James pointed out, there are alternate implementations of the DMC
> client that use external services that can be configured for high
> availability, such as HBase or Redis.
>
> When using the DMC client service, which is meant to work with the DMC
> server, the server is a single point of failure. In a cluster, the
> server runs on all nodes, but it doesn't replicate data between them,
> and the client can only point at one of these nodes. If you have to
> switch the client to point at a new server, then the cache will be
> starting over on the new server.
>
> On Tue, Feb 12, 2019 at 8:11 AM James Srinivasan
>  wrote:
> >
> > We switched to HBase_1_1_2_ClientMapCacheService for precisely this
> > reason. It works great (we already had HBase which probably helped)
> >
> > On Tue, 12 Feb 2019 at 12:51, Vos, Walter  wrote:
> > >
> > > Hi,
> > >
> > > I'm on NiFi 1.5 and we're currently having an issue with one of the nodes 
> > > in our three node cluster. No biggie, just disconnect it from the cluster 
> > > and let the other two nodes run things for a while, right? Unfortunately, 
> > > some of our flows are using a DistributedMapCacheService that have that 
> > > particular node that we took out set as the server hostname. For me as an 
> > > admin, this is worrying :-)
> > >
> > > Is there anything I can do in terms of configuration to "clusterize" the 
> > > DistributedMapCacheServices? I can already see that the 
> > > DistributedMapCacheServer doesn't define a hostname, so I guess that runs 
> > > on all nodes. Can we set multiple hostnames in the 
> > > DistributedMapCacheService then? Or should I just change it over in case 
> > > of node failure? Is the cache shared among the cluster? I.e. do all nodes 
> > > have the same values for each signal identifier/counter name?
> > >
> > > Kind regards,
> > >
> > > Walter
> > >


Re: Is the DistributedMapCacheService a single point of failure?

2019-02-12 Thread Bryan Bende
As James pointed out, there are alternate implementations of the DMC
client that use external services that can be configured for high
availability, such as HBase or Redis.

When using the DMC client service, which is meant to work with the DMC
server, the server is a single point of failure. In a cluster, the
server runs on all nodes, but it doesn't replicate data between them,
and the client can only point at one of those nodes. If you have to
switch the client to point at a new server, the cache starts over
empty on that server.

On Tue, Feb 12, 2019 at 8:11 AM James Srinivasan
 wrote:
>
> We switched to HBase_1_1_2_ClientMapCacheService for precisely this
> reason. It works great (we already had HBase which probably helped)
>
> On Tue, 12 Feb 2019 at 12:51, Vos, Walter  wrote:
> >
> > Hi,
> >
> > I'm on NiFi 1.5 and we're currently having an issue with one of the nodes 
> > in our three node cluster. No biggie, just disconnect it from the cluster 
> > and let the other two nodes run things for a while, right? Unfortunately, 
> > some of our flows are using a DistributedMapCacheService that have that 
> > particular node that we took out set as the server hostname. For me as an 
> > admin, this is worrying :-)
> >
> > Is there anything I can do in terms of configuration to "clusterize" the 
> > DistributedMapCacheServices? I can already see that the 
> > DistributedMapCacheServer doesn't define a hostname, so I guess that runs 
> > on all nodes. Can we set multiple hostnames in the 
> > DistributedMapCacheService then? Or should I just change it over in case of 
> > node failure? Is the cache shared among the cluster? I.e. do all nodes have 
> > the same values for each signal identifier/counter name?
> >
> > Kind regards,
> >
> > Walter
> >


Re: Is the DistributedMapCacheService a single point of failure?

2019-02-12 Thread James Srinivasan
We switched to HBase_1_1_2_ClientMapCacheService for precisely this
reason. It works great (we already had HBase which probably helped)

On Tue, 12 Feb 2019 at 12:51, Vos, Walter  wrote:
>
> Hi,
>
> I'm on NiFi 1.5 and we're currently having an issue with one of the nodes in 
> our three node cluster. No biggie, just disconnect it from the cluster and 
> let the other two nodes run things for a while, right? Unfortunately, some of 
> our flows are using a DistributedMapCacheService that have that particular 
> node that we took out set as the server hostname. For me as an admin, this is 
> worrying :-)
>
> Is there anything I can do in terms of configuration to "clusterize" the 
> DistributedMapCacheServices? I can already see that the 
> DistributedMapCacheServer doesn't define a hostname, so I guess that runs on 
> all nodes. Can we set multiple hostnames in the DistributedMapCacheService 
> then? Or should I just change it over in case of node failure? Is the cache 
> shared among the cluster? I.e. do all nodes have the same values for each 
> signal identifier/counter name?
>
> Kind regards,
>
> Walter
>


Is the DistributedMapCacheService a single point of failure?

2019-02-12 Thread Vos, Walter
Hi,

I'm on NiFi 1.5 and we're currently having an issue with one of the nodes in 
our three-node cluster. No biggie, just disconnect it from the cluster and let 
the other two nodes run things for a while, right? Unfortunately, some of our 
flows use a DistributedMapCacheService whose server hostname is set to the 
node we took out. For me as an admin, this is worrying :-)

Is there anything I can do in terms of configuration to "clusterize" the 
DistributedMapCacheServices? I can see that the DistributedMapCacheServer 
doesn't define a hostname, so I assume it runs on all nodes. Can we set 
multiple hostnames in the DistributedMapCacheService then? Or should I just 
change it over in case of node failure? Is the cache shared among the 
cluster, i.e. do all nodes have the same values for each signal 
identifier/counter name?

Kind regards,

Walter



This e-mail, including any attachments, is intended solely for (use by) the 
addressee. The e-mail may contain personal or confidential information. 
Disclosure, reproduction, distribution and/or provision of (the contents of) 
this e-mail (and any attachments) to third parties is expressly prohibited. 
If you are not the intended recipient, you are kindly requested to notify the 
sender immediately and to destroy this e-mail (and any attachments).

Company information