Re: NPE MergeContent processor

2016-11-15 Thread Conrad Crampton
Hi Mark,
That is really good news. As to whether it sounds familiar – I tried so many 
things when I was upgrading from 0.6.1 to 1.0.0 that I couldn’t say whether this 
was indeed the cause. It may have been – I think that whilst the outcome is 
probably the same, the exact cause may have been due to the aforementioned 
‘titting about’ ☺
Anyway, I was thinking of splitting my cluster in two anyway to make better use 
of the syslog ingestion (I think that will give me better throughput, as I am 
seeing the syslog ingestion as a bottleneck, with repeated warnings about a full 
buffer), at which point I will delete the provenance repository anyway, which 
will get rid of this, won’t it? I’m assuming I can delete all repositories and 
just leave the flowfile.xml to have the same starting point for the workflows?
Anyway, thanks again for pursuing this, and once again I am incredibly impressed 
with the reaction to issues/bugs etc. in this community.
Regards
Conrad

Re: NPE MergeContent processor

2016-11-15 Thread Mark Payne
Conrad,


Good news - I have been able to replicate the issue and track down the problem. 
I created a JIRA to address it - 
https://issues.apache.org/jira/browse/NIFI-3040.

I have a PR up to address the issue. It looks like the problem is caused by 
replaying a FlowFile from Provenance and then restarting NiFi before the 
replayed FlowFile has completed processing. Does that sound familiar?


In the case of MergeContent you'd see a NullPointerException. In other cases it 
will generally just complain because the UUID is null.


The issue has to do with the FlowFile not being properly persisted when a 
REPLAY event occurs. So if you still have the FlowFile that is causing this, 
you'd have to manually remove it from its queue to address the issue, but the 
issue shouldn't happen any more after this fix makes its way in.


Sorry that this has happened to you, but thanks for working with us to give us 
all we needed to investigate. And thanks for being patient as we've diagnosed 
and dug in.


Cheers

-Mark
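The failure mode Mark describes – a replayed FlowFile whose REPLAY event is not 
fully persisted before a restart – can be modelled in miniature. This is an 
illustrative sketch, not NiFi's actual repository code; the `FlowFileRepo` 
class and its behavior are assumptions made for the example:

```python
import json
import os
import tempfile

class FlowFileRepo:
    """Toy durable repository: FlowFiles survive a restart only if persisted."""

    def __init__(self, path):
        self.path = path
        self.flowfiles = {}
        if os.path.exists(path):
            with open(path) as f:
                self.flowfiles = json.load(f)

    def persist(self, uuid, attrs):
        # Write-through: the FlowFile is durable once this returns.
        self.flowfiles[uuid] = attrs
        with open(self.path, "w") as f:
            json.dump(self.flowfiles, f)

repo_path = os.path.join(tempfile.mkdtemp(), "flowfile-repo.json")
repo = FlowFileRepo(repo_path)
repo.persist("ff-1", {"filename": "a.log"})          # normal FlowFile, persisted
in_memory_only = {"ff-2": {"filename": "replayed"}}  # REPLAY state never flushed

# Simulated restart: a fresh repo instance sees only what was persisted,
# so the replayed FlowFile's state (including its UUID) is gone.
restarted = FlowFileRepo(repo_path)
print("ff-1" in restarted.flowfiles)  # True
print("ff-2" in restarted.flowfiles)  # False
```

The point of the sketch is only that any state updated in memory but not 
written through before a restart is silently lost, which matches the symptom of 
a FlowFile coming back without its UUID.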




Re: NPE MergeContent processor

2016-11-11 Thread Oleg Zhurakousky
Sorry, I should have been clearer.
I’ve spent a considerable amount of time slicing and dicing this thing, and 
while I am still validating a few possibilities, this is most likely due to a 
FlowFile being rehydrated from a corrupted repo with a missing UUID; when such 
a file’s ID ends up as a parent/child of a ProvenanceEventRecord, we get this 
issue.
Basically, a FlowFile must never exist without a UUID, similar to a provenance 
event record, where the existence of the UUID is validated during the call to 
build(). We should definitely do the same in a builder for FlowFile; even 
though it will not eliminate the issue, it may help to pinpoint its origin.

I’ll raise the corresponding JIRA to improve FlowFile validation.

Cheers
Oleg
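Oleg's suggestion – validate the UUID in the FlowFile builder the way the 
provenance event builder validates during build() – could look roughly like 
the following. This is a hypothetical builder written for illustration, not 
NiFi's actual FlowFile API:

```python
import uuid as uuidlib

class FlowFileBuilder:
    """Hypothetical builder that refuses to produce a FlowFile without a UUID."""

    def __init__(self):
        self._uuid = None
        self._attrs = {}

    def uuid(self, value):
        self._uuid = value
        return self

    def attribute(self, key, value):
        self._attrs[key] = value
        return self

    def build(self):
        # Fail fast at construction time, mirroring the provenance event
        # builder: a missing UUID surfaces here with a clear message,
        # not later as a bare NullPointerException deep inside a processor.
        if not self._uuid:
            raise ValueError("FlowFile cannot be built without a UUID")
        return {"uuid": self._uuid, "attributes": dict(self._attrs)}

ff = (FlowFileBuilder()
      .uuid(str(uuidlib.uuid4()))
      .attribute("filename", "a.log")
      .build())
print(ff["uuid"] is not None)  # True

try:
    FlowFileBuilder().attribute("filename", "b.log").build()
except ValueError as e:
    print("rejected:", e)
```

As Oleg notes, this doesn't eliminate the corruption, but it moves the failure 
to the point where the invalid FlowFile is created, which makes the origin much 
easier to pinpoint.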


Re: NPE MergeContent processor

2016-11-11 Thread Joe Witt
That said, even if it is due to crashes or even disk-full cases, we should 
figure out what happened and make it not possible. We must always work to 
eliminate the possibility of corruption-causing events and to recover well in 
the face of corruption...


Re: NPE MergeContent processor

2016-11-11 Thread Oleg Zhurakousky
Conrad

Is it possible that you may be dealing with corrupted repositories (swap, flow 
file etc.) due to your upgrades, or maybe even possible crashes?

Cheers
Oleg

On Nov 11, 2016, at 3:11 AM, Conrad Crampton <conrad.cramp...@secdata.com> wrote:

Hi,
This is the flow. The incoming flow is basically a syslog message which is 
parsed, enriched, then saved to HDFS:
1.   Parse (extracttext)
2.   Assign matching parts to attributes
3.   Enrich ip address location
4.   Assign attributes with geoenrichment
5.   Execute python script to parse useragent
6.   Create json from attributes
7.   Convert to avro (all strings)
8.   Convert to target avro schema (had to do 7 & 8 due to bug(?) where 
couldn’t go directly from json to avro with integers/longs)
9.   Merge into bins (see props below)
10.   Append ‘.avro’ to filenames (for reading in Spark subsequently)
11.   Save to HDFS

Does this help at all?
If you need anything else just shout.
Regards
Conrad





additional out of shot
• compression level : 1
• Keep Path : false
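Step 9 of the flow (merge into bins) works roughly as follows: incoming records 
accumulate into a bin until a count or size threshold is reached, and the full 
bin is emitted as one merged output. A minimal sketch of that behaviour – the 
function name and thresholds are illustrative, not MergeContent's actual 
properties or defaults:

```python
def bin_and_merge(records, max_entries=3, max_bytes=10_000):
    """Accumulate byte records into bins; emit a merged blob when a bin fills."""
    merged, current, size = [], [], 0
    for rec in records:
        # Flush the current bin when adding this record would exceed a threshold.
        if current and (len(current) >= max_entries or size + len(rec) > max_bytes):
            merged.append(b"".join(current))
            current, size = [], 0
        current.append(rec)
        size += len(rec)
    if current:  # flush the final partial bin
        merged.append(b"".join(current))
    return merged

chunks = [b"a" * 100, b"b" * 100, b"c" * 100, b"d" * 100]
out = bin_and_merge(chunks, max_entries=3)
print(len(out))  # 2 bins: three records merged, then the leftover one
```

Batching small syslog records like this before the HDFS write is what makes 
the resulting Avro files a sensible size for Spark to read.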



Re: NPE MergeContent processor

2016-11-10 Thread Oleg Zhurakousky
Conrad

Any chance you can provide a bit more info about your flow?
I was able to find a condition where something like this can happen, but it 
would have to be with some legacy NiFi distribution, so it’s a bit puzzling - 
but I really want to see if we can close the loop on this.
In any event, I think it is safe to raise a JIRA on this one.

Cheers
Oleg


Re: NPE MergeContent processor

2016-11-10 Thread Conrad Crampton
Hi,
The processor continues to write (to HDFS – the next processor in the flow) and 
doesn’t block any others coming into this processor (MergeContent), so not 
quite the same observed behaviour as NIFI-2015.
If there is anything else you would like me to do to help with this, I'm more 
than happy to help.
Regards
Conrad



On Wed, Nov 9, 2016 at 11:57 AM, Conrad Crampton <conrad.cramp...@secdata.com> wrote:
Hi,
I saw this error after I upgraded to 1.0.0 but thought it was maybe due to the 
issues I had with that upgrade (entirely my fault it turns out!), but I have 
seen it a number of times since so I turned debugging on to get a better 
stacktrace. Relevant log section as below.
Nothing out of the ordinary, and I never saw this in v0.6.1 or below.
I would have raised a Jira issue, but after logging in to Jira it only let me 
create a service desk request (which didn’t sound right).
Regards
Conrad

2016-11-09 16:43:46,413 DEBUG [Timer-Driven Process Thread-5] 
o.a.n.processors.standard.MergeContent 
MergeContent[id=12c0bec7-68b7-3b60-a020-afcc7b4599e7] has chosen to yield its 
resources; will not be scheduled to run again for 1000 milliseconds
2016-11-09 16:43:46,414 DEBUG [Timer-Driven Process Thread-5] 
o.a.n.processors.standard.MergeContent 
MergeContent[id=8db3bb68-0354-3116-96c5-dc80854ef116] Binned 42 FlowFiles
2016-11-09 16:43:46,418 INFO [Timer-Driven Process Thread-5] 
o.a.n.processors.standard.MergeContent 
MergeContent[id=8db3bb68-0354-3116-96c5-dc80854ef116] Merged 
[StandardFlowFileRecord[uuid=5e846136-0a7a-46fb-be96-8200d5cdd33d,claim=StandardContentClaim
 [resourceClaim=StandardResourceClaim[id=1475059643340-275849, 
container=default, section=393], offset=567158, 
length=2337],offset=0,name=17453303363322987,size=2337], 
StandardFlowFileRecord[uuid=a5f4bd55-82e3-40cb-9fa9-86b9e6816f67,claim=StandardContentClaim
 [resourceClaim=StandardResourceClaim[id=1475059643340-275849, 
container=default, section=393], offset=573643, 
length=2279],offset=0,name=17453303351196175,size=2279], 
StandardFlowFileRecord[uuid=c1ca745b-660a-49cd-82e5-fa8b9a2f4165,claim=StandardContentClaim
 [resourceClaim=StandardResourceClaim[id=1475059643340-275849, 
container=default, section=393], offset=583957, 
length=2223],offset=0,name=17453303531879367,size=2223], 
StandardFlowFileRecord[uuid=,claim=StandardContentClaim 
[resourceClaim=StandardResourceClaim[id=1475059643340-275849, 
container=default, section=393], offset=595617, 
length=2356],offset=0,name=,size=2356], 
StandardFlowFileRecord[uuid=,claim=StandardContentClaim 
[resourceClaim=StandardResourceClaim[id=1475059643340-275849, 
container=default, section=393], offset=705637, 
length=2317],offset=0,name=,size=2317], 
StandardFlowFileRecord[uuid=,claim=StandardContentClaim 
[resourceClaim=StandardResourceClaim[id=1475059643340-275849, 
container=default, section=393], offset=725376, 
length=2333],offset=0,name=,size=2333], 
StandardFlowFileRecord[uuid=,claim=StandardContentClaim 
[resourceClaim=StandardResourceClaim[id=1475059643340-275849, 
container=default, section=393], offset=728703, 
length=2377],offset=0,name=,size=2377]] into 
StandardFlowFileRecord[uuid=1ef3e5a0-f8db-49eb-935d-ed3c991fd631,claim=StandardContentClaim
 [resourceClaim=StandardResourceClaim[id=1478709819819-416, container=default, 
section=416], offset=982498, 
length=4576],offset=0,name=3649103647775837,size=4576]
2016-11-09 16:43:46,418 ERROR [Timer-Driven Process Thread-5] 
o.a.n.processors.standard.MergeContent 
MergeContent[id=8db3bb68-0354-3116-96c5-dc80854ef116] 
MergeContent[id=8db3bb68-0354-3116-96c5-dc80854ef116] failed to process session 
due to java.lang.NullPointerException: java.lang.NullPointerException
2016-11-09 16:43:46,422 ERROR [Timer-Driven Process Thread-5] 
o.a.n.processors.standard.MergeContent
java.lang.NullPointerException: null
at 
org.apache.nifi.stream.io.DataOutputStream.writeUTF(DataOutputStream.java:300)
 ~[nifi-utils-1.0.0.jar:1.0.0]
at 
org.apache.nifi.stream.io.DataOutputStream.writeUTF(DataOutputStream.java:281)
 ~[nifi-utils-1.0.0.jar:1.0.0]
at 
org.apache.nifi.provenance.StandardRecordWriter.writeUUID(StandardRecordWriter.java:257)
 ~[na:na]
at 
org.apache.nifi.provenance.StandardRecordWriter.writeUUIDs(StandardRecordWriter.java:266)
 ~[na:na]
at 
org.apache.nifi.provenance.StandardRecordWriter.writeRecord(StandardRecordWriter.java:232)
 ~[na:na]
at 
org.apache.nifi.provenance.PersistentProvenanceRepository.persistRecord(PersistentP

Re: NPE MergeContent processor

2016-11-10 Thread Bryan Bende
Conrad,

Thanks for reporting this. I wonder if this is also related to:

https://issues.apache.org/jira/browse/NIFI-2015

Seems like there is some case where the UUID is ending up as null.

-Bryan
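
For context on why a null UUID surfaces as an NPE here: `writeUTF` dereferences its string argument immediately, so handing it a null throws before anything is written. The sketch below uses `java.io.DataOutputStream` rather than NiFi's `org.apache.nifi.stream.io.DataOutputStream` (the trace suggests the latter behaves the same way here, but that is an assumption); the class and method names are illustrative only, not NiFi code.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Illustration only (not NiFi code): writeUTF throws NullPointerException
// when passed a null string, consistent with the stack trace below where a
// FlowFile's UUID attribute has ended up null.
public class WriteUtfNullDemo {

    // Returns true if writeUTF(uuid) throws NullPointerException.
    static boolean throwsNpe(String uuid) {
        try (DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream())) {
            out.writeUTF(uuid); // NPE for null: writeUTF dereferences the string
            return false;
        } catch (NullPointerException e) {
            return true;
        } catch (IOException e) {
            return false; // not expected with an in-memory stream
        }
    }

    public static void main(String[] args) {
        System.out.println(throwsNpe(null));                                   // null UUID -> NPE
        System.out.println(throwsNpe("a5f4bd55-82e3-40cb-9fa9-86b9e6816f67")); // valid UUID -> ok
    }
}
```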


On Wed, Nov 9, 2016 at 11:57 AM, Conrad Crampton <
conrad.cramp...@secdata.com> wrote:

> Hi,
>
> I saw this error after I upgraded to 1.0.0 but thought it was maybe due to
> the issues I had with that upgrade (entirely my fault, it turns out!), but I
> have seen it a number of times since, so I turned debugging on to get a
> better stack trace. The relevant log section is below.
>
> Nothing out of the ordinary, and I never saw this in v0.6.1 or below.
>
> I would have raised a Jira issue, but after logging in to Jira it only let
> me create a service desk request (which didn’t sound right).
>
> Regards
>
> Conrad

NPE MergeContent processor

2016-11-09 Thread Conrad Crampton
Hi,
I saw this error after I upgraded to 1.0.0 but thought it was maybe due to the 
issues I had with that upgrade (entirely my fault, it turns out!), but I have 
seen it a number of times since, so I turned debugging on to get a better 
stack trace. The relevant log section is below.
Nothing out of the ordinary, and I never saw this in v0.6.1 or below.
I would have raised a Jira issue, but after logging in to Jira it only let me 
create a service desk request (which didn’t sound right).
Regards
Conrad

2016-11-09 16:43:46,413 DEBUG [Timer-Driven Process Thread-5] 
o.a.n.processors.standard.MergeContent 
MergeContent[id=12c0bec7-68b7-3b60-a020-afcc7b4599e7] has chosen to yield its 
resources; will not be scheduled to run again for 1000 milliseconds
2016-11-09 16:43:46,414 DEBUG [Timer-Driven Process Thread-5] 
o.a.n.processors.standard.MergeContent 
MergeContent[id=8db3bb68-0354-3116-96c5-dc80854ef116] Binned 42 FlowFiles
2016-11-09 16:43:46,418 INFO [Timer-Driven Process Thread-5] 
o.a.n.processors.standard.MergeContent 
MergeContent[id=8db3bb68-0354-3116-96c5-dc80854ef116] Merged 
[StandardFlowFileRecord[uuid=5e846136-0a7a-46fb-be96-8200d5cdd33d,claim=StandardContentClaim
 [resourceClaim=StandardResourceClaim[id=1475059643340-275849, 
container=default, section=393], offset=567158, 
length=2337],offset=0,name=17453303363322987,size=2337], 
StandardFlowFileRecord[uuid=a5f4bd55-82e3-40cb-9fa9-86b9e6816f67,claim=StandardContentClaim
 [resourceClaim=StandardResourceClaim[id=1475059643340-275849, 
container=default, section=393], offset=573643, 
length=2279],offset=0,name=17453303351196175,size=2279], 
StandardFlowFileRecord[uuid=c1ca745b-660a-49cd-82e5-fa8b9a2f4165,claim=StandardContentClaim
 [resourceClaim=StandardResourceClaim[id=1475059643340-275849, 
container=default, section=393], offset=583957, 
length=2223],offset=0,name=17453303531879367,size=2223], 
StandardFlowFileRecord[uuid=,claim=StandardContentClaim 
[resourceClaim=StandardResourceClaim[id=1475059643340-275849, 
container=default, section=393], offset=595617, 
length=2356],offset=0,name=,size=2356], 
StandardFlowFileRecord[uuid=,claim=StandardContentClaim 
[resourceClaim=StandardResourceClaim[id=1475059643340-275849, 
container=default, section=393], offset=705637, 
length=2317],offset=0,name=,size=2317], 
StandardFlowFileRecord[uuid=,claim=StandardContentClaim 
[resourceClaim=StandardResourceClaim[id=1475059643340-275849, 
container=default, section=393], offset=725376, 
length=2333],offset=0,name=,size=2333], 
StandardFlowFileRecord[uuid=,claim=StandardContentClaim 
[resourceClaim=StandardResourceClaim[id=1475059643340-275849, 
container=default, section=393], offset=728703, 
length=2377],offset=0,name=,size=2377]] into 
StandardFlowFileRecord[uuid=1ef3e5a0-f8db-49eb-935d-ed3c991fd631,claim=StandardContentClaim
 [resourceClaim=StandardResourceClaim[id=1478709819819-416, container=default, 
section=416], offset=982498, 
length=4576],offset=0,name=3649103647775837,size=4576]
2016-11-09 16:43:46,418 ERROR [Timer-Driven Process Thread-5] 
o.a.n.processors.standard.MergeContent 
MergeContent[id=8db3bb68-0354-3116-96c5-dc80854ef116] 
MergeContent[id=8db3bb68-0354-3116-96c5-dc80854ef116] failed to process session 
due to java.lang.NullPointerException: java.lang.NullPointerException
2016-11-09 16:43:46,422 ERROR [Timer-Driven Process Thread-5] 
o.a.n.processors.standard.MergeContent
java.lang.NullPointerException: null
at 
org.apache.nifi.stream.io.DataOutputStream.writeUTF(DataOutputStream.java:300) 
~[nifi-utils-1.0.0.jar:1.0.0]
at 
org.apache.nifi.stream.io.DataOutputStream.writeUTF(DataOutputStream.java:281) 
~[nifi-utils-1.0.0.jar:1.0.0]
at 
org.apache.nifi.provenance.StandardRecordWriter.writeUUID(StandardRecordWriter.java:257)
 ~[na:na]
at 
org.apache.nifi.provenance.StandardRecordWriter.writeUUIDs(StandardRecordWriter.java:266)
 ~[na:na]
at 
org.apache.nifi.provenance.StandardRecordWriter.writeRecord(StandardRecordWriter.java:232)
 ~[na:na]
at 
org.apache.nifi.provenance.PersistentProvenanceRepository.persistRecord(PersistentProvenanceRepository.java:766)
 ~[na:na]
at 
org.apache.nifi.provenance.PersistentProvenanceRepository.registerEvents(PersistentProvenanceRepository.java:432)
 ~[na:na]
at 
org.apache.nifi.controller.repository.StandardProcessSession.updateProvenanceRepo(StandardProcessSession.java:713)
 ~[nifi-framework-core-1.0.0.jar:1.0.0]
at 
org.apache.nifi.controller.repository.StandardProcessSession.commit(StandardProcessSession.java:311)
 ~[nifi-framework-core-1.0.0.jar:1.0.0]
at 
org.apache.nifi.controller.repository.StandardProcessSession.commit(StandardProcessSession.java:299)
 ~[nifi-framework-core-1.0.0.jar:1.0.0]
at 
org.apache.nifi.processor.util.bin.BinFiles.processBins(BinFiles.java:256) 
~[nifi-processor-utils-1.0.0.jar:1.0.0]
at 
org.apache.nifi.processor.util.bin.BinFiles.onTrigger(BinFiles.java:190) 
~[nifi-processor-utils-1.0.0