Re: NiFi - Flow Backup Strategy

2016-06-06 Thread Joe Witt
Tom,

I suspect some others might add additional info here, but to achieve a
backup of the configuration such that you could quickly restore, you'd
generally just need to save the conf directory content. The flow
configuration should port well to another node, including the sensitive
property key.  You would need to change things like hostnames in the
properties files if you move to another box.  If you are simply going to
restore the configuration on the same box that had the issue, you can
restore the conf directory and just delete the various repository
directories.

As a general rule you can safely get rid of the archive folders at any
time, because they just hold items that are no longer actively reachable
in the flow but are kept around so you can click through to content from
provenance data or initiate replay.
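
As a minimal sketch of what that could look like (paths assume a NiFi
install at /opt/nifi with the default repository locations; adjust for your
environment, and treat this as illustrative rather than an official procedure):

  # back up the conf directory (flow.xml.gz, nifi.properties, etc.)
  tar czf /backups/nifi-conf-$(date +%Y%m%d).tar.gz -C /opt/nifi conf

  # restore on the same box: stop NiFi, restore conf, clear the repositories
  tar xzf /backups/nifi-conf-20160606.tar.gz -C /opt/nifi
  rm -rf /opt/nifi/flowfile_repository /opt/nifi/content_repository \
         /opt/nifi/database_repository /opt/nifi/provenance_repository

Running the first command from cron and copying the tarball off the box would
also cover the periodic, off-cluster part of your question.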

Thanks
Joe

On Mon, Jun 6, 2016 at 10:45 PM, Tom Stewart  wrote:
> I am curious as to what others are using to automate backups of the NiFi
> flow? Specifically, I am looking for steps on what file(s) to copy out of the
> NiFi cluster such that I can rebuild/restore if necessary.
>
> In NiFi 0.6.1 (cluster), I set this:
> nifi.properties:nifi.flow.configuration.archive.dir=/opt/nifi/archive/
> nifi.properties:nifi.content.repository.archive.max.retention.period=12 hours
> nifi.properties:nifi.content.repository.archive.max.usage.percentage=50%
> nifi.properties:nifi.content.repository.archive.enabled=true
>
> When I go into NiFi Flow Settings and use Back-Up Flow, it writes to
> /opt/nifi/archive on the cluster nodes (not the NCM). Is there any way to
> automate this on a periodic basis? Wondering if anyone has constructed, for
> example, a NiFi flow to back itself up that could create this file and
> subsequently copy it off the NiFi cluster to another system (like Hadoop,
> S3, or some random backup server). Also curious on recommendation if I can
> drive any of that from automation on the NCM. If we just grab the active
> flow.tar file from the NCM, is that a sufficient point in time backup?
>
> I also don't really see anything in the documentation regarding restore
> procedures. At a minimum, even if we lose in-flight data we'd like to be
> able to quickly rebuild and restore the basic flow configuration of a NiFi
> cluster. I think we'd maybe need to re-enter some of the sensitive
> data value fields. But I am looking for how and where to put the flow.tar or
> flow.xml.gz file when restoring a cluster.
>
> Thanks,
> Tom


NiFi - Flow Backup Strategy

2016-06-06 Thread Tom Stewart
I am curious as to what others are using to automate backups of the NiFi flow? 
Specifically, I am looking for steps on what file(s) to copy out of the NiFi
cluster such that I can rebuild/restore if necessary.
In NiFi 0.6.1 (cluster), I set this:
nifi.properties:nifi.flow.configuration.archive.dir=/opt/nifi/archive/
nifi.properties:nifi.content.repository.archive.max.retention.period=12 hours
nifi.properties:nifi.content.repository.archive.max.usage.percentage=50%
nifi.properties:nifi.content.repository.archive.enabled=true

When I go into NiFi Flow Settings and use Back-Up Flow, it writes to 
/opt/nifi/archive on the cluster nodes (not the NCM). Is there any way to 
automate this on a periodic basis? Wondering if anyone has constructed, for 
example, a NiFi flow to back itself up that could create this file and 
subsequently copy it off the NiFi cluster to another system (like Hadoop, S3,
or some random backup server). Also curious on recommendation if I can drive 
any of that from automation on the NCM. If we just grab the active flow.tar 
file from the NCM, is that a sufficient point in time backup?

I also don't really see anything in the documentation regarding restore 
procedures. At a minimum, even if we lose in-flight data we'd like to be able 
to quickly rebuild and restore the basic flow configuration of a NiFi cluster. 
I think we'd maybe need to re-enter some of the sensitive data value
fields. But I am looking for how and where to put the flow.tar or flow.xml.gz 
file when restoring a cluster. 

Thanks,
Tom


Re: How to control NiFi logging

2016-06-06 Thread Pat Trainor
No wonder it looked like log4j! That's some pretty slick stuff, right there!

On Jun 6 2016, at 8:06 am, Andrew Psaltis wrote:

> Hi Pat,
>
> It is all standard logback, described here: http://logback.qos.ch/



Re: Processor Question

2016-06-06 Thread Thad Guidry
Hi Joe and others,

I see a high-level problem with searchable documentation for Processors in
two areas.

1.  The current new-processor dialog shown during Add Processor has a
search (filter) capability, but it only searches the Tags and Type,
not Property Names, PropertyDescriptors, or even AllowableValue
description text.  That's a shame, because a search for "line" brings
up nothing.

Solution:  A more comprehensive Search/Filter that also searches
PropertyDescriptors and AllowableValues would be much better, showing an
expanded Add Processor dialog that also displays the description text for
the Properties and AllowableValues.

Wanted Position: Make the dialog show more complete info for a Processor
when a user clicks on a processor.

2. AllowableValue descriptions are not ideally displayed on the docs page,
but instead are hidden in a small blue ? circle next to each value title.
For example,
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.RouteText/index.html
has Matching Strategy with an AllowableValue of Satisfies Expression, but key
information about its Line and LineNo variables is hidden completely and
should not be. It's part of the full documentation, and all
documentation should be shown to a user visiting the docs pages.

Solution:  Sub-table the Allowable Values column on nifi-docs and add a
column for "Allowable Values description" next to each AllowableValue (just
as the source code does).

I have added these comments to JIRA issue for enhanced filter on Add
Processor here:
https://issues.apache.org/jira/browse/NIFI-1115

-Thad

Thad
+ThadGuidry 

On Mon, Jun 6, 2016 at 10:47 AM, Joe Percivall 
wrote:

> For number one, you can also use RouteText[1] with the matching strategy
> "Satisfies Expression". Then as a dynamic property use this expression
> "${lineNo:le(10)}". This will route first 10 lines to the "matched"
> relationship (assuming "Route to each matching Property Name" is not
> selected). This option also allows you to route those unmatched lines
> elsewhere if you need (if not just auto-terminate the "unmatched"
> relationship).
>
> Then for number two, instead of ReplaceText, you could also use RouteText.
> Set the matching strategy to "Matches Regular Expression". Then set the
> dynamic property to match everything and end with "unambiguously" (an
> example being "((\w|\W)*unambiguously)"). This will route all the text that
> matches the Regex apart from the end of the file and gives you the option
> to route the ending text differently if needed.
>
> [1]
> https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.RouteText/index.html
>
>
> Joe
> - - - - - -
> Joseph Percivall
> linkedin.com/in/Percivall
> e: joeperciv...@yahoo.com
>
>
>
> On Sunday, June 5, 2016 4:41 AM, Leslie Hartman  wrote:
>
>
>
> Matthew:
>
> The modifyBytes processor would be the best if it would allow one to
> specify the bytes to keep. I could calculate the number of bytes to
> delete, but when I try and place a variable in the End Offset it says
> it is not in the expected format.
>
> As for SegmentContent and SplitText, I have tried both of these. The
> problem is that they just take the original file and split it into a
> bunch of little files. So if I wanted say 256 bytes of a 30 MB file,
> after running out of memory it would give me 125,829,119 files to get
> rid of.
>
> For the 2nd case ReplaceText should work, I'm just having problems
> getting the correct syntax. If someone could provide an example of the
> correct syntax I would appreciate it.
>
> Thank You.
>
> Leslie Hartman
>
>
> Matthew Clarke wrote:
>
> You may also want to look at using the modifyBytes processor for number 1.
> >
> >On Jun 4, 2016 1:49 PM, "Thad Guidry"  wrote:
> >
> >For your 1st case, you can use either SegmentContent by your 256 bytes
> (or perhaps you can even use SplitText)
> >>
> https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.SegmentContent/index.html
> >>
> >>
> https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.SplitText/index.html
> >>
> >>
> >>
> >>For your 2nd case, you can use ReplaceText
> >>
> https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.ReplaceText/index.html
> >>
> >>
> >>
> >>Thad
> >>+ThadGuidry
> >>
> >>
>


Re: How to use GetMongo Projection

2016-06-06 Thread Joe Percivall
Hello Stephane,

Sorry no one has responded to this yet. To get a projection you can construct
the BSON similar to how it is done on this MongoDB page[1]. For example, to
retrieve only field_1 and field_2 you would use this as the value of the
"projection" property: { "field_1" : "1", "field_2" : "1" }.

As for example values, templates that use the processor you're interested in
contain many of them. You can find templates in various places, a couple being:
https://cwiki.apache.org/confluence/display/NIFI/Example+Dataflow+Templates
https://github.com/hortonworks-gallery/nifi-templates

[1] https://docs.mongodb.com/manual/tutorial/project-fields-from-query-results/

 
Hope that helps,
Joe
- - - - - - 
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com



On Thursday, May 26, 2016 10:58 PM, Stéphane Maarek  
wrote:



Hi,

How do I construct a BSON for the GetMongo processor?
Basically, I want to retrieve field_1, field_2

Side note: I think your overall documentation is great, but it's missing
"example values". This would help users figure out much more quickly what value
is expected for each property.

Cheers
Stephane


Re: Best way to process the requests in batch

2016-06-06 Thread Aldrin Piri
Kumiko,

In terms of increasing throughput from the standpoint of the NiFi
framework, it is possible to increase the number of concurrent tasks on the
processor under the Scheduling tab when configuring.  This will allow more
processes to execute simultaneously, providing greater throughput.  Along
these lines, you could then optionally perform a SplitText on your context
file and treat them as separate events to allow parallelization within the
processor.

More areas of improvement would likely be centered around the specific
API(s) with which you are interacting and your custom processor which we
could explore further if the above approaches do not work.

NiFi has some metrics in terms of a component's activity in the stats
displayed from the vantage point of FlowFiles, but does not have visibility
into your processor.  In your case, sending a context file as described
would lead to several requests.  The above-mentioned SplitText approach
could aid with keeping track of a quota, wherein there is a
one-to-one mapping of endpoint request to FlowFile, used in conjunction
with a ControlRate processor [1].

[1]
http://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.ControlRate/index.html
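
As a very rough sketch of that quota idea (the property names come from the
ControlRate documentation, but the values are placeholders to adapt, and you
should check the allowed range of Time Duration in your version):

  Rate Control Criteria = flowfile count
  Maximum Rate          = 100
  Time Duration         = 1 min

With one FlowFile per endpoint request after the split, this throttles
requests to the configured count per window; it enforces a rate rather than a
hard daily cutoff, so a strict 100-per-day quota may still need some
bookkeeping of its own.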


On Thu, May 26, 2016 at 8:34 PM, Kumiko Yada  wrote:

> Hello,
>
>
>
> We implemented a custom processor, similar to InvokeHTTP, in which part of
> the URL can be replaced with values from the Context Data List, and the
> weather is then written to the flowfile.  For example, the URL to get the
> weather feed has to include the zip code, the zip code is {0} in the URL,
> and it is replaced with each value from the Context Data List property.
>
>
>
> URL
>
> http://example{0}/weather
>
>
>
> Context Data List:
>
> 0
>
> 1
>
> 2
>
>
>
> The processor will make the following requests:
>
> http://example{0}/weather
>
>
>
> http://example0/weather
>
> http://example1/weather
>
> http://example2/weather
>
>
>
> This processor processes one request at a time and has a perf
> issue.  I’d like to modify it to process in batches.  What is the best way to
> process in batches?  And also, would NiFi keep track of how many requests
> the processor has processed?  If so, how does NiFi keep track of this and how
> long does NiFi keep that data?  I’d like to add quota priorities in
> this processor to keep track of a quota.  For example, if the weather feed
> can be requested only 100 times a day, I don’t want the processor to
> execute once the quota is reached.
>
>
>
> Thanks
>
> Kumiko
>


Re: Processor Question

2016-06-06 Thread Joe Percivall
For number one, you can also use RouteText[1] with the matching strategy 
"Satisfies Expression". Then as a dynamic property use this expression 
"${lineNo:le(10)}". This will route first 10 lines to the "matched" 
relationship (assuming "Route to each matching Property Name" is not selected). 
This option also allows you to route those unmatched lines elsewhere if you 
need (if not just auto-terminate the "unmatched" relationship).
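
Spelled out as a property sketch (the dynamic property name "first10" is made
up, and the value strings are paraphrased from the RouteText docs, so check the
exact wording in your version):

  Matching Strategy  = Satisfies Expression
  Routing Strategy   = route to 'matched' when all conditions match
  first10            = ${lineNo:le(10)}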
 
Then for number two, instead of ReplaceText, you could also use RouteText. Set
the matching strategy to "Matches Regular Expression". Then set the dynamic 
property to match everything and end with "unambiguously" (an example being 
"((\w|\W)*unambiguously)"). This will route all the text that matches the Regex 
apart from the end of the file and gives you the option to route the ending 
text differently if needed.

[1] 
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.RouteText/index.html


Joe
- - - - - -
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com



On Sunday, June 5, 2016 4:41 AM, Leslie Hartman  wrote:



Matthew:

The modifyBytes processor would be the best if it would allow one to
specify the bytes to keep. I could calculate the number of bytes to
delete, but when I try and place a variable in the End Offset it says
it is not in the expected format.

As for SegmentContent and SplitText, I have tried both of these. The
problem is that they just take the original file and split it into a
bunch of little files. So if I wanted say 256 bytes of a 30 MB file,
after running out of memory it would give me 125,829,119 files to get
rid of.

For the 2nd case ReplaceText should work, I'm just having problems
getting the correct syntax. If someone could provide an example of the
correct syntax I would appreciate it.

Thank You.

Leslie Hartman


Matthew Clarke wrote:

You may also want to look at using the modifyBytes processor for number 1.
>
>On Jun 4, 2016 1:49 PM, "Thad Guidry"  wrote:
>
>For your 1st case, you can use either SegmentContent by your 256 bytes (or 
>perhaps you can even use SplitText)
>>https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.SegmentContent/index.html
>>
>>https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.SplitText/index.html
>>
>>
>>
>>For your 2nd case, you can use ReplaceText
>>https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.ReplaceText/index.html
>>
>>
>>
>>Thad 
>>+ThadGuidry
>>
>>


Re: Site to Site UnknownHostException

2016-06-06 Thread Aldrin Piri
Hi Joe,

Are you able to reach the specified instance, gbrdcr00015n01, from the
machine where you are accessing this log via ping or similar?
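
For example, from that machine (hostname taken from your log below):

  ping gbrdcr00015n01
  nslookup gbrdcr00015n01

If name resolution fails there, the fix likely lies in DNS or a hosts-file
entry rather than in NiFi itself.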

On Tue, May 31, 2016 at 9:06 AM,  wrote:

> Hi
>
>
>
> I’m new to NiFi. I have been experimenting with it and trying to implement
> Site-to-Site connections between different NiFi instances on a Windows box
> and a Linux box. I configured a remote process group and it can see the
> names of the input ports (somewhat confusingly named 9992 here) and can
> connect to them, but when I try to transfer a file I get an
> UnknownHostException. Any help would be greatly appreciated.
>
>
>
> 2016-05-31 11:24:11,002 ERROR [Timer-Driven Process Thread-6]
> o.a.nifi.remote.StandardRemoteGroupPort RemoteGroupPort[name=9992,target=
> http://gbrdcr00015n01:9995/nifi] failed to communicate with
> http://gbrdcr00015n01:9995/nifi due to java.net.UnknownHostException
>
> 2016-05-31 11:24:11,002 ERROR [Timer-Driven Process Thread-6]
> o.a.nifi.remote.StandardRemoteGroupPort
>
> java.net.UnknownHostException: null
>
> at sun.nio.ch.Net.translateException(Unknown Source)
> ~[na:1.8.0_91]
>
> at sun.nio.ch.SocketAdaptor.connect(Unknown Source)
> ~[na:1.8.0_91]
>
> at
> org.apache.nifi.remote.client.socket.EndpointConnectionPool.establishSiteToSiteConnection(EndpointConnectionPool.java:712)
> ~[nifi-site-to-site-client-0.6.1.jar:0.6.1]
>
> at
> org.apache.nifi.remote.client.socket.EndpointConnectionPool.establishSiteToSiteConnection(EndpointConnectionPool.java:685)
> ~[nifi-site-to-site-client-0.6.1.jar:0.6.1]
>
> at
> org.apache.nifi.remote.client.socket.EndpointConnectionPool.getEndpointConnection(EndpointConnectionPool.java:301)
> ~[nifi-site-to-site-client-0.6.1.jar:0.6.1]
>
> at
> org.apache.nifi.remote.client.socket.SocketClient.createTransaction(SocketClient.java:129)
> ~[nifi-site-to-site-client-0.6.1.jar:0.6.1]
>
> at
> org.apache.nifi.remote.StandardRemoteGroupPort.onTrigger(StandardRemoteGroupPort.java:171)
> ~[nifi-site-to-site-0.6.1.jar:0.6.1]
>
> at
> org.apache.nifi.controller.AbstractPort.onTrigger(AbstractPort.java:227)
> [nifi-framework-core-api-0.6.1.jar:0.6.1]
>
> at
> org.apache.nifi.controller.tasks.ContinuallyRunConnectableTask.call(ContinuallyRunConnectableTask.java:81)
> [nifi-framework-core-0.6.1.jar:0.6.1]
>
> at
> org.apache.nifi.controller.tasks.ContinuallyRunConnectableTask.call(ContinuallyRunConnectableTask.java:40)
> [nifi-framework-core-0.6.1.jar:0.6.1]
>
>   at
> org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:123)
> [nifi-framework-core-0.6.1.jar:0.6.1]
>
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
> [na:1.8.0_91]
>
> at java.util.concurrent.FutureTask.runAndReset(Unknown
> Source) [na:1.8.0_91]
>
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(Unknown
> Source) [na:1.8.0_91]
>
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown
> Source) [na:1.8.0_91]
>
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
> [na:1.8.0_91]
>
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
> [na:1.8.0_91]
>
> at java.lang.Thread.run(Unknown Source) [na:1.8.0_91]
>
>
>
> Thanks
>
> Joe


Re: Install directory

2016-06-06 Thread Joe Witt
Hello

While I am not aware of any strong reason to keep it the way it is, there
are no plans to change it at this time.  Some projects do this and some
don't.

Thanks
Joe
On Jun 6, 2016 10:13 AM, "ski n"  wrote:

> Yes, I already have a soft link, but this was more a remark that the
> directory structure is not in line with the Apache convention.
>
> 2016-06-06 13:12 GMT+02:00 Pat Trainor :
> > Why not use softlinks?
> >
> > ln -s  
> >
> > ...as in:
> >
> > pat@wopr:~$ l /opt
> > total 76
> > [...]
> > lrwxrwxrwx  1 pat  pat 10 May 30 18:08 nifi -> nifi-0.6.1
> > drwxrwxr-x 20 pat  pat   4096 Jun  5 21:22 nifi-0.6.1
> > [...]
> >
> > Thanks!
> >
> > pat
> > ( ͡° ͜ʖ ͡°)
> > "A wise man can learn more from a foolish question than a fool can learn
> > from a wise answer". ~ Bruce Lee.
> >
> > On Jun 6 2016, at 3:53 am, ski n  wrote:
> >>
> >> NiFi is unpacked/installed by default in the directory: nifi-0.6.1
> >>
> >> My other Apache installations all are grouped together by the Apache
> >> "apache-" prefix.
> >>
> >> For example: apache-tomcat-7.0.69
> >>
> >> Will this change in future releases (now NiFi is still before the 1.0
> >> release)?
>


Re: Install directory

2016-06-06 Thread ski n
Yes, I already have a soft link, but this was more a remark that the
directory structure is not in line with the Apache convention.

2016-06-06 13:12 GMT+02:00 Pat Trainor :
> Why not use softlinks?
>
> ln -s  
>
> ...as in:
>
> pat@wopr:~$ l /opt
> total 76
> [...]
> lrwxrwxrwx  1 pat  pat 10 May 30 18:08 nifi -> nifi-0.6.1
> drwxrwxr-x 20 pat  pat   4096 Jun  5 21:22 nifi-0.6.1
> [...]
>
> Thanks!
>
> pat
> ( ͡° ͜ʖ ͡°)
> "A wise man can learn more from a foolish question than a fool can learn
> from a wise answer". ~ Bruce Lee.
>
> On Jun 6 2016, at 3:53 am, ski n  wrote:
>>
>> NiFi is unpacked/installed by default in the directory: nifi-0.6.1
>>
>> My other Apache installations all are grouped together by the Apache
>> "apache-" prefix.
>>
>> For example: apache-tomcat-7.0.69
>>
>> Will this change in future releases (now NiFi is still before the 1.0
>> release)?


Re: How to control NiFi logging

2016-06-06 Thread Andrew Psaltis
Hi Pat,
It is all standard logback, described here: http://logback.qos.ch/

On Mon, Jun 6, 2016 at 12:18 PM, Pat Trainor  wrote:

> Andrew,
> Is any of this not standard log4j, and NiFi-specific?
> On Jun 6, 2016 2:34 AM, "Andrew Psaltis"  wrote:
>
>> You are correct, sorry about that Stephane!
>>
>> On Mon, Jun 6, 2016 at 8:23 AM, Stéphane Maarek <
>> stephane.maa...@gmail.com> wrote:
>>
>>> Actually, it seems I only needed to change one line (versus two)
>>>
>>> <appender name="BOOTSTRAP_FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
>>>     <file>/mnt/xvdf/logs/nifi-bootstrap.log</file>
>>>     <rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
>>>         <fileNamePattern>./logs/nifi-bootstrap_%d.log</fileNamePattern>
>>>
>>> On Mon, Jun 6, 2016 at 4:20 PM Stéphane Maarek <
>>> stephane.maa...@gmail.com> wrote:
>>>
 Hi Andrew,

 After changing the file I'm getting the following error. I changed the
 lines you indicated.

 Failed to auto configure default logger context
 Reported exception:
 ch.qos.logback.core.joran.spi.JoranException: Problem parsing XML
 document. See previously reported errors.
 at
 ch.qos.logback.core.joran.event.SaxEventRecorder.recordEvents(SaxEventRecorder.java:67)
 at
 ch.qos.logback.core.joran.GenericConfigurator.doConfigure(GenericConfigurator.java:134)
 at
 ch.qos.logback.core.joran.GenericConfigurator.doConfigure(GenericConfigurator.java:99)
 at
 ch.qos.logback.core.joran.GenericConfigurator.doConfigure(GenericConfigurator.java:49)
 at
 ch.qos.logback.classic.util.ContextInitializer.configureByResource(ContextInitializer.java:77)
 at
 ch.qos.logback.classic.util.ContextInitializer.autoConfig(ContextInitializer.java:152)
 at
 org.slf4j.impl.StaticLoggerBinder.init(StaticLoggerBinder.java:85)
 at
 org.slf4j.impl.StaticLoggerBinder.(StaticLoggerBinder.java:55)
 at org.slf4j.LoggerFactory.bind(LoggerFactory.java:141)
 at
 org.slf4j.LoggerFactory.performInitialization(LoggerFactory.java:120)
 at
 org.slf4j.LoggerFactory.getILoggerFactory(LoggerFactory.java:331)
 at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:283)
 at org.apache.nifi.bootstrap.RunNiFi.(RunNiFi.java:117)
 at org.apache.nifi.bootstrap.RunNiFi.main(RunNiFi.java:199)
 Caused by: org.xml.sax.SAXParseException; lineNumber: 1; columnNumber:
 1; Premature end of file.
 at
 com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1239)
 at
 com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:642)
 at
 com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl.parse(SAXParserImpl.java:326)
 at
 ch.qos.logback.core.joran.event.SaxEventRecorder.recordEvents(SaxEventRecorder.java:61)
 ... 13 more
 06:16:11,055 |-INFO in ch.qos.logback.classic.LoggerContext[default] -
 Could NOT find resource [logback.groovy]
 06:16:11,055 |-INFO in ch.qos.logback.classic.LoggerContext[default] -
 Could NOT find resource [logback-test.xml]
 06:16:11,055 |-INFO in ch.qos.logback.classic.LoggerContext[default] -
 Found resource [logback.xml] at [file:/home/ec2-user/nifi/conf/logback.xml]
 06:16:11,127 |-ERROR in
 ch.qos.logback.core.joran.event.SaxEventRecorder@682ea4b1 -
 XML_PARSING - Parsing fatal error on line 1 and column 1
 org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1; Premature
 end of file.
 at org.xml.sax.SAXParseException: Premature end of file.
 at  at
 com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:198)
 at  at
 com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:177)
 at  at
 com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:400)
 at  at
 com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:327)
 at  at
 com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(XMLScanner.java:1438)
 at  at
 com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(XMLDocumentScannerImpl.java:1019)
 at  at
 com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:606)
 at  at
 com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:118)
 at  at
 com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:510)
 at  at
 com.sun.org.apache.xerces.internal.

Re: Nifi & Parsey McParseface! RegEx in a Processor...

2016-06-06 Thread Conrad Crampton
Hi,
I’m not a NiFi expert by any stretch of the imagination, and there are others on
this list far better informed than me who can speak with authority on many of
the questions you raise, but I’ll have a go…

It is probably not necessary to create a custom processor to do the parsing
(using PMPF) – your ExecuteScript processor probably is sufficient. The one
reason this may not be desirable is if the Parsey model initialisation is
expensive, since doing it for each script invocation would cause a
bottleneck in processing. If it isn’t, then using GetKafka -> ExecuteScript
(Parsey) -> PutKafka would do what you want, I would have thought (conceptually).
However, what you are missing from this pipeline is the analysis of the Parsey
output, as you say. Now this may be something for which a custom processor would
be suitable – quite a simple text-processing one using standard Java text
processing / regexp to then write to a new flowfile for putting back on the
Kafka queue.

If however you feel the Parsey being run via an ExecuteScript processor isn’t
suitable, then I guess there are a number of options available – to make it
thread safe etc. and available from each node in your NiFi cluster in a
consistent way, I would be inclined to wrap Parsey up in an HTTP service and
invoke it via REST (as an idea) – posting in the data to parse and receiving
output – it could even do the analysis to format the output appropriately (as JSON
perhaps) to return back – invoked via a GetHttp processor. This may all be able
to be done in a custom processor too, and is probably the best option IF you can
understand the Parsey model initialisation within the custom processor.

In any case, my advice (for what it’s worth) would be to turn to custom processors
as a last resort and try to leverage the built-in processors where possible.
Whilst it is (fairly) trivial (as you have found out) to write your own
processor, it comes with its own overhead over time in maintenance etc., whereas
using the built-in ones comes with a reassurance that they are well tried and
tested.

Sorry I can’t be more specific on your (very interesting) use case.

Regards
Conrad

From: Pat Trainor 
Reply-To: "users@nifi.apache.org" 
Date: Monday, 6 June 2016 at 12:02
To: "users@nifi.apache.org" 
Subject: Re: Nifi & Parsey McParseface! RegEx in a Processor...


Conrad,

Thanks for writing! You do get the gist of it. Last night I realized how easy 
it is to make a custom processor. I was a little confused at first why I needed 
to pass on a new Flowfile in my simple onTrigger function, but the error in the 
Nifi GUI about versions/timestamp made it obvious. I guess I wasn't thinking 
and didn't check the nifi logs!

Anyway, if I am correct, I might be able to add an attribute to an existing 
Flowfile from my little processor. As of late last night I could change one 
that was there already, but today I will try to create one. If I can, then this 
should go well.

Unfortunately, and tell me if I am wrong, this new processor will still need to 
be loaded each time a sentence needs to be analyzed by 'Parsey'. On a small 
scale, this is no big deal, but normally people would be hammering it.

In looking for a clean, fast and [hopefully] elegant solution to accessing 
running services from a processor, is it bad design to simply make my parser 
run as a service, and have it listen to Kafka for text to parse? It could send 
it back as well via another topic...

But that is only 1/2 the problem. The other 1/2 is parsing out the output from 
Parsey, and maybe for that I should make my processor-not getting text sent & 
returned from Parsey... Because storing the output of Parsey (text) isn't a 
direct operation (see the sample output text in prev/original email), its
output needs to be analyzed first.

So let me know if this plan is viable:

  1.  Make the Parsey interaction via a java loop (daemon/service).
  2.  This daemon loads the Parsey model chosen once, then waits for Kafka 
messages to process, outputting each on another Kafka topic. It expects to 
receive 3 things:

 *   Flowfile as text to parse.
 *   The Kafka Topic to listen to (processor can't configure this, but will 
reflect user's choice).
 *   The Kafka Topic to send it back on (this I can send to the java 
daemon, and configure each 'return' at runtime)

*   This way, I am imagining many processors can send to Parsey via one 
fixed topic, and they can each wait for the return data via a unique Topic for 
just that processor.
*   I cannot see a way to adjust the listening Topic at runtime, so the 
user would make one for all processors to use, then enter that as a processor 
attribute.

  3.  My simple processor sends a flowfile to it via the topic the user selects
as a Processor attribute "Send Topic".
  4.  The parser, well, parses. Then it sends back the reply on a Topic set in
the processor as well as the "Receive Topic".

 *   Is it better to just do the Kafka transfer in the processor, or hand it
off to PutKafka & GetKafka? My thinking is that this would be harder to do, and
I would need to write 2 processors... Thoughts?

Re: Install directory

2016-06-06 Thread Pat Trainor
Why not use softlinks?

ln -s  

...as in:

pat@wopr:~$ l /opt
total 76
[...]
lrwxrwxrwx  1 pat  pat 10 May 30 18:08 nifi -> nifi-0.6.1
drwxrwxr-x 20 pat  pat   4096 Jun  5 21:22 nifi-0.6.1
[...]

Thanks!

pat (http://about.me/PatTrainor)
( ͡° ͜ʖ ͡°)
"A wise man can learn more from a foolish question than a fool can learn
from a wise answer". ~ Bruce Lee.

On Jun 6 2016, at 3:53 am, ski n  wrote:

> NiFi is unpacked/installed by default in the directory: nifi-0.6.1
>
> My other Apache installations all are grouped together by the Apache
> "apache-" prefix.
>
> For example: apache-tomcat-7.0.69
>
> Will this change in future releases (now NiFi is still before the 1.0
> release)?



Re: Nifi & Parsey McParseface! RegEx in a Processor...

2016-06-06 Thread Pat Trainor
Conrad,

Thanks for writing! You do get the gist of it. Last night I realized how easy
it is to make a custom processor. I was a little confused at first why I
needed to pass on a new Flowfile in my simple onTrigger function, but the
error in the NiFi GUI about versions/timestamp made it obvious. I guess I
wasn't thinking and didn't check the NiFi logs!

Anyway, if I am correct, I might be able to add an attribute to an existing
Flowfile from my little processor. As of late last night I could _change_ one
that was there already, but today I will try to _create_ one. If I can, then
this should go well.  

Unfortunately, and tell me if I am wrong, this new processor will still need
to be loaded each time a sentence needs to be analyzed by 'Parsey'. On a small
scale, this is no big deal, but normally people would be hammering it.

In looking for a clean, fast and [hopefully] elegant solution to accessing
running services from a processor, is it bad design to simply make my parser
run as a service, and have it listen to Kafka for text to parse? It could send
it back as well via another topic...

But that is only 1/2 the problem. The other 1/2 is parsing out the output from
Parsey, and maybe for that I should make my processor-not getting text sent
& returned from Parsey... Because storing the output of Parsey (text)
isn't a direct operation (see the sample output text in prev/original email),
its output needs to be analyzed first.

So let me know if this plan is viable:

  1. Make the Parsey interaction via a java loop (daemon/service).
  2. This daemon loads the Parsey model chosen _once_, then waits for Kafka 
messages to process, outputting each on another Kafka topic. It expects to 
receive 3 things:
1. Flowfile as text to parse.
2. The Kafka Topic to listen to (processor can't configure this, but will 
reflect user's choice).
3. The Kafka Topic to send it back on (this I _can_ send to the java 
daemon, and configure each 'return' at runtime)
  1. This way, I am imagining many processors can send to Parsey via one 
fixed topic, and they can each wait for the return data via a unique Topic for 
just that processor.
  2. I cannot see a way to adjust the listening Topic at runtime, so the 
user would make one for all processors to use, then enter that as a processor 
attribute.
  3. My simple processor sends a flowfile to it via the topic the user selects 
as a Processor attribute "Send Topic".
  4. The parser, well, _parses_. Then it sends back the reply on  a Topic set 
in the processor as well as the "Receive Topic".
1. Is it better to just do the Kafka transfer in the processor, or hand it 
off to PutKafka & GetKafka? My thinking is that this would be harder to do, 
and I would need to write 2 processors... Thoughts?
  5. The custom processor I'm writing then has the parsed text, but not in a 
format that will allow it to be put into a [graph] database. Knowing a word is 
a NNP isn't enough; you must know which _branch_ on the tree it was (how 
important it is).
1. This is where the [X] extraction counts, or a better mechanism that I'm 
not thinking of.
  6. At this point, I am very tempted to keep going in this processor, but what 
if the user wants HDFS, Titan, ...? Best here is to stop & put the results in 
its own "relationship", with the original text that was parsed in another, and 
perhaps even the 'raw parsed' tree-looking text in another Relationship.
1. So 4 relationships:
  1. Submitted
  2. Post Parsey
  3. Indexed
  4. Failure (of any of 2 or 3)

  

I will make the (Indexed) output of this processor a standard, of sorts, which
another processor can change into a query for the DB of choice. The 'tree
level' could be used for logic like:

  1. NNP/NNPS at [1] is a vertex.
  2. NN/NNS > [2] are destination vertices of the above. 
  3. VBG at ROOT is an edge.
  4. ...

Would it be OK to leave cobbling together their query to INSERT into their DB
of choice to them? Once such a query is crafted, they can use any standard NiFi
Put* processor, is my thinking...

  

Your feedback appreciated!

On Jun 6, 2016 3:18 AM, "Conrad Crampton" <conrad.cramp...@secdata.com> wrote:

> Hi,

>

> This may be a long shot as I don’t know how many combinations of the column
lengths with | and + there are, but you could try using ReplaceTextWithMapping
processor where you have all combinations of +--| etc. in a text file with
what they represent in term of counts e.g

>

> +--   [0]

>

> |  +--   [1]

>

> |  +--   [3]

>


>

> etc. (tab separated)

>


>

> Also, I’m not a particularly experienced in the area of sed, awk etc. but
I’m guessing some bash guru would be able to come up with some sort of script
that does this that could be called from ExcecuteScript processor.

>


>

> Regards

>

> Conrad

>


>

> From: Pat Trainor <pat.trai...@

Re: How to control NiFi logging

2016-06-06 Thread Pat Trainor
Andrew,
Is any of this not standard log4j, and NiFi-specific?
On Jun 6, 2016 2:34 AM, "Andrew Psaltis"  wrote:

> You are correct, sorry about that Stephane!
>
> On Mon, Jun 6, 2016 at 8:23 AM, Stéphane Maarek wrote:
>
>> Actually, it seems I only needed to change one line (versus two)
>>
>> <appender name="BOOTSTRAP_FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
>>     <file>/mnt/xvdf/logs/nifi-bootstrap.log</file>
>>     <rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
>>         <fileNamePattern>./logs/nifi-bootstrap_%d.log</fileNamePattern>
>>
>> On Mon, Jun 6, 2016 at 4:20 PM Stéphane Maarek 
>> wrote:
>>
>>> Hi Andrew,
>>>
>>> After changing the file I'm getting the following error. I changed the
>>> lines you indicated.
>>>
>>> Failed to auto configure default logger context
>>> Reported exception:
>>> ch.qos.logback.core.joran.spi.JoranException: Problem parsing XML
>>> document. See previously reported errors.
>>> at
>>> ch.qos.logback.core.joran.event.SaxEventRecorder.recordEvents(SaxEventRecorder.java:67)
>>> at
>>> ch.qos.logback.core.joran.GenericConfigurator.doConfigure(GenericConfigurator.java:134)
>>> at
>>> ch.qos.logback.core.joran.GenericConfigurator.doConfigure(GenericConfigurator.java:99)
>>> at
>>> ch.qos.logback.core.joran.GenericConfigurator.doConfigure(GenericConfigurator.java:49)
>>> at
>>> ch.qos.logback.classic.util.ContextInitializer.configureByResource(ContextInitializer.java:77)
>>> at
>>> ch.qos.logback.classic.util.ContextInitializer.autoConfig(ContextInitializer.java:152)
>>> at
>>> org.slf4j.impl.StaticLoggerBinder.init(StaticLoggerBinder.java:85)
>>> at
>>> org.slf4j.impl.StaticLoggerBinder.(StaticLoggerBinder.java:55)
>>> at org.slf4j.LoggerFactory.bind(LoggerFactory.java:141)
>>> at
>>> org.slf4j.LoggerFactory.performInitialization(LoggerFactory.java:120)
>>> at
>>> org.slf4j.LoggerFactory.getILoggerFactory(LoggerFactory.java:331)
>>> at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:283)
>>> at org.apache.nifi.bootstrap.RunNiFi.(RunNiFi.java:117)
>>> at org.apache.nifi.bootstrap.RunNiFi.main(RunNiFi.java:199)
>>> Caused by: org.xml.sax.SAXParseException; lineNumber: 1; columnNumber:
>>> 1; Premature end of file.
>>> at
>>> com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1239)
>>> at
>>> com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:642)
>>> at
>>> com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl.parse(SAXParserImpl.java:326)
>>> at
>>> ch.qos.logback.core.joran.event.SaxEventRecorder.recordEvents(SaxEventRecorder.java:61)
>>> ... 13 more
>>> 06:16:11,055 |-INFO in ch.qos.logback.classic.LoggerContext[default] -
>>> Could NOT find resource [logback.groovy]
>>> 06:16:11,055 |-INFO in ch.qos.logback.classic.LoggerContext[default] -
>>> Could NOT find resource [logback-test.xml]
>>> 06:16:11,055 |-INFO in ch.qos.logback.classic.LoggerContext[default] -
>>> Found resource [logback.xml] at [file:/home/ec2-user/nifi/conf/logback.xml]
>>> 06:16:11,127 |-ERROR in
>>> ch.qos.logback.core.joran.event.SaxEventRecorder@682ea4b1 - XML_PARSING
>>> - Parsing fatal error on line 1 and column 1 org.xml.sax.SAXParseException;
>>> lineNumber: 1; columnNumber: 1; Premature end of file.
>>> at org.xml.sax.SAXParseException: Premature end of file.
>>> at  at
>>> com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:198)
>>> at  at
>>> com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:177)
>>> at  at
>>> com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:400)
>>> at  at
>>> com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:327)
>>> at  at
>>> com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(XMLScanner.java:1438)
>>> at  at
>>> com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(XMLDocumentScannerImpl.java:1019)
>>> at  at
>>> com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:606)
>>> at  at
>>> com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:118)
>>> at  at
>>> com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:510)
>>> at  at
>>> com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:848)
>>> at  at
>>> com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:777)
>>> at  at
>>> com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)

Re: Release Dates

2016-06-06 Thread Pierre Villard
Hi,

regarding JIRA, there are currently issues with the Apache servers. These should
be fixed very soon.

2016-06-06 9:52 GMT+02:00 ski n :

> Today I was checking the release notes at:
>
> https://cwiki.apache.org/confluence/display/NIFI/Release+Notes
>
> Could the release date be added at the top of every new release version?
> It's easier for users to track how often the project is releasing and how
> long ago the latest version was released.
>
> When I check the jira link I got the following error (I am not behind
> a proxy by the way):
>
> The proxy server received an invalid response from an upstream server.
> The proxy server could not handle the request GET
> /jira/secure/ReleaseNote.jspa.
>


Install directory

2016-06-06 Thread ski n
NiFi is unpacked/installed by default in the directory: nifi-0.6.1

My other Apache installations all are grouped together by the Apache
"apache-" prefix.

For example: apache-tomcat-7.0.69

Will this change in future releases (now NiFi is still before the 1.0 release)?


Release Dates

2016-06-06 Thread ski n
Today I was checking the release notes at:

https://cwiki.apache.org/confluence/display/NIFI/Release+Notes

Could the release date be added at the top of every new release version? It's
easier for users to track how often the project is releasing and how
long ago the latest version was released.

When I check the jira link I got the following error (I am not behind
a proxy by the way):

The proxy server received an invalid response from an upstream server.
The proxy server could not handle the request GET /jira/secure/ReleaseNote.jspa.


Re: Nifi & Parsey McParseface! RegEx in a Processor...

2016-06-06 Thread Conrad Crampton
Hi,
This may be a long shot as I don’t know how many combinations of the column
lengths with | and + there are, but you could try using the ReplaceTextWithMapping
processor, where you have all combinations of +--| etc. in a text file with what
they represent in terms of counts, e.g.
+--   [0]
|  +--   [1]
|  +--   [3]

etc. (tab separated)

Also, I’m not particularly experienced in the area of sed, awk etc., but I’m
guessing some bash guru would be able to come up with some sort of script that
does this that could be called from an ExecuteScript processor.
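
For what it’s worth, here is a minimal awk sketch of that idea (it assumes each
tree level indents by exactly four columns, which you’d want to verify against
the real output; and since awk isn’t one of the JVM scripting engines that
ExecuteScript hosts, it would actually run via ExecuteStreamCommand):

awk '{
  # position of the first alphanumeric character (1-based; 0 if none)
  pos = match($0, /[[:alnum:]]/)
  if (pos == 0) { print; next }   # pass non-tree lines through untouched
  depth = (pos - 1) / 4           # assumed: four columns per tree level
  line = $0
  sub(/^[ |+-]*/, "", line)       # strip the leading |, + and -- decoration
  printf "[%d]%s\n", depth, line
}'

On the sample output, "is VBZ ROOT" would come out as "[0]is VBZ ROOT" and
"+-- It PRP nsubj" as "[1]It PRP nsubj".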

Regards
Conrad

From: Pat Trainor 
Reply-To: "users@nifi.apache.org" 
Date: Sunday, 5 June 2016 at 18:33
To: "users@nifi.apache.org" 
Subject: Nifi & Parsey McParseface! RegEx in a Processor...

I have had success with using the ReplaceText processor out of the box to modify
the output of a NiFi-called script. I'm applying NiFi to running the Parsey
McParseface system (SyntaxNet) from Google. The output of the application looks
like this:

---
Input: It is to two English scholars , father and son , Edward Pococke , senior 
and junior , that the world is indebted for the knowledge of one of the most 
charming productions Arabian philosophy can boast of .
Parse:
is VBZ ROOT
+-- It PRP nsubj
+-- to IN prep
|   +-- scholars NNS pobj
|   +-- two CD num
|   +-- English JJ amod
|   +-- , , punct
|   +-- father NN conj
|   |   +-- and CC cc
|   |   +-- son NN conj
|   +-- Pococke NNP appos
[...]
---

As you can see, my ExecuteProcessorStream is working fine. But there is a bit 
of importance that needs to be taken from this text. My ReplaceText Processor 
used (the first one) is shown in the attached. It only removes characters.

How many 'spaces' over each of the '+' signs sits is important. Simply removing
leading spaces, + and | characters moves the first word in each line to the
first column, without telling you how many columns over the words started in
the original input.

What is needed is a way to count the number of columns at the beginning of each
line that precede the first alphanumeric character. It doesn't matter if the same
processor also cleans things out, as my present effort does:

Input: It is to two English scholars , father and son , Edward Pococke , senior 
and junior , that the world is indebted for the knowledge of one of the most 
charming productions Arabian philosophy can boast of .
Parse:
is VBZ ROOT
It PRP nsubj
to IN prep
[...]

I am hoping to somehow use the expressions (a la ${line:blah...) in NiFi, or 
another mechanism I'm not aware of, to gather the column count, making it 
available for later processing/storage.

[0]is VBZ ROOT
[1]It PRP nsubj
[1]to IN prep
[2] ...

With the [X] being the # of columns over from the left that the alpha-numeric 
character was.

The reasoning for this is that the position signifies how 'important' that 
attribute is in the sentence. It looks like a tree, but the number (indentation) 
is the length of the branch the word is on.

Is there a clever way to accomplish most/all of this, either with () regex or 
named attributes, in Nifi?

Thanks!
pat
( ͡° ͜ʖ ͡°)

"A wise man can learn more from a foolish question than a fool can learn from a 
wise answer". ~ Bruce Lee.

