Re: Nifi taking forever to start

2017-02-15 Thread Andrew Grande
I'm not sure piggy-backing on the host entropy will work reliably. I have
seen this issue in ec2, openstack boxes, etc. A newly spun up box will
exhibit this issue often.

Andrew

On Wed, Feb 15, 2017, 10:09 AM Bryan Rosander  wrote:

> Hey Arnaud,
>
> Andy's solution is definitely the right answer for Java applications in
> general (on docker or in vm or anywhere with more limited entropy).
>
> A more general way to take care of entropy issues in docker containers
> (applicable beyond NiFi) is to mount the host's /dev/random or /dev/urandom
> as the container's /dev/random. [1]
>
> If you want to use the host's /dev/random, the host machine will likely
> have significantly more entropy:
> -v /dev/random:/dev/random
>
> If you just want to force the container to use your host's /dev/urandom so
> it will never block for entropy (should be fine in the majority of cases
> [2]):
> -v /dev/urandom:/dev/random
>
> [1]
> http://stackoverflow.com/questions/26021181/not-enough-entropy-to-support-dev-random-in-docker-containers-running-in-boot2d#answer-26024403
> [2] http://www.2uo.de/myths-about-urandom/
>
> On Wed, Feb 15, 2017 at 5:15 AM, Andy LoPresto 
> wrote:
>
> Glad this fixed it and sorry it happened in the first place. This one is a
> personal antagonist of mine and I’ll be happy when it’s fixed for everyone.
> Good luck using the project.
>
> Andy LoPresto
> alopre...@apache.org
> *alopresto.apa...@gmail.com *
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>
> On Feb 15, 2017, at 2:09 AM, Arnaud G  wrote:
>
> Hi Andy,
>
> Thank you very much, and indeed it seems that you pointed out the right
> problem. The Docker container is running in a VM and it seems that I had a
> lack of entropy.
>
> I changed the entropy source to /dev/urandom and NiFi was able to start
> immediately.
>
> Thank you very much for your help
>
> Arnaud
>
> On Wed, Feb 15, 2017 at 10:41 AM, Andy LoPresto 
> wrote:
>
> Hi Arnaud,
>
> I’m sorry you are having trouble getting NiFi going. We want to minimize
> any inconvenience and get you up and running quickly.
>
> Are you by any chance running on a VM that does not have access to any
> physical inputs to generate entropy for secure random seeding? There is a
> known issue [1] (being worked on for the next release) where this can cause
> the application to block because insufficient entropy is available (without
> the physical inputs, there is not enough random data to properly seed
> secure operations).
>
> I recommend you check if this is the case (run this command in your terminal
> — if it hangs, this is the cause):
>
> head -n 1 /dev/random
>
> If it hangs, follow the instructions on this page [2] to modify the Java
> secure random source (ignore the warning that this is “less secure” — this
> is an urban legend propagated by a misunderstanding in the Linux kernel
> manual pages [3]).
>
> Modify $JAVA_HOME/jre/lib/security/java.security to change
> securerandom.source=file:/dev/random to
> securerandom.source=file:/dev/urandom
>
>
> [1] https://issues.apache.org/jira/browse/NIFI-3313
> [2]
> https://docs.oracle.com/cd/E13209_01/wlcp/wlss30/configwlss/jvmrand.html
> [3] http://www.2uo.de/myths-about-urandom/
>
> Andy LoPresto
> alopre...@apache.org
> *alopresto.apa...@gmail.com *
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
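The java.security edit Andy describes can be scripted; a sketch, assuming the JDK 8 directory layout (on JDK 9+ the file lives under $JAVA_HOME/conf/security instead):

```shell
# Preview the substitution on a sample line; apply to the real file with -i.
sed 's|securerandom.source=file:/dev/random|securerandom.source=file:/dev/urandom|' <<'EOF'
securerandom.source=file:/dev/random
EOF

# In place, on a JDK 8 install:
# sed -i 's|securerandom.source=file:/dev/random|securerandom.source=file:/dev/urandom|' \
#   "$JAVA_HOME/jre/lib/security/java.security"

# Alternative that avoids editing the file at all: pass a JVM flag
# (the extra /./ works around an old JDK path-parsing quirk):
# java -Djava.security.egd=file:/dev/./urandom ...
```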
>
> On Feb 15, 2017, at 1:29 AM, Arnaud G  wrote:
>
> Hi guys!
>
> I'm trying to play with NiFi (1.1.1) in a Docker image. I tried different
> configurations (cluster, single node, secured, etc.); however, whatever I
> try, NiFi takes forever to start (like 30-45 minutes). This is not related
> to data, as I observe this behavior even when I instantiate the Docker
> image for the first time.
>
> In the log it stops here:
>
> nifi-bootstrap.log
> 2017-02-14 08:52:34,624 INFO [NiFi Bootstrap Command Listener]
> org.apache.nifi.bootstrap.RunNiFi Apache NiFi now running and listening for
> Bootstrap requests on port 46553
>
> nifi-app.log
> 2017-02-14 08:53:11,225 INFO [main]
> o.a.nifi.properties.NiFiPropertiesLoader Loaded 121 properties from
> /opt/nifi/./conf/nifi.properties
>
> and then it waits for bootstrapping (if I set the log level to debug)
>
> Any idea what may cause this?
>
> Thanks in advance!
>
> AG
>


Re: Outputting flowfiles to disk

2017-02-15 Thread Russell Bateman

Or, 3?

   AttributesToJson -> MergeContent -> PutFile


Or, 4?

   Join the ranks of custom processor writers and write one to do
   exactly what you want--a good idea if this is a pretty permanent part of
   your roadmap.


Hope this helps.

Russ

On 02/15/2017 03:18 PM, Kiran wrote:

Hello,
Within my NiFi flows for the error scenarios I would really like the 
option of outputting the flow file to an error directory (the 
outputted file contains the flow file contents as well as the 
attributes).
This way once the error has been resolved I can replay the FlowFile by 
reading it back in which would read the contents as well as the flow 
file attributes.
From looking through the processor list the only way I can see to do 
this is by:
1. Using the AttributesToJson to output the attributes and separately 
output the flowfile contents. Then read both the contents and JSON 
file back in and parse the JSON back into attributes.
2. Use a groovy script within the ExecuteScript processor to combine 
the attributes and contents together. Then output the results to disk. 
When reading it back in use another groovy script to parse the file 
and populate the attributes and contents.

My preferred option is number 2.
Can I confirm that I haven't missed anything obvious.
Thanks,
Brian

 







Re: Integration between Apache NiFi and Parquet or Workaround?

2017-02-15 Thread Carlos Paradis
Thank you both, Bryan and Giovanni, for giving me so much insight on this
matter.

I see why you would strongly prefer Kite over this, now that I landed on
one tutorial on kite-dataset and their documentation page (thanks for
pointing the name out).
I also noticed NiFi-238 (and its pull request) incorporated Kite into NiFi
back in 2015, and NiFi-1193 added Hive support in 2016, making three
processors available. But I am confused, since they are no longer available
in the documentation; rather, I only see StoreInKiteDataset, which appears
to be a new version of what was called 'KiteStorageProcessor' on GitHub,
but I don't see the other two.


My original goal was to have one HDFS storage dedicated to raw data alone,
and a second HDFS dedicated to storing pre-processed data and analysis. If
I were to do this with Kite and NiFi, the way I am currently seeing this
being done is:

*Raw Data HDFS:*

   - Apache NiFi
      - A set of GetFile and GetHTTP processors to acquire the data from
        multiple sources we have.
      - A PutHDFS to store the raw data in HDFS.

*Pre-Processed & Analysis HDFS:*

   - Apache NiFi
      - A set of GetHDFS to get data from *Raw Data HDFS*.
      - A set of ExecuteScript to convert XML files to JSON or CSV.
      - A set of ConvertCSVToAvro and ConvertJSONToAvro, as the Kite
        processor requires Avro format.
      - StoreInKiteDataset to store all data in either Avro or Parquet
        format.
   - Apache Spark
      - Perform batch jobs to pre-process the data into data analysis sets
        to be exported elsewhere (dashboard, machine learning, etc.).

However, a few things I am still confused about: (1) Is this the best way
to go about storing the raw data? (2) Would ExecuteScript allow for map
reduce, or would this become a bottleneck? (3) I originally considered
using the Apache Spark Streaming module for mini-batches integrated with
NiFi to at least pre-process the data as it arrives, but I am a bit
unclear on how to go about this now. Would creating a port through NiFi
be the way to go? (That is the only way I saw it being done in tutorials.)

Thank you,




On Wed, Feb 15, 2017 at 7:02 AM, Giovanni Lanzani <
giovannilanz...@godatadriven.com> wrote:

> Hi Carlos,
>
>
>
> I’m just chiming in, but if I wouldn’t use Kite (disclaimer: I would in
> this case) the workflow would look like this:
>
>
>
> - do stuff with NiFi
>
> - convert flowfiles to Avro
>
> - (optional: merge Avro files)
>
> - PutHDFS into a temp folder
>
> - periodically run Spark on that temp folder to convert to Parquet.
>
>
>
> I believe you can work out the first four points by yourself. The last
> point would just be a Python file that looks like this:
>
>
>
> from pyspark.sql import SparkSession
>
>
>
> spark = (SparkSession.builder
>
>  .appName("Python Spark SQL basic example")
>
>  .config("spark.some.config.option", "some-value")
>
>  .getOrCreate())
>
>
>
> (spark.read.format('com.databricks.spark.avro').load('/tmp/path/dataset.avro')
>
>   .write.format('parquet')
>
>   .mode('append')
>
>   .save('/path/to/outfile'))
>
>
>
> You can then periodically invoke this file with spark-submit filename.py
>
>
>
> For optimal usage, I’d explore the options of having the temporary path
> folder partitioned by hour (or day) and then invoke the above script once
> per temporary folder.
>
>
>
> That said, a few remarks:
>
> - this is a rather complicated flow for something so simple. Kite-dataset
> would work better;
>
> - however if you need more complicated processing, you have all the
> options to do so
>
> - as Parquet is columnar storage, having little files is useless. So when
> you’re merging them, make sure you have enough data (roughly 50 MB up to
> several tens of GBs) in the final file;
>
> - The above code is trivially portable to Scala if you prefer, as I’m
> using Python as a mere DSL on top of Spark (no serializations outside the
> JVM).
>
>
>
> Cheers,
>
>
>
> Giovanni
>
>
>
> *From:* Carlos Paradis [mailto:c...@hawaii.edu]
> *Sent:* Tuesday, February 14, 2017 11:12 PM
> *To:* users@nifi.apache.org
> *Subject:* Re: Integration between Apache NiFi and Parquet or Workaround?
>
>
>
> Hi James,
>
>
>
> Thank you for pointing the issue out! :-) I wanted to point out another
> alternative solution to Kite I observed, to hear if you had any insight on
> this approach too if you don't mind.
>

Re: Outputting flowfiles to disk

2017-02-15 Thread Joe Witt
Hello Brian

A good way to do this pattern is to use 'MergeContent' and set the Merge
Format to Flow File Stream, v3.  This way the errors are bundled together
nicely/efficiently.  Ensure it gets a unique filename whenever it is dumped
on disk too.

When you read that bundle/file back off disk you put it through
UnpackContent with flowfile stream v3.

There are a lot of strategies for this sort of dead-letter queue behavior
so if you want to talk through it more feel free to do so.  But for what
you asked the above will do it.
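For the unique filename Joe mentions, an UpdateAttribute processor before the disk write could set, for example (a sketch; `${uuid}` is the flow file's unique-id attribute, and the extension is just an illustrative choice):

```
filename = errors-${uuid}.flowfilev3
```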

Thanks
Joe

On Wed, Feb 15, 2017 at 5:18 PM, Kiran 
wrote:

> Hello,
>
> Within my NiFi flows for the error scenarios I would really like the
> option of outputting the flow file to an error directory (the outputted
> file contains the flow file contents as well as the attributes).
>
> This way once the error has been resolved I can replay the FlowFile by
> reading it back in which would read the contents as well as the flow file
> attributes.
>
> From looking through the processor list the only way I can see to do this
> is by:
> 1. Using the AttributesToJson to output the attributes and separately
> output the flowfile contents. Then read both the contents and JSON file
> back in and parse the JSON back into attributes.
> 2. Use a groovy script within the ExecuteScript processor to combine the
> attributes and contents together. Then output the results to disk. When
> reading it back in use another groovy script to parse the file and populate
> the attributes and contents.
>
> My preferred option is number 2.
>
> Can I confirm that I haven't missed anything obvious.
>
> Thanks,
>
> Brian
>
>
>
>


Outputting flowfiles to disk

2017-02-15 Thread Kiran

Hello,

Within my NiFi flows for the error scenarios I would really like the
option of outputting the flow file to an error directory (the outputted
file contains the flow file contents as well as the attributes).

This way once the error has been resolved I can replay the FlowFile by
reading it back in which would read the contents as well as the flow
file attributes.


From looking through the processor list the only way I can see to do

this is by:
1. Using the AttributesToJson to output the attributes and separately
output the flowfile contents. Then read both the contents and JSON file
back in and parse the JSON back into attributes.
2. Use a groovy script within the ExecuteScript processor to combine the
attributes and contents together. Then output the results to disk. When
reading it back in use another groovy script to parse the file and
populate the attributes and contents.

My preferred option is number 2.

Can I confirm that I haven't missed anything obvious.

Thanks,

Brian





Re[2]: MergeContent across a NiFi cluster

2017-02-15 Thread Kiran

Thanks for the reply Joe.

I'm glad I wasn't missing something obvious. I'm afraid I'm stuck with the
file size limitation, but I'll have a word with the guys who configure
the load balancer to see what affinity options they have.

Thanks

Brian

-- Original Message --
From: "Joe Witt" 
To: users@nifi.apache.org; "Kiran" 
Sent: 15/02/2017 21:36:41
Subject: Re: MergeContent across a NiFi cluster


Brian,

Great use case and you're right we don't have an easy way of handling
this now.  If you do indeed have a load balancer in front of the
receiving nifi cluster and it can support affinity of some kind then it
is possible you can set a header in HTTP Post I believe which would
come from a flowfile attribute which would be on each split and would
be the hash of its full object.  If the load balancer ensured all
splits (based on that header matching) were on the same machine then
you'd be in business.  There are some load balancers that do this (I'm
thinking of a commercial one).  But, I admit that is a lot of moving
parts to keep in mind.  We need to improve our site-to-site feature to
do things like automatically split content for you and handle the
partitioning/affinity logic I suggested.  You might also consider
avoiding the splitting for now to keep things super simple though I
recognize that exposes alternative tradeoffs.

Great case for us to work on/rally around though.

Thanks
Joe

On Wed, Feb 15, 2017 at 4:29 PM, Kiran 
wrote:

Hello,

I need to send data from one organisation to another but there are
data
size limits between them (this isn't my choice and has been enforced
on
me). I've got a 4 node NiFi cluster in each organisation.

The sending NiFi cluster has the following data flow:
Ingest the data by various means
   -> Compress Data using CompressContent
 -> If file size > X amount I use SplitContent
   -> HTTPS POST to load balancer sitting in front of the NiFi
cluster in the other organisation

On the receiving NiFi cluster I wanted to:
-> Receive the data
   -> MergeContent
 -> Do what ever else with the data...

The problem I can't get round is that if I split the content into 3
fragments and send them to the receiving NiFi instance because it's
behind a load balancer I can't guarantee that the 3 fragments are
received by the same node.

Q1) I'm assuming that for MergeContent to work all the fragments of a
single piece of data have to arrive on the same NiFi node or is there
a
option to have it working across a cluster?

Q2) How long does the MergeContent processor wait for all the
fragments?
If one of the fragments gets lost does it timeout after a certain
period?

I was thinking one way to solve this is to have the HTTPListener on
the
receiving NiFi only listening on the primary node which would ensure
all
the fragments arrive on the same node. The downside would be that I
end
up with idle NiFi nodes.

Is there anything obvious that I've missed that would solve my issue?

Thanks in advance,

Brian







Re: MergeContent across a NiFi cluster

2017-02-15 Thread Joe Witt
Brian,

Great use case and you're right we don't have an easy way of handling this
now.  If you do indeed have a load balancer in front of the receiving nifi
cluster and it can support affinity of some kind then it is possible you
can set a header in HTTP Post I believe which would come from a flowfile
attribute which would be on each split and would be the hash of its full
object.  If the load balancer ensured all splits (based on that header
matching) were on the same machine then you'd be in business.  There are
some load balancers that do this (I'm thinking of a commercial one).  But,
I admit that is a lot of moving parts to keep in mind.  We need to improve
our site-to-site feature to do things like automatically split content for
you and handle the partitioning/affinity logic I suggested.  You might also
consider avoiding the splitting for now to keep things super simple though
I recognize that exposes alternative tradeoffs.

Great case for us to work on/rally around though.

Thanks
Joe

On Wed, Feb 15, 2017 at 4:29 PM, Kiran 
wrote:

> Hello,
>
> I need to send data from one organisation to another but there are data
> size limits between them (this isn't my choice and has been enforced on
> me). I've got a 4 node NiFi cluster in each organisation.
>
> The sending NiFi cluster has the following data flow:
> Ingest the data by various means
>-> Compress Data using CompressContent
>  -> If file size > X amount I use SplitContent
>-> HTTPS POST to load balancer sitting in front of the NiFi
> cluster in the other organisation
>
> On the receiving NiFi cluster I wanted to:
> -> Receive the data
>-> MergeContent
>  -> Do what ever else with the data...
>
> The problem I can't get round is that if I split the content into 3
> fragments and send them to the receiving NiFi instance because it's
> behind a load balancer I can't guarantee that the 3 fragments are
> received by the same node.
>
> Q1) I'm assuming that for MergeContent to work all the fragments of a
> single piece of data have to arrive on the same NiFi node or is there a
> option to have it working across a cluster?
>
> Q2) How long does the MergeContent processor wait for all the fragments?
> If one of the fragments gets lost does it timeout after a certain
> period?
>
> I was thinking one way to solve this is to have the HTTPListener on the
> receiving NiFi only listening on the primary node which would ensure all
> the fragments arrive on the same node. The downside would be that I end
> up with idle NiFi nodes.
>
> Is there anything obvious that I've missed that would solve my issue?
>
> Thanks in advance,
>
> Brian
>
>
>


MergeContent across a NiFi cluster

2017-02-15 Thread Kiran

Hello,

I need to send data from one organisation to another but there are data
size limits between them (this isn't my choice and has been enforced on
me). I've got a 4 node NiFi cluster in each organisation.

The sending NiFi cluster has the following data flow:
Ingest the data by various means
   -> Compress Data using CompressContent
 -> If file size > X amount I use SplitContent
   -> HTTPS POST to load balancer sitting in front of the NiFi
cluster in the other organisation

On the receiving NiFi cluster I wanted to:
-> Receive the data
   -> MergeContent
 -> Do what ever else with the data...

The problem I can't get round is that if I split the content into 3
fragments and send them to the receiving NiFi instance because it's
behind a load balancer I can't guarantee that the 3 fragments are
received by the same node.

Q1) I'm assuming that for MergeContent to work all the fragments of a
single piece of data have to arrive on the same NiFi node or is there a
option to have it working across a cluster?

Q2) How long does the MergeContent processor wait for all the fragments?
If one of the fragments gets lost does it timeout after a certain
period?

I was thinking one way to solve this is to have the HTTPListener on the
receiving NiFi only listening on the primary node which would ensure all
the fragments arrive on the same node. The downside would be that I end
up with idle NiFi nodes.

Is there anything obvious that I've missed that would solve my issue?

Thanks in advance,

Brian



RE: Integration between Apache NiFi and Parquet or Workaround?

2017-02-15 Thread Giovanni Lanzani
Hi Carlos,

I’m just chiming in, but if I wouldn’t use Kite (disclaimer: I would in this 
case) the workflow would look like this:

- do stuff with NiFi
- convert flowfiles to Avro
- (optional: merge Avro files)
- PutHDFS into a temp folder
- periodically run Spark on that temp folder to convert to Parquet.

I believe you can work out the first four points by yourself. The last point 
would just be a Python file that looks like this:

from pyspark.sql import SparkSession

spark = (SparkSession.builder
 .appName("Python Spark SQL basic example")
 .config("spark.some.config.option", "some-value")
 .getOrCreate())

(spark.read.format('com.databricks.spark.avro').load('/tmp/path/dataset.avro')
  .write.format('parquet')
  .mode('append')
  .save('/path/to/outfile'))

You can then periodically invoke this file with spark-submit filename.py

For optimal usage, I’d explore the options of having the temporary path folder 
partitioned by hour (or day) and then invoke the above script once per 
temporary folder.
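The hourly-partition idea can be sketched as follows (the partition path scheme, script name, and crontab entry are examples, not part of Giovanni's setup):

```shell
# Derive the current hour's partition path from the clock, e.g.
# /tmp/path/2017021514 (previous-hour arithmetic omitted for brevity).
date -u +/tmp/path/%Y%m%d%H

# Hypothetical crontab entry submitting the conversion script each hour:
# 0 * * * * spark-submit /opt/jobs/avro_to_parquet.py
```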

That said, a few remarks:
- this is a rather complicated flow for something so simple. Kite-dataset would 
work better;
- however if you need more complicated processing, you have all the options to 
do so
- as Parquet is columnar storage, having little files is useless. So when
you’re merging them, make sure you have enough data (roughly 50 MB up to
several tens of GBs) in the final file;
- The above code is trivially portable to Scala if you prefer, as I’m using 
Python as a mere DSL on top of Spark (no serializations outside the JVM).

Cheers,

Giovanni

From: Carlos Paradis [mailto:c...@hawaii.edu]
Sent: Tuesday, February 14, 2017 11:12 PM
To: users@nifi.apache.org
Subject: Re: Integration between Apache NiFi and Parquet or Workaround?

Hi James,

Thank you for pointing the issue out! :-) I wanted to point out another 
alternative solution to Kite I observed, to hear if you had any insight on this 
approach too if you don't mind.

When I saw a presentation of NiFi and Parquet being used in a guest
project, although not many implementation details were discussed, it was
mentioned that Apache Spark was also used (apparently on its own), leaving
a port from NiFi to read in the data. Someone at Hortonworks posted a
tutorial on it (on GitHub) in Jan 2016 that seems to head in that direction.

The configuration looked as follows according to the tutorial's image:

https://community.hortonworks.com/storage/attachments/1669-screen-shot-2016-01-31-at-21029-pm.png

The group presentation also used Spark, but I am not sure if they used the same 
port approach, this is all I have:

PackageToParquetRunner <-> getFilePaths() <-> datalake [RDD ]

PackageToParquetRunner -> FileProcessorClass -> RDD Filter -> RDDflatMap -> 
RDDMap -> RDD  -> PackageToParquetRunner -> Create Data Frame (SQL 
Context) -> Write Parquet (DataFrame).

When you say,

then running periodic jobs to build Parquet data sets.

Would such a Spark setup count as periodic jobs? I am only minimally
acquainted with how Spark goes about MapReduce using RDDs, and I am not
certain to what extent it would support the NiFi pipeline for this purpose
(not to mention that, the way it appears, it leaves a hole in the NiFi
diagram as a port, which makes it unable to be monitored for data
provenance).

---

Do you think these details and Kite details would be worth mentioning as a 
comment on the JIRA issue you pointed out?

Thanks!


On Tue, Feb 14, 2017 at 11:46 AM, James Wing <jvw...@gmail.com> wrote:
Carlos,
Welcome to NiFi!  I believe the Kite dataset is currently the most direct, 
built-in solution for writing Parquet files from NiFi.

I'm not an expert on Parquet, but I understand columnar formats like Parquet 
and ORC are not easily written to in the incremental, streaming fashion that 
NiFi excels at (I hope writing this will prompt expert correction).  Other 
alternatives typically involve NiFi writing to more stream-friendly data stores 
or formats directly, then running periodic jobs to build Parquet data sets.  
Hive, Drill, and similar tools can do this.

You are certainly not alone in wanting better Parquet support; there is at
least one JIRA ticket for it as well:

Add processors for Google Cloud Storage Fetch/Put/Delete
https://issues.apache.org/jira/browse/NIFI-2725
You might want to chime in with some details of your use case, or create a new 
ticket if that's not a fit for you.

Thanks,
James

On Mon, Feb 13, 2017 at 3:13 PM, Carlos Paradis <c...@hawaii.edu> wrote:
Hi,

Our group has recently started trying to prototype a setup of 
Hadoop+Spark+NiFi+Parquet and I have been having trouble finding any 
documentation other than a scarce discussion on using Kite as a workaround 

Re: NiFi Users: Powered by NiFi page

2017-02-15 Thread Aldrin Piri
Hi Giovanni,

GoDataDriven has been included.

Thanks!

On Wed, Feb 15, 2017 at 11:48 AM, Giovanni Lanzani <
giovannilanz...@godatadriven.com> wrote:

> Hi Joe,
>
> You can put GoDataDriven (https://godatadriven.com)
> Summary: GoDataDriven, a Dutch service company in the data science and
> engineering space, helps customers ingest and process data in real time
> from the most disparate devices (including but not limited to trains!).
>
> Cheers,
>
> Giovanni
>
> > -Original Message-
> > From: Joe Witt [mailto:joe.w...@gmail.com]
> > Sent: Tuesday, February 14, 2017 8:07 PM
> > To: users@nifi.apache.org
> > Subject: NiFi Users: Powered by NiFi page
> >
> > NiFi Users
> >
> > I just realized we have a 'powered by nifi' page.  It looks a
> little...light :-).  So
> > wanted to reach out and offer to anyone interested that if you reply
> back on
> > this thread with your company/organization that you'd like referenced on
> there
> > I'd be happy to put in the change to the site for you.
> >
> > We are aware of a very large user base and obviously can see quite a bit
> of this
> > on the Internet but I don't think we can/should put that unless folks
> volunteer
> > to have this on the page.  So please let us know if you're interested in
> your
> > company/organizational use of NiFi being mentioned there.
> >
> > Thanks
> > Joe
>


RE: NiFi Users: Powered by NiFi page

2017-02-15 Thread Giovanni Lanzani
Hi Joe,

You can put GoDataDriven (https://godatadriven.com)
Summary: GoDataDriven, a Dutch service company in the data science and 
engineering space, helps customers ingest and process data in real time from 
the most disparate devices (including but not limited to trains!).

Cheers,

Giovanni

> -Original Message-
> From: Joe Witt [mailto:joe.w...@gmail.com]
> Sent: Tuesday, February 14, 2017 8:07 PM
> To: users@nifi.apache.org
> Subject: NiFi Users: Powered by NiFi page
> 
> NiFi Users
> 
> I just realized we have a 'powered by nifi' page.  It looks a little...light 
> :-).  So
> wanted to reach out and offer to anyone interested that if you reply back on
> this thread with your company/organization that you'd like referenced on there
> I'd be happy to put in the change to the site for you.
> 
> We are aware of a very large user base and obviously can see quite a bit of 
> this
> on the Internet but I don't think we can/should put that unless folks 
> volunteer
> to have this on the page.  So please let us know if you're interested in your
> company/organizational use of NiFi being mentioned there.
> 
> Thanks
> Joe


Re: Nifi taking forever to start

2017-02-15 Thread Bryan Rosander
Hey Arnaud,

Andy's solution is definitely the right answer for Java applications in
general (on docker or in vm or anywhere with more limited entropy).

A more general way to take care of entropy issues in docker containers
(applicable beyond NiFi) is to mount the host's /dev/random or /dev/urandom
as the container's /dev/random. [1]

If you want to use the host's /dev/random, the host machine will likely
have significantly more entropy:
-v /dev/random:/dev/random

If you just want to force the container to use your host's /dev/urandom so
it will never block for entropy (should be fine in the majority of cases
[2]):
-v /dev/urandom:/dev/random

[1]
http://stackoverflow.com/questions/26021181/not-enough-entropy-to-support-dev-random-in-docker-containers-running-in-boot2d#answer-26024403
[2] http://www.2uo.de/myths-about-urandom/
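The two mount options can be sketched as full commands (Linux host assumed; the image name and tag below are examples, not an official NiFi image):

```shell
# How much entropy does the host have right now? Fresh VMs often show a
# very low number here, which is why /dev/random reads block (Linux path).
cat /proc/sys/kernel/random/entropy_avail

# Hypothetical container runs using the mounts described above:
# docker run -v /dev/random:/dev/random  mynifi:1.1.1   # use host entropy
# docker run -v /dev/urandom:/dev/random mynifi:1.1.1   # never block
```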

On Wed, Feb 15, 2017 at 5:15 AM, Andy LoPresto  wrote:

> Glad this fixed it and sorry it happened in the first place. This one is a
> personal antagonist of mine and I’ll be happy when it’s fixed for everyone.
> Good luck using the project.
>
> Andy LoPresto
> alopre...@apache.org
> *alopresto.apa...@gmail.com *
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>
> On Feb 15, 2017, at 2:09 AM, Arnaud G  wrote:
>
> Hi Andy,
>
> Thank you very much, and indeed it seems that you pointed out the right
> problem. The Docker container is running in a VM and it seems that I had a
> lack of entropy.
>
> I changed the entropy source to /dev/urandom and NiFi was able to start
> immediately.
>
> Thank you very much for your help
>
> Arnaud
>
> On Wed, Feb 15, 2017 at 10:41 AM, Andy LoPresto 
> wrote:
>
>> Hi Arnaud,
>>
>> I’m sorry you are having trouble getting NiFi going. We want to minimize
>> any inconvenience and get you up and running quickly.
>>
>> Are you by any chance running on a VM that does not have access to any
>> physical inputs to generate entropy for secure random seeding? There is a
>> known issue [1] (being worked on for the next release) where this can cause
>> the application to block because insufficient entropy is available (without
>> the physical inputs, there is not enough random data to properly seed
>> secure operations).
>>
>> I recommend you check if this is the case (run this command in your terminal
>> — if it hangs, this is the cause):
>>
>> head -n 1 /dev/random
>>
>> If it hangs, follow the instructions on this page [2] to modify the Java
>> secure random source (ignore the warning that this is “less secure” — this
>> is an urban legend propagated by a misunderstanding in the Linux kernel
>> manual pages [3]).
>>
>> Modify $JAVA_HOME/jre/lib/security/java.security to change
>> securerandom.source=file:/dev/random to securerandom.
>> source=file:/dev/urandom
>>
>>
>> [1] https://issues.apache.org/jira/browse/NIFI-3313
>> [2] https://docs.oracle.com/cd/E13209_01/wlcp/wlss30/configw
>> lss/jvmrand.html
>> [3] http://www.2uo.de/myths-about-urandom/
>>
>> Andy LoPresto
>> alopre...@apache.org
>> *alopresto.apa...@gmail.com *
>> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>>
>> On Feb 15, 2017, at 1:29 AM, Arnaud G  wrote:
>>
>> Hi guys!
>>
>> I'm trying to play with NiFi (1.1.1) in a Docker image. I tried different
>> configurations (cluster, single node, secured, etc.); however, whatever I
>> try, NiFi takes forever to start (like 30-45 minutes). This is not related
>> to data, as I observe this behavior even when I instantiate the Docker
>> image for the first time.
>>
>> In the log it stops here:
>>
>> nifi-bootstrap.log
>> 2017-02-14 08:52:34,624 INFO [NiFi Bootstrap Command Listener]
>> org.apache.nifi.bootstrap.RunNiFi Apache NiFi now running and listening
>> for Bootstrap requests on port 46553
>>
>> nifi-app.log
>> 2017-02-14 08:53:11,225 INFO [main] o.a.nifi.properties.NiFiPropertiesLoader
>> Loaded 121 properties from /opt/nifi/./conf/nifi.properties
>>
>> and then waits for bootstrapping (if I set the debug log level)
>>
>> Any idea what may cause this?
>>
>> Thanks in advance!
>>
>> AG
>>
>>
>>
>>
>>
>
>


Re: Returning Responses to Http Requests

2017-02-15 Thread Mark Payne
Jim,

When you configure your HandleHttpRequest processor, there is a property for 
the HttpContextMap to use.
Within the Standard Http Context Map you can configure a property named 
"Request Expiration". By default, it is set to 1 minute. If any request is not 
handled within that time limit, it will automatically be responded to with a 
"503: Service Unavailable" response. If you expect your service to return a 
response in well under 1 minute, it may make sense to lower that value so that 
callers receive the error response more quickly.
Of course, if your dataflow is pretty expensive to operate and takes longer 
than that, you may be rejecting
requests that are still processing.

If your engineers are not getting a response back, then perhaps there's a bug 
somewhere. However, if they are
simply timing out, then it may be that they are not waiting long enough. They 
could either extend their timeouts,
or you can reduce the Request Expiration to a shorter period.
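As an illustration of the client side (a hypothetical sketch; the URL, port, and function name are assumptions, not part of NiFi), a caller can distinguish the error status NiFi sends when a request expires from its own timeout:

```python
# Hypothetical client sketch: POST to a NiFi HandleHttpRequest listener and
# distinguish an HTTP error status (e.g. the response sent when Request
# Expiration elapses) from the client's own timeout. URL/port are made up.
import urllib.error
import urllib.request


def post_with_timeout(url: str, payload: bytes, timeout_s: float = 70.0):
    req = urllib.request.Request(url, data=payload, method="POST")
    try:
        with urllib.request.urlopen(req, timeout=timeout_s) as resp:
            return resp.status            # normal response from the flow
    except urllib.error.HTTPError as e:
        return e.code                     # e.g. 503 if the request expired
    except (urllib.error.URLError, TimeoutError):
        return None                       # client gave up before NiFi answered
```

With the client timeout set longer than the Request Expiration, the caller sees NiFi's error status instead of an opaque client-side timeout.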

Thanks
-Mark


> On Feb 15, 2017, at 9:51 AM, Bryan Bende  wrote:
> 
> During this time when some of the steps are stopped, could you just
> connect your HandleHttpRequest to a different path through the flow
> that returns an unavailable, and then when everything is back to
> normal connect it back to the regular path?
> 
> 
> 
> On Mon, Feb 13, 2017 at 8:15 AM, James McMahon  wrote:
>> Thank you for replying Bryan. Yes sir, I can elaborate. Applications are
>> trying to post content to my NiFi instance to have it processed to
>> downstream systems. Occasionally there are legitimate reasons why we might
>> have the steps stopped. Routine administration, that sort of activity. In
>> such cases - NiFi is up, we still have jetty running, but a processor or the
>> processors in a workflow are stopped, isn't there a way to intercept the
>> Http POST request at the jetty server level to send a clean reply, such as
>> Unavailable?
>> 
>> The software engineers responsible for the POSTing applications are asking
>> for this. Apparently it is cleaner and preferred to have their software
>> field a standard Http Reply to their POST than to wait for an indefinite
>> period of time for the equivalent of a timeout error.
>> 
>> Thanks again for your help. -Jim
>> 
>> On Thu, Feb 9, 2017 at 10:34 AM, Bryan Bende  wrote:
>>> 
>>> Hello,
>>> 
>>> I'm not sure I fully understand the question...
>>> 
>>> You would need HandleHttpRequest -> some processors ->
>>> HandleHttpResponse all in a running state in order for someone to
>>> receive a response.
>>> 
>>> Can you elaborate on what you are trying to do?
>>> 
>>> Thanks,
>>> 
>>> Bryan
>>> 
>>> On Thu, Feb 9, 2017 at 4:32 AM, James McMahon 
>>> wrote:
 Good morning. The first processor in my workflow is a HandleHttpRequest.
 How
 do we set up to send a HandleHttpResponse if that processor is stopped
 and
 so not in a running state?
 
 Thank you. -Jim Mc.
>> 
>> 



Re: Returning Responses to Http Requests

2017-02-15 Thread Bryan Bende
During this time when some of the steps are stopped, could you just
connect your HandleHttpRequest to a different path through the flow
that returns an unavailable, and then when everything is back to
normal connect it back to the regular path?



On Mon, Feb 13, 2017 at 8:15 AM, James McMahon  wrote:
> Thank you for replying Bryan. Yes sir, I can elaborate. Applications are
> trying to post content to my NiFi instance to have it processed to
> downstream systems. Occasionally there are legitimate reasons why we might
> have the steps stopped. Routine administration, that sort of activity. In
> such cases - NiFi is up, we still have jetty running, but a processor or the
> processors in a workflow are stopped, isn't there a way to intercept the
> Http POST request at the jetty server level to send a clean reply, such as
> Unavailable?
>
> The software engineers responsible for the POSTing applications are asking
> for this. Apparently it is cleaner and preferred to have their software
> field a standard Http Reply to their POST than to wait for an indefinite
> period of time for the equivalent of a timeout error.
>
> Thanks again for your help. -Jim
>
> On Thu, Feb 9, 2017 at 10:34 AM, Bryan Bende  wrote:
>>
>> Hello,
>>
>> I'm not sure I fully understand the question...
>>
>> You would need HandleHttpRequest -> some processors ->
>> HandleHttpResponse all in a running state in order for someone to
>> receive a response.
>>
>> Can you elaborate on what you are trying to do?
>>
>> Thanks,
>>
>> Bryan
>>
>> On Thu, Feb 9, 2017 at 4:32 AM, James McMahon 
>> wrote:
>> > Good morning. The first processor in my workflow is a HandleHttpRequest.
>> > How
>> > do we set up to send a HandleHttpResponse if that processor is stopped
>> > and
>> > so not in a running state?
>> >
>> > Thank you. -Jim Mc.
>
>


Re: Sentry & NIFI

2017-02-15 Thread Bryan Bende
Hello,

Are you talking about sentry.io?

From Googling, it looks like they have logback support [1], and NiFi
uses logback for logging, so theoretically it could work.

You would have to add the raven-logback JAR and all of its transitive
dependencies to the lib directory of NiFi, and then configure an
appender in logback.xml as shown in the sentry docs.
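A sketch of such an appender entry (the appender class name and DSN below are assumptions to verify against the raven-logback version in the Sentry docs [1]):

```xml
<!-- Hypothetical logback.xml sketch, to be adapted from the Sentry docs [1].
     The appender class name and DSN depend on your raven-logback version. -->
<configuration>
  <appender name="Sentry" class="com.getsentry.raven.logback.SentryAppender">
    <dsn>https://publicKey:secretKey@host/projectId</dsn>
    <!-- Only forward WARN and above to Sentry -->
    <filter class="ch.qos.logback.classic.filter.ThresholdFilter">
      <level>WARN</level>
    </filter>
  </appender>
  <root level="INFO">
    <appender-ref ref="Sentry" />
  </root>
</configuration>
```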

This could be dangerous in that any JARs in the lib directory of NiFi
will impact all NARs. For example, raven has a dependency on
jackson-core-2.5.0.jar, so by putting that in the lib, all NARs will
now see Jackson 2.5.0 before the Jackson that might be bundled in the
NAR, which will likely be problematic.

Thanks,

Bryan

[1] https://docs.sentry.io/clients/java/modules/logback/

On Wed, Feb 15, 2017 at 5:24 AM, Alessio Palma
 wrote:
> Hello all,
>
>
> is there a simple way to connect nifi to sentry using the log appender
> facility?
>
>


Sentry & NIFI

2017-02-15 Thread Alessio Palma
Hello all,


is there a simple way to connect nifi to sentry using the log appender facility?



Re: How to avoid this splitting of single line as multi lines in SplitText?

2017-02-15 Thread prabhu Mahendran
Andy,

I have used following properties in ReplaceText processor.

Search Value: "(.*?)(\n)(.*?)"

Replacement Value: "$1\\n$3"

Character Set: UTF-8

Maximum Buffer Size: 1 MB

Replacement Strategy: Regex Replace

Evaluation Mode: Entire Text


The result of this processor is the same as the input. It couldn't perform any change.
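For reference, the intended transformation (escaping the embedded newline so each CSV record stays on one physical line) can be sketched outside NiFi. The sample data comes from the original message; the script itself is only an illustration:

```python
# Hypothetical sketch of the intended fix: Python's csv module treats a
# quoted field containing a newline as part of one record, so re-writing
# each record with the newline escaped yields one physical line per row.
import csv
import io

raw = 'No,NAme,ID,Description\n1,Stack,232,"ABCDEFGHIJKLMNO\n -- Jiuaslkm asdasdasd"\n'

out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
for row in csv.reader(io.StringIO(raw)):
    # Replace literal newlines inside fields with a two-character "\n" escape
    writer.writerow([field.replace("\n", "\\n") for field in row])

print(out.getvalue())
```

Each record now occupies one physical line, so a downstream SplitText with a line split count of 1 produces one split per record.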

Thanks,
prabhu

On Wed, Feb 15, 2017 at 12:35 PM, Andy LoPresto 
wrote:

> Prabhu,
>
> I answered this on Stack Overflow [1] but I think you could do it with
> ReplaceText before the SplitText using a regex like
>
> "(.*?)(\n)(.*?)" replaced with "$1\\n$3"
>
> [1] http://stackoverflow.com/a/42242665/70465
>
> Andy LoPresto
> alopre...@apache.org
> *alopresto.apa...@gmail.com *
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>
> On Feb 14, 2017, at 10:52 PM, Lee Laim  wrote:
>
> Prabhu,
>
> You need to remove the new lines from within the last field.  I'd
> recommend using awk in an execute stream command processor first, then
> splitting the text.  Alternatively, you could write a custom processor to
> specifically handle the incoming data.
>
> Lee
>
> On Feb 14, 2017, at 11:01 PM, prabhu Mahendran 
> wrote:
>
> I have CSV file which contains following line.
>
> No,NAme,ID,Description
> 1,Stack,232,"ABCDEFGHIJKLMNO
>  -- Jiuaslkm asdasdasd"
>
> used below processor structure GetFile-->SplitText
>
> In SplitText i have given header and line split count as 1.
>
> So I think it should split the row as below:
>
>  No,NAme,ID,Description
> 1,Stack,232,"ABCDEFGHIJKLMNO
>  -- Jiuaslkm asdasdasd:"
>
> But it actually splits the CSV into 2 splits, like below:
>
> *First SPlit:*
>
> No,NAme,ID,Description
> 1,Stack,232,"ABCDEFGHIJKLMNO
>
> *Second Split:*
>
> No,NAme,ID,Description
> -- Jiuaslkm asdasdasd"
>
> So it seems the data handling has missed something.
>
> *Goal: Now I need to handle those data lines as a single line.*
>
> Can anyone help me resolve this?
>
>
>


Re: Nifi taking forever to start

2017-02-15 Thread Andy LoPresto
Glad this fixed it and sorry it happened in the first place. This one is a 
personal antagonist of mine and I’ll be happy when it’s fixed for everyone. 
Good luck using the project.

Andy LoPresto
alopre...@apache.org
alopresto.apa...@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

> On Feb 15, 2017, at 2:09 AM, Arnaud G  wrote:
> 
> Hi Andy,
> 
> Thank you very much, and indeed it seems that you pointed out the right problem. 
> The docker is running in a VM and it seems that I had a lack of entropy.
> 
> I changed the entropy source to /dev/urandom and Nifi was able to start 
> immediately.
> 
> Thank you very much for your help
> 
> Arnaud
> 
> On Wed, Feb 15, 2017 at 10:41 AM, Andy LoPresto  wrote:
> Hi Arnaud,
> 
> I’m sorry you are having trouble getting NiFi going. We want to minimize any 
> inconvenience and get you up and running quickly.
> 
> Are you by any chance running on a VM that does not have access to any 
> physical inputs to generate entropy for secure random seeding? There is a 
> known issue [1] (being worked on for the next release) where this can cause 
> the application to block because insufficient entropy is available (without 
> the physical inputs, there is not enough random data to properly seed secure 
> operations).
> 
> I recommend you check if this is the case (run this command in your terminal — 
> if it hangs, this is the cause):
> 
> head -n 1 /dev/random
>
> If it hangs, follow the instructions on this page [2] to modify the Java 
> secure random source (ignore the warning that this is “less secure” — this is 
> an urban legend propagated by a misunderstanding in the Linux kernel manual 
> pages [3]).
> 
> Modify $JAVA_HOME/jre/lib/security/java.security to change 
> securerandom.source=file:/dev/random to securerandom.source=file:/dev/urandom
> 
> 
> [1] https://issues.apache.org/jira/browse/NIFI-3313 
> 
> [2] https://docs.oracle.com/cd/E13209_01/wlcp/wlss30/configwlss/jvmrand.html 
> 
> [3] http://www.2uo.de/myths-about-urandom/ 
> 
> 
> Andy LoPresto
> alopre...@apache.org 
> alopresto.apa...@gmail.com 
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
> 
>> On Feb 15, 2017, at 1:29 AM, Arnaud G  wrote:
>> 
>> Hi guys!
>> 
>> I'm trying to play with nifi (1.1.1) in a docker image. I tried different 
>> configuration (cluster, single node, secured, etc.), however whatever I try, 
>> Nifi takes forever to start (like 30-45 minutes). This is not related to data 
>> as I observe this behavior even when I instantiate the docker image for the 
>> first time.
>> 
>> In the log it stops here:
>> 
>> nifi-bootstrap.log
>> 2017-02-14 08:52:34,624 INFO [NiFi Bootstrap Command Listener] 
>> org.apache.nifi.bootstrap.RunNiFi Apache NiFi now running and listening for 
>> Bootstrap requests on port 46553
>> 
>> nifi-app.log
>> 2017-02-14 08:53:11,225 INFO [main] o.a.nifi.properties.NiFiPropertiesLoader 
>> Loaded 121 properties from /opt/nifi/./conf/nifi.properties
>> 
>> and then waits for bootstrapping (if I set the debug log level)
>> 
>> Any idea what may cause this?
>> 
>> Thanks in advance!
>> 
>> AG
>> 
>> 
>> 
> 
> 



signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: Nifi taking forever to start

2017-02-15 Thread Arnaud G
Hi Andy,

Thank you very much, and indeed it seems that you pointed out the right
problem. The docker is running in a VM and it seems that I had a lack of
entropy.

I changed the entropy source to /dev/urandom and Nifi was able to start
immediately.

Thank you very much for your help

Arnaud

On Wed, Feb 15, 2017 at 10:41 AM, Andy LoPresto 
wrote:

> Hi Arnaud,
>
> I’m sorry you are having trouble getting NiFi going. We want to minimize
> any inconvenience and get you up and running quickly.
>
> Are you by any chance running on a VM that does not have access to any
> physical inputs to generate entropy for secure random seeding? There is a
> known issue [1] (being worked on for the next release) where this can cause
> the application to block because insufficient entropy is available (without
> the physical inputs, there is not enough random data to properly seed
> secure operations).
>
> I recommend you check if this is the case (run this command in your terminal
> — if it hangs, this is the cause):
>
> head -n 1 /dev/random
>
> If it hangs, follow the instructions on this page [2] to modify the Java
> secure random source (ignore the warning that this is “less secure” — this
> is an urban legend propagated by a misunderstanding in the Linux kernel
> manual pages [3]).
>
> Modify $JAVA_HOME/jre/lib/security/java.security to change
> securerandom.source=file:/dev/random to securerandom.source=file:/dev/urandom
>
>
> [1] https://issues.apache.org/jira/browse/NIFI-3313
> [2] https://docs.oracle.com/cd/E13209_01/wlcp/wlss30/configwlss/jvmrand.html
> [3] http://www.2uo.de/myths-about-urandom/
>
> Andy LoPresto
> alopre...@apache.org
> *alopresto.apa...@gmail.com *
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>
> On Feb 15, 2017, at 1:29 AM, Arnaud G  wrote:
>
> Hi guys!
>
> I'm trying to play with nifi (1.1.1) in a docker image. I tried different
> configuration (cluster, single node, secured, etc.), however whatever I
> try, Nifi takes forever to start (like 30-45 minutes). This is not related to
> data as I observe this behavior even when I instantiate the docker image
> for the first time.
>
> In the log it stops here:
>
> nifi-bootstrap.log
> 2017-02-14 08:52:34,624 INFO [NiFi Bootstrap Command Listener]
> org.apache.nifi.bootstrap.RunNiFi Apache NiFi now running and listening
> for Bootstrap requests on port 46553
>
> nifi-app.log
> 2017-02-14 08:53:11,225 INFO [main] o.a.nifi.properties.NiFiPropertiesLoader
> Loaded 121 properties from /opt/nifi/./conf/nifi.properties
>
> and then waits for bootstrapping (if I set the debug log level)
>
> Any idea what may cause this?
>
> Thanks in advance!
>
> AG
>
>
>
>
>


Re: Exporting and importing workflows, and restoring from backup snapshots

2017-02-15 Thread Andy LoPresto
Jim,

If you have or create an account with Confluence, you can receive 
notifications. There is a “Watch” button at the top right of each page which 
will subscribe you to email notifications when the page changes.

Andy LoPresto
alopre...@apache.org
alopresto.apa...@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

> On Feb 13, 2017, at 5:19 AM, James McMahon  wrote:
> 
> Thank you very much Bryan. I will look more closely for archives, and into 
> transfers for the time being using flow.xml.gz. I also hope to see if there's 
> a way to be alerted to changes on the Confluence page you cited in your 
> reply. -Jim
> 
> On Thu, Feb 9, 2017 at 10:40 AM, Bryan Bende  wrote:
> Hello,
> 
> Moving flows between NiFi instances is definitely something the
> community is working to improve [1].
> 
> Currently there are two main options: templates, or moving the whole
> flow.xml.gz.
> 
> You can create templates of process groups and then import them into
> another instance, some people have gone as far as automating this with
> scripts using the REST API.
> 
> If you want to wholesale move the flow you can just copy
> conf/flow.xml.gz from one instance to another, keeping in mind that if
> you have processors with sensitive properties, you will need the same
> encryption key in both environments.
> 
> I believe the backups work similarly to copying the flow.xml.gz in
> that there should be an archive directory with copies of the
> flow.xml.gz from different snapshots, and you can stop NiFi and copy
> one of them into conf as the flow.xml.gz. Maybe someone can correct me
> if I am wrong here.
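A restore of one of those archived snapshots can be sketched as follows (a hypothetical illustration; the paths are assumptions, and NiFi should be stopped before the file is replaced):

```python
# Hypothetical sketch: restore an archived flow.xml.gz snapshot into conf/
# while NiFi is stopped. The directory layout here is an assumption; adjust
# the paths to your installation.
import shutil
from pathlib import Path


def restore_flow(archived_snapshot: Path, conf_dir: Path) -> Path:
    target = conf_dir / "flow.xml.gz"
    shutil.copy2(archived_snapshot, target)  # overwrite the current flow
    return target
```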
> 
> Thanks.
> 
> Bryan
> 
> 
> [1] 
> https://cwiki.apache.org/confluence/display/NIFI/Configuration+Management+of+Flows
>  
> 
> 
> On Thu, Feb 9, 2017 at 4:36 AM, James McMahon  wrote:
> > Good morning. How can we export and import our workflow backup snapshots
> > from one established NiFi server instance to a new instance of NiFi stood up
> > on a new server?
> >
> > Also, how does one restore a previous version of a workflow backup snapshot
> > created from the administrative tool?
> >
> > Thank you. -Jim
> 





Re: Nifi taking forever to start

2017-02-15 Thread Andy LoPresto
If this is not the issue, can you try starting NiFi and then run the following 
command to generate a thread dump and provide that to the lists? It will 
greatly help us determine the issue you are encountering. Thanks.

$ jcmd <nifi pid> Thread.print

or

$ ./bin/nifi.sh dump filewithdump.txt

Andy LoPresto
alopre...@apache.org
alopresto.apa...@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

> On Feb 15, 2017, at 1:41 AM, Andy LoPresto  wrote:
> 
> Hi Arnaud,
> 
> I’m sorry you are having trouble getting NiFi going. We want to minimize any 
> inconvenience and get you up and running quickly.
> 
> Are you by any chance running on a VM that does not have access to any 
> physical inputs to generate entropy for secure random seeding? There is a 
> known issue [1] (being worked on for the next release) where this can cause 
> the application to block because insufficient entropy is available (without 
> the physical inputs, there is not enough random data to properly seed secure 
> operations).
> 
> I recommend you check if this is the case (run this command in your terminal — 
> if it hangs, this is the cause):
> 
> head -n 1 /dev/random
>
> If it hangs, follow the instructions on this page [2] to modify the Java 
> secure random source (ignore the warning that this is “less secure” — this is 
> an urban legend propagated by a misunderstanding in the Linux kernel manual 
> pages [3]).
> 
> Modify $JAVA_HOME/jre/lib/security/java.security to change 
> securerandom.source=file:/dev/random to securerandom.source=file:/dev/urandom
> 
> 
> [1] https://issues.apache.org/jira/browse/NIFI-3313 
> 
> [2] https://docs.oracle.com/cd/E13209_01/wlcp/wlss30/configwlss/jvmrand.html 
> 
> [3] http://www.2uo.de/myths-about-urandom/ 
> 
> 
> Andy LoPresto
> alopre...@apache.org 
> alopresto.apa...@gmail.com 
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
> 
>> On Feb 15, 2017, at 1:29 AM, Arnaud G  wrote:
>> 
>> Hi guys!
>> 
>> I'm trying to play with nifi (1.1.1) in a docker image. I tried different 
>> configuration (cluster, single node, secured, etc.), however whatever I try, 
>> Nifi takes forever to start (like 30-45 minutes). This is not related to data 
>> as I observe this behavior even when I instantiate the docker image for the 
>> first time.
>> 
>> In the log it stops here:
>> 
>> nifi-bootstrap.log
>> 2017-02-14 08:52:34,624 INFO [NiFi Bootstrap Command Listener] 
>> org.apache.nifi.bootstrap.RunNiFi Apache NiFi now running and listening for 
>> Bootstrap requests on port 46553
>> 
>> nifi-app.log
>> 2017-02-14 08:53:11,225 INFO [main] o.a.nifi.properties.NiFiPropertiesLoader 
>> Loaded 121 properties from /opt/nifi/./conf/nifi.properties
>> 
>> and then waits for bootstrapping (if I set the debug log level)
>> 
>> Any idea what may cause this?
>> 
>> Thanks in advance!
>> 
>> AG
>> 
>> 
>> 
> 





Re: Nifi taking forever to start

2017-02-15 Thread Andy LoPresto
Hi Arnaud,

I’m sorry you are having trouble getting NiFi going. We want to minimize any 
inconvenience and get you up and running quickly.

Are you by any chance running on a VM that does not have access to any physical 
inputs to generate entropy for secure random seeding? There is a known issue 
[1] (being worked on for the next release) where this can cause the application 
to block because insufficient entropy is available (without the physical 
inputs, there is not enough random data to properly seed secure operations).

I recommend you check if this is the case (run this command in your terminal — if 
it hangs, this is the cause):

head -n 1 /dev/random

If it hangs, follow the instructions on this page [2] to modify the Java secure 
random source (ignore the warning that this is “less secure” — this is an urban 
legend propagated by a misunderstanding in the Linux kernel manual pages [3]).

Modify $JAVA_HOME/jre/lib/security/java.security to change 
securerandom.source=file:/dev/random to securerandom.source=file:/dev/urandom
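The edit above can also be scripted. A minimal sketch, assuming the Java 8 file layout named above; it is demonstrated here on a throwaway copy rather than a live JRE:

```python
# Minimal sketch of the java.security edit described above. The path is an
# assumption (Java 8 layout: $JAVA_HOME/jre/lib/security/java.security);
# demonstrate the rewrite on a copy before touching a real JRE.
from pathlib import Path


def switch_securerandom_to_urandom(java_security: Path) -> None:
    text = java_security.read_text()
    text = text.replace(
        "securerandom.source=file:/dev/random",
        "securerandom.source=file:/dev/urandom",
    )
    java_security.write_text(text)
```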


[1] https://issues.apache.org/jira/browse/NIFI-3313 

[2] https://docs.oracle.com/cd/E13209_01/wlcp/wlss30/configwlss/jvmrand.html 

[3] http://www.2uo.de/myths-about-urandom/ 


Andy LoPresto
alopre...@apache.org
alopresto.apa...@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

> On Feb 15, 2017, at 1:29 AM, Arnaud G  wrote:
> 
> Hi guys!
> 
> I'm trying to play with nifi (1.1.1) in a docker image. I tried different 
> configuration (cluster, single node, secured, etc.), however whatever I try, 
> Nifi takes forever to start (like 30-45 minutes). This is not related to data as 
> I observe this behavior even when I instantiate the docker image for the 
> first time.
> 
> In the log it stops here:
> 
> nifi-bootstrap.log
> 2017-02-14 08:52:34,624 INFO [NiFi Bootstrap Command Listener] 
> org.apache.nifi.bootstrap.RunNiFi Apache NiFi now running and listening for 
> Bootstrap requests on port 46553
> 
> nifi-app.log
> 2017-02-14 08:53:11,225 INFO [main] o.a.nifi.properties.NiFiPropertiesLoader 
> Loaded 121 properties from /opt/nifi/./conf/nifi.properties
> 
> and then waits for bootstrapping (if I set the debug log level)
> 
> Any idea what may cause this?
> 
> Thanks in advance!
> 
> AG
> 
> 
> 





Nifi taking forever to start

2017-02-15 Thread Arnaud G
Hi guys!

I'm trying to play with nifi (1.1.1) in a docker image. I tried different
configuration (cluster, single node, secured, etc.), however whatever I
try, Nifi takes forever to start (like 30-45 minutes). This is not related to
data as I observe this behavior even when I instantiate the docker image
for the first time.

In the log it stops here:

nifi-bootstrap.log
2017-02-14 08:52:34,624 INFO [NiFi Bootstrap Command Listener]
org.apache.nifi.bootstrap.RunNiFi Apache NiFi now running and listening for
Bootstrap requests on port 46553

nifi-app.log
2017-02-14 08:53:11,225 INFO [main]
o.a.nifi.properties.NiFiPropertiesLoader Loaded 121 properties from
/opt/nifi/./conf/nifi.properties

and then waits for bootstrapping (if I set the debug log level)

Any idea what may cause this?

Thanks in advance!

AG