FetchS3Object processor

2015-10-08 Thread Chakrader Dewaragatla
Nifi users - The FetchS3Object processor does not have an option to specify the
object file path. How do you set that?

Eg : s3:

Thanks,
-Chakri

The information contained in this transmission may contain privileged and 
confidential information. It is intended only for the use of the person(s) 
named above. If you are not the intended recipient, you are hereby notified 
that any review, dissemination, distribution or duplication of this 
communication is strictly prohibited. If you are not the intended recipient, 
please contact the sender by reply email and destroy all copies of the original 
message.



Nifi & Spark receiver performance configuration

2015-10-08 Thread Aurélien DEHAY
Hello.



I’m doing some experimentations on Apache Nifi to see where we can use it.



One idea is to use NiFi to feed a Spark cluster, so I'm doing a simple test
(GenerateFlowFile => Spark output port) and a simple word count on the Spark side.



I was pretty unhappy with the performance out of the box, so I looked on the 
net and found almost nothing.



So I looked at nifi.properties, and found that the following properties
have a huge impact on how many messages per second are delivered to Spark:



nifi.queue.swap.threshold=2

nifi.swap.in.period=1 sec

nifi.swap.in.threads=1

nifi.swap.out.period=1 sec

nifi.swap.out.threads=4


The documentation seems unclear on this point for output ports. Does anyone
have a pointer for me?

Thanks.

Aurélien.


Re: FetchS3Object processor

2015-10-08 Thread Joe Skora
Chakri,

The Amazon docs explain that S3 doesn't support paths within a bucket, but
you can embed path-like naming into the "Object Key" parameter of
FetchS3Object.

For your example, you would use bucket="s3://" and
key="/" to incorporate the path as you store the object.
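To make the idea concrete, splitting an `s3://bucket/path`-style URL into the bucket name and the path-like object key can be sketched in Python (the URL and names below are illustrative, not taken from the thread):

```python
from urllib.parse import urlparse

def split_s3_url(url: str) -> tuple[str, str]:
    """Split an s3://bucket/path/to/object URL into (bucket, object key)."""
    parsed = urlparse(url)
    if parsed.scheme != "s3":
        raise ValueError(f"not an s3:// URL: {url}")
    # S3 has no real directories: everything after the bucket is just
    # the object key, which may contain path-like '/' separators.
    return parsed.netloc, parsed.path.lstrip("/")

bucket, key = split_s3_url("s3://my-bucket/logs/2015/10/08/events.txt")
# bucket -> "my-bucket", key -> "logs/2015/10/08/events.txt"
```

The bucket goes in FetchS3Object's "Bucket" property and the remainder in "Object Key".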

I hope that helps.

Regards,
Joe

On Thu, Oct 8, 2015 at 2:39 AM, Chakrader Dewaragatla <
chakrader.dewaraga...@lifelock.com> wrote:

> Nifi users  - FetchS3Object processor do not have option to specify Object
> file path, how do you set that ?
>
> Eg : s3:
>
> Thanks,
> -Chakri
>


Re: Nifi 0.3.0 on java-1.8.0 (S3 processor error)

2015-10-08 Thread Mark Payne
Chakri,

Great, thanks for the update!

It looks like this is a known issue with the version of the AWS SDK that we are 
using. The
bug ticket is available at: https://github.com/aws/aws-sdk-java/issues/444 


I have submitted a bug ticket against NiFi to update to a new version of the 
AWS SDK
in order to resolve this issue, so that it will work with Java 8. The NiFi bug 
ticket is
available at: https://issues.apache.org/jira/browse/NIFI-1025 


Thanks!
-Mark


> On Oct 8, 2015, at 1:55 AM, Chakrader Dewaragatla 
>  wrote:
> 
> Downgraded to java 7u80 and s3 processor works.
> 
> Thanks,
> -Chakri
> 
> From: Chakrader Dewaragatla  >
> Date: Wednesday, October 7, 2015 at 5:46 PM
> To: "users@nifi.apache.org " 
> >
> Subject: Re: Nifi 0.3.0 on java-1.8.0 (S3 processor error)
> 
> Mark – Yes, I have the right date/time with ntp client configured.
> 
> Thanks,
> -Chakri
> 
> From: Mark Payne >
> Reply-To: "users@nifi.apache.org " 
> >
> Date: Wednesday, October 7, 2015 at 5:34 PM
> To: "users@nifi.apache.org " 
> >
> Subject: Re: Nifi 0.3.0 on java-1.8.0 (S3 processor error)
> 
> Chakri,
> 
> Can you check that your system's date/time is accurate? I have not tried this 
> with S3, but I know
> that Twitter sends back a pretty similar response when we use GetTwitter on a 
> node that has
> the wrong date/time. It has to do with the authentication protocol that is 
> used requiring an accurate
> timestamp.
> 
> Please advise if your system date/time is correct.
> 
> Thanks
> -Mark
> 
> 
>> On Oct 7, 2015, at 8:29 PM, Chakrader Dewaragatla 
>> > > wrote:
>> 
>> Has anyone noticed the NiFi S3 processor broken with the latest Java version?
>> (java-1.8.0-openjdk-1.8.0.60-14.b27)
>> S3 put error: 
>> 
>> 2015-10-08 00:28:30,745 ERROR [Timer-Driven Process Thread-6] 
>> o.a.nifi.processors.aws.s3.PutS3Object 
>> PutS3Object[id=1c0f0c5c-cbb9-4ed9-891c-0bd48ae69366] Failed to put 
>> StandardFlowFileRecord[uuid=4a2455b1-d47f-4d6e-9847-a93a5e66d7a5,claim=StandardContentClaim
>>  [resourceClaim=StandardResourceClaim[id=1444264110525-192, 
>> container=default, section=192], offset=0, 
>> length=1000],offset=0,name=c2.txt,size=1000] to Amazon S3 due to 
>> com.amazonaws.services.s3.model.AmazonS3Exception: AWS authentication 
>> requires a valid Date or x-amz-date header (Service: Amazon S3; Status Code: 
>> 403; Error Code: AccessDenied; Request ID: 254510F6EE5E85B5), S3 Extended 
>> Request ID: 
>> +Q5lfeKce1HMGUjw9BCp8lg8wmIy/eIHKH62xsWteJzKmqbsWRSeGLo4p8h6R1BLd5KIFXjKOO4=:
>>  com.amazonaws.services.s3.model.AmazonS3Exception: AWS authentication 
>> requires a valid Date or x-amz-date header (Service: Amazon S3; Status Code: 
>> 403; Error Code: AccessDenied; Request ID: 254510F6EE5E85B5), S3 Extended 
>> Request ID: 
>> +Q5lfeKce1HMGUjw9BCp8lg8wmIy/eIHKH62xsWteJzKmqbsWRSeGLo4p8h6R1BLd5KIFXjKOO4=
>> 
>> 
>> It works on another node with Java version 
>> (java-1.8.0-openjdk-1.8.0.45-40.b14).
>> 
>> Thanks,
>> -Chakri
>>  
>> 
>> 



Re: Need help in nifi- flume processor

2015-10-08 Thread Bryan Bende
Hi Parul,

It is possible to deploy a custom Flume source/sink to NiFi, but due to the
way the Flume processors load the classes for the sources and sinks, the
jar you deploy to the lib directory also needs to include the other
dependencies your source/sink needs (or each dependency needs to be placed
directly in lib/).

So here is a sample project I created that makes a shaded jar:
https://github.com/bbende/my-flume-source

It will contain the custom source and following dependencies all in one jar:

org.apache.flume:my-flume-source:jar:1.0-SNAPSHOT
+- org.apache.flume:flume-ng-sdk:jar:1.6.0:compile
+- org.apache.flume:flume-ng-core:jar:1.6.0:compile
+- org.apache.flume:flume-ng-configuration:jar:1.6.0:compile
+- org.apache.flume:flume-ng-auth:jar:1.6.0:compile
  \- com.google.guava:guava:jar:11.0.2:compile
 \- com.google.code.findbugs:jsr305:jar:1.3.9:compile

I copied that to NiFi lib, restarted, created an ExecuteFlumeSource
processor with the following config:

Source Type = org.apache.flume.MySource
Agent Name = a1
Source Name = r1
Flume Configuration = a1.sources = r1

And I was getting the output in nifi/logs/nifi-bootstrap.log

Keep in mind that this could become risky because any classes found in the
lib directory would be accessible to all NARs in NiFi and would be found
before classes within a NAR because the parent is checked first during
class loading. This example isn't too risky because we are only bringing in
Flume jars and one Guava jar, but if, for example, another NAR uses a
different version of Guava, this is going to cause a problem.

If you plan to use NiFi for the long term, it might be worth investing in
converting your custom Flume components to NiFi processors. We can help you
get started if you need any guidance going that route.

-Bryan


On Thu, Oct 8, 2015 at 2:30 AM, Parul Agrawal 
wrote:

> Hello Bryan,
>
> Thank you very much for your response.
>
> Is it possible to have a customized Flume source and sink in NiFi?
> I have my own customized source and sink. I followed the steps below to add my
> own customized source, but it did not work.
>
> 1) Created Maven project and added customized source. (flume.jar was
> created after this step)
> 2) Added flume.jar file to nifi-0.3.0/lib folder.
> 3) Added flume source processor with the below configuration
>
> Property   Value
> Source Type com.flume.source.Source
> Agent Name  a1
> Source Name k1.
>
> But I am getting the below error in Flume Source Processor.
> "Failed to run validation due to java.lang.NoClassDefFoundError :
> /org/apache/flume/PollableSource."
>
> Can you please help me in this regard. Any step/configuration I missed.
>
> Thanks and Regards,
> Parul
>
>
> On Wed, Oct 7, 2015 at 6:57 PM, Bryan Bende  wrote:
>
>> Hello,
>>
>> The NiFi Flume processors are for running Flume sources and sinks with in
>> NiFi. They don't communicate with an external Flume process.
>>
>> In your example you would need an ExecuteFlumeSource configured to run
>> the netcat source, connected to a ExecuteFlumeSink configured with the
>> logger.
>>
>> -Bryan
>>
>> On Wednesday, October 7, 2015, Parul Agrawal 
>> wrote:
>>
>>> Hi,
>>>
>>> I was trying to run the NiFi Flume processor with the below-mentioned
>>> details but could not bring it up.
>>>
>>> I already started flume with the sample configuration file
>>> =
>>> # example.conf: A single-node Flume configuration
>>>
>>> # Name the components on this agent
>>> a1.sources = r1
>>> a1.sinks = k1
>>> a1.channels = c1
>>>
>>> # Describe/configure the source
>>> a1.sources.r1.type = netcat
>>> a1.sources.r1.bind = localhost
>>> a1.sources.r1.port = 4
>>>
>>> # Describe the sink
>>> a1.sinks.k1.type = logger
>>>
>>> # Use a channel which buffers events in memory
>>> a1.channels.c1.type = memory
>>> a1.channels.c1.capacity = 1000
>>> a1.channels.c1.transactionCapacity = 100
>>>
>>> # Bind the source and sink to the channel
>>> a1.sources.r1.channels = c1
>>> a1.sinks.k1.channel = c1
>>> =
>>>
>>> Command used to start flume : $ bin/flume-ng agent --conf conf
>>> --conf-file example.conf --name a1 -Dflume.root.logger=INFO,console
>>>
>>> In the Nifi browser of ExecuteFlumeSink following configuration was done:
>>> Property   Value
>>> Sink Type logger
>>> Agent Name  a1
>>> Sink Name k1.
>>>
>>> Event is sent to the flume using:
>>> $ telnet localhost 4
>>> Trying 127.0.0.1...
>>> Connected to localhost.localdomain (127.0.0.1).
>>> Escape character is '^]'.
>>> Hello world! 
>>> OK
>>>
>>> But I could not get any data in the nifi flume processor. Request your
>>> help in this.
>>> Do i need to change the example.conf file of flume so that Nifi Flume
>>> Sink should get the data.
>>>
>>> Thanks and Regards,
>>> Parul
>>>
>>
>>
>> --
>> Sent from Gmail Mobile
>>
>
>


Re: Nifi & Spark receiver performance configuration

2015-10-08 Thread Bryan Bende
Hello,

When you say you were unhappy with the performance, can you give some more
information about what was not performing well?

Was the NiFi Spark Receiver not pulling messages in fast enough and they
were queuing up in NiFi?
Was NiFi not producing messages as fast as you expected?
What kind of environment were you running this? All on a local machine for
testing?

-Bryan

On Thu, Oct 8, 2015 at 6:52 AM, Aurélien DEHAY 
wrote:

> Hello.
>
>
>
> I’m doing some experimentations on Apache Nifi to see where we can use it.
>
>
>
> One idea is to use nifi to feed a spark cluster. So I’m doing some simple
> test (GenerateFlowFile => spark output port and a simple word count on
> spark side.
>
>
>
> I was pretty unhappy with the performance out of the box, so I looked on
> the net and found almost nothing.
>
>
>
> So I looked at nifi.properties, and found that some of the following
> properties have a huge impact on how many messages / second were processed
> to Spark :
>
>
>
> nifi.queue.swap.threshold=2
>
> nifi.swap.in.period=1 sec
>
> nifi.swap.in.threads=1
>
> nifi.swap.out.period=1 sec
>
> nifi.swap.out.threads=4
>
>
>
> The documentation seems unclear on this point for output ports, is anyone
> have a pointer for me ?
>
>
>
> Thanks.
>
>
>
> Aurélien.
>


RE: Nifi & Spark receiver performance configuration

2015-10-08 Thread Aurélien DEHAY
Hello.


I'm testing on a VM with 8 vCPUs (E5606 @ 2.13 GHz) and 16 GB of RAM.


I just have a GenerateFlowFile which sends data to an output port for Spark.
Here the performance is very good; I can generate a huge number of flow files.

My Spark job is configured as local[4] and uses 3 receivers. It is just doing a
simple word count on the stream, with a streaming context at 2 seconds.
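For reference, the per-batch computation here is trivial; in plain Python the equivalent of what each 2-second micro-batch computes would be something like the following sketch (not the actual Spark job from the attached nifi.scala):

```python
from collections import Counter

def word_count(lines):
    """Count words across one batch of text lines, as the streaming
    word count does for each micro-batch."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts

batch = ["hello nifi", "hello spark"]
counts = word_count(batch)
# counts["hello"] -> 2
```

So the bottleneck in these tests is the NiFi-to-receiver transfer, not the computation itself.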

My test is simple: I generate about 200k messages (about 200 MB) from
GenerateFlowFile without starting the output port, in order to queue the data.
Then I stop the processor and start the output port and my Spark job (see
attached file) with:
 bin/spark-shell --master local[4] --packages 
"org.apache.nifi:nifi-spark-receiver:0.3.0" -i nifi.scala


with:
nifi.queue.swap.threshold=2
nifi.swap.in.period=10 sec
nifi.swap.in.threads=1
nifi.swap.out.period=1 sec
nifi.swap.out.threads=4

I get the result in 2015-10-08 17_07_04 screenshot file.



With
nifi.queue.swap.threshold=2
nifi.swap.in.period=1 sec
nifi.swap.in.threads=1
nifi.swap.out.period=1 sec
nifi.swap.out.threads=4

results in 2015-10-08 17_13_28 screenshot.


With
nifi.queue.swap.threshold=20
nifi.swap.in.period=1 sec
nifi.swap.in.threads=1
nifi.swap.out.period=1 sec
nifi.swap.out.threads=4

I see result in 2015-10-08 17_21_09 screenshot.

Many questions:
- Why does NiFi limit the number of flow files sent to at most the swap threshold?
- Why does NiFi wait swap.in.period before sending a batch of flow files?

It's not that I'm unhappy with NiFi's performance and/or the Spark receiver, but
the configuration or documentation should be clearer on tuning.

Regards.

From: Bryan Bende
Sent: Thursday, October 8, 2015 16:52
To: users@nifi.apache.org
Subject: Re: Nifi & Spark receiver performance configuration

Hello,

When you say you were unhappy with the performance, can you give some more 
information about what was not performing well?

Was the NiFi Spark Receiver not pulling messages in fast enough and they were 
queuing up in NiFi?
Was NiFi not producing messages as fast as you expected?
What kind of environment were you running this? All on a local machine for 
testing?

-Bryan

On Thu, Oct 8, 2015 at 6:52 AM, Aurélien DEHAY 
> wrote:

Hello.



I'm doing some experimentations on Apache Nifi to see where we can use it.



One idea is to use nifi to feed a spark cluster. So I'm doing some simple test 
(GenerateFlowFile => spark output port and a simple word count on spark side.



I was pretty unhappy with the performance out of the box, so I looked on the 
net and found almost nothing.



So I looked at nifi.properties, and found that some of the following properties 
have a huge impact on how many messages / second were processed to Spark :



nifi.queue.swap.threshold=2

nifi.swap.in.period=1 sec

nifi.swap.in.threads=1

nifi.swap.out.period=1 sec

nifi.swap.out.threads=4


The documentation seems unclear on this point for output ports, is anyone have 
a pointer for me ?

Thanks.

Aurélien.



nifi.scala
Description: nifi.scala


Re: Need help in nifi- flume processor

2015-10-08 Thread Joey Echeverria
> If you plan to use NiFi for the long term, it might be worth investing in 
> converting your custom Flume components to NiFi processors. We can help you 
> get started if you need any guidance going that route.

+1. Running Flume sources/sinks is meant as a transition step. It's
really useful if you have a complex Flume flow and want to migrate
only parts of it over to NiFi at a time. I would port any custom
sources and sinks to NiFi once you know that it will meet your needs
well. NiFi has a lot of documentation on writing processors and the
concepts map pretty well if you're already familiar with Flume's
execution model.

-Joey

On Thu, Oct 8, 2015 at 9:48 AM, Bryan Bende  wrote:
>
> Hi Parul,
>
> It is possible to deploy a custom Flume source/sink to NiFi, but due to the 
> way the Flume processors load the classes for the sources and sinks, the jar 
> you deploy to the lib directory also needs to include the other dependencies 
> your source/sink needs (or they each need to individually be in lib/ 
> directly).
>
> So here is a sample project I created that makes a shaded jar:
> https://github.com/bbende/my-flume-source
>
> It will contain the custom source and following dependencies all in one jar:
>
> org.apache.flume:my-flume-source:jar:1.0-SNAPSHOT
> +- org.apache.flume:flume-ng-sdk:jar:1.6.0:compile
> +- org.apache.flume:flume-ng-core:jar:1.6.0:compile
> +- org.apache.flume:flume-ng-configuration:jar:1.6.0:compile
> +- org.apache.flume:flume-ng-auth:jar:1.6.0:compile
>   \- com.google.guava:guava:jar:11.0.2:compile
>  \- com.google.code.findbugs:jsr305:jar:1.3.9:compile
>
> I copied that to NiFi lib, restarted, created an ExecuteFlumeSource processor 
> with the following config:
>
> Source Type = org.apache.flume.MySource
> Agent Name = a1
> Source Name = r1
> Flume Configuration = a1.sources = r1
>
> And I was getting the output in nifi/logs/nifi-bootstrap.log
>
> Keep in mind that this could become risky because any classes found in the 
> lib directory would be accessible to all NARs in NiFi and would be found 
> before classes within a NAR because the parent is checked first during class 
> loading. This example isn't too risky because we are only bringing in flume 
> jars and one guava jar, but for example if another nar uses a different 
> version of guava this is going to cause a problem.
>
> If you plan to use NiFi for the long term, it might be worth investing in 
> converting your custom Flume components to NiFi processors. We can help you 
> get started if you need any guidance going that route.
>
> -Bryan
>
>
> On Thu, Oct 8, 2015 at 2:30 AM, Parul Agrawal  
> wrote:
>>
>> Hello Bryan,
>>
>> Thank you very much for your response.
>>
>> Is it possible to have a customized Flume source and sink in NiFi?
>> I have my own customized source and sink. I followed the steps below to add my
>> own customized source, but it did not work.
>>
>> 1) Created Maven project and added customized source. (flume.jar was created 
>> after this step)
>> 2) Added flume.jar file to nifi-0.3.0/lib folder.
>> 3) Added flume source processor with the below configuration
>>
>> Property   Value
>> Source Type com.flume.source.Source
>> Agent Name  a1
>> Source Name k1.
>>
>> But I am getting the below error in Flume Source Processor.
>> "Failed to run validation due to java.lang.NoClassDefFoundError : 
>> /org/apache/flume/PollableSource."
>>
>> Can you please help me in this regard. Any step/configuration I missed.
>>
>> Thanks and Regards,
>> Parul
>>
>>
>> On Wed, Oct 7, 2015 at 6:57 PM, Bryan Bende  wrote:
>>>
>>> Hello,
>>>
>>> The NiFi Flume processors are for running Flume sources and sinks with in 
>>> NiFi. They don't communicate with an external Flume process.
>>>
>>> In your example you would need an ExecuteFlumeSource configured to run the 
>>> netcat source, connected to a ExecuteFlumeSink configured with the logger.
>>>
>>> -Bryan
>>>
>>> On Wednesday, October 7, 2015, Parul Agrawal  
>>> wrote:

 Hi,

 I was trying to run the NiFi Flume processor with the below-mentioned
 details but could not bring it up.

 I already started flume with the sample configuration file
 =
 # example.conf: A single-node Flume configuration

 # Name the components on this agent
 a1.sources = r1
 a1.sinks = k1
 a1.channels = c1

 # Describe/configure the source
 a1.sources.r1.type = netcat
 a1.sources.r1.bind = localhost
 a1.sources.r1.port = 4

 # Describe the sink
 a1.sinks.k1.type = logger

 # Use a channel which buffers events in memory
 a1.channels.c1.type = memory
 a1.channels.c1.capacity = 1000
 a1.channels.c1.transactionCapacity = 100

 # Bind the source and sink to the channel
 a1.sources.r1.channels = c1
 a1.sinks.k1.channel = c1

Re: logging GetFile processes?

2015-10-08 Thread Joe Witt
Ron,

Yep makes sense.  We'll try to put together a flow template for you to
check out.  Basic gist is

 - > ReplaceText -> PutMongo

In the ReplaceText processor you can use the expression language to
make new content which would be a JSON document containing the
filename and entry time and whatever other values you'd like.  Then
drive those new JSON documents into Mongo.
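In code terms, the document ReplaceText would build might look like this sketch (field names are illustrative; inside ReplaceText itself you would construct the same record with expression language references such as `${filename}`):

```python
import json
from datetime import datetime, timezone

def log_record(filename: str) -> str:
    """Build a JSON document of the kind described above: the flow
    file's name plus the time it was processed."""
    return json.dumps({
        "filename": filename,
        "entryDate": datetime.now(timezone.utc).isoformat(),
    })

record = json.loads(log_record("c2.txt"))
# record["filename"] -> "c2.txt"
```

Each such document then becomes one record in the Mongo collection.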

Does that get you on the path or would the template help?

Thanks
Joe

On Thu, Oct 8, 2015 at 12:59 PM, Ron Sawyer  wrote:
> Would prefer to log them in a mongodb collection, just the file name and date 
> processed is all.
>
> I know this is being logged in the app log I think but need something that is 
> publicly viewable so others can know files are in fact being picked up and 
> processed.
>
>
> v/r,
> Ron Sawyer
>
> 
> On Thu, 10/8/15, David Wynne  wrote:
>
>  Subject: Re: logging GetFile processes?
>  To: "Ron Sawyer" 
>  Cc: "users@nifi.apache.org" 
>  Date: Thursday, October 8, 2015, 12:43 PM
>
>  Ron,
>  How do you want to keep the file names? In a file on disk? In a
>  database?
>
>  David.
>
>  On 10/8/15, 12:38 PM, "Ron Sawyer" wrote:
>
>  >Hopefully this isn't asking too much, but I have several getFile
>  >processes running and would like to log the file names and times
>  >they process into a mongo collection to easily track what and when
>  >files have been handled, can someone tell how to do this or point
>  >me in a direction?
>  >
>  >Thanks.
>  >
>  >v/r,
>  >Ron Sawyer
>


Re: logging GetFile processes?

2015-10-08 Thread Joe Witt
"what would be the flow to write the filename and date to a text file?"

You can use ReplaceText to write out the data in whatever text
oriented format you like (CSV, JSON, etc..).

Then it is a matter of getting that object to Mongo.  I don't know
Mongo well enough myself to know the best options there.

Thanks
Joe

On Thu, Oct 8, 2015 at 1:45 PM, Ron Sawyer  wrote:
> Joe,
> it appears that the version I'm using doesn't currently support putMongo.  So 
> until I can fix that, what would be the flow to write the filename and date 
> to a text file?   ExecuteScript?
>
> v/r,
> Ron Sawyer
>
> 
> On Thu, 10/8/15, Joe Witt  wrote:
>
>  Subject: Re: logging GetFile processes?
>  To: users@nifi.apache.org
>  Date: Thursday, October 8, 2015, 1:04 PM
>
>  Ron,
>
>  Yep makes sense.  We'll try to put together a flow template for you to
>  check out.  Basic gist is
>
>   - > ReplaceText -> PutMongo
>
>  In the ReplaceText processor you can use the expression language to
>  make new content which would be a JSON document containing the
>  filename and entry time and whatever other values you'd like.  Then
>  drive those new JSON documents into Mongo.
>
>  Does that get you on the path or would the template help?
>
>  Thanks
>  Joe
>
>  On Thu, Oct 8, 2015 at 12:59 PM, Ron Sawyer  wrote:
>  > Would prefer to log them in a mongodb collection, just the file name
>  > and date processed is all.
>  >
>  > I know this is being logged in the app log I think but need something
>  > that is publicly viewable so others can know files are in fact being
>  > picked up and processed.
>  >
>  > v/r,
>  > Ron Sawyer
>  >
>  > On Thu, 10/8/15, David Wynne  wrote:
>  >
>  >  Subject: Re: logging GetFile processes?
>  >  To: "Ron Sawyer"
>  >  Cc: "users@nifi.apache.org"
>  >  Date: Thursday, October 8, 2015, 12:43 PM
>  >
>  >  Ron,
>  >  How do you want to keep the file names? In a file on disk? In a
>  >  database?
>  >
>  >  David.
>  >
>  >  On 10/8/15, 12:38 PM, "Ron Sawyer" wrote:
>  >
>  >  >Hopefully this isn't asking too much, but I have several getFile
>  >  >processes running and would like to log the file names and times
>  >  >they process into a mongo collection to easily track what and when
>  >  >files have been handled, can someone tell how to do this or point
>  >  >me in a direction?
>  >  >
>  >  >Thanks.
>  >  >
>  >  >v/r,
>  >  >Ron Sawyer
>


Re: logging GetFile processes?

2015-10-08 Thread Ron Sawyer
Yep, that's it, thank you!!!  

v/r,
Ron Sawyer


On Thu, 10/8/15, Joe Witt  wrote:

 Subject: Re: logging GetFile processes?
 To: users@nifi.apache.org
 Date: Thursday, October 8, 2015, 1:04 PM
 
 Ron,
 
 Yep makes sense.  We'll try to put together a flow template for you to
 check out.  Basic gist is

  - > ReplaceText -> PutMongo

 In the ReplaceText processor you can use the expression language to
 make new content which would be a JSON document containing the
 filename and entry time and whatever other values you'd like.  Then
 drive those new JSON documents into Mongo.

 Does that get you on the path or would the template help?

 Thanks
 Joe

 On Thu, Oct 8, 2015 at 12:59 PM, Ron Sawyer  wrote:
 > Would prefer to log them in a mongodb collection, just the file name
 > and date processed is all.
 >
 > I know this is being logged in the app log I think but need something
 > that is publicly viewable so others can know files are in fact being
 > picked up and processed.
 >
 > v/r,
 > Ron Sawyer
 >
 > On Thu, 10/8/15, David Wynne  wrote:
 >
 >  Subject: Re: logging GetFile processes?
 >  To: "Ron Sawyer"
 >  Cc: "users@nifi.apache.org"
 >  Date: Thursday, October 8, 2015, 12:43 PM
 >
 >  Ron,
 >  How do you want to keep the file names? In a file on disk? In a
 >  database?
 >
 >  David.
 >
 >  On 10/8/15, 12:38 PM, "Ron Sawyer" wrote:
 >
 >  >Hopefully this isn't asking too much, but I have several getFile
 >  >processes running and would like to log the file names and times
 >  >they process into a mongo collection to easily track what and when
 >  >files have been handled, can someone tell how to do this or point
 >  >me in a direction?
 >  >
 >  >Thanks.
 >  >
 >  >v/r,
 >  >Ron Sawyer



Re: logging GetFile processes?

2015-10-08 Thread Ron Sawyer
Joe,
it appears that the version I'm using doesn't currently support PutMongo. So
until I can fix that, what would be the flow to write the filename and date to
a text file? ExecuteScript?

v/r,
Ron Sawyer


On Thu, 10/8/15, Joe Witt  wrote:

 Subject: Re: logging GetFile processes?
 To: users@nifi.apache.org
 Date: Thursday, October 8, 2015, 1:04 PM
 
 Ron,
 
 Yep makes sense.  We'll try to put together a flow template for you to
 check out.  Basic gist is

  - > ReplaceText -> PutMongo

 In the ReplaceText processor you can use the expression language to
 make new content which would be a JSON document containing the
 filename and entry time and whatever other values you'd like.  Then
 drive those new JSON documents into Mongo.

 Does that get you on the path or would the template help?

 Thanks
 Joe

 On Thu, Oct 8, 2015 at 12:59 PM, Ron Sawyer  wrote:
 > Would prefer to log them in a mongodb collection, just the file name
 > and date processed is all.
 >
 > I know this is being logged in the app log I think but need something
 > that is publicly viewable so others can know files are in fact being
 > picked up and processed.
 >
 > v/r,
 > Ron Sawyer
 >
 > On Thu, 10/8/15, David Wynne  wrote:
 >
 >  Subject: Re: logging GetFile processes?
 >  To: "Ron Sawyer"
 >  Cc: "users@nifi.apache.org"
 >  Date: Thursday, October 8, 2015, 12:43 PM
 >
 >  Ron,
 >  How do you want to keep the file names? In a file on disk? In a
 >  database?
 >
 >  David.
 >
 >  On 10/8/15, 12:38 PM, "Ron Sawyer" wrote:
 >
 >  >Hopefully this isn't asking too much, but I have several getFile
 >  >processes running and would like to log the file names and times
 >  >they process into a mongo collection to easily track what and when
 >  >files have been handled, can someone tell how to do this or point
 >  >me in a direction?
 >  >
 >  >Thanks.
 >  >
 >  >v/r,
 >  >Ron Sawyer



Re: S3 processor with Proxy option - Feature request

2015-10-08 Thread Joe Witt
Chakri,

That sounds great.  Are you interested in contributing back that
modification that meets the need?

We have a contributor guide that should be of help and we're happy to
help otherwise.

Thanks
Joe

On Thu, Oct 8, 2015 at 5:40 PM, Chakrader Dewaragatla
 wrote:
> Nifi Team – It would be helpful to have Proxy options in S3 processors
> (Fetch/Put). We modified nifi aws module to support Proxy as a work around
> and it works like charm.
>
> Thanks,
> -Chakri
> 


Re: S3 processor with Proxy option - Feature request

2015-10-08 Thread Chakrader Dewaragatla
Thanks Joe, I will let my engineering team know.

On 10/8/15, 2:51 PM, "Joe Witt"  wrote:

>Chakri,
>
>That sounds great.  Are you interested in contributing back that
>modification that meets the need?
>
>We have a contributor guide that should be of help and we're happy to
>help otherwise.
>
>Thanks
>Joe
>
>On Thu, Oct 8, 2015 at 5:40 PM, Chakrader Dewaragatla
> wrote:
>> Nifi Team – It would be helpful to have Proxy options in S3 processors
>> (Fetch/Put). We modified nifi aws module to support Proxy as a work
>>around
>> and it works like charm.
>>
>> Thanks,
>> -Chakri
>> 





Fwd: GetKafka Processor Issue

2015-10-08 Thread indus well
Hello NiFi Team:

This issue keeps happening. Please see the attached log file for the full
stack dump.

Thanks,

Indus

On Sun, Oct 4, 2015 at 11:31 PM, indus well  wrote:

> Thanks, Joe. Not sure what happened, but it appeared to be working when I
> turned both on. I will keep monitoring the flows and will get the stack
> dump for you if it happens again.
>
> On Sun, Oct 4, 2015 at 9:34 PM, Joe Witt  wrote:
>
>> Indus,
>>
>> Can you please have them both running to get it back into the bad
>> state then run /bin/nifi.sh dump
>>
>> This will write out the stack dump to the application log.  Can you
>> send us this?  The stack trace could be quite telling.
>>
>> Thanks
>> Joe
>>
>> On Sun, Oct 4, 2015 at 10:30 PM, indus well  wrote:
>> > Hello NiFi Experts:
>> >
>> > I have two workflows using GetKafka processors to consume two different
>> > topics from a single-node Kafka server and am encountering an issue. If I
>> turn
>> > on both workflows, no message from the two topics would flow through.
>> All
>> > statuses show 0 byte or no activity. However, as soon as I stop one
>> > workflow, messages would start streaming in. So I currently can only
>> consume
>> > one Kafka topic at a time.
>> >
>> > Have you seen this issue? Am I missing something in the settings? Please
>> > advise.
>> >
>> > Thanks,
>> >
>> > Indus Well
>> >
>>
>
>


nifi-bootstrap.log
Description: Binary data


Re: Fwd: GetKafka Processor Issue

2015-10-08 Thread Joe Witt
Indus

Sorry for the slow response.  Is there anyone that can look into this?

Thanks
Joe
On Oct 8, 2015 8:37 PM, "indus well"  wrote:

> Hello NiFi Team:
>
> This issue keeps happening. Please see the attached log file for the full
> stack dump.
>
> Thanks,
>
> Indus
>
> On Sun, Oct 4, 2015 at 11:31 PM, indus well  wrote:
>
>> Thanks, Joe. Not sure what happened, but it appeared to be working when I
>> turned both on. I will keep monitoring the flows and will get the stack
>> dump for you if it happens again.
>>
>> On Sun, Oct 4, 2015 at 9:34 PM, Joe Witt  wrote:
>>
>>> Indus,
>>>
>>> Can you please have them both running to get it back into the bad
>>> state then run /bin/nifi.sh dump
>>>
>>> This will write out the stack dump to the application log.  Can you
>>> send us this?  The stack trace could be quite telling.
>>>
>>> Thanks
>>> Joe
>>>
>>> On Sun, Oct 4, 2015 at 10:30 PM, indus well  wrote:
>>> > Hello NiFi Experts:
>>> >
>>> > I have two workflows using GetKafka processors to consume two different
>>> > topics from a single-node Kafka server and am encountering an issue. If I
>>> turn
>>> > on both workflows, no message from the two topics would flow through.
>>> All
>>> > statuses show 0 byte or no activity. However, as soon as I stop one
>>> > workflow, messages would start streaming in. So I currently can only
>>> consume
>>> > one Kafka topic at a time.
>>> >
>>> > Have you seen this issue? Am I missing something in the settings?
>>> Please
>>> > advise.
>>> >
>>> > Thanks,
>>> >
>>> > Indus Well
>>> >
>>>
>>
>>
>
>


Re: FetchS3Object processor

2015-10-08 Thread Aldrin Piri
Chakri,

The FetchS3Object processor does not act as a source directly and is
instead driven by incoming FlowFiles.  The use case was that one could
utilize Expression Language to interact with buckets and paths in a dynamic
fashion.  Alternatively, if you are looking to just grab a specific file,
you can use GenerateFlowFile to "trigger" the FetchS3Object with its static
configuration.
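[Editor's note: the FlowFile-driven pattern described above can be sketched outside NiFi. The idea is that a property value containing `${...}` expressions is evaluated against each incoming FlowFile's attributes, so the bucket and key can vary per FlowFile. This is a loose Python illustration, not NiFi's actual Expression Language engine, and the attribute names used here are invented.]

```python
import re

def evaluate(expression, attributes):
    """Substitute ${name} references with FlowFile attribute values,
    loosely mimicking how NiFi Expression Language lets properties
    such as FetchS3Object's Bucket / Object Key vary per FlowFile."""
    return re.sub(r"\$\{([^}]+)\}",
                  lambda m: attributes.get(m.group(1), ""),
                  expression)

# A FlowFile arriving from an upstream processor carries attributes;
# the processor's properties are evaluated against them per FlowFile.
flowfile_attributes = {"s3.bucket": "my-bucket",
                       "s3.key": "reports/2015/10/data.csv"}
bucket = evaluate("${s3.bucket}", flowfile_attributes)  # "my-bucket"
key = evaluate("${s3.key}", flowfile_attributes)        # "reports/2015/10/data.csv"
```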

On Thu, Oct 8, 2015 at 1:45 PM, Chakrader Dewaragatla <
chakrader.dewaraga...@lifelock.com> wrote:

> Thanks Joe.
> I have FetchS3Object and PutFile processors set up, but files are not being
> fetched despite existing on S3. Am I missing anything? Does the FetchS3Object
> processor need any additional processor to feed it data?
>
> Thanks,
> -Chakri
>
>
> From: Joe Skora 
> Reply-To: "users@nifi.apache.org" 
> Date: Thursday, October 8, 2015 at 3:38 AM
> To: "users@nifi.apache.org" 
> Subject: Re: FetchS3Object processor
>
> Chakri,
>
> The Amazon docs explain here
> 
> that S3 doesn't support paths within a bucket, but you can embed path-like
> naming into the "Object Key" parameter to FetchS3Object.
>
> For your example, you would use bucket="s3://" and
> key="/" to incorporate the path as you store the object.
>
> I hope that helps.
>
> Regards,
> Joe
>
> On Thu, Oct 8, 2015 at 2:39 AM, Chakrader Dewaragatla <
> chakrader.dewaraga...@lifelock.com> wrote:
>
>> Nifi users  - FetchS3Object processor do not have option to specify
>> Object file path, how do you set that ?
>>
>> Eg : s3:
>>
>> Thanks,
>> -Chakri
>>
>
>


Re: FetchS3Object processor

2015-10-08 Thread Chakrader Dewaragatla
Thanks Aldrin. GenerateFlowFile helped to trigger.

Thanks,
-Chakri

From: Aldrin Piri >
Reply-To: "users@nifi.apache.org" 
>
Date: Thursday, October 8, 2015 at 11:22 AM
To: "users@nifi.apache.org" 
>
Subject: Re: FetchS3Object processor

Chakri,

The FetchS3Object processor does not act as a source directly and is instead 
driven by incoming FlowFiles.  The use case was that one could utilize 
Expression Language to interact with buckets and paths in a dynamic fashion.  
Alternatively, if you are looking to just grab a specific file, you can use 
GenerateFlowFile to "trigger" the FetchS3Object with its static configuration.

On Thu, Oct 8, 2015 at 1:45 PM, Chakrader Dewaragatla 
> 
wrote:
Thanks Joe.
I have FetchS3Object and PutFile processors set up, but files are not being 
fetched despite existing on S3. Am I missing anything? Does the FetchS3Object 
processor need any additional processor to feed it data?

Thanks,
-Chakri


From: Joe Skora >
Reply-To: "users@nifi.apache.org" 
>
Date: Thursday, October 8, 2015 at 3:38 AM
To: "users@nifi.apache.org" 
>
Subject: Re: FetchS3Object processor

Chakri,

The Amazon docs explain 
here that 
S3 doesn't support paths within a bucket, but you can embed path-like naming 
into the "Object Key" parameter to FetchS3Object.

For your example, you would use bucket="s3://" and 
key="/" to incorporate the path as you store the object.
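[Editor's note: to make the bucket/key split concrete, here is a minimal Python sketch. The bucket and key names are placeholders; the point is that everything after the bucket name is simply the object key, slashes included.]

```python
def split_s3_url(url):
    """Split an s3://bucket/path/to/object URL into (bucket, key).
    S3 has no real directories: everything after the bucket is just
    the object key, which may contain '/' for path-like naming."""
    without_scheme = url[len("s3://"):]
    bucket, _, key = without_scheme.partition("/")
    return bucket, key

bucket, key = split_s3_url("s3://my-bucket/reports/2015/10/data.csv")
# bucket -> "my-bucket", key -> "reports/2015/10/data.csv"
```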

I hope that helps.

Regards,
Joe

On Thu, Oct 8, 2015 at 2:39 AM, Chakrader Dewaragatla 
> 
wrote:
Nifi users  - FetchS3Object processor do not have option to specify Object file 
path, how do you set that ?

Eg : s3:

Thanks,
-Chakri










Re: FetchS3Object processor

2015-10-08 Thread Joe Skora
There is already a ticket (NIFI-840) in the hopper to create a
ListS3Objects processor that can track bucket contents and trigger
FetchS3Object.

In the meantime, if you want it triggered automatically (as opposed to using
GenerateFlowFile), you may be able to accomplish something similar with a
combination of HTTP and XML processors.
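[Editor's note: a rough sketch of the HTTP+XML approach, assuming the response follows S3's standard 2006-03-01 ListBucketResult schema. In NiFi terms, the XML body would come from an HTTP processor and the keys would be extracted with an XML/XPath processor to drive FetchS3Object; the bucket name and keys below are invented for illustration.]

```python
import xml.etree.ElementTree as ET

# Sample of the XML that S3's "GET Bucket (List Objects)" REST call returns.
LIST_BUCKET_XML = """<?xml version="1.0" encoding="UTF-8"?>
<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Name>my-bucket</Name>
  <Contents><Key>reports/2015/10/a.csv</Key><Size>120</Size></Contents>
  <Contents><Key>reports/2015/10/b.csv</Key><Size>340</Size></Contents>
</ListBucketResult>"""

NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}

def list_keys(xml_text):
    """Extract every object key from a ListBucketResult document."""
    root = ET.fromstring(xml_text)
    return [c.findtext("s3:Key", namespaces=NS)
            for c in root.findall("s3:Contents", NS)]

keys = list_keys(LIST_BUCKET_XML)
# keys -> ['reports/2015/10/a.csv', 'reports/2015/10/b.csv']
```

Each extracted key could then feed a FetchS3Object-style lookup, which is essentially what the proposed ListS3Objects processor would automate.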

On Thu, Oct 8, 2015 at 2:32 PM, Chakrader Dewaragatla <
chakrader.dewaraga...@lifelock.com> wrote:

> Thanks Aldrin. GenerateFlowFile helped to trigger.
>
> Thanks,
> -Chakri
>
> From: Aldrin Piri 
> Reply-To: "users@nifi.apache.org" 
> Date: Thursday, October 8, 2015 at 11:22 AM
>
> To: "users@nifi.apache.org" 
> Subject: Re: FetchS3Object processor
>
> Chakri,
>
> The FetchS3Object processor does not act as a source directly and is
> instead driven by incoming FlowFiles.  The use case was that one could
> utilize Expression Language to interact with buckets and paths in a dynamic
> fashion.  Alternatively, if you are looking to just grab a specific file,
> you can use GenerateFlowFile to "trigger" the FetchS3Object with its static
> configuration.
>
> On Thu, Oct 8, 2015 at 1:45 PM, Chakrader Dewaragatla <
> chakrader.dewaraga...@lifelock.com> wrote:
>
>> Thanks Joe.
>> I have FetchS3Object and PutFile processors set up, but files are not being
>> fetched despite existing on S3. Am I missing anything? Does the FetchS3Object
>> processor need any additional processor to feed it data?
>>
>> Thanks,
>> -Chakri
>>
>>
>> From: Joe Skora 
>> Reply-To: "users@nifi.apache.org" 
>> Date: Thursday, October 8, 2015 at 3:38 AM
>> To: "users@nifi.apache.org" 
>> Subject: Re: FetchS3Object processor
>>
>> Chakri,
>>
>> The Amazon docs explain here
>> 
>> that S3 doesn't support paths within a bucket, but you can embed path-like
>> naming into the "Object Key" parameter to FetchS3Object.
>>
>> For your example, you would use bucket="s3://" and
>> key="/" to incorporate the path as you store the object.
>>
>> I hope that helps.
>>
>> Regards,
>> Joe
>>
>> On Thu, Oct 8, 2015 at 2:39 AM, Chakrader Dewaragatla <
>> chakrader.dewaraga...@lifelock.com> wrote:
>>
>>> Nifi users  - FetchS3Object processor do not have option to specify
>>> Object file path, how do you set that ?
>>>
>>> Eg : s3:
>>>
>>> Thanks,
>>> -Chakri
>>>
>>
>>
>
>


Re: FetchS3Object processor

2015-10-08 Thread Chakrader Dewaragatla
Thanks Joe, I was about to request a feature like this (ListS3Objects).

From: Joe Skora >
Reply-To: "users@nifi.apache.org" 
>
Date: Thursday, October 8, 2015 at 11:39 AM
To: "users@nifi.apache.org" 
>
Subject: Re: FetchS3Object processor

There is already a ticket 
(NIFI-840) in the hopper to 
create a ListS3Objects processor that can track bucket contents and trigger 
FetchS3Object.

In the meantime, if you want it triggered automatically (as opposed to using 
GenerateFlowFile) you may be able to accomplish something similar with a combination 
of HTTP and XML processors.

On Thu, Oct 8, 2015 at 2:32 PM, Chakrader Dewaragatla 
> 
wrote:
Thanks Aldrin. GenerateFlowFile helped to trigger.

Thanks,
-Chakri

From: Aldrin Piri >
Reply-To: "users@nifi.apache.org" 
>
Date: Thursday, October 8, 2015 at 11:22 AM

To: "users@nifi.apache.org" 
>
Subject: Re: FetchS3Object processor

Chakri,

The FetchS3Object processor does not act as a source directly and is instead 
driven by incoming FlowFiles.  The use case was that one could utilize 
Expression Language to interact with buckets and paths in a dynamic fashion.  
Alternatively, if you are looking to just grab a specific file, you can use 
GenerateFlowFile to "trigger" the FetchS3Object with its static configuration.

On Thu, Oct 8, 2015 at 1:45 PM, Chakrader Dewaragatla 
> 
wrote:
Thanks Joe.
I have FetchS3Object and PutFile processors set up, but files are not being 
fetched despite existing on S3. Am I missing anything? Does the FetchS3Object 
processor need any additional processor to feed it data?

Thanks,
-Chakri


From: Joe Skora >
Reply-To: "users@nifi.apache.org" 
>
Date: Thursday, October 8, 2015 at 3:38 AM
To: "users@nifi.apache.org" 
>
Subject: Re: FetchS3Object processor

Chakri,

The Amazon docs explain 
here that 
S3 doesn't support paths within a bucket, but you can embed path-like naming 
into the "Object Key" parameter to FetchS3Object.

For your example, you would use bucket="s3://" and 
key="/" to incorporate the path as you store the object.

I hope that helps.

Regards,
Joe

On Thu, Oct 8, 2015 at 2:39 AM, Chakrader Dewaragatla 
> 
wrote:
Nifi users  - FetchS3Object processor do not have option to specify Object file 
path, how do you set that ?

Eg : s3:

Thanks,
-Chakri











Re: Nifi & Spark receiver performance configuration

2015-10-08 Thread Mark Payne
Aurelien,

The way that swapping works in NiFi is that when the number of FlowFiles in a 
particular queue builds up
past a certain point, NiFi will write those files to disk and drop them from 
the Java heap in order to avoid
running out of heap space. Then, when the number of FlowFiles in the queue 
drops below a certain number,
those swap files are swapped back in so that the queue can start serving them 
up again.

Unfortunately, the current implementation certainly needs some work. Right now, 
this happens in a background
thread, so when you see only 20,000 FlowFiles being sent, that's because 20,000 
FlowFiles are in memory,
and then it is waiting for the background thread to swap those back in.

This is certainly something that we want to address, so that rather than having 
the background thread doing this,
the queue itself will be responsible for pulling those back in, and that should 
alleviate this issue. We've just not
gotten to the point of reimplementing it yet.
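[Editor's note: the stall at the swap threshold described above can be modeled with a toy sketch. This is plain Python, nothing from NiFi's codebase; the point is that a consumer only ever sees the in-memory portion of the queue, and refills wait for the periodic background swap-in.]

```python
from collections import deque

class SwappingQueue:
    """Toy model of threshold-based FlowFile swapping (illustrative only)."""

    def __init__(self, swap_threshold):
        self.swap_threshold = swap_threshold
        self.in_memory = deque()
        self.swapped_out = deque()   # stands in for swap files on disk

    def enqueue(self, flowfile):
        # Past the threshold, new FlowFiles are swapped out of the heap.
        if len(self.in_memory) >= self.swap_threshold:
            self.swapped_out.append(flowfile)
        else:
            self.in_memory.append(flowfile)

    def drain(self):
        # A consumer (e.g. the Spark output port) only sees what is in
        # memory; it then stalls until the background swap-in runs.
        drained = list(self.in_memory)
        self.in_memory.clear()
        return drained

    def background_swap_in(self):
        # Runs only every nifi.swap.in.period, refilling the queue.
        while self.swapped_out and len(self.in_memory) < self.swap_threshold:
            self.in_memory.append(self.swapped_out.popleft())

q = SwappingQueue(swap_threshold=3)
for i in range(8):
    q.enqueue(i)
first = q.drain()          # only the threshold's worth is visible: [0, 1, 2]
second = q.drain()         # nothing until the periodic swap-in: []
q.background_swap_in()
third = q.drain()          # the next swapped batch: [3, 4, 5]
```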

Thanks
-Mark


> On Oct 8, 2015, at 11:22 AM, Aurélien DEHAY  wrote:
> 
> Hello.
> 
> I'm testing on a VM with 8 vCPUs (E5606 2.13 GHz) / 16 GB RAM.
> 
> I just have a GenerateFlowFile which sends data to an output port for Spark. 
> Here, the performance is very good, I can generate a huge number of flow 
> files.
> 
> My Spark job is configured as local[4] and uses 3 receivers. It just does a 
> simple word count on the stream, with a streaming context of 2 seconds.
> 
> My test is simple: I generate about 200k messages (about 200MB) from 
> GenerateFlowFile, without starting the output port, in order to queue the data. 
> Then I stop the processor, and start the output port and my spark job (see 
> attached file) with:
>  bin/spark-shell --master local[4] --packages 
> "org.apache.nifi:nifi-spark-receiver:0.3.0" -i nifi.scala
> 
> 
> with:
> nifi.queue.swap.threshold=2
> nifi.swap.in.period=10 sec
> nifi.swap.in.threads=1
> nifi.swap.out.period=1 sec
> nifi.swap.out.threads=4
> 
> I get the result in 2015-10-08 17_07_04 screenshot file.
> 
> 
> 
> With
> nifi.queue.swap.threshold=2
> nifi.swap.in.period=1 sec
> nifi.swap.in.threads=1
> nifi.swap.out.period=1 sec
> nifi.swap.out.threads=4
> 
> results in 2015-10-08 17_13_28 screenshot.
> 
> 
> With
> nifi.queue.swap.threshold=20
> nifi.swap.in.period=1 sec
> nifi.swap.in.threads=1
> nifi.swap.out.period=1 sec
> nifi.swap.out.threads=4
> 
> I see result in 2015-10-08 17_21_09 screenshot.
> 
> Many questions:
> - Why does NiFi limit the number of FlowFiles sent to at most the swap threshold?
> - Why does NiFi wait for swap.in.period before sending a batch of FlowFiles?
> 
> It's not that I'm unhappy with NiFi's performance and/or the Spark receiver, but 
> the configuration or documentation should be clearer on tuning.
> 
> Regards.
> From: Bryan Bende 
> 
> Sent: Thursday, October 8, 2015 16:52
> To: users@nifi.apache.org
> Subject: Re: Nifi & Spark receiver performance configuration
>  
> Hello,
> 
> When you say you were unhappy with the performance, can you give some more 
> information about what was not performing well?
> 
> Was the NiFi Spark Receiver not pulling messages in fast enough and they were 
> queuing up in NiFi?
> Was NiFi not producing messages as fast as you expected?
> What kind of environment were you running this? All on a local machine for 
> testing?
> 
> -Bryan
> 
> On Thu, Oct 8, 2015 at 6:52 AM, Aurélien DEHAY  > wrote:
> Hello.
>  
> I’m doing some experimentations on Apache Nifi to see where we can use it. 
>  
> One idea is to use NiFi to feed a Spark cluster. So I'm doing a simple 
> test (GenerateFlowFile => Spark output port) with a simple word count on the 
> Spark side.
>  
> I was pretty unhappy with the performance out of the box, so I looked on the 
> net and found almost nothing.
>  
> So I looked at nifi.properties, and found that some of the following 
> properties have a huge impact on how many messages / second were processed to 
> Spark :
>  
> nifi.queue.swap.threshold=2
> nifi.swap.in.period=1 sec
> nifi.swap.in.threads=1
> nifi.swap.out.period=1 sec
> nifi.swap.out.threads=4
>  
> The documentation seems unclear on this point for output ports; does anyone 
> have a pointer for me?
>  
> Thanks.
>  
> Aurélien.
> 
> <2015-10-08 17_07_04-Spark shell - Streaming 
> Statistics.png><2015-10-08 17_13_28-Spark shell - Streaming 
> Statistics.png><2015-10-08 17_21_09-Spark shell - Streaming Statistics.png>
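
[Editor's note: for completeness, the word count the Spark job in this thread performs amounts to the following, sketched in plain Python rather than Spark/Scala, assuming whitespace tokenization. In the real test the lines arrive from the NiFi output port via the nifi-spark-receiver in 2-second micro-batches.]

```python
from collections import Counter

def word_count(lines):
    """Count word occurrences across a batch of lines, the same logic a
    Spark Streaming word count applies per micro-batch."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return dict(counts)

batch = ["nifi feeds spark", "spark counts words", "spark streaming"]
counts = word_count(batch)   # counts["spark"] == 3
```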