[GitHub] nifi issue #497: NIFI-1857: HTTPS Site-to-Site

2016-06-08 Thread ijokarumawak
Github user ijokarumawak commented on the issue:

https://github.com/apache/nifi/pull/497
  
@markap14 I'm not 100% sure whether the change will solve the problem, since I 
couldn't reproduce the issue myself.

However, I've added additional buffer-draining code that runs after EOF is 
received from the channel.
In addition to that, failure-detection code was added to check whether fewer 
bytes were sent than expected; when that happens, it throws a 
RuntimeException so that we can investigate further.
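
To sketch the idea (purely illustrative and not the actual patch -- the channel, buffer size, and byte counters below are assumptions):

```
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;

// Hypothetical illustration only -- not the code in this PR.
public class DrainExample {

    /** Reads and discards any bytes still buffered in the channel once the peer signals EOF. */
    static long drainRemaining(final ReadableByteChannel channel) throws IOException {
        final ByteBuffer scratch = ByteBuffer.allocate(4096);
        long drained = 0;
        int read;
        while ((read = channel.read(scratch)) > 0) {
            drained += read;
            scratch.clear();
        }
        return drained;
    }

    /** Fails loudly if fewer bytes were sent than the sender expected to send. */
    static void verifySent(final long bytesSent, final long bytesExpected) {
        if (bytesSent < bytesExpected) {
            throw new RuntimeException("Expected to send " + bytesExpected
                    + " bytes but only sent " + bytesSent);
        }
    }
}
```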

Also, I've changed the default nifi.remote.input.secure to `false`.

Could you give it a try once again? If it still fails, please share your 
flow template. Thank you.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] nifi pull request #503: NIFI-1978: Restore RPG yield duration.

2016-06-08 Thread ijokarumawak
Github user ijokarumawak closed the pull request at:

https://github.com/apache/nifi/pull/503




[GitHub] nifi issue #503: NIFI-1978: Restore RPG yield duration.

2016-06-08 Thread ijokarumawak
Github user ijokarumawak commented on the issue:

https://github.com/apache/nifi/pull/503
  
@bbende Thanks for reviewing and merging!




[GitHub] nifi pull request #492: NIFI-1975 - Processor for parsing evtx files

2016-06-08 Thread brosander
Github user brosander commented on a diff in the pull request:

https://github.com/apache/nifi/pull/492#discussion_r66348434
  
--- Diff: 
nifi-nar-bundles/nifi-evtx-bundle/nifi-evtx-processors/src/test/java/org/apache/nifi/processors/evtx/ParseEvtxTest.java
 ---
@@ -0,0 +1,481 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.nifi.processors.evtx;
+
+import org.apache.nifi.flowfile.FlowFile;
+import org.apache.nifi.flowfile.attributes.CoreAttributes;
+import org.apache.nifi.logging.ComponentLog;
+import org.apache.nifi.processor.ProcessSession;
+import org.apache.nifi.processor.io.OutputStreamCallback;
+import org.apache.nifi.processors.evtx.parser.ChunkHeader;
+import org.apache.nifi.processors.evtx.parser.FileHeader;
+import org.apache.nifi.processors.evtx.parser.FileHeaderFactory;
+import org.apache.nifi.processors.evtx.parser.MalformedChunkException;
+import org.apache.nifi.processors.evtx.parser.Record;
+import org.apache.nifi.processors.evtx.parser.bxml.RootNode;
+import org.apache.nifi.util.MockFlowFile;
+import org.apache.nifi.util.TestRunner;
+import org.apache.nifi.util.TestRunners;
+import org.junit.Before;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.mockito.Mock;
+import org.mockito.runners.MockitoJUnitRunner;
+import org.w3c.dom.Document;
+import org.w3c.dom.Element;
+import org.w3c.dom.Node;
+import org.w3c.dom.NodeList;
+import org.xml.sax.SAXException;
+
+import javax.xml.parsers.DocumentBuilderFactory;
+import javax.xml.parsers.ParserConfigurationException;
+import javax.xml.stream.XMLStreamException;
+import java.io.ByteArrayInputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.OutputStream;
+import java.util.Arrays;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.concurrent.atomic.AtomicReference;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertTrue;
+import static org.mockito.Mockito.any;
+import static org.mockito.Mockito.anyString;
+import static org.mockito.Mockito.eq;
+import static org.mockito.Mockito.isA;
+import static org.mockito.Mockito.mock;
+import static org.mockito.Mockito.verify;
+import static org.mockito.Mockito.verifyNoMoreInteractions;
+import static org.mockito.Mockito.when;
+
+@RunWith(MockitoJUnitRunner.class)
+public class ParseEvtxTest {
+    public static final DocumentBuilderFactory DOCUMENT_BUILDER_FACTORY = DocumentBuilderFactory.newInstance();
+    public static final String USER_DATA = "UserData";
+    public static final String EVENT_DATA = "EventData";
+    public static final Set DATA_TAGS = new HashSet<>(Arrays.asList(EVENT_DATA, USER_DATA));
+
+    @Mock
+    FileHeaderFactory fileHeaderFactory;
+
+    @Mock
+    MalformedChunkHandler malformedChunkHandler;
+
+    @Mock
+    RootNodeHandlerFactory rootNodeHandlerFactory;
+
+    @Mock
+    ResultProcessor resultProcessor;
+
+    @Mock
+    ComponentLog componentLog;
+
+    @Mock
+    InputStream in;
+
+    @Mock
+    OutputStream out;
+
+    @Mock
+    FileHeader fileHeader;
+
+    ParseEvtx parseEvtx;
+
+    @Before
+    public void setup() throws XMLStreamException, IOException {
+        parseEvtx = new ParseEvtx(fileHeaderFactory, malformedChunkHandler, rootNodeHandlerFactory, resultProcessor);
+        when(fileHeaderFactory.create(in, componentLog)).thenReturn(fileHeader);
+    }
+
+    @Test
+    public void testGetNameFile() {
+        String basename = "basename";
+        assertEquals(basename + ".xml", parseEvtx.getName(basename, null, null, ParseEvtx.XML_EXTENSION));
+    }
+
+    @Test
+    public void testGetNameFileChunk() {
+        String 

[GitHub] nifi issue #397: NIFI-1815

2016-06-08 Thread olegz
Github user olegz commented on the issue:

https://github.com/apache/nifi/pull/397
  
I stepped away from it as I am trying to finish something else that is very 
involved. Will get back to it once I am finished, but I am committed to getting 
it in to both 0.7.0 and 1.0.




Re: Limiting a queue

2016-06-08 Thread Shaine Berube
Thanks for your responses.

As far as possibly contributing these back to the community: I need a good
way for them to connect to any database and generate that database's
specific flavor of SQL, so once I get that functionality built out I would
be glad to.

As far as the memory goes, step two sends an insert query per flow file
(for migration), and each query is designed to pull out 1,000 records, if
that makes more sense.  But it is good to know that NiFi can handle a lot of
flow files; with back-pressure configured, it should wait for the queue ahead
to clear out before starting the next table.
I also forgot to mention that this is interacting with two live databases but
is going through my VM, so it could actually be faster if placed on the
machine the target database is running on.

Fun facts: I'm running benchmarking now, and the speeds I'm seeing are
thanks to NiFi's concurrent processing.
562,000 records were inserted from 1 source table into 1 target table in 8
minutes 14 seconds (I'm being throttled by the source database).
At that speed, it is approximately 1,137 records per second.

Step two is running 1 thread, step three is running 60 threads, and step four
is running 30 threads.

On Wed, Jun 8, 2016 at 1:23 PM, Mark Payne  wrote:

> Shaine,
>
> This is a really cool set of functionality! Any chance you would be
> interested in contributing
> these processors back to the NiFi community?
>
> Regardless, one thing to consider here is that with NiFi, because of the
> way that the repositories
> are structured, the way that we think about heap utilization is a little
> different than with most projects.
> As Bryan pointed out, you will want to stream the content directly to the
> FlowFile, rather than buffering
> in memory. The framework will handle the rest. Where we will be more
> concerned about heap utilization
> is actually in the number of FlowFiles that are held in memory at any one
> time, not the size of those FlowFiles.
> So you will be better off keeping a smaller number of FlowFiles, each
> having larger content. So I would
> recommend making the number of records per FlowFile configurable, perhaps
> with a default value of
> 25,000. This would also result in far fewer JDBC calls, which should be
> beneficial performance-wise.
> NiFi will handle swapping FlowFiles to disk when they are queued up, so
> you can certainly queue up
> millions of FlowFiles in a single queue without exhausting your heap
> space. However, if you are buffering
> up all of those FlowFiles in your processor, you may run into problems, so
> using a smaller number of
> FlowFiles, each with many thousand records will likely provide the best
> heap utilization.
>
> Does this help?
>
> Thanks
> -Mark
>
>
> > On Jun 8, 2016, at 2:05 PM, Bryan Bende  wrote:
> >
> > Thank you for the detailed explanation! It sounds like you have built
> > something very cool here.
> >
> > I'm still digesting the different steps and thinking of what can be done,
> > but something that initially jumped out at me was
> > when you mentioned considering how much memory NiFi has and not wanting
> to
> > go over 1000 records per chunk...
> >
> > You should be able to read and write the chunks in a streaming fashion
> and
> > never have the entire chunk in memory. For example,
> > when creating the chunk you would be looping over a ResultSet from the
> > database and writing each record to the OutputStream of the
> > FlowFile, never having all 1000 records in memory. On the down stream
> > processor you would read the record from the  InputStream of the
> > FlowFile, sending each one to the destination database, again not having
> > all 1000 records in memory. If you can operate like this then having
> > 1000 records per chunk, or 100,000 records per chunk, shouldn't change
> the
> > memory requirement for NiFi.
> >
> > An example of what we do for ExecuteSQL and QueryDatabaseTable is in the
> > JdbcCommon util where it converts the ResultSet to Avro records by
> writing
> > to the OutputStream:
> >
> https://github.com/apache/nifi/blob/e4b7e47836edf47042973e604005058c28eed23b/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/util/JdbcCommon.java#L80
> >
> > Another point is that it is not necessarily a bad thing to have say
> 10,000
> > Flow Files in a queue. The queue is not actually holding the content of
> > those FlowFiles, it is only holding pointers to where the content is, and
> > only when
> > the next processor does a session.read(flowFile, ...) does it then read
> in
> > the content as a stream. In general NiFi should be able to handle 10s of
> > thousands, or even 100s of thousands of Flow Files sitting in a queue.
> >
> > With your current approach have you seen a specific issue, such as out of
> > memory exceptions? or were you just concerned by the number of flow files
> > in the queue continuing to grow?
> >

Re: Consuming web services through NiFi

2016-06-08 Thread saikrishnat
Hi Matt,
Thank you for the reply. When I tried to use the example
"Working_With_CSV.xml" I am getting:

InvokeHTTP[id=a3aab33d-76dd-4169-9a29-fd0aeae219f3] Yielding processor due
to exception encountered as a source processor:
java.net.SocketTimeoutException: connect timed out

I can reach the URL when I try from my browser directly, but not through
NiFi.
I tried other sites through the GetHTTP processor and get the same error.

Any ideas?





[GitHub] nifi issue #439: NIFI-1866 ProcessException handling in StandardProcessSessi...

2016-06-08 Thread markap14
Github user markap14 commented on the issue:

https://github.com/apache/nifi/pull/439
  
@pvillard31 I got this merged into both master and 0.x baselines. Thanks 
for knocking this out!!




[GitHub] nifi pull request #439: NIFI-1866 ProcessException handling in StandardProce...

2016-06-08 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/nifi/pull/439




[GitHub] nifi issue #503: NIFI-1978: Restore RPG yield duration.

2016-06-08 Thread bbende
Github user bbende commented on the issue:

https://github.com/apache/nifi/pull/503
  
@ijokarumawak I merged to 0.x, but GitHub doesn't auto-close PRs from that 
branch; can you close this PR when you have a chance? Thanks.




[GitHub] nifi issue #397: NIFI-1815

2016-06-08 Thread jdye64
Github user jdye64 commented on the issue:

https://github.com/apache/nifi/pull/397
  
@olegz, any luck getting your local install of Tesseract to work?




[GitHub] nifi issue #503: NIFI-1978: Restore RPG yield duration.

2016-06-08 Thread bbende
Github user bbende commented on the issue:

https://github.com/apache/nifi/pull/503
  
+1 good find, verified the yield duration is maintained across restarts, 
will merge to 0.x




[GitHub] nifi issue #503: NIFI-1978: Restore RPG yield duration.

2016-06-08 Thread bbende
Github user bbende commented on the issue:

https://github.com/apache/nifi/pull/503
  
Reviewing...




[GitHub] nifi pull request #499: NIFI-1052: Added "Ghost" Processors, Reporting Tasks...

2016-06-08 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/nifi/pull/499




Re: Limiting a queue

2016-06-08 Thread Mark Payne
Shaine,

This is a really cool set of functionality! Any chance you would be interested 
in contributing
these processors back to the NiFi community?

Regardless, one thing to consider here is that with NiFi, because of the way 
that the repositories
are structured, the way that we think about heap utilization is a little 
different than with most projects.
As Bryan pointed out, you will want to stream the content directly to the 
FlowFile, rather than buffering
in memory. The framework will handle the rest. Where we will be more concerned 
about heap utilization
is actually in the number of FlowFiles that are held in memory at any one time, 
not the size of those FlowFiles.
So you will be better off keeping a smaller number of FlowFiles, each having 
larger content. So I would
recommend making the number of records per FlowFile configurable, perhaps with 
a default value of
25,000. This would also result in far fewer JDBC calls, which should be 
beneficial performance-wise.
NiFi will handle swapping FlowFiles to disk when they are queued up, so you can 
certainly queue up
millions of FlowFiles in a single queue without exhausting your heap space. 
However, if you are buffering
up all of those FlowFiles in your processor, you may run into problems, so 
using a smaller number of
FlowFiles, each with many thousand records will likely provide the best heap 
utilization.
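
For illustration, exposing that as a processor property might look roughly like the
sketch below (the property name and default value are just placeholders, not taken
from an existing processor):

```
import org.apache.nifi.components.PropertyDescriptor;
import org.apache.nifi.processor.util.StandardValidators;

// Illustrative only -- the property name and default value are assumptions.
public final class BatchSizeProperty {

    public static final PropertyDescriptor RECORDS_PER_FLOWFILE = new PropertyDescriptor.Builder()
            .name("Records Per FlowFile")
            .description("Maximum number of records to write into a single FlowFile.")
            .required(true)
            .defaultValue("25000")
            .addValidator(StandardValidators.POSITIVE_INTEGER_VALIDATOR)
            .build();
}

// Inside onTrigger, something like:
//   final int recordsPerFlowFile = context.getProperty(RECORDS_PER_FLOWFILE).asInteger();
```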

Does this help?

Thanks
-Mark


> On Jun 8, 2016, at 2:05 PM, Bryan Bende  wrote:
> 
> Thank you for the detailed explanation! It sounds like you have built
> something very cool here.
> 
> I'm still digesting the different steps and thinking of what can be done,
> but something that initially jumped out at me was
> when you mentioned considering how much memory NiFi has and not wanting to
> go over 1000 records per chunk...
> 
> You should be able to read and write the chunks in a streaming fashion and
> never have the entire chunk in memory. For example,
> when creating the chunk you would be looping over a ResultSet from the
> database and writing each record to the OutputStream of the
> FlowFile, never having all 1000 records in memory. On the down stream
> processor you would read the record from the  InputStream of the
> FlowFile, sending each one to the destination database, again not having
> all 1000 records in memory. If you can operate like this then having
> 1000 records per chunk, or 100,000 records per chunk, shouldn't change the
> memory requirement for NiFi.
> 
> An example of what we do for ExecuteSQL and QueryDatabaseTable is in the
> JdbcCommon util where it converts the ResultSet to Avro records by writing
> to the OutputStream:
> https://github.com/apache/nifi/blob/e4b7e47836edf47042973e604005058c28eed23b/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/util/JdbcCommon.java#L80
> 
> Another point is that it is not necessarily a bad thing to have say 10,000
> Flow Files in a queue. The queue is not actually holding the content of
> those FlowFiles, it is only holding pointers to where the content is, and
> only when
> the next processor does a session.read(flowFile, ...) does it then read in
> the content as a stream. In general NiFi should be able to handle 10s of
> thousands, or even 100s of thousands of Flow Files sitting in a queue.
> 
> With your current approach have you seen a specific issue, such as out of
> memory exceptions? or were you just concerned by the number of flow files
> in the queue continuing to grow?
> 
> I'll continue to think about this more, and maybe someone else on the list
> has additional idea/thoughts.
> 
> -Bryan
> 
> 
> 
> On Wed, Jun 8, 2016 at 12:29 PM, Shaine Berube <
> shaine.ber...@perfectsearchcorp.com> wrote:
> 
>> Perhaps I need to explain a little more about the data flow as a whole.
>> But yes, Processor A is a custom built processor.
>> 
>> In explanation:
>> The data flow that I've built out is basically a 4 to 6 step process (6
>> steps because the first and last processors are optional).  In the four
>> step process, step one gathers information from the source and target
>> databases in preparation for the migration of data from source to target,
>> this includes table names, primary keys, and record counts, step one then
>> produces a flow file per table which in this case is 24 flow files.
>> 
>> Step two of the process would be the equivalent of processor A, in step two
>> I'm taking in a flow file and generating the SQL queries that are going to
>> be run.  The reason the back pressure doesn't work therefore is because the
>> processor is working on one file, which corresponds to a table, which said
>> table will be split into 1000 record chunks with the SQL query splitting.
>> A fair few of these tables however, are over 10 million records, which
>> means that on a single execution, this processor will generate over 10,000
>> flow files (1000 record chunks).  As far as it goes, I cannot 

[GitHub] nifi issue #499: NIFI-1052: Added "Ghost" Processors, Reporting Tasks, Contr...

2016-06-08 Thread bbende
Github user bbende commented on the issue:

https://github.com/apache/nifi/pull/499
  
+1 code looks good, build passes, tested a missing processor, controller 
service, and reporting task and the app still starts up, awesome stuff! Will 
merge into master shortly.




[GitHub] nifi pull request #511: NIFI-1850 - JSON-to-JSON Schema Converter Editor

2016-06-08 Thread YolandaMDavis
GitHub user YolandaMDavis opened a pull request:

https://github.com/apache/nifi/pull/511

NIFI-1850 - JSON-to-JSON Schema Converter Editor 

This is a merge from 0.7 of the Json-to-Json (Jolt) Editor with refactoring 
for masterless clustering and bower dependency support.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/YolandaMDavis/nifi NIFI-1850-master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/nifi/pull/511.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #511


commit 8d6b9a454b5fee312e843aea3fcefcc794087ecf
Author: Yolanda M. Davis 
Date:   2016-06-08T04:53:38Z

NIFI-1850 - Initial Commit for JSON-to-JSON Schema Converter Editor (merge 
from 0.7.0 - refactor for masterless cluster)






[GitHub] nifi issue #499: NIFI-1052: Added "Ghost" Processors, Reporting Tasks, Contr...

2016-06-08 Thread bbende
Github user bbende commented on the issue:

https://github.com/apache/nifi/pull/499
  
Reviewing...




[GitHub] nifi issue #239: Nifi 1540 - AWS Kinesis Get and Put Processors

2016-06-08 Thread jvwing
Github user jvwing commented on the issue:

https://github.com/apache/nifi/pull/239
  
@joewitt, would you please help us with the licensing/notice requirements 
for using the Kinesis Client Library and Kinesis Producer Library?  The Kinesis 
libraries are licensed under the [Amazon Software 
License](https://aws.amazon.com/asl/).  This does not appear on the published 
[list of Apache-compatible 
licenses](http://www.apache.org/legal/resolved.html#category-a).

The Apache Spark project includes comparable use of the Kinesis library, 
although they have chosen to [present their Kinesis integration as an optional 
add-on](http://spark.apache.org/docs/latest/streaming-kinesis-integration.html).
  Comparable code is in fact [checked into the Spark 
repo](https://github.com/apache/spark/tree/master/external), but I did not find 
mention of the ASL in a NOTICE file. I was really hoping to copy and paste.  I 
found a [JIRA issue raised by the Spark team for the license 
discussion](https://issues.apache.org/jira/browse/LEGAL-198) which discusses 
the add-on nature of the component, but not specific referencing language.

Is this OK?  How can we determine what needs to be added to the NOTICE file 
in nifi-aws-nar?





Re: Limiting a queue

2016-06-08 Thread Bryan Bende
Thank you for the detailed explanation! It sounds like you have built
something very cool here.

I'm still digesting the different steps and thinking of what can be done,
but something that initially jumped out at me was
when you mentioned considering how much memory NiFi has and not wanting to
go over 1000 records per chunk...

You should be able to read and write the chunks in a streaming fashion and
never have the entire chunk in memory. For example,
when creating the chunk you would be looping over a ResultSet from the
database and writing each record to the OutputStream of the
FlowFile, never having all 1000 records in memory. On the down stream
processor you would read the record from the  InputStream of the
FlowFile, sending each one to the destination database, again not having
all 1000 records in memory. If you can operate like this then having
1000 records per chunk, or 100,000 records per chunk, shouldn't change the
memory requirement for NiFi.

An example of what we do for ExecuteSQL and QueryDatabaseTable is in the
JdbcCommon util where it converts the ResultSet to Avro records by writing
to the OutputStream:
https://github.com/apache/nifi/blob/e4b7e47836edf47042973e604005058c28eed23b/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/util/JdbcCommon.java#L80
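
To make the streaming pattern concrete, here is a rough sketch (the relationship
name and the id/name columns are placeholders, and error handling is trimmed down):

```
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.sql.ResultSet;
import java.sql.SQLException;

import org.apache.nifi.flowfile.FlowFile;
import org.apache.nifi.processor.ProcessSession;
import org.apache.nifi.processor.Relationship;
import org.apache.nifi.processor.io.InputStreamCallback;
import org.apache.nifi.processor.io.OutputStreamCallback;

// Sketch only: the relationship and the id/name columns are made up for illustration.
class StreamingChunkSketch {

    static final Relationship REL_SUCCESS = new Relationship.Builder().name("success").build();

    // Producer side: stream each record from the ResultSet straight into the FlowFile content.
    void writeChunk(final ProcessSession session, final ResultSet resultSet) {
        FlowFile flowFile = session.create();
        flowFile = session.write(flowFile, new OutputStreamCallback() {
            @Override
            public void process(final OutputStream out) throws IOException {
                try {
                    while (resultSet.next()) {
                        // one record at a time -- the whole chunk is never buffered in memory
                        final String line = resultSet.getString("id") + "," + resultSet.getString("name") + "\n";
                        out.write(line.getBytes(StandardCharsets.UTF_8));
                    }
                } catch (final SQLException e) {
                    throw new IOException(e);
                }
            }
        });
        session.transfer(flowFile, REL_SUCCESS);
    }

    // Consumer side: read the content back as a stream, one record at a time.
    void readChunk(final ProcessSession session, final FlowFile flowFile) {
        session.read(flowFile, new InputStreamCallback() {
            @Override
            public void process(final InputStream in) throws IOException {
                final BufferedReader reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
                String record;
                while ((record = reader.readLine()) != null) {
                    // execute the insert against the destination database here
                }
            }
        });
    }
}
```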

Another point is that it is not necessarily a bad thing to have say 10,000
Flow Files in a queue. The queue is not actually holding the content of
those FlowFiles, it is only holding pointers to where the content is, and
only when
the next processor does a session.read(flowFile, ...) does it then read in
the content as a stream. In general NiFi should be able to handle 10s of
thousands, or even 100s of thousands of Flow Files sitting in a queue.

With your current approach have you seen a specific issue, such as out of
memory exceptions? or were you just concerned by the number of flow files
in the queue continuing to grow?

I'll continue to think about this more, and maybe someone else on the list
has additional idea/thoughts.

-Bryan



On Wed, Jun 8, 2016 at 12:29 PM, Shaine Berube <
shaine.ber...@perfectsearchcorp.com> wrote:

> Perhaps I need to explain a little more about the data flow as a whole.
> But yes, Processor A is a custom built processor.
>
> In explanation:
> The data flow that I've built out is basically a 4 to 6 step process (6
> steps because the first and last processors are optional).  In the four
> step process, step one gathers information from the source and target
> databases in preparation for the migration of data from source to target,
> this includes table names, primary keys, and record counts, step one then
> produces a flow file per table which in this case is 24 flow files.
>
> Step two of the process would be the equivalent of processor A, in step two
> I'm taking in a flow file and generating the SQL queries that are going to
> be run.  The reason the back pressure doesn't work therefore is because the
> processor is working on one file, which corresponds to a table, which said
> table will be split into 1000 record chunks with the SQL query splitting.
> A fair few of these tables however, are over 10 million records, which
> means that on a single execution, this processor will generate over 10,000
> flow files (1000 record chunks).  As far as it goes, I cannot save this
> information directly to the VM or server that I'm running the data flow on,
> because the information can contain extremely sensitive and secure data.
> That being said, I need to consider how much memory the Nifi process has to
> run, so I don't want to go over 1000 records in a chunk.
>
> Step three of the process takes each individual flow file from the queue,
> pulls the SQL query out of the flow file contents, runs it against source,
> and then puts the results in either a CSV or an XML format into the
> contents of a flow file and sends it to the next queue.
>
> Step four of the process takes the results out of the flow file contents,
> sticks them into an SQL query and runs it against target.
>
> Keep in mind: this data flow has been built to handle migration, but also
> is attempting to keep up to date (incrementor/listener), with the source
> database.  Given that we don't have full access to the source database, I'm
> basically limited to running select queries against it and gathering the
> information I need to put into target.  But this data flow is configured to
> handle INSERT and UPDATE SQL queries, with DELETE queries coming some time
> in the future.  The data flow is configured so that step one can either be
> the migrator (full data dump), or the incrementor (incremental data dump,
> use incrementor after migrator has been run).
>
> Now, the six step process adds a step before step one that allows step one
> to be multi-threaded, and it adds a step after step four that runs the
> queries (basically step four turns into the step that generates queries 

[GitHub] nifi pull request #400: Fix for NIFI-1838 & NIFI-1152 & Code modification fo...

2016-06-08 Thread PuspenduBanerjee
Github user PuspenduBanerjee closed the pull request at:

https://github.com/apache/nifi/pull/400




[GitHub] nifi pull request #492: NIFI-1975 - Processor for parsing evtx files

2016-06-08 Thread mattyb149
Github user mattyb149 commented on a diff in the pull request:

https://github.com/apache/nifi/pull/492#discussion_r66292230
  
--- Diff: 
nifi-nar-bundles/nifi-evtx-bundle/nifi-evtx-processors/src/test/java/org/apache/nifi/processors/evtx/ParseEvtxTest.java
 ---
@@ -0,0 +1,318 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.nifi.processors.evtx;
+
+import org.apache.nifi.flowfile.FlowFile;
+import org.apache.nifi.flowfile.attributes.CoreAttributes;
+import org.apache.nifi.logging.ComponentLog;
+import org.apache.nifi.processor.ProcessSession;
+import org.apache.nifi.processor.io.OutputStreamCallback;
+import org.apache.nifi.processors.evtx.parser.ChunkHeader;
+import org.apache.nifi.processors.evtx.parser.FileHeader;
+import org.apache.nifi.processors.evtx.parser.FileHeaderFactory;
+import org.apache.nifi.processors.evtx.parser.MalformedChunkException;
+import org.apache.nifi.processors.evtx.parser.Record;
+import org.apache.nifi.processors.evtx.parser.bxml.RootNode;
+import org.junit.Before;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.mockito.Mock;
+import org.mockito.runners.MockitoJUnitRunner;
+
+import javax.xml.stream.XMLStreamException;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.OutputStream;
+import java.util.concurrent.atomic.AtomicReference;
+
+import static org.junit.Assert.assertEquals;
+import static org.mockito.Mockito.any;
+import static org.mockito.Mockito.anyString;
+import static org.mockito.Mockito.eq;
+import static org.mockito.Mockito.isA;
+import static org.mockito.Mockito.mock;
+import static org.mockito.Mockito.verify;
+import static org.mockito.Mockito.verifyNoMoreInteractions;
+import static org.mockito.Mockito.when;
+
+@RunWith(MockitoJUnitRunner.class)
+public class ParseEvtxTest {
--- End diff --

This class tests the individual methods in the ParseEvtx processor, but not 
the processor lifecycle (like onTrigger). Can you add some more tests that 
exercise the processor? An example of using the nifi-mock framework can be 
found in TestEvaluateXPath, it has the TestRunner stuff with flowfiles, 
relationships, asserts, etc. You will likely want a test file or two to be used 
as input, although if line endings/whitespace are important in the format you 
may just need the data directly in the Java code.
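
For example, a lifecycle-style test might look roughly like the following (the
relationship name, sample file path, and no-arg constructor are assumptions you'd
adapt to the actual processor):

```
import static org.junit.Assert.assertFalse;

import java.nio.file.Paths;
import java.util.List;

import org.apache.nifi.util.MockFlowFile;
import org.apache.nifi.util.TestRunner;
import org.apache.nifi.util.TestRunners;
import org.junit.Test;

public class ParseEvtxLifecycleTest {

    @Test
    public void testOnTriggerWithSampleFile() throws Exception {
        // Exercises the full processor lifecycle through the nifi-mock framework.
        // Assumes ParseEvtx has a no-arg constructor usable by the framework.
        final TestRunner runner = TestRunners.newTestRunner(new ParseEvtx());

        // Path to a small sample .evtx file -- the name here is a guess.
        runner.enqueue(Paths.get("src/test/resources/application-logs.evtx"));
        runner.run();

        runner.assertQueueEmpty();

        // "success" is a guess at the relationship name; use the processor's actual relationships.
        final List<MockFlowFile> results = runner.getFlowFilesForRelationship("success");
        assertFalse(results.isEmpty());
    }
}
```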




Re: Limiting a queue

2016-06-08 Thread Bryan Bende
Ok I didn't realize you had already tried setting the back-pressure
settings. Can you described the processors a little more, are they custom
processors?

I am guessing that ProcessorA is producing all 5k flow files from a single
execution of onTrigger, which would explain why back-pressure didn't solve
the problem: back-pressure would stop the processor from executing again, but
it's already too late because the first execution already went over the limit.

Without knowing too much about what ProcessorA is doing, I'm wondering if
there is a way to put some indirection between the two processors. What if
ProcessorA sent its
output to a PutFile processor that wrote all the chunks out to a directory,
and then a separate GetFile processor concurrently picked up the chunks from
that directory and sent them to ProcessorB?

Then the back-pressure between GetFile and ProcessorB would work, because
once the queue reached 2000, GetFile wouldn't pick up any more files. The
downside is you
would need enough disk space on your NiFi node to possibly store your whole
database table, which may not be an option.

Another idea might be to have two levels of chunks. For example, with the
SplitText processor, if we want to split a file with 1 million lines in it,
rather than do one split producing 1 million flow files, we usually split into
10k-line chunks first, then do another split down to 1 line. Maybe ProcessorA
could produce much larger chunks, say 10k or 100k records each, and the next
processor could further split those before going to ProcessorB.
This would also allow back-pressure to work a little better between the second
split processor and ProcessorB.

If anyone else has ideas here, feel free to chime in.

Thanks,

Bryan

On Wed, Jun 8, 2016 at 10:51 AM, Shaine Berube <
shaine.ber...@perfectsearchcorp.com> wrote:

> I do need more information, because I tried using that option, but the
> processor just continued filling the queue anyway, I told it to only allow
> 2000 before back pressure kicks in, but it kept going and I ended up with
> 5k files in the queue before I restarted Nifi to get the processor to stop.
>
> On Wed, Jun 8, 2016 at 8:45 AM, Bryan Bende  wrote:
>
> > Hello,
> >
> > Take a look at the options available when right-clicking on a queue...
> > What you described is what NiFi calls back-pressure. You can configured a
> > queue to have an object threshold (# of flow files) or data size
> threshold
> > (total size of all flow files).
> > When one of these thresholds is reached, NiFi will no longer let the
> source
> > processor run until the condition goes back under the threshold.
> >
> > Let us know if you need any more info on this.
> >
> > Thanks,
> >
> > Bryan
> >
> > On Wed, Jun 8, 2016 at 10:40 AM, Shaine Berube <
> > shaine.ber...@perfectsearchcorp.com> wrote:
> >
> > > Hello all,
> > >
> > > I'm kind of new to developing Nifi, though I've been doing some pretty
> in
> > > depth stuff and some advanced database queries.  My question is in
> > > regarding the queues between processor, I want to limit a queue to...
> say
> > > 2000, how would I go about doing that?  Or better yet, how would I tell
> > the
> > > processor generating the queue to only put a max of 2000 files into the
> > > queue?
> > >
> > > Allow me to explain with a scenario:
> > > We are doing data migration from one database to another.
> > > -Processor A is generating a queue consumed by Processor B
> > > -Processor A is taking configuration and generating SQL queries in 1000
> > > record chunks so that Processor B can insert them into a new database.
> > > Given the size of the source database, Processor A can potentially
> > generate
> > > hundreds of thousands of files.
> > >
> > > Is there a way for Processor A to check it's down stream queue for the
> > > queue size?  How would I get Processor A to only put 2000 files into
> the
> > > queue at any given time, so that Processor A can continue running but
> > wait
> > > for room in the queue?
> > >
> > > Thank you in advance.
> > >
> > > --
> > > *Shaine Berube*
> > >
> >
>
>
>
> --
> *Shaine Berube*
>


[GitHub] nifi pull request #492: NIFI-1975 - Processor for parsing evtx files

2016-06-08 Thread joewitt
Github user joewitt commented on a diff in the pull request:

https://github.com/apache/nifi/pull/492#discussion_r66277217
  
--- Diff: 
nifi-nar-bundles/nifi-evtx-bundle/nifi-evtx-nar/src/main/resources/META-INF/NOTICE
 ---
@@ -0,0 +1,36 @@
+nifi-evtx-nar
+Copyright 2016 The Apache Software Foundation
+
+This includes derived works from the Apache Software License V2 library 
python-evtx (https://github.com/williballenthin/python-evtx)
+Copyright 2012, 2013 Willi Ballenthin william.ballent...@mandiant.com
+while at Mandiant http://www.mandiant.com
+The derived work is adapted from Evtx/Evtx.py, Evtx/BinaryParser.py, 
Evtx/Nodes.py, Evtx/Views.py and can be found in the 
org.apache.nifi.processors.evtx.parser package.
+
--- End diff --

i am hardly authoritative but i did review this case and believe it to be 
correct.  Some would argue it is more than necessary i'm sure but let's err on 
the side of doing more than we must.




[GitHub] nifi issue #492: NIFI-1975 - Processor for parsing evtx files

2016-06-08 Thread brosander
Github user brosander commented on the issue:

https://github.com/apache/nifi/pull/492
  
@mattyb149 updated those poms, verified that the nar is in the assembly




Re: Limiting a queue

2016-06-08 Thread Shaine Berube
I do need more information, because I tried using that option, but the
processor just continued filling the queue anyway. I told it to only allow
2000 before back-pressure kicks in, but it kept going, and I ended up with
5k files in the queue before I restarted NiFi to get the processor to stop.

On Wed, Jun 8, 2016 at 8:45 AM, Bryan Bende  wrote:

> Hello,
>
> Take a look at the options available when right-clicking on a queue...
> What you described is what NiFi calls back-pressure. You can configured a
> queue to have an object threshold (# of flow files) or data size threshold
> (total size of all flow files).
> When one of these thresholds is reached, NiFi will no longer let the source
> processor run until the condition goes back under the threshold.
>
> Let us know if you need any more info on this.
>
> Thanks,
>
> Bryan
>
> On Wed, Jun 8, 2016 at 10:40 AM, Shaine Berube <
> shaine.ber...@perfectsearchcorp.com> wrote:
>
> > Hello all,
> >
> > I'm kind of new to developing Nifi, though I've been doing some pretty in
> > depth stuff and some advanced database queries.  My question is in
> > regarding the queues between processor, I want to limit a queue to... say
> > 2000, how would I go about doing that?  Or better yet, how would I tell
> the
> > processor generating the queue to only put a max of 2000 files into the
> > queue?
> >
> > Allow me to explain with a scenario:
> > We are doing data migration from one database to another.
> > -Processor A is generating a queue consumed by Processor B
> > -Processor A is taking configuration and generating SQL queries in 1000
> > record chunks so that Processor B can insert them into a new database.
> > Given the size of the source database, Processor A can potentially
> generate
> > hundreds of thousands of files.
> >
> > Is there a way for Processor A to check it's down stream queue for the
> > queue size?  How would I get Processor A to only put 2000 files into the
> > queue at any given time, so that Processor A can continue running but
> wait
> > for room in the queue?
> >
> > Thank you in advance.
> >
> > --
> > *Shaine Berube*
> >
>



-- 
*Shaine Berube*


[GitHub] nifi issue #502: Nifi-1972 Apache Ignite Put Cache Processor

2016-06-08 Thread mans2singh
Github user mans2singh commented on the issue:

https://github.com/apache/nifi/pull/502
  
Hey Folks:

Please let me know your thoughts/suggestions on this NiFi Ignite Put 
Processor.

Thanks




Re: [DISCUSS] - Markdown option for documentation artifacts

2016-06-08 Thread Matt Burgess
+1 with template

On Wed, Jun 8, 2016 at 10:39 AM, dan bress  wrote:
> +1
>
> On Wed, Jun 8, 2016 at 7:05 AM Andre  wrote:
>
>> +1 on this + a template that matches existing additional.html
>> On 8 Jun 2016 04:28, "Bryan Rosander"  wrote:
>>
>> > Hey all,
>> >
>> > When writing documentation (e.g. the additionalDetails.html for a
>> > processor) it would be nice to have the option to use Markdown instead of
>> > html.
>> >
>> > I think Markdown is easier to read and write than raw HTML and for simple
>> > cases does the job pretty well.  It also has the advantage of being able
>> to
>> > be translated into other document types easily and it would be rendered
>> by
>> > default in Github when the file is clicked.
>> >
>> > There is an MIT-licensed Markdown maven plugin (
>> > https://github.com/walokra/markdown-page-generator-plugin) that seems
>> like
>> > it might work for translating additionalDetails.md (and others) into an
>> > equivalent html page.
>> >
>> > Thanks,
>> > Bryan Rosander
>> >
>>


[GitHub] nifi issue #497: NIFI-1857: HTTPS Site-to-Site

2016-06-08 Thread markap14
Github user markap14 commented on the issue:

https://github.com/apache/nifi/pull/497
  
@ijokarumawak I checked out the new PR and tried the test again. I updated 
nifi.properties only to set secure = false for site-to-site (I think this needs 
to be the default because, as-is, NiFi doesn't start up out of the box).

I did get further this time, and saw the receiving side trying to receive 
data, but I still got exceptions and no data coming through. The logs show:

```
2016-06-08 10:39:04,660 ERROR [Timer-Driven Process Thread-10] 
o.a.nifi.remote.StandardRemoteGroupPort 
RemoteGroupPort[name=Log,target=http://localhost:8080/nifi] failed to 
communicate with remote NiFi instance due to java.io.IOException: Failed to 
confirm transaction with Peer[url=http://127.0.0.1:8080/nifi-api] due to 
java.io.IOException: Unexpected response code: 500 errCode:Abort 
errMessage:Server encountered an exception.
2016-06-08 10:39:05,668 ERROR [NiFi Web Server-24] 
o.apache.nifi.web.api.SiteToSiteResource Unexpected exception occurred. 
clientId=a252a4c6-5a5f-42f3-8270-322923b8c118, 
portId=30638675-e655-4719-b684-905ad0d49eac
2016-06-08 10:39:05,671 ERROR [NiFi Web Server-24] 
o.apache.nifi.web.api.SiteToSiteResource Exception detail:
org.apache.nifi.processor.exception.ProcessException: 
org.apache.nifi.processor.exception.FlowFileAccessException: Failed to import 
data from org.apache.nifi.stream.io.MinimumLengthInputStream@75b15a7e for 
StandardFlowFileRecord[uuid=d8046125-80be-4475-bb43-b15cf6dec4d8,claim=,offset=0,name=531446421738053,size=0]
 due to org.apache.nifi.processor.exception.FlowFileAccessException: Unable to 
create ContentClaim due to java.io.EOFException
at 
org.apache.nifi.remote.StandardRootGroupPort.receiveFlowFiles(StandardRootGroupPort.java:503)
 ~[nifi-site-to-site-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
at 
org.apache.nifi.web.api.SiteToSiteResource.receiveFlowFiles(SiteToSiteResource.java:418)
 ~[classes/:na]
at sun.reflect.GeneratedMethodAccessor348.invoke(Unknown Source) 
~[na:na]
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 ~[na:1.8.0_60]
at java.lang.reflect.Method.invoke(Method.java:497) ~[na:1.8.0_60]
at 
com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
 [jersey-server-1.19.jar:1.19]
at 
com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:205)
 [jersey-server-1.19.jar:1.19]
at 
com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
 [jersey-server-1.19.jar:1.19]
at 
com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:302)
 [jersey-server-1.19.jar:1.19]
at 
com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
 [jersey-server-1.19.jar:1.19]
at 
com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
 [jersey-server-1.19.jar:1.19]
at 
com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
 [jersey-server-1.19.jar:1.19]
at 
com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
 [jersey-server-1.19.jar:1.19]
at 
com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1542)
 [jersey-server-1.19.jar:1.19]
at 
com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1473)
 [jersey-server-1.19.jar:1.19]
at 
com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1419)
 [jersey-server-1.19.jar:1.19]
at 
com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1409)
 [jersey-server-1.19.jar:1.19]
at 
com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:409)
 [jersey-servlet-1.19.jar:1.19]
at 
com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:558)
 [jersey-servlet-1.19.jar:1.19]
at 
com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:733)
 [jersey-servlet-1.19.jar:1.19]
at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) 
[javax.servlet-api-3.1.0.jar:3.1.0]
at 
org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:845) 
[jetty-servlet-9.3.9.v20160517.jar:9.3.9.v20160517]
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1689)
 [jetty-servlet-9.3.9.v20160517.jar:9.3.9.v20160517]
at org.eclipse.jetty.servlets.GzipFilter.doFilter(GzipFilter.java:51) 
[jetty-servlets-9.3.9.v20160517.jar:9.3.9.v20160517]
at 

Limiting a queue

2016-06-08 Thread Shaine Berube
Hello all,

I'm kind of new to developing NiFi, though I've been doing some pretty
in-depth stuff and some advanced database queries.  My question is
regarding the queues between processors: I want to limit a queue to, say,
2000 files. How would I go about doing that?  Or better yet, how would I tell
the processor generating the queue to only put a max of 2000 files into the
queue?

Allow me to explain with a scenario:
We are doing data migration from one database to another.
-Processor A is generating a queue consumed by Processor B
-Processor A is taking configuration and generating SQL queries in 1000
record chunks so that Processor B can insert them into a new database.
Given the size of the source database, Processor A can potentially generate
hundreds of thousands of files.

Is there a way for Processor A to check its downstream queue for the
queue size?  How would I get Processor A to only put 2000 files into the
queue at any given time, so that Processor A can continue running but wait
for room in the queue?

Thank you in advance.

-- 
*Shaine Berube*


Re: [DISCUSS] - Markdown option for documentation artifacts

2016-06-08 Thread dan bress
+1

On Wed, Jun 8, 2016 at 7:05 AM Andre  wrote:

> +1 on this + a template that matches existing additional.html
> On 8 Jun 2016 04:28, "Bryan Rosander"  wrote:
>
> > Hey all,
> >
> > When writing documentation (e.g. the additionalDetails.html for a
> > processor) it would be nice to have the option to use Markdown instead of
> > html.
> >
> > I think Markdown is easier to read and write than raw HTML and for simple
> > cases does the job pretty well.  It also has the advantage of being able
> to
> > be translated into other document types easily and it would be rendered
> by
> > default in Github when the file is clicked.
> >
> > There is an MIT-licensed Markdown maven plugin (
> > https://github.com/walokra/markdown-page-generator-plugin) that seems
> like
> > it might work for translating additionalDetails.md (and others) into an
> > equivalent html page.
> >
> > Thanks,
> > Bryan Rosander
> >
>


[GitHub] nifi issue #510: NIFI-1984: Ensure that locks are always cleaned up by Naive...

2016-06-08 Thread mattyb149
Github user mattyb149 commented on the issue:

https://github.com/apache/nifi/pull/510
  
+1 LGTM, built and ran tests, also started NiFi and tried various delete 
operations including an attempt to delete a processor that had an incoming 
connection. All behavior was as expected




Re: [DISCUSS] - Markdown option for documentation artifacts

2016-06-08 Thread Andre
+1 on this + a template that matches existing additional.html
On 8 Jun 2016 04:28, "Bryan Rosander"  wrote:

> Hey all,
>
> When writing documentation (e.g. the additionalDetails.html for a
> processor) it would be nice to have the option to use Markdown instead of
> html.
>
> I think Markdown is easier to read and write than raw HTML and for simple
> cases does the job pretty well.  It also has the advantage of being able to
> be translated into other document types easily and it would be rendered by
> default in Github when the file is clicked.
>
> There is an MIT-licensed Markdown maven plugin (
> https://github.com/walokra/markdown-page-generator-plugin) that seems like
> it might work for translating additionalDetails.md (and others) into an
> equivalent html page.
>
> Thanks,
> Bryan Rosander
>


[GitHub] nifi issue #492: NIFI-1975 - Processor for parsing evtx files

2016-06-08 Thread mattyb149
Github user mattyb149 commented on the issue:

https://github.com/apache/nifi/pull/492
  
The nifi-evtx-nar needs to be added to the top-level POM and the 
nifi-assembly POM, otherwise it will not be included in the distro.




[GitHub] nifi pull request #510: NIFI-1984: Ensure that locks are always cleaned up b...

2016-06-08 Thread markap14
GitHub user markap14 opened a pull request:

https://github.com/apache/nifi/pull/510

NIFI-1984: Ensure that locks are always cleaned up by NaiveRevisionManager

Ensure that if an Exception is thrown by the 'Deletion Task' when calling 
NaiveRevisionManager.deleteRevision() that the locking is appropriately cleaned 
up.

Looking through the code, there do not appear to be any other places where 
we invoke callbacks without appropriate handling via a try/finally or 
try/catch block.
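
The underlying pattern is just the standard acquire-then-try/finally idiom; as a
generic sketch (this is not the NaiveRevisionManager code itself):

```
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Generic illustration of the fix's intent: even if the callback throws,
// the lock is always released.
class LockCleanupSketch {

    private final Lock lock = new ReentrantLock();

    <T> T runWithLock(final Supplier<T> deletionTask) {
        lock.lock();
        try {
            return deletionTask.get();   // may throw
        } finally {
            lock.unlock();               // always cleaned up, even on exception
        }
    }
}
```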

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/markap14/nifi NIFI-1984

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/nifi/pull/510.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #510


commit 1d46b5431bf2eca0298d2f2fdc9854ef3f9fedfa
Author: Mark Payne 
Date:   2016-06-08T12:57:37Z

NIFI-1984: Ensure that if an Exception is thrown by the 'Deletion Task' 
when calling NaiveRevisionManager.deleteRevision() that the locking is 
appropriately cleaned up






Re: NIFI ListenTCP Processor

2016-06-08 Thread Bryan Bende
Venkat,

You are correct that right now the incoming message delimiter is hard-coded
as "\n".

The property you see in the UI for message delimiter is actually for the
output of the processor. It is used when you increase the batch size > 1
and it writes multiple messages to a single flow file, it uses the value of
that property to separate them.

We definitely want to expose the incoming delimiter as something that can
be set through a property on the processor. I think one reason it hasn't
been done yet is because there are several strategies we'd like to support:

- exact match - this would be your case where you specify "$"
- pattern match - this would be reading until a pattern is seen to help
capture multi-line log messages that start with date patterns
- size match - this would be reading until a specified number of bytes have
been read
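To illustrate just the first of those strategies, here is a minimal,
self-contained sketch of reading until an exact single-byte delimiter such as "$"
is seen (illustrative only, not the ListenTCP implementation):

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.nio.charset.StandardCharsets;

    // Illustrative only: read bytes until the delimiter is seen and return one message.
    public class DelimiterReadExample {
        static byte[] readUntil(final InputStream in, final byte delimiter) throws IOException {
            final ByteArrayOutputStream message = new ByteArrayOutputStream();
            int b;
            while ((b = in.read()) != -1) {
                if (b == delimiter) {
                    break;             // exact match: stop at the delimiter
                }
                message.write(b);
            }
            return message.toByteArray();
        }

        public static void main(final String[] args) throws IOException {
            final InputStream in = new ByteArrayInputStream("first$second$".getBytes(StandardCharsets.UTF_8));
            System.out.println(new String(readUntil(in, (byte) '$'), StandardCharsets.UTF_8)); // prints "first"
        }
    }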

That being said, we shouldn't need to do all of these at once, so I created
this JIRA for your scenario:

https://issues.apache.org/jira/browse/NIFI-1985

-Bryan



On Wed, Jun 8, 2016 at 2:25 AM, Venkatesh Nandigam <
venkat.nandi...@bridgera.com> wrote:

> Hi Team,
>
> We started using NiFi data flows in our current project, replacing Node.js
> TCP listeners with the NiFi ListenTCP processor.
>
> Our use case: we have devices that send messages to a TCP port; NiFi has to
> receive each message and then place the data into Kafka.
>
> Processors we are using:
>
> ListenTCP processor: to receive data from the port and send it to the Kafka topic
>
> PutKafka: to place the data into the Kafka topic.
>
> Problem we are facing:
>
> On the device side we use "$" as the message delimiter, but the NiFi ListenTCP
> processor only accepts "\n". We changed the delimiter in the NiFi configuration
> but the change is not reflected. Looking at the NiFi code we saw the TCP_DELIMETER
> field is final; after changing that class to use our delimiter it works fine.
>
> Path:
>
> https://github.com/apache/nifi/blob/master/nifi-commons/nifi-processor-utilities/src/main/java/org/apache/nifi/processor/util/listen/handler/socket/SocketChannelHandler.java
>
> Is this expected behavior, or are we missing something? If it is expected
> behavior, is there any chance to make that field configurable?
>
> Thanks,
> Venkat
>


[GitHub] nifi pull request #509: NIFI-1982: Use Compressed check box value.

2016-06-08 Thread ijokarumawak
GitHub user ijokarumawak opened a pull request:

https://github.com/apache/nifi/pull/509

NIFI-1982: Use Compressed check box value.

- The Compressed check box UI input was not used
- This commit enables Site-to-Site compression configuration from UI
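For background, the same flag can be requested when sending data with the
Site-to-Site client library directly; a rough sketch, assuming the standard
SiteToSiteClient builder API (this code is not part of the PR):

    import java.nio.charset.StandardCharsets;
    import java.util.Collections;
    import org.apache.nifi.remote.Transaction;
    import org.apache.nifi.remote.TransferDirection;
    import org.apache.nifi.remote.client.SiteToSiteClient;

    // Rough sketch only: send one FlowFile over Site-to-Site with compression enabled.
    public class SiteToSiteCompressionExample {
        public static void main(final String[] args) throws Exception {
            final SiteToSiteClient client = new SiteToSiteClient.Builder()
                    .url("http://localhost:8080/nifi")
                    .portName("input")
                    .useCompression(true)      // the setting the Compressed check box should control
                    .build();

            final Transaction transaction = client.createTransaction(TransferDirection.SEND);
            transaction.send("hello".getBytes(StandardCharsets.UTF_8),
                    Collections.<String, String>emptyMap());
            transaction.confirm();
            transaction.complete();
            client.close();
        }
    }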

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ijokarumawak/nifi nifi-1982

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/nifi/pull/509.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #509


commit 70bd97a2ca564b14d16353af55647373eb259253
Author: Koji Kawamura 
Date:   2016-06-08T11:54:40Z

NIFI-1982: Use Compressed check box value.

- The Compressed check box UI input was not used
- This commit enables Site-to-Site compression configuration from UI







Re: Nifi behind AWS ELB

2016-06-08 Thread Edgardo Vega
Mark,

I reread your reply and realized I missed this statement:

"These do not send the attributes, though, so you would need to precede this
with a MergeContent with Merge Format of "FlowFile Stream, v3".
Then, on the receiving side, you could use UnpackContent to unpack these
FlowFile Packages back
into their 'native' form."

I will try to shave some time off and try this out.

I also created a ticket for the improvement to the PostHTTP processor (
https://issues.apache.org/jira/browse/NIFI-1983).

Cheers,

Edgardo

On Tue, Jun 7, 2016 at 4:20 PM, Edgardo Vega  wrote:

> Well, that blows. Should I create a JIRA ticket to disable the two-phase
> commit?
>
>
> On Tuesday, June 7, 2016, Mark Payne  wrote:
>
>> Edgardo,
>>
>> You'd run into a lot of problems trying to use that solution, as many
>> attributes contain characters that are not valid in HTTP headers, and HTTP
>> Headers are delineated with new-lines, so if you have an attribute with
>> new-lines you'll get really weird results.
>>
>> -Mark
>>
>>
>> > On Jun 7, 2016, at 3:52 PM, Edgardo Vega 
>> wrote:
>> >
>> > Mark,
>> >
>> > Amazon only supports sticky session via cookies.
>> >
>> > Disabling the two-phase commit would be really nice
>> >
>> > What if you do an InvokeHTTP that sends all the attributes as HTTP headers,
>> > and on the receiving side in ListenHTTP use a .* to turn all the headers back
>> > into attributes? Would that work?
>> >
>> > Cheers,
>> >
>> > Edgardo
>> >
>> > On Tue, Jun 7, 2016 at 3:19 PM, Mark Payne 
>> wrote:
>> >
>> >> The idea behind the DELETE mechanism is that in some environments there
>> >> were timeouts that would occur quite frequently between PostHTTP /
>> >> ListenHTTP and this resulted in quite a lot of data duplication. By
>> >> adding in the two-phase commit, we were able to drastically reduce the
>> >> amount of data duplication, as a timeout anywhere in the first
>> >> (typically MUCH longer) phase would result in the data on the receiving
>> >> side being dropped because the receiving side would not delete the hold
>> >> that it placed on the FlowFiles.
>> >>
>> >> It would be reasonable to add an option for PostHTTP so that it requests
>> >> not to perform a two-phase commit. Alternatively, you could use either
>> >> PostHTTP with 'Send as FlowFile' set to 'false' or you could use
>> >> InvokeHTTP. These do not send the attributes, though, so you would need
>> >> to precede this with a MergeContent with Merge Format of "FlowFile
>> >> Stream, v3". Then, on the receiving side, you could use UnpackContent to
>> >> unpack these FlowFile Packages back into their 'native' form.
>> >>
>> >> Or, a simpler option, if Amazon's ELB supports it, is to configure the
>> >> ELB such that HTTP Requests that contain the same value for the
>> >> "x-nifi-transaction-id" header will go to the same node. This header was
>> >> added specifically to allow for this functionality through Load
>> >> Balancers, but I don't know if ELB specifically supports this or not.
>> >>
>> >> Thanks
>> >> -Mark
>> >>
>> >>
>> >>> On Jun 7, 2016, at 2:16 PM, Aldrin Piri  wrote:
>> >>>
>> >>> InvokeHTTP may be the better option if the user is not interested in
>> >>> transmitting content _packaged as_ FlowFiles.  Someone with a bit more
>> >>> history than myself can provide some additional context if I have
>> >>> strayed off the path, but PostHTTP and ListenHTTP were precursors to
>> >>> Site to Site. While they can transmit arbitrary content, they were
>> >>> created for this inter-instance communication to aid in the guaranteed
>> >>> delivery semantics. The listed hold, in this case, is part of that
>> >>> transaction occurring where a response is returned to acknowledge
>> >>> receipt via ListenHTTP [1] and the ContentAcknowledgementServlet [2].
>> >>>
>> >>> [1]
>> https://github.com/apache/nifi/blob/1bd2cf0d09a7111bcecffd0f473aa71c25a69845/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/ListenHTTP.java
>> >>> [2]
>> https://github.com/apache/nifi/blob/1bd2cf0d09a7111bcecffd0f473aa71c25a69845/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/servlets/ContentAcknowledgmentServlet.java
>> >>>
>> >>> On Tue, Jun 7, 2016 at 2:10 PM, Bryan Bende  wrote:
>> >>>
>>  Looks like PostHttp interprets the response, and based on a series of
>>  conditions can intentionally issue a delete.
>> 
>>  I can't fully understand what is happening, but the code is here:
>> 
>> 
>> >>
>> 

[GitHub] nifi issue #493: NIFI-1037 Created processor that handles HDFS' inotify even...

2016-06-08 Thread pvillard31
Github user pvillard31 commented on the issue:

https://github.com/apache/nifi/pull/493
  
I have played with it and it works great.
One remark: I am wondering if the 'HDFS_PATH_TO_WATCH' property should be 
improved to:
- accept the expression language, to handle time-stamped directories?
- accept regular expressions?
- accept a comma-separated list of paths?
Since we are polling all HDFS events, it could make sense to have the best 
filtering options available.
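As a rough illustration of the kind of filtering being suggested (purely
hypothetical, not part of the current processor), matching an event path against a
regular expression could look like:

    import java.util.regex.Pattern;

    // Hypothetical sketch of regex-based path filtering for polled HDFS events.
    public class PathFilterExample {
        public static void main(final String[] args) {
            final Pattern pathToWatch = Pattern.compile("/data/\\d{4}-\\d{2}-\\d{2}/.*");

            final String[] eventPaths = {"/data/2016-06-08/file1.txt", "/tmp/other.txt"};
            for (final String path : eventPaths) {
                if (pathToWatch.matcher(path).matches()) {
                    System.out.println("would emit an event for: " + path);
                } else {
                    System.out.println("would skip: " + path);
                }
            }
        }
    }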




[GitHub] nifi issue #497: NIFI-1857: HTTPS Site-to-Site

2016-06-08 Thread ijokarumawak
Github user ijokarumawak commented on the issue:

https://github.com/apache/nifi/pull/497
  
@markap14 Fixed the data sending issue. The reason was that I didn't call 
ByteBuffer clear() and didn't close the PipedOutputStream properly. Sorry for the 
inconvenience. Please try it again.

Also, I fixed the Remote Process Group Port so that useCompression can be 
set from the UI. It seems the UI configuration hadn't been used. I'll send another 
PR against the 0.x branch for that.
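For anyone following along, a simplified sketch of the two fixes described above
(not the actual patch):

    import java.io.IOException;
    import java.io.PipedInputStream;
    import java.io.PipedOutputStream;
    import java.nio.ByteBuffer;

    // Simplified sketch: (1) clear() the ByteBuffer after draining it so it can be
    // filled again; (2) close the PipedOutputStream in a finally block so the
    // reading side eventually sees EOF instead of waiting forever.
    public class BufferAndPipeExample {
        public static void main(final String[] args) throws IOException {
            final ByteBuffer buffer = ByteBuffer.allocate(8);
            buffer.put((byte) 1);
            buffer.flip();                     // switch to reading what was written
            while (buffer.hasRemaining()) {
                buffer.get();                  // drain the buffer
            }
            buffer.clear();                    // reset position/limit for the next fill

            final PipedInputStream in = new PipedInputStream();
            final PipedOutputStream out = new PipedOutputStream(in);
            try {
                out.write(42);
            } finally {
                out.close();                   // lets the reader see EOF once the data is consumed
            }
            System.out.println(in.read());     // 42
            System.out.println(in.read());     // -1 (EOF because the pipe was closed)
            in.close();
        }
    }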




NIFI ListenTCP Processor

2016-06-08 Thread Venkatesh Nandigam
Hi Team,

We started using NiFi data flows in our current project, replacing Node.js
TCP listeners with the NiFi ListenTCP processor.

Our use case: we have devices that send messages to a TCP port; NiFi has to
receive each message and then place the data into Kafka.

Processors we are using:

ListenTCP processor: to receive data from the port and send it to the Kafka topic

PutKafka: to place the data into the Kafka topic.

Problem we are facing:

On the device side we use "$" as the message delimiter, but the NiFi ListenTCP
processor only accepts "\n". We changed the delimiter in the NiFi configuration
but the change is not reflected. Looking at the NiFi code we saw the TCP_DELIMETER
field is final; after changing that class to use our delimiter it works fine.

Path:
https://github.com/apache/nifi/blob/master/nifi-commons/nifi-processor-utilities/src/main/java/org/apache/nifi/processor/util/listen/handler/socket/SocketChannelHandler.java

Is this expected behavior, or are we missing something? If it is expected
behavior, is there any chance to make that field configurable?

Thanks,
Venkat