[GitHub] nifi pull request: Updating RPM build to fix bootstrap dependencie...

2016-01-31 Thread jvwing
Github user jvwing commented on the pull request:

https://github.com/apache/nifi/pull/196#issuecomment-177621908
  
We no longer need this PR.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] nifi pull request: NIFI-1275: Add processor(s) support for Elastic...

2016-01-31 Thread mattyb149
Github user mattyb149 commented on the pull request:

https://github.com/apache/nifi/pull/180#issuecomment-177632224
  
@rpmiskin That's a good idea for the ES ControllerService and ReportingTask. I hadn't considered the ReportingTask, so I didn't think there would be enough need for the ControllerService. I think a breaking change would be OK in the future if we have a migration path (moving the properties from the processor to the controller service). I'd like to get these out for 0.5.0 so the community can use them; then we can gather feedback and see if/when we want to go the ControllerService route.  What do you think?

Also I've updated this PR to incorporate your comment on logging, and added 
more unit tests to exercise the non-happy-path processing.
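
For discussion's sake, here is a rough sketch of what a shared Elasticsearch ControllerService contract might look like (purely hypothetical; the interface name and method are illustrative and not part of this PR):

```
import java.io.IOException;

import org.apache.nifi.controller.ControllerService;

// Hypothetical sketch only, not part of this PR: a shared controller service
// would let the processors and a future ReportingTask reuse one ES client,
// with connection properties (hosts, cluster name) owned by the service.
public interface ElasticsearchClientService extends ControllerService {

    // Index a single document; implementations would wrap the ES client.
    void index(String index, String docType, String docId, byte[] document) throws IOException;
}
```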




[GitHub] nifi pull request: NIFI-1107 Multipart Uploads

2016-01-31 Thread jskora
Github user jskora commented on the pull request:

https://github.com/apache/nifi/pull/192#issuecomment-177583468
  
@joewitt 

I've been trying to get the multipart upload in for a while; my preference would be to incorporate the multipart support as-is and then update it to use the framework's state management after that's been tested out in production.  Also, the multipart upload state has to cooperate with AWS upload state, so it has its own pain points.

Thanks,
Joe




[GitHub] nifi pull request: NIFI-1107 Multipart Uploads

2016-01-31 Thread jskora
Github user jskora commented on the pull request:

https://github.com/apache/nifi/pull/192#issuecomment-177570291
  
Just pushed a commit that
- cleans up the property description,
- validates that the upload still exists in S3 before resuming from local state (see the sketch below), and
- verifies local state was previously created before trying to delete it.
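
For reference, a sketch of the kind of S3 existence check the second bullet describes (illustrative only; plain AWS SDK v1 calls, with the surrounding processor state handling assumed):

```
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.ListMultipartUploadsRequest;
import com.amazonaws.services.s3.model.MultipartUpload;

// Illustrative only: confirm the uploadId recorded in local state still
// exists in S3 before resuming; a stale id means starting a fresh upload.
boolean uploadStillExists(final AmazonS3 s3, final String bucket, final String key,
                          final String uploadId) {
    final ListMultipartUploadsRequest request =
            new ListMultipartUploadsRequest(bucket).withPrefix(key);
    for (final MultipartUpload upload : s3.listMultipartUploads(request).getMultipartUploads()) {
        if (upload.getUploadId().equals(uploadId)) {
            return true;
        }
    }
    return false;
}
```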




[GitHub] nifi pull request: NIFI-1107 Multipart Uploads

2016-01-31 Thread joewitt
Github user joewitt commented on the pull request:

https://github.com/apache/nifi/pull/192#issuecomment-177574967
  
Joe

With NIFI-259 in 0.5.0 it would be a good idea to build this against that rather than the previous state approach.  The branch for NIFI-259 is under active review.  Would you like to wait until the next release and use it, use it in the current release, or stay with the previously common approach?

Thanks
Joe
On Jan 31, 2016 1:25 PM, "Joe Skora"  wrote:

> Just pushed commit that
>
>- cleans up property description,
>- validates that upload still exists in S3 before resuming from local
>state, and
>- verifies local state was previously created before trying to delete
>it.
>
> —
> Reply to this email directly or view it on GitHub
> .
>





[GitHub] nifi pull request: NIFI-924:Nifi-Camel Integration

2016-01-31 Thread joewitt
Github user joewitt commented on the pull request:

https://github.com/apache/nifi/pull/197#issuecomment-177707951
  
Puspendu - thanks.  I provided an alternative patch which simply reduces the size of the test data and resolves some IO handling issues which could also affect test stability.  There is no reason for such large input tests anyway, particularly given the intermediary byte and string copies being made.  I do like your sliding expression matcher concept, though.  I'd be interested to understand how it functions on non-uniform byte-length character sets.  If it does or can work on such character sets, we should see about using it instead of our current methods, which require users to specify a maximum buffer length.
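
For context, one way a matcher can cope with non-uniform byte-length character sets is to decode incrementally, so multi-byte sequences split across buffer boundaries are carried over rather than corrupted. A minimal sketch (illustrative, not the PR's matcher):

```
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.StandardCharsets;

// Illustrative sketch: incremental UTF-8 decoding keeps any partial multi-byte
// sequence in the input buffer between reads, so a sliding matcher never sees
// a split character and needs no user-specified maximum buffer length.
void scan(final InputStream stream) throws IOException {
    final CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
    final ByteBuffer in = ByteBuffer.allocate(8192);
    final CharBuffer out = CharBuffer.allocate(8192);
    try (final ReadableByteChannel channel = Channels.newChannel(stream)) {
        while (channel.read(in) != -1) {
            in.flip();
            decoder.decode(in, out, false); // false: more input may follow
            in.compact();                   // retain undecoded trailing bytes
            out.flip();
            // feed the decoded chars in 'out' to the sliding matcher here
            out.clear();
        }
    }
}
```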




[GitHub] nifi pull request: NIFI-924:Nifi-Camel Integration

2016-01-31 Thread joewitt
Github user joewitt commented on the pull request:

https://github.com/apache/nifi/pull/197#issuecomment-177682971
  
We'd love a permanent solution for the OOM issue seen in the Travis CI environment, but so far the cause of the issue isn't known.




[GitHub] nifi pull request: NIFI-1456 reduced tests to 10MB instead of 100M...

2016-01-31 Thread joewitt
GitHub user joewitt opened a pull request:

https://github.com/apache/nifi/pull/198

NIFI-1456 reduced tests to 10MB instead of 100MB datasets and resolved IO issues which impact test stability

Signed-off-by: joewitt 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/joewitt/incubator-nifi master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/nifi/pull/198.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #198


commit 072486942140121571f5340220c22c60ebf378b5
Author: joewitt 
Date:   2016-02-01T02:00:38Z

NIFI-1456 reduced tests to 10MB instead of 100MB datasets and resolved IO 
issues which impact test stability

Signed-off-by: joewitt 






[GitHub] nifi pull request: NIFI-1456 reduced tests to 10MB instead of 100M...

2016-01-31 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/nifi/pull/198




[GitHub] nifi pull request: Nifi-Camel Integration

2016-01-31 Thread PuspenduBanerjee
Github user PuspenduBanerjee commented on the pull request:

https://github.com/apache/nifi/pull/186#issuecomment-177439597
  
Moved to: https://github.com/apache/nifi/pull/197




[GitHub] nifi pull request: NIFI-924:Nifi-Camel Integration

2016-01-31 Thread PuspenduBanerjee
Github user PuspenduBanerjee commented on the pull request:

https://github.com/apache/nifi/pull/197#issuecomment-177442040
  
@olegz Please review once you get a chance.
@joewitt Please find the NiFi template attached to the PR at [CamelProcessorTestingTemplate.xml.zip](https://github.com/apache/nifi/files/81/CamelProcessorTestingTemplate.xml.zip)




Next Release of Apache Nifi

2016-01-31 Thread shweta
Hi All,

I wanted to know when the next major or minor release of NiFi is planned.

Thanks and Regards,
Shweta





How to decompress the snappy compressed data

2016-01-31 Thread shweta
Hi All,

We have a requirement to read snappy-compressed data from one of our Kafka
servers and then decompress it.
The CompressContent processor in the latest release does not support snappy.
Is there a way to do snappy decompression in NiFi without writing any custom
script or processor?

Thanks,
Shweta
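
For reference, block-level snappy decompression itself is a one-liner with the xerial snappy-java library; a minimal sketch, assuming the Kafka payload is plain block-compressed (framed/streamed snappy would need a stream-based decoder such as SnappyFramedInputStream instead):

```
import java.io.IOException;

import org.xerial.snappy.Snappy;

// Sketch: plain block-format snappy decompression. Assumption: the payload
// is block-compressed; Kafka's framed variants need a stream-based decoder.
byte[] decompress(final byte[] compressed) throws IOException {
    return Snappy.uncompress(compressed);
}
```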






[GitHub] nifi pull request: NIFI-1107 Multipart Uploads

2016-01-31 Thread trkurc
Github user trkurc commented on a diff in the pull request:

https://github.com/apache/nifi/pull/192#discussion_r51360885
  
--- Diff: nifi-nar-bundles/nifi-aws-bundle/nifi-aws-processors/src/main/java/org/apache/nifi/processors/aws/s3/PutS3Object.java ---
@@ -118,6 +210,75 @@ protected PropertyDescriptor getSupportedDynamicPropertyDescriptor(final String
                 .build();
     }

+    protected File getPersistenceFile() {
+        return new File(PERSISTENCE_ROOT + getIdentifier());
+    }
+
+    protected synchronized MultipartState getLocalState(final String s3ObjectKey) throws IOException {
+        // get local state if it exists
+        MultipartState currState = null;
+        final File persistenceFile = getPersistenceFile();
+        if (persistenceFile.exists()) {
+            try (final FileInputStream fis = new FileInputStream(persistenceFile)) {
+                final Properties props = new Properties();
+                props.load(fis);
+                if (props.containsKey(s3ObjectKey)) {
+                    final String localSerialState = props.getProperty(s3ObjectKey);
+                    if (localSerialState != null) {
+                        currState = new MultipartState(localSerialState);
+                        getLogger().info("Local state for {} loaded with uploadId {} and {} partETags",
+                                new Object[]{s3ObjectKey, currState.getUploadId(), currState.getPartETags().size()});
+                    }
+                }
+            } catch (IOException ioe) {
+                getLogger().warn("Failed to recover local state for {} due to {}. Assuming no local state and " +
+                        "restarting upload.", new Object[]{s3ObjectKey, ioe.getMessage()});
+            }
+        }
+        return currState;
+    }
+
+    protected synchronized void persistLocalState(final String s3ObjectKey, final MultipartState currState) throws IOException {
+        final String currStateStr = (currState == null) ? null : currState.toString();
+        final File persistenceFile = getPersistenceFile();
+        final File parentDir = persistenceFile.getParentFile();
+        if (!parentDir.exists() && !parentDir.mkdirs()) {
+            throw new IOException("Persistence directory (" + parentDir.getAbsolutePath() + ") does not exist and " +
+                    "could not be created.");
+        }
+        final Properties props = new Properties();
+        if (persistenceFile.exists()) {
+            try (final FileInputStream fis = new FileInputStream(persistenceFile)) {
+                props.load(fis);
+            }
+        }
+        if (currStateStr != null) {
+            props.setProperty(s3ObjectKey, currStateStr);
+        } else {
+            props.remove(s3ObjectKey);
+        }
+
+        if (props.size() > 0) {
+            try (final FileOutputStream fos = new FileOutputStream(persistenceFile)) {
+                props.store(fos, null);
+            } catch (IOException ioe) {
+                getLogger().error("Could not store state {} due to {}.",
+                        new Object[]{persistenceFile.getAbsolutePath(), ioe.getMessage()});
+            }
+        } else {
+            try {
+                Files.delete(persistenceFile.toPath());
+            } catch (IOException ioe) {
+                getLogger().error("Could not remove state file {} due to {}.",
+                        new Object[]{persistenceFile.getAbsolutePath(), ioe.getMessage()});
--- End diff --

So, on a clean flow, just writing small files, I get these error messages, which aren't particularly helpful and show up on the processor bulletin despite the flow files being sent to success. Not ideal behavior.

```
2016-01-31 08:59:38,680 ERROR [Timer-Driven Process Thread-4] o.a.nifi.processors.aws.s3.PutS3Object PutS3Object[id=9460c94b-72b4-4c39-89a6-2d96f2e1ae81] Could not remove state file C:\development\nifi\nifi-assembly\target\nifi-0.4.2-SNAPSHOT-bin\nifi-0.4.2-SNAPSHOT\conf\state\9460c94b-72b4-4c39-89a6-2d96f2e1ae81 due to conf\state\9460c94b-72b4-4c39-89a6-2d96f2e1ae81.
```
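
For what it's worth, a sketch of the kind of guard that would avoid the bulletin (illustrative, not the committed fix; getLogger() assumes the surrounding processor context):

```
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

// Illustrative guard: deleteIfExists returns quietly when no state file was
// ever written, so small uploads that never created local state produce no
// ERROR bulletin. Note that NoSuchFileException.getMessage() is just the
// path, which is why the bulletin above reads "... due to conf\state\<id>."
void clearLocalState(final File persistenceFile) {
    try {
        Files.deleteIfExists(persistenceFile.toPath());
    } catch (final IOException ioe) {
        getLogger().warn("Could not remove state file {} due to {}",
                new Object[]{persistenceFile.getAbsolutePath(), ioe});
    }
}
```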




[GitHub] nifi pull request: NIFI-1107 Multipart Uploads

2016-01-31 Thread trkurc
Github user trkurc commented on a diff in the pull request:

https://github.com/apache/nifi/pull/192#discussion_r51361382
  
--- Diff: nifi-nar-bundles/nifi-aws-bundle/nifi-aws-processors/src/main/java/org/apache/nifi/processors/aws/s3/PutS3Object.java ---
@@ -89,9 +134,51 @@
                 .defaultValue(StorageClass.Standard.name())
                 .build();

+    public static final PropertyDescriptor MULTIPART_THRESHOLD = new PropertyDescriptor.Builder()
+            .name("Multipart Threshold")
+            .description("Specifies the file size threshold for switch from the PutS3Object API to the " +
+                    "PutS3MultipartUpload API.  Flow files bigger than this limit will be sent using the stateful " +
+                    "multipart process.\n" +
+                    "The valid range is 50MB to 5GB.")
+            .required(true)
+            .defaultValue("5 GB")
+            .addValidator(StandardValidators.createDataSizeBoundsValidator(MIN_S3_PART_SIZE, MAX_S3_PUTOBJECT_SIZE))
+            .build();
+
+    public static final PropertyDescriptor MULTIPART_PART_SIZE = new PropertyDescriptor.Builder()
+            .name("Multipart Part Size")
+            .description("Specifies the part size for use when the PutS3Multipart Upload API is used.\n" +
+                    "Flow files will be broken into chunks of this size for the upload process, but the last part " +
+                    "sent can be smaller since it is not padded.\n" +
+                    "The valid range is 50MB to 5GB.")
+            .required(true)
+            .defaultValue("5 GB")
+            .addValidator(StandardValidators.createDataSizeBoundsValidator(MIN_S3_PART_SIZE, MAX_S3_PUTOBJECT_SIZE))
+            .build();
+
+    public static final PropertyDescriptor MULTIPART_S3_AGEOFF_INTERVAL = new PropertyDescriptor.Builder()
+            .name("Multipart Upload AgeOff Interval")
+            .description("Specifies the interval at which existing multipart uploads in AWS S3 will be evaluated " +
+                    "for ageoff.  Calls to onTrigger() will initiate the ageoff evaluation if this interval has been " +
--- End diff --

I think this should be "when the processor is triggered" rather than "calls to onTrigger()", to prevent too much Java'ism leaking out.




[GitHub] nifi pull request: NIFI-1107 Multipart Uploads

2016-01-31 Thread trkurc
Github user trkurc commented on the pull request:

https://github.com/apache/nifi/pull/192#issuecomment-177515367
  
Did a pretty thorough test, testing weird combinations of parameters, 
stopping and clearing the queue. The only real issue I ran into was with a new 
nifi instance having that bulletin pop up when I'm only uploading files smaller 
than the limit, which seems like it should be an easy fix.




Re: Provenance reporting

2016-01-31 Thread Aldrin Piri
Richard,

Capturing and recording items like these is exactly what the State Management feature [1], coming in the 0.5.0 release, is intended for: components interface with a StateManager to record these items through a framework-provided mechanism.  This would be the preferred approach moving forward.

Until State Management is available, the way this has typically been accomplished is via the DistributedMapCache [2].  You can evaluate the approach in components such as ListHDFS or GetHBase, both of which make use of a semantic very similar to the one provided by State Management.

[1] https://cwiki.apache.org/confluence/display/NIFI/State+Management
[2]
http://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.distributed.cache.server.map.DistributedMapCacheServer/index.html
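
For illustration, a minimal sketch of recording that high-water mark with the 0.5.0 StateManager (assumptions: ReportingContext exposes getStateManager() to reporting tasks in 0.5.0, and the "lastEventId" key and batch size of 1000 are illustrative, not an established convention):

```
import java.util.Collections;
import java.util.List;

import org.apache.nifi.components.state.Scope;
import org.apache.nifi.components.state.StateManager;
import org.apache.nifi.components.state.StateMap;
import org.apache.nifi.provenance.ProvenanceEventRecord;
import org.apache.nifi.reporting.ReportingContext;

// Sketch: resume from the last handled provenance event id, then persist
// the new high-water mark through the framework-provided StateManager.
public void onTrigger(final ReportingContext context) throws Exception {
    final StateManager stateManager = context.getStateManager(); // assumption: available in 0.5.0
    final StateMap stateMap = stateManager.getState(Scope.LOCAL);
    final String lastIdValue = stateMap.get("lastEventId");      // illustrative key
    final long firstRecordId = (lastIdValue == null) ? 0L : Long.parseLong(lastIdValue) + 1;

    final List<ProvenanceEventRecord> events =
            context.getEventAccess().getProvenanceEvents(firstRecordId, 1000);
    if (!events.isEmpty()) {
        // ... send the events to Elasticsearch here, then record the new position
        final long lastHandled = events.get(events.size() - 1).getEventId();
        stateManager.setState(
                Collections.singletonMap("lastEventId", String.valueOf(lastHandled)),
                Scope.LOCAL);
    }
}
```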

On Sun, Jan 31, 2016 at 2:30 AM, Richard Miskin wrote:

> Hi,
>
> Based on the changes in NIFI-1275 I've been looking at creating a
> ReportingTask to send provenance events to ElasticSearch.
>
> I can see that it is possible to get all events from a specific id by
> using: getProvenanceRepository().getEvents(firstRecordId, maxRecords)
>
> Is there a standard mechanism for recording the last record id that my
> task successfully handled?
>
> It seems like a common requirement for anything that is going to use that
> getEvents() mechanism, so it seems a shame to have to create something
> specific for my task.
>
> Cheers,
> Richard