Re: Problem with PutS3Object processor

2016-02-19 Thread Joseph E. Gottman
I am using an SSLContextService for my credentials.


Joe Gottman
Senior Member of Technical Staff
jgott...@proteuseng.com
133 National Business Pkwy, Ste 150
Annapolis Junction, MD 20701
(Office) 301.377.7144
www.proteus-technologies.com
TheBlend.ProteusEng.com (Digital magazine)

From: Joe Skora 
Sent: Friday, February 19, 2016 2:56 PM
To: users@nifi.apache.org
Subject: Re: Problem with PutS3Object processor

Joseph,

I ran into this same problem last week when I forgot to provide credentials to 
an S3-compatible endpoint.  AWS S3 requires the date header whenever an 
authorization header is provided, so the underlying Amazon library adds it 
automatically when the processor uses authorization; if it thinks the request is 
anonymous, it leaves the date header off.

Does your endpoint require authentication with Amazon (or Amazon-like) Access 
Key and Secret Key credentials?  If not, can you try providing credentials 
anyway and see if that helps?
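
For illustration, here is a minimal sketch of roughly what the underlying 
Amazon library (the AWS SDK for Java) does when credentials are supplied; the 
bucket, key, endpoint, and credential values below are placeholders, not 
anything from your flow:

    import com.amazonaws.auth.BasicAWSCredentials;
    import com.amazonaws.services.s3.AmazonS3Client;

    import java.io.File;

    public class S3CustomEndpointExample {
        public static void main(String[] args) {
            // Explicit credentials make the SDK sign the request, which in turn
            // makes it send the Date / x-amz-date header the endpoint expects.
            BasicAWSCredentials creds = new BasicAWSCredentials("ACCESS_KEY", "SECRET_KEY");
            AmazonS3Client s3 = new AmazonS3Client(creds);

            // Point the client at the custom, S3-compatible endpoint instead of AWS.
            s3.setEndpoint("https://s3.example.internal");

            // With no credentials the SDK treats the request as anonymous, skips
            // signing, and some endpoints then reject it with
            // "You must specify a date for this operation".
            s3.putObject("my-bucket", "my-key", new File("/tmp/data.bin"));
        }
    }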

Regards,
Joe Skora

On Fri, Feb 19, 2016 at 12:56 PM, Joseph E. Gottman wrote:

I am trying to use a PutS3Object processor with the "Endpoint Override URL" 
pointing to a custom endpoint.  I keep failing with the error message "You must 
specify a date for this operation".  I am using NiFi version 0.4.1 with Java 8 
and CentOS 6.7.  I suspect this might have something to do with bug #1025, but 
according to your notes it was fixed in version 0.4.0.




Joe Gottman
Senior Member of Technical Staff
jgott...@proteuseng.com
133 National Business Pkwy, Ste 150
Annapolis Junction, MD 20701
(Office) 301.377.7144
www.proteus-technologies.com
TheBlend.ProteusEng.com (Digital magazine)



Re: Maximum attribute size

2016-02-19 Thread Joe Percivall
Hello Lars,
You are correct that the WAL is different from swapping.
Swapping is used when a single connection queue grows to be very large. A chunk 
of the FlowFiles is then swapped out of JVM memory and written to disk, where it 
is stored until it is swapped back in for processing. The WAL exists almost 
solely to persist information when a NiFi instance is stopped for some reason 
(e.g., a restart or a hardware failure).
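
Both behaviors are driven by settings in nifi.properties; as an illustration 
only (these are the stock defaults, so check your own install):

    # FlowFile repository: a write-ahead log that persists FlowFile metadata so a
    # stopped or crashed instance can recover its state on restart.
    nifi.flowfile.repository.implementation=org.apache.nifi.controller.repository.WriteAheadFlowFileRepository
    nifi.flowfile.repository.directory=./flowfile_repository

    # Swapping: once a single connection queue exceeds this many FlowFiles, the
    # excess is swapped out of JVM heap to disk until it is needed again.
    nifi.queue.swap.threshold=20000
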
I am currently working on finishing up a document which will explain these and 
many other concepts utilized by the underlying system. So look out for that in 
the relatively near future.

Joe
- - - - - -
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com
 

On Wednesday, February 17, 2016 6:48 PM, Lars Francke wrote:

Thanks a lot for confirming my suspicions.
One last clarification: the WAL is different from the swapping concept, 
correct? I guess it's way faster to swap in a dedicated "dump" than to replay a 
WAL.
On Wed, Feb 17, 2016 at 7:53 PM, Joe Witt  wrote:

Lars,

You are right about the thought process.  We've never provided solid
guidance here but we should.  It is definitely the case that flow file
content is streamed to and from the underlying repository and the only
way to access it is through that API.  Thus well-behaved extensions
and the framework itself can handle data basically as large as the
underlying repository has space for.  The flow file attributes,
though, are held in memory in a map on each FlowFile object.
So it is important to avoid having vast (undefined) quantities of
attributes or attributes with really large (undefined) values.

There are things we can and should do to make even this relatively
transparent to users, and it is actually why we support swapping
FlowFiles to disk when there are large queues: even those in-memory
attributes can really add up.
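
As a rough sketch of the distinction (not taken from any real processor - the 
class, relationship, and attribute names here are made up for illustration), a 
processor streams content through the session API but sets attributes on the 
in-memory map:

    import java.io.InputStream;
    import java.util.Collections;
    import java.util.Set;

    import org.apache.nifi.flowfile.FlowFile;
    import org.apache.nifi.processor.AbstractProcessor;
    import org.apache.nifi.processor.ProcessContext;
    import org.apache.nifi.processor.ProcessSession;
    import org.apache.nifi.processor.Relationship;
    import org.apache.nifi.processor.exception.ProcessException;

    public class ContentVsAttributesExample extends AbstractProcessor {

        static final Relationship REL_SUCCESS = new Relationship.Builder()
                .name("success").description("Counted FlowFiles").build();

        @Override
        public Set<Relationship> getRelationships() {
            return Collections.singleton(REL_SUCCESS);
        }

        @Override
        public void onTrigger(ProcessContext context, ProcessSession session) throws ProcessException {
            FlowFile flowFile = session.get();
            if (flowFile == null) {
                return;
            }

            // Content: streamed from the content repository; it is never pulled
            // fully into heap unless the callback chooses to buffer it.
            final long[] byteCount = {0};
            session.read(flowFile, (InputStream in) -> {
                byte[] buffer = new byte[8192];
                int read;
                while ((read = in.read(buffer)) != -1) {
                    byteCount[0] += read;
                }
            });

            // Attributes: a plain in-memory map on the FlowFile object, so keep
            // both the number of attributes and the size of each value modest.
            flowFile = session.putAttribute(flowFile, "content.byte.count",
                    String.valueOf(byteCount[0]));

            session.transfer(flowFile, REL_SUCCESS);
        }
    }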

Thanks
Joe

On Wed, Feb 17, 2016 at 11:06 AM, Lars Francke  wrote:
> Hi and sorry for all these questions.
>
> I know that FlowFile content is persisted to the content_repository and can
> handle reasonably large amounts of data. Is the same true for attributes?
>
> I download JSON files (up to 200kb I'd say) and I want to insert them as
> they are into a PostgreSQL JSONB column. I'd love to use the PutSQL
> processor for that but it requires parameters in attributes.
>
> I have a feeling that putting large objects in attributes is a bad idea?




  

Re: splitText output appears to be getting dropped

2016-02-19 Thread Matthew Clarke
Conrad,
     The MergeContent processor will bin files based upon how you have
configured it.  Since it is taking multiple files and creating one
output file from them, that output file cannot have multiple filenames.
MergeContent will use the filename of the first file in the bin as the
filename of the output file.  As far as the rest of the attributes from
the numerous source files go, the 'Attribute Strategy' property in
MergeContent determines how they are applied to the new output file.
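
For reference (from memory, so double-check against your version's docs), the 
two Attribute Strategy values are roughly:

    Keep Only Common Attributes  - only attributes whose values are identical across
                                   all files in the bin end up on the merged file
    Keep All Unique Attributes   - attributes from every file in the bin are kept,
                                   where their values don't conflict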

Matt

On Fri, Feb 19, 2016 at 11:25 AM, Conrad Crampton <
conrad.cramp...@secdata.com> wrote:

> Hi,
> Perfect!
> I tried \n for linefeed – didn’t think of shift+enter!
>
> The reason I was updating the filename early on in my flow was just
> because I already had an UpdateAttribute there and it was a handy place to do
> so. I can put it just before the PutFile though, so no major issue; I just
> wondered why this was happening and whether it was by design (a feature) or a
> bug.
>
> Thanks
> Conrad
>
> From: Bryan Bende 
> Reply-To: "users@nifi.apache.org" 
> Date: Friday, 19 February 2016 at 16:16
> To: "users@nifi.apache.org" 
> Subject: Re: splitText output appears to be getting dropped
>
> Hello,
>
> MergeContent has properties for header, demarcator, and footer, and also
> has a strategy property which specifies whether these values come from a
> file or inline text.
>
> If you do inline text and specify a demarcator of a new line (shift +
> enter in the demarcator value) then binary concatenation will get you all
> of the lines merged together with new lines between them.
>
> As far as the file naming, can you just wait until after RouteOnContent to
> rename them? They just need to be renamed before the PutFile, but it doesn't
> necessarily have to be before RouteOnContent.
>
> Let us know if that helps.
>
> Thanks,
>
> Bryan
>
>
> On Fri, Feb 19, 2016 at 11:01 AM, Conrad Crampton <
> conrad.cramp...@secdata.com> wrote:
>
>> Hi,
>> Sorry to piggy-back on this thread, but I have pretty much the same issue
>> – I am splitting log files and routing them with RouteOnContent (various
>> paths); two of these paths (including unmatched) basically just need to get
>> farmed off into a directory in case they are needed later.
>> These go into a MergeContent processor where I would like to merge them into
>> one file – each FlowFile's content as a line in the file, delimited by a
>> line feed (like the original file) – but whichever way I try this it doesn't
>> quite do what I want. If I try BinaryConcatenation the file ends up as one
>> long line; if TAR, each FlowFile is a separate file in a TAR
>> (unsurprisingly). There doesn't seem to be any way of merging flow file
>> content into one file (that ideally has similar functions to compress,
>> specify number of files, etc.).
>>
>> Another related question to the answer below (which really helped me out
>> with the same issue): if I rename the filename early on in my flow, it
>> appears to be changed back to its original value at MergeContent time, so I
>> have to put another UpdateAttribute step in after the merge to rename the
>> file.
>> The flow is
>>
>> UpdateAttributes -> RouteOnContent -> UpdateAttribute -> MergeContent -> PutFile
>> (filename changed)   (filename same)   (filename same)   (filename reverted)
>>
>> If I put an extra UpdateAttribute before PutFile then it's fine. Logging at
>> each of the above points shows the filename updated to ${uuid}-${filename},
>> but at the 'reverted' point it is back to the original filename.
>>
>> Any suggestions, particularly on the first question?
>>
>> Thanks
>> Conrad
>>
>>
>>
>> From: Jeff Lord 
>> Reply-To: "users@nifi.apache.org" 
>> Date: Friday, 19 February 2016 at 03:22
>> To: "users@nifi.apache.org" 
>> Subject: Re: splitText output appears to be getting dropped
>>
>> Matt,
>>
>> Thanks a bunch!
>> That did the trick.
>> Out of curiosity, is there a better way to handle this than writing out a
>> single line into multiple files?
>> Each file contains a single string that will be used to build a url.
>>
>> -Jeff
>>
>> On Thu, Feb 18, 2016 at 6:00 PM, Matthew Clarke <
>> matt.clarke@gmail.com> wrote:
>>
>>> Jeff,
>>>   It appears your files are being dropped because you are
>>> auto-terminating the failure relationship on your PutFile processor. When
>>> the SplitText processor splits the file by lines, every new file has the
>>> same filename as the original it came from. My guess is the first file is
>>> being written to disk and all the others are failing because a file of the
>>> same name already exists in the target dir. Try adding an UpdateAttribute
>>> processor after the SplitText to rename all the files. The easiest way is
>>> to append the file's uuid to its filename.  I also do not recommend
>>> auto-terminating failure relationships except in rare cases.
>>>
>>> Matt
>>> On Feb 18, 2016 8:36 PM, "Jeff Lord" 

Problem with PutS3Object processor

2016-02-19 Thread Joseph E. Gottman
I am trying to use a PutS3Object processor with the "Endpoint Override URL" 
pointing to a custom endpoint.  I keep failing with the error message "You must 
specify a date for this operation".  I am using NiFi version 0.4.1 with Java 8 
and CentOS 6.7.  I suspect this might have something to do with bug #1025, but 
according to your notes it was fixed in version 0.4.0.




Joe Gottman
Senior Member of Technical Staff
jgott...@proteuseng.com
133 National Business Pkwy, Ste 150
Annapolis Junction, MD 20701
(Office) 301.377.7144
www.proteus-technologies.com
TheBlend.ProteusEng.com (Digital magazine)


Re: splitText output appears to be getting dropped

2016-02-19 Thread Conrad Crampton
Hi,
Perfect!
I tried \n for linefeed – didn’t think of shift+enter!

The reason I was updating the filename early on in my flow was just because I 
already had an UpdateAttribute there and it was a handy place to do so. I can 
put it just before the PutFile though, so no major issue; I just wondered why 
this was happening and whether it was by design (a feature) or a bug.

Thanks
Conrad

From: Bryan Bende
Reply-To: "users@nifi.apache.org"
Date: Friday, 19 February 2016 at 16:16
To: "users@nifi.apache.org"
Subject: Re: splitText output appears to be getting dropped

Hello,

MergeContent has properties for header, demarcator, and footer, and also has a 
strategy property which specifies whether these values come from a file or 
inline text.

If you do inline text and specify a demarcator of a new line (shift + enter in 
the demarcator value) then binary concatenation will get you all of the lines 
merged together with new lines between them.

As far as the file naming, can you just wait until after RouteOnContent to 
rename them? They just need to be renamed before the PutFile, but it doesn't 
necessarily have to be before RouteOnContent.

Let us know if that helps.

Thanks,

Bryan


On Fri, Feb 19, 2016 at 11:01 AM, Conrad Crampton wrote:
Hi,
Sorry to piggy-back on this thread, but I have pretty much the same issue – I 
am splitting log files and routing them with RouteOnContent (various paths); 
two of these paths (including unmatched) basically just need to get farmed off 
into a directory in case they are needed later.
These go into a MergeContent processor where I would like to merge them into 
one file – each FlowFile's content as a line in the file, delimited by a line 
feed (like the original file) – but whichever way I try this it doesn't quite 
do what I want. If I try BinaryConcatenation the file ends up as one long line; 
if TAR, each FlowFile is a separate file in a TAR (unsurprisingly). There 
doesn't seem to be any way of merging flow file content into one file (that 
ideally has similar functions to compress, specify number of files, etc.).

Another related question to the answer below (which really helped me out with 
the same issue): if I rename the filename early on in my flow, it appears to be 
changed back to its original value at MergeContent time, so I have to put 
another UpdateAttribute step in after the merge to rename the file.
The flow is

UpdateAttributes -> RouteOnContent -> UpdateAttribute -> MergeContent -> PutFile
(filename changed)   (filename same)   (filename same)   (filename reverted)

If I put an extra UpdateAttribute before PutFile then it's fine. Logging at 
each of the above points shows the filename updated to ${uuid}-${filename}, but 
at the 'reverted' point it is back to the original filename.

Any suggestions, particularly on the first question?

Thanks
Conrad



From: Jeff Lord
Reply-To: "users@nifi.apache.org"
Date: Friday, 19 February 2016 at 03:22
To: "users@nifi.apache.org"
Subject: Re: splitText output appears to be getting dropped

Matt,

Thanks a bunch!
That did the trick.
Out of curiosity, is there a better way to handle this than writing out a 
single line into multiple files?
Each file contains a single string that will be used to build a url.

-Jeff

On Thu, Feb 18, 2016 at 6:00 PM, Matthew Clarke wrote:

Jeff,
  It appears your files are being dropped because you are auto-terminating 
the failure relationship on your PutFile processor. When the SplitText 
processor splits the file by lines, every new file has the same filename as the 
original it came from. My guess is the first file is being written to disk and 
all the others are failing because a file of the same name already exists in 
the target dir. Try adding an UpdateAttribute processor after the SplitText to 
rename all the files. The easiest way is to append the file's uuid to its 
filename.  I also do not recommend auto-terminating failure relationships 
except in rare cases.
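
As a concrete example (the property name and value below are just one way to do 
it - UpdateAttribute lets you pick anything), a single dynamic property on an 
UpdateAttribute processor right after SplitText would look like:

    Property : filename
    Value    : ${filename}-${uuid}

NiFi Expression Language fills in the original filename and the FlowFile's uuid 
attribute, so every split gets a unique name before it reaches PutFile.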

Matt

On Feb 18, 2016 8:36 PM, "Jeff Lord" wrote:
I have a pretty simple flow where I query for a list of ids using 
executeProcess and than pass that list along to splitText where I am trying to 
split on each line to than dynamically build a url further down the line using 
updateAttribute and so on.

executeProcess -> splitText -> putFile

For some reason I am only getting one file written with one line.
I would expect something more like 100 files each with one line.
Using 

Re: splitText output appears to be getting dropped

2016-02-19 Thread Bryan Bende
Hello,

MergeContent has properties for header, demarcator, and footer, and also
has a strategy property which specifies whether these values come from a
file or inline text.

If you do inline text and specify a demarcator of a new line (shift + enter
in the demarcator value) then binary concatenation will get you all of the
lines merged together with new lines between them.
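
A typical MergeContent setup for this (exact property names can vary slightly 
by version, so treat this as a sketch and check the processor's usage docs) 
would be something like:

    Merge Strategy     : Bin-Packing Algorithm
    Merge Format       : Binary Concatenation
    Delimiter Strategy : Text
    Demarcator         : (press shift+enter in the value field for a literal newline)

The minimum/maximum number of entries and the max bin age then control how many 
lines end up in each merged file.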

As far as the file naming, can you just wait until after RouteOnContent to
rename them? They just need to be renamed before the PutFile, but it doesn't
necessarily have to be before RouteOnContent.

Let us know if that helps.

Thanks,

Bryan


On Fri, Feb 19, 2016 at 11:01 AM, Conrad Crampton <
conrad.cramp...@secdata.com> wrote:

> Hi,
> Sorry to piggy-back on this thread, but I have pretty much the same issue
> – I am splitting log files and routing them with RouteOnContent (various
> paths); two of these paths (including unmatched) basically just need to get
> farmed off into a directory in case they are needed later.
> These go into a MergeContent processor where I would like to merge them into
> one file – each FlowFile's content as a line in the file, delimited by a line
> feed (like the original file) – but whichever way I try this it doesn't quite
> do what I want. If I try BinaryConcatenation the file ends up as one long
> line; if TAR, each FlowFile is a separate file in a TAR (unsurprisingly).
> There doesn't seem to be any way of merging flow file content into one file
> (that ideally has similar functions to compress, specify number of files,
> etc.).
>
> Another related question to the answer below (which really helped me out
> with the same issue): if I rename the filename early on in my flow, it
> appears to be changed back to its original value at MergeContent time, so I
> have to put another UpdateAttribute step in after the merge to rename the
> file.
> The flow is
>
> UpdateAttributes -> RouteOnContent -> UpdateAttribute -> MergeContent -> PutFile
> (filename changed)   (filename same)   (filename same)   (filename reverted)
>
> If I put an extra UpdateAttribute before PutFile then it's fine. Logging at
> each of the above points shows the filename updated to ${uuid}-${filename},
> but at the 'reverted' point it is back to the original filename.
>
> Any suggestions, particularly on the first question?
>
> Thanks
> Conrad
>
>
>
> From: Jeff Lord 
> Reply-To: "users@nifi.apache.org" 
> Date: Friday, 19 February 2016 at 03:22
> To: "users@nifi.apache.org" 
> Subject: Re: splitText output appears to be getting dropped
>
> Matt,
>
> Thanks a bunch!
> That did the trick.
> Out of curiosity, is there a better way to handle this than writing out a
> single line into multiple files?
> Each file contains a single string that will be used to build a url.
>
> -Jeff
>
> On Thu, Feb 18, 2016 at 6:00 PM, Matthew Clarke wrote:
>
>> Jeff,
>>   It appears your files are being dropped because you are
>> auto-terminating the failure relationship on your PutFile processor. When
>> the SplitText processor splits the file by lines, every new file has the
>> same filename as the original it came from. My guess is the first file is
>> being written to disk and all the others are failing because a file of the
>> same name already exists in the target dir. Try adding an UpdateAttribute
>> processor after the SplitText to rename all the files. The easiest way is to
>> append the file's uuid to its filename.  I also do not recommend
>> auto-terminating failure relationships except in rare cases.
>>
>> Matt
>> On Feb 18, 2016 8:36 PM, "Jeff Lord"  wrote:
>>
>>> I have a pretty simple flow where I query for a list of IDs using
>>> ExecuteProcess and then pass that list along to SplitText, where I am trying
>>> to split on each line to then dynamically build a URL further down the line
>>> using UpdateAttribute and so on.
>>>
>>> executeProcess -> splitText -> putFile
>>>
>>> For some reason I am only getting one file written with one line.
>>> I would expect something more like 100 files each with one line.
>>> Using the provenance reporter it appears that some of my items are being
>>> dropped.
>>>
>>> Time: 02/18/2016 17:13:46.145 PST
>>> Event Duration: No value set
>>> Lineage Duration: 00:00:12.187
>>> Type: DROP
>>> FlowFile Uuid: 7fa42367-490d-4b54-a32f-d062a885474a
>>> File Size: 14 bytes
>>> Component Id: 3b37a828-ba2c-4047-ba7a-578fd0684ce6
>>> Component Name: PutFile
>>> Component Type: PutFile
>>> Details: Auto-Terminated by failure Relationship
>>>
>>> Any ideas on what I need to change here?
>>>
>>> Thanks in advance,
>>>
>>> Jeff
>>>
>>
>
>

GetMail processor

2016-02-19 Thread philippe.gibert
Hi,
I would like to know if a GetEmail processor is available somewhere or planned 
☺ . I have seen PutEmail but not its counterpart in the help.

The goal is to automatically and regularly process incoming mail, transform the 
content, and index the transformed content with Solr.
Best regards,
phil

From: Bryan Bende [mailto:bbe...@gmail.com]
Sent: Thursday, 4 February 2016 15:08
To: users@nifi.apache.org
Subject: Re: Send parameters while using getmongo processor

Hi Sudeep,

From looking at the GetMongo processor, I don't think this can be done today. 
That processor is meant to be a source processor that extracts data from Mongo 
using a fixed query.
It seems to me like we would need a FetchMongo processor with a Query field 
that supported expression language, and you could set it to ContractNumber = 
${flow.file.contract.number}.
Then incoming FlowFiles would have "flow.file.contract.number" as an attribute 
and it would fetch documents matching that.
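
In Mongo terms, the query that such a (hypothetical) FetchMongo processor would 
issue after expression-language substitution would look roughly like this, 
using the example attribute name above and the sample document below:

    { "ContractNumber": "ABC87gdtr53" }

i.e. the attribute value from the incoming FlowFile simply becomes the value of 
the ContractNumber field in the query document.
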

I don't know that much about MongoDB, but does that sound like what you need?

-Bryan


On Thu, Feb 4, 2016 at 8:00 AM, sudeep mishra wrote:
Hi,

I have following schema of records in MongoDB.

{
"_id" : ObjectId("56b1958a1ebecc0724588c39"),
"ContractNumber" : "ABC87gdtr53",
"DocumentType" : "TestDoc",
"FlowNr" : 3,
"TimeStamp" : "03/02/2016 05:51:09:023"
}

How can I query for a particular contract by 'ContractNumber' using the 
GetMongo processor? I want to dynamically pass the value of ContractNumber and 
get back the results.


Thanks & Regards,

Sudeep

