Re: ExtractTNEFAttachments: (was Re: ListenSMTP processor)

2016-08-09 Thread Andre
Bryan,

Thanks for the message.

I've seen that link before. The challenge is that Exchange admins can force
the conversion of email into standard MIME, and I suspect my email server
does that.

In any case, it's all sorted: I used a spare Outlook license I had at home
together with mailtrap.io and generated a number of TNEF files. I'll be happy
to replace the samples now.

Having said that... general peer reviews are still welcome. ;-)

Cheers



Re: ExtractTNEFAttachments: (was Re: ListenSMTP processor)

2016-08-09 Thread Joe Witt
Definitely the best course of action is to use our own, originally created
test data. This can at times be very difficult, but perhaps what Bryan
just pointed out helps.

Alternatively, we can of course include test artifacts in our source
repository, but we must account for them in LICENSE and NOTICE, and
they must of course be valid ASLv2 source dependencies, meaning things
that come from the Category A list seen here:
http://www.apache.org/legal/resolved.html#category-a

In short, this is totally doable; we just must be really good stewards
of the L&N (license and notice) process.



Re: ExtractTNEFAttachments: (was Re: ListenSMTP processor)

2016-08-09 Thread Bryan Rosander
Hi Andre,

I found a Super User answer that seems like it might be helpful in forcing
Outlook to use TNEF.

http://superuser.com/questions/613014/how-do-i-force-outlook-to-send-an-email-message-to-have-a-winmail-dat-attachment#answer-638244

Hope that helps.

Thanks,
Bryan



Re: ExtractTNEFAttachments: (was Re: ListenSMTP processor)

2016-08-09 Thread Andre
Hi Joe,

I am aware of it; that is why I called it out openly, in case someone can
assist.

In the past, I have created test content within the JUnit tests or crafted
it in my lab, but in the case of TNEF I could not find a way of creating
the files (POI does not have this ability, and Outlook seems determined to
create proper HTML attachments).

As a consequence, I resorted to the files stored here:

https://svn.apache.org/repos/asf/poi/tags/REL_3_14_BETA1/test-data/hmef/

I also checked their NOTICE and LICENSE files but found no references, nor
are the samples mentioned within their Maven profiles. Not ideal, but the
best I could find so far. If worst comes to worst, we restrict the unit
tests to invalid content; at least we will know it fails safely. :-)

Hope this helps to clarify.



Re: ExtractTNEFAttachments: (was Re: ListenSMTP processor)

2016-08-09 Thread Joe Witt
Andre

We cannot copy source material even for testing unless we fully and
properly account for licensing and notice concerns.

Thanks
Joe



ExtractTNEFAttachments: (was Re: ListenSMTP processor)

2016-08-09 Thread Andre
All,

PR817 [1] introduces a winmail.dat extractor.

Following people's feedback, I created a separate processor to handle the
TNEF attachments.

This means the typical deployment will look like:

(ListenSMTP || GetPOP3) --> ExtractEmailAttachments --> RouteOnAttribute
[filename=winmail.dat] --> ExtractTNEFAttachments
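As a rough illustration, the RouteOnAttribute step above amounts to a predicate over the FlowFile's attributes. The stand-alone Java sketch below is hypothetical (`TnefRoute` and `looksLikeTnef` are not NiFi APIs); it assumes the attachment name arrives in the `filename` attribute, as in the flow above, and additionally assumes a `mime.type` attribute that may carry `application/ms-tnef`.

```java
import java.util.Locale;
import java.util.Map;

public class TnefRoute {

    // Hypothetical routing predicate: should this extracted attachment go to
    // ExtractTNEFAttachments? Only FlowFile-style attributes are inspected.
    static boolean looksLikeTnef(Map<String, String> attributes) {
        // Outlook/Exchange usually names the TNEF part winmail.dat; compare
        // case-insensitively in case a client changes the case (assumption).
        String filename = attributes.getOrDefault("filename", "");
        if (filename.equalsIgnoreCase("winmail.dat")) {
            return true;
        }
        // Assumption: a MIME type, if known, is stored in "mime.type" and may
        // carry parameters such as "; name=winmail.dat", which are dropped here.
        String mimeType = attributes.getOrDefault("mime.type", "")
                .split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        return mimeType.equals("application/ms-tnef")
                || mimeType.equals("application/vnd.ms-tnef");
    }

    public static void main(String[] args) {
        System.out.println(looksLikeTnef(Map.of("filename", "WINMAIL.DAT")));    // true
        System.out.println(looksLikeTnef(Map.of("filename", "att00001.dat",
                "mime.type", "application/ms-tnef; name=att00001.dat")));        // true
        System.out.println(looksLikeTnef(Map.of("filename", "report.pdf",
                "mime.type", "application/pdf")));                            // false
    }
}
```

In an actual flow the filename check would more likely live in a RouteOnAttribute property written in NiFi Expression Language (for example something like `${filename:equalsIgnoreCase('winmail.dat')}`); the Java form only makes the decision explicit.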


Since I could not generate a TNEF file (where are the winmail.dats when you
need them?!?!), I ended up using the TNEF files available in POI's upstream
unit tests. winmail.dat donations to improve unit test coverage are
welcome...

Please test; once you confirm this is working, I will be happy to create a
processor to extract and parse the TNEF body and MAPI attributes as well.
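Parsing the TNEF body and attributes essentially means walking a sequence of attribute records. The Java sketch below follows my reading of Microsoft's published TNEF stream layout (MS-OXTNEF): a 4-byte little-endian signature 0x223E9F78, a 2-byte legacy key, then records made of a 1-byte level (0x01 message, 0x02 attachment), a 4-byte attribute ID, a 4-byte length, the data, and a 2-byte checksum equal to the sum of the data bytes modulo 65536. It only lists records and verifies checksums; it does not decode MAPI properties, and the class and method names are hypothetical. Treat it as a sketch to check against the specification (or POI's HMEF code), not as a reference parser.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class TnefWalker {
    static final int TNEF_SIGNATURE = 0x223E9F78;

    // One raw attribute record; MAPI property decoding is out of scope here.
    static final class TnefAttribute {
        final int level;   // 0x01 = message level, 0x02 = attachment level
        final int id;      // attribute ID as read from the stream
        final byte[] data;

        TnefAttribute(int level, int id, byte[] data) {
            this.level = level;
            this.id = id;
            this.data = data;
        }
    }

    static List<TnefAttribute> walk(byte[] tnef) {
        ByteBuffer buf = ByteBuffer.wrap(tnef).order(ByteOrder.LITTLE_ENDIAN);
        if (buf.remaining() < 6 || buf.getInt() != TNEF_SIGNATURE) {
            throw new IllegalArgumentException("not a TNEF stream");
        }
        buf.getShort(); // legacy key; not needed just to walk the records
        List<TnefAttribute> attributes = new ArrayList<>();
        while (buf.hasRemaining()) {
            if (buf.remaining() < 9) {
                throw new IllegalArgumentException("truncated attribute header");
            }
            int level = buf.get() & 0xFF;
            int id = buf.getInt();
            int length = buf.getInt();
            if (length < 0 || buf.remaining() < length + 2L) {
                throw new IllegalArgumentException("truncated attribute data");
            }
            byte[] data = new byte[length];
            buf.get(data);
            int expected = buf.getShort() & 0xFFFF;
            int actual = 0;
            for (byte b : data) {
                actual = (actual + (b & 0xFF)) & 0xFFFF; // sum of bytes mod 65536
            }
            if (actual != expected) {
                throw new IllegalArgumentException(
                        "bad checksum for attribute 0x" + Integer.toHexString(id));
            }
            attributes.add(new TnefAttribute(level, id, data));
        }
        return attributes;
    }

    // Builds a tiny synthetic stream holding one attachment-level record;
    // the attribute ID used here is purely illustrative.
    static byte[] sample(byte[] data) {
        int sum = 0;
        for (byte b : data) {
            sum += b & 0xFF;
        }
        ByteBuffer out = ByteBuffer.allocate(4 + 2 + 1 + 4 + 4 + data.length + 2)
                .order(ByteOrder.LITTLE_ENDIAN);
        out.putInt(TNEF_SIGNATURE).putShort((short) 0x0001)
                .put((byte) 0x02).putInt(0x00018010).putInt(data.length)
                .put(data).putShort((short) sum); // low 16 bits = sum mod 65536
        return out.array();
    }

    public static void main(String[] args) {
        byte[] stream = sample("hello".getBytes(StandardCharsets.US_ASCII));
        List<TnefAttribute> attrs = walk(stream);
        System.out.println(attrs.size() + " record(s), level " + attrs.get(0).level); // 1 record(s), level 2
    }
}
```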

Cheers

[1] https://github.com/apache/nifi/pull/817/commits



Re: ListenSMTP processor

2016-07-24 Thread Toivo Adams
I support Oleg's opinion.
Do one thing and do it well.

Thanks
Toivo



--
View this message in context: 
http://apache-nifi-developer-list.39713.n7.nabble.com/ListenSMTP-processor-tp10510p12891.html
Sent from the Apache NiFi Developer List mailing list archive at Nabble.com.


Re: ListenSMTP processor

2016-07-24 Thread Andre
I have raised NIFI-2380 to track this improvement.

While raising the ticket I was wondering:

Are you happy to give the user the option to choose whether to extract the
winmail.dat or not?

I mean something like:

- PROPERTY: "Extract Attachments within a TNEF (i.e. winmail.dat)": true /
false

If yes, then every time a decoding occurs we test the name (or something
better, if possible) and then extract it. An attachment extracted from a
TNEF file would have an attribute email.attachment.tnefdecoded (or
whatever name we decide) set to yes.
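One candidate for the "something better" than testing the name is to look at the content itself: TNEF streams begin with the little-endian signature 0x223E9F78 (bytes 78 9F 3E 22). The hypothetical Java sketch below combines that check with the proposed true/false property and tags recovered attachments with the tentative email.attachment.tnefdecoded attribute. The helper names are illustrative only, and copying the parent's attributes onto each child is an assumption, not a decided design.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class TnefOption {
    // TNEF streams start with this signature, stored little-endian (78 9F 3E 22).
    static final int TNEF_SIGNATURE = 0x223E9F78;

    // Content-based check that does not rely on the name "winmail.dat".
    static boolean hasTnefSignature(byte[] content) {
        return content.length >= 4
                && ByteBuffer.wrap(content, 0, 4).order(ByteOrder.LITTLE_ENDIAN).getInt()
                        == TNEF_SIGNATURE;
    }

    // Hypothetical per-attachment decision behind the proposed true/false property.
    static boolean shouldDecode(boolean extractTnefProperty, byte[] content) {
        return extractTnefProperty && hasTnefSignature(content);
    }

    // Attributes for an attachment recovered from inside a TNEF file. Copying the
    // parent's attributes is an assumption; the flag name is the tentative one.
    static Map<String, String> decodedChildAttributes(Map<String, String> parent, String childFilename) {
        Map<String, String> attrs = new HashMap<>(parent);
        attrs.put("filename", childFilename);
        attrs.put("email.attachment.tnefdecoded", "yes");
        return attrs;
    }

    public static void main(String[] args) {
        byte[] tnefStart = {0x78, (byte) 0x9F, 0x3E, 0x22, 0x01, 0x00};
        byte[] pdfStart = "%PDF-1.4".getBytes(StandardCharsets.US_ASCII);
        System.out.println(shouldDecode(true, tnefStart));  // true
        System.out.println(shouldDecode(false, tnefStart)); // false (property off)
        System.out.println(shouldDecode(true, pdfStart));   // false
    }
}
```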

If no, processing continues as it is today (i.e. purely based on Apache
Commons MimeMessageParser).


Another possible solution would be an additional processor, but IMNSHO this
would be overkill and counterproductive.

Keen to hear your thoughts.



Re: ListenSMTP processor

2016-07-17 Thread Andre
Dan,

Ingesting Microsoft Journals seems like a great suggestion for a new
processor (ParseExchangeJournal?).

Regarding TNEF: as far as I know, Apache Commons Email does not parse
"winmail.dat"-type attachments. As far as I understand, the only
ASL-compatible implementation of a TNEF extractor is Apache POI's, and
even that implementation is not part of POI's main release.

If TNEF support is required, we will either have to code it from scratch
or perhaps use https://github.com/koodaamo/tnefparse together with
ExecuteScript (although since tnefparse is LGPL, this solution cannot be
packaged as part of NiFi).

Cheers



Re: ListenSMTP processor

2016-07-16 Thread djmdata
What is the JIRA #?

I have a production system that reads email from a custom SMTP listener and
places the SMTP payload into Kafka. A Storm topology reads messages from
Kafka and parses the emails (Java code using JavaMail API) into useful info
(subject, text, attachments, body, etc...).

I'm looking at plugging NiFi into this to replace the custom SMTP listener.
If you had a processor that could act as a reliable (we can't lose emails)
and performant SMTP listener alternative we would use it.

Your "email parser processor" is an interesting idea - but beware of the
mess you'll find in the wild with email. In our case, we try to parse
Exchange email (full of non-standard wonders like "TNEF" attachments) as well
as email from virtually anywhere (Gmail, Yahoo, Joe's email client...). If
you can crack that, you'll be on to something. We have even more complexity
in that we read "Microsoft Journals", which wrap the standard SMTP layout in
a Microsoft layer (you'll see this at large Exchange shops doing this kind of
thing for use cases like compliance).



--
View this message in context: 
http://apache-nifi-developer-list.39713.n7.nabble.com/ListenSMTP-processor-tp10510p12827.html
Sent from the Apache NiFi Developer List mailing list archive at Nabble.com.


Re: ListenSMTP processor

2016-05-19 Thread Simon Elliston Ball
Yes, exactly. The challenge would be what to do with the MIME boundary header
(include it with the content, or dump it and rebuild it with a merge).
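To make the "dump and rebuild" option concrete: when parts are merged back, the original boundary can simply be discarded and a fresh one generated. In this Java sketch (hypothetical names; the `----=_Part_` prefix is an arbitrary choice) each part is emitted as `--boundary`, CRLF, the part, CRLF, and the body ends with the close delimiter `--boundary--`, following the multipart layout of RFC 2046. Parts are handled as strings for readability; a real processor would stream bytes.

```java
import java.util.List;
import java.util.UUID;

public class MultipartRebuild {

    // Output of a rebuild: the new Content-Type header value and the body.
    static final class Rebuilt {
        final String contentType;
        final String body;

        Rebuilt(String contentType, String body) {
            this.contentType = contentType;
            this.body = body;
        }
    }

    // Merges raw parts (each with its own headers) into one multipart body,
    // ignoring whatever boundary they were originally split on.
    static Rebuilt rebuild(String subtype, List<String> parts) {
        if (parts.isEmpty()) {
            throw new IllegalArgumentException("a multipart body needs at least one part");
        }
        // A random boundary is very unlikely to collide with part content,
        // but it is cheap to check anyway.
        String boundary = "----=_Part_" + UUID.randomUUID();
        StringBuilder body = new StringBuilder();
        for (String part : parts) {
            if (part.contains("--" + boundary)) {
                throw new IllegalStateException("boundary collision; retry with a new boundary");
            }
            body.append("--").append(boundary).append("\r\n")
                .append(part).append("\r\n");
        }
        body.append("--").append(boundary).append("--\r\n");
        // The boundary contains '=', so it is quoted in the header parameter.
        String contentType = "multipart/" + subtype + "; boundary=\"" + boundary + "\"";
        return new Rebuilt(contentType, body.toString());
    }

    public static void main(String[] args) {
        Rebuilt r = rebuild("mixed", List.of(
                "Content-Type: text/plain\r\n\r\nHello",
                "Content-Type: application/ms-tnef\r\n\r\n(binary content)"));
        System.out.println(r.contentType);
        System.out.print(r.body);
    }
}
```

Including the original boundary with each split part would avoid this step, at the cost of every part carrying framing that only makes sense alongside its siblings.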

Simon




Re: ListenSMTP processor

2016-05-19 Thread Andre
Simon,

Are you suggesting attributes similar to UnpackContent?

If yes, seems like a great approach.

Cheers


Re: ListenSMTP processor

2016-05-19 Thread Andre
Joe,

That's exactly the idea.

I envision to, from, cc, connecting host (src_ip of the last hop), subject,
time and possibly an option to iterate over the headers, adding
discretionary key-value pairs for things like SpamAssassin scores, etc.
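As a rough sketch of turning headers into FlowFile attributes, including discretionary ones such as SpamAssassin's X-Spam-Status header, the Java code below unfolds an RFC 5322 header block and copies a configurable set of headers into a map. The `email.headers.` prefix, the class name and the way repeated headers are joined are all assumptions for illustration, not an existing NiFi convention. The connecting host would come from the SMTP session rather than from the headers, so it is not covered here.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class HeaderAttributes {
    // Hypothetical attribute prefix chosen for this sketch.
    static final String PREFIX = "email.headers.";

    // Copies the wanted headers (names given in lower case) into an attribute map.
    static Map<String, String> toAttributes(String rawMessage, Set<String> wanted) {
        // The header block ends at the first empty line; the body is ignored.
        int end = rawMessage.indexOf("\r\n\r\n");
        String headerBlock = end >= 0 ? rawMessage.substring(0, end) : rawMessage;
        // Unfold: per RFC 5322, a CRLF immediately followed by whitespace is removed.
        String unfolded = headerBlock.replaceAll("\r\n(?=[ \t])", "");
        Map<String, String> attrs = new LinkedHashMap<>();
        for (String line : unfolded.split("\r\n")) {
            int colon = line.indexOf(':');
            if (colon <= 0) {
                continue; // not a well-formed header line; skipped in this sketch
            }
            String name = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            if (wanted.contains(name)) {
                // Repeated headers (e.g. several Received lines) are joined with "; ".
                attrs.merge(PREFIX + name, value, (a, b) -> a + "; " + b);
            }
        }
        return attrs;
    }

    public static void main(String[] args) {
        String raw = "From: alice@example.com\r\n"
                + "To: bob@example.com\r\n"
                + "Subject: Quarterly\r\n report\r\n"
                + "X-Spam-Status: No, score=0.1\r\n"
                + "\r\n"
                + "Body text\r\n";
        Map<String, String> attrs = toAttributes(raw, Set.of("from", "subject", "x-spam-status"));
        System.out.println(attrs);
        // {email.headers.from=alice@example.com, email.headers.subject=Quarterly report, email.headers.x-spam-status=No, score=0.1}
    }
}
```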

I plan to keep things simple, so I don't intend to add things like SPF,
DKIM, etc., but I'm keen to consider them.

Happy to call it ExtractMailAttachment. I considered this type of more
explicit name previously but settled on Parse just because syslog adopted
Parse as well (although ListenSyslog is also capable of parsing).

Will raise a JIRA to track.

Cheers


Re: ListenSMTP processor

2016-05-18 Thread Simon Elliston Ball
Fantastic idea!

Would SplitEmail not make sense to divide by the MIME boundary? If you add
fragment indices in the way other Split processors do, it would be easy to
recombine an email after processing splits. To be honest, I'm not sure what the
use case for doing so would be, but it feels consistent with the Split,
Process, Merge pattern you see elsewhere in NiFi.
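The fragment attributes suggested here could look roughly like the following. This Java sketch uses the `fragment.identifier`, `fragment.index` and `fragment.count` attribute names that NiFi's Split processors write and MergeContent's Defragment strategy reads; whether indices start at 0 or 1 is an assumption of the sketch (check the processors you pair it with), and the helper names are hypothetical. The merge side simply checks that a set is complete and orders it by index.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

public class FragmentAttributes {

    // Split side: one attribute map per MIME part, all sharing one identifier.
    // Indices start at 0 in this sketch (an assumption).
    static List<Map<String, String>> tagFragments(int count) {
        String id = UUID.randomUUID().toString();
        List<Map<String, String>> result = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            Map<String, String> attrs = new HashMap<>();
            attrs.put("fragment.identifier", id);
            attrs.put("fragment.index", String.valueOf(i));
            attrs.put("fragment.count", String.valueOf(count));
            result.add(attrs);
        }
        return result;
    }

    // Merge side: verify the set is complete and return it in index order.
    static List<Map<String, String>> orderForMerge(Collection<Map<String, String>> fragments) {
        if (fragments.isEmpty()) {
            throw new IllegalArgumentException("no fragments");
        }
        List<Map<String, String>> sorted = new ArrayList<>(fragments);
        sorted.sort(Comparator.comparingInt(
                (Map<String, String> f) -> Integer.parseInt(f.get("fragment.index"))));
        String id = sorted.get(0).get("fragment.identifier");
        int count = Integer.parseInt(sorted.get(0).get("fragment.count"));
        if (sorted.size() != count) {
            throw new IllegalStateException("have " + sorted.size() + " of " + count + " fragments");
        }
        for (int i = 0; i < count; i++) {
            Map<String, String> f = sorted.get(i);
            if (!id.equals(f.get("fragment.identifier"))
                    || Integer.parseInt(f.get("fragment.index")) != i) {
                throw new IllegalStateException("fragment set is inconsistent at index " + i);
            }
        }
        return sorted;
    }

    public static void main(String[] args) {
        List<Map<String, String>> shuffled = new ArrayList<>(tagFragments(3));
        Collections.reverse(shuffled); // simulate out-of-order arrival
        for (Map<String, String> f : orderForMerge(shuffled)) {
            System.out.println(f.get("fragment.index")); // prints 0, 1, 2 on separate lines
        }
    }
}
```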

Simon 



Re: ListenSMTP processor

2016-05-18 Thread Joe Witt
Andre

I like the idea.  I'd suggest having 'ListenSMTP' go ahead and create
a good set of FlowFile attributes for things like
to/from/cc/subject/number of attachments/time/etc... that make sense
for a given e-mail.  The body of the flowfile would be the entire
message, which I believe would include the attachments themselves, which
is fair game.  If you did need/want to split out the attachments in
your flow then I'd say the 'ParseEmail' idea is good but perhaps call
it 'SplitEmail' or 'ExtractEmailAttachment' or something like that.

Thanks
Joe



ListenSMTP processor

2016-05-18 Thread Andre F de Miranda
All,

I have been considering writing a "ListenSMTP" processor and was wondering
*what is the best way of dealing with multiple attachments*.

Looking in here
https://mail-archives.apache.org/mod_mbox/nifi-users/201602.mbox/%3ccaljk9a5ulcitnfo0dlsvd5d-jkcsqm+rqjxuruzwgrdbqad...@mail.gmail.com%3E


I can see Joe suggesting not using attributes to store large volumes of
data. So far so good; however, as far as I understand, a flowfile can only
contain one "content".

Currently, the way I envision this is a modular design that taps into the
pattern set by ListenSyslog / ParseSyslog:

ListenSMTP - A processor that only provides an SMTP interface

ParseEmail - A processor that reads the flowfile holding the email body and
splits it into one or more flowfiles containing the attached MIME objects.
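A simplified sketch of the ParseEmail split step, assuming the message body and its top-level Content-Type value are already at hand: it reads the boundary parameter and cuts the body at each delimiter line, following the multipart syntax of RFC 2046 (delimiter lines are `--boundary`, the last one `--boundary--`). It works on strings and ignores nested multiparts and transfer encodings; a real processor would more likely rely on a MIME library such as JavaMail or Commons Email's MimeMessageParser. Class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MultipartSplit {
    // Finds boundary=value or boundary="value" in a Content-Type header value.
    private static final Pattern BOUNDARY = Pattern.compile(
            "boundary\\s*=\\s*(?:\"([^\"]*)\"|([^;\\s]+))", Pattern.CASE_INSENSITIVE);

    static String boundaryOf(String contentType) {
        Matcher m = BOUNDARY.matcher(contentType);
        if (!m.find()) {
            throw new IllegalArgumentException("no boundary parameter in: " + contentType);
        }
        return m.group(1) != null ? m.group(1) : m.group(2);
    }

    // Returns each part (its headers and content) as a separate string.
    // Simplified: CRLF line endings, text content, no recursion into nested multiparts.
    static List<String> split(String body, String boundary) {
        String delimiter = "--" + boundary;
        String closeDelimiter = delimiter + "--";
        List<String> parts = new ArrayList<>();
        List<String> current = null; // null while still in the preamble
        for (String line : body.split("\r\n", -1)) {
            String trimmed = line.stripTrailing(); // padding may follow a boundary
            if (trimmed.equals(delimiter) || trimmed.equals(closeDelimiter)) {
                if (current != null) {
                    parts.add(String.join("\r\n", current));
                }
                if (trimmed.equals(closeDelimiter)) {
                    return parts; // anything after this is the epilogue
                }
                current = new ArrayList<>();
            } else if (current != null) {
                current.add(line);
            }
        }
        throw new IllegalArgumentException("closing boundary not found");
    }

    public static void main(String[] args) {
        String contentType = "multipart/mixed; boundary=\"XYZ\"";
        String body = "This is a preamble.\r\n"
                + "--XYZ\r\n"
                + "Content-Type: text/plain\r\n\r\nHello\r\n"
                + "--XYZ\r\n"
                + "Content-Type: application/ms-tnef\r\n\r\n(binary content)\r\n"
                + "--XYZ--\r\n";
        List<String> parts = split(body, boundaryOf(contentType));
        System.out.println(parts.size()); // 2
    }
}
```

Each returned part would then become its own flowfile, which is where fragment attributes of the kind discussed in this thread could be attached.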

The advantage here is that people can also use FetchFile, or create a GetIMAP
processor, and still use the same processor to parse the messages.

Would anyone have a different view on how to achieve this?

I thank you in advance