Re: ExtractTNEFAttachments: (was Re: ListenSMTP processor)
Bryan, Thanks for the message. I've seen that link previously. The challenge is that Exchange admins can force the conversion of email into standard MIME, and I suspect my email server does that. In any case, it's all sorted: I used an unused Outlook license I had at home together with mailtrap.io and generated a number of TNEFs. I will be happy to replace the samples now. Having said that... general peer reviews are still welcome. ;-) Cheers On Tue, Aug 9, 2016 at 11:39 PM, Bryan Rosander <bryanrosan...@gmail.com> wrote:
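Because Exchange may convert outgoing mail to standard MIME, a captured sample is only useful if it still carries a TNEF part. The following is a minimal sketch, assuming the raw message is available as a string (for example, exported as .eml from mailtrap.io). It scans Content-Type headers for `application/ms-tnef` or `application/vnd.ms-tnef`, the media types commonly used for winmail.dat parts. The class name `TnefSampleCheck` is hypothetical.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TnefSampleCheck {
    // Case-insensitive, multi-line: captures the media type of every Content-Type header.
    private static final Pattern CONTENT_TYPE =
            Pattern.compile("(?im)^content-type:\\s*([^;\\s]+)");

    private TnefSampleCheck() { }

    /** Returns true if any Content-Type header in the raw message names a TNEF media type. */
    static boolean containsTnefPart(String rawMessage) {
        Matcher m = CONTENT_TYPE.matcher(rawMessage);
        while (m.find()) {
            String type = m.group(1).toLowerCase(Locale.ROOT);
            if (type.equals("application/ms-tnef") || type.equals("application/vnd.ms-tnef")) {
                return true;
            }
        }
        return false; // the server most likely converted the message to plain MIME
    }
}
```

This is a heuristic only. It confirms that a TNEF part was declared, not that the part's content is well formed.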
Re: ExtractTNEFAttachments: (was Re: ListenSMTP processor)
Definitely the best course of action is to use our own originally created test data. This can at times be very difficult, but perhaps what Bryan just pointed out helps. Alternatively, we can of course include test artifacts in our source repository, but we must account for them in license and notice, and they must be valid ASLv2 source dependencies. Those are things which come from the Category A list seen here: http://www.apache.org/legal/resolved.html#category-a In short, this is totally doable; we just must be really good stewards of the license and notice process. On Tue, Aug 9, 2016 at 9:39 AM, Bryan Rosander <bryanrosan...@gmail.com> wrote:
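One way to produce originally created test data, instead of copying third-party samples, is to assemble a tiny TNEF stream by hand. The sketch below follows my reading of the layout documented in Microsoft's [MS-OXTNEF]: a 4-byte little-endian signature 0x223E9F78 and a 2-byte legacy key, followed by attribute records. Each record consists of a level (1 byte), an attribute ID (4 bytes), a length (4 bytes), the data, and a 2-byte checksum equal to the sum of the data bytes modulo 65536. The attribute IDs (attAttachTitle 0x00018010, attAttachData 0x0006800F) are assumptions to verify against the spec. The class name is hypothetical. A real parser such as POI's HMEF may also expect attributes (for example attAttachRendData) that this minimal stream omits.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

final class SyntheticTnef {
    static final int TNEF_SIGNATURE = 0x223E9F78;
    static final byte LVL_ATTACHMENT = 0x02;
    // Attribute IDs as read from [MS-OXTNEF]; verify before relying on them.
    static final int ATT_ATTACH_TITLE = 0x00018010; // atpString: attachment file name
    static final int ATT_ATTACH_DATA = 0x0006800F;  // atpByte: attachment content

    private SyntheticTnef() { }

    /** Builds a minimal TNEF stream holding one attachment title and its data. */
    static byte[] build(String fileName, byte[] content) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ByteBuffer header = ByteBuffer.allocate(6).order(ByteOrder.LITTLE_ENDIAN);
        header.putInt(TNEF_SIGNATURE);
        header.putShort((short) 0x0001); // legacy key: arbitrary value for synthetic data
        out.write(header.array(), 0, 6);
        // atpString values are NUL-terminated.
        byte[] name = (fileName + "\0").getBytes(StandardCharsets.US_ASCII);
        writeAttribute(out, LVL_ATTACHMENT, ATT_ATTACH_TITLE, name);
        writeAttribute(out, LVL_ATTACHMENT, ATT_ATTACH_DATA, content);
        return out.toByteArray();
    }

    /** Writes level, ID, length, data and the 16-bit additive checksum, all little-endian. */
    static void writeAttribute(ByteArrayOutputStream out, byte level, int id, byte[] data) {
        ByteBuffer buf = ByteBuffer.allocate(1 + 4 + 4 + data.length + 2)
                .order(ByteOrder.LITTLE_ENDIAN);
        buf.put(level).putInt(id).putInt(data.length).put(data).putShort((short) checksum(data));
        out.write(buf.array(), 0, buf.capacity());
    }

    /** Sum of the unsigned data bytes, modulo 65536. */
    static int checksum(byte[] data) {
        int sum = 0;
        for (byte b : data) {
            sum = (sum + (b & 0xFF)) & 0xFFFF;
        }
        return sum;
    }
}
```

Because every byte is generated in the test itself, no third-party license or notice entry is needed for it. The trade-off is that it only exercises the parts of the format you chose to write.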
Re: ExtractTNEFAttachments: (was Re: ListenSMTP processor)
Hi Andre, I found a superuser answer that seems like it might be helpful in forcing Outlook to use TNEF. http://superuser.com/questions/613014/how-do-i-force-outlook-to-send-an-email-message-to-have-a-winmail-dat-attachment#answer-638244 Hope that helps. Thanks, Bryan On Tue, Aug 9, 2016 at 9:31 AM, Andre <andre-li...@fucs.org> wrote:
Re: ExtractTNEFAttachments: (was Re: ListenSMTP processor)
Hi Joe, I am aware of it; that is the reason I called it out openly, so that someone could try to assist. In the past, I have created content within the JUnit tests or crafted it in my lab, but in the case of TNEF I could not find a way of creating the files (POI does not have this ability, and Outlook seems determined to create proper HTML attachments). As a consequence, I resorted to the files stored here: https://svn.apache.org/repos/asf/poi/tags/REL_3_14_BETA1/test-data/hmef/ I also checked their NOTICE and LICENSE files but found no references, nor are the samples mentioned within their Maven profiles. Not ideal, but so far the best I could find. If worst comes to worst and we restrict the unit tests to invalid content, at least we will know it fails safely. :-) Hope this helps to clarify. On Tue, Aug 9, 2016 at 11:19 PM, Joe Witt <joe.w...@gmail.com> wrote:
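On "failing safely" with invalid content: a reader that checks the signature and bounds-checks every record can reject truncated or non-TNEF input without throwing. The following is a minimal sketch, assuming the record layout described in [MS-OXTNEF] as I understand it. The layout is a 4-byte little-endian signature 0x223E9F78 and a 2-byte legacy key, followed by records of level (1 byte), attribute ID (4 bytes), length (4 bytes), data, and a 2-byte additive checksum. `TnefValidator` is a hypothetical name, and this is not how POI's HMEF is implemented.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class TnefValidator {
    private static final int TNEF_SIGNATURE = 0x223E9F78;

    private TnefValidator() { }

    /**
     * Walks the attribute records and returns how many were found,
     * or -1 if the signature, a length field or a checksum is invalid.
     */
    static int countValidRecords(byte[] stream) {
        ByteBuffer buf = ByteBuffer.wrap(stream).order(ByteOrder.LITTLE_ENDIAN);
        if (buf.remaining() < 6 || buf.getInt() != TNEF_SIGNATURE) {
            return -1; // not TNEF: route to failure instead of throwing
        }
        buf.getShort(); // legacy key, not needed for validation
        int records = 0;
        while (buf.hasRemaining()) {
            if (buf.remaining() < 9) {
                return -1; // truncated record header
            }
            buf.get();    // level
            buf.getInt(); // attribute ID
            int length = buf.getInt();
            // A negative int means the unsigned length exceeds Integer.MAX_VALUE.
            if (length < 0 || buf.remaining() < (long) length + 2) {
                return -1; // declared length runs past the end of the stream
            }
            int sum = 0;
            for (int i = 0; i < length; i++) {
                sum = (sum + (buf.get() & 0xFF)) & 0xFFFF;
            }
            if ((buf.getShort() & 0xFFFF) != sum) {
                return -1; // checksum mismatch
            }
            records++;
        }
        return records;
    }
}
```

Returning a sentinel rather than throwing maps naturally onto routing the FlowFile to a failure relationship.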
Re: ExtractTNEFAttachments: (was Re: ListenSMTP processor)
Andre We cannot copy source material even for testing unless we fully and properly account for licensing and notice concerns. Thanks Joe On Aug 9, 2016 8:25 AM, "Andre" <andre-li...@fucs.org> wrote:
ExtractTNEFAttachments: (was Re: ListenSMTP processor)
All, PR817[1] introduces a winmail.dat extractor. Following people's feedback, I created a separate processor to handle the TNEF attachments. This means the typical deployment will look like: (ListenSMTP || GetPOP3) --> ExtractEmailAttachments --> RouteOnAttribute [filename=winmail.dat] --> ExtractTNEFAttachments Since I could not generate a TNEF (where are the winmail.dats when you need them?!?!), I ended up using the TNEFs available in POI's upstream unit tests. winmail.dat donations to improve the unit test coverage are welcome... Please test; once you confirm this is working, I will be happy to create a processor to extract and parse the TNEF body and MAPI attributes as well. Cheers [1]https://github.com/apache/nifi/pull/817/commits On Mon, Jul 25, 2016 at 1:57 AM, Toivo Adams <toivo.ad...@gmail.com> wrote:
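The RouteOnAttribute step in that flow is just a filename comparison. In NiFi it would normally be a dynamic property written in Expression Language, for example `${filename:equalsIgnoreCase('winmail.dat')}`. The plain-Java sketch below mirrors that decision so it can be unit tested on its own. It assumes ExtractEmailAttachments sets `filename` to the attachment name. The relationship name `tnef` is an illustrative assumption.

```java
import java.util.Map;

final class WinmailRouter {
    static final String REL_TNEF = "tnef";           // hypothetical dynamic-property relationship
    static final String REL_UNMATCHED = "unmatched"; // RouteOnAttribute's fallback relationship

    private WinmailRouter() { }

    /** Mirrors ${filename:equalsIgnoreCase('winmail.dat')} on a FlowFile's attributes. */
    static String route(Map<String, String> attributes) {
        // equalsIgnoreCase on the literal also handles a missing (null) filename.
        return "winmail.dat".equalsIgnoreCase(attributes.get("filename")) ? REL_TNEF : REL_UNMATCHED;
    }
}
```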
Re: ListenSMTP processor
I support Oleg's opinion. Do one thing and do it well. Thanks Toivo -- View this message in context: http://apache-nifi-developer-list.39713.n7.nabble.com/ListenSMTP-processor-tp10510p12891.html Sent from the Apache NiFi Developer List mailing list archive at Nabble.com.
Re: ListenSMTP processor
I have raised NIFI-2380 to track this improvement. While raising the ticket I was wondering: are you happy to give the user the option to choose whether to extract the winmail.dat or not? I mean something like: - PROPERTY: "Extract Attachments within a TNEF (i.e. winmail.dat)": true / false If yes, then every time a decoding occurs we test the name (or something better, if possible) and then extract it. An attachment created from a TNEF file would have an attribute email.attachment.tnefdecoded (or whatever name we decide) set to yes. If no, processing continues as it does today (i.e. purely based on Apache Commons MimeMessageParser). Another possible solution would be an additional processor, but IMNSHO this would be overkill and counterproductive. Keen to hear your thoughts On Sun, Jul 17, 2016 at 4:46 PM, Andre <andre-li...@fucs.org> wrote:
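The proposed property could drive a small decision step. This sketch rests on several assumptions. The property is represented as a boolean. "Something better" than the name is taken to mean the TNEF signature: the bytes 78 9F 3E 22 (0x223E9F78 stored little-endian) at the start of the attachment. `email.attachment.tnefdecoded` is the attribute name floated above, not a settled one. `TnefDecision` and its methods are hypothetical.

```java
final class TnefDecision {
    // Attribute name proposed in the thread; not final.
    static final String TNEF_DECODED_ATTR = "email.attachment.tnefdecoded";
    // First four bytes of a TNEF stream: signature 0x223E9F78 stored little-endian.
    private static final byte[] TNEF_MAGIC = {0x78, (byte) 0x9F, 0x3E, 0x22};

    private TnefDecision() { }

    /** True if the attachment looks like TNEF by name or by its leading signature bytes. */
    static boolean looksLikeTnef(String filename, byte[] head) {
        if ("winmail.dat".equalsIgnoreCase(filename)) {
            return true;
        }
        if (head == null || head.length < TNEF_MAGIC.length) {
            return false;
        }
        for (int i = 0; i < TNEF_MAGIC.length; i++) {
            if (head[i] != TNEF_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    /** When the property is false, processing continues as today (plain MIME parsing only). */
    static boolean shouldDecode(boolean extractTnefProperty, String filename, byte[] head) {
        return extractTnefProperty && looksLikeTnef(filename, head);
    }
}
```

Attachments produced when `shouldDecode` returns true would then carry `email.attachment.tnefdecoded=yes`. Checking the signature as well as the name also catches TNEF parts that were renamed in transit.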
Re: ListenSMTP processor
Dan, Ingesting Microsoft Journals seems like a great suggestion for a new processor (ParseExchangeJournal?). Regarding TNEF: as far as I know, Apache Commons Mail does not parse "winmail.dat" type attachments. As far as I understand, the only ASL-compatible implementation of a TNEF extractor is Apache POI's, and even that implementation is not part of POI's main release. If TNEF support is required, we will either have to code it from scratch or perhaps use https://github.com/koodaamo/tnefparse together with ExecuteScript (although since tnefparse is LGPL, this solution cannot be packaged as part of NiFi). Cheers On Sun, Jul 17, 2016 at 10:53 AM, djmdata <danmarshal...@gmail.com> wrote:
Re: ListenSMTP processor
What is the JIRA #? I have a production system that reads email from a custom SMTP listener and places the SMTP payload into Kafka. A Storm topology reads messages from Kafka and parses the emails (Java code using JavaMail API) into useful info (subject, text, attachments, body, etc...). I'm looking at plugging NiFi into this to replace the custom SMTP listener. If you had a processor that could act as a reliable (we can't lose emails) and performant SMTP listener alternative, we would use it. Your "email parser processor" is an interesting idea - but beware of the mess you'll find in the wild with email. In our case, we try to parse Exchange (full of non-standard wonders like "TNEF" attachments) as well as email from virtually anywhere (GMail, Yahoo, Joe's email client...). If you can crack that you'll be on to something. We have even more complexity in that we read "Microsoft Journals" which wrap the standard SMTP layout in a Microsoft layer (you'll see this at large Exchange shops doing this kind of thing for use cases like compliance). -- View this message in context: http://apache-nifi-developer-list.39713.n7.nabble.com/ListenSMTP-processor-tp10510p12827.html Sent from the Apache NiFi Developer List mailing list archive at Nabble.com.
Re: ListenSMTP processor
Yes, exactly. The challenge would be what to do with the mime boundary header (include with content, or dump and rebuild with a merge). Simon On 19 May 2016, at 12:45, Andre <andre-li...@fucs.org> wrote:
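For the "dump and rebuild" option, the split would drop the delimiter lines and keep the boundary in an attribute. The merge would then sort fragments by index and re-emit the delimiters. This simplified sketch assumes that each fragment holds exactly one body part (its own headers, a blank line, then content) and that the boundary string travels with the fragments. It ignores preamble/epilogue text and nested multiparts. `MimeMerge` and `Fragment` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class MimeMerge {
    /** One split-out body part; index mirrors a fragment.index attribute. */
    static final class Fragment {
        final int index;
        final String bodyPart; // part headers, blank line, part content

        Fragment(int index, String bodyPart) {
            this.index = index;
            this.bodyPart = bodyPart;
        }
    }

    private MimeMerge() { }

    /** Rebuilds a multipart body from fragments that were split with their delimiters dropped. */
    static String rebuild(String boundary, List<Fragment> fragments) {
        List<Fragment> ordered = new ArrayList<>(fragments);
        ordered.sort(Comparator.comparingInt((Fragment f) -> f.index));
        StringBuilder sb = new StringBuilder();
        for (Fragment f : ordered) {
            sb.append("--").append(boundary).append("\r\n");
            // Per RFC 2046 the CRLF before the next delimiter belongs to the delimiter.
            sb.append(f.bodyPart).append("\r\n");
        }
        // Closing delimiter: the boundary followed by two hyphens.
        sb.append("--").append(boundary).append("--\r\n");
        return sb.toString();
    }
}
```

The alternative, keeping each delimiter inside its fragment's content, makes the merge a plain concatenation. The cost is that every split part then carries MIME framing that downstream processors would have to skip.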
Re: ListenSMTP processor
Simon, Are you suggesting attributes similar to UnpackContent? If yes, seems like a great approach. Cheers On 19 May 2016 14:50, "Simon Elliston Ball" <si...@simonellistonball.com> wrote:
Re: ListenSMTP processor
Joe, That's exactly the idea. I envision attributes for to, from, cc, connecting host (src_ip of the last hop), subject, time and possibly an option to iterate over the headers, adding discretionary key-value pairs for things like SpamAssassin scores, etc. I plan to keep things simple, so I don't intend to add things like SPF, DKIM, etc., but I am keen to consider them. Happy to call it ExtractMailAttachment. I considered this type of more explicit name previously but settled on Parse just because Syslog adopted Parse as well (although ListenSyslog is also capable of parsing). Will raise a JIRA to track. Cheers On 19 May 2016 12:12, "Joe Witt" <joe.w...@gmail.com> wrote:
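Turning headers into FlowFile attributes mostly involves unfolding and selecting. The following is a minimal sketch, assuming the raw RFC 5322 message is available as a string. The sketch keeps a fixed set of core headers plus any header whose name starts with a configurable prefix (for example `X-Spam-` for SpamAssassin). The attribute names (`smtp.from`, `smtp.header.<Name>`) and the class name are hypothetical, not an agreed NiFi convention. The connecting host (src_ip) comes from the SMTP session rather than the headers, so it is out of scope here. RFC 2047 encoded words are also left undecoded.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

final class HeaderAttributes {
    // Headers promoted to attributes (illustrative selection).
    private static final Set<String> CORE =
            new HashSet<>(Arrays.asList("from", "to", "cc", "subject", "date"));

    private HeaderAttributes() { }

    /** Maps selected headers of a raw message to attributes; first occurrence wins. */
    static Map<String, String> extract(String rawMessage, String discretionaryPrefix) {
        Map<String, String> attrs = new LinkedHashMap<>();
        // Normalise CRLF to LF; the header block ends at the first empty line.
        String headerBlock = rawMessage.replace("\r\n", "\n").split("\n\n", 2)[0];
        // RFC 5322 unfolding: drop a line break that is followed by space or tab.
        String unfolded = headerBlock.replaceAll("\n(?=[ \t])", "");
        String prefix = discretionaryPrefix == null ? null : discretionaryPrefix.toLowerCase(Locale.ROOT);
        for (String line : unfolded.split("\n")) {
            int colon = line.indexOf(':');
            if (colon <= 0) {
                continue; // not a header line
            }
            String name = line.substring(0, colon).trim();
            String value = line.substring(colon + 1).trim();
            String lower = name.toLowerCase(Locale.ROOT);
            if (CORE.contains(lower)) {
                attrs.putIfAbsent("smtp." + lower, value);
            } else if (prefix != null && lower.startsWith(prefix)) {
                attrs.putIfAbsent("smtp.header." + name, value); // discretionary pair
            }
        }
        return attrs;
    }
}
```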
Re: ListenSMTP processor
Fantastic idea! Would SplitEmail not make sense to divide by the mime boundary? If you add fragment indices in the way other Split processors do, it would be easy to recombine an email after processing splits. To be honest, I'm not sure what the use case for doing so would be, but it feels consistent with the Split, Process, Merge pattern you see elsewhere in NiFi. Simon On 19 May 2016, at 03:11, Joe Witt <joe.w...@gmail.com> wrote:
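Splitting on the MIME boundary while emitting the `fragment.identifier` / `fragment.index` / `fragment.count` attributes that other NiFi Split processors use is what lets MergeContent's defragment mode reassemble the parts later. This is a simplified sketch, assuming a single-level multipart body whose boundary is already known from the Content-Type header. It normalises line endings to LF, ignores transport padding after delimiters, and drops the preamble and epilogue. `MimeSplit` and `Part` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

final class MimeSplit {
    /** A split-out body part plus the fragment attributes other Split processors emit. */
    static final class Part {
        final String content;
        final Map<String, String> attributes;

        Part(String content, Map<String, String> attributes) {
            this.content = content;
            this.attributes = attributes;
        }
    }

    private MimeSplit() { }

    /** Splits a single-level multipart body on its boundary delimiter lines. */
    static List<Part> split(String multipartBody, String boundary) {
        String delimiter = "--" + boundary;
        List<List<String>> chunks = new ArrayList<>();
        List<String> current = null; // null while still in the preamble
        for (String line : multipartBody.replace("\r\n", "\n").split("\n", -1)) {
            if (line.equals(delimiter + "--")) {
                break; // closing delimiter: ignore the epilogue
            }
            if (line.equals(delimiter)) {
                current = new ArrayList<>();
                chunks.add(current);
            } else if (current != null) {
                current.add(line);
            }
        }
        String id = UUID.randomUUID().toString(); // shared by all parts of one email
        List<Part> parts = new ArrayList<>();
        for (int i = 0; i < chunks.size(); i++) {
            Map<String, String> attrs = new HashMap<>();
            attrs.put("fragment.identifier", id);
            attrs.put("fragment.index", String.valueOf(i));
            attrs.put("fragment.count", String.valueOf(chunks.size()));
            // Joining without a trailing newline: the line break before a delimiter belongs to it.
            parts.add(new Part(String.join("\n", chunks.get(i)), attrs));
        }
        return parts;
    }
}
```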
Re: ListenSMTP processor
Andre I like the idea. I'd suggest having 'ListenSMTP' go ahead and create a good set of FlowFile attributes for things like to/from/cc/subject/number of attachments/time/etc. that make sense for a given e-mail. The body of the flowfile would be the entire message, which I believe would include the attachments themselves, which is fair game. If you did need/want to split out the attachments in your flow, then I'd say the 'ParseEmail' idea is good, but perhaps call it 'SplitEmail' or 'ExtractEmailAttachment' or something like that. Thanks Joe On Wed, May 18, 2016 at 7:43 PM, Andre F de Miranda <af...@fucs.org> wrote:
ListenSMTP processor
All, I have been considering writing a "ListenSMTP" processor and was wondering *what is the best way of dealing with multiple attachments*. Looking here https://mail-archives.apache.org/mod_mbox/nifi-users/201602.mbox/%3ccaljk9a5ulcitnfo0dlsvd5d-jkcsqm+rqjxuruzwgrdbqad...@mail.gmail.com%3E I can read Joe suggesting not using attributes to store large volumes of data; so far so good. However, as far as I understand, a flowfile can only contain one "content". Currently, the way I envision this is a modular approach that taps into the pattern set by ListenSyslog / ParseSyslog: ListenSMTP - A processor that only provides an SMTP interface ParseEmail - A processor that reads the flowfile holding the email body and splits it into 1 or more flowfiles containing the attached MIME objects. The advantage here is that people can also use FetchFile, or create a GetIMAP processor, to feed messages to the parser. Would anyone have a different view on how to achieve this? I thank you in advance