Yes, please do update the patch. I'm sorry I did not get to this; many other things intruded. I created the branch but did not apply the original patch onto it, so please supply a whole new patch.
Karl

On Fri, May 4, 2018 at 11:28 AM Olivier Tavard <[email protected]> wrote:

> Hi,
>
> I wanted to know whether the code is still of interest to the MCF community.
> I have updated it since the initial release, so please tell me if I need to
> submit a new patch to the issue already created:
> https://issues.apache.org/jira/projects/CONNECTORS/issues/CONNECTORS-1500
>
> Thanks,
> Best regards,
>
> Olivier TAVARD
>
> > On 15 March 2018 at 15:58, Karl Wright <[email protected]> wrote:
> >
> > Excellent!!
> >
> > Thank you again. I'll try to set up the branch this weekend.
> >
> > Karl
> >
> > On Thu, Mar 15, 2018 at 10:52 AM, Olivier Tavard <[email protected]> wrote:
> >
> >> Hi Karl,
> >>
> >> Sure thing, I created a ticket with the code attached:
> >> https://issues.apache.org/jira/projects/CONNECTORS/issues/CONNECTORS-1500
> >> No specific libraries are used, just the Jsoup library, which is already
> >> in the MCF core project.
> >>
> >> Best regards,
> >>
> >> Olivier
> >>
> >>> On 15 March 2018 at 11:51, Karl Wright <[email protected]> wrote:
> >>>
> >>> Hi Olivier,
> >>>
> >>> Thank you very much for your contribution!
> >>>
> >>> To have a legal trail, I usually prefer the following approach:
> >>>
> >>> (1) Create a ticket
> >>> (2) Attach a diff to the ticket
> >>>
> >>> We'll then integrate the diff into a branch, and then finally into trunk.
> >>>
> >>> Can you also let us know what dependent jars the contribution has?
> >>> We need to know about not only direct dependencies, but also any
> >>> downstream dependencies that may be incompatible with the Apache License.
> >>> Usually we can figure this out, but it saves time to know in advance if
> >>> there are LGPL dependencies (for instance).
> >>>
> >>> Karl
> >>>
> >>> On Thu, Mar 15, 2018 at 6:35 AM, Olivier Tavard <[email protected]> wrote:
> >>>
> >>>> Hello MCF community,
> >>>>
> >>>> I have developed a transformation connector based on Jsoup. Its goal is
> >>>> simply to let you choose an enclosing tag in an HTML document from which
> >>>> text is extracted. Inside that tag, the connector lets you remove subparts
> >>>> you do not want: for example, all tags of the declared types, or tags with
> >>>> specific attribute names.
> >>>> I would like to know whether it could interest you. The code is under the
> >>>> Apache License V2, and I have integrated it into our enterprise search
> >>>> solution (Datafari).
> >>>> This morning I added the code to a fork of the MCF project on GitHub.
> >>>> Obviously it still needs some work, including code refactoring, class
> >>>> renaming, and unit tests, which I can do if you are interested in the code.
> >>>> The code is here: https://github.com/otavard/manifoldcf/tree/htmlextractorconnector
> >>>> And the documentation is here:
> >>>> https://datafari.atlassian.net/wiki/spaces/DATAFARI/pages/237240321/HTML+Extractor+Transformation+connector
> >>>>
> >>>> Best regards,
> >>>>
> >>>> Olivier TAVARD
