Hi,

I have just updated CHANGES.txt to add CONNECTORS-1500, which is included in the 2.10 release, with a mention of Olivier.
Olivier, thank you so much for your contribution. We should also find a good way to create a test suite for this new connector.

Cheers,
PJ

2018-05-05 11:57 GMT+02:00 Karl Wright <[email protected]>:

> Hi Olivier,
>
> This was actually already committed. But it was renamed as the
> html-extractor connector, not "datafari", which didn't mean anything to me.
>
> Any changes you want to make should therefore be supplied as a diff against
> the html-extractor connector.
>
> Sorry for the confusion!!
>
> Karl
>
>
> On Fri, May 4, 2018 at 4:28 PM Karl Wright <[email protected]> wrote:
>
> > Yes, please do update the patch. I'm sorry I did not get to this; many
> > other things intruded. I created the branch but did not apply the original
> > patch onto it, so please supply a whole new patch.
> >
> > Karl
> >
> >
> > On Fri, May 4, 2018 at 11:28 AM Olivier Tavard <
> > [email protected]> wrote:
> >
> >> Hi,
> >>
> >> I wanted to know if the code is still of interest to the MCF community.
> >> I have updated it since the initial release, so please tell me if I need to
> >> submit a new patch to the issue already created:
> >> https://issues.apache.org/jira/projects/CONNECTORS/issues/CONNECTORS-1500
> >>
> >> Thanks,
> >> Best regards,
> >>
> >> Olivier TAVARD
> >>
> >>
> >>
> >> > On 15 March 2018 at 15:58, Karl Wright <[email protected]> wrote:
> >> >
> >> > Excellent!!
> >> >
> >> > Thank you again. I'll try to set up the branch this weekend.
> >> >
> >> > Karl
> >> >
> >> >
> >> > On Thu, Mar 15, 2018 at 10:52 AM, Olivier Tavard <
> >> > [email protected]> wrote:
> >> >
> >> >> Hi Karl,
> >> >>
> >> >> Sure thing, I created a ticket:
> >> >> https://issues.apache.org/jira/projects/CONNECTORS/issues/CONNECTORS-1500
> >> >> with the code attached.
> >> >> No specific libraries are used, just the JSOUP library that is already in
> >> >> the MCF core project.
> >> >>
> >> >> Best regards,
> >> >>
> >> >> Olivier
> >> >>
> >> >>
> >> >>> On 15 March 2018 at 11:51, Karl Wright <[email protected]> wrote:
> >> >>>
> >> >>> Hi Olivier,
> >> >>>
> >> >>> Thank you very much for your contribution!
> >> >>>
> >> >>> To have a legal trail, I usually prefer the following approach --
> >> >>>
> >> >>> (1) Create a ticket
> >> >>> (2) Attach a diff to the ticket
> >> >>>
> >> >>> We'll then integrate the diff into a branch, and then finally into trunk.
> >> >>>
> >> >>> Can you also let us know what kinds of dependent jars the contribution
> >> >>> has? We'd need to know about not only direct dependencies, but also any
> >> >>> downstream dependencies that may be incompatible with the Apache License.
> >> >>> Usually we can figure this out but it saves time to know in advance if
> >> >>> there are LGPL dependencies (for instance).
> >> >>>
> >> >>> Karl
> >> >>>
> >> >>>
> >> >>> On Thu, Mar 15, 2018 at 6:35 AM, Olivier Tavard <
> >> >>> [email protected]> wrote:
> >> >>>
> >> >>>> Hello MCF community,
> >> >>>>
> >> >>>> I developed a transformation connector based on Jsoup. The goal of this
> >> >>>> code is simply to choose an encompassing tag in an HTML document for text
> >> >>>> extraction. Inside this tag, the connector allows you to remove
> >> >>>> subparts that you do not want: all the tags corresponding to declared types
> >> >>>> or specific attribute tag names, for example.
> >> >>>> I would like to know if it could interest you. The code is under the Apache V2
> >> >>>> licence and I integrated it in our enterprise search solution (Datafari).
> >> >>>> This morning I integrated the code in a fork of the MCF project on GitHub.
> >> >>>> Obviously it needs some work, including code refactoring, renaming classes,
> >> >>>> and unit tests, which I will be able to do if you are interested in the code.
> >> >>>> The code is here:
> >> >>>> https://github.com/otavard/manifoldcf/tree/htmlextractorconnector
> >> >>>> <https://github.com/otavard/manifoldcf/commits/htmlextractorconnector>
> >> >>>> And the documentation here:
> >> >>>> https://datafari.atlassian.net/wiki/spaces/DATAFARI/pages/237240321/HTML+Extractor+Transformation+connector
> >> >>>>
> >> >>>> Best regards,
> >> >>>>
> >> >>>> Olivier TAVARD
> >> >>>>

--
Piergiorgio Lucidi
https://www.open4dev.com
