I committed the latest code changes. As far as the doc is concerned, that's going to take longer because a conversion to Forrest will need to be done.
Karl

On Wed, May 9, 2018 at 10:21 AM Olivier Tavard <olivier.tav...@francelabs.com> wrote:

> Hi,
>
> OK, thank you for the explanation and for integrating the contribution. I
> did not know that the contribution was already part of the 2.10 release.
> I submitted a patch that includes both the first patch and the new code on
> the JIRA issue CONNECTORS-1500. It is a diff against the html-extractor
> connector.
>
> The documentation is here:
> https://datafari.atlassian.net/wiki/spaces/DATAFARI/pages/237240321/HTML+Extractor+Transformation+connector
>
> If you want to integrate at least the user documentation on the official
> MCF site, no problem. Without it, I think it will be hard for users to
> understand the goal of this connector!
>
> Best regards,
>
> Olivier TAVARD
>
> > On May 5, 2018, at 14:02, Piergiorgio Lucidi <piergior...@apache.org> wrote:
> >
> > Hi,
> >
> > I have just updated CHANGES.txt, adding CONNECTORS-1500 as included in the
> > 2.10 release, with a mention of Olivier.
> >
> > Olivier, thank you so much for your contribution.
> >
> > We should also find a good way to create a test suite for this new
> > connector.
> >
> > Cheers,
> > PJ
> >
> > 2018-05-05 11:57 GMT+02:00 Karl Wright <daddy...@gmail.com>:
> >
> >> Hi Olivier,
> >>
> >> This was actually already committed. But it was renamed as the
> >> html-extractor connector, not "datafari", which didn't mean anything to me.
> >>
> >> Any changes you want to make should therefore be supplied as a diff against
> >> the html-extractor connector.
> >>
> >> Sorry for the confusion!!
> >>
> >> Karl
> >>
> >> On Fri, May 4, 2018 at 4:28 PM Karl Wright <daddy...@gmail.com> wrote:
> >>
> >>> Yes, please do update the patch. I'm sorry I did not get to this; many
> >>> other things intruded.
> >>> I created the branch but did not apply the original
> >>> patch onto it, so please supply a whole new patch.
> >>>
> >>> Karl
> >>>
> >>> On Fri, May 4, 2018 at 11:28 AM Olivier Tavard <olivier.tav...@francelabs.com> wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> I wanted to know if the code is still of interest to the MCF community.
> >>>> I have updated it since the initial release, so please tell me if I need to
> >>>> submit a new patch to the issue already created:
> >>>> https://issues.apache.org/jira/projects/CONNECTORS/issues/CONNECTORS-1500
> >>>>
> >>>> Thanks,
> >>>> Best regards,
> >>>>
> >>>> Olivier TAVARD
> >>>>
> >>>>> On March 15, 2018, at 15:58, Karl Wright <daddy...@gmail.com> wrote:
> >>>>>
> >>>>> Excellent!!
> >>>>>
> >>>>> Thank you again. I'll try to set up the branch this weekend.
> >>>>>
> >>>>> Karl
> >>>>>
> >>>>> On Thu, Mar 15, 2018 at 10:52 AM, Olivier Tavard <olivier.tav...@francelabs.com> wrote:
> >>>>>
> >>>>>> Hi Karl,
> >>>>>>
> >>>>>> Sure thing, I created a ticket:
> >>>>>> https://issues.apache.org/jira/projects/CONNECTORS/issues/CONNECTORS-1500
> >>>>>> with the code attached.
> >>>>>> No specific libraries are used, just the Jsoup library that is already in
> >>>>>> the MCF core project.
> >>>>>>
> >>>>>> Best regards,
> >>>>>>
> >>>>>> Olivier
> >>>>>>
> >>>>>>> On March 15, 2018, at 11:51, Karl Wright <daddy...@gmail.com> wrote:
> >>>>>>>
> >>>>>>> Hi Olivier,
> >>>>>>>
> >>>>>>> Thank you very much for your contribution!
> >>>>>>>
> >>>>>>> To have a legal trail, I usually prefer the following approach:
> >>>>>>>
> >>>>>>> (1) Create a ticket
> >>>>>>> (2) Attach a diff to the ticket
> >>>>>>>
> >>>>>>> We'll then integrate the diff into a branch, and then finally into trunk.
> >>>>>>>
> >>>>>>> Can you also let us know what kinds of dependent jars the contribution
> >>>>>>> has? We'd need to know about not only direct dependencies, but also any
> >>>>>>> downstream dependencies that may be incompatible with the Apache License.
> >>>>>>> Usually we can figure this out, but it saves time to know in advance if
> >>>>>>> there are LGPL dependencies (for instance).
> >>>>>>>
> >>>>>>> Karl
> >>>>>>>
> >>>>>>> On Thu, Mar 15, 2018 at 6:35 AM, Olivier Tavard <olivier.tav...@francelabs.com> wrote:
> >>>>>>>
> >>>>>>>> Hello MCF community,
> >>>>>>>>
> >>>>>>>> I developed a transformation connector based on Jsoup. The goal of this
> >>>>>>>> code is simply to choose an encompassing tag in an HTML document for text
> >>>>>>>> extraction. Inside this tag, the connector lets you remove the
> >>>>>>>> subparts that you do not want: for example, all the tags corresponding to
> >>>>>>>> declared types or to specific attribute names.
> >>>>>>>> I would like to know if this could interest you. The code is under the
> >>>>>>>> Apache V2 license, and I integrated it into our enterprise search solution
> >>>>>>>> (Datafari).
> >>>>>>>> This morning I integrated the code into a fork of the MCF project on GitHub.
> >>>>>>>> Obviously it needs some work, including code refactoring, renaming classes,
> >>>>>>>> and unit tests, which I will be able to do if you are interested in the code.
> >>>>>>>> The code is here: https://github.com/otavard/manifoldcf/tree/htmlextractorconnector
> >>>>>>>> <https://github.com/otavard/manifoldcf/commits/htmlextractorconnector>
> >>>>>>>> And the documentation is here:
> >>>>>>>> https://datafari.atlassian.net/wiki/spaces/DATAFARI/pages/237240321/HTML+Extractor+Transformation+connector
> >>>>>>>>
> >>>>>>>> Best regards,
> >>>>>>>>
> >>>>>>>> Olivier TAVARD
> >
> > --
> > Piergiorgio Lucidi
> > https://www.open4dev.com