I committed the latest code changes.

As far as the doc is concerned, that's going to take longer because a
conversion to Forrest will need to be done.

Karl


On Wed, May 9, 2018 at 10:21 AM Olivier Tavard <olivier.tav...@francelabs.com> wrote:

> Hi,
>
> OK, thank you for the explanation and for integrating the contribution. I
> did not know that the contribution was already part of the 2.10 release.
> I submitted a patch combining the first patch and the new code on the JIRA
> issue: CONNECTORS-1500. It is a diff against the html-extractor connector.
>
> The documentation is here:
> https://datafari.atlassian.net/wiki/spaces/DATAFARI/pages/237240321/HTML+Extractor+Transformation+connector
> If you want to integrate at least the user documentation into the official
> MCF site, no problem. Without it, I think it will be hard for users to
> understand the goal of this connector!
>
> Best regards,
>
> Olivier TAVARD
>
>
> > On 5 May 2018 at 14:02, Piergiorgio Lucidi <piergior...@apache.org> wrote:
> >
> > Hi,
> >
> > I have just updated CHANGES.txt, adding CONNECTORS-1500 to the
> > 2.10 release with a mention of Olivier.
> >
> > Olivier, thank you so much for your contribution.
> >
> > We should find a good way to also create a test suite for this new
> > connector.
> >
> > Cheers,
> > PJ
> >
> > 2018-05-05 11:57 GMT+02:00 Karl Wright <daddy...@gmail.com>:
> >
> >> Hi Olivier,
> >>
> >> This was actually already committed, but it was renamed to the
> >> html-extractor connector rather than "datafari", which didn't mean
> >> anything to me.
> >>
> >> Any changes you want to make should therefore be supplied as a diff
> >> against the html-extractor connector.
> >>
> >> Sorry for the confusion!!
> >>
> >> Karl
> >>
> >>
> >> On Fri, May 4, 2018 at 4:28 PM Karl Wright <daddy...@gmail.com> wrote:
> >>
> >>> Yes, please do update the patch.  I'm sorry I did not get to this; many
> >>> other things intruded.  I created the branch but did not apply the
> >>> original patch onto it, so please supply a whole new patch.
> >>>
> >>> Karl
> >>>
> >>>
> >>> On Fri, May 4, 2018 at 11:28 AM Olivier Tavard <olivier.tav...@francelabs.com> wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> I wanted to know whether the code is still of interest to the MCF
> >>>> community. I have updated it since the initial release, so please tell
> >>>> me if I need to submit a new patch to the issue already created:
> >>>> https://issues.apache.org/jira/projects/CONNECTORS/issues/CONNECTORS-1500
> >>>>
> >>>> Thanks,
> >>>> Best regards,
> >>>>
> >>>> Olivier TAVARD
> >>>>
> >>>>
> >>>>> On 15 March 2018 at 15:58, Karl Wright <daddy...@gmail.com> wrote:
> >>>>>
> >>>>> Excellent!!
> >>>>>
> >>>>> Thank you again.  I'll try to set up the branch this weekend.
> >>>>>
> >>>>> Karl
> >>>>>
> >>>>>
> >>>>> On Thu, Mar 15, 2018 at 10:52 AM, Olivier Tavard <olivier.tav...@francelabs.com> wrote:
> >>>>>
> >>>>>> Hi Karl,
> >>>>>>
> >>>>>> Sure thing, I created a ticket with the code attached:
> >>>>>> https://issues.apache.org/jira/projects/CONNECTORS/issues/CONNECTORS-1500
> >>>>>> No special libraries are used, just the Jsoup library, which is
> >>>>>> already in the MCF core project.
> >>>>>>
> >>>>>> Best regards,
> >>>>>>
> >>>>>> Olivier
> >>>>>>
> >>>>>>
> >>>>>>> On 15 March 2018 at 11:51, Karl Wright <daddy...@gmail.com> wrote:
> >>>>>>>
> >>>>>>> Hi Olivier,
> >>>>>>>
> >>>>>>> Thank you very much for your contribution!
> >>>>>>>
> >>>>>>> To have a legal trail, I usually prefer the following approach:
> >>>>>>>
> >>>>>>> (1) Create a ticket
> >>>>>>> (2) Attach a diff to the ticket
> >>>>>>>
> >>>>>>> We'll then integrate the diff into a branch, and then finally into
> >>>>>>> trunk.
> >>>>>>>
> >>>>>>> Can you also let us know what kinds of dependent jars the contribution
> >>>>>>> has?  We'd need to know not only about direct dependencies but also
> >>>>>>> about any downstream dependencies that may be incompatible with the
> >>>>>>> Apache License. Usually we can figure this out, but it saves time to
> >>>>>>> know in advance if there are LGPL dependencies (for instance).
> >>>>>>>
> >>>>>>> Karl
> >>>>>>>
> >>>>>>>
> >>>>>>> On Thu, Mar 15, 2018 at 6:35 AM, Olivier Tavard <olivier.tav...@francelabs.com> wrote:
> >>>>>>>
> >>>>>>>> Hello MCF community,
> >>>>>>>>
> >>>>>>>> I developed a transformation connector based on Jsoup. The goal of
> >>>>>>>> this code is simply to choose an encompassing tag in an HTML document
> >>>>>>>> for text extraction. Inside this tag, the connector lets you remove
> >>>>>>>> subparts that you do not want: for example, all the tags of declared
> >>>>>>>> types, or tags with specific attribute names.
> >>>>>>>> I would like to know if it could interest you. The code is under the
> >>>>>>>> Apache V2 licence, and I have integrated it into our enterprise search
> >>>>>>>> solution (Datafari).
> >>>>>>>> This morning I integrated the code into a fork of the MCF project on
> >>>>>>>> GitHub. Obviously it needs some work, including code refactoring,
> >>>>>>>> renaming classes and unit tests, which I will be able to do if you are
> >>>>>>>> interested in the code.
> >>>>>>>> The code is here:
> >>>>>>>> https://github.com/otavard/manifoldcf/tree/htmlextractorconnector
> >>>>>>>> And the documentation here:
> >>>>>>>> https://datafari.atlassian.net/wiki/spaces/DATAFARI/pages/237240321/HTML+Extractor+Transformation+connector
> >>>>>>>>
> >>>>>>>> Best regards,
> >>>>>>>>
> >>>>>>>> Olivier TAVARD
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>>>
> >>>>
> >>>>
> >>
> >
> >
> >
> > --
> > Piergiorgio Lucidi
> > https://www.open4dev.com
>
>
