[jira] [Commented] (CONNECTORS-1495) Brand new website

2018-05-09 Thread Piergiorgio Lucidi (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16469951#comment-16469951
 ] 

Piergiorgio Lucidi commented on CONNECTORS-1495:


I have just uploaded a new screenshot of the latest revision.

I moved the project logo out of the top bar and introduced the new horizontal 
ASF banner on the right.

!Website - status - 20180510-2.png|width=705,height=417!

> Brand new website
> -
>
> Key: CONNECTORS-1495
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1495
> Project: ManifoldCF
>  Issue Type: New Feature
>  Components: Site
>Affects Versions: ManifoldCF 2.9.1
>Reporter: Piergiorgio Lucidi
>Assignee: Piergiorgio Lucidi
>Priority: Major
> Fix For: ManifoldCF next
>
> Attachments: ManifoldCF-FluidoSkin.png, PDF-Rendition-1.png, 
> PDF-Rendition-2.png, Website - status - 20180510-2.png, Website - status - 
> 20180510.png
>
>   Original Estimate: 480h
>  Remaining Estimate: 480h
>
> The community decided to work on a brand new website:
> [http://mail-archives.apache.org/mod_mbox/manifoldcf-dev/201712.mbox/%3CCAHVHQx8odjgXMw%3DnhmSeDt0pYOUd0j%2BtkmMNtFnCJvHFcZwyEg%40mail.gmail.com%3E]
> The proposed technology is Jekyll, but we also have to decide which website 
> template to use.
> [~kamaci] suggested the [Apache CloudStack|https://cloudstack.apache.org/] 
> template.
> [~molgun] proposed this approach:
>  # Find a modern static site generator such as Jekyll [1]
>  # Create a template
>  # Start using it on a specific path such as 
> [https://manifoldcf.apache.org/*new*]
>  # Migrate our Forrest XML files to Markdown (we can automate this somehow)
>  # Start serving the new site on the root path
> [1] [https://jekyllrb.com/docs/home/]
>  
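Step 4 above suggests the Forrest-to-Markdown migration could be automated. The following is only a minimal sketch of how that might look, not the converter the project actually used: it assumes pages follow the Forrest document-v20 layout (`<document>`, `<header><title>`, `<body>`, nested `<section>` elements with a `<title>`, plus `<p>` and `<ul>/<li>`), it relies only on the JDK's DOM parser, and the class name `ForrestToMarkdown` is hypothetical. Inline markup such as links and `<code>` is flattened to plain text, and tables, `<source>` blocks and any other elements are skipped.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical helper: converts a small subset of Forrest document-v20 XML to Markdown. */
final class ForrestToMarkdown {

    static String convert(String forrestXml) {
        Element root;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Real Forrest pages declare a DOCTYPE; do not try to download the DTD.
            factory.setFeature(
                "http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            root = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(forrestXml)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse Forrest XML", e);
        }
        StringBuilder out = new StringBuilder();
        Element title = firstChild(firstChild(root, "header"), "title");
        if (title != null) {
            out.append("# ").append(text(title)).append("\n\n");
        }
        convertBlocks(firstChild(root, "body"), 2, out);
        return out.toString();
    }

    // Walks sections, paragraphs and bullet lists; other elements are skipped.
    private static void convertBlocks(Element parent, int level, StringBuilder out) {
        if (parent == null) return;
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() != Node.ELEMENT_NODE) continue;
            Element e = (Element) n;
            switch (e.getTagName()) {
                case "section": {
                    Element heading = firstChild(e, "title");
                    if (heading != null) {
                        out.append(hashes(level)).append(' ').append(text(heading)).append("\n\n");
                    }
                    convertBlocks(e, level + 1, out); // nested sections go one level deeper
                    break;
                }
                case "p":
                    out.append(text(e)).append("\n\n");
                    break;
                case "ul":
                    // Every block ends with an empty line, so a list never follows a text line directly.
                    for (Node li = e.getFirstChild(); li != null; li = li.getNextSibling()) {
                        if (li.getNodeType() == Node.ELEMENT_NODE && "li".equals(li.getNodeName())) {
                            out.append("* ").append(text((Element) li)).append('\n');
                        }
                    }
                    out.append('\n');
                    break;
                default:
                    break; // e.g. a section's <title>, already emitted as its heading
            }
        }
    }

    private static Element firstChild(Element parent, String tag) {
        if (parent == null) return null;
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && tag.equals(n.getNodeName())) {
                return (Element) n;
            }
        }
        return null;
    }

    private static String hashes(int level) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < level; i++) sb.append('#');
        return sb.toString();
    }

    // Flattens inline markup to plain text and collapses runs of whitespace.
    private static String text(Element e) {
        return e.getTextContent().trim().replaceAll("\\s+", " ");
    }
}
```

Because each block is followed by an empty line, the generated lists always have the preceding blank line that, according to the later comments on this issue, the site renderer needs.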



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (CONNECTORS-1495) Brand new website

2018-05-09 Thread Piergiorgio Lucidi (JIRA)

 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Piergiorgio Lucidi updated CONNECTORS-1495:
---
Attachment: Website - status - 20180510-2.png



Build failed in Jenkins: ManifoldCF-ant #634

2018-05-09 Thread Apache Jenkins Server
See 


Changes:

[kwright] CONNECTORS-1500: Update the connector with latest code from the 
contributor

--
[...truncated 659.44 KB...]
jar-agents:

compile-pull-agent:

jar-pull-agent:

compile-jetty-runner:

jar-jetty-runner:

compile-script-engine:

jar-script-engine:

lib:

process-lib-classpath:

jetty-lib-classpath:

setup-jetty-processes:

general-set-jetty-classpath:

setup-jetty-processes-proprietary:

preclean-hsqldb-processes:

scripts-common:

scripts-hsqldb:
 [copy] Copying 4 files to 


compile-core:

jar-core:

compile-ui-core:

jar-ui-core:

compile-agents:

jar-agents:

compile-pull-agent:

jar-pull-agent:

compile-jetty-runner:

jar-jetty-runner:

compile-script-engine:

jar-script-engine:

lib:

hsqldb-lib-classpath:

setup-hsqldb-processes:

general-set-hsqldb-classpath:

setup-hsqldb-processes-proprietary:

preclean-zookeeper-processes:

scripts-common:

scripts-zookeeper:
 [copy] Copying 4 files to 


compile-core:

jar-core:

compile-ui-core:

jar-ui-core:

compile-agents:

jar-agents:

compile-pull-agent:

jar-pull-agent:

compile-jetty-runner:

jar-jetty-runner:

compile-script-engine:

jar-script-engine:

lib:

zookeeper-lib-classpath:

setup-zookeeper-processes:

general-set-zookeeper-classpath:

setup-zookeeper-processes-proprietary:
 [copy] Copying 15 files to 

[mkdir] Created dir: 


multi-processes-file:

preclean-processes:
[mkdir] Created dir: 


scripts-common:
 [copy] Copying 1 file to 


scripts:
 [copy] Copying 4 files to 


compile-core:

jar-core:

compile-ui-core:

jar-ui-core:

compile-agents:

jar-agents:

compile-pull-agent:

jar-pull-agent:

compile-jetty-runner:

jar-jetty-runner:

compile-script-engine:

jar-script-engine:

lib:

process-lib-classpath:

database-lib-classpath:

setup-processes:

general-set-classpath:
[mkdir] Created dir: 


multi-process-file-example:

preclean-jetty-processes:

scripts-common:

scripts-jetty:
 [copy] Copying 4 files to 


compile-core:

jar-core:

compile-ui-core:

jar-ui-core:

compile-agents:

jar-agents:

compile-pull-agent:

jar-pull-agent:

compile-jetty-runner:

jar-jetty-runner:

compile-script-engine:

jar-script-engine:

lib:

process-lib-classpath:

jetty-lib-classpath:

setup-jetty-processes:

general-set-jetty-classpath:

preclean-hsqldb-processes:

scripts-common:

scripts-hsqldb:
 [copy] Copying 4 files to 


compile-core:

jar-core:

compile-ui-core:

jar-ui-core:

compile-agents:

jar-agents:

compile-pull-agent:

jar-pull-agent:

compile-jetty-runner:

jar-jetty-runner:

compile-script-engine:

jar-script-engine:

lib:

hsqldb-lib-classpath:

setup-hsqldb-processes:

general-set-hsqldb-classpath:
 [copy] Copying 13 files to 


BUILD FAILED
:261: The following 
error occurred while executing this line:
:1664: 
IOException in 

 - java.io.IOException:No space left on device

Total time: 22 seconds
Build step 'Invoke Ant' marked build as failure
[locks-and-latches] Releasing all the locks
[locks-and-latches] All the locks released
Archiving artifacts
Publishing Javadoc


Build failed in Jenkins: ManifoldCF-mvn #650

2018-05-09 Thread Apache Jenkins Server
See 


Changes:

[kwright] CONNECTORS-1500: Update the connector with latest code from the 
contributor

--
[...truncated 459.25 KB...]
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 2 resources
[INFO] Copying 6 resources
[INFO] Copying 3 resources
[INFO] 
[INFO] --- maven-compiler-plugin:2.3.2:compile (default-compile) @ 
mcf-searchblox-connector ---
[INFO] Compiling 6 source files to 

[INFO] 
[INFO] --- native2ascii-maven-plugin:1.0-beta-1:native2ascii 
(native2ascii-utf8) @ mcf-searchblox-connector ---
[INFO] Includes: [**/*.properties]
[INFO] Excludes: []
[INFO] Processing 

[INFO] Processing 

[INFO] 
[INFO] --- maven-resources-plugin:2.5:testResources (default-testResources) @ 
mcf-searchblox-connector ---
[debug] execute contextualize
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory 

[INFO] Copying 3 resources
[INFO] 
[INFO] --- maven-compiler-plugin:2.3.2:testCompile (default-testCompile) @ 
mcf-searchblox-connector ---
[INFO] Compiling 1 source file to 

[INFO] 
[INFO] --- maven-surefire-plugin:2.17:test (default-test) @ 
mcf-searchblox-connector ---
[INFO] Tests are skipped.
[INFO] 
[INFO] --- maven-jar-plugin:2.3.1:jar (default-jar) @ mcf-searchblox-connector 
---
[INFO] Building jar: 

[INFO] 
[INFO] --- maven-site-plugin:3.4:attach-descriptor (attach-descriptor) @ 
mcf-searchblox-connector ---
[INFO] 
[INFO] >>> maven-source-plugin:2.4:jar (attach-sources) @ 
mcf-searchblox-connector >>>
[INFO] 
[INFO] <<< maven-source-plugin:2.4:jar (attach-sources) @ 
mcf-searchblox-connector <<<
[INFO] 
[INFO] --- maven-source-plugin:2.4:jar (attach-sources) @ 
mcf-searchblox-connector ---
[INFO] Building jar: 

[INFO] 
[INFO] --- maven-assembly-plugin:2.4.1:single (make-assembly) @ 
mcf-searchblox-connector ---
[INFO] Building jar: 

[WARNING] Configuration options: 'appendAssemblyId' is set to false, and 
'classifier' is missing.
Instead of attaching the assembly file: 

 it will become the file for main project artifact.
NOTE: If multiple descriptors or descriptor-formats are provided for this 
project, the value of this file will be non-deterministic!
[WARNING] Replacing pre-existing project main-artifact file: 

with assembly file: 

[INFO] 
[INFO] --- maven-failsafe-plugin:2.18.1:integration-test (integration-test) @ 
mcf-searchblox-connector ---
[INFO] Tests are skipped.
[INFO] 
[INFO] --- maven-failsafe-plugin:2.18.1:verify (verify) @ 
mcf-searchblox-connector ---
[INFO] Tests are skipped.
[INFO] 
[INFO] --- maven-install-plugin:2.5.2:install (default-install) @ 
mcf-searchblox-connector ---
[INFO] Installing 

 to 
/home/jenkins/.m2/repository/org/apache/manifoldcf/mcf-searchblox-connector/2.11-SNAPSHOT/mcf-searchblox-connector-2.11-SNAPSHOT.jar
[INFO] Installing 
 
to 
/home/jenkins/.m2/repository/org/apache/manifoldcf/mcf-searchblox-connector/2.11-SNAPSHOT/mcf-searchblox-connector-2.11-SNAPSHOT.pom
[INFO] Installing 

 to 
/home/jenkins/.m2/repository/org/apache/manifoldcf/mcf-searchblox-connector/2.11-SNAPSHOT/mcf-searchblox-connector-2.11-SNAPSHOT-sources.jar
[INFO] 
[INFO] 
[INFO] 

[jira] [Comment Edited] (CONNECTORS-1495) Brand new website

2018-05-09 Thread Piergiorgio Lucidi (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16469616#comment-16469616
 ] 

Piergiorgio Lucidi edited comment on CONNECTORS-1495 at 5/9/18 10:44 PM:
-

I worked last night on some improvements for our new website:
 * Removed the sidebar [^Website - status - 20180510.png]
 * Updated the maven-site-plugin dependency to the latest stable release, 3.7.1, 
which includes the fix we requested for the multilingual feature
 * Added a PDF rendition for each language using the maven-pdf-plugin 
[^PDF-Rendition-1.png] | [^PDF-Rendition-2.png]
 * Downgraded the maven-fluido-skin to 1.6 because of compatibility issues with 
maven-pdf-plugin
 * Temporarily fixed the who.md template while waiting for a hotfix from our 
friend [~sonosolobit] :)

The fix that I'm asking for in the converter concerns Markdown lists: a list is 
rendered only if there is an empty line before its first item.
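The list problem described above can be illustrated with a small post-processing pass over the converted Markdown. This is a hypothetical sketch (the class and method names are invented and are not part of the real converter): it inserts an empty line before a list item that starts a new list directly after a non-blank line, and treats text following an item without a blank line as a continuation of that list. It does not understand fenced code blocks, and a text line beginning with something like `2018. ` would be mistaken for an ordered-list item.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of the real converter. */
final class MarkdownListFixer {

    // A bullet ("*", "+", "-") or ordered ("1." / "1)") marker followed by whitespace.
    private static final Pattern LIST_ITEM = Pattern.compile("\\s*([*+-]|\\d+[.)])\\s");

    static String ensureBlankLineBeforeLists(String markdown) {
        List<String> out = new ArrayList<>();
        boolean inList = false;
        for (String line : markdown.split("\n", -1)) {
            if (line.trim().isEmpty()) {
                inList = false; // a blank line closes the current block
            } else if (LIST_ITEM.matcher(line).lookingAt()) {
                boolean previousIsText = !out.isEmpty() && !out.get(out.size() - 1).trim().isEmpty();
                if (!inList && previousIsText) {
                    out.add(""); // the missing separator before a new list
                }
                inList = true;
            }
            // Any other non-blank line is a paragraph line or a continuation of the
            // current list item, so the list state is left unchanged.
            out.add(line);
        }
        return String.join("\n", out);
    }
}
```

For example, `"Intro text\n* one\n* two"` would become `"Intro text\n\n* one\n* two"`, while input that already has the empty line is returned unchanged.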

The PDF rendition generates three different PDF files, but it seems to have some 
issues with Japanese and Chinese :(.

I will probably have to post a new question to the Maven list; we could be 
hitting a similar bug related to multilingual rendering.


was (Author: piergiorgioluc...@gmail.com):
I worked last night on some improvements for our new website:
 * Updated the maven-site-plugin dependency to the latest stable release, 3.7.1, 
which includes the fix we requested for the multilingual feature
 * Added a PDF rendition for each language using the maven-pdf-plugin
 * Downgraded the maven-fluido-skin to 1.6 because of compatibility issues with 
maven-pdf-plugin
 * Temporarily fixed the who.md template while waiting for a hotfix from our 
friend [~sonosolobit] :)

The fix that I'm asking for in the converter concerns Markdown lists: a list is 
rendered only if there is an empty line before its first item.

The PDF rendition generates three different PDF files, but it seems to have some 
issues with Japanese and Chinese :(.

I will probably have to post a new question to the Maven list; we could be 
hitting a similar bug related to multilingual rendering.

> Brand new website
> -
>
> Key: CONNECTORS-1495
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1495
> Project: ManifoldCF
>  Issue Type: New Feature
>  Components: Site
>Affects Versions: ManifoldCF 2.9.1
>Reporter: Piergiorgio Lucidi
>Assignee: Piergiorgio Lucidi
>Priority: Major
> Fix For: ManifoldCF next
>
> Attachments: ManifoldCF-FluidoSkin.png, PDF-Rendition-1.png, 
> PDF-Rendition-2.png, Website - status - 20180510.png
>
>   Original Estimate: 480h
>  Remaining Estimate: 480h
>


[jira] [Updated] (CONNECTORS-1495) Brand new website

2018-05-09 Thread Piergiorgio Lucidi (JIRA)

 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Piergiorgio Lucidi updated CONNECTORS-1495:
---
Attachment: Website - status - 20180510.png
PDF-Rendition-1.png
PDF-Rendition-2.png



[jira] [Commented] (CONNECTORS-1495) Brand new website

2018-05-09 Thread Piergiorgio Lucidi (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16469616#comment-16469616
 ] 

Piergiorgio Lucidi commented on CONNECTORS-1495:


I worked last night on some improvements for our new website:
 * Updated the maven-site-plugin dependency to the latest stable release, 3.7.1, 
which includes the fix we requested for the multilingual feature
 * Added a PDF rendition for each language using the maven-pdf-plugin
 * Downgraded the maven-fluido-skin to 1.6 because of compatibility issues with 
maven-pdf-plugin
 * Temporarily fixed the who.md template while waiting for a hotfix from our 
friend [~sonosolobit] :)

The fix that I'm asking for in the converter concerns Markdown lists: a list is 
rendered only if there is an empty line before its first item.

The PDF rendition generates three different PDF files, but it seems to have some 
issues with Japanese and Chinese :(.

I will probably have to post a new question to the Maven list; we could be 
hitting a similar bug related to multilingual rendering.

> Brand new website
> -
>
> Key: CONNECTORS-1495
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1495
> Project: ManifoldCF
>  Issue Type: New Feature
>  Components: Site
>Affects Versions: ManifoldCF 2.9.1
>Reporter: Piergiorgio Lucidi
>Assignee: Piergiorgio Lucidi
>Priority: Major
> Fix For: ManifoldCF next
>
> Attachments: ManifoldCF-FluidoSkin.png
>
>   Original Estimate: 480h
>  Remaining Estimate: 480h
>


Re: MCF transformation connector contribution

2018-05-09 Thread Karl Wright
I committed the latest code changes.

As far as the doc is concerned, that's going to take longer because a
conversion to Forrest will need to be done.

Karl



Re: MCF transformation connector contribution

2018-05-09 Thread Olivier Tavard
Hi,

OK, thank you for the explanation and for integrating the contribution. I did 
not know that the contribution was already part of the 2.10 release.
I submitted a patch encompassing the first patch and the new code on the JIRA 
issue CONNECTORS-1500. It is a diff against the html-extractor connector.

The documentation is here: 
https://datafari.atlassian.net/wiki/spaces/DATAFARI/pages/237240321/HTML+Extractor+Transformation+connector

If you want to integrate at least the user documentation into the official MCF 
site, no problem. Without it, I think it will be hard for users to understand 
the goal of this connector!

Best regards,

Olivier TAVARD


> On 5 May 2018, at 14:02, Piergiorgio Lucidi wrote:
>
> Hi,
>
> I have just updated CHANGES.txt, adding CONNECTORS-1500 to the 2.10
> release with a mention of Olivier.
>
> Olivier, thank you so much for your contribution.
>
> We should find a good way to also create a test suite for this new
> connector.
>
> Cheers,
> PJ
>
> 2018-05-05 11:57 GMT+02:00 Karl Wright :
>
>> Hi Olivier,
>>
>> This was actually already committed.  But it was renamed as the
>> html-extractor connector, not "datafari", which didn't mean anything to me.
>>
>> Any changes you want to make should therefore be supplied as a diff against
>> the html-extractor connector.
>>
>> Sorry for the confusion!!
>>
>> Karl
>>
>>
>> On Fri, May 4, 2018 at 4:28 PM Karl Wright  wrote:
>>
>>> Yes, please do update the patch.  I'm sorry I did not get to this; many
>>> other things intruded.  I created the branch but did not apply the
>>> original patch onto it, so please supply a whole new patch.
>>>
>>> Karl
>>>
>>>
>>> On Fri, May 4, 2018 at 11:28 AM Olivier Tavard <
>>> olivier.tav...@francelabs.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I wanted to know if the code is still of interest to the MCF community.
>>>> I have updated it since the initial release, so please tell me if I need
>>>> to submit a new patch to the issue already created:
>>>> https://issues.apache.org/jira/projects/CONNECTORS/issues/CONNECTORS-1500
>>>>
>>>> Thanks,
>>>> Best regards,
>>>>
>>>> Olivier TAVARD
>>>>
>>>>
>>>>> On 15 Mar 2018, at 15:58, Karl Wright wrote:
>>>>>
>>>>> Excellent!!
>>>>>
>>>>> Thank you again.  I'll try to set up the branch this weekend.
>>>>>
>>>>> Karl
>>>>>
>>>>>
>>>>> On Thu, Mar 15, 2018 at 10:52 AM, Olivier Tavard <
>>>>> olivier.tav...@francelabs.com> wrote:
>>>>>
>>>>>> Hi Karl,
>>>>>>
>>>>>> Sure thing, I created a ticket:
>>>>>> https://issues.apache.org/jira/projects/CONNECTORS/issues/CONNECTORS-1500
>>>>>> with the code attached.
>>>>>> No specific libraries are used, just the JSOUP library that is already
>>>>>> in the MCF core project.
>>>>>>
>>>>>> Best regards,
>>>>>>
>>>>>> Olivier
>>>>>>
>>>>>>
>>>>>>> On 15 Mar 2018, at 11:51, Karl Wright wrote:
>>>>>>>
>>>>>>> Hi Olivier,
>>>>>>>
>>>>>>> Thank you very much for your contribution!
>>>>>>>
>>>>>>> To have a legal trail, I usually prefer the following approach --
>>>>>>>
>>>>>>> (1) Create a ticket
>>>>>>> (2) Attach a diff to the ticket
>>>>>>>
>>>>>>> We'll then integrate the diff into a branch, and then finally into
>>>>>>> trunk.
>>>>>>>
>>>>>>> Can you also let us know what kinds of dependent jars the contribution
>>>>>>> has?  We'd need to know about not only direct dependencies, but also
>>>>>>> any downstream dependencies that may be incompatible with the Apache
>>>>>>> License.  Usually we can figure this out but it saves time to know in
>>>>>>> advance if there are LGPL dependencies (for instance).
>>>>>>>
>>>>>>> Karl
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Mar 15, 2018 at 6:35 AM, Olivier Tavard <
>>>>>>> olivier.tav...@francelabs.com> wrote:
>>>>>>>
>>>>>>>> Hello MCF community,
>>>>>>>>
>>>>>>>> I developed a transformation connector based on Jsoup. The goal of
>>>>>>>> this code is simply to choose an encompassing tag in an HTML
>>>>>>>> document for text extraction. Inside this tag, the connector lets
>>>>>>>> you remove subparts that you do not want: for example, all the tags
>>>>>>>> of declared types, or specific attribute tag names.
>>>>>>>> I would like to know if it could interest you. The code is under the
>>>>>>>> Apache V2 license and I integrated it into our enterprise search
>>>>>>>> solution (Datafari).
>>>>>>>> This morning I integrated the code into a fork of the MCF project on
>>>>>>>> GitHub.
>>>>>>>> Obviously it needs some work, including code refactoring, renaming
>>>>>>>> classes and unit tests, which I will be able to do if you are
>>>>>>>> interested in the code.
>>>>>>>> The code is here:

[jira] [Updated] (CONNECTORS-1500) HTML Extractor transformation connector contribution

2018-05-09 Thread Olivier Tavard (JIRA)

 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olivier Tavard updated CONNECTORS-1500:
---
Attachment: global_patch.txt

> HTML Extractor transformation connector contribution
> 
>
> Key: CONNECTORS-1500
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1500
> Project: ManifoldCF
>  Issue Type: Improvement
>Affects Versions: ManifoldCF 2.9.1
>Reporter: Olivier Tavard
>Assignee: Karl Wright
>Priority: Minor
> Fix For: ManifoldCF 2.10
>
> Attachments: fix_englobing_tag_selection.txt, global_patch.txt, 
> html_extractor_transformation_connector.txt
>
>
> Hi,
> I developed a transformation connector based on Jsoup. The goal of this code 
> is simply to choose an encompassing tag in an HTML document for text 
> extraction. Inside this tag, the connector lets you remove subparts that you 
> do not want: for example, all the tags of declared types or specific 
> attribute tag names.
> The code is under the Apache V2 license and it is attached.
> It needs some work, including code refactoring, renaming classes and unit 
> tests, which I will be able to do if you are interested in the code.
> The documentation is here:
> [https://datafari.atlassian.net/wiki/spaces/DATAFARI/pages/237240321/HTML+Extractor+Transformation+connector]
> It does not use any libraries other than the ones already present in the MCF 
> project. It is based on the Jsoup library in the lib folder.
> Best regards,
> Olivier
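To make the connector's behaviour more concrete, here is a rough sketch of the same idea. It is not the contributed code, which uses Jsoup and tolerates real-world HTML; to stay self-contained it uses the JDK's DOM parser and therefore assumes well-formed XHTML input. The class name `TagScopedTextExtractor`, the choice of the first matching element as the encompassing tag, and the reading of "attribute names" as "drop any element that carries one of these attributes" are all assumptions for illustration.

```java
import java.io.StringReader;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical sketch of tag-scoped text extraction over well-formed XHTML. */
final class TagScopedTextExtractor {

    static String extract(String xhtml, String encompassingTag,
                          Set<String> excludedTags, Set<String> excludedAttributes) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
        // Keep only the first element with the encompassing tag name (document order).
        NodeList matches = doc.getElementsByTagName(encompassingTag);
        if (matches.getLength() == 0) {
            return "";
        }
        StringBuilder sb = new StringBuilder();
        collect(matches.item(0), excludedTags, excludedAttributes, sb);
        return sb.toString().replaceAll("\\s+", " ").trim();
    }

    private static void collect(Node node, Set<String> excludedTags,
                                Set<String> excludedAttributes, StringBuilder sb) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            sb.append(node.getNodeValue());
            return;
        }
        if (node.getNodeType() != Node.ELEMENT_NODE) return;
        Element e = (Element) node;
        // Drop the whole subtree of an excluded tag type...
        if (excludedTags.contains(e.getTagName())) return;
        // ...or of an element carrying one of the excluded attributes.
        for (String attr : excludedAttributes) {
            if (e.hasAttribute(attr)) return;
        }
        for (Node child = e.getFirstChild(); child != null; child = child.getNextSibling()) {
            collect(child, excludedTags, excludedAttributes, sb);
        }
        // Separator so adjacent blocks do not run together (a simplification:
        // inline elements get one too).
        sb.append(' ');
    }
}
```

With `<div id="main"><h1>Title</h1><script>...</script><p>Body text.</p><p data-ad="1">Buy now</p></div>`, encompassing tag `div`, excluded tag `script` and excluded attribute `data-ad`, the sketch should return `Title Body text.`; anything outside the `div`, such as a `<nav>` menu, is ignored.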


