CVE-2022-25312: An XML external entity (XXE) injection vulnerability exists in the Apache Any23 RDFa XSLTStylesheet extractor

2022-03-04 Thread lewis john mcgibbney
Description:

An XML external entity (XXE) injection vulnerability was discovered in
the Any23 RDFa XSLTStylesheet extractor and is known to affect Any23
versions < 2.7. XML external entity injection (also known as XXE) is a
web security vulnerability that allows an attacker to interfere with
an application's processing of XML data. It often allows an attacker
to view files on the application server filesystem, and to interact
with any back-end or external systems that the application itself can
access.
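
As a rough illustration of the defensive idea only (this is not the actual
Any23 fix, which lives in the project's Java code and is not shown in this
thread), the Python sketch below refuses any document that carries a DOCTYPE
declaration, since that is where external entities get declared. It uses the
standard xml.parsers.expat module; the function and exception names are
hypothetical.

```python
from xml.parsers import expat


class DoctypeForbidden(ValueError):
    """Raised when an XML document declares a DOCTYPE."""


def element_names_without_doctype(data):
    """Parse XML bytes and return element names in document order.

    Hypothetical helper: any DOCTYPE declaration aborts parsing, so entity
    declarations (internal or external) are never processed.
    """
    names = []
    parser = expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Called as soon as the DOCTYPE starts, before any entity is read.
        raise DoctypeForbidden("DOCTYPE declarations are not allowed: " + name)

    parser.StartDoctypeDeclHandler = on_doctype
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    parser.Parse(data, True)
    return names


if __name__ == "__main__":
    print(element_names_without_doctype(b"<root><item/></root>"))
    hostile = (
        b'<!DOCTYPE root [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>'
        b"<root>&xxe;</root>"
    )
    try:
        element_names_without_doctype(hostile)
    except DoctypeForbidden as err:
        print("rejected:", err)
```

Rejecting the DOCTYPE outright is the bluntest option; it also blocks
harmless internal entities, which is usually acceptable for untrusted input.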

Resolution:

This issue is fixed in Apache Any23 2.7 which can be downloaded from
https://any23.apache.org/download.html. We strongly encourage all
Any23 users to upgrade to Apache Any23 2.7.

Credit:

The Apache Any23 Project Management Committee would like to thank Lion
Tree a.k.a. liontree0110 for reporting this issue.

-- 
http://home.apache.org/~lewismc/
http://people.apache.org/keys/committer/lewismc


[ANNOUNCE] Apache Any23 2.7

2022-03-04 Thread lewis john mcgibbney
The Apache Any23 Project Management Committee is pleased to announce
the release of Apache Any23 2.7.

Apache Anything To Triples (Any23) is a library, a web service and a
command line tool that extracts structured data in RDF format from a
variety of Web documents.

Any23 2.7 requires JDK11 to build and run.

Release Notes: https://github.com/apache/any23/blob/any23-2.7/RELEASE-NOTES.md

Download: http://any23.apache.org/download.html

Maven Artifacts:
https://search.maven.org/search?q=g:org.apache.any23%20AND%20v:2.7

DOAP: https://github.com/apache/any23-committers/blob/master/doap_Any23.rdf

Have Fun,
(Lewis), on behalf of the Apache Any23 PMC
N.B. The release artifacts can take a bit of time to reach the
distribution servers, please be patient.



[RESULT] WAS Re: [VOTE] Release Apache Any23 2.7

2022-03-03 Thread lewis john mcgibbney
72 hours have expired, so I am happy to bring this VOTE to a close. Thanks to
everyone able to VOTE and review this release candidate. The RESULT is as
follows:

[4] +1, Release as Apache Any23 2.7
Andy Seaborne*
Hans Brende*
David Cockbill
Lewis John McGibbney*
*Any23 PMC-binding


[0] +/-0, fine, but consider to fix few issues before...
[0] -1, nope, because... (and please explain why)

I'll go ahead and make the release :)
Thanks
lewismc


[VOTE] Release Apache Any23 2.7

2022-02-21 Thread lewis john mcgibbney
Hi,

Please VOTE on the release candidate for Apache Any23 2.7.

Note, this release candidate requires JDK11 to build and run.

We solved 18 issues:
https://issues.apache.org/jira/projects/ANY23/versions/12350742

Git source tag (a8ee3fc67536bf702c06db67011095c3fdd6cf3a):
https://gitbox.apache.org/repos/asf?p=any23.git;a=commit;h=a8ee3fc67536bf702c06db67011095c3fdd6cf3a

Staging repo:
https://repository.apache.org/content/repositories/orgapacheany23-1012

Sources and CLI binaries area:
https://dist.apache.org/repos/dist/dev/any23

PGP release keys (signed using 48BAEBF6):
https://dist.apache.org/repos/dist/release/any23/KEYS

Vote will be open for 72 hours.

[ ] +1, Release as Apache Any23 2.7
[ ] +/-0, fine, but consider to fix few issues before...
[ ] -1, nope, because... (and please explain why)


P.S. Here is my +1


[ANNOUNCE] Apache Any23 2.6 Release

2022-01-08 Thread lewis john mcgibbney
The Apache Any23 Team is pleased to announce the release of Apache Any23
2.6.

Apache Anything To Triples (Any23) is a library, a web service and a
command line tool that extracts structured data in RDF format from a
variety of Web documents.

Any23 2.6 requires JDK11 to build and run.

Release Notes:
https://github.com/apache/any23/blob/any23-2.6/RELEASE-NOTES.md

Download: http://any23.apache.org/download.html

Dependency information: https://any23.apache.org/dependency-info.html

DOAP: https://github.com/apache/any23-committers/blob/master/doap_Any23.rdf

Community and support: https://any23.apache.org/mailing-lists.html

Have Fun,
(lewismc), on behalf of the Apache Any23 PMC


[RESULT] WAS Re: [VOTE] Release Apache Any23 2.6 RC#2

2022-01-07 Thread lewis john mcgibbney
Hi user@ and dev@,
72 hours have expired. I'm closing off this VOTE. Thank you everyone able
to VOTE. The RESULT is as follows

[3] +1, release as Any23 2.6
Hans Brende*
Andy Seaborne*
Lewis John McGibbney*

[0] +/-0, fine, but consider to fix few issues before...
[0] -1, nope, because... (and please explain why)

*Any23 PMC-binding

This is excellent. I will close out the release and promote it.
Have a great weekend.
lewismc


[VOTE] Release Apache Any23 2.6 RC#2

2022-01-03 Thread lewis john mcgibbney
Hi user@ and dev@,

Please VOTE on the 2nd release candidate for Apache Any23 2.6. Most notably
this RC addresses several security vulnerabilities by upgrading every
single Any23 dependency.

We solved 62 issues:
https://issues.apache.org/jira/projects/ANY23/versions/12350556

Git source tag (7ea496991f3a053b00cba2ec82ef8a8a4d7e401e):
https://gitbox.apache.org/repos/asf?p=any23.git;a=tag;h=refs/tags/any23-2.6

Staging repo:
https://repository.apache.org/content/repositories/orgapacheany23-1011


Staging source:
https://dist.apache.org/repos/dist/dev/any23/2.6/

PGP release keys (signed using 48BAEBF6):
https://dist.apache.org/repos/dist/release/any23/KEYS

Vote will be open for 72 hours.

[ ] +1, release as Any23 2.6
[ ] +/-0, fine, but consider to fix few issues before...
[ ] -1, nope, because... (and please explain why)

P.S. Here is my +1


[RESULT] WAS Re: [VOTE] Release Apache Any23 2.6

2022-01-03 Thread lewis john mcgibbney
Hi user@, dev@,
I'm going to bring this VOTE thread to a close with the following results.

[3] +1, release as Any23 2.6
Andy Seaborne*
Lewis John McGibbney*
David Cockbill

*Any23 PMC binding

[0] +/-0, fine, but consider to fix few issues before...
[0] -1, nope, because... (and please explain why)

Unfortunately, on this occasion we did not get enough binding VOTEs from PMC
members (only two of the three +1s above) to progress with the release.
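
To make the pass/fail outcome concrete, here is a minimal Python sketch of the
tally. It assumes the usual ASF release rule of at least three binding +1
votes and more binding +1 than binding -1 votes; the function name is
hypothetical, and the set of binding voters is simply the names marked with an
asterisk across the RESULT mails in this archive.

```python
def release_vote_passes(plus_one, minus_one, binding_voters):
    """Return True when a release vote meets the assumed ASF threshold."""
    binding_plus = sum(1 for name in plus_one if name in binding_voters)
    binding_minus = sum(1 for name in minus_one if name in binding_voters)
    # Assumption: three binding +1s minimum, and a majority of binding votes.
    return binding_plus >= 3 and binding_plus > binding_minus


# Names starred as PMC-binding somewhere in this archive.
pmc = {"Andy Seaborne", "Lewis John McGibbney", "Hans Brende", "Jacek Grzebyta"}

first_rc = ["Andy Seaborne", "Lewis John McGibbney", "David Cockbill"]
second_rc = ["Hans Brende", "Andy Seaborne", "Lewis John McGibbney"]

print(release_vote_passes(first_rc, [], pmc))
print(release_vote_passes(second_rc, [], pmc))
```

Under that assumption the first 2.6 candidate fails with two binding votes,
while RC#2 passes with three.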

There have been several improvements to Any23 since this release candidate
was produced, so I will go ahead and produce a new release candidate and
see if we have better luck with RC#2.

lewismc


Re: [VOTE] Release Apache Any23 2.6

2021-11-12 Thread lewis john mcgibbney
Hi Any23 PMC,
Sorry to do it this way... but can anyone review this release candidate?
Thank you
lewismc


Re: [VOTE] Release Apache Any23 2.6

2021-11-10 Thread lewis john mcgibbney
Hi Folks,
Anyone else able to VOTE?
Thank you
lewismc


Re: Re: [VOTE] Release Apache Any23 2.6

2021-11-04 Thread Lewis John McGibbney
On 2021/11/04 09:08:55 Andy Seaborne wrote:

> Suggested tweak to wording inline:
...
> These are not all binaries (as in "convenience binaries" in ASF-speak).
> 
> Some are the release source!
> 
> ==> "Proposed dist/ area:"

Thanks Andy. I've updated the documentation and will push it live when we make 
the release. Thanks


[VOTE] Release Apache Any23 2.6

2021-11-03 Thread lewis john mcgibbney
Hi user@ and dev@,

Please VOTE on the release candidate for Apache Any23 2.6.

We solved 45 issues:
https://issues.apache.org/jira/projects/ANY23/versions/12350556

Git source tag (ac9c507bf7aedb4909d5bc135944a5bd3c474bd7):
https://gitbox.apache.org/repos/asf?p=any23.git;a=tag;h=refs/tags/any23-2.6

Staging repo:
https://repository.apache.org/content/repositories/orgapacheany23-1010

Staging binaries:
https://dist.apache.org/repos/dist/dev/any23/2.6/

PGP release keys (signed using 48BAEBF6):
https://dist.apache.org/repos/dist/release/any23/KEYS

Vote will be open for 72 hours.

[ ] +1, release as Any23 2.6
[ ] +/-0, fine, but consider to fix few issues before...
[ ] -1, nope, because... (and please explain why)


CVE-2021-40146: A Remote Code Execution (RCE) vulnerability exists in Apache Any23 YAMLExtractor.java

2021-09-10 Thread lewis john mcgibbney
Description:

A Remote Code Execution (RCE) vulnerability was discovered in the
Any23 YAMLExtractor.java file and is known to affect Any23 versions <
2.5. RCE vulnerabilities allow a malicious actor to execute any code
of their choice on a remote machine over LAN, WAN, or internet. RCE
belongs to the broader class of arbitrary code execution (ACE)
vulnerabilities.

Credit:

The Apache Any23 Project Management Committee would like to thank
Zhuxuan Wu for reporting the security vulnerability.



CVE-2021-38555: An XML external entity (XXE) injection vulnerability exists in Apache Any23 StreamUtils.java

2021-09-10 Thread lewis john mcgibbney
Severity: critical

Description:

An XML external entity (XXE) injection vulnerability was discovered in
the Any23 StreamUtils.java file and is known to affect Any23 versions
< 2.5. XML external entity injection (also known as XXE) is a web
security vulnerability that allows an attacker to interfere with an
application's processing of XML data. It often allows an attacker to
view files on the application server filesystem, and to interact with
any back-end or external systems that the application itself can
access.

Credit:

The Apache Any23 Project Management Committee would like to thank
Zhuxuan Wu for reporting the security vulnerability.
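
For users trying to work out which of the advisories in this archive apply to
an installed version, a small Python sketch follows. The version bounds are
taken from the "affects versions <" statements in the three CVE mails; the
table and function names are hypothetical and the version parsing assumes
plain dotted numbers such as "2.6".

```python
# First version not listed as affected, per the advisories in this archive.
FIRST_UNAFFECTED = {
    "CVE-2022-25312": (2, 7),  # XXE, RDFa XSLTStylesheet extractor
    "CVE-2021-40146": (2, 5),  # RCE, YAMLExtractor.java
    "CVE-2021-38555": (2, 5),  # XXE, StreamUtils.java
}


def parse_version(text):
    """Turn a dotted version string such as '2.6' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def open_advisories(version):
    """Return the CVE ids whose affected range includes this version."""
    current = parse_version(version)
    return sorted(
        cve for cve, fixed in FIRST_UNAFFECTED.items() if current < fixed
    )


print(open_advisories("2.4"))
print(open_advisories("2.6"))
print(open_advisories("2.7"))
```

Comparing tuples of integers rather than strings keeps a hypothetical "2.10"
ordered after "2.7".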



[ANNOUNCE] Apache Any23 2.5 Release

2021-09-10 Thread lewis john mcgibbney
*What?*
The Apache Any23 Team is pleased to announce the release of Apache Any23 2.5.

Apache Anything To Triples (Any23) is a library, a web service and a
command line tool that extracts structured data in RDF format from a
variety of Web documents.

*Where?*
Download: http://any23.apache.org/download.html
Maven Artifacts: https://s.apache.org/3lai8
DOAP: https://github.com/apache/any23-committers/blob/master/doap_Any23.rdf
Release Notes: 
https://dist.apache.org/repos/dist/release/any23/2.5/RELEASE-NOTES.txt

Have Fun,
Lewis
(on behalf of the Apache Any23 PMC)

N.B. The release artifacts can take a bit of time to reach the
distribution servers, please be patient.



Fwd: WebDataCommons releases 86.3 billion quads Microdata, Embedded JSON-LD, RDFa, and Microformat data originating from 15.3 million websites

2021-01-21 Thread lewis john mcgibbney
FYI folks

---------- Forwarded message ---------
From: Lewis John Mcgibbney 
Date: Thu, Jan 21, 2021 at 1:04 PM
Subject: Re: WebDataCommons releases 86.3 billion quads Microdata, Embedded
JSON-LD, RDFa, and Microformat data originating from 15.3 million websites
To: Web Data Commons 


Congratulations on the new dataset release.
The statistics are really interesting.
Really good to hear that Any23 is performing nominally. :)
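
For anyone wanting to double-check the coverage percentages quoted in the
announcement below, they follow directly from its raw counts. A small Python
sketch (the function name is made up for illustration):

```python
def share_percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Counts as quoted in the announcement:
# billions of HTML pages, then millions of pay-level domains.
pages_with_data = share_percent(1.7, 3.4)
domains_with_data = share_percent(15.3, 34.5)

print(pages_with_data, domains_with_data)
```

The two results line up with the 50% of pages and 44.3% of pay-level domains
stated in the announcement.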

On Thursday, 21 January 2021 at 02:00:44 UTC-8 apri...@gmail.com wrote:

> Hi all,
>
> we are happy to announce the new release of the WebDataCommons Microdata,
> JSON-LD, RDFa and Microformat data corpus.
>
> The data has been extracted from the September 2020 version of the Common
> Crawl covering 3.4 billion HTML pages which originate from 34.5 million
> websites (pay-level domains). For the extraction of structured data, the
> newest version 2.4 of the any23 library was used.
>
> In summary, we found structured data within 1.7 billion HTML pages out of
> the 3.4 billion pages contained in the crawl (50%). These pages originate
> from 15.3 million different pay-level domains out of the 34.5 million
> pay-level-domains covered by the crawl (44.3%). Last year, we only found
> structured data in 37% of the pages and on 37.2% of the pay-level-domains.
>
> Approximately 7.8 million of the 2020 websites use Microdata, 7.6 million
> websites use JSON-LD, and 3.3 million websites make use of RDFa.
> Microformats are used by more than 4 million websites within the crawl.
>
>
>
> *Statistics about the December 2020 Release:*
>
> Basic statistics about the December 2020 Microdata, JSON-LD, RDFa, and
> Microformat data sets as well as the vocabularies that are used together
> with each markup format are found at:
>
> http://webdatacommons.org/structureddata/2020-12/stats/stats.html
>
>
>
> *Markup Format Adoption*
>
> The page below provides an overview of trends in the adoption of the
> different markup formats as well as widely used schema.org classes in the
> timespan 2012 to 2020:
>
> http://webdatacommons.org/structureddata/#toc3
>
> Comparing the statistics from the new 2020 release to the statistics about
> the 2019 release of the data sets
>
> http://webdatacommons.org/structureddata/2019-12/stats/stats.html
>
> we can observe that although the overall number of pages in the crawl is
> 38.9% larger than in the crawl used for the 2019 release, the
> corresponding growth in terms of domains is only 7.9%, indicating that the
> crawl corpus used this year is much deeper than last year's. However, we
> see that more and more websites annotate their content, as the yearly
> increase in domains with annotated data was more than 28%. The markup
> format with the largest domain growth in adoption (>50%) is JSON-LD. The
> growing trend of the JSON-LD format becomes even more obvious for certain
> domains, such as hotels.com and yahoo.com, which have switched from using
> Microdata to using JSON-LD as the dominant markup language.
> Concerning the vocabulary adoption, schema.org continues to be the most
> dominant vocabulary. More concretely, the classes schema:WebPage,
> schema:Product, schema:Rating, schema:Organization and schema:Person saw a
> major adoption increase in comparison to 2019 (>40%). Looking at the
> richness of JSON-LD descriptions, we notice that the average number of
> triples per URL has grown from 29 in 2019 to 41 in 2020 and has now reached
> a similar level of detail as the Microdata annotations (avg 39 triples per
> URL).
>
>
>
> *Download *
>
> The overall size of the December 2020 RDFa, Microdata, Embedded JSON-LD
> and Microformat data sets is 86.3 billion RDF quads. For download, we split
> the data into 21,346 files with a total size of 1.9 TB.
>
>
> http://webdatacommons.org/structureddata/2020-12/stats/how_to_get_the_data.html
>
> In addition, we have created separate files for over 43 different
> schema.org classes, each including all quads extracted from pages that use
> that specific schema.org class.
>
>
> http://webdatacommons.org/structureddata/2020-12/stats/schema_org_subsets.html
>
>
>
> *Lots of thanks to:*
>
> + the Common Crawl project for providing their great web crawl and
> thus enabling the WebDataCommons project.
> + the Any23 project for providing and maintaining their great library of
> structured data parsers.
> + Amazon Web Services in Education Grant for supporting WebDataCommons.
>
>
> *General Information about the WebDataCommons Project*
>
> The WebDataCommons project extracts yearly since 2012 structured data from
> the Common Crawl, the largest web corpus available to the public, and
> provides the extract

[ANNOUNCEMENT] Apache Any23 2.4 Release

2020-10-06 Thread lewis john mcgibbney
The Apache Any23 Team is pleased to announce the release of Apache Any23
2.4.

Apache Anything To Triples (Any23) is a library, a web service and a
command line tool that extracts structured data in RDF format from a
variety of Web documents.

Release Notes:
https://github.com/apache/any23/blob/any23-2.4/RELEASE-NOTES.txt

Download: http://any23.apache.org/download.html

Maven Artifacts: https://s.apache.org/l6sg9

Community mailing lists: http://any23.apache.org/mailing-lists.html

Have Fun,
(Lewis), on behalf of the Apache Any23 PMC
N.B. The release artifacts can take a bit of time to reach the distribution
servers, please be patient.


[RESULT] WAS Re: [VOTE] Release Apache Any23 2.4

2020-10-06 Thread lewis john mcgibbney
Thank you to everyone able to cast a VOTE. 72 hours have come and gone and
it is time to close this VOTE with the following RESULT:

[4] +1, release this as Apache Any23 2.4
Andy Seaborne*
Lewis John McGibbney*
Shashanka Balakuntala
Jacek Grzebyta*

[0] +/-0, fine, but consider to fix few issues before...
[0] -1, nope, because... (and please explain why)

*Any23 PMC binding

I'll progress with the remainder of the release now.

lewismc

On Sun, Sep 20, 2020 at 8:14 PM lewis john mcgibbney 
wrote:

> Hi,
>
> Please VOTE on the release candidate for Apache Any23 2.4
>
> We solved 39 issues:
> https://issues.apache.org/jira/projects/ANY23/versions/12344593
>
> Git source tag (151b916
> <https://github.com/apache/any23/commit/151b916cfd0c526d9b34407549202a8849cb70d6>
> ):
> https://github.com/apache/any23/tree/any23-2.4
>
> Staging repo:
> https://repository.apache.org/content/repositories/orgapacheany23-1008
>
> Staging binaries:
> https://dist.apache.org/repos/dist/dev/any23/2.4/
>
> PGP release keys (signed using 48BAEBF6):
> https://dist.apache.org/repos/dist/release/any23/KEYS
>
> Vote will be open for 72 hours.
>
> [ ] +1, release this as Apache Any23 2.4
> [ ] +/-0, fine, but consider to fix few issues before...
> [ ] -1, nope, because... (and please explain why)
>
> P.S. Here is my +1
>
> --
> http://home.apache.org/~lewismc/
> http://people.apache.org/keys/committer/lewismc
>


Re: LD-JSON Embedded not working

2020-10-02 Thread Lewis John McGibbney
Hi Mauro,

On 2020/10/01 09:41:19, Mauro Asprea  wrote: 
> Thank you Lewis!
> 
> Then I should assume that 2.3 is "broken"? I'll try the upcoming 2.4 as you
> suggested.

Unfortunately it looks like this is one that slipped through the net, which
is disappointing as we have tests for this extractor:
https://github.com/apache/any23/blob/any23-2.3/core/src/test/java/org/apache/any23/extractor/html/EmbeddedJSONLDExtractorTest.java
I don't have an explanation right now!

> 
> I still have one more question, apart from that, what is the best way to
> debug Any23 issues like this?

I would suggest that this mailing list is the best place. If you want, I suppose
we could set up a channel on the ASF Slack workspace...

Any suggestions?
Thanks


Re: LD-JSON Embedded not working

2020-09-30 Thread Lewis John McGibbney
Hi Mauro,

On 2020/09/24 11:28:24, Mauro Asprea  wrote: 
> Hello, what am I doing wrong?
> 
> I downloaded the CLI binary distribution and verified that google does find
> the embedded LD-JSON triplets as you see here
> https://search.google.com/structured-data/testing-tool#url=https%3A%2F%2Fwww.monster.com%2Fjobs%2Fsearch%2F%3Fq%3DRuby%26where%3DAustin__2C-TX

Just a quick note. It is impossible to know what kind of 'standard' Google uses
for the structured data testing tool. As you can see, it is also being
deprecated pretty quickly.

> 
> Then I run any23 but I get no quads/triplets...
> 
> hamilcar:apache-any23-cli-2.3:> bin/any23 rover -p -s -t \
>   "https://www.monster.com/jobs/search/?q=Ruby&where=Austin__2C-TX" -f json \
>   -e html-embedded-jsonld -l monster-jsonld.log

Using the 2.4 RC#1 (https://dist.apache.org/repos/dist/dev/any23/2.4/) I get
the following results, which include one Organization, one ItemList and 29
itemListElements:

./bin/any23 rover -p -s -t \
  "https://www.monster.com/jobs/search/?q=Ruby&where=Austin__2C-TX" \
  -f json -e html-embedded-jsonld -l monster-jsonld.log -o monster.json


Apache Any23 :: rover


>Summary:
   -total calls: 1
   -total triples: 128
   -total runtime: 30 ms!
   -tripls/ms: 4
   -ms/calls: 30
>Extractor: html-embedded-jsonld
   -total calls: 1
   -total triples: 128
   -total runtime: 30 ms!
   -tripls/ms: 4
   -ms/calls: 30


Apache Any23 SUCCESS
Total time: 2s
Finished at: Wed Sep 30 11:11:04 PDT 2020
Final Memory: 107M/367M

I suggest that you should upgrade to 2.4.
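
The derived figures in the summary above are consistent with truncating
integer division of the printed totals (128 triples over 30 ms gives 4). That
is an inference from the output alone, not something checked against the Any23
source, so treat the following Python sketch, with its hypothetical function
and key names, as a reading aid only:

```python
def summary_rates(calls, triples, runtime_ms):
    """Recompute the derived lines of a rover summary block.

    Assumption: both rates use truncating integer division, and the
    triples-per-millisecond line is skipped when the runtime is 0 ms.
    """
    rates = {}
    if runtime_ms > 0:
        rates["triples_per_ms"] = triples // runtime_ms
    if calls > 0:
        rates["ms_per_call"] = runtime_ms // calls
    return rates


# Totals from the single-extractor run shown above.
print(summary_rates(calls=1, triples=128, runtime_ms=30))
```

Read this way, a rate of 0 for a slower run does not mean nothing was
extracted, only that fewer than one triple per millisecond was produced.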

> 
> 
> How can I increase the logging level to see any hidden debug messages?

You would literally need to modify the source code to add more debug logging.

I created a ticket to address the log4j appender issue - 
https://issues.apache.org/jira/browse/ANY23-454

> 
> Also as you can see, this webpage has an embedded LD+JSON script that is
> not being picked up by the extractor. Help?
> 

If I remove the extractor flag (i.e. -e html-embedded-jsonld), then I get many
more results. Some of these are, however, trivial in nature, so they would need
to be filtered out.

>Summary:
   -total calls: 21
   -total triples: 189
   -total runtime: 688 ms!
   -tripls/ms: 0
   -ms/calls: 32
>Extractor: html-head-icbm
   -total calls: 1
   -total triples: 0
   -total runtime: 6 ms!
   -tripls/ms: 0
   -ms/calls: 6
>Extractor: html-mf-geo
   -total calls: 1
   -total triples: 0
   -total runtime: 1 ms!
   -tripls/ms: 0
   -ms/calls: 1
>Extractor: html-head-meta
   -total calls: 1
   -total triples: 16
   -total runtime: 5 ms!
   -tripls/ms: 3
   -ms/calls: 5
>Extractor: html-mf-adr
   -total calls: 1
   -total triples: 0
   -total runtime: 1 ms!
   -tripls/ms: 0
   -ms/calls: 1
>Extractor: html-mf-hcalendar
   -total calls: 1
   -total triples: 0
   -total runtime: 1 ms!
   -tripls/ms: 0
   -ms/calls: 1
>Extractor: html-mf-hresume
   -total calls: 1
   -total triples: 0
   -total runtime: 1 ms!
   -tripls/ms: 0
   -ms/calls: 1
>Extractor: html-mf-hreview
   -total calls: 1
   -total triples: 0
   -total runtime: 1 ms!
   -tripls/ms: 0
   -ms/calls: 1
>Extractor: consolidation-extractor
   -total calls: 1
   -total triples: 0
   -total runtime: 0 ms!
   -ms/calls: 0
>Extractor: html-xpath
   -total calls: 1
   -total triples: 0
   -total runtime: 0 ms!
   -ms/calls: 0
>Extractor: html-head-title
   -total calls: 1
   -total triples: 1
   -total runtime: 1 ms!
   -tripls/ms: 1
   -ms/calls: 1
>Extractor: html-mf-hcard
   -total calls: 1
   -total triples: 0
   -total runtime: 0 ms!
   -ms/calls: 0
>Extractor: html-rdfa11
   -total calls: 1
   -total triples: 44
   -total runtime: 33 ms!
   -tripls/ms: 1
   -ms/calls: 33
>Extractor: html-mf-hreview-aggregate
   -total calls: 1
   -total triples: 0
   -total runtime: 1 ms!
   -tripls/ms: 0
   -ms/calls: 1
>Extractor: html-mf-license
   -total calls: 1
   -total triples: 0
   -total runtime: 3 ms!
   -tripls/ms: 0
   -ms/calls: 3
>Extractor: html-mf-xfn
   -total calls: 1
   -total triples: 0
   -total runtime: 2 ms!
   -tripls/ms: 0
   -ms/calls: 2
>Extractor: html-mf-species
   -total calls: 1
   -total triples: 0
   -total runtime: 1 ms!
   -tripls/ms: 0
   -ms/calls: 1
>Extractor: html-mf-hlisting
   -total calls: 1
   -total triples: 0
   -total runtime: 0 ms!
   -ms/calls: 0
>Extractor: html-microdata
   -total calls: 1
   -total triples: 0
   -total runtime: 2 ms!
   -tripls/ms: 0
   -ms/calls: 2
>Extractor: html-mf-hrecipe
   -total calls: 1
   -total triples: 0
   -total runtime: 0 ms!
   -ms/calls: 0
>Extractor: html-embedded-jsonld
   -total calls: 1
   -total triples: 128
   -total runtime: 627 ms!
   -tripls/ms: 0
   -ms/calls: 627
>Extractor: html-head-links
   -

Re: [ROLL CALL] Apache Any23 project interest

2020-09-22 Thread Lewis John McGibbney
Hi Claude,

Thanks for your input. I agree with you. 

I looked at the issues registered in JIRA 
https://issues.apache.org/jira/projects/ANY23/issues/ANY23-294?filter=allopenissues

Many of these are optimizations rather than new features. Also, there are only 
39 issues identified for the entire project right now... which indicates to me 
that the library is pretty stable.

Lewis

On 2020/09/22 21:24:59, Claude Warren  wrote: 
> I have not had time to develop anything for any23 but I have used it and
> think that not retiring the project is the best course of action.  As long
> as someone is around to check on security issues and make sure the code is
> at a minimum java level, perhaps it does not need more development.
> 
> Claude
> 
> On Sun, Sep 20, 2020 at 8:24 PM lewis john mcgibbney 
> wrote:
> 
> > Hi user@, dev@,
> > I was unable to file last week's board report for the project :(
> > I did see the board feedback which encouraged us to hold a roll call to
> > see who is around and what interest there is in Any23.
> > Is there anyone out there? A simple response here gives me an idea of
> > whether it is worthwhile continuing with the project.
> > Thanks all,
> > lewismc
> >
> > --
> > http://home.apache.org/~lewismc/
> > http://people.apache.org/keys/committer/lewismc
> >
> 
> 
> -- 
> I like: Like Like - The likeliest place on the web
> <http://like-like.xenei.com>
> LinkedIn: http://www.linkedin.com/in/claudewarren
> 


[VOTE] Release Apache Any23 2.4

2020-09-20 Thread lewis john mcgibbney
Hi,

Please VOTE on the release candidate for Apache Any23 2.4

We solved 39 issues:
https://issues.apache.org/jira/projects/ANY23/versions/12344593

Git source tag
(151b916 <https://github.com/apache/any23/commit/151b916cfd0c526d9b34407549202a8849cb70d6>):
https://github.com/apache/any23/tree/any23-2.4

Staging repo:
https://repository.apache.org/content/repositories/orgapacheany23-1008

Staging binaries:
https://dist.apache.org/repos/dist/dev/any23/2.4/

PGP release keys (signed using 48BAEBF6):
https://dist.apache.org/repos/dist/release/any23/KEYS

Vote will be open for 72 hours.

[ ] +1, release this as Apache Any23 2.4
[ ] +/-0, fine, but consider to fix few issues before...
[ ] -1, nope, because... (and please explain why)

P.S. Here is my +1


[ROLL CALL] Apache Any23 project interest

2020-09-20 Thread lewis john mcgibbney
Hi user@, dev@,
I was unable to file last week's board report for the project :(
I did see the board feedback which encouraged us to hold a roll call to see
who is around and what interest there is in Any23.
Is there anyone out there? A simple response here gives me an idea of
whether it is worthwhile continuing with the project.
Thanks all,
lewismc


[RELEASE] Apache Any23 2.3

2019-03-04 Thread lewis john mcgibbney
The Apache Any23 Team is pleased to announce the release of Apache Any23
2.3.

Apache Anything To Triples (Any23) is a library, a web service and a
command line tool that
extracts structured data in RDF format from a variety of Web documents.

Release Notes:
https://github.com/apache/any23/blob/any23-2.3/RELEASE-NOTES.txt

Download: http://any23.apache.org/download.html

Maven Artifacts: https://s.apache.org/mwOE

DOAP: https://github.com/apache/any23-committers/blob/master/doap_Any23.rdf

Have Fun,
(Lewis), on behalf of the Apache Any23 PMC
N.B. The release artifacts can take a bit of time to reach the distribution
servers, please be patient.
-- 
http://home.apache.org/~lewismc/
http://people.apache.org/keys/committer/lewismc


[RESULT] WAS Re: [VOTE] Release Apache Any23 2.3

2019-03-04 Thread lewis john mcgibbney
Hi user@, dev@,
72 hours have come and gone. See below for the RESULT

[5] +1, let's release as Apache Any23 2.3
Lewis John McGibbney*
Hans Brende*
David Cockbill
Jacek Grzebyta*
Kevin Ratnasekera

[0] +/-0, fine, but consider to fix few issues before...
[0] -1, nope, because... (and please explain why)

*Apache Any23 PMC

I am happy to state that the VOTE passes with 3 PMC-binding VOTEs. I'll go
ahead and complete the release management.
Thank you everyone that contributed to the Any23 2.3 development drive.
Thank you to all that reviewed the release candidate.
Lewis
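
The tally above can be sketched in a few lines. This is only an illustration, assuming the usual ASF convention that a release vote needs at least three binding +1 votes and more binding +1 than binding -1 votes; the `Vote` class and `release_vote_passes` function are hypothetical names, not part of any Apache tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    voter: str
    value: int     # +1, 0 or -1
    binding: bool  # True for PMC members (marked with * in the RESULT)


def release_vote_passes(votes):
    """Assumed rule: >= 3 binding +1 votes and more binding +1 than binding -1."""
    binding_plus = sum(1 for v in votes if v.binding and v.value > 0)
    binding_minus = sum(1 for v in votes if v.binding and v.value < 0)
    return binding_plus >= 3 and binding_plus > binding_minus


# The votes listed in the RESULT for Any23 2.3.
votes = [
    Vote("Lewis John McGibbney", +1, True),
    Vote("Hans Brende", +1, True),
    Vote("David Cockbill", +1, False),
    Vote("Jacek Grzebyta", +1, True),
    Vote("Kevin Ratnasekera", +1, False),
]

print(release_vote_passes(votes))
```

Non-binding votes are recorded but do not count towards the threshold, which is why five +1 votes translate into three PMC-binding ones here.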

On Tue, Feb 26, 2019 at 2:42 PM lewis john mcgibbney 
wrote:

> Hi user@ and dev@any23,
> A release candidate for Any23 2.3 is available!
>
> We solved 104 issues:
> https://issues.apache.org/jira/projects/ANY23/versions/12342665
>
> Git source tag (1df40a5e9ddd153b14ff8647e8af70aebb148d70):
>
> https://gitbox.apache.org/repos/asf?p=any23.git;a=tag;h=c4e30a9d7e6148023452234eadf1752ddd92d47f
>
> Staging repo:
> https://repository.apache.org/content/repositories/orgapacheany23-1007
>
> Staging binaries:
> https://dist.apache.org/repos/dist/dev/any23
>
> PGP release keys (signed using 48BAEBF6):
> https://dist.apache.org/repos/dist/release/any23/KEYS
>
> Vote will be open for 72 hours.
>
> [ ] +1, let's release as Apache Any23 2.3
> [ ] +/-0, fine, but consider to fix few issues before...
> [ ] -1, nope, because... (and please explain why)
>
> P.S. Here is my +1
>
> --
> http://home.apache.org/~lewismc/
> http://people.apache.org/keys/committer/lewismc
>


-- 
http://home.apache.org/~lewismc/
http://people.apache.org/keys/committer/lewismc


[VOTE] Release Apache Any23 2.3

2019-02-26 Thread lewis john mcgibbney
Hi user@ and dev@any23,
A release candidate for Any23 2.3 is available!

We solved 104 issues:
https://issues.apache.org/jira/projects/ANY23/versions/12342665

Git source tag (1df40a5e9ddd153b14ff8647e8af70aebb148d70):
https://gitbox.apache.org/repos/asf?p=any23.git;a=tag;h=c4e30a9d7e6148023452234eadf1752ddd92d47f

Staging repo:
https://repository.apache.org/content/repositories/orgapacheany23-1007

Staging binaries:
https://dist.apache.org/repos/dist/dev/any23

PGP release keys (signed using 48BAEBF6):
https://dist.apache.org/repos/dist/release/any23/KEYS

Vote will be open for 72 hours.

[ ] +1, let's release as Apache Any23 2.3
[ ] +/-0, fine, but consider to fix few issues before...
[ ] -1, nope, because... (and please explain why)

P.S. Here is my +1

-- 
http://home.apache.org/~lewismc/
http://people.apache.org/keys/committer/lewismc


[VOTE] Move Canonical Any23 Source Code to

2018-12-08 Thread lewis john mcgibbney
dev@,
As per Daniel Gruno's NOTICE [0], all https://git-wip-us.apache.org
repositories will be moved to the new gitbox service which includes direct
write access on github as well as the standard ASF commit access via
gitbox.apache.org.
We can do this voluntarily or we can be forced to do it (which basically
means we are being forced to do it but the former option at least enables
us to have some input into how it is done.)
This thread is a VOTE for us to move the canonical Any23 source code over
to gitbox.apache.org. There will be no severe impact to our users or dev@
community. I anticipate the only impact to be the requirement to change the
remote canonical origin mapping in your Git client. Everything else should
be fine.
Some benefits include the ability for us to use the normal Github workflow
for merging, etc., so in fact it will probably streamline some aspects of our
development workflow.
Please VOTE as follows

[ ] +1 Move the canonical Any23 source code from
https://git-wip-us.apache.org to gitbox.apache.org.
[ ] -1 DO NOT move the canonical Any23 source code from
https://git-wip-us.apache.org to gitbox.apache.org (please provide
justification)

This VOTE will be open a minimum of 72 hours.
Here is my +1
Lewis


[0] https://s.apache.org/aGMR
-- 
http://home.apache.org/~lewismc/
http://people.apache.org/keys/committer/lewismc


Re: Configuring the NTriplesWriter used by org.apache.any23.writer.TripleHandler

2018-07-31 Thread lewis john mcgibbney
Hi Lars,
Response inline...

On Tue, Jul 24, 2018 at 2:13 AM,  wrote:

>
>
> From: "Svensson, Lars" 
> To: "user@any23.apache.org" 
> Cc:
> Bcc:
> Date: Tue, 24 Jul 2018 09:13:25 +
> Subject: Configuring the NTriplesWriter used by
> org.apache.any23.writer.TripleHandler
> Greetings all,
>
> When using the NTriplesWriter, I wanted to configure it to write unicode
> points as escape sequences. I tried to subclass 
> org.apache.any23.writer.TripleHandler
> and overwrite the access to the org.eclipse.rdf4j.rio.ntriples.NTriplesWriter
> but couldn't do that since the access to the NTriplesWriter is package
> protected. I ended up copying the code which seems a bit clunky...
>
> Is there a chance you could make access to the NTriplesWriter in
> TripleHandler protected instead of package protected in order to allow for
> better subclassing?
>

Are you in a position to provide a pull request for this? If it provides
compelling functionality such as you are describing, the contribution
would be most welcome.
Thank you,
Lewis
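
For readers unfamiliar with the request, here is a minimal sketch of what "writing unicode points as escape sequences" means for N-Triples output. It is a standalone illustration and does not use the Any23 or RDF4J APIs; the function name `escape_unicode` is hypothetical, and the sketch only handles non-ASCII characters (quotes, backslashes and control characters would need their own escapes).

```python
def escape_unicode(text):
    """Replace every non-ASCII code point with an N-Triples UCHAR escape."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            # Plain ASCII is passed through unchanged.
            out.append(ch)
        elif cp <= 0xFFFF:
            # Basic Multilingual Plane: backslash, 'u', four hex digits.
            out.append("\\u%04X" % cp)
        else:
            # Supplementary planes: backslash, 'U', eight hex digits.
            out.append("\\U%08X" % cp)
    return "".join(out)


print(escape_unicode("Reto Gm\u00fcr"))
```

A writer configured this way emits pure ASCII, which is the behaviour Lars was trying to reach by subclassing.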


[ANNOUNCE] Apache Any23 2.2

2018-03-23 Thread lewis john mcgibbney
The Apache Any23 Team is pleased to announce the release of Apache Any23
2.2.

*What is Any23?*

Anything To Triples (Any23) is a library, a web service and a command line
tool that extracts structured data in RDF format from a variety of Web
documents. Currently it supports the following input formats:

   - RDF/XML, Turtle, Notation 3
   - RDFa with RDFa1.1 prefix mechanism
   - Microformats1 and Microformats2: hAdr, hCard, hCalendar, hEntry,
   hEvent, hGeo, hItem, hListing, hProduct, hRecipe, hResume, hReview,
   License, Species, XFN, etc.
   - JSON-LD: JSON for Linking Data, a lightweight Linked Data format based
   on the already successful JSON format, which provides a way to help JSON
   data interoperate at Web-scale.
   - HTML5 Microdata (such as Schema.org)
   - CSV: Comma Separated Values with separator autodetection.
   - Vocabularies: Extraction support for Dublin Core Terms, Description of
   a Career, Description Of A Project, Friend Of A Friend, GEO Names, ICAL,
   lkif-core, Open Graph Protocol, BBC Programmes Ontology, RDF Review
   Vocabulary, schema.org, VCard, BBC Wildlife Ontology and XHTML
   ... and more!
   - YAML: human friendly data serialization standard for all programming
   languages.
   - Additionally, as of 2.1 Any23 provides functionality to extract
   triples using the Open Information Extraction (Open IE) system. The Open
   IE system runs over sentences and creates extractions that represent
   relations in text; in the case of Any23, this results in triples.

*Downloads*
http://any23.apache.org/download.html

*Release Notes:*

https://s.apache.org/YmRb

Have Fun,
Lewis, on behalf of the Apache Any23 Project Management Committee

-- 
http://home.apache.org/~lewismc/
http://people.apache.org/keys/committer/lewismc


[RESULT] WAS Re: [VOTE] Release Apache Any23 2.2 RC#2

2018-03-23 Thread lewis john mcgibbney
Hi Folks,
I am bringing this VOTE thread to a close.
Thank you to everyone that VOTE'd, the RESULT is below.


[4] +1, Release Apache Any23 2.2 RC#2
Hans Brende*
Lewis John McGibbney*
Jacek Grzebyta*
Reto Gmür*

[0] +/-0, fine, but consider to fix few issues before...
[0] -1, nope, because... (and please explain why)

*Any23 PMC

I'll go ahead and make the release.
Thanks
Lewis

On Mon, Mar 12, 2018 at 11:00 PM, lewis john mcgibbney 
wrote:

> Hi user@ and dev@,
> We need one more PMC VOTE to release here.
> Thanks to anyone able to take the time to do this.
> Best
> Lewis
>
> On Thu, Mar 1, 2018 at 3:49 PM, lewis john mcgibbney 
> wrote:
>
>> Hi Folks,
>>
>> I would like to open a VOTE on the Any23 2.2 RC#2
>>
>> We addressed 43 issues:
>> https://s.apache.org/LUYt
>>
>> Git source tag (b0d1d21b110b603c23f0ee12bd7e70ccc5c5c8bbf):
>> https://s.apache.org/SlRO
>>
>> Staging repo:
>> https://repository.apache.org/content/repositories/orgapacheany23-1006
>>
>> Staging binaries:
>> https://dist.apache.org/repos/dist/dev/any23/2.2/
>>
>> PGP release keys (signed using 48BAEBF6):
>> https://dist.apache.org/repos/dist/release/any23/KEYS
>>
>> Vote will be open for 72 hours.
>>
>> [ ] +1, Release Apache Any23 2.2 RC#2
>> [ ] +/-0, fine, but consider to fix few issues before...
>> [ ] -1, nope, because... (and please explain why)
>>
>> P.S. Here is my +1
>>
>> --
>> http://home.apache.org/~lewismc/
>> http://people.apache.org/keys/committer/lewismc
>>
>
>
>
> --
> http://home.apache.org/~lewismc/
> http://people.apache.org/keys/committer/lewismc
>



-- 
http://home.apache.org/~lewismc/
http://people.apache.org/keys/committer/lewismc


Re: [VOTE] Release Apache Any23 2.2

2018-03-15 Thread lewis john mcgibbney
Hi Reto,
Thanks for pointing that out, here is the staging repository

https://repository.apache.org/content/repositories/orgapacheany23-1006/

As you can see it was just the wrong hyperlink.
Thanks
Lewis

On Thu, Mar 15, 2018 at 03:01 Reto Gmür  wrote:

> Hi Lewis
>
>
>
> The staging repos 404s. I could verify the git tag, but I think I should
> be voting on the tarball that is then actually released.
>
>
>
> Cheers,
>
> Reto
>
>
>
> *From:* lewis john mcgibbney 
> *Sent:* Wednesday, January 31, 2018 3:17 AM
> *To:* Bill Anderson ; ans...@apache.org;
> dpalmis...@apache.org; giova...@apache.org; Chris Mattmann <
> mattm...@apache.org>; mosta...@apache.org; Nick Kew ;
> prami...@apache.org; Reto Bachmann-Gmür ; Simone Tripodi
> ; szy...@apache.org; Tommaso Teofili <
> tomm...@apache.org>; Andy Seaborne 
> *Subject:* Fwd: [VOTE] Release Apache Any23 2.2
>
>
>
> Hi Folks,
>
> Anyone in the PMC able to VOTE on the Any23 2.2 release candidate would
> really be helping out. We need one more binding PMC VOTE to release.
>
> Thanks
>
> Lewis
>
>
>
> -- Forwarded message --
> From: *lewis john mcgibbney* 
> Date: Thu, Jan 25, 2018 at 11:24 AM
> Subject: [VOTE] Release Apache Any23 2.2
> To: user@any23.apache.org, d...@any23.apache.org
>
> Hi Folks,
>
> I would like to open a VOTE on the Any23 2.2 RC#1
>
> We solved 40 issues:
> https://s.apache.org/BT4V
>
> Git source tag (b6ed4cfa288b29068c5d822f666ff38814c947c9):
> https://s.apache.org/GOLk
>
> Staging repo:
> https://repository.apache.org/content/repositories/orgapacheany23-1004
>
> Staging binaries:
> https://dist.apache.org/repos/dist/dev/any23/2.2/
>
> PGP release keys (signed using 48BAEBF6):
> https://dist.apache.org/repos/dist/release/any23/KEYS
>
> Vote will be open for 72 hours.
>
> [ ] +1, Release Apache Any23 2.2
> [ ] +/-0, fine, but consider to fix few issues before...
> [ ] -1, nope, because... (and please explain why)
>
> P.S. Here is my +1
>
>
> --
>
> http://home.apache.org/~lewismc/
>
> http://people.apache.org/keys/committer/lewismc
>
>
>
>
> --
>
> http://home.apache.org/~lewismc/
>
> http://people.apache.org/keys/committer/lewismc
>
-- 
http://home.apache.org/~lewismc/
http://people.apache.org/keys/committer/lewismc


Re: [VOTE] Release Apache Any23 2.2 RC#2

2018-03-12 Thread lewis john mcgibbney
Hi user@ and dev@,
We need one more PMC VOTE to release here.
Thanks to anyone able to take the time to do this.
Best
Lewis

On Thu, Mar 1, 2018 at 3:49 PM, lewis john mcgibbney 
wrote:

> Hi Folks,
>
> I would like to open a VOTE on the Any23 2.2 RC#2
>
> We addressed 43 issues:
> https://s.apache.org/LUYt
>
> Git source tag (b0d1d21b110b603c23f0ee12bd7e70ccc5c5c8bbf):
> https://s.apache.org/SlRO
>
> Staging repo:
> https://repository.apache.org/content/repositories/orgapacheany23-1006
>
> Staging binaries:
> https://dist.apache.org/repos/dist/dev/any23/2.2/
>
> PGP release keys (signed using 48BAEBF6):
> https://dist.apache.org/repos/dist/release/any23/KEYS
>
> Vote will be open for 72 hours.
>
> [ ] +1, Release Apache Any23 2.2 RC#2
> [ ] +/-0, fine, but consider to fix few issues before...
> [ ] -1, nope, because... (and please explain why)
>
> P.S. Here is my +1
>
> --
> http://home.apache.org/~lewismc/
> http://people.apache.org/keys/committer/lewismc
>



-- 
http://home.apache.org/~lewismc/
http://people.apache.org/keys/committer/lewismc


Re: [VOTE] Release Apache Any23 2.2 RC#2

2018-03-09 Thread lewis john mcgibbney
PING, we need one more PMC +1

On Thu, Mar 1, 2018 at 3:49 PM, lewis john mcgibbney 
wrote:

> Hi Folks,
>
> I would like to open a VOTE on the Any23 2.2 RC#2
>
> We addressed 43 issues:
> https://s.apache.org/LUYt
>
> Git source tag (b0d1d21b110b603c23f0ee12bd7e70ccc5c5c8bbf):
> https://s.apache.org/SlRO
>
> Staging repo:
> https://repository.apache.org/content/repositories/orgapacheany23-1006
>
> Staging binaries:
> https://dist.apache.org/repos/dist/dev/any23/2.2/
>
> PGP release keys (signed using 48BAEBF6):
> https://dist.apache.org/repos/dist/release/any23/KEYS
>
> Vote will be open for 72 hours.
>
> [ ] +1, Release Apache Any23 2.2 RC#2
> [ ] +/-0, fine, but consider to fix few issues before...
> [ ] -1, nope, because... (and please explain why)
>
> P.S. Here is my +1
>
> --
> http://home.apache.org/~lewismc/
> http://people.apache.org/keys/committer/lewismc
>



-- 
http://home.apache.org/~lewismc/
http://people.apache.org/keys/committer/lewismc


Re: [VOTE] Release Apache Any23 2.2 RC#2

2018-03-05 Thread lewis john mcgibbney
Hello All,
As we work through https://issues.apache.org/jira/browse/INFRA-16130, I
have decided to NOT stage the apache-any23-service* artifacts.
We are waiting on making them available through a CDN.. I will keep you
updated on this.
In the meantime, please VOTE on the RC as all artifacts associated with the
RC#2 are now available.
Thank you in advance,
Lewis


On Thu, Mar 1, 2018 at 4:04 PM, lewis john mcgibbney 
wrote:

> Hi Folks,
> There is a temporary issue uploading the larger service artifacts to the
> staging SVN server.
> https://issues.apache.org/jira/browse/INFRA-16130
> Please wait until this has been addressed before VOTE'ing on the RC.
> Thank you
> Lewis
>
> On Thu, Mar 1, 2018 at 3:49 PM, lewis john mcgibbney 
> wrote:
>
>> Hi Folks,
>>
>> I would like to open a VOTE on the Any23 2.2 RC#2
>>
>> We addressed 43 issues:
>> https://s.apache.org/LUYt
>>
>> Git source tag (b0d1d21b110b603c23f0ee12bd7e70ccc5c5c8bbf):
>> https://s.apache.org/SlRO
>>
>> Staging repo:
>> https://repository.apache.org/content/repositories/orgapacheany23-1006
>>
>> Staging binaries:
>> https://dist.apache.org/repos/dist/dev/any23/2.2/
>>
>> PGP release keys (signed using 48BAEBF6):
>> https://dist.apache.org/repos/dist/release/any23/KEYS
>>
>> Vote will be open for 72 hours.
>>
>> [ ] +1, Release Apache Any23 2.2 RC#2
>> [ ] +/-0, fine, but consider to fix few issues before...
>> [ ] -1, nope, because... (and please explain why)
>>
>> P.S. Here is my +1
>>
>> --
>> http://home.apache.org/~lewismc/
>> http://people.apache.org/keys/committer/lewismc
>>
>
>
>
> --
> http://home.apache.org/~lewismc/
> http://people.apache.org/keys/committer/lewismc
>



-- 
http://home.apache.org/~lewismc/
http://people.apache.org/keys/committer/lewismc


Re: [VOTE] Release Apache Any23 2.2 RC#2

2018-03-01 Thread lewis john mcgibbney
Hi Folks,
There is a temporary issue uploading the larger service artifacts to the
staging SVN server.
https://issues.apache.org/jira/browse/INFRA-16130
Please wait until this has been addressed before VOTE'ing on the RC.
Thank you
Lewis

On Thu, Mar 1, 2018 at 3:49 PM, lewis john mcgibbney 
wrote:

> Hi Folks,
>
> I would like to open a VOTE on the Any23 2.2 RC#2
>
> We addressed 43 issues:
> https://s.apache.org/LUYt
>
> Git source tag (b0d1d21b110b603c23f0ee12bd7e70ccc5c5c8bbf):
> https://s.apache.org/SlRO
>
> Staging repo:
> https://repository.apache.org/content/repositories/orgapacheany23-1006
>
> Staging binaries:
> https://dist.apache.org/repos/dist/dev/any23/2.2/
>
> PGP release keys (signed using 48BAEBF6):
> https://dist.apache.org/repos/dist/release/any23/KEYS
>
> Vote will be open for 72 hours.
>
> [ ] +1, Release Apache Any23 2.2 RC#2
> [ ] +/-0, fine, but consider to fix few issues before...
> [ ] -1, nope, because... (and please explain why)
>
> P.S. Here is my +1
>
> --
> http://home.apache.org/~lewismc/
> http://people.apache.org/keys/committer/lewismc
>



-- 
http://home.apache.org/~lewismc/
http://people.apache.org/keys/committer/lewismc


[VOTE] Release Apache Any23 2.2 RC#2

2018-03-01 Thread lewis john mcgibbney
Hi Folks,

I would like to open a VOTE on the Any23 2.2 RC#2

We addressed 43 issues:
https://s.apache.org/LUYt

Git source tag (b0d1d21b110b603c23f0ee12bd7e70ccc5c5c8bbf):
https://s.apache.org/SlRO

Staging repo:
https://repository.apache.org/content/repositories/orgapacheany23-1006

Staging binaries:
https://dist.apache.org/repos/dist/dev/any23/2.2/

PGP release keys (signed using 48BAEBF6):
https://dist.apache.org/repos/dist/release/any23/KEYS

Vote will be open for 72 hours.

[ ] +1, Release Apache Any23 2.2 RC#2
[ ] +/-0, fine, but consider to fix few issues before...
[ ] -1, nope, because... (and please explain why)

P.S. Here is my +1

-- 
http://home.apache.org/~lewismc/
http://people.apache.org/keys/committer/lewismc


[RESULT] WAS Re: [VOTE] Release Apache Any23 2.2

2018-02-09 Thread lewis john mcgibbney
Hi Folks,
Thank you to everyone who was able to VOTE on the Any23 2.2 RC#1.
The RESULT is below


[4] +1, Release Apache Any23 2.2
Lewis McGibbney*
Chris Mattmann*
Hans Brende
Jacek Grzebyta*

[ ] +/-0, fine, but consider to fix few issues before...

[1] -1, nope, because... (and please explain why)
Andy Seaborne* - https://s.apache.org/SnNQ

* Any23 PMC

The VOTE does not pass as there are blocking issues as highlighted by Andy.
I will rollback and reproduce an RC#2.
Lewis

On Thu, Jan 25, 2018 at 11:24 AM, lewis john mcgibbney 
wrote:

> Hi Folks,
>
> I would like to open a VOTE on the Any23 2.2 RC#1
>
> We solved 40 issues:
> https://s.apache.org/BT4V
>
> Git source tag (b6ed4cfa288b29068c5d822f666ff38814c947c9):
> https://s.apache.org/GOLk
>
> Staging repo:
> https://repository.apache.org/content/repositories/orgapacheany23-1004
>
> Staging binaries:
> https://dist.apache.org/repos/dist/dev/any23/2.2/
>
> PGP release keys (signed using 48BAEBF6):
> https://dist.apache.org/repos/dist/release/any23/KEYS
>
> Vote will be open for 72 hours.
>
> [ ] +1, Release Apache Any23 2.2
> [ ] +/-0, fine, but consider to fix few issues before...
> [ ] -1, nope, because... (and please explain why)
>
> P.S. Here is my +1
>
> --
> http://home.apache.org/~lewismc/
> http://people.apache.org/keys/committer/lewismc
>



-- 
http://home.apache.org/~lewismc/
http://people.apache.org/keys/committer/lewismc


Re: [VOTE] Release Apache Any23 2.2

2018-02-02 Thread lewis john mcgibbney
PING please, thank you

On Thu, Jan 25, 2018 at 11:24 AM, lewis john mcgibbney 
wrote:

> Hi Folks,
>
> I would like to open a VOTE on the Any23 2.2 RC#1
>
> We solved 40 issues:
> https://s.apache.org/BT4V
>
> Git source tag (b6ed4cfa288b29068c5d822f666ff38814c947c9):
> https://s.apache.org/GOLk
>
> Staging repo:
> https://repository.apache.org/content/repositories/orgapacheany23-1004
>
> Staging binaries:
> https://dist.apache.org/repos/dist/dev/any23/2.2/
>
> PGP release keys (signed using 48BAEBF6):
> https://dist.apache.org/repos/dist/release/any23/KEYS
>
> Vote will be open for 72 hours.
>
> [ ] +1, Release Apache Any23 2.2
> [ ] +/-0, fine, but consider to fix few issues before...
> [ ] -1, nope, because... (and please explain why)
>
> P.S. Here is my +1
>
> --
> http://home.apache.org/~lewismc/
> http://people.apache.org/keys/committer/lewismc
>



-- 
http://home.apache.org/~lewismc/
http://people.apache.org/keys/committer/lewismc


[VOTE] Release Apache Any23 2.2

2018-01-25 Thread lewis john mcgibbney
Hi Folks,

I would like to open a VOTE on the Any23 2.2 RC#1

We solved 40 issues:
https://s.apache.org/BT4V

Git source tag (b6ed4cfa288b29068c5d822f666ff38814c947c9):
https://s.apache.org/GOLk

Staging repo:
https://repository.apache.org/content/repositories/orgapacheany23-1004

Staging binaries:
https://dist.apache.org/repos/dist/dev/any23/2.2/

PGP release keys (signed using 48BAEBF6):
https://dist.apache.org/repos/dist/release/any23/KEYS

Vote will be open for 72 hours.

[ ] +1, Release Apache Any23 2.2
[ ] +/-0, fine, but consider to fix few issues before...
[ ] -1, nope, because... (and please explain why)

P.S. Here is my +1

-- 
http://home.apache.org/~lewismc/
http://people.apache.org/keys/committer/lewismc


Any23 2.1 now part of Nutch 1.X software

2018-01-11 Thread lewis john mcgibbney
Hi Folks,
A quick heads up that Any23 is now part of the 1.x Nutch plugin
distribution and will be shipped with the next Nutch release.
The longstanding Any23 issue can be found at
https://issues.apache.org/jira/projects/NUTCH/issues/NUTCH-1129
Thanks
Lewis

-- 
http://home.apache.org/~lewismc/
http://people.apache.org/keys/committer/lewismc


Re: parse broken uri

2017-12-12 Thread lewis john mcgibbney
Hello Alfonso,
This took me a while longer than expected to get around to providing a fix.
Please see https://issues.apache.org/jira/browse/ANY23-314 and proposed fix
at https://github.com/apache/any23/pull/49
For the service specifically, the primary issue was that the TripleHandler
was not being closed upon an unsuccessful extraction e.g. we were failing
to write successful triples even if some were detected prior to a parse
error. That has now been fixed.
Thanks
Lewis
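
The pattern behind the fix described above can be sketched as follows. This is a language-neutral illustration written in Python, not the actual change in the pull request; `CollectingHandler` and `extract` are hypothetical stand-ins for a buffering triple handler and an extraction loop.

```python
class CollectingHandler:
    """Hypothetical handler that buffers triples and only writes them on close()."""

    def __init__(self):
        self.buffer = []
        self.written = []
        self.closed = False

    def receive_triple(self, triple):
        self.buffer.append(triple)

    def close(self):
        # Flush whatever was collected, then mark the handler as closed.
        self.written.extend(self.buffer)
        self.buffer.clear()
        self.closed = True


def extract(statements, handler):
    """Feed statements to the handler; None simulates a parse error."""
    try:
        for statement in statements:
            if statement is None:
                raise ValueError("parse error")
            handler.receive_triple(statement)
    finally:
        # Closing in 'finally' keeps the triples seen before the error.
        handler.close()


handler = CollectingHandler()
try:
    extract([("s", "p", "o1"), ("s", "p", "o2"), None, ("s", "p", "o3")], handler)
except ValueError:
    pass
print(handler.written)
```

Without the `finally` clause the handler would never be closed on failure, and the two triples detected before the error would be lost.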

On Tue, Dec 12, 2017 at 4:46 AM,  wrote:

> From: alfonso.debi...@libero.it
> To: user@any23.apache.org
> Cc:
> Bcc:
> Date: Thu, 30 Nov 2017 17:49:45 +0100 (CET)
> Subject: Re: parse broken uri
>
> Hi Lewis,
>
> The URI is this: https://www.jobcluster.de
>
> Thanks for the reply.
>
>


Re: Model for any32 core vs nquads

2017-12-12 Thread lewis john mcgibbney
Hello Anna,
As you've seen the any23-nquads module and associated functionality [0] was
removed (I believe with 2.0 release) with the functionality being baked
back in to the relevant any23-core extractor [1] and writer [2] packages
respectively.
You should be able to use the NQUADS implementation more easily now via the
common extractor and writer interfaces provided.
With regards to the result, any feedback on inconsistencies or anomalies
would be appreciated.
hth
Lewis

[0]
https://github.com/apache/any23/tree/any23-1.1/nquads/src/main/java/org/apache/any23/io/nquads
[1]
https://github.com/apache/any23/tree/master/core/src/main/java/org/apache/any23/extractor/rdf
[2]
https://github.com/apache/any23/tree/master/core/src/main/java/org/apache/any23/writer

On Tue, Dec 12, 2017 at 4:46 AM,  wrote:

>
> From: Anna Primpeli 
> To: 
> Cc:
> Bcc:
> Date: Tue, 12 Dec 2017 13:46:08 +0100
> Subject: Model for any32 core vs nquads
>
> Hello Any23 team,
>
>
>
> I am using any23-core (version 1.1) for my project and I would like to
> update to the current version, 2.1.
>
> In the same project we make use of any23-nquads as well. As it seems there
> is an incompatibility on the rdf model dependency. The current version of
> any23-core uses RDF4J while any23-nquads uses sesame dependencies.
>
>
>
> Is there a plan of updating the any23-nquads model to RDF4J as well? Is
> there any workaround possible or should I stick to the old version of
> any23-core?
>
>
>
> Thank you very much in advance!
>
>
>
> Best,
>
> Anna
>
>
>
>
>
>


-- 
http://home.apache.org/~lewismc/
@hectorMcSpector
http://www.linkedin.com/in/lmcgibbney


Re: parse broken uri

2017-11-30 Thread lewis john mcgibbney
Hi Alfonso,
Can you please provide us with a URI which reproduces this issue?
If we can reproduce it, then we can register a ticket over at
https://issues.apache.org/jira/projects/ANY23
Thanks

On Thu, Nov 30, 2017 at 6:48 AM,  wrote:

>
> From: alfonso.debi...@libero.it
> To: user@any23.apache.org
> Cc:
> Bcc:
> Date: Thu, 30 Nov 2017 15:48:01 +0100 (CET)
> Subject: parse broken uri
>
> Hi users, I’m using any23 version 2.0 in my project, I have tested the
> extraction of RDF microformats from HTML pages. In this HTML there is an
> inconsistent URI, without protocol specification (example: //
> any23.apache.org instead of https://any23.apache.org )
>
> The library gives me the log:
>
> WARN rdf.Any23ValueFactoryWrapper: Not a valid (absolute) IRI:
>
> INFO extractor.SingleDocumentExtraction: Processing null
>
> I see that the method fixIRIWithException fixes some potentially broken
> relative or absolute URIs, but it does not fix this case. Is it possible to
> integrate a patch to solve this problem? Thanks
>
> Best regards,
>
> Alfonso
>
>
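
For context, a scheme-less reference such as //any23.apache.org is a network-path reference in RFC 3986 terms, and it can be made absolute by resolving it against the URL of the page it appears on. The sketch below shows that resolution with Python's standard library. It is not how fixIRIWithException works, and the base URLs are made up for illustration.

```python
from urllib.parse import urljoin


def resolve_reference(base_url, reference):
    """Resolve a possibly scheme-less reference against the page's own URL."""
    return urljoin(base_url, reference)


# Hypothetical page URL; the reference inherits the scheme of its base.
print(resolve_reference("https://example.org/jobs/index.html", "//any23.apache.org"))
```

The reference takes the scheme of the base document, so the same markup resolves to http or https depending on how the page itself was fetched.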


-- 
http://home.apache.org/~lewismc/
@hectorMcSpector
http://www.linkedin.com/in/lmcgibbney


[RELEASE] Apache Any23 2.1

2017-11-01 Thread lewis john mcgibbney
Hi Folks,

The Apache Any23 Project Management Committee is happy to announce the
release and availability of Apache Any23 2.1.

What is Any23?

Anything To Triples (Any23 - https://any23.apache.org) is a library, a web
service and a command line tool that extracts structured data in RDF format
from a variety of Web documents. Any23 is licensed under the Apache License
v2.0. The DOAP for Any23 can be found at https://s.apache.org/roxX

How do I download Any23?

http://any23.apache.org/download.html

What's contained in this release?

Release Report: https://s.apache.org/lALE

Of significant interest to Any23 users will be the addition of
functionality to extract triples using the OpenIE System -
https://github.com/allenai/openie-standalone. The Open IE system runs over
sentences and creates extractions that represent relations in text, in the
case of Any23, this results in triples.

Thank you
Lewis
(On behalf of the Apache Any23 Project Management Committee)

-- 
http://home.apache.org/~lewismc/
@hectorMcSpector
http://www.linkedin.com/in/lmcgibbney


[RESULT] WAS Re: [VOTE] Release Apache Any23 2.1 RC#1

2017-11-01 Thread lewis john mcgibbney
Hi Everyone,
Let's bring this VOTE to a close. Below is the RESULT.

[3] +1, release Any23 2.1
Lewis John McGibbney*
Andy Seaborne*
Jacek Grzebyta*

[0] +/-0, fine, but consider to fix few issues before...
[0] -1, nope, because... (and please explain why)

*Any23 PMC Binding

Thank you to everyone that contributed towards this release. I will go
ahead and publish the release and then announce. Also thank you Andy for
the PING.
Best
Lewis

On Fri, Sep 15, 2017 at 1:11 AM, lewis john mcgibbney 
wrote:

> Hi user@ and dev@,
> Thank you to everyone who worked on and used Any23 during its 2.1-SNAPSHOT
> development drive. I would like to open a VOTE to release Any23 2.1.
>
> We solved 10 issues:
> https://s.apache.org/34tJ
>
> Git source tag (fc6fd91df338f793da58bb80368ac421f544f7ee):
> https://s.apache.org/wO73
>
> Staging repo:
> https://repository.apache.org/content/repositories/orgapacheany23-1003/
>
> PGP release keys (signed using 48BAEBF6):
> https://dist.apache.org/repos/dist/dev/any23/KEYS
>
> Vote will be open for 72 hours.
>
> [ ] +1, release Any23 2.1
> [ ] +/-0, fine, but consider to fix few issues before...
> [ ] -1, nope, because... (and please explain why)
>
> N.B. +1 from me
>
> --
> http://home.apache.org/~lewismc/
> @hectorMcSpector
> http://www.linkedin.com/in/lmcgibbney
>



-- 
http://home.apache.org/~lewismc/
@hectorMcSpector
http://www.linkedin.com/in/lmcgibbney


Re: [VOTE] Release Apache Any23 2.1 RC#1

2017-09-19 Thread lewis john mcgibbney
Hi Folks,
@Andy,
Thanks for pointing this out, it is a shortcoming in my VOTE email.
The actual proposed release artifacts can be located at
https://dist.apache.org/repos/dist/dev/any23/2.1/
Thanks
Lewis

On Fri, Sep 15, 2017 at 1:11 AM, lewis john mcgibbney 
wrote:

> Hi user@ and dev@,
> Thank you to everyone who worked on and used Any23 during its 2.1-SNAPSHOT
> development drive. I would like to open a VOTE to release Any23 2.1.
>
> We solved 10 issues:
> https://s.apache.org/34tJ
>
> Git source tag (fc6fd91df338f793da58bb80368ac421f544f7ee):
> https://s.apache.org/wO73
>
> Staging repo:
> https://repository.apache.org/content/repositories/orgapacheany23-1003/
>
> PGP release keys (signed using 48BAEBF6):
> https://dist.apache.org/repos/dist/dev/any23/KEYS
>
> Vote will be open for 72 hours.
>
> [ ] +1, release Any23 2.1
> [ ] +/-0, fine, but consider to fix few issues before...
> [ ] -1, nope, because... (and please explain why)
>
> N.B. +1 from me
>
> --
> http://home.apache.org/~lewismc/
> @hectorMcSpector
> http://www.linkedin.com/in/lmcgibbney
>



-- 
http://home.apache.org/~lewismc/
@hectorMcSpector
http://www.linkedin.com/in/lmcgibbney


[VOTE] Release Apache Any23 2.1 RC#1

2017-09-15 Thread lewis john mcgibbney
Hi user@ and dev@,
Thank you to everyone who worked on and used Any23 during its 2.1-SNAPSHOT
development drive. I would like to open a VOTE to release Any23 2.1.

We solved 10 issues:
https://s.apache.org/34tJ

Git source tag (fc6fd91df338f793da58bb80368ac421f544f7ee):
https://s.apache.org/wO73

Staging repo:
https://repository.apache.org/content/repositories/orgapacheany23-1003/

PGP release keys (signed using 48BAEBF6):
https://dist.apache.org/repos/dist/dev/any23/KEYS

Vote will be open for 72 hours.

[ ] +1, release Any23 2.1
[ ] +/-0, fine, but consider to fix few issues before...
[ ] -1, nope, because... (and please explain why)

N.B. +1 from me

--
http://home.apache.org/~lewismc/
@hectorMcSpector
http://www.linkedin.com/in/lmcgibbney


Fwd: Digest for web-data-comm...@googlegroups.com - 1 update in 1 topic

2017-07-12 Thread lewis john mcgibbney
-- Forwarded message -
From: 
Date: Wed, Jul 12, 2017 at 5:15 AM
Subject: Digest for web-data-comm...@googlegroups.com - 1 update in 1 topic
To: Digest recipients 



   - ANN: WebDataCommons releases 24.4 billion quads RDFa, Microdata,
   Embedded JSON-LD and Microformat data originating from 2.7 million
   pay-level-domains - 1 Update

ANN: WebDataCommons releases 24.4 billion quads RDFa, Microdata, Embedded
JSON-LD and Microformat data originating from 2.7 million pay-level-domains

Anna Primpeli : Jul 11 06:13AM -0700

Hello,

the updated schemaOrg subset files are now online
<http://webdatacommons.org/structureddata/2015-11/stats/schema_org_subsets.html>.
Thank you once again for your feedback! Please let us know in case you face
any further problems.

Best,
Anna
-- 
http://home.apache.org/~lewismc/
@hectorMcSpector
http://www.linkedin.com/in/lmcgibbney


any23.org Web Service Restored

2017-06-14 Thread lewis john mcgibbney
Hi Folks,
I recently worked on restoring our service to any23.org (via
any23-vm2.apache.org). It is running off of *Apache Any23 v.2.1-SNAPSHOT
(2017-06-05 19:11:41+)* as shown on the bottom of the page.
Please report any issue to our JIRA instance.
Thank you to INFRA for all of the assistance.
Best
Lewis

-- 
http://home.apache.org/~lewismc/
@hectorMcSpector
http://www.linkedin.com/in/lmcgibbney


Re: Use of Parentheses in IRI's

2017-04-26 Thread lewis john mcgibbney
Hi Andy,
Thanks for the response on this one.

On Wed, Apr 26, 2017 at 6:46 AM,  wrote:

>
> From: Andy Seaborne 
> To: user@any23.apache.org
> Cc:
> Bcc:
> Date: Wed, 12 Apr 2017 08:41:24 +0100
> Subject: Re: Use of Parentheses in IRI's
> It's not a W3C standard
>
> IRIs are RFC 3987
> URIs are RFC 3986
>
> Parentheses are legal in the path part, in the query string and in the
> fragment.
>
>    ipchar     = iunreserved / pct-encoded / sub-delims / ":" / "@"
>
>    sub-delims = "!" / "$" / "&" / "'" / "(" / ")"
>               / "*" / "+" / "," / ";" / "="
>
> Andy
>
> http://www.sparql.org/iri-validator.html
>
>
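
The grammar quoted above can be tried out directly. What follows is a minimal
Python sketch, not anything taken from Any23 or Jena: it checks one path
segment against the ASCII `pchar` rule of RFC 3986 (unreserved, pct-encoded,
sub-delims, ":" and "@"). The function name `is_valid_path_segment` and the
sample column headers are made up for illustration, and the non-ASCII
`ucschar` range that RFC 3987 adds for IRIs is deliberately left out.

```python
import re

# ASCII subset of RFC 3986 "pchar":
#   unreserved / pct-encoded / sub-delims / ":" / "@"
# RFC 3987 "ipchar" additionally allows non-ASCII ucschar code points,
# which this sketch does not attempt to cover.
_PCHAR = r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})"
_SEGMENT = re.compile(f"{_PCHAR}*")


def is_valid_path_segment(segment: str) -> bool:
    """Return True if every character in `segment` is a legal pchar."""
    return _SEGMENT.fullmatch(segment) is not None


if __name__ == "__main__":
    # Hypothetical CSV column headers used as path segments.
    for header in ["Depth(m)", "Depth m", "Depth%20m"]:
        print(header, is_valid_path_segment(header))
```

Parentheses pass this check while a raw space does not, which agrees with the
sub-delims rule quoted in the reply.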


Use of Parentheses in IRI's

2017-04-11 Thread lewis john mcgibbney
Hi Folks,
I was generating RDFXML from CSV using Any23 master and experienced some
issues parsing the resulting RDFXML [0]. Once I removed the parentheses
from the CSV column headers then regenerated the RDFXML using Any23 the
error did not occur.
Can anyone point me to W3C documentation which states that parentheses are
illegal in IRI's?
Thanks

[0] https://github.com/mmisw/orr-portal/issues/99

-- 
http://home.apache.org/~lewismc/
@hectorMcSpector
http://www.linkedin.com/in/lmcgibbney


[RELEASE] Apache Any23 2.0

2017-02-26 Thread lewis john mcgibbney
Hello Folks,

The Apache Any23 [0] project management committee is pleased to announce
the release of Any23 2.0, which marks a major milestone for the project.

Anything To Triples (any23) is a library, a web service and a command line
tool that extracts structured data in RDF format from a variety of Web
documents.

The release notes for this release can be found at [1] and the release
artifacts can be obtained from [2]. Additionally, Any23 Maven artifacts can
be found on Maven Central [3].

The Any23 DOAP can be located at [4].

Thank you, enjoy. Please report any issues to our community mailing lists
[5].
Lewis
On behalf of the Any23 Project Management Committee

[0] http://any23.apache.org
[1] https://github.com/apache/any23/blob/any23-2.0/RELEASE-NOTES.txt
[2] http://any23.apache.org/download.html
[3] http://search.maven.org/#search|ga|1|g%3A%22org.apache.any23%22
[4] https://s.apache.org/any23doap
[5] http://any23.apache.org/mail-lists.html

-- 
http://home.apache.org/~lewismc/
@hectorMcSpector
http://www.linkedin.com/in/lmcgibbney


[RESULT] WAS Re: [VOTE] Release Apache Any23 2.0

2017-02-26 Thread lewis john mcgibbney
Hi Folks,
Thank you to everyone that was able to VOTE.
The RESULT is as follows

[5] +1, let's get it rmblee!!!
Lewis John McGibbney *
Renato Marroquín Mogrovejo
Reto Gmür *
Kevin Ratnasekera
Andy Seaborne *

[0] +/-0, fine, but consider to fix few issues before...
[0] -1, nope, because... (and please explain why)

* Any23 PMC

The VOTE therefore passes :)
I'll go ahead and push the remainder of the release.
Thank you again to everyone for reviewing the release candidate. I'll open
an issue for the feedback and address it in the short term.
Lewis
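
The tally above can also be checked mechanically. This is a minimal Python
sketch, assuming the usual ASF majority-approval rule for release votes (at
least three binding +1 votes, and more binding +1 than binding -1). The `Vote`
class and `release_vote_passes` function are illustrative names only and are
not part of any Apache tooling.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    voter: str
    value: int      # +1, 0 or -1
    binding: bool   # True for PMC members (marked * in the result)


def release_vote_passes(votes: list[Vote]) -> bool:
    """Assumed rule: >= 3 binding +1 and more binding +1 than binding -1."""
    plus = sum(1 for v in votes if v.binding and v.value > 0)
    minus = sum(1 for v in votes if v.binding and v.value < 0)
    return plus >= 3 and plus > minus


if __name__ == "__main__":
    # Votes as reported in the RESULT message.
    votes = [
        Vote("Lewis John McGibbney", +1, True),
        Vote("Renato Marroquín Mogrovejo", +1, False),
        Vote("Reto Gmür", +1, True),
        Vote("Kevin Ratnasekera", +1, False),
        Vote("Andy Seaborne", +1, True),
    ]
    print("passes" if release_vote_passes(votes) else "fails")
```

Non-binding votes are kept in the list for the record but do not count
towards the threshold in this sketch.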

On Tue, Feb 21, 2017 at 4:26 PM, lewis john mcgibbney 
wrote:

> Hi Folks,
> Another PING on this thread. Thank you to everyone who has been able to
> review and VOTE.
> @Renato, did you try again to test and are you able to review?
> Lewis
>
> On Fri, Feb 10, 2017 at 2:45 PM, lewis john mcgibbney 
> wrote:
>
>> Hi user@ and dev@,
>>
>> I would like to open a VOTE thread to release Apache Any23 2.0. This VOTE
>> will be open for at least 72 hours.
>>
>> We solved 40 issues:
>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12312323&version=12338100
>>
>>
>> Git source tag signature (de5e1dbc4cd9e077062a5fbb02b9314fdae13df8):
>> https://git-wip-us.apache.org/repos/asf?p=any23.git;a=tag;h=653ef9dedb9417fe81ca4e8b2688e5c5343295f7
>>
>> Staging repo:
>> https://repository.apache.org/content/repositories/orgapacheany23-1002/
>>
>> Staging binaries:
>> https://dist.apache.org/repos/dist/dev/any23/
>>
>> PGP release keys (signed using 48BAEBF6):
>> http://apache.org/dist/any23/KEYS
>>
>> Vote will be open for 72 hours.
>>
>> [ ] +1, let's get it rmblee!!!
>> [ ] +/-0, fine, but consider to fix few issues before...
>> [ ] -1, nope, because... (and please explain why)
>>
>> P.S. Here is my +1
>>
>> --
>> http://home.apache.org/~lewismc/
>> @hectorMcSpector
>> http://www.linkedin.com/in/lmcgibbney
>>
>
>
>
> --
> http://home.apache.org/~lewismc/
> @hectorMcSpector
> http://www.linkedin.com/in/lmcgibbney
>



-- 
http://home.apache.org/~lewismc/
@hectorMcSpector
http://www.linkedin.com/in/lmcgibbney


Re: [VOTE] Release Apache Any23 2.0

2017-02-21 Thread lewis john mcgibbney
Hi Folks,
Another PING on this thread. Thank you to everyone who has been able to
review and VOTE.
@Renato, did you try again to test and are you able to review?
Lewis

On Fri, Feb 10, 2017 at 2:45 PM, lewis john mcgibbney 
wrote:

> Hi user@ and dev@,
>
> I would like to open a VOTE thread to release Apache Any23 2.0. This VOTE
> will be open for at least 72 hours.
>
> We solved 40 issues:
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12312323&version=12338100
>
>
> Git source tag signature (de5e1dbc4cd9e077062a5fbb02b9314fdae13df8):
> https://git-wip-us.apache.org/repos/asf?p=any23.git;a=tag;h=653ef9dedb9417fe81ca4e8b2688e5c5343295f7
>
> Staging repo:
> https://repository.apache.org/content/repositories/orgapacheany23-1002/
>
> Staging binaries:
> https://dist.apache.org/repos/dist/dev/any23/
>
> PGP release keys (signed using 48BAEBF6):
> http://apache.org/dist/any23/KEYS
>
> Vote will be open for 72 hours.
>
> [ ] +1, let's get it rmblee!!!
> [ ] +/-0, fine, but consider to fix few issues before...
> [ ] -1, nope, because... (and please explain why)
>
> P.S. Here is my +1
>
> --
> http://home.apache.org/~lewismc/
> @hectorMcSpector
> http://www.linkedin.com/in/lmcgibbney
>



-- 
http://home.apache.org/~lewismc/
@hectorMcSpector
http://www.linkedin.com/in/lmcgibbney


Re: [VOTE] Release Apache Any23 2.0

2017-02-17 Thread lewis john mcgibbney
Hi Renato,
I can't reproduce the errors you are getting. With the same artifact you're
trying to test, I get the following:

[INFO]
[INFO] Reactor Summary:
[INFO]
[INFO] Apache Any23 ... SUCCESS [ 3.655 s]
[INFO] Apache Any23 :: Base API ... SUCCESS [ 3.620 s]
[INFO] Apache Any23 :: Test Resources . SUCCESS [ 0.387 s]
[INFO] Apache Any23 :: CSV Utilities .. SUCCESS [ 1.129 s]
[INFO] Apache Any23 :: Mime Type Detection  SUCCESS [ 3.373 s]
[INFO] Apache Any23 :: Encoding Detection . SUCCESS [ 1.968 s]
[INFO] Apache Any23 :: Core ... SUCCESS [ 13.073 s]
[INFO] Apache Any23 :: CLI  SUCCESS [ 11.624 s]
[INFO] Apache Any23 :: Plugins :: Basic Crawler ... SUCCESS [ 21.691 s]
[INFO] Apache Any23 :: Plugins :: HTML Scraper  SUCCESS [ 2.831 s]
[INFO] Apache Any23 :: Plugins :: Office Scraper .. SUCCESS [ 3.829 s]
[INFO] Apache Any23 :: Plugins :: Integration Test  SUCCESS [ 47.188 s]
[INFO] Apache Any23 :: Service  SUCCESS [ 21.753 s]
[INFO]
[INFO] BUILD SUCCESS
[INFO]
[INFO] Total time: 02:16 min
[INFO] Finished at: 2017-02-17T10:42:08-08:00
[INFO] Final Memory: 71M/930M
[INFO]


On Thu, Feb 16, 2017 at 10:33 AM, Renato Marroquín Mogrovejo <
renatoj.marroq...@gmail.com> wrote:

>
>
>   EmbeddedJSONLDExtractorTest.testEmbeddedJSONLDInHead:30->AbstractExtractorTestCase.assertExtract:223->AbstractExtractorTestCase.assertExtract:210 » Runtime
>
>
>   EmbeddedJSONLDExtractorTest.testSeveralEmbeddedJSONLDInHead:37->AbstractExtractorTestCase.assertExtract:223->AbstractExtractorTestCase.assertExtract:210 » Runtime
>
> 2017-02-16 19:25 GMT+01:00 lewis john mcgibbney :
>
>> No, we build against Java 8. Can you look further into what tests are
>> failing and if there are any specific error messages?
>> Thanks
>>
>>
>> On Thu, Feb 16, 2017 at 10:00 AM, Renato Marroquín Mogrovejo <
>> renatoj.marroq...@gmail.com> wrote:
>>
>>> I downloaded this one in here:
>>>
>>>- apache-any23-2.0-src.tar.gz
>>>
>>> <https://dist.apache.org/repos/dist/dev/any23/apache-any23-2.0-src.tar.gz>
>>> wait a second, now that I think about it, it might be because
>>>of JAVA8? could that be it?
>>>
>>>
>>> 2017-02-16 18:57 GMT+01:00 Mcgibbney, Lewis J (398M) <
>>> lewis.j.mcgibb...@jpl.nasa.gov>:
>>>
>>>> Damn. No there should not be any issues. The build is stable
>>>> https://builds.apache.org/job/Any23-trunk/
>>>>
>>>> I’ll have a look later, I cannot reproduce this but maybe I am wrong.
>>>>
>>>>
>>>>
>>>> Dr. Lewis John McGibbney Ph.D., B.Sc.
>>>>
>>>> Data Scientist II
>>>>
>>>> Computer Science for Data Intensive Applications Group 398M
>>>>
>>>> Jet Propulsion Laboratory
>>>>
>>>> California Institute of Technology
>>>>
>>>> 4800 Oak Grove Drive
>>>>
>>>> Pasadena, California 91109-8099
>>>>
>>>> Mail Stop : 158-256C
>>>>
>>>> Tel:  (+1) (818)-393-7402
>>>>
>>>> Cell: (+1) (626)-487-3476
>>>>
>>>> Fax:  (+1) (818)-393-1190
>>>>
>>>> Email: lewis.j.mcgibb...@jpl.nasa.gov
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>  Dare Mighty Things
>>>>
>>>>
>>>>
>>>> *From: *Renato Marroquín Mogrovejo 
>>>> *Date: *Thursday, February 16, 2017 at 9:55 AM
>>>> *To: *"Mcgibbney, Lewis J (398M)" ,
>>>> Lewis John Mcgibbney 
>>>> *Subject: *Fwd: [VOTE] Release Apache Any23 2.0
>>>>
>>>>
>>>>
>>>> I did and it's broken, I wrote to the list mate
>>>>
>>>>
>>>>
>>>> I tried doing  and core fails compiling in:
>>>>
>>>>
>>>>
>>>> Tests in error:
>>>>
>>>>   EmbeddedJSONLDExtrac

Re: [VOTE] Release Apache Any23 2.0

2017-02-16 Thread lewis john mcgibbney
No, we build against Java 8. Can you look further into what tests are
failing and if there are any specific error messages?
Thanks

On Thu, Feb 16, 2017 at 10:00 AM, Renato Marroquín Mogrovejo <
renatoj.marroq...@gmail.com> wrote:

> I downloaded this one in here:
>
>- apache-any23-2.0-src.tar.gz
><https://dist.apache.org/repos/dist/dev/any23/apache-any23-2.0-src.tar.gz>
> wait a second, now that I think about it, it might be because of
>JAVA8? could that be it?
>
>
> 2017-02-16 18:57 GMT+01:00 Mcgibbney, Lewis J (398M) <
> lewis.j.mcgibb...@jpl.nasa.gov>:
>
>> Damn. No there should not be any issues. The build is stable
>> https://builds.apache.org/job/Any23-trunk/
>>
>> I’ll have a look later, I cannot reproduce this but maybe I am wrong.
>>
>>
>>
>>
>> *From: *Renato Marroquín Mogrovejo 
>> *Date: *Thursday, February 16, 2017 at 9:55 AM
>> *To: *"Mcgibbney, Lewis J (398M)" ,
>> Lewis John Mcgibbney 
>> *Subject: *Fwd: [VOTE] Release Apache Any23 2.0
>>
>>
>>
>> I did and it's broken, I wrote to the list mate
>>
>>
>>
>> I tried doing  and core fails compiling in:
>>
>>
>>
>> Tests in error:
>>
>>   EmbeddedJSONLDExtractorTest.testEmbeddedJSONLDInHead:30->AbstractExtractorTestCase.assertExtract:223->AbstractExtractorTestCase.assertExtract:210 » Runtime
>>
>>   EmbeddedJSONLDExtractorTest.testSeveralEmbeddedJSONLDInHead:37->AbstractExtractorTestCase.assertExtract:223->AbstractExtractorTestCase.assertExtract:210 » Runtime
>>
>>
>>
>> Is this expected? are there any JIRA issues tracking this?
>>
>>
>>
>>
>>
>> -- Forwarded message --
>> From: *Mcgibbney, Lewis J (398M)* 
>> Date: 2017-02-16 18:46 GMT+01:00
>> Subject: Re: [VOTE] Release Apache Any23 2.0
>> To: Renato Marroquín Mogrovejo , Lewis John
>> Mcgibbney 
>>
>> Yeah, if you can. Also, can you please check the signatures of the
>> artifacts. Thanks.
>>
>>
>>
>>
>> *From: *Renato Marroquín Mogrovejo 
>> *Date: *Wednesday, February 15, 2017 at 11:40 PM
>> *To: *Lewis John Mcgibbney , "Mcgibbney,
>> Lewis J (398M)" 
>> *Subject: *Re: [VOTE] Release Apache Any23 2.0
>>
>>
>>
>> How do I test this, man? mvn clean package and that's it?
>>
>> I'm happy to help out
>>
>>
>>
>> On Feb 16, 2017 12:48 AM, "lewis john mcgibbney" 
>> wrote:
>>
>> PING folks.
>> Would be nice to get reviews on this release candidate if possible. If we
>> can't get any then I'll approach community@ and/or the Incubator.
>> Thank you
>>
>> On Fri, Feb 10, 2017 at 2:45 PM, lewis john mcgibbney
>> wrote:
>>
>> > Hi user@ and dev@,
>> >
>> > I would like to open a VOTE thread to release Apache Any23 2.0. This VOTE
>> > will be open for at least 72 hours.
>> >
>> > We solved 40 issues:
>> > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12312323&version=12338100
>> >
>> >
>> > Git source tag signature (de5e1dbc4cd9e077062a5fbb02b9314fdae13df8):
>> > https://git-wip-us.apache.org/repos/asf?p=any23.git;a=tag;h=653ef9dedb9417fe81ca4e8b2688e5c5343295f7
>> >
>> > Staging repo:
>> > https://repository.apache.org/content/repositories/orgapacheany23-1002/
>> >
>> > Staging binaries:
>> > https://dist.apache.org/repos/dist/dev/any23/
>> >
>> > PGP release keys (signed using 48BAEBF6):
>> > http://apache.org/dist/any23/KEYS
>> >
>> > Vote will be open for 72 hours.
>> >
>> > [ ] +1, let's get it rmblee!!!
>> > [ ] +/-0, fine, but consider to fix few issues before...
>> > [ ] -1, nope, because... (and please explain why)
>> >
>> > P.S. Here is my +1
>> >
>> > --
>> > http://home.apache.org/~lewismc/
>> > @hectorMcSpector
>> > http://www.linkedin.com/in/lmcgibbney
>> >
>>
>>
>>
>> --
>> http://home.apache.org/~lewismc/
>> @hectorMcSpector
>> http://www.linkedin.com/in/lmcgibbney
>>
>>
>>
>
>


-- 
http://home.apache.org/~lewismc/
@hectorMcSpector
http://www.linkedin.com/in/lmcgibbney


Re: [VOTE] Release Apache Any23 2.0

2017-02-15 Thread lewis john mcgibbney
PING folks.
Would be nice to get reviews on this release candidate if possible. If we
can't get any then I'll approach community@ and/or the Incubator.
Thank you

On Fri, Feb 10, 2017 at 2:45 PM, lewis john mcgibbney 
wrote:

> Hi user@ and dev@,
>
> I would like to open a VOTE thread to release Apache Any23 2.0. This VOTE
> will be open for at least 72 hours.
>
> We solved 40 issues:
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12312323&version=12338100
>
>
> Git source tag signature (de5e1dbc4cd9e077062a5fbb02b9314fdae13df8):
> https://git-wip-us.apache.org/repos/asf?p=any23.git;a=tag;h=653ef9dedb9417fe81ca4e8b2688e5c5343295f7
>
> Staging repo:
> https://repository.apache.org/content/repositories/orgapacheany23-1002/
>
> Staging binaries:
> https://dist.apache.org/repos/dist/dev/any23/
>
> PGP release keys (signed using 48BAEBF6):
> http://apache.org/dist/any23/KEYS
>
> Vote will be open for 72 hours.
>
> [ ] +1, let's get it rmblee!!!
> [ ] +/-0, fine, but consider to fix few issues before...
> [ ] -1, nope, because... (and please explain why)
>
> P.S. Here is my +1
>
> --
> http://home.apache.org/~lewismc/
> @hectorMcSpector
> http://www.linkedin.com/in/lmcgibbney
>



-- 
http://home.apache.org/~lewismc/
@hectorMcSpector
http://www.linkedin.com/in/lmcgibbney


[VOTE] Release Apache Any23 2.0

2017-02-10 Thread lewis john mcgibbney
Hi user@ and dev@,

I would like to open a VOTE thread to release Apache Any23 2.0. This VOTE
will be open for at least 72 hours.

We solved 40 issues:
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12312323&version=12338100


Git source tag signature (de5e1dbc4cd9e077062a5fbb02b9314fdae13df8):
https://git-wip-us.apache.org/repos/asf?p=any23.git;a=tag;h=653ef9dedb9417fe81ca4e8b2688e5c5343295f7

Staging repo:
https://repository.apache.org/content/repositories/orgapacheany23-1002/

Staging binaries:
https://dist.apache.org/repos/dist/dev/any23/

PGP release keys (signed using 48BAEBF6):
http://apache.org/dist/any23/KEYS

Vote will be open for 72 hours.

[ ] +1, let's get it rmblee!!!
[ ] +/-0, fine, but consider to fix few issues before...
[ ] -1, nope, because... (and please explain why)

P.S. Here is my +1

-- 
http://home.apache.org/~lewismc/
@hectorMcSpector
http://www.linkedin.com/in/lmcgibbney


Registering basic-crawler plugin

2017-01-25 Thread lewis john mcgibbney
Hi Folks,
I'm working off of master and trying to register the basic-crawler plugin.
I'm following our website documentation (working around the obvious
documentation error, which I will correct in a subsequent PR) which can be
found at [0] in order to register the basic-crawler, followed by [1] to
attempt to use it.
I've stepped through the code in Eclipse and can see the local plugin
repository being interpreted correctly as per the code at [2], however it
is not appearing in the ToolRunner invocation... which is puzzling me.
Can someone else please try this out and let me know how you get on?
Thanks


[0] http://any23.apache.org/any23-plugins.html#How_to_Register_a_Plugin
[1] http://any23.apache.org/getting-started.html#crawler-tool
[2]
https://github.com/apache/any23/blob/master/cli/src/main/java/org/apache/any23/cli/ToolRunner.java#L244-L261

-- 
http://home.apache.org/~lewismc/
@hectorMcSpector
http://www.linkedin.com/in/lmcgibbney


Re: Issues while building and using Any23

2016-06-15 Thread Lewis John Mcgibbney
Hi Wouter,

On Thu, Jun 9, 2016 at 4:17 AM,  wrote:

> From: Wouter Beek 
> To: user@any23.apache.org
> Cc:
> Date: Thu, 9 Jun 2016 14:16:37 +0300
> Subject: Issues while building and using Any23
> Hi Any23 maintainers,
>
> I'm trying to install from Git master.
>

Cool. Nice to hear more people running off of master branch.


> I've come across the following
> issues:
>
> 1. I had to add `true` to the Surefire plugin
> configuration in `pom.xml` in order to suppress the test-related errors in
> `mvn clean install`.  Maybe these tests could be put behind `mvn test` so
> that the casual user who compiles from sources does not have to bother with
> them?  (The tests also print a _lot_ of stuff to user output.  Not all of
> it seems useful under the default verbosity level.)
>

OK, so we are aware of the tests failing. This has to do with one of the
underlying SAX parsers (which actually lives over in Semargl) being very
strict with its interpretation of the InputStream.
There is an open pull request to address this but it needs more work. If
you are interested then you can find the current patch and discussion over at
https://github.com/apache/any23/pull/24

The second issue, regarding the verbose nature of the logs, has been
addressed and pushed to master branch, cf.
https://issues.apache.org/jira/browse/ANY23-293

This now also means that you only get INFO logging when running the Any23
core application.


>
> 2. Since my distro comes with JDK 1.8 (and switching JDK versions has
> always been somewhat of a Black Art for me) I had to remove
> `-XX:PermSize=128m` from the `` setting in `pom.xml`.
> This JVM feature is no longer supported in Java 8, aparently.
>

We are not fully migrated to JDK 1.8 yet. There are a bunch of Javadoc
issues to deal with before we do that. Most likely we will do that for the
1.3 release of Any23, i.e. after the pending 1.2 release.


>
> 3. When I run `bin/any23` from the core package I always see the following
> at the top of user output:
>
> SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
> SLF4J: Defaulting to no-operation (NOP) logger implementation
> SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for
> further details.
>
> To me this seems as if a default setup for the logging infrastructure is
> currently missing?
>

This has also been fixed cf. https://issues.apache.org/jira/browse/ANY23-293
and https://issues.apache.org/jira/browse/ANY23-292
If you pull from master branch the logging will be much more eye friendly
now!


>
> 4. The help flag does not seem to work for me in the CLI:
>
> $ any23 rover -h
> SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
> SLF4J: Defaulting to no-operation (NOP) logger implementation
> SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for
> further details.
> Exception in thread "main" com.beust.jcommander.ParameterException:
> Unknown option: -h
> at com.beust.jcommander.JCommander.parseValues(JCommander.java:735)
> at com.beust.jcommander.JCommander.parse(JCommander.java:279)
> at com.beust.jcommander.JCommander.parse(JCommander.java:262)
> at com.beust.jcommander.JCommander.parseValues(JCommander.java:780)
> at com.beust.jcommander.JCommander.parse(JCommander.java:279)
> at com.beust.jcommander.JCommander.parse(JCommander.java:262)
> at org.apache.any23.cli.ToolRunner.execute(ToolRunner.java:96)
> at org.apache.any23.cli.ToolRunner.main(ToolRunner.java:69)
>
> Has something gone wrong during Maven install / can others reproduce this
> error?
>

It works absolutely fine for me

lmcgibbn@LMC-032857
/usr/local/any23/core/target/apache-any23-core-1.2-SNAPSHOT(master) $
./bin/any23 -h
Usage: any23 [options] [command] [command options]
  Options:
-h, --help
   Display help information.
   Default: false
--plugins-dir
   The Any23 plugins directory.
   Default: /Users/lmcgibbn/.any23/plugins
-X, --verbose
   Produce execution verbose output.
   Default: false
-v, --version
   Display version information.
   Default: false
  Commands:
extractor  Utility for obtaining documentation about metadata
extractors.
  Usage: extractor [options] Extractor name
Options:
  -a, --all
 shows a report about all available extractors
 Default: false
  -i, --input
 shows example input for the given extractor
 Default: false
  -l, --list
 shows the names of all available extractors
 Default: false
  -o, --outut
 shows example output for the given extractor
 Default: false

microdata  Commandline Tool for extracting Microdata from file/HTTP
source.
  Usage: microdata [options] Input document URL, {
http://path/to/resource.html|file:/path/to/localFile.html}

mimes  MIME Type Detector Tool.
  Usage: mimes [options

Re: Install Apache Any23

2016-05-26 Thread Lewis John Mcgibbney
Hi James,
CC user@,
The documentation on both is good AFAIK.
Are you looking for installation? Or programmatic usage? Both are available
from the Website.
If you are looking for advice on how to use the REST service available at
http://any23.org/, there is documentation at the bottom of that webservice
page.
Thanks
Lewis

On Thu, May 26, 2016 at 4:50 PM, James Scheibner 
wrote:

> Hi Lewis,
>
> I assume you're the lead developer on Apache Any23. Do I follow the
> installation instructions on the Apache Any23 website or on GitHub? Thanks.
>
> Cheers
>
> James Scheibner
>



-- 
*Lewis*


Re: Restrictive access rights for `bin' directory

2016-04-02 Thread Lewis John Mcgibbney
Hi,
Yes, master branch is unstable right now. The unit tests fail when parsing
microdata which is extracted from malformed HTML.
I am going to have a crack at fixing the tests this weekend again. Please
check out the most recent pull requests on the GitHub mirror for a better
description of the issue.
For the time being please just skip the tests. Thanks

On Saturday, April 2, 2016, Wouter Beek  wrote:

> Hi Lewis,
>
> > This is fixed in trunk cf.
> https://issues.apache.org/jira/browse/ANY23-79
> > Basically if you work from master branch then it is 'fixed'.
> Nice!  Running the latest and greatest from trunk is way cooler anyway :-)
>
> I run into an issue with `mvn clean install` on the latest commit
> (i.e., 108d...).  IIRC Surefire is for running unit tests.  Am I allowed
> to skip them for compilation?  Here's the Maven output:
>
> ~~~
> [ERROR] Failed to execute goal
> org.apache.maven.plugins:maven-surefire-plugin:2.19.1:test (default-test)
> on project apache-any23-core: There are test failures.
> [ERROR]
> [ERROR] Please refer to /home/wbeek/Git/any23/core/target/surefire-reports
> for the individual test results.
> [ERROR] -> [Help 1]
> [ERROR]
> [ERROR] To see the full stack trace of the errors, re-run Maven with the
> -e switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> [ERROR]
> [ERROR] For more information about the errors and possible solutions,
> please read the following articles:
> [ERROR] [Help 1]
> http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
> [ERROR]
> [ERROR] After correcting the problems, you can resume the build with the
> command
> [ERROR]   mvn  -rf :apache-any23-core
> ~~~
>
> The Maven feedback is correct: there's a bunch of files in
> `$ANY23_HOME/core/target/surefire-reports` that should probly explain
> which test failed why, but I'm not knowledgeable enough to investigate
> them ATM.
>
> Here's my Maven/Java version:
>
> ~~~
> $ mvn -version
> Apache Maven 3.3.3 (NON-CANONICAL_2015-07-10T12:37:52_mockbuild;
> 2015-07-10T14:37:52+02:00)
> Maven home: /usr/share/maven
> Java version: 1.8.0_77, vendor: Oracle Corporation
> Java home: /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.77-1.b03.fc23.x86_64/jre
> Default locale: en_US, platform encoding: UTF-8
> OS name: "linux", version: "4.4.6-300.fc23.x86_64", arch: "amd64", family:
> "unix"
> ~~~
>
> --
> Cheers,
> Wouter.
>
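
For anyone wanting to see which tests failed without opening each report by
hand, here is a minimal Python sketch. It assumes the usual Surefire layout,
i.e. `TEST-*.xml` files in the reports directory containing `<testcase>`
elements with a `<failure>` or `<error>` child; the function name
`failed_tests` is illustrative, and the path in the example is the one from
the Maven output above.

```python
from __future__ import annotations

import sys
import xml.etree.ElementTree as ET
from pathlib import Path


def failed_tests(report_dir: str) -> list[tuple[str, str, str]]:
    """Return (test id, kind, message) for each failure or error found."""
    results = []
    for report in sorted(Path(report_dir).glob("TEST-*.xml")):
        root = ET.parse(report).getroot()
        for case in root.iter("testcase"):
            for kind in ("failure", "error"):
                problem = case.find(kind)
                if problem is not None:
                    test_id = f"{case.get('classname')}.{case.get('name')}"
                    results.append((test_id, kind, problem.get("message", "")))
    return results


if __name__ == "__main__":
    # e.g. python failed_tests.py core/target/surefire-reports
    directory = sys.argv[1] if len(sys.argv) > 1 else "core/target/surefire-reports"
    for test_id, kind, message in failed_tests(directory):
        print(f"{kind.upper()}: {test_id}: {message}")
```

Pasting the resulting lines into a JIRA issue or a reply on this list is
usually enough for someone to tell which extractor is involved.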


-- 
*Lewis*


Re: Restrictive access rights for `bin' directory

2016-04-01 Thread Lewis John Mcgibbney
Hi Wouter,

On Thu, Mar 31, 2016 at 1:25 PM,  wrote:

> From: Wouter Beek 
> To: user@any23.apache.org
> Cc:
> Date: Thu, 31 Mar 2016 22:25:41 +0200
> Subject: Restrictive access rights for `bin' directory
> Hi!,
>
> My name is Wouter Beek, I'm principal developer of the LOD Laundromat
> (http://lodlaundromat.org) which cleans tens of billions of RDF
> statements.  I'm currently using the RDF parsers from SWI-Prolog's
> Semweb library
> (http://www.swi-prolog.org/pldoc/doc_for?object=section%28%27packages/semweb.html%27%29)
> but I'm of course also very interested in trying out Any23.
>

Nice


>
> So... when I download the file `apache-any23-core-1.1.tar.gz' from
> the site I unpack it in my home dir.  First thing I notice is that the
> `bin' directory has very restrictive access rights `d-.', i.e.,
> it is not possible to run the Any23 CLI tools out-of-the-box.
>

This is fixed in trunk cf. https://issues.apache.org/jira/browse/ANY23-79
Basically if you work from master branch then it is 'fixed'.


>
> My questions are twofold:
>   1. Is there any particular reason for these access rights being so
>   strict by default?  Does it not hamper the out-of-the-box experience?
>

Yes, it did. It was an issue with the Maven plugin; we fixed it.


>   2. Should the extra step of changing the access rights to `bin' be
>   mentioned as part of the `README.txt' file?
>
>
It should be if it was still an issue.
Thanks for registering the issue here. Please keep us updated on how
lodlaundromat.org goes.
Thanks


Re: Command Line Interface

2016-04-01 Thread Lewis John Mcgibbney
Hi Christian,
Replies inline

On Thu, Mar 31, 2016 at 1:25 PM,  wrote:

> From: "Christian T Stackhouse (Campus)" 
> To: "user@any23.apache.org" 
> Cc:
> Date: Thu, 24 Mar 2016 16:37:59 +
> Subject: Command Line Interface
>
> I need to batch convert several hundred files to RDF. I have split the
> files from larger files because of the 20 limit for the local server
> conversion. Is there a way to batch convert files through the command line
> and is there a way to convert larger files (approx 30mb)? I am converting
> text files to RDF.
>

In short, no.
This needs to be your own logic. We have an example in Nutch of how this can
work in a parallel manner:
https://github.com/apache/nutch/blob/master/src/java/org/apache/nutch/parse/Parser.java
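
As a rough idea of what such "own logic" could look like, here is a minimal
Python sketch that runs one converter process per input file, a few at a
time. It is not how Nutch does it. `convert_all` and `any23_command` are
made-up names, and the `any23 rover -o <file>` invocation is an assumption
about the CLI that should be checked against `any23 rover --help` before use.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Iterable, List, Tuple


def convert_all(
    inputs: Iterable[Path],
    build_command: Callable[[Path], List[str]],
    max_workers: int = 4,
) -> List[Tuple[Path, int]]:
    """Run one external process per input file, `max_workers` at a time.

    Returns (input path, exit code) pairs in the same order as `inputs`.
    """
    files = list(inputs)

    def run(path: Path) -> int:
        # check=False: collect exit codes instead of stopping at the first failure.
        return subprocess.run(build_command(path), check=False).returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        codes = list(pool.map(run, files))
    return list(zip(files, codes))


def any23_command(path: Path) -> List[str]:
    # ASSUMPTION: an `any23` launcher is on PATH and its `rover` tool accepts
    # `-o <file>` for the output file and a file: URL as input.
    source = path.resolve()
    return ["any23", "rover", "-o", str(source.with_suffix(".rdf")), source.as_uri()]


if __name__ == "__main__":
    # Hypothetical input directory of text files.
    for path, code in convert_all(sorted(Path("input").glob("*.txt")), any23_command):
        print(path, "ok" if code == 0 else f"failed ({code})")
```

Threads are sufficient here because each worker only waits on a child
process; the actual conversion work happens in the separate JVMs.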

>
> Also, I have an owl doc that can stylize my files, but I'm not sure how to
> incorporate it. Any help or suggestions would be greatly appreciated!
>

In all honesty, with regards to styling, I have not got a clue. Are you
sure it is owl and not XSLT?

>
> I'm running a 64bit ubuntu VM on my 64bit windows 10 laptop. I also have
> access to a compute server with plenty of resources running centos and BASH
> interface.
>
Sounds good. Please let us know about the above and we can take it from
there.
Thanks


Re: GSOC 2016

2016-04-01 Thread Lewis John Mcgibbney
Hi Cihad,

On Thu, Mar 31, 2016 at 1:25 PM,  wrote:

>
> From: Cihad Guzel 
> To: user@any23.apache.org
> Cc:
> Date: Thu, 24 Mar 2016 10:35:01 +0200
> Subject: Re: GSOC 2016
> Hi Lewis.
>
> Yes. I am still interested.  I'm open to any suggestions
>
>
I didn't see anything in the GSoC tracker. I may have missed it. Do you
have a link to your proposal please?
Thanks
Lewis


Re: GSOC 2016

2016-03-23 Thread Lewis John Mcgibbney
Hi Cihad,
Are you still interested in this? If so then please let us know and we can
find you a project.
Thank you
Lewis

On Mon, Mar 7, 2016 at 2:38 PM,  wrote:

>
> From: Cihad Guzel 
> To: d...@any23.apache.org, user@any23.apache.org
> Cc:
> Date: Tue, 8 Mar 2016 00:38:49 +0200
> Subject: GSOC 2016
> Hi,
>
> I want to apply GSOC 2016. I examined Any23. It sounds interesting. Could
> you suggest me any issue for GSOC ?
>
> I participated in GSOC 2015 for Apache Nutch last year and I completed it
> successfully.
>
> You can see my previous works for GSOC :
> https://wiki.apache.org/nutch/GoogleSummerOfCode/SitemapCrawler
>
> Kind Regards
> Cihad Guzel
>
>


-- 
*Lewis*


[GSoC] ANY23-249 Update all W3C and other Standards Compliance within Any23

2016-02-19 Thread Lewis John Mcgibbney
Hi user@ and dev@,
This year, again I am going to propose (and put myself forward to be a
mentor) for the above issue, the ticket for which can be located at [0].
I am going to announce this on the W3C lists shortly as well as over on
schema.org, microformats, dbpedia lists, etc.
If there are any students out there on this list and interested in
participating then please let me know by contacting me lewismc [at] apache
[dot] org  CC'ing dev [at] any23 [dot] apache [dot] org
Thanks
Lewis

[0] https://issues.apache.org/jira/browse/ANY23-249

-- 
*Lewis*


Fwd: private Digest 5 Feb 2016 18:05:42 -0000 Issue 149

2016-02-05 Thread Lewis John Mcgibbney
-- Forwarded message --
From: 
Date: Fri, Feb 5, 2016 at 10:05 AM
Subject: private Digest 5 Feb 2016 18:05:42 -0000 Issue 149
To: priv...@any23.apache.org



private Digest 5 Feb 2016 18:05:42 -0000 Issue 149

Topics (messages 407 through 407)

[REMINDER] ApacheCon NA 2016 Travel Assistance Applications now open!
407 by: lewis john mcgibbney

Administrivia:

-
To post to the list, e-mail: priv...@any23.apache.org
To unsubscribe, e-mail: private-digest-unsubscr...@any23.apache.org
For additional commands, e-mail: private-digest-h...@any23.apache.org

--



-- Forwarded message --
From: lewis john mcgibbney 
To: undisclosed-recipients:;
Cc:
Date: Fri, 5 Feb 2016 10:05:40 -0800
Subject: [REMINDER] ApacheCon NA 2016 Travel Assistance Applications now
open!
Hi pmcs@,

The Travel Assistance Committee (TAC) are pleased to announce that travel
assistance applications for ApacheCon North America 2016 are now open! This
announcement serves as a purpose for you (pmcs@) to let members of your
community know about both ApacheConNA 2016 and about the TAC assistance to
attend. Could you please forward this announcement to your community,
along  with (if possible) information on how your project is involved in
ApacheCon this year?

We will be supporting ApacheCon NA, Vancouver BC, May 9th - 13th 2016.

TAC exists to help those that would like to attend ApacheCon events, but
are unable to do so for financial reasons. For more info on this year's
applications and qualifying criteria please visit the TAC website at
<http://www.apache.org/travel/>. Applications are already open, so don't
delay!

*Important dates*...

   - CFP Close: February 12, 2016
   - CFP Notifications: February 29, 2016
   - TAC Applications close:  March 2, 2016
   - Schedule Announced: March 3, 2016

Applicants have until the closing date above to submit their
applications (which should contain as much supporting material as required
to efficiently and accurately process your request); this will enable TAC
to announce successful awards shortly afterwards.

As usual TAC expects to deal with a range of applications from a diverse
range of backgrounds. We therefore encourage (as always) anyone thinking
about sending in an application to do so ASAP.

We look forward to greeting many of you in Vancouver, BC in May 2016!

Kind Regards

Lewis

(On behalf of the Travel Assistance Committee)




-- 
*Lewis*


Fwd: ApacheCon NA 2015 Travel Assistance Applications now open!

2015-12-07 Thread Lewis John Mcgibbney
Hi user@ and dev@,
Please see below for opportunities to obtain Travel Assistance funding to
attend the forthcoming ApacheCon NA, which is being held in Vancouver, BC, May
9th - 13th 2016.
It would be great to see our project represented.
Best
Lewis

-- Forwarded message --
From: 
Date: Mon, Dec 7, 2015 at 8:15 PM
Subject: private Digest 8 Dec 2015 04:15:55 - Issue 143
To: priv...@any23.apache.org



private Digest 8 Dec 2015 04:15:55 - Issue 143

Topics (messages 399 through 400)

Re: Immediate change to git
399 by: Peter Ansell

ApacheCon NA 2015 Travel Assistance Applications now open!
400 by: lewis john mcgibbney




-- Forwarded message --
From: Peter Ansell 
To: "priv...@any23.apache.org" 
Cc: David Nalley 
Date: Wed, 4 Nov 2015 08:59:44 +1100
Subject: Re: Immediate change to git
Hi David,

Eventually, projects that create short-lived branches for each JIRA
issue will not appreciate this measure, but that will not be a short
term effect. More of a medium term or long term effect.

I know I have been meaning to clean out the merged branches in the
Any23 git repository, but have not got around to it. Luckily, they
have not built up to the point where they are unmanageable yet, as
there are only 5 or so in the apache repository at this point, so a
hiatus on that will not affect us.

Cheers,

Peter

On 4 November 2015 at 07:40, David Nalley  wrote:
> Hi folks,
>
> After the many emails you may have seen around Git, I am writing yet
> another.
>
> To date, on our git repos, we've only 'protected' master, trunk, and
> release branches and tags. This has left other branches open to
> rewriting, force pushes, and branch deletion.
>
> Recently, we've discovered that many projects (just under 50) have one
> or more repos that are using something other than master or trunk as
> their main development branch. In some cases this is a 'develop'
> branch in others it's more like $project_version which leaves those
> branches open to deletion, rewriting, etc.
>
> So today, we're taking an interim step of disabling non-fast-forward
> pushes and branch deletion across all of our git repos. I emphasize
> interim, as it's a stop-gap measure to get us back to the level of
> protection we've set expectations for. I know that this will be
> disruptive to many folks' way of operating in their git environment,
> so we are hoping to make this interim solution short lived. If your
> project has immediate needs that you find are blocked by this, please
> do reach out to the Infrastructure team, and we will work to make sure
> we can help with a timely workaround for those specific cases.
>
> The longer term solution to this issue may be a policy decision or it
> might be a technical solution. I sadly don't know what that solution
> will be. We are going to be discussing this on the public
> infrastructure-dev mailing list, and I invite you to join us in that
> discussion.
>
> --David



-- Forwarded message --
From: lewis john mcgibbney 
To:
Cc: "travel-assista...@apache.org" 
Date: Mon, 7 Dec 2015 20:15:50 -0800
Subject: ApacheCon NA 2015 Travel Assistance Applications now open!
Hi pmcs@,

The Travel Assistance Committee (TAC) are pleased to announce that travel
assistance applications for ApacheCon NA 2016 are now open!

This announcement is intended for you (pmcs@) to let members of
your community know about both ApacheCon NA 2016 and the TAC
assistance available to attend. Could you please forward this announcement
to your community, along with (if possible) information on how your project
is involved in ApacheCon this year?

We will be supporting ApacheCon NA, Vancouver BC, May 9th - 13th 2016.

TAC exists to help those that would like to attend ApacheCon events, but
are unable to do so for financial reasons. For more info on this year's
applications and qualifying criteria, please visit the TAC website at
<http://www.apache.org/travel/>. Applications are already open, so don't
delay!

This year ApacheCon NA is split into two separate themed events - Apache
BigData
<http://events.linuxfoundation.org/events/apache-big-data-north-america>
and *ApacheCon*
<http://events.linuxfoundation.org/events/apachecon-north-america>. Due to
the small time frame of each event (3 days and 2 days) the Apache Travel
Assistance Committee will only be accepting applications from those people
that are able to attend BOTH events.


*Importa

Re: Extracting Meta Tags

2015-12-07 Thread Lewis John Mcgibbney
Hi Frank,

On Mon, Dec 7, 2015 at 3:50 PM,  wrote:

>
> I'm trying to extract meta tags from webpages.  I'm using the code below
> but am finding that only a small subset of meta tags are being returned.
> There are meta tags like those for facebook open graph that i am interested
> in that are not being returned?
>

The default Any23 configuration [0] specifies that HTML head meta tags are
extracted, so there should be no need to change this behaviour: extraction
of HTML meta tags 'should' be happening out of the box.
You are also correctly defining this within your code as below!
Can you please post an example of a URL we can test against?
Thanks
Lewis

[0]
https://github.com/apache/any23/blob/master/api/src/main/resources/default-configuration.properties#L70


Re: Processing Recipes

2015-12-07 Thread Lewis John Mcgibbney
Hi Frank,

Answer below

On Mon, Dec 7, 2015 at 3:50 PM,  wrote:

>
> Hi, I'm trying to process recipes that are marked up; one example of such a
> recipe is:
>
> http://allrecipes.com/recipe/203229/moms-buttermilk-pancakes/
>
> This page can be processed by google rich snippets, but when I try the
> following it doesn't return results:
>
> any23 rover -e html-mf-hrecipe
> http://allrecipes.com/recipe/203229/moms-buttermilk-pancakes/
>
> Using the following I get json results but they are generic (not recipe
> specific):
>
> sudo any23 rover -e html-microdata
> http://allrecipes.com/recipe/203229/moms-buttermilk-pancakes/
>
> Am I missing something?  My ultimate goal is to get the recipe into a java
> object, what would be the best way to do that?
>
>
When I try this with the service at any23.org (running off Any23 trunk),
I get the following error. Do you have another page we can try?
Thanks

org.apache.any23.extractor.ExtractionException: Error while parsing RDF document.
    at org.apache.any23.extractor.rdf.BaseRDFExtractor.run(BaseRDFExtractor.java:109)
    at org.apache.any23.extractor.rdf.BaseRDFExtractor.run(BaseRDFExtractor.java:41)
    at org.apache.any23.extractor.SingleDocumentExtraction.runExtractor(SingleDocumentExtraction.java:463)
    at org.apache.any23.extractor.SingleDocumentExtraction.run(SingleDocumentExtraction.java:255)
    at org.apache.any23.Any23.extract(Any23.java:298)
    at org.apache.any23.Any23.extract(Any23.java:450)
    at org.apache.any23.servlet.WebResponder.runExtraction(WebResponder.java:114)
    at org.apache.any23.servlet.Servlet.doGet(Servlet.java:79)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:618)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:725)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:301)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:239)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:219)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:106)
    at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:503)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:136)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:74)
    at org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:610)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:88)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:526)
    at org.apache.coyote.ajp.AbstractAjpProcessor.process(AbstractAjpProcessor.java:794)
    at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:652)
    at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1575)
    at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1533)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.openrdf.rio.RDFParseException: org.xml.sax.SAXParseException; lineNumber: 11; columnNumber: 788; Element type "n.length" must be followed by either attribute specifications, ">" or "/>".
    at org.semarglproject.sesame.rdf.rdfa.SesameRDFaParser.parse(SesameRDFaParser.java:111)
    at org.semarglproject.sesame.rdf.rdfa.SesameRDFaParser.parse(SesameRDFaParser.java:95)
    at org.apache.any23.extractor.rdf.BaseRDFExtractor.run(BaseRDFExtractor.java:105)
    ... 29 more
Caused by: org.semarglproject.rdf.ParseException: org.xml.sax.SAXParseException; lineNumber: 11; columnNumber: 788; Element type "n.length" must be followed by either attribute specifications, ">" or "/>".
    at org.semarglproject.rdf.rdfa.RdfaParser.processException(RdfaParser.java:1130)
    at org.semarglproject.source.XmlSource.process(XmlSource.java:50)
    at org.semarglproject.source.StreamProcessor.processInternal(StreamProcessor.java:87)
    at org.semarglproject.source.BaseStreamProcessor.process(BaseStreamProcessor.java:167)
    at org.semarglproject.source.BaseStreamProcessor.process(BaseStreamProcessor.java:154)
    at org.semarglproject.sesame.rdf.rdfa.SesameRDFaParser.parse(SesameRDFaParser.java:109)
    ... 31 more
Caused by: org.xml.sax.

[DISCUSS] Release Apache Any23 1.2?

2015-12-03 Thread Lewis John Mcgibbney
Hi Folks,
The title says it all?
Our Roadmap for 1.2 says that we have fixed 12 of 26 issues.
https://issues.apache.org/jira/browse/ANY23/?selectedTab=com.atlassian.jira.jira-projects-plugin:roadmap-panel
I am happy to be release manager for Any23 1.2. If I don't hear anything
bad then I will push a release candidate over the weekend.
Thanks
Lewis

-- 
*Lewis*


Re: New to Any23 - Need examples

2015-09-17 Thread Lewis John Mcgibbney
Hi Akhil,

On Thu, Sep 17, 2015 at 2:48 AM,  wrote:

>
> I'm a Java dev with 3 years of experience. I found any23 while searching
> for good open source projects to contribute to. The whole concept of RDF is
> new for me. I couldn't find any 'examples' of what any23 can do.
>
> 1. I installed any23 CLI in my local windows machine. There was an slf4j
> error that I fixed by adding an extra jar slf4j-simple-1.7.5.jar and adding
> it to any23.bat > set CLASSPATH
>

Can you please open an issue in Jira and submit a patch for this? It would
be nice to clean this up.


>
> 2. I did some rover queries to google.com / quora.com etc. I did not get
> any interesting data.
>

What did you get? Can you paste it somewhere external and provide a link or
else paste it here?


>
> Could someone please show me an interesting example use case of how any23
> is practically applicable? I couldn't find any tutorials on the internet
> either. Maybe I can add those to the documentation once I get to know it.
>

W3C websites are usually quite rich in metadata. Also news websites and
other media outlets are usually quite good.


>
> Here is an example usecase I tried, but didnt understand what to do with
> this kind of functionality. Please show me few good usecases:
>
>
>
Typically large home pages like this are not particularly rich in embedded
structured metadata.
Lewis


Re: Converting between .ttl turtle format and rdf

2015-08-24 Thread Lewis John Mcgibbney
Hi Marco,

On Wed, Aug 5, 2015 at 4:44 AM,  wrote:

>
> I have a .ttl file and I would like to use any23 to convert it to an rdf
> file. Can you please advise on how to do so? Thanks!
>
>
This is a very basic use of the Any23 API and can be done easily.
We have an example on the Any23 website which you can see at
http://any23.apache.org/dev-data-conversion.html
Hope this helps you out.
Thanks
Lewis

-- 
*Lewis*


Re: Using String as an input

2015-08-24 Thread Lewis John Mcgibbney
Hi Murik,

On Wed, Aug 5, 2015 at 4:44 AM,  wrote:

>
> Is it possible to have *DocumentSource* input as a String instead of a
> File? I looked through the *org.apache.any23.source* package and tried
> using *StringDocumentSource* class, but it requires URI as a parameter.
> If I enter some uri (fake or real) it doesn't parse the String contents.
> Thanks everyone in advance.
>
>
Yes.
You can check out the Javadoc, which explains what the syntax needs to
look like:
http://any23.apache.org/apidocs/index.html?org/apache/any23/source/StringDocumentSource.html
*StringDocumentSource*(String in, String uri)
The URI can be anything; an arbitrary placeholder is fine for testing purposes.
Can you provide results?
Thanks
Lewis


Re: Content is not allowed in prolog error

2015-04-09 Thread Lewis John Mcgibbney
Hi Meraj,

On Sun, Mar 22, 2015 at 8:29 PM,  wrote:

>
> I can confirm that this is an issue starting with Apache Any23 1.0 and
> not before that release. And also that this happens when the triples
> are being retrieved from with in a multi-threaded execution and not so
> far when a single thread is used.
>
>
Apologies for horribly late reply!!!

I've tried it against the any23.org webservice and get the following

{ "quads" : [
  [{ "type" : "uri", "value" : "http://any23.org/tmp/"},
   "http://vocab.sindice.net/any23#Keywords",
   {"type" : "literal", "value" : "wedding registry, wedding planning, wedding gift,gift registry, wedding gifts, wedding presents, bridal registry, wedding registry checklist,bridal shower, top wedding registry, wedding registry search, wedding registry ideas, wedding registry tips", "lang" : null, "datatype" : null},
   null],
  [{ "type" : "uri", "value" : "http://any23.org/tmp/"},
   "http://vocab.sindice.net/any23#Description",
   {"type" : "literal", "value" : " Create or find a wedding registry on Walmart.com.Walmart Gift Registry for wedding gifts and bridal showers.", "lang" : null, "datatype" : null},
   null],
  [{ "type" : "uri", "value" : "http://any23.org/tmp/"},
   "http://vocab.sindice.net/any23#title",
   {"type" : "literal", "value" : "Wedding Registry - Walmart.com", "lang" : null, "datatype" : null},
   null],
  [{ "type" : "uri", "value" : "http://any23.org/tmp/"},
   "http://vocab.sindice.net/any23#viewport",
   {"type" : "literal", "value" : "width=1024", "lang" : null, "datatype" : null},
   null]
]}

So, if you can debug why the error occurs within a multi-threaded scenario
then please log an issue and we can possibly test for this. Thanks
Lewis


Building Triples from Oracle

2015-04-08 Thread Lewis John Mcgibbney
Hi Folks,
A colleague recently asked me about building triples from an Oracle DB.
We had a good discussion on the use of vocabularies and a whole bunch of other
stuff relating to SPARQL, but I was left with no concise answer as to how to
build triple graphs from relational data in Oracle.
Who here has achieved this? Or has put thought into it before?
Thanks
Lewis
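
One common starting point, offered here as a sketch and not as an answer that came out of this thread, is a row-to-triples mapping in the spirit of the W3C Direct Mapping: each row becomes a subject IRI built from the table name and primary key, each column becomes a predicate, and each value becomes a literal. The base IRI, table name and column names below are hypothetical, and a real version would read rows from a JDBC ResultSet instead of a Map.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class RowToTriples {
    /** Escapes the characters that must not appear raw inside an N-Triples literal. */
    static String literal(String value) {
        return "\"" + value.replace("\\", "\\\\")
                           .replace("\"", "\\\"")
                           .replace("\n", "\\n")
                           .replace("\r", "\\r") + "\"";
    }

    /**
     * Emits one N-Triples line per non-null column of a single row.
     * Assumes base, table, key and column names are already safe to use inside an IRI.
     */
    static String rowToNTriples(String base, String table, String key, Map<String, String> row) {
        String subject = "<" + base + table + "/" + key + ">";
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, String> column : row.entrySet()) {
            if (column.getValue() == null) {
                continue; // SQL NULL produces no triple
            }
            out.append(subject)
               .append(" <").append(base).append(table).append('#').append(column.getKey()).append("> ")
               .append(literal(column.getValue()))
               .append(" .\n");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical row from a hypothetical EMP table with primary key 7.
        Map<String, String> row = new LinkedHashMap<>();
        row.put("NAME", "Ada");
        row.put("DEPT", null);
        System.out.print(rowToNTriples("http://example.org/", "EMP", "7", row));
    }
}
```

Plain literals keep the sketch short; choosing vocabularies and datatypes for the predicates and values is the part that still needs the discussion mentioned above.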


-- 
*Lewis*


[DEADLINE] Google Summer of Code Deadline Approaching Soon

2015-03-25 Thread Lewis John Mcgibbney
Hi All,
The deadline for this year's GSoC student submissions is approaching fast
and I would be very keen to see more proposals from the communities above.
I've been involved on and off with several students from across all of the
above communities, hence the reason I am emailing these lists.
I would strongly suggest that any students still planning on
submitting get their submissions in ASAP.
Thanks
Lewis


-- 
*Lewis*


Re: Concurrent HTTP requests?

2015-03-02 Thread Lewis John Mcgibbney
Hey Luca,

On Mon, Mar 2, 2015 at 1:08 PM,  wrote:

>
> I'm new to using Any23, and it's already been a great library to use.
>

great


> However I'm stuck with something rather basic. I followed this example
> on how to simply GET a URL and return the triples it contains:
> http://any23.apache.org/dev-data-extraction.html
>

OK


>
> I'd like to run many HTTP requests in a non-blocking fashion,
> concurrently. Are there facilities to do this using the HTTP code
> contained in Any23?
>
There is no code in Any23 for this. You may, however, wish to investigate the
Any23 Basic HTTP crawler plugin:
https://github.com/apache/any23/tree/master/plugins/basic-crawler
You can define the number of crawlers on the command line:
https://github.com/apache/any23/blob/master/plugins/basic-crawler/src/main/java/org/apache/any23/cli/Crawler.java#L67
As an alternative, you could investigate using something like Crawler
Commons [0] or Apache Nutch [1] for dealing with the HTTP logic.

[0] https://code.google.com/p/crawler-commons/
[1] http://nutch.apache.org
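
Since Any23 itself has no facility for this, another option is to run independent extractions on a plain java.util.concurrent thread pool. The sketch below assumes a hypothetical `extractor` function that takes a URL and returns serialized triples; in practice it would wrap the extraction call shown in the data extraction guide. It does not use any Any23 API, and the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

class ConcurrentExtraction {
    /**
     * Applies the extractor to every URL on a fixed-size thread pool and
     * returns the results in the same order as the input list.
     */
    static List<String> extractAll(List<String> urls, Function<String, String> extractor, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (String url : urls) {
                tasks.add(() -> extractor.apply(url));
            }
            List<String> results = new ArrayList<>();
            // invokeAll waits for all tasks and keeps the order of the task list.
            for (Future<String> future : pool.invokeAll(tasks)) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for extractions", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("An extraction task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Placeholder extractor: a real one would fetch the URL and run the extraction.
        Function<String, String> fake = url -> "triples for " + url;
        List<String> urls = Arrays.asList("http://example.org/a", "http://example.org/b");
        System.out.println(extractAll(urls, fake, 4));
    }
}
```

The calling thread still waits for the whole batch here; the requests themselves run concurrently. Whether one Any23 instance can safely be shared between threads is not settled in this thread, so creating one per task is the conservative choice.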


Re: Maven dependency entry for Apache Any23 1.2-SNAPSHOT

2015-02-25 Thread Lewis John Mcgibbney
Hi Meraj,

YES

On Wed, Feb 25, 2015 at 4:36 PM,  wrote:

>
>
> The following entry for a maven dependency to point to the SNAPSHOT
> release does not work , do you know what entry should be made to
> pom.xml to include the 1.2-SNAPSHOT
>
> <dependency>
>   <groupId>org.apache.any23</groupId>
>   <artifactId>apache-any23-core</artifactId>
>   <version>1.2-SNAPSHOT</version>
> </dependency>
>
>
>
You need to include the following snippet

https://github.com/apache/any23/blob/master/pom.xml#L499-L510

However substitute the contents with the following

https://repository.apache.org/content/repositories/snapshots/
Thanks
Lewis







-- 
*Lewis*


Re: Question about n-triple parsing

2015-02-10 Thread Lewis John Mcgibbney
Hi Souri,

On Tue, Feb 10, 2015 at 10:25 PM, souri datta 
wrote:

> Thanks for the link Lewis. Its a nice tool.
> What I want is an API to convert a string representation to subject ,
> predicate and object. Can you point me to some sample code (inside any23)
> which does this?
>
>
Yes, certainly.
First some Javadoc. The DocumentSource [0] is our main interface
representing a data structure from which we extract triples.
In particular, please see the StringDocumentSource [1]. It should be noted
that the more arguments provided as parameters, the better the chance of good
triples extraction.
Now for some code.
Please see our data extraction guide [2]. In this case we use the
HTTPDocumentSource; however, you can substitute this with your
StringDocumentSource, run it and then tell us how you get on.

Hope this helps.
Feel free to subscribe to the mailing lists at either
user-subscr...@any23.apache.org or dev-subscr...@apache.org
They are both pretty low volume mailing lists.
Thanks
Lewis

[0]
http://any23.apache.org/apidocs/index.html?org/apache/any23/source/DocumentSource.html
[1]
http://any23.apache.org/apidocs/index.html?org/apache/any23/source/StringDocumentSource.html
[2] http://any23.apache.org/dev-data-extraction.html


Re: Question about n-triple parsing

2015-02-10 Thread Lewis John Mcgibbney
Hi Souri,
You are asking for a good library and Any23 immediately comes to mind.
http://any23.apache.org
We have a service at http://any23.org which you can use to test out any of
your queries.
Please give us a shout if Any23 is NOT what you are looking for and we can
discover exactly what is wrong.
Thanks
Lewis

On Tue, Feb 10, 2015 at 6:13 PM, souri datta 
wrote:

> Hi Lewis,
>  I work for Yahoo on the information extraction field. I am trying out
> various libraries for parsing n-triples but the problem is none of them
> seem to generate the same subject id if the node is a blank node (starting
> with "_:").
> Do you have some good library in mind which I can look at for this
> purpose? (Java preferred).
>
> Simply put, given a line as "subject predicate object .", I want to parse
> and get the subject , predicate , object without any modification to the
> values.
>
> Thanks,
> Souri
>
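
For the narrow requirement above (split one N-Triples line into its three terms and keep blank node labels such as `_:b0` exactly as written), a small hand-rolled splitter may be enough. The following is a simplified sketch, not an Any23 API and not a complete implementation of the W3C N-Triples grammar; the class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NTripleLine {
    // Simplified term patterns; the real grammar is stricter about allowed characters.
    private static final String IRI = "<[^>]*>";
    private static final String BNODE = "_:[^\\s]+";
    private static final String LITERAL =
        "\"(?:[^\"\\\\]|\\\\.)*\"(?:\\^\\^<[^>]*>|@[A-Za-z0-9-]+)?";
    private static final Pattern TRIPLE = Pattern.compile(
        "^\\s*(" + IRI + "|" + BNODE + ")"
        + "\\s+(" + IRI + ")"
        + "\\s+(" + IRI + "|" + BNODE + "|" + LITERAL + ")"
        + "\\s*\\.\\s*$");

    /** Returns {subject, predicate, object} exactly as they appear in the line. */
    static String[] split(String line) {
        Matcher m = TRIPLE.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not an N-Triples statement: " + line);
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String line = "_:node1 <http://xmlns.com/foaf/0.1/name> \"Alice\"@en .";
        System.out.println(Arrays.toString(split(line)));
    }
}
```

Because the terms are returned as raw substrings, a blank node label is never rewritten, which is the behaviour asked for; the trade-off is that no validation or IRI resolution happens either.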



-- 
*Lewis*


[INVITATION] Google Summer of Code 2015

2015-01-29 Thread Lewis John Mcgibbney
Hi Folks,
I want to test the water to see if there is any interest in a Google
Summer of Code project for Any23 this year!
One possible project I am thinking about is to have a student engaged in
updating and upgrading our Microformats support to Microformats2. If this
is accomplished then we could look at integrating Any23 into the generic
RDFStream implementations out there, permitting RDF triples to be directed
to a number of underlying triple stores.

http://microformats.org/

Any other ideas?
Thanks

-- 
*Lewis*


Re: [Fatal Error] :20:79: Element type "arguments.length" must be followed by either attribute specifications, ">" or "/>".

2015-01-25 Thread Lewis John Mcgibbney
Hi Meraj,

On Sun, Jan 25, 2015 at 8:37 PM,  wrote:

>
>
> For the URL http://www.techforless.com/cgi-bin/tech4less/60PN5000 ,
> the build from SNAPSHOT on the local machine gives no triples ,


I am not able to reproduce your results.
When I use the service at any23-vm.apache.org, with the output format being
best guessed and no validation, reporting or annotation, I get the following

@prefix sindice:  .


 "refurbished, refurbashed,
open box, open-box, cheap, sale, discount, bargain, remanufactured,
copmuters, computers, laptops, notebooks, monitors, printers,
wireless, scanners, netwroking, networking, pc, desktop, lcd, LCD,
flatpanel, flat panel, TV, television, dell, Dell, Gateway, gateway,
syncmaster, digital camera, samsung, Samsung, bar code pen scanner,
speakers, DVD players, dvd recorders, burners, camcorders, laser,
inkjet ink, jet, photo, tft, TFT, widescreen, wide screen, phones,
software, projectors, servers, surge protectors, UPCs, memory, PDAs,
Palm, keyboards, mouse, iPod, MP3, components."@en ;
 "7 days"@en ;
 "Buy the LG 60PN5000
cheap open box plasma tv at Tech For Less and get a 30 day guarantee.
We offer a huge selection of open box plasma tvs."@en .
@prefix dcterms:  .

 dcterms:title
"LG 60PN5000 Plasma TV | open box Plasma TVs" .
@prefix foaf:  .
@prefix rdf:  .
@prefix doac:  .

_:nodea0ebb622207fb7d7740bb9a88a2684d a  ;
 "The LG 60-inch 60PN5000
1080p 600 Hz Plasma HDTV gives you a way to watch your favorite
movies, TV shows and sports events in high-definition. The 1920 x 1080
resolution and 16:9 aspect ratio produce clear and vivid images while
the wide screen lets you view movies in full, with no details left
out. When television programs are broadcast in high-definition, this
TV can let you see them the way they were intended to be seen. With
its slim profile, this 60-inch LG 1080p 600 Hz Plasma HDTV is easy to
fit in your living room. Up to two HDMI devices can be connected to
the TV. This LG 60-inch Plasma HDTV is also wall-mountable, allowing
you to save even more space."@en .

_:node46d378c040c2d5e2d4a13c6a8869ead a  ;

"http://schema.org/RefurbishedCondition"@en ;
 "$1,204.49"@en ;
 "USD"@en ;
 "In Stock"@en .

_:nodea0ebb622207fb7d7740bb9a88a2684d

_:node46d378c040c2d5e2d4a13c6a8869ead ;
 "LG 60PN5000 60-inch Widescreen
Plasma HDTV - 1080p - 600 Hz - 3,000,000:1 - HDMI - Black"@en ;

 ;
 "LG Electronics"@en ;
 "60PN5000"@en .



_:nodea0ebb622207fb7d7740bb9a88a2684d ;
dcterms:title "LG 60PN5000 Plasma TV | open box Plasma TVs"@en ;


 "Buy the LG 60PN5000
cheap open box plasma tv at Tech For Less and get a 30 day guarantee.
We offer a huge selection of open box plasma tvs."@en ;
 "refurbished,
refurbashed, open box, open-box, cheap, sale, discount, bargain,
remanufactured, copmuters, computers, laptops, notebooks, monitors,
printers, wireless, scanners, netwroking, networking, pc, desktop,
lcd, LCD, flatpanel, flat panel, TV, television, dell, Dell, Gateway,
gateway, syncmaster, digital camera, samsung,

Re: Extract a Open Graph value from a web page

2015-01-22 Thread Lewis John Mcgibbney
Hi Meraj,

Running the website you've provided through any23-vm.apache.org results in
the following output

@prefix foaf:  .
@prefix rdf:  .
@prefix doac:  .
@prefix dcterms:  .


dcterms:title "MacMall | Apple Mac mini dual-core Intel Core i5 1.4GHz
(Turbo Boost up to 2.7GHz), 4GB RAM, 500GB Hard Drive, Intel HD
Graphics 5000, Mac OS X Yosemite MGEM2LL/A"@en .

_:node7ab4123bafd8f45a207e47585841b13 a  ;
 "Mac mini Dual-Core Intel
Core i5 1.4GHz, 4GB DDR3 memory, 500GB SATA hard drive, Intel HD
Graphics 5000 processor, 802.11ac Wi-Fi, Bluetooth, Gigabit Ethernet,
HDMI, SDXC card slot, Two Thunderbolt 2 Ports, Audio in/out, IR
receiver"@en ;
 """
Apple Mac mini dual-core Intel Core i5 1.4GHz 
(Turbo Boost up to
2.7GHz), 4GB RAM, 500GB Hard Drive, Intel HD Graphics 5000, Mac OS X
Yosemite (MGEM2LL/A)
"""@en .

_:node9d41b06013eb3d847ae58af99799bbb a  ;
 "$479.00"@en ;
  .

_:node7ab4123bafd8f45a207e47585841b13

_:node9d41b06013eb3d847ae58af99799bbb .

_:nodeaa9a2f42eeabd0b7dfbbd7dfcf6f9 a  ;
 "Null"@en ;
 "Null"@en ;
 "2"@en .

_:node7ab4123bafd8f45a207e47585841b13

_:nodeaa9a2f42eeabd0b7dfbbd7dfcf6f9 .



_:node7ab4123bafd8f45a207e47585841b13 ;
dcterms:title "MacMall | Apple Mac mini dual-core Intel Core i5
1.4GHz (Turbo Boost up to 2.7GHz), 4GB RAM, 500GB Hard Drive, Intel HD
Graphics 5000, Mac OS X Yosemite MGEM2LL/A"@en ;

 "ToolTwist"@en ;
 "Apple Mac mini
dual-core Intel Core i5 1.4GHz (Turbo Boost up to 2.7GHz), 4GB RAM,
500GB Hard Drive, Intel HD Graphics 5000, Mac OS X Yosemite MGEM2LL/A
for $479.00 at macmall.com. Systems - Mac Mini - Mac Mini w/ Intel
Core i5 Duo Processor - 1.4 GHz Mac Mini Computers from
macmall.com."@en ;
 "Apple Mac mini
dual-core Intel Core i5 1.4GHz (Turbo Boost up to 2.7GHz), 4GB RAM,
500GB Hard Drive, Intel HD Graphics 5000, Mac OS X Yosemite, Apple Mac
Mini, Mac Mini w/ Intel Core i5 Duo Processor, 1.4 GHz Mac Mini
Computers, macmini, 3TED Systems"@en ;
 "telephone=no"@en 
;

"e02911354daa2202c515e76b11f9561b"@en ;
 "noodp,noydir"@en .

I think we can improve upon this by supporting both the
xmlns:og="http://opengraphprotocol.org/schema/" and
xmlns:fb="http://www.facebook.com/2008/fbml" namespaces... right now it would
appear that we don't. In particular, the overwhelming majority of triples coming
from this page appear to be coming from the microdata parser, as they are
extracted from the microdata itemprop attributes.

One thing I've noticed is that although the HTML TitleExtractor [0] is being
called, the HTMLMetaExtractor [1] is not!
We need to investigate this further.

[0]
https://github.com/apache/any23/blob/master/core/src/main/java/org/apache/any23/extractor/html/TitleExtractor.java
[1]
https://github.com/apache/any23/blob/master/core/src/main/java/org/apache/any23/extractor/html/HTMLMetaExtractor.java

On Fri, Jan 16, 2015 at 9:07 AM,  wrote:

>
> I am trying to retrieve an object value as in subject-predicate-object
>

Re: [Fatal Error] :20:79: Element type "arguments.length" must be followed by either attribute specifications, ">" or "/>".

2015-01-22 Thread Lewis John Mcgibbney
Hey Meraj,

On Fri, Jan 16, 2015 at 9:07 AM,  wrote:

>
>
> Hi Lewis,
>
> I can confirm that the fix suggested in ANY23-131 is fixing this
> issue, I made a fix suggested there and I was able to get past this
> issue .
>
Can you please check out the patch I've put here:
https://github.com/apache/any23/pull/10
Also please check out the mechanisms for using the patch:
https://issues.apache.org/jira/browse/ANY23-131
Thanks
Lewis


Re: [Fatal Error] :20:79: Element type "arguments.length" must be followed by either attribute specifications, ">" or "/>".

2015-01-20 Thread Lewis John Mcgibbney
Hi Meraj,

On Tue, Jan 13, 2015 at 11:51 PM,  wrote:

>
> The web page is http://www.adorama.com/ICAE135B.html
>

OK, just ran it through any23-vm.apache.org and I get the following

@prefix foaf:  .
@prefix rdf:  .
@prefix doac:  .
@prefix dcterms:  .

 dcterms:title "Canon PowerShot ELPH 135
Digital Camera, Black 9150B001" .

_:node7aec2e3e2888d5324234d4d14c4767b a  ;
 """




Cameras








Canon Digital Point & Shoot Cameras








Canon PowerShot ELPH 135 Digital Camera, Black





""" .

 
_:node7aec2e3e2888d5324234d4d14c4767b ;
dcterms:title "Canon PowerShot ELPH 135 Digital Camera, Black 9150B001" ;
 "buy, shop Canon
PowerShot ELPH 135 Digital Camera, 16MP, 8x Optical Zoom, 720p HD
Video, Digital Image Stabilization, Black, MPN: 9150B001 SKU:
ICAE135B" ;
 "Same Day Shipping
till 8PM on new Canon PowerShot ELPH 135 Digital Camera, 16MP, 8x
Optical Zoom, 720p HD Video, Digital Image Stabilization, Black. MPN
9150B001 SKU ICAE135B. From Adorama.com - more than a camera store." ;
 "@adorama" ;
 "@adorama" ;
 "product" ;
 "30.0

Re: [Fatal Error] :20:79: Element type "arguments.length" must be followed by either attribute specifications, ">" or "/>".

2015-01-11 Thread Lewis John Mcgibbney
Hi Meraj,

On Sat, Jan 10, 2015 at 8:51 PM,  wrote:

>
> I am using the latest Apache 1.1 release


Great


> and I am trying to extract
> the triples from an html page and , I consistently get this error
>
> [Fatal Error] :20:79: Element type "arguments.length" must be followed
> by either attribute specifications, ">" or "/>".
>

If possible, can you provide the webpage? Or at least some HTML?


>
> I have verified this error happening when I use both the API
> invocation as well as jetty service usage of Any23,


You mean any23-vm.apache.org web service?


> does any one know
> why this error is occuring ?
>

Why what is occurring? Can you please post some errors?


>
> To me this seems like some sort of a character encoding issue, but I
> am not sure.
>
Neither am I until I see an error message or stack trace.
Thank you
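
One plausible cause of this class of error, stated as an assumption since the page itself had not been shared at this point, is inline JavaScript such as `i<arguments.length` being fed to a strict XML parser: the `<` is read as the start of an element named `arguments.length`. The `[Fatal Error] :line:col:` prefix is the format printed by the JDK's default XML error handler. The sketch below uses only the JDK parser, not Any23, and the markup strings are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class InlineScriptXmlDemo {
    /** Returns the parser's error message, or null if the markup is well-formed XML. */
    static String xmlError(String markup) {
        try {
            DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(markup)));
            return null;
        } catch (SAXParseException e) {
            return e.getMessage();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Raw inline script: "<arguments.length" looks like an element start tag.
        String raw = "<html><script>for (var i = 0; i<arguments.length; i++) {}</script></html>";
        // The same script wrapped in CDATA is treated as plain character data.
        String wrapped = "<html><script><![CDATA[for (var i = 0; i<arguments.length; i++) {}]]>"
            + "</script></html>";
        System.out.println("raw:     " + xmlError(raw));
        System.out.println("wrapped: " + xmlError(wrapped));
    }
}
```

If this is what is happening, the input needs to go through a lenient HTML parser or have its script content escaped before a strict XML-based extractor sees it.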


Re: org.openrdf.rio.RDFHandlerException when attempting to parse CSV

2015-01-11 Thread Lewis John Mcgibbney
Hi Andy,

On Sat, Jan 10, 2015 at 8:51 PM,  wrote:

>
>
> This is unwritable in RDF/XML because of the predicate.
>
> http://gcmdservices.gsfc.nasa.gov/static/kms/sciencekeywords/
> sciencekeywords.csvKeywordVersion:8.0
>
> All properties become qnames in RDF/XML.  There is no way round it.
>
> The rules for qnames are quite strict.  The local part of a qname must
> start with a letter and can not contain a ':'
>
> So for ".../sciencekeywords.csvKeywordVersion:8.0"
> there is no split point to make a qname.
>
>
Thank you very much for the explanation. I was not aware of this intricacy and
this would have taken me a decade or so to narrow down.
I do however think that we (possibly from within the openrdf codebase) can
provide a better message. I am not a big fan of TargetInvocationExceptions
even when I've had a good night's sleep, this one is no different.
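
The qname rule Andy describes can be illustrated with a small Python sketch. This is not the openrdf implementation: `split_qname` is a hypothetical helper, and the name pattern is a simplified, ASCII-only approximation of the XML rules for the local part of a qname (starts with a letter or underscore, contains no ':').

```python
import re

# Simplified, ASCII-only approximation of an XML local name:
# starts with a letter or underscore, then letters, digits, '.', '_' or '-'.
LOCAL_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*\Z")


def split_qname(uri):
    """Return (namespace, local_name) for a predicate URI, or None.

    Tries every split point from left to right and accepts the first one
    whose right-hand side is a valid local name.
    """
    for i in range(1, len(uri)):
        if LOCAL_NAME.match(uri, i):
            return uri[:i], uri[i:]
    return None


if __name__ == "__main__":
    # A predicate that can be written as a qname.
    print(split_qname("http://xmlns.com/foaf/0.1/name"))
    # The predicate from this thread: everything after the ':' starts with
    # a digit or '.', and anything longer contains ':', so nothing qualifies.
    print(split_qname(
        "http://gcmdservices.gsfc.nasa.gov/static/kms/sciencekeywords/"
        "sciencekeywords.csvKeywordVersion:8.0"
    ))
```

Formats such as Turtle and N-Triples can write the predicate as a full IRI, so they do not need a split point.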


[ANNOUNCE] Apache Science and Healthcare Track @ApacheCon NA 2015

2015-01-08 Thread Lewis John Mcgibbney
Hi Folks,

Apologies for cross posting :(

As some of you may already know, @ApacheCon NA 2015 is happening in Austin,
TX April 13th-16th.

This email is specifically written to attract all folks interested in
Science and Healthcare... this is an official call to arms! I am aware that
there are many Science and Healthcare-type people lingering in the Apache
Semantic Web communities. This one is for you folks.

Over a number of years the Science track has been emerging as an attractive
and exciting, at times mind-blowing, non-traditional track running alongside
the resident HTTP server, Big Data, etc. tracks. The Semantic Web Track is
another such emerging track which has proved popular. This year we want to
really get the message out there about how much Apache technology is
actually being used in Science and Healthcare. This is not *only* aimed at
attracting members of the communities below

but also at potentially attracting a brand new breed of conference
participants to ApacheCon and
the Foundation e.g. Scientists who love Apache. We are looking for
exciting, invigorating, obscure, half-baked, funky, academic, practical and
impractical stories, use cases, experiments and downright successes alike
from within the Science domain. The only thing they need to have in common
is that they consume, contribute towards, advocate, disseminate or even
commercialize Apache technology within the Scientific domain and would be
relevant to that audience. It is fully open to interest whether this track
be combined with the proposed *healthcare track*... if there is interest to
do this then we can rename this track to Science and Healthcare. In essence
one could argue that they are one and the same however I digress :)

What I would like those of you that are interested to do, is to merely
check out the scope and intent of the Apache in Science content curation
which is currently ongoing and to potentially register your interest.

https://wiki.apache.org/apachecon/ACNA2015ContentCommittee#Apache_in_Science

I would love to see the Science and Healthcare track be THE BIGGEST track
@ApacheCon, and although we have some way to go, I'm sure many previous
track participants will tell you this is not to be missed.

We are looking for content from a wide variety of Scientific use cases all
related to Apache technology.
Thanks in advance and I look forward to seeing you in Austin.
Lewis

-- 
*Lewis*


Re: org.openrdf.rio.RDFHandlerException when attempting to parse CSV

2015-01-08 Thread Lewis John Mcgibbney
I can extract triples from this CSV and serialize to turtle, ntriples, trix
and json.
The barrier here is serializing this as rdfxml.
Specifically:

Caused by: org.openrdf.rio.RDFHandlerException: Unable to create XML
namespace-qualified name for predicate:
http://gcmdservices.gsfc.nasa.gov/static/kms/sciencekeywords/sciencekeywords.csvKeywordVersion:8.0
at org.openrdf.rio.rdfxml.RDFXMLWriter.handleStatement(RDFXMLWriter.java:237)
at org.apache.any23.writer.RDFWriterTripleHandler.receiveTriple(RDFWriterTripleHandler.java:93)
... 48 more



On Thu, Jan 8, 2015 at 11:45 AM, Lewis John Mcgibbney <
lewis.mcgibb...@gmail.com> wrote:

> Ah OK, I think that this is our problem.
>
> This only happens under the following conditions
>
> OUTPUT FORMAT: RDFXML
> VALIDATION: Validate + Fix
> REPORT: Yes
> ANNOTATE: Yes
>
> I am going to experiment a bit more with where this goes wrong.
>
>
> On Thu, Jan 8, 2015 at 11:38 AM, Lewis John Mcgibbney <
> lewis.mcgibb...@gmail.com> wrote:
>
>> Hi Folks,
>> With the Any23 webservice [0] when I try to extract triples from the
>> following CSV [1], I get the following stack trace
>>
>> 
>> 
>> Internal error.
>> 
>> 
>> 
>> 
>>
>> The Any23 service is effectively running off of trunk. Therefore I am
>> going to head over to the openrdf.rio lists and check this out. I just
>> wanted to post it here first though.
>> Thanks
>> Lewis
>>
>> [0] http://any23-vm.apache.org
>> [1]
>> http://gcmdservices.gsfc.nasa.gov/static/kms/sciencekeywords/sciencekeywords.csv
>>
>> --
>> *Lewis*
>>
>
>
>
> --
> *Lewis*
>



-- 
*Lewis*


Re: org.openrdf.rio.RDFHandlerException when attempting to parse CSV

2015-01-08 Thread Lewis John Mcgibbney
Ah OK, I think that this is our problem.

This only happens under the following conditions

OUTPUT FORMAT: RDFXML
VALIDATION: Validate + Fix
REPORT: Yes
ANNOTATE: Yes

I am going to experiment a bit more with where this goes wrong.


On Thu, Jan 8, 2015 at 11:38 AM, Lewis John Mcgibbney <
lewis.mcgibb...@gmail.com> wrote:

> Hi Folks,
> With the Any23 webservice [0] when I try to extract triples from the
> following CSV [1], I get the following stack trace
>
> 
> 
> Internal error.
> 
> 
> 
> 
>
> The Any23 service is effectively running off of trunk. Therefore I am going
> to head over to the openrdf.rio lists and check this out. I just wanted
> to post it here first though.
> Thanks
> Lewis
>
> [0] http://any23-vm.apache.org
> [1]
> http://gcmdservices.gsfc.nasa.gov/static/kms/sciencekeywords/sciencekeywords.csv
>
> --
> *Lewis*
>



-- 
*Lewis*


org.openrdf.rio.RDFHandlerException when attempting to parse CSV

2015-01-08 Thread Lewis John Mcgibbney
Hi Folks,
With the Any23 webservice [0] when I try to extract triples from the
following CSV [1], I get the following stack trace



Internal error.





The Any23 service is effectively running off of trunk. Therefore I am going
to head over to the openrdf.rio lists and check this out. I just wanted
to post it here first though.
Thanks
Lewis

[0] http://any23-vm.apache.org
[1]
http://gcmdservices.gsfc.nasa.gov/static/kms/sciencekeywords/sciencekeywords.csv

-- 
*Lewis*


Re: Unable to connect to any23 svn repo

2014-12-03 Thread Lewis John Mcgibbney
Hi Jaikit,

On Wed, Dec 3, 2014 at 7:48 PM,  wrote:

>
> Team,
>
> Since this afternoon we are unable to resolve the any23 svn repo and hence
> our builds are failing. Does anyone have any workaround or solution for
> this ?
>
>  Could not transfer artifact 
> org.apache.commons:commons-csv:pom:1.0-SNAPSHOT-rev1148315 from/to 
> any23-repository-external (http://svn.apache.org/repos/asf/any23/repo-ext/): 
> Connect to svn.apache.org:80 [svn.apache.org/140.211.11.4] failed: Connection 
> refused -> [Help 1]
>
>
> Appreciate any help.
>
>
There was a failure of the SVN master node on the Apache infrastructure
earlier today [0].
In the meantime I would hack your pom.xml and use the <scope>system</scope>
and <systemPath> nodes for the dependency.
An example can be seen here
https://github.com/maestros/gora-oraclenosql/blob/master/gora-oracle/pom.xml#L120

Hope this helps
Lewis

[0] https://twitter.com/infrabot/status/540192293572337664


[ANNOUNCE] Apache Any23 1.1 Released

2014-10-29 Thread lewis john mcgibbney
Good Afternoon Folks,

The Any23 PMC are proud to announce the release of Apache Any23 1.1.
Any23 (Anything to Triples) is a library, a web service and a command line
tool that extracts structured data in RDF format from a variety of Web
documents. Additionally, it supports a wide variety of input and output
formats.

This minor release is our first since hitting the big 1 point Oh! A full
breakdown of the issues addressed in this release can be found in our
release report:
http://s.apache.org/any231.1

**It should be noted that this release is not tested and verified to be
stable on Java 1.8**

To check out more of what Any23 can do we urge you to navigate to our
website:
http://any23.apache.org

To download, please visit our downloads page
http://any23.apache.org/download.html

To join the community please visit our mailing lists:
http://any23.apache.org/mail-lists.html

Additionally, you can try out our webservice which is currently available
from:
http://any23-vm.apache.org

Thank you, have a great day
Lewis
(On behalf of the Any23 PMC)

-- 

http://people.apache.org/~lewismc || @hectorMcSpector ||
http://www.linkedin.com/in/lmcgibbney

Apache Gora V.P || Apache Nutch PMC || Apache Any23 V.P || Apache OODT PMC ||
 Apache Open Climate Workbench PMC || Apache Tika PMC || Apache TAC


[RESULT] WAS Re: [VOTE] Release Apache Any23 1.1

2014-10-29 Thread Lewis John Mcgibbney
Hi Folks,
I am bringing this VOTE thread to a close now as the 72 hours have elapsed.
I am very glad to state that the vote has passed with the following result:

+1 (binding):

Chris Mattmann
Lewis John Mcgibbney
Andy Seaborne

I will continue with the remainder of the release procedure and send out
announcements.
Thank you to everyone that was able to review the release candidate and
that contributed to Any23 1.1.
Best
Lewis

On Thu, Oct 16, 2014 at 5:02 PM, Lewis John Mcgibbney <
lewis.mcgibb...@gmail.com> wrote:

> Evening Folks,
>
> I would like to open a VOTE on the following Apache Any23 1.1 release
> candidate.
>
> We solved a number of issues which can be seen in our release report:
> http://s.apache.org/any231.1
>
> Git source tag signature (4d5a022f71d2199c2d2cf83f4c51397249973052):
> http://s.apache.org/any231.1tag
>
> Staging repo:
> https://repository.apache.org/content/repositories/orgapacheany23-1001/
>
> Staging binaries:
> https://dist.apache.org/repos/dist/dev/any23/1.1/
>
> PGP release keys (signed using 48BAEBF6 Lewis John McGibbney (CODE SIGNING
> KEY) ):
> http://any23.apache.org/dist/KEYS
>
> I would like to say thank you to everyone that contributed to this minor
> release of Any23.
> Vote will be open for 72 hours.
>
> [ ] +1, let's get it rmblee!!!
> [ ] +/-0, fine, but consider to fix few issues before...
> [ ] -1, nope, because... (and please explain why)
>
>
> p.s. here is my +1
>
> --
> *Lewis*
>



-- 
*Lewis*


Re: [VOTE] Release Apache Any23 1.1

2014-10-19 Thread Lewis John Mcgibbney
Is anyone else able to review this RC?
Thanks in advance folks
Lewis

On Thu, Oct 16, 2014 at 5:02 PM, Lewis John Mcgibbney <
lewis.mcgibb...@gmail.com> wrote:

> Evening Folks,
>
> I would like to open a VOTE on the following Apache Any23 1.1 release
> candidate.
>
> We solved a number of issues which can be seen in our release report:
> http://s.apache.org/any231.1
>
> Git source tag signature (4d5a022f71d2199c2d2cf83f4c51397249973052):
> http://s.apache.org/any231.1tag
>
> Staging repo:
> https://repository.apache.org/content/repositories/orgapacheany23-1001/
>
> Staging binaries:
> https://dist.apache.org/repos/dist/dev/any23/1.1/
>
> PGP release keys (signed using 48BAEBF6 Lewis John McGibbney (CODE SIGNING
> KEY) ):
> http://any23.apache.org/dist/KEYS
>
> I would like to say thank you to everyone that contributed to this minor
> release of Any23.
> Vote will be open for 72 hours.
>
> [ ] +1, let's get it rmblee!!!
> [ ] +/-0, fine, but consider to fix few issues before...
> [ ] -1, nope, because... (and please explain why)
>
>
> p.s. here is my +1
>
> --
> *Lewis*
>



-- 
*Lewis*


[VOTE] Release Apache Any23 1.1

2014-10-16 Thread Lewis John Mcgibbney
Evening Folks,

I would like to open a VOTE on the following Apache Any23 1.1 release
candidate.

We solved a number of issues which can be seen in our release report:
http://s.apache.org/any231.1

Git source tag signature (4d5a022f71d2199c2d2cf83f4c51397249973052):
http://s.apache.org/any231.1tag

Staging repo:
https://repository.apache.org/content/repositories/orgapacheany23-1001/

Staging binaries:
https://dist.apache.org/repos/dist/dev/any23/1.1/

PGP release keys (signed using 48BAEBF6 Lewis John McGibbney (CODE SIGNING
KEY) ):
http://any23.apache.org/dist/KEYS

I would like to say thank you to everyone that contributed to this minor
release of Any23.
Vote will be open for 72 hours.

[ ] +1, let's get it rmblee!!!
[ ] +/-0, fine, but consider to fix few issues before...
[ ] -1, nope, because... (and please explain why)


p.s. here is my +1

-- 
*Lewis*


[DISCUSS] Any23 1.1 Release

2014-09-08 Thread Lewis John Mcgibbney
Hi Folks,

I would like to push a point release, namely 1.1.
Is anyone interested in setting a roadmap for this?
Right now we have all issues bulk tagged as 1.1 however it would be nice to
get a roadmap here
http://s.apache.org/UGq
Thanks
Lewis

-- 
*Lewis*


Re: opengraph not being extracted

2014-08-18 Thread Lewis John Mcgibbney
Thanks Stéphane

On Mon, Aug 18, 2014 at 6:52 PM,  wrote:

>
> here is the link:
>
> https://issues.apache.org/jira/browse/ANY23-227?focusedCommentId=14083838&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14083838
>
>
>
>

