CVE-2022-25312: An XML external entity (XXE) injection vulnerability exists in the Apache Any23 RDFa XSLTStylesheet extractor

2022-03-04 lewis john mcgibbney
Description:

An XML external entity (XXE) injection vulnerability was discovered in
the Any23 RDFa XSLTStylesheet extractor and is known to affect Any23
versions < 2.7. XXE injection is a web security vulnerability that
allows an attacker to interfere with an application's processing of
XML data. It often allows an attacker to view files on the application
server's filesystem and to interact with any back-end or external
systems that the application itself can access.
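For readers wondering what hardening against this class of bug can look like in an XSLT pipeline, here is a minimal sketch of one common JAXP pattern. It is not the actual Any23 2.7 patch, and it assumes the JDK's built-in XSLT processor plus a stylesheet and input supplied as strings; the class `HardenedXslt` and its methods are hypothetical. Both documents are parsed with DOCTYPE declarations disallowed (so no entity can be declared at all), and the `TransformerFactory` is configured so that external DTDs and stylesheets cannot be fetched.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper: runs an XSLT transform with common XXE defences enabled.
final class HardenedXslt {
    private HardenedXslt() {}

    // Parses XML with DOCTYPE declarations rejected outright, so no entity can be declared.
    static Document parseSafely(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // XSLT stylesheets rely on namespaces
        dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        dbf.setXIncludeAware(false);
        dbf.setExpandEntityReferences(false);
        return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Compiles the stylesheet and applies it to the input, returning the serialized result.
    static String transform(String xslt, String xml) throws Exception {
        TransformerFactory tf = TransformerFactory.newInstance();
        tf.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        // An empty string means no protocol is allowed for external DTDs or imported stylesheets.
        tf.setAttribute(XMLConstants.ACCESS_EXTERNAL_DTD, "");
        tf.setAttribute(XMLConstants.ACCESS_EXTERNAL_STYLESHEET, "");
        Transformer t = tf.newTransformer(new DOMSource(parseSafely(xslt)));
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(parseSafely(xml)), new StreamResult(out));
        return out.toString();
    }
}
```

Parsing the input separately, rather than handing the transformer a raw stream, keeps the DOCTYPE check under the caller's control regardless of which parser the XSLT engine would otherwise pick.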

Resolution:

This issue is fixed in Apache Any23 2.7 which can be downloaded from
https://any23.apache.org/download.html. We strongly encourage all
Any23 users to upgrade to Apache Any23 2.7.

Credit:

The Apache Any23 Project Management Committee would like to thank Lion
Tree (a.k.a. liontree0110) for reporting this issue.

-- 
http://home.apache.org/~lewismc/
http://people.apache.org/keys/committer/lewismc


[ANNOUNCE] Apache Any23 2.7

2022-03-04 lewis john mcgibbney
The Apache Any23 Project Management Committee is pleased to announce
the release of Apache Any23 2.7.

Apache Anything To Triples (Any23) is a library, a web service and a
command line tool that extracts structured data in RDF format from a
variety of Web documents.

Any23 2.7 requires JDK11 to build and run.

Release Notes: https://github.com/apache/any23/blob/any23-2.7/RELEASE-NOTES.md

Download: http://any23.apache.org/download.html

Maven Artifacts:
https://search.maven.org/search?q=g:org.apache.any23%20AND%20v:2.7

DOAP: https://github.com/apache/any23-committers/blob/master/doap_Any23.rdf

Have Fun,
(Lewis), on behalf of the Apache Any23 PMC
N.B. The release artifacts can take a bit of time to reach the
distribution servers; please be patient.





[RESULT] WAS Re: [VOTE] Release Apache Any23 2.7

2022-03-03 lewis john mcgibbney
72 hours have elapsed, so I am happy to bring this VOTE to a close. Thanks to
everyone who was able to VOTE on and review this release candidate. The RESULT
is as follows:

[4] +1, Release as Apache Any23 2.7
Andy Seaborne*
Hans Brende*
David Cockbill
Lewis John McGibbney*
*Any23 PMC-binding


[0] +/-0, fine, but consider fixing a few issues first...
[0] -1, nope, because... (and please explain why)

I'll go ahead and make the release :)
Thanks
lewismc





[VOTE] Release Apache Any23 2.7

2022-02-21 lewis john mcgibbney
Hi,

Please VOTE on the release candidate for Apache Any23 2.7.

Note, this release candidate requires JDK11 to build and run.

We solved 18 issues:
https://issues.apache.org/jira/projects/ANY23/versions/12350742

Git source tag (a8ee3fc67536bf702c06db67011095c3fdd6cf3a):
https://gitbox.apache.org/repos/asf?p=any23.git;a=commit;h=a8ee3fc67536bf702c06db67011095c3fdd6cf3a

Staging repo:
https://repository.apache.org/content/repositories/orgapacheany23-1012

Sources and CLI binaries area:
https://dist.apache.org/repos/dist/dev/any23

PGP release keys (signed using 48BAEBF6):
https://dist.apache.org/repos/dist/release/any23/KEYS
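Reviewers voting on a candidate usually check each artifact's PGP signature against the KEYS file (for example with `gpg --verify`) as well as its checksum. The sketch below covers only the checksum half, in Java since Any23 targets JDK 11. It assumes a `.sha512` file in the `sha512sum` layout (hex digest first, then the file name); some Apache checksum files use a different, space-grouped layout that this would not parse. The `Sha512Check` class is hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for checking a release artifact against its .sha512 file.
final class Sha512Check {
    private Sha512Check() {}

    // Streams the input through SHA-512 and returns the lowercase hex digest.
    static String sha512Hex(InputStream in) throws IOException {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("SHA-512");
        } catch (NoSuchAlgorithmException e) {
            // Standard JDKs ship SHA-512; treat its absence as a configuration error.
            throw new IllegalStateException(e);
        }
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            md.update(buf, 0, n);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // Assumes the sha512sum layout "<hex digest>  <file name>"; only the first token is compared.
    static boolean matches(Path artifact, Path checksumFile) throws IOException {
        String expected = Files.readString(checksumFile, StandardCharsets.US_ASCII)
                .trim().split("\\s+")[0];
        try (InputStream in = Files.newInputStream(artifact)) {
            return sha512Hex(in).equalsIgnoreCase(expected);
        }
    }
}
```

Streaming through a fixed buffer keeps memory flat even for large binary distributions; a matching checksum only shows the download is intact, so the signature check is still needed to establish who produced it.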

Vote will be open for 72 hours.

[ ] +1, Release as Apache Any23 2.7
[ ] +/-0, fine, but consider fixing a few issues first...
[ ] -1, nope, because... (and please explain why)


P.S. Here is my +1




[ANNOUNCE] Apache Any23 2.6 Release

2022-01-08 lewis john mcgibbney
The Apache Any23 Team is pleased to announce the release of Apache Any23
2.6.

Apache Anything To Triples (Any23) is a library, a web service and a
command line tool that extracts structured data in RDF format from a
variety of Web documents.

Any23 2.6 requires JDK11 to build and run.

Release Notes:
https://github.com/apache/any23/blob/any23-2.6/RELEASE-NOTES.md

Download: http://any23.apache.org/download.html

Dependency information: https://any23.apache.org/dependency-info.html

DOAP: https://github.com/apache/any23-committers/blob/master/doap_Any23.rdf

Community and support: https://any23.apache.org/mailing-lists.html

Have Fun,
(lewismc), on behalf of the Apache Any23 PMC



[RESULT] WAS Re: [VOTE] Release Apache Any23 2.6 RC#2

2022-01-07 lewis john mcgibbney
Hi user@ and dev@,
72 hours have elapsed, so I'm closing this VOTE. Thank you to everyone who
was able to VOTE. The RESULT is as follows:

[3] +1, release as Any23 2.6
Hans Brende*
Andy Seaborne*
Lewis John McGibbney*

[0] +/-0, fine, but consider fixing a few issues first...
[0] -1, nope, because... (and please explain why)

*Any23 PMC-binding

This is excellent. I will close out the release and promote it.
Have a great weekend.
lewismc





[VOTE] Release Apache Any23 2.6 RC#2

2022-01-03 lewis john mcgibbney
Hi user@ and dev@,

Please VOTE on the 2nd release candidate for Apache Any23 2.6. Most notably
this RC addresses several security vulnerabilities by upgrading every
single Any23 dependency.

We solved 62 issues:
https://issues.apache.org/jira/projects/ANY23/versions/12350556

Git source tag (7ea496991f3a053b00cba2ec82ef8a8a4d7e401e):
https://gitbox.apache.org/repos/asf?p=any23.git;a=tag;h=refs/tags/any23-2.6

Staging repo:
https://repository.apache.org/content/repositories/orgapacheany23-1011


Staging source:
https://dist.apache.org/repos/dist/dev/any23/2.6/

PGP release keys (signed using 48BAEBF6):
https://dist.apache.org/repos/dist/release/any23/KEYS

Vote will be open for 72 hours.

[ ] +1, release as Any23 2.6
[ ] +/-0, fine, but consider fixing a few issues first...
[ ] -1, nope, because... (and please explain why)

P.S. Here is my +1


[RESULT] WAS Re: [VOTE] Release Apache Any23 2.6

2022-01-03 lewis john mcgibbney
Hi user@, dev@,
I'm going to bring this VOTE thread to a close with the following results.

[3] +1, release as Any23 2.6
Andy Seaborne*
Lewis John McGibbney*
David Cockbill

*Any23 PMC binding

[0] +/-0, fine, but consider fixing a few issues first...
[0] -1, nope, because... (and please explain why)

Unfortunately, on this occasion we did not receive enough binding VOTEs from
PMC members to proceed with the release.
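The shortfall comes down to simple arithmetic under the ASF's majority-approval rule for releases, which needs at least three binding +1 votes and more binding +1 than binding -1 votes. The following Java sketch (the class `ReleaseVoteTally` and its `Ballot` type are hypothetical) counts only binding ballots and applies that rule to the tally above, where David Cockbill's +1 is non-binding.

```java
import java.util.List;

// Hypothetical sketch of counting a release VOTE under ASF majority approval.
final class ReleaseVoteTally {
    private ReleaseVoteTally() {}

    // One ballot: value is +1, 0 or -1; binding is true for PMC members.
    static final class Ballot {
        final String voter;
        final int value;
        final boolean binding;

        Ballot(String voter, int value, boolean binding) {
            this.voter = voter;
            this.value = value;
            this.binding = binding;
        }
    }

    // Passes with at least three binding +1 votes and more binding +1 than binding -1.
    static boolean passes(List<Ballot> ballots) {
        long plus = 0;
        long minus = 0;
        for (Ballot b : ballots) {
            if (!b.binding) {
                continue; // non-binding votes are welcome but not counted
            }
            if (b.value > 0) {
                plus++;
            } else if (b.value < 0) {
                minus++;
            }
        }
        return plus >= 3 && plus > minus;
    }

    public static void main(String[] args) {
        // The 2.6 RC#1 tally: two binding +1 votes and one non-binding +1.
        List<Ballot> rc1 = List.of(
                new Ballot("Andy Seaborne", 1, true),
                new Ballot("Lewis John McGibbney", 1, true),
                new Ballot("David Cockbill", 1, false));
        System.out.println("2.6 RC#1 passes: " + passes(rc1));
    }
}
```

Under the same rule, the later 2.6 RC#2 tally (three binding +1 votes) passes.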

There have been several improvements to Any23 since this release candidate
was produced, so I will go ahead and produce a new release candidate and
see if we have better luck with RC#2.

lewismc

On Wed, Nov 3, 2021 at 10:10 PM lewis john mcgibbney 
wrote:

> Hi user@ and dev@,
>
> Please VOTE on the release candidate for Apache Any23 2.6.
>
> We solved 45 issues:
> https://issues.apache.org/jira/projects/ANY23/versions/12350556
>
> Git source tag (ac9c507bf7aedb4909d5bc135944a5bd3c474bd7):
> https://gitbox.apache.org/repos/asf?p=any23.git;a=tag;h=refs/tags/any23-2.6
>
> Staging repo:
> https://repository.apache.org/content/repositories/orgapacheany23-1010
>
> Staging binaries:
> https://dist.apache.org/repos/dist/dev/any23/2.6/
>
> PGP release keys (signed using 48BAEBF6):
> https://dist.apache.org/repos/dist/release/any23/KEYS
>
> Vote will be open for 72 hours.
>
> [ ] +1, release as Any23 2.6
> [ ] +/-0, fine, but consider fixing a few issues first...
> [ ] -1, nope, because... (and please explain why)
>
>




Re: [VOTE] Release Apache Any23 2.6

2021-11-12 lewis john mcgibbney
Hi Any23 PMC,
Sorry to do it this way... but can anyone review this release candidate?
Thank you
lewismc





Re: [VOTE] Release Apache Any23 2.6

2021-11-10 lewis john mcgibbney
Hi Folks,
Anyone else able to VOTE?
Thank you
lewismc





CVE-2021-40146: A Remote Code Execution (RCE) vulnerability exists in Apache Any23 YAMLExtractor.java

2021-09-10 lewis john mcgibbney
Description:

A Remote Code Execution (RCE) vulnerability was discovered in the
Any23 YAMLExtractor.java file and is known to affect Any23 versions <
2.5. RCE vulnerabilities allow a malicious actor to execute any code
of their choice on a remote machine over a LAN, a WAN, or the internet. RCE
belongs to the broader class of arbitrary code execution (ACE)
vulnerabilities.

Credit:

The Apache Any23 Project Management Committee would like to thank
Zhuxuan Wu for reporting the security vulnerability.





CVE-2021-38555: An XML external entity (XXE) injection vulnerability exists in Apache Any23 StreamUtils.java

2021-09-10 lewis john mcgibbney
Severity: critical

Description:

An XML external entity (XXE) injection vulnerability was discovered in
the Any23 StreamUtils.java file and is known to affect Any23 versions
< 2.5. XXE injection is a web security vulnerability that allows an
attacker to interfere with an application's processing of XML data. It
often allows an attacker to view files on the application server's
filesystem and to interact with any back-end or external systems that
the application itself can access.
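As an illustration of the general defence (not the actual Any23 2.5 change to StreamUtils.java, which is not shown here), here is a minimal sketch with the JDK's built-in SAX parser. Disallowing DOCTYPE declarations rejects any document that tries to define an external entity, and the external-entity features are switched off as a second line of defence. `SafeSax` and `countElements` are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: parses untrusted XML with SAX and XXE defences enabled.
final class SafeSax {
    private SafeSax() {}

    static SAXParser newHardenedParser() throws Exception {
        SAXParserFactory f = SAXParserFactory.newInstance();
        f.setNamespaceAware(true);
        // Reject any document that carries a DOCTYPE, and with it any entity declarations.
        f.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        // Belt and braces in case the DOCTYPE check is ever relaxed.
        f.setFeature("http://xml.org/sax/features/external-general-entities", false);
        f.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
        f.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        SAXParser p = f.newSAXParser();
        // An empty string means no protocol is allowed for fetching external DTDs.
        p.setProperty(XMLConstants.ACCESS_EXTERNAL_DTD, "");
        return p;
    }

    // Counts the elements in a document; throws if the parser rejects it.
    static int countElements(String xml) throws Exception {
        int[] count = {0};
        newHardenedParser().parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                count[0]++;
            }
        });
        return count[0];
    }
}
```

Rejecting the DOCTYPE outright is the simplest policy when, as is typical for extracted web content, the input has no legitimate need for a DTD.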

Credit:

The Apache Any23 Project Management Committee would like to thank
Zhuxuan Wu for reporting the security vulnerability.





[ANNOUNCE] Apache Any23 2.5 Release

2021-09-10 lewis john mcgibbney
*What?*
The Apache Any23 Team is pleased to announce the release of Apache Any23 2.5.

Apache Anything To Triples (Any23) is a library, a web service and a
command line tool that extracts structured data in RDF format from a
variety of Web documents.
*Where?*
Download: http://any23.apache.org/download.html
Maven Artifacts: https://s.apache.org/3lai8
DOAP: https://github.com/apache/any23-committers/blob/master/doap_Any23.rdf
Release Notes: 
https://dist.apache.org/repos/dist/release/any23/2.5/RELEASE-NOTES.txt

Have Fun,
Lewis
(on behalf of the Apache Any23 PMC)

N.B. The release artifacts can take a bit of time to reach the
distribution servers; please be patient.





Fwd: WebDataCommons releases 86.3 billion quads Microdata, Embedded JSON-LD, RDFa, and Microformat data originating from 15.3 million websites

2021-01-21 lewis john mcgibbney
FYI folks

---------- Forwarded message ---------
From: Lewis John Mcgibbney 
Date: Thu, Jan 21, 2021 at 1:04 PM
Subject: Re: WebDataCommons releases 86.3 billion quads Microdata, Embedded
JSON-LD, RDFa, and Microformat data originating from 15.3 million websites
To: Web Data Commons 


Congratulations on the new dataset release.
The statistics are really interesting.
It is really good to hear that Any23 is performing nominally. :)

On Thursday, 21 January 2021 at 02:00:44 UTC-8 apri...@gmail.com wrote:

> Hi all,
>
> we are happy to announce the new release of the WebDataCommons Microdata,
> JSON-LD, RDFa and Microformat data corpus.
>
> The data has been extracted from the September 2020 version of the Common
> Crawl covering 3.4 billion HTML pages which originate from 34.5 million
> websites (pay-level domains). For the extraction of structured data, the
> newest version 2.4 of the any23 library was used.
>
> In summary, we found structured data within 1.7 billion HTML pages out of
> the 3.4 billion pages contained in the crawl (50%). These pages originate
> from 15.3 million different pay-level domains out of the 34.5 million
> pay-level-domains covered by the crawl (44.3%). Last year, we only found
> structured data in 37% of the pages and on 37.2% of the pay-level-domains.
>
> Approximately 7.8 million of the 2020 websites use Microdata, 7.6 million
> websites use JSON-LD, and 3.3 million websites make use of RDFa.
> Microformats are used by more than 4 million websites within the crawl.
>
>
>
> *Statistics about the December 2020 Release:*
>
> Basic statistics about the December 2020 Microdata, JSON-LD, RDFa, and
> Microformat data sets as well as the vocabularies that are used together
> with each markup format are found at:
>
> http://webdatacommons.org/structureddata/2020-12/stats/stats.html
>
>
>
> *Markup Format Adoption*
>
> The page below provides an overview of trends in the adoption of the
> different markup formats as well as widely used schema.org classes in the
> timespan 2012 to 2020:
>
> http://webdatacommons.org/structureddata/#toc3
>
> Comparing the statistics from the new 2020 release to the statistics about
> the 2019 release of the data sets
>
> http://webdatacommons.org/structureddata/2019-12/stats/stats.html
>
> we can observe that although the overall number of pages in the crawl is
> 38.9% larger than in the crawl used for the 2019 release, the
> corresponding growth in terms of domains is only 7.9%, indicating that the
> crawl corpus used this year is much deeper than last year's. However, we
> see that more and more websites annotate their content, as the yearly
> increase in domains with annotated data was more than 28%. The markup
> format with the largest domain growth in adoption (>50%) is JSON-LD. The
> growing trend of the JSON-LD format becomes even more obvious in certain
> domains, such as hotels.com and yahoo.com, which have switched from using
> Microdata to using JSON-LD as their dominant markup language.
> Concerning the vocabulary adoption, schema.org continues to be the most
> dominant vocabulary. More concretely, the classes schema:WebPage,
> schema:Product, schema:Rating, schema:Organization and schema:Person saw a
> major adoption increase in comparison to 2019 (>40%). Looking at the
> richness of JSON-LD descriptions, we notice that the average number of
> triples per URL has grown from 29 in 2019 to 41 in 2020 and has now reached
> a similar level of detail as the Microdata annotations (avg 39 triples per
> URL).
>
>
>
> *Download*
>
> The overall size of the December 2020 RDFa, Microdata, Embedded JSON-LD
> and Microformat data sets is 86.3 billion RDF quads. For download, we split
> the data into 21,346 files with a total size of 1.9 TB.
>
>
> http://webdatacommons.org/structureddata/2020-12/stats/how_to_get_the_data.html
>
> In addition, we have created separate files for over 43 different
> schema.org classes, each including all quads extracted from pages that use
> that specific schema.org class.
>
>
> http://webdatacommons.org/structureddata/2020-12/stats/schema_org_subsets.html
>
>
>
> *Lots of thanks to:*
>
> + the Common Crawl project for providing their great web crawl and
> thus enabling the WebDataCommons project.
> + the Any23 project for providing and maintaining their great library of
> structured data parsers.
> + Amazon Web Services in Education Grant for supporting WebDataCommons.
>
>
> *General Information about the WebDataCommons Project*
>
> Since 2012, the WebDataCommons project has yearly extracted structured data
> from the Common Crawl, the largest web corpus available to the public, and
> provides the extract

[ANNOUNCEMENT] Apache Any23 2.4 Release

2020-10-06 lewis john mcgibbney
The Apache Any23 Team is pleased to announce the release of Apache Any23
2.4.

Apache Anything To Triples (Any23) is a library, a web service and a
command line tool that extracts structured data in RDF format from a
variety of Web documents.

Release Notes:
https://github.com/apache/any23/blob/any23-2.4/RELEASE-NOTES.txt

Download: http://any23.apache.org/download.html

Maven Artifacts: https://s.apache.org/l6sg9

Community mailing lists: http://any23.apache.org/mailing-lists.html

Have Fun,
(Lewis), on behalf of the Apache Any23 PMC
N.B. The release artifacts can take a bit of time to reach the distribution
servers; please be patient.



[RESULT] WAS Re: [VOTE] Release Apache Any23 2.4

2020-10-06 lewis john mcgibbney
Thank you to everyone able to cast a VOTE. 72 hours have come and gone, and
it is time to close this VOTE with the following RESULT:

[4] +1, release this as Apache Any23 2.4
Andy Seaborne*
Lewis John McGibbney*
Shashanka Balakuntala
Jacek Grzebyta*

[0] +/-0, fine, but consider fixing a few issues first...
[0] -1, nope, because... (and please explain why)

*Any23 PMC binding

I'll progress with the remainder of the release now.

lewismc

On Sun, Sep 20, 2020 at 8:14 PM lewis john mcgibbney 
wrote:

> Hi,
>
> Please VOTE on the release candidate for Apache Any23 2.4
>
> We solved 39 issues:
> https://issues.apache.org/jira/projects/ANY23/versions/12344593
>
> Git source tag (151b916,
> https://github.com/apache/any23/commit/151b916cfd0c526d9b34407549202a8849cb70d6):
> https://github.com/apache/any23/tree/any23-2.4
>
> Staging repo:
> https://repository.apache.org/content/repositories/orgapacheany23-1008
>
> Staging binaries:
> https://dist.apache.org/repos/dist/dev/any23/2.4/
>
> PGP release keys (signed using 48BAEBF6):
> https://dist.apache.org/repos/dist/release/any23/KEYS
>
> Vote will be open for 72 hours.
>
> [ ] +1, release this as Apache Any23 2.4
> [ ] +/-0, fine, but consider fixing a few issues first...
> [ ] -1, nope, because... (and please explain why)
>
> P.S. Here is my +1
>
>




Re: LD-JSON Embedded not working

2020-10-02 Lewis John McGibbney
Hi Mauro,

On 2020/10/01 09:41:19, Mauro Asprea  wrote: 
> Thank you Lewis!
> 
> Then I should assume that 2.3 is "broken"? I'll try the upcoming 2.4 as you
> suggested.

Unfortunately, it looks like this one slipped through the net, which is
disappointing as we have tests for this extractor:
https://github.com/apache/any23/blob/any23-2.3/core/src/test/java/org/apache/any23/extractor/html/EmbeddedJSONLDExtractorTest.java
I don't have an explanation right now!

> 
> I still have one more question, apart from that, what is the best way to
> debug Any23 issues like this?

I would suggest that this mailing list is the best place. If you want, I suppose
we could set up a channel on the ASF Slack...

Any suggestions?
Thanks
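One low-tech first step when debugging a case like this is to confirm that the HTML Any23 actually receives contains a `<script type="application/ld+json">` block at all, since a site can serve different markup to different clients. The sketch below is a crude, hypothetical diagnostic (`JsonLdScriptFinder` is not part of Any23) built on a regular expression; it is not how Any23's html-embedded-jsonld extractor works, and a regex is no substitute for a real HTML parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical diagnostic: lists the bodies of JSON-LD <script> blocks found in raw HTML.
final class JsonLdScriptFinder {
    // Case-insensitive; DOTALL lets a script body span several lines.
    private static final Pattern JSON_LD_SCRIPT = Pattern.compile(
            "<script\\b[^>]*\\btype\\s*=\\s*[\"']application/ld\\+json[\"'][^>]*>(.*?)</script\\s*>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    private JsonLdScriptFinder() {}

    // Returns the trimmed body of every matching script element, in document order.
    static List<String> find(String html) {
        List<String> bodies = new ArrayList<>();
        Matcher m = JSON_LD_SCRIPT.matcher(html);
        while (m.find()) {
            bodies.add(m.group(1).trim());
        }
        return bodies;
    }
}
```

If this finds nothing in the saved page source, the problem lies upstream of the extractor; if it finds blocks that Any23 ignores, the extractor itself is the place to look.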


Re: LD-JSON Embedded not working

2020-09-30 Lewis John McGibbney
Hi Mauro,

On 2020/09/24 11:28:24, Mauro Asprea  wrote: 
> Hello, what am I doing wrong?
> 
> I downloaded the CLI binary distribution and verified that google does find
> the embedded LD-JSON triplets as you see here
> https://search.google.com/structured-data/testing-tool#url=https%3A%2F%2Fwww.monster.com%2Fjobs%2Fsearch%2F%3Fq%3DRuby%26where%3DAustin__2C-TX

Just a quick note: it is impossible to know what kind of 'standard' Google uses
for the Structured Data Testing Tool. As you can see, it is also being
deprecated pretty quickly.

> 
> Then I run any23 but I get no quads/triplets...
> 
> hamilcar:apache-any23-cli-2.3:> bin/any23 rover -p -s -t \
>   "https://www.monster.com/jobs/search/?q=Ruby&where=Austin__2C-TX" \
>   -f json -e html-embedded-jsonld -l monster-jsonld.log

Using 2.4 RC#1 (https://dist.apache.org/repos/dist/dev/any23/2.4/), I get
the following results, which include one Organization, one ItemList and 29
itemListElements:

./bin/any23 rover -p -s -t \
  "https://www.monster.com/jobs/search/?q=Ruby&where=Austin__2C-TX" \
  -f json -e html-embedded-jsonld -l monster-jsonld.log -o monster.json


Apache Any23 :: rover


>Summary:
   -total calls: 1
   -total triples: 128
   -total runtime: 30 ms!
   -tripls/ms: 4
   -ms/calls: 30
>Extractor: html-embedded-jsonld
   -total calls: 1
   -total triples: 128
   -total runtime: 30 ms!
   -tripls/ms: 4
   -ms/calls: 30


Apache Any23 SUCCESS
Total time: 2s
Finished at: Wed Sep 30 11:11:04 PDT 2020
Final Memory: 107M/367M

I suggest that you upgrade to 2.4.

> 
> 
> How can I increase the logging level to see any hidden debug messages?

You would need to modify the source code to add more debug logging.

I created a ticket to address the log4j appender issue:
https://issues.apache.org/jira/browse/ANY23-454

> 
> Also as you can see, this webpage has an embedded LD+JSON script that is
> not being picked up by the extractor. Help?
> 

If I remove the extractor flag (i.e. -e html-embedded-jsonld), I get many
more results. However, some of these are trivial in nature, so they would need
to be filtered out.

>Summary:
   -total calls: 21
   -total triples: 189
   -total runtime: 688 ms!
   -tripls/ms: 0
   -ms/calls: 32
>Extractor: html-head-icbm
   -total calls: 1
   -total triples: 0
   -total runtime: 6 ms!
   -tripls/ms: 0
   -ms/calls: 6
>Extractor: html-mf-geo
   -total calls: 1
   -total triples: 0
   -total runtime: 1 ms!
   -tripls/ms: 0
   -ms/calls: 1
>Extractor: html-head-meta
   -total calls: 1
   -total triples: 16
   -total runtime: 5 ms!
   -tripls/ms: 3
   -ms/calls: 5
>Extractor: html-mf-adr
   -total calls: 1
   -total triples: 0
   -total runtime: 1 ms!
   -tripls/ms: 0
   -ms/calls: 1
>Extractor: html-mf-hcalendar
   -total calls: 1
   -total triples: 0
   -total runtime: 1 ms!
   -tripls/ms: 0
   -ms/calls: 1
>Extractor: html-mf-hresume
   -total calls: 1
   -total triples: 0
   -total runtime: 1 ms!
   -tripls/ms: 0
   -ms/calls: 1
>Extractor: html-mf-hreview
   -total calls: 1
   -total triples: 0
   -total runtime: 1 ms!
   -tripls/ms: 0
   -ms/calls: 1
>Extractor: consolidation-extractor
   -total calls: 1
   -total triples: 0
   -total runtime: 0 ms!
   -ms/calls: 0
>Extractor: html-xpath
   -total calls: 1
   -total triples: 0
   -total runtime: 0 ms!
   -ms/calls: 0
>Extractor: html-head-title
   -total calls: 1
   -total triples: 1
   -total runtime: 1 ms!
   -tripls/ms: 1
   -ms/calls: 1
>Extractor: html-mf-hcard
   -total calls: 1
   -total triples: 0
   -total runtime: 0 ms!
   -ms/calls: 0
>Extractor: html-rdfa11
   -total calls: 1
   -total triples: 44
   -total runtime: 33 ms!
   -tripls/ms: 1
   -ms/calls: 33
>Extractor: html-mf-hreview-aggregate
   -total calls: 1
   -total triples: 0
   -total runtime: 1 ms!
   -tripls/ms: 0
   -ms/calls: 1
>Extractor: html-mf-license
   -total calls: 1
   -total triples: 0
   -total runtime: 3 ms!
   -tripls/ms: 0
   -ms/calls: 3
>Extractor: html-mf-xfn
   -total calls: 1
   -total triples: 0
   -total runtime: 2 ms!
   -tripls/ms: 0
   -ms/calls: 2
>Extractor: html-mf-species
   -total calls: 1
   -total triples: 0
   -total runtime: 1 ms!
   -tripls/ms: 0
   -ms/calls: 1
>Extractor: html-mf-hlisting
   -total calls: 1
   -total triples: 0
   -total runtime: 0 ms!
   -ms/calls: 0
>Extractor: html-microdata
   -total calls: 1
   -total triples: 0
   -total runtime: 2 ms!
   -tripls/ms: 0
   -ms/calls: 2
>Extractor: html-mf-hrecipe
   -total calls: 1
   -total triples: 0
   -total runtime: 0 ms!
   -ms/calls: 0
>Extractor: html-embedded-jsonld
   -total calls: 1
   -total triples: 128
   -total runtime: 627 ms!
   -tripls/ms: 0
   -ms/calls: 627
>Extractor: html-head-links
   -total calls: 

Re: [ROLL CALL] Apache Any23 project interest

2020-09-22 Lewis John McGibbney
Hi Claude,

Thanks for your input. I agree with you. 

I looked at the issues registered in JIRA 
https://issues.apache.org/jira/projects/ANY23/issues/ANY23-294?filter=allopenissues

Many of these are optimizations rather than new features. Also, only 39 open
issues are currently registered for the entire project, which indicates to me
that the library is pretty stable.

Lewis

On 2020/09/22 21:24:59, Claude Warren  wrote: 
> I have not had time to develop anything for any23 but I have used it and
> think that not retiring the project is the best course of action.  As long
> as someone is around to check on security issues and make sure the code is
> at a minimum java level, perhaps it does not need more development.
> 
> Claude
> 
> 
> 
> -- 
> I like: Like Like - The likeliest place on the web
> <http://like-like.xenei.com>
> LinkedIn: http://www.linkedin.com/in/claudewarren
> 


[VOTE] Release Apache Any23 2.4

2020-09-20 lewis john mcgibbney
Hi,

Please VOTE on the release candidate for Apache Any23 2.4

We solved 39 issues:
https://issues.apache.org/jira/projects/ANY23/versions/12344593

Git source tag (151b916):
https://github.com/apache/any23/tree/any23-2.4

Staging repo:
https://repository.apache.org/content/repositories/orgapacheany23-1008

Staging binaries:
https://dist.apache.org/repos/dist/dev/any23/2.4/

PGP release keys (signed using 48BAEBF6):
https://dist.apache.org/repos/dist/release/any23/KEYS

Vote will be open for 72 hours.

[ ] +1, release this as Apache Any23 2.4
[ ] +/-0, fine, but consider fixing a few issues first...
[ ] -1, nope, because... (and please explain why)

P.S. Here is my +1



[ROLL CALL] Apache Any23 project interest

2020-09-20 lewis john mcgibbney
Hi user@, dev@,
I was unable to file last week's board report for the project :(
I did see the board feedback which encouraged us to hold a roll call to see
who is around and what interest there is in Any23.
Is there anyone out there? A simple response here gives me an idea of
whether it is worthwhile continuing with the project.
Thanks all,
lewismc



[RELEASE] Apache Any23 2.3

2019-03-04 lewis john mcgibbney
The Apache Any23 Team is pleased to announce the release of Apache Any23
2.3.

Apache Anything To Triples (Any23) is a library, a web service and a
command line tool that extracts structured data in RDF format from a
variety of Web documents.

Release Notes:
https://github.com/apache/any23/blob/any23-2.3/RELEASE-NOTES.txt

Download: http://any23.apache.org/download.html

Maven Artifacts: https://s.apache.org/mwOE

DOAP: https://github.com/apache/any23-committers/blob/master/doap_Any23.rdf

Have Fun,
(Lewis), on behalf of the Apache Any23 PMC
N.B. The release artifacts can take a bit of time to reach the distribution
servers; please be patient.


[ANNOUNCE] Apache Any23 2.2

2018-03-23 lewis john mcgibbney
The Apache Any23 Team is pleased to announce the release of Apache Any23
2.2.

*What is Any23?*

Anything To Triples (Any23) is a library, a web service and a command line
tool that extracts structured data in RDF format from a variety of Web
documents. Currently it supports the following input formats:

   - RDF/XML, Turtle, Notation 3
   - RDFa with RDFa1.1 prefix mechanism
   - Microformats1 and Microformats2: hAdr, hCard, hCalendar, hEntry,
   hEvent, hGeo, hItem, hListing, hProduct, hRecipe, hResume, hReview,
   License, Species, XFN, etc.
   - JSON-LD: JSON for Linking Data, a lightweight Linked Data format based
   on the already successful JSON format that provides a way to help JSON
   data interoperate at Web scale.
   - HTML5 Microdata (such as Schema.org)
   - CSV: Comma Separated Values with separator autodetection.
   - Vocabularies: Extraction support for Dublin Core Terms, Description of
   a Career, Description Of A Project, Friend Of A Friend, GEO Names, ICAL,
   lkif-core, Open Graph Protocol, BBC Programmes Ontology, RDF Review
   Vocabulary, schema.org, VCard, BBC Wildlife Ontology and XHTML ... and
   more!
   - YAML: human friendly data serialization standard for all programming
   languages.
   - Additionally, as of 2.1 Any23 provides functionality to extract
   triples using the Open Information Extraction (Open IE) system. The Open
   IE system runs over sentences and creates extractions that represent
   relations in text; in the case of Any23, this results in triples.
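The CSV entry mentions separator autodetection. As a rough illustration only, here is a minimal Java sketch of one common heuristic: pick the candidate separator that occurs the same non-zero number of times on every sampled line, ignoring text inside double quotes. The class name SeparatorGuess, the candidate set and the tie-breaking rule are assumptions made for this example; this is not the algorithm used by Any23's CSV Utilities module.

```java
import java.util.List;

public class SeparatorGuess {

    // Candidate separators to try (an assumption for this sketch).
    private static final char[] CANDIDATES = {',', ';', '\t', '|'};

    /** Counts occurrences of sep in line, skipping characters inside double quotes. */
    static int count(String line, char sep) {
        int n = 0;
        boolean quoted = false;
        for (int i = 0; i < line.length(); i++) {
            char ch = line.charAt(i);
            if (ch == '"') {
                quoted = !quoted;
            } else if (ch == sep && !quoted) {
                n++;
            }
        }
        return n;
    }

    /**
     * Returns the candidate that appears the same non-zero number of times on
     * every sample line, preferring the highest per-line count (earlier
     * candidates win ties). Falls back to ',' when the sample is empty or no
     * candidate is consistent.
     */
    static char guess(List<String> sample) {
        char best = ',';
        int bestCount = 0;
        if (sample.isEmpty()) {
            return best;
        }
        for (char sep : CANDIDATES) {
            int expected = count(sample.get(0), sep);
            if (expected == 0) {
                continue;
            }
            boolean consistent = true;
            for (String line : sample) {
                if (count(line, sep) != expected) {
                    consistent = false;
                    break;
                }
            }
            if (consistent && expected > bestCount) {
                best = sep;
                bestCount = expected;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
                "name;age;city",
                "Ada;36;London",
                "\"Smith, J\";41;Paris");
        System.out.println(guess(sample)); // prints ;
    }
}
```

Requiring an exact per-line match keeps the sketch simple but makes it brittle on files with ragged rows; a production detector would more likely score candidates statistically.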

*Downloads*
http://any23.apache.org/download.html

*Release Notes:*

https://s.apache.org/YmRb

Have Fun,
Lewis, on behalf of the Apache Any23 Project Management Committee

-- 
http://home.apache.org/~lewismc/
http://people.apache.org/keys/committer/lewismc


[RESULT] WAS Re: [VOTE] Release Apache Any23 2.2 RC#2

2018-03-23 Thread lewis john mcgibbney
Hi Folks,
I am bringing this VOTE thread to a close.
Thank you to everyone who VOTE'd; the RESULT is below.


[4] +1, Release Apache Any23 2.2 RC#2
Hans Brende*
Lewis John McGibbney*
Jacek Grzebyta*
Reto Gmür*

[0] +/-0, fine, but consider to fix few issues before...
[0] -1, nope, because... (and please explain why)

*Any23 PMC

I'll go ahead and make the release.
Thanks
Lewis

On Mon, Mar 12, 2018 at 11:00 PM, lewis john mcgibbney <lewi...@apache.org>
wrote:

> Hi user@ and dev@,
> We need one more PMC VOTE to release here.
> Thanks to anyone able to take the time to do this.
> Best
> Lewis
>
> On Thu, Mar 1, 2018 at 3:49 PM, lewis john mcgibbney <lewi...@apache.org>
> wrote:
>
>> Hi Folks,
>>
>> I would like to open a VOTE on the Any23 2.2 RC#2
>>
>> We addressed 43 issues:
>> https://s.apache.org/LUYt
>>
>> Git source tag (b0d1d21b110b603c23f0ee12bd7e70ccc5c5c8bbf):
>> https://s.apache.org/SlRO
>>
>> Staging repo:
>> https://repository.apache.org/content/repositories/orgapacheany23-1006
>>
>> Staging binaries:
>> https://dist.apache.org/repos/dist/dev/any23/2.2/
>>
>> PGP release keys (signed using 48BAEBF6):
>> https://dist.apache.org/repos/dist/release/any23/KEYS
>>
>> Vote will be open for 72 hours.
>>
>> [ ] +1, Release Apache Any23 2.2 RC#2
>> [ ] +/-0, fine, but consider to fix few issues before...
>> [ ] -1, nope, because... (and please explain why)
>>
>> P.S. Here is my +1
>>
>> --
>> http://home.apache.org/~lewismc/
>> http://people.apache.org/keys/committer/lewismc
>>
>
>
>
> --
> http://home.apache.org/~lewismc/
> http://people.apache.org/keys/committer/lewismc
>



-- 
http://home.apache.org/~lewismc/
http://people.apache.org/keys/committer/lewismc


Re: [VOTE] Release Apache Any23 2.2 RC#2

2018-03-13 Thread lewis john mcgibbney
Hi user@ and dev@,
We need one more PMC VOTE to release here.
Thanks to anyone able to take the time to do this.
Best
Lewis




-- 
http://home.apache.org/~lewismc/
http://people.apache.org/keys/committer/lewismc


Re: [VOTE] Release Apache Any23 2.2 RC#2

2018-03-09 Thread lewis john mcgibbney
PING, we need one more PMC +1




-- 
http://home.apache.org/~lewismc/
http://people.apache.org/keys/committer/lewismc


Re: [VOTE] Release Apache Any23 2.2 RC#2

2018-03-01 Thread lewis john mcgibbney
Hi Folks,
There is a temporary issue uploading the larger service artifacts to the
staging SVN server.
https://issues.apache.org/jira/browse/INFRA-16130
Please wait until this has been addressed before VOTE'ing on the RC.
Thank you
Lewis




-- 
http://home.apache.org/~lewismc/
http://people.apache.org/keys/committer/lewismc


[RESULT] WAS Re: [VOTE] Release Apache Any23 2.2

2018-02-09 Thread lewis john mcgibbney
Hi Folks,
Thank you to everyone who was able to VOTE on the Any23 2.2 RC#1.
The RESULT is below


[4] +1, Release Apache Any23 2.2
Lewis McGibbney*
Chris Mattmann*
Hans Brende
Jacek Grzebyta*

[0] +/-0, fine, but consider to fix few issues before...

[1] -1, nope, because... (and please explain why)
Andy Seaborne* - https://s.apache.org/SnNQ

* Any23 PMC

The VOTE does not pass, as there are blocking issues highlighted by Andy.
I will roll back and produce an RC#2.
Lewis

On Thu, Jan 25, 2018 at 11:24 AM, lewis john mcgibbney <lewi...@apache.org>
wrote:

> Hi Folks,
>
> I would like to open a VOTE on the Any23 2.2 RC#1
>
> We solved 40 issues:
> https://s.apache.org/BT4V
>
> Git source tag (b6ed4cfa288b29068c5d822f666ff38814c947c9):
> https://s.apache.org/GOLk
>
> Staging repo:
> https://repository.apache.org/content/repositories/orgapacheany23-1004
>
> Staging binaries:
> https://dist.apache.org/repos/dist/dev/any23/2.2/
>
> PGP release keys (signed using 48BAEBF6):
> https://dist.apache.org/repos/dist/release/any23/KEYS
>
> Vote will be open for 72 hours.
>
> [ ] +1, Release Apache Any23 2.2
> [ ] +/-0, fine, but consider to fix few issues before...
> [ ] -1, nope, because... (and please explain why)
>
> P.S. Here is my +1
>
> --
> http://home.apache.org/~lewismc/
> http://people.apache.org/keys/committer/lewismc
>



-- 
http://home.apache.org/~lewismc/
http://people.apache.org/keys/committer/lewismc


Re: [VOTE] Release Apache Any23 2.2

2018-02-02 Thread lewis john mcgibbney
PING please, thank you




-- 
http://home.apache.org/~lewismc/
http://people.apache.org/keys/committer/lewismc


Re: parse broken uri

2017-12-12 Thread lewis john mcgibbney
Hello Alfonso,
This took me a while longer than expected to get around to providing a fix.
Please see https://issues.apache.org/jira/browse/ANY23-314 and proposed fix
at https://github.com/apache/any23/pull/49
For the service specifically, the primary issue was that the TripleHandler
was not being closed upon an unsuccessful extraction, i.e. we were failing
to write the triples that had been successfully extracted prior to a parse
error. That has now been fixed.
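The fix described here comes down to always closing the handler, even when extraction fails part-way. The following Java sketch illustrates that pattern with a try/finally block. BufferingHandler, extract() and run() are hypothetical stand-ins written for this example, not Any23's TripleHandler API; the sketch only assumes a handler that buffers triples and writes them out when it is closed.

```java
import java.util.ArrayList;
import java.util.List;

public class CloseOnFailure {

    /** Hypothetical handler: buffers triples and only writes them out on close(). */
    static class BufferingHandler implements AutoCloseable {
        private final List<String> buffer = new ArrayList<>();
        private final List<String> written = new ArrayList<>();

        void receiveTriple(String triple) {
            buffer.add(triple);
        }

        List<String> written() {
            return written;
        }

        @Override
        public void close() {
            // Flush whatever was collected so far.
            written.addAll(buffer);
            buffer.clear();
        }
    }

    /** Hypothetical extraction that fails when it reaches a "BROKEN" token. */
    static void extract(List<String> tokens, BufferingHandler handler) {
        for (String token : tokens) {
            if (token.equals("BROKEN")) {
                throw new IllegalStateException("parse error");
            }
            handler.receiveTriple(token);
        }
    }

    /** Runs an extraction and returns the triples that were written out. */
    static List<String> run(List<String> tokens) {
        BufferingHandler handler = new BufferingHandler();
        try {
            extract(tokens, handler);
        } catch (IllegalStateException e) {
            System.err.println("extraction failed: " + e.getMessage());
        } finally {
            // Without this, triples found before the error would be lost.
            handler.close();
        }
        return handler.written();
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("t1", "t2", "BROKEN", "t3"))); // prints [t1, t2]
    }
}
```

A try-with-resources block would give the same guarantee; the explicit finally is used here only to make the "close even on failure" step visible.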
Thanks
Lewis

On Tue, Dec 12, 2017 at 4:46 AM,  wrote:

> From: alfonso.debi...@libero.it
> To: user@any23.apache.org
> Cc:
> Bcc:
> Date: Thu, 30 Nov 2017 17:49:45 +0100 (CET)
> Subject: Re: parse broken uri
>
> Hi Lewis,
>
> The URI is this: https://www.jobcluster.de
>
> Thanks for the reply.
>
>


Re: Model for any23 core vs nquads

2017-12-12 Thread lewis john mcgibbney
Hello Anna,
As you've seen, the any23-nquads module and its associated functionality [0]
were removed (I believe in the 2.0 release), with the functionality folded
back into the relevant any23-core extractor [1] and writer [2] packages
respectively.
You should be able to use the N-Quads implementation more easily now via
the common extractor and writer interfaces provided.
With regard to the results, any feedback on inconsistencies or anomalies
would be appreciated.
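For readers unfamiliar with the output format: an N-Quads statement is a triple plus a fourth term naming the graph, written on one line and terminated by " .". The sketch below shows that shape behind a small writer interface. QuadWriter, NQuadsWriter and render() are hypothetical names invented for this example, not Any23's writer API, and for brevity the sketch handles IRI terms only (no literals, blank nodes or escaping).

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

public class NQuadsSketch {

    /** Hypothetical format-agnostic writer interface. */
    interface QuadWriter {
        void quad(String subject, String predicate, String object, String graph);
    }

    /** Writes each statement as one N-Quads line; IRI terms only. */
    static class NQuadsWriter implements QuadWriter {
        private final Writer out;

        NQuadsWriter(Writer out) {
            this.out = out;
        }

        @Override
        public void quad(String subject, String predicate, String object, String graph) {
            try {
                out.write("<" + subject + "> <" + predicate + "> <" + object + "> <" + graph + "> .\n");
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    /** Renders a single quad to a string via the writer interface. */
    static String render(String s, String p, String o, String g) {
        StringWriter buffer = new StringWriter();
        QuadWriter writer = new NQuadsWriter(buffer);
        writer.quad(s, p, o, g);
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.print(render(
                "http://example.org/s", "http://example.org/p",
                "http://example.org/o", "http://example.org/g"));
    }
}
```

Because callers only see the QuadWriter interface, swapping in a different serialization would not change the calling code, which is the general benefit of going through common writer interfaces.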
hth
Lewis

[0]
https://github.com/apache/any23/tree/any23-1.1/nquads/src/main/java/org/apache/any23/io/nquads
[1]
https://github.com/apache/any23/tree/master/core/src/main/java/org/apache/any23/extractor/rdf
[2]
https://github.com/apache/any23/tree/master/core/src/main/java/org/apache/any23/writer

On Tue, Dec 12, 2017 at 4:46 AM,  wrote:

>
> From: Anna Primpeli 
> To: 
> Cc:
> Bcc:
> Date: Tue, 12 Dec 2017 13:46:08 +0100
> Subject: Model for any32 core vs nquads
>
> Hello Any23 team,
>
>
>
> I am using any23-core (version 1.1) for my project and I would like to
> update to the current version, 2.1.
>
> In the same project we make use of any23-nquads as well. It seems there
> is an incompatibility in the RDF model dependency: the current version of
> any23-core uses RDF4J while any23-nquads uses Sesame dependencies.
>
>
>
> Is there a plan to update the any23-nquads model to RDF4J as well? Is
> there any workaround possible, or should I stick to the old version of
> any23-core?
>
>
>
> Thank you very much in advance!
>
>
>
> Best,
>
> Anna
>
>
>
>
>
>


-- 
http://home.apache.org/~lewismc/
@hectorMcSpector
http://www.linkedin.com/in/lmcgibbney


Re: parse broken uri

2017-11-30 Thread lewis john mcgibbney
Hi Alfonso,
Can you please provide us with a URI which reproduces this issue?
If we can reproduce it, then we can register a ticket over at
https://issues.apache.org/jira/projects/ANY23
Thanks

On Thu, Nov 30, 2017 at 6:48 AM,  wrote:

>
> From: alfonso.debi...@libero.it
> To: user@any23.apache.org
> Cc:
> Bcc:
> Date: Thu, 30 Nov 2017 15:48:01 +0100 (CET)
> Subject: parse broken uri
>
> Hi users, I’m using any23 version 2.0 in my project, and I have tested the
> extraction of RDF microformats from HTML pages. This HTML contains an
> inconsistent URI without a protocol specification (for example,
> //any23.apache.org instead of https://any23.apache.org).
>
> The library gives me the log:
>
> WARN rdf.Any23ValueFactoryWrapper: Not a valid (absolute) IRI:
>
> INFO extractor.SingleDocumentExtraction: Processing null
>
> I see the method fixIRIWithException, which fixes some potentially broken
> relative or absolute URIs, but in this case it doesn’t fix the problem.
> Is it possible to integrate a patch to solve this? Thanks
>
> Best regards,
>
> Alfonso
>
>
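A URI such as //any23.apache.org is a network-path (protocol-relative) reference: under RFC 3986 it takes its scheme from the document it appears in. As a minimal sketch, assuming the URL of the page being parsed is known, java.net.URI can resolve such a reference. The ProtocolRelative class and its resolve() helper are hypothetical, and this is not a description of how Any23's fixIRIWithException works.

```java
import java.net.URI;

public class ProtocolRelative {

    /**
     * Resolves a reference found in a page against that page's URI.
     * A "//host/path" reference keeps its own host and path but inherits
     * the page's scheme (RFC 3986, section 5.2).
     * URI.create throws IllegalArgumentException for malformed input.
     */
    static String resolve(String documentUri, String reference) {
        return URI.create(documentUri).resolve(reference).toString();
    }

    public static void main(String[] args) {
        System.out.println(resolve("https://www.jobcluster.de/", "//any23.apache.org"));
        // prints https://any23.apache.org
    }
}
```

Resolving against the page URI, rather than hard-coding "http:" or "https:", keeps the result consistent with how a browser would interpret the same link.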


-- 
http://home.apache.org/~lewismc/
@hectorMcSpector
http://www.linkedin.com/in/lmcgibbney


Re: [VOTE] Release Apache Any23 2.1 RC#1

2017-09-19 Thread lewis john mcgibbney
Hi Folks,
@Andy,
Thanks for pointing this out, it is a shortcoming in my VOTE email.
The actual proposed release artifacts can be located at
https://dist.apache.org/repos/dist/dev/any23/2.1/
Thanks
Lewis

On Fri, Sep 15, 2017 at 1:11 AM, lewis john mcgibbney <lewi...@apache.org>
wrote:

> Hi user@ and dev@,
> Thank you to everyone who worked on and used Any23 during its 2.1-SNAPSHOT
> development drive. I would like to open a VOTE to release Any23 2.1.
>
> We solved 10 issues:
> https://s.apache.org/34tJ
>
> Git source tag (fc6fd91df338f793da58bb80368ac421f544f7ee):
> https://s.apache.org/wO73
>
> Staging repo:
> https://repository.apache.org/content/repositories/orgapacheany23-1003/
>
> PGP release keys (signed using 48BAEBF6):
> https://dist.apache.org/repos/dist/dev/any23/KEYS
>
> Vote will be open for 72 hours.
>
> [ ] +1, release Any23 2.1
> [ ] +/-0, fine, but consider to fix few issues before...
> [ ] -1, nope, because... (and please explain why)
>
> N.B. +1 from me
>
> --
> http://home.apache.org/~lewismc/
> @hectorMcSpector
> http://www.linkedin.com/in/lmcgibbney
>



-- 
http://home.apache.org/~lewismc/
@hectorMcSpector
http://www.linkedin.com/in/lmcgibbney


any23.org Web Service Restored

2017-06-14 Thread lewis john mcgibbney
Hi Folks,
I recently worked on restoring our service at any23.org (via
any23-vm2.apache.org). It is running off of *Apache Any23 v.2.1-SNAPSHOT
(2017-06-05 19:11:41+)* as shown at the bottom of the page.
Please report any issues to our JIRA instance.
Thank you to INFRA for all of the assistance.
Best
Lewis

-- 
http://home.apache.org/~lewismc/
@hectorMcSpector
http://www.linkedin.com/in/lmcgibbney


Re: Use of Parentheses in IRI's

2017-04-26 Thread lewis john mcgibbney
Hi Andy,
Thanks for the response on this one.

On Wed, Apr 26, 2017 at 6:46 AM,  wrote:

>
> From: Andy Seaborne 
> To: user@any23.apache.org
> Cc:
> Bcc:
> Date: Wed, 12 Apr 2017 08:41:24 +0100
> Subject: Re: Use of Parentheses in IRI's
> It's not a W3C standard
>
> IRIs are RFC 3987
> URIs are RFC 3986
>
> Parentheses are legal in the path part, in the query string and in the
> fragment.
>
>    ipchar     = iunreserved / pct-encoded / sub-delims / ":" / "@"
>
>    sub-delims = "!" / "$" / "&" / "'" / "(" / ")"
>               / "*" / "+" / "," / ";" / "="
>
> Andy
>
> http://www.sparql.org/iri-validator.html
>
>
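To make the grammar concrete, here is a minimal Java sketch that checks whether a path uses only ipchar characters (plus "/" between segments), which shows why parentheses are accepted. It is an ASCII-only simplification: iunreserved in RFC 3987 also admits the non-ASCII ucschar ranges, which this sketch rejects. The class and method names are hypothetical.

```java
public class IpcharCheck {

    private static final String SUB_DELIMS = "!$&'()*+,;=";

    private static boolean isHex(char c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
    }

    /** ASCII part of iunreserved: ALPHA / DIGIT / "-" / "." / "_" / "~" (ucschar omitted). */
    private static boolean isUnreservedAscii(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')
                || c == '-' || c == '.' || c == '_' || c == '~';
    }

    /** True if every character between "/" separators is an ipchar (ASCII subset). */
    static boolean isValidPath(String path) {
        for (int i = 0; i < path.length(); i++) {
            char c = path.charAt(i);
            if (c == '/') {
                continue; // segment separator
            }
            if (c == '%') { // pct-encoded = "%" HEXDIG HEXDIG
                if (i + 2 >= path.length() || !isHex(path.charAt(i + 1)) || !isHex(path.charAt(i + 2))) {
                    return false;
                }
                i += 2;
                continue;
            }
            if (isUnreservedAscii(c) || SUB_DELIMS.indexOf(c) >= 0 || c == ':' || c == '@') {
                continue;
            }
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValidPath("/wiki/Java_(programming_language)")); // prints true
        System.out.println(isValidPath("/a b")); // prints false
    }
}
```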


Re: [VOTE] Release Apache Any23 2.0

2017-02-21 Thread lewis john mcgibbney
Hi Folks,
Another PING on this thread. Thank you to everyone who has been able to
review and VOTE.
@Renato, did you try again to test and are you able to review?
Lewis

On Fri, Feb 10, 2017 at 2:45 PM, lewis john mcgibbney <lewi...@apache.org>
wrote:

> Hi user@ and dev@,
>
> I would like to open a VOTE thread to release Apache Any23 2.0. This VOTE
> will be open for at least 72 hours.
>
> We solved 40 issues:
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12312323=12338100
>
>
> Git source tag signature (de5e1dbc4cd9e077062a5fbb02b9314fdae13df8):
> https://git-wip-us.apache.org/repos/asf?p=any23.git;a=tag;h=653ef9dedb9417fe81ca4e8b2688e5c5343295f7
>
> Staging repo:
> https://repository.apache.org/content/repositories/orgapacheany23-1002/
>
> Staging binaries:
> https://dist.apache.org/repos/dist/dev/any23/
>
> PGP release keys (signed using 48BAEBF6):
> http://apache.org/dist/any23/KEYS
>
> Vote will be open for 72 hours.
>
> [ ] +1, let's get it rmblee!!!
> [ ] +/-0, fine, but consider to fix few issues before...
> [ ] -1, nope, because... (and please explain why)
>
> P.S. Here is my +1
>
> --
> http://home.apache.org/~lewismc/
> @hectorMcSpector
> http://www.linkedin.com/in/lmcgibbney
>



-- 
http://home.apache.org/~lewismc/
@hectorMcSpector
http://www.linkedin.com/in/lmcgibbney


Re: [VOTE] Release Apache Any23 2.0

2017-02-17 Thread lewis john mcgibbney
Hi Renato,
I can't reproduce the errors you are getting. With the same artifact you're
trying to test, I get the following:

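[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Apache Any23 ....................................... SUCCESS [  3.655 s]
[INFO] Apache Any23 :: Base API ........................... SUCCESS [  3.620 s]
[INFO] Apache Any23 :: Test Resources ..................... SUCCESS [  0.387 s]
[INFO] Apache Any23 :: CSV Utilities ...................... SUCCESS [  1.129 s]
[INFO] Apache Any23 :: Mime Type Detection ................ SUCCESS [  3.373 s]
[INFO] Apache Any23 :: Encoding Detection ................. SUCCESS [  1.968 s]
[INFO] Apache Any23 :: Core ............................... SUCCESS [ 13.073 s]
[INFO] Apache Any23 :: CLI ................................ SUCCESS [ 11.624 s]
[INFO] Apache Any23 :: Plugins :: Basic Crawler ........... SUCCESS [ 21.691 s]
[INFO] Apache Any23 :: Plugins :: HTML Scraper ............ SUCCESS [  2.831 s]
[INFO] Apache Any23 :: Plugins :: Office Scraper .......... SUCCESS [  3.829 s]
[INFO] Apache Any23 :: Plugins :: Integration Test ........ SUCCESS [ 47.188 s]
[INFO] Apache Any23 :: Service ............................ SUCCESS [ 21.753 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 02:16 min
[INFO] Finished at: 2017-02-17T10:42:08-08:00
[INFO] Final Memory: 71M/930M
[INFO] ------------------------------------------------------------------------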

On Thu, Feb 16, 2017 at 10:33 AM, Renato Marroquín Mogrovejo <
renatoj.marroq...@gmail.com> wrote:

>
>
>   EmbeddedJSONLDExtractorTest.testEmbeddedJSONLDInHead:30->AbstractExtractorTestCase.assertExtract:223->AbstractExtractorTestCase.assertExtract:210 » Runtime
>
>   EmbeddedJSONLDExtractorTest.testSeveralEmbeddedJSONLDInHead:37->AbstractExtractorTestCase.assertExtract:223->AbstractExtractorTestCase.assertExtract:210 » Runtime
>
> 2017-02-16 19:25 GMT+01:00 lewis john mcgibbney <lewi...@apache.org>:
>
>> No, we build against Java 8. Can you look further into what tests are
>> failing and if there are any specific error messages?
>> Thanks
>>
>>
>> On Thu, Feb 16, 2017 at 10:00 AM, Renato Marroquín Mogrovejo <
>> renatoj.marroq...@gmail.com> wrote:
>>
>>> I downloaded this one in here:
>>>
>>>- apache-any23-2.0-src.tar.gz
>>>
>>> <https://dist.apache.org/repos/dist/dev/any23/apache-any23-2.0-src.tar.gz>
>>> wait a second, now that I think about it, it might be because
>>>of JAVA8? could that be it?
>>>
>>>
>>> 2017-02-16 18:57 GMT+01:00 Mcgibbney, Lewis J (398M) <
>>> lewis.j.mcgibb...@jpl.nasa.gov>:
>>>
>>>> Damn. No there should not be any issues. The build is stable
>>>> https://builds.apache.org/job/Any23-trunk/
>>>>
>>>> I’ll have a look later, I cannot reproduce this but maybe I am wrong.
>>>>
>>>>
>>>>
>>>> Dr. Lewis John McGibbney Ph.D., B.Sc.
>>>>
>>>> Data Scientist II
>>>>
>>>> Computer Science for Data Intensive Applications Group 398M
>>>>
>>>> Jet Propulsion Laboratory
>>>>
>>>> California Institute of Technology
>>>>
>>>> 4800 Oak Grove Drive
>>>>
>>>> Pasadena, California 91109-8099
>>>>
>>>> Mail Stop : 158-256C
>>>>
>>>> Tel:  (+1) (818)-393-7402
>>>>
>>>> Cell: (+1) (626)-487-3476
>>>>
>>>> Fax:  (+1) (818)-393-1190
>>>>
>>>> Email: lewis.j.mcgibb...@jpl.nasa.gov
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>  Dare Mighty Things
>>>>
>>>>
>>>>
>>>> *From: *Renato Marroquín Mogrovejo <renatoj.marroq...@gmail.com>
>>>> *Date: *Thursday, February 16, 2017 at 9:55 AM
>>>> *To: *"Mcgibbney, Lewis J (398M)" <lewis.j.mcgibb...@jpl.nasa.gov>,
>>>> Lewis John Mcgibbney <lewis.mcgibb...@gmail.com>
>>>> *Subject: *Fwd: [VOTE] Release Apache Any23 2.0
>>>>
>>>>
>>>>
>>>> I did and it's broken, I wrote to the list mate
>>>>
>>>>
>>>>
>>>> I tried doing  and core fails compiling in:

Re: [VOTE] Release Apache Any23 2.0

2017-02-16 Thread lewis john mcgibbney
No, we build against Java 8. Can you look further into what tests are
failing and if there are any specific error messages?
Thanks

On Thu, Feb 16, 2017 at 10:00 AM, Renato Marroquín Mogrovejo <
renatoj.marroq...@gmail.com> wrote:

> I downloaded this one in here:
>
>- apache-any23-2.0-src.tar.gz
><https://dist.apache.org/repos/dist/dev/any23/apache-any23-2.0-src.tar.gz>
> wait a second, now that I think about it, it might be because of
>JAVA8? could that be it?
>
>
> 2017-02-16 18:57 GMT+01:00 Mcgibbney, Lewis J (398M) <
> lewis.j.mcgibb...@jpl.nasa.gov>:
>
>> Damn. No there should not be any issues. The build is stable
>> https://builds.apache.org/job/Any23-trunk/
>>
>> I’ll have a look later, I cannot reproduce this but maybe I am wrong.
>>
>>
>>
>>
>>
>>
>> *From: *Renato Marroquín Mogrovejo <renatoj.marroq...@gmail.com>
>> *Date: *Thursday, February 16, 2017 at 9:55 AM
>> *To: *"Mcgibbney, Lewis J (398M)" <lewis.j.mcgibb...@jpl.nasa.gov>,
>> Lewis John Mcgibbney <lewis.mcgibb...@gmail.com>
>> *Subject: *Fwd: [VOTE] Release Apache Any23 2.0
>>
>>
>>
>> I did and it's broken, I wrote to the list mate
>>
>>
>>
>> I tried doing  and core fails compiling in:
>>
>>
>>
>> Tests in error:
>>
>>   EmbeddedJSONLDExtractorTest.testEmbeddedJSONLDInHead:30->AbstractExtractorTestCase.assertExtract:223->AbstractExtractorTestCase.assertExtract:210 » Runtime
>>
>>   EmbeddedJSONLDExtractorTest.testSeveralEmbeddedJSONLDInHead:37->AbstractExtractorTestCase.assertExtract:223->AbstractExtractorTestCase.assertExtract:210 » Runtime
>>
>>
>>
>> Is this expected? are there any JIRA issues tracking this?
>>
>>
>>
>>
>>
>> -- Forwarded message --
>> From: *Mcgibbney, Lewis J (398M)* <lewis.j.mcgibb...@jpl.nasa.gov>
>> Date: 2017-02-16 18:46 GMT+01:00
>> Subject: Re: [VOTE] Release Apache Any23 2.0
>> To: Renato Marroquín Mogrovejo <renatoj.marroq...@gmail.com>, Lewis John
>> Mcgibbney <lewis.mcgibb...@gmail.com>
>>
>> Yeah, if you can. Also, can you please check the signatures of the
>> artifacts. Thanks.
>>
>>
>>
>>
>>
>>
>> *From: *Renato Marroquín Mogrovejo <renatoj.marroq...@gmail.com>
>> *Date: *Wednesday, February 15, 2017 at 11:40 PM
>> *To: *Lewis John Mcgibbney <lewis.mcgibb...@gmail.com>, "Mcgibbney,
>> Lewis J (398M)" <lewis.j.mcgibb...@jpl.nasa.gov>
>> *Subject: *Re: [VOTE] Release Apache Any23 2.0
>>
>>
>>
>> How do I test this, man? mvn clean package and that's it?
>>
>> I'm happy to help out
>>
>>
>>
>> On Feb 16, 2017 12:48 AM, "lewis john mcgibbney" <lewi...@apache.org>
>> wrote:
>>
>> PING folks.
>> Would be nice to get reviews on this release candidate if possible. If we
>> can't get any then I'll approach community@ and/or the Incubator.
>> Thank you
>>
>> On Fri, Feb 10, 2017 at 2:45 PM, lewis john mcgibbney <lewi...@apache.org
>> >
>> wr

Re: [VOTE] Release Apache Any23 2.0

2017-02-15 Thread lewis john mcgibbney
PING folks.
Would be nice to get reviews on this release candidate if possible. If we
can't get any then I'll approach community@ and/or the Incubator.
Thank you




-- 
http://home.apache.org/~lewismc/
@hectorMcSpector
http://www.linkedin.com/in/lmcgibbney


Registering basic-crawler plugin

2017-01-25 Thread lewis john mcgibbney
Hi Folks,
I'm working off of master and trying to register the basic-crawler plugin.
I'm following our website documentation at [0] (working around the obvious
documentation error, which I will correct in a subsequent PR) to register
the basic-crawler, followed by [1] to attempt to use it.
I've stepped through the code in Eclipse and can see the local plugin
repository being interpreted correctly as per the code at [2]; however, the
plugin does not appear in the ToolRunner invocation, which is puzzling me.
Can someone else please try this out and let me know how you get on?
Thanks


[0] http://any23.apache.org/any23-plugins.html#How_to_Register_a_Plugin
[1] http://any23.apache.org/getting-started.html#crawler-tool
[2]
https://github.com/apache/any23/blob/master/cli/src/main/java/org/apache/any23/cli/ToolRunner.java#L244-L261

-- 
http://home.apache.org/~lewismc/
@hectorMcSpector
http://www.linkedin.com/in/lmcgibbney


Re: Issues while building and using Any23

2016-06-16 Thread Lewis John Mcgibbney
Hi Wouter,

On Thu, Jun 9, 2016 at 4:17 AM,  wrote:

> From: Wouter Beek 
> To: user@any23.apache.org
> Cc:
> Date: Thu, 9 Jun 2016 14:16:37 +0300
> Subject: Issues while building and using Any23
> Hi Any23 maintainers,
>
> I'm trying to install from Git master.
>

Cool. Nice to hear of more people running off of the master branch.


> I've come across the following
> issues:
>
> 1. I had to add `true` to the Surefire plugin
> configuration in `pom.xml` in order to suppress the test-related errors in
> `mvn clean install`.  Maybe these tests could be put behind `mvn test` so
> that the casual user who compiles from sources does not have to bother with
> them?  (The tests also print a _lot_ of stuff to user output.  Not all of
> it seems useful under the default verbosity level.)
>

OK, so we are aware of the failing tests. This has to do with one of the
underlying SAX parsers (which actually lives over in Semargl) being very
strict in its interpretation of the InputStream.
There is an open pull request to address this, but it needs more work. If
you are interested, you can find the current patch and discussion over at
https://github.com/apache/any23/pull/24

The second issue, regarding the verbose logs, has been addressed and pushed
to the master branch; cf. https://issues.apache.org/jira/browse/ANY23-293

This now also means that you only get INFO logging when running the Any23
core application.


>
> 2. Since my distro comes with JDK 1.8 (and switching JDK versions has
> always been somewhat of a Black Art for me) I had to remove
> `-XX:PermSize=128m` from the `` setting in `pom.xml`.
> This JVM feature is no longer supported in Java 8, apparently.
>

We have not fully migrated to JDK 1.8 yet. There are a bunch of Javadoc
issues to deal with before we do that. Most likely we will do that for the
1.3 release of Any23, i.e. after the pending 1.2 release.


>
> 3. When I run `bin/any23` from the core package I always see the following
> at the top of user output:
>
> SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
> SLF4J: Defaulting to no-operation (NOP) logger implementation
> SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for
> further details.
>
> To me this seems as if a default setup for the logging infrastructure is
> currently missing?
>

This has also been fixed cf. https://issues.apache.org/jira/browse/ANY23-293
and https://issues.apache.org/jira/browse/ANY23-292
If you pull from the master branch, the logging will be much more
eye-friendly now!


>
> 4. The help flag does not seem to work for me in the CLI:
>
> $ any23 rover -h
> SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
> SLF4J: Defaulting to no-operation (NOP) logger implementation
> SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for
> further details.
> Exception in thread "main" com.beust.jcommander.ParameterException:
> Unknown option: -h
> at com.beust.jcommander.JCommander.parseValues(JCommander.java:735)
> at com.beust.jcommander.JCommander.parse(JCommander.java:279)
> at com.beust.jcommander.JCommander.parse(JCommander.java:262)
> at com.beust.jcommander.JCommander.parseValues(JCommander.java:780)
> at com.beust.jcommander.JCommander.parse(JCommander.java:279)
> at com.beust.jcommander.JCommander.parse(JCommander.java:262)
> at org.apache.any23.cli.ToolRunner.execute(ToolRunner.java:96)
> at org.apache.any23.cli.ToolRunner.main(ToolRunner.java:69)
>
> Has something gone wrong during Maven install / can others reproduce this
> error?
>

It works absolutely fine for me:

lmcgibbn@LMC-032857 /usr/local/any23/core/target/apache-any23-core-1.2-SNAPSHOT(master) $ ./bin/any23 -h
Usage: any23 [options] [command] [command options]
  Options:
    -h, --help
       Display help information.
       Default: false
    --plugins-dir
       The Any23 plugins directory.
       Default: /Users/lmcgibbn/.any23/plugins
    -X, --verbose
       Produce execution verbose output.
       Default: false
    -v, --version
       Display version information.
       Default: false
  Commands:
    extractor      Utility for obtaining documentation about metadata extractors.
      Usage: extractor [options] Extractor name
        Options:
          -a, --all
             shows a report about all available extractors
             Default: false
          -i, --input
             shows example input for the given extractor
             Default: false
          -l, --list
             shows the names of all available extractors
             Default: false
          -o, --outut
             shows example output for the given extractor
             Default: false

    microdata      Commandline Tool for extracting Microdata from file/HTTP source.
      Usage: microdata [options] Input document URL, {http://path/to/resource.html|file:/path/to/localFile.html}

mimes 

Re: GSOC 2016

2016-04-01 Thread Lewis John Mcgibbney
Hi Cihad,

On Thu, Mar 31, 2016 at 1:25 PM,  wrote:

>
> From: Cihad Guzel 
> To: user@any23.apache.org
> Cc:
> Date: Thu, 24 Mar 2016 10:35:01 +0200
> Subject: Re: GSOC 2016
> Hi Lewis.
>
> Yes. I am still interested.  I'm open to any suggestions
>
>
I didn't see anything in the GSoC tracker. I may have missed it. Do you
have a link to your proposal please?
Thanks
Lewis


Re: GSOC 2016

2016-03-23 Thread Lewis John Mcgibbney
Hi Cihad,
Are you still interested in this? If so then please let us know and we can
find you a project.
Thank you
Lewis

On Mon, Mar 7, 2016 at 2:38 PM,  wrote:

>
> From: Cihad Guzel 
> To: d...@any23.apache.org, user@any23.apache.org
> Cc:
> Date: Tue, 8 Mar 2016 00:38:49 +0200
> Subject: GSOC 2016
> Hi,
>
> I want to apply for GSOC 2016. I examined Any23 and it sounds interesting.
> Could you suggest an issue for GSOC?
>
> I participated in GSOC 2015 for Apache Nutch last year and I completed it
> successfully.
>
> You can see my previous work for GSOC:
> https://wiki.apache.org/nutch/GoogleSummerOfCode/SitemapCrawler
>
> Kind Regards
> Cihad Guzel
>
>


-- 
*Lewis*


Fwd: private Digest 5 Feb 2016 18:05:42 -0000 Issue 149

2016-02-05 Thread Lewis John Mcgibbney
-- Forwarded message --
From: <private-digest-h...@any23.apache.org>
Date: Fri, Feb 5, 2016 at 10:05 AM
Subject: private Digest 5 Feb 2016 18:05:42 - Issue 149
To: priv...@any23.apache.org



private Digest 5 Feb 2016 18:05:42 - Issue 149

Topics (messages 407 through 407)

[REMINDER] ApacheCon NA 2016 Travel Assistance Applications now open!
407 by: lewis john mcgibbney

Administrivia:

-
To post to the list, e-mail: priv...@any23.apache.org
To unsubscribe, e-mail: private-digest-unsubscr...@any23.apache.org
For additional commands, e-mail: private-digest-h...@any23.apache.org

--



-- Forwarded message --
From: lewis john mcgibbney <lewi...@apache.org>
To: undisclosed-recipients:;
Cc:
Date: Fri, 5 Feb 2016 10:05:40 -0800
Subject: [REMINDER] ApacheCon NA 2016 Travel Assistance Applications now
open!
Hi pmcs@,

The Travel Assistance Committee (TAC) are pleased to announce that travel
assistance applications for ApacheCon North America 2016 are now open! This
announcement is intended to help you (pmcs@) let members of your
community know about both ApacheCon NA 2016 and the TAC assistance to
attend. Could you please forward this announcement to your community,
along with (if possible) information on how your project is involved in
ApacheCon this year?

We will be supporting ApacheCon NA, Vancouver BC, May 9th - 13th 2016.

TAC exists to help those that would like to attend ApacheCon events, but
are unable to do so for financial reasons. For more info on this year's
applications and qualifying criteria, please visit the TAC website at
<http://www.apache.org/travel/>. Applications are already open, so don't
delay!

*Important dates*...

   - CFP Close: February 12, 2016
   - CFP Notifications: February 29, 2016
   - TAC Applications close:  March 2, 2016
   - Schedule Announced: March 3, 2016

Applicants have until the closing date above to submit their
applications, which should contain as much supporting material as required
to process the request efficiently and accurately. This will enable TAC
to announce successful awards shortly afterwards.

As usual TAC expects to deal with a range of applications from a diverse
range of backgrounds. We therefore encourage (as always) anyone thinking
about sending in an application to do so ASAP.

We look forward to greeting many of you in Vancouver, BC in May 2016!

Kind Regards

Lewis

(On behalf of the Travel Assistance Committee)




-- 
*Lewis*


Re: Processing Recipes

2015-12-07 Thread Lewis John Mcgibbney
Hi Frank,

Answer below

On Mon, Dec 7, 2015 at 3:50 PM,  wrote:

>
> Hi, I'm trying to process recipes that are marked up; one example of such a
> recipe is:
>
> http://allrecipes.com/recipe/203229/moms-buttermilk-pancakes/
>
> This page can be processed by google rich snippets, but when I try the
> following it doesn't return results:
>
> any23 rover -e html-mf-hrecipe
> http://allrecipes.com/recipe/203229/moms-buttermilk-pancakes/
>
> Using the following I get json results but they are generic (not recipe
> specific):
>
> sudo any23 rover -e html-microdata
> http://allrecipes.com/recipe/203229/moms-buttermilk-pancakes/
>
> Am I missing something?  My ultimate goal is to get the recipe into a java
> object, what would be the best way to do that?
>
>
When I try this with the Any23 service at any23.org (running off of
Any23 trunk), I get the following error. Do you have another page we can try?
Thanks

org.apache.any23.extractor.ExtractionException: Error while parsing RDF document.
at org.apache.any23.extractor.rdf.BaseRDFExtractor.run(BaseRDFExtractor.java:109)
at org.apache.any23.extractor.rdf.BaseRDFExtractor.run(BaseRDFExtractor.java:41)
at org.apache.any23.extractor.SingleDocumentExtraction.runExtractor(SingleDocumentExtraction.java:463)
at org.apache.any23.extractor.SingleDocumentExtraction.run(SingleDocumentExtraction.java:255)
at org.apache.any23.Any23.extract(Any23.java:298)
at org.apache.any23.Any23.extract(Any23.java:450)
at org.apache.any23.servlet.WebResponder.runExtraction(WebResponder.java:114)
at org.apache.any23.servlet.Servlet.doGet(Servlet.java:79)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:618)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:725)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:301)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:239)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:219)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:106)
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:503)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:136)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:74)
at org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:610)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:88)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:526)
at org.apache.coyote.ajp.AbstractAjpProcessor.process(AbstractAjpProcessor.java:794)
at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:652)
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1575)
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1533)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.openrdf.rio.RDFParseException: org.xml.sax.SAXParseException; lineNumber: 11; columnNumber: 788; Element type "n.length" must be followed by either attribute specifications, ">" or "/>".
at org.semarglproject.sesame.rdf.rdfa.SesameRDFaParser.parse(SesameRDFaParser.java:111)
at org.semarglproject.sesame.rdf.rdfa.SesameRDFaParser.parse(SesameRDFaParser.java:95)
at org.apache.any23.extractor.rdf.BaseRDFExtractor.run(BaseRDFExtractor.java:105)
... 29 more
Caused by: org.semarglproject.rdf.ParseException: org.xml.sax.SAXParseException; lineNumber: 11; columnNumber: 788; Element type "n.length" must be followed by either attribute specifications, ">" or "/>".
at org.semarglproject.rdf.rdfa.RdfaParser.processException(RdfaParser.java:1130)
at org.semarglproject.source.XmlSource.process(XmlSource.java:50)
at org.semarglproject.source.StreamProcessor.processInternal(StreamProcessor.java:87)
at org.semarglproject.source.BaseStreamProcessor.process(BaseStreamProcessor.java:167)
at org.semarglproject.source.BaseStreamProcessor.process(BaseStreamProcessor.java:154)
at org.semarglproject.sesame.rdf.rdfa.SesameRDFaParser.parse(SesameRDFaParser.java:109)
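The trace above shows Semargl's XmlSource handing the page to an XML parser, so the failure is most likely an unescaped "<" inside inline JavaScript (something like "i<n.length"), which is legal HTML but not well-formed XML, rather than a problem with the recipe markup itself. The stand-alone sketch below reproduces that class of SAXParseException with the JDK's built-in DOM parser. It does not use Any23 or Semargl; the HTML snippets are made up, and the DOCTYPE-rejection feature is an extra hardening assumption, not something Any23 is known to set.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class StrictXmlCheck {

    /** Returns null if the markup is well-formed XML, otherwise the parser's error message. */
    static String wellFormednessError(String markup) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Reject DOCTYPE declarations outright: a common hardening step for untrusted input.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Replace the default handler, which prints "[Fatal Error] ..." to stderr.
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e) { }
                @Override public void fatalError(SAXParseException e) throws SAXParseException {
                    throw e;
                }
            });
            builder.parse(new InputSource(new StringReader(markup)));
            return null;
        } catch (SAXParseException e) {
            return e.getMessage();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // A bare '<' inside inline JavaScript is legal HTML but not well-formed XML.
        String raw = "<html><body><script>for (i = 0; i<n.length; i++) {}</script></body></html>";
        // The same script wrapped in a CDATA section is well-formed XML.
        String cdata = "<html><body><script><![CDATA[for (i = 0; i<n.length; i++) {}]]></script></body></html>";
        System.out.println("raw:   " + wellFormednessError(raw));
        System.out.println("cdata: " + wellFormednessError(cdata));
    }
}
```

Wrapping scripts in CDATA (or moving them into external files) is one way to make a page XML-friendly; whether that is practical depends on who controls the page.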

Re: Extracting Meta Tags

2015-12-07 Thread Lewis John Mcgibbney
Hi Frank,

On Mon, Dec 7, 2015 at 3:50 PM,  wrote:

>
> I'm trying to extract meta tags from webpages.  I'm using the code below
> but am finding that only a small subset of meta tags are being returned.
> There are meta tags like those for facebook open graph that i am interested
> in that are not being returned?
>

The default Any23 configuration [0] specifies that HTML head meta tags should
be extracted. There is therefore no need to change this behaviour, as
extraction of HTML meta tags 'should' already be happening by default.
You are also correctly setting this within your code, as below!
Can you please post an example of a URL we can test against?
Thanks
Lewis

[0]
https://github.com/apache/any23/blob/master/api/src/main/resources/default-configuration.properties#L70
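One quick way to check what a page actually declares, before digging into the extractor, is to list its meta tags and note whether each one uses a name or a property attribute (Open Graph tags normally use property="og:..."). The sketch below is a rough regex-based diagnostic in plain Java, not Any23's HTML meta extractor; the class name is hypothetical, and it will miss tags a real HTML parser would handle (unquoted attribute values, for example).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MetaTagLister {
    // A <meta ...> start tag, case-insensitive; deliberately naive, not a real HTML parser.
    private static final Pattern META =
            Pattern.compile("<meta\\b([^>]*)>", Pattern.CASE_INSENSITIVE);
    // A key="value" or key='value' attribute pair inside the tag.
    private static final Pattern ATTR =
            Pattern.compile("([\\w:-]+)\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)')");

    /** Lists meta tags as "name <key>=<content>" or "property <key>=<content>". */
    static List<String> list(String html) {
        List<String> found = new ArrayList<>();
        Matcher tag = META.matcher(html);
        while (tag.find()) {
            String keyAttr = null, key = null, content = null;
            Matcher attr = ATTR.matcher(tag.group(1));
            while (attr.find()) {
                String attrName = attr.group(1).toLowerCase(Locale.ROOT);
                String value = attr.group(2) != null ? attr.group(2) : attr.group(3);
                if (attrName.equals("name") || attrName.equals("property")) {
                    keyAttr = attrName;
                    key = value;
                } else if (attrName.equals("content")) {
                    content = value;
                }
            }
            // Tags such as <meta charset="utf-8"> carry no key/content pair and are skipped.
            if (key != null && content != null) {
                found.add(keyAttr + " " + key + "=" + content);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String head = "<head><meta name=\"description\" content=\"Pancakes\">"
                + "<meta property='og:title' content='Buttermilk Pancakes'/>"
                + "<meta charset=\"utf-8\"></head>";
        list(head).forEach(System.out::println);
    }
}
```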


Fwd: ApacheCon NA 2015 Travel Assistance Applications now open!

2015-12-07 Thread Lewis John Mcgibbney
Hi user@ and dev@,
Please see below for opportunities to obtain Travel Assistance funding to
attend the forthcoming ApacheCon NA, which is being held in Vancouver, BC, May
9th - 13th 2016.
It would be great to see our project represented.
Best
Lewis

-- Forwarded message --
From: <private-digest-h...@any23.apache.org>
Date: Mon, Dec 7, 2015 at 8:15 PM
Subject: private Digest 8 Dec 2015 04:15:55 - Issue 143
To: priv...@any23.apache.org



private Digest 8 Dec 2015 04:15:55 - Issue 143

Topics (messages 399 through 400)

Re: Immediate change to git
399 by: Peter Ansell

ApacheCon NA 2015 Travel Assistance Applications now open!
400 by: lewis john mcgibbney




-- Forwarded message --
From: Peter Ansell <ansell.pe...@gmail.com>
To: "priv...@any23.apache.org" <priv...@any23.apache.org>
Cc: David Nalley <da...@gnsa.us>
Date: Wed, 4 Nov 2015 08:59:44 +1100
Subject: Re: Immediate change to git
Hi David,

Eventually, projects that create short-lived branches for each JIRA
issue will not appreciate this measure, but that will not be a
short-term effect; it is more of a medium- or long-term effect.

I know I have been meaning to clean out the merged branches in the
Any23 git repository, but have not got around to it. Luckily, they
have not built up to the point where they are unmanageable yet, as
there are only 5 or so in the Apache repository at this point, so a
hiatus on that will not affect us.

Cheers,

Peter

On 4 November 2015 at 07:40, David Nalley <da...@gnsa.us> wrote:
> Hi folks,
>
> After the many emails you may have seen around Git, I am writing yet
> another.
>
> To date, on our git repos, we've only 'protected' master, trunk, and
> release branches and tags. This has left other branches open to
> rewriting, force pushes, and branch deletion.
>
> Recently, we've discovered that many projects (just under 50) have one
> or more repos that are using something other than master or trunk as
> their main development branch. In some cases this is a 'develop'
> branch in others it's more like $project_version which leaves those
> branches open to deletion, rewriting, etc.
>
> So today, we're taking an interim step of disabling non-fast-forward
> pushes and branch deletion across all of our git repos. I emphasize
> interim, as it's a stop-gap measure to get us back to the level of
> protection we've set expectations for. I know that this will be
> disruptive to many folks' way of operating in their git environment,
> so we are hoping to make this interim solution short lived. If your
> project has immediate needs that you find are blocked by this, please
> do reach out to the Infrastructure team, and we will work to make sure
> we can help with a timely workaround for those specific cases.
>
> The longer term solution to this issue may be a policy decision or it
> might be a technical solution. I sadly don't know what that solution
> will be. We are going to be discussing this on the public
> infrastructure-dev mailing list, and I invite you to join us in that
> discussion.
>
> --David



-- Forwarded message --
From: lewis john mcgibbney <lewi...@apache.org>
To:
Cc: "travel-assista...@apache.org" <travel-assista...@apache.org>
Date: Mon, 7 Dec 2015 20:15:50 -0800
Subject: ApacheCon NA 2015 Travel Assistance Applications now open!
Hi pmcs@,

The Travel Assistance Committee (TAC) are pleased to announce that travel
assistance applications for ApacheCon NA 2016 are now open!

This announcement is intended to help you (pmcs@) let members of
your community know about both ApacheCon NA 2016 and about the TAC
assistance to attend. Could you please forward this announcement to your
community, along with (if possible) information on how your project is
involved in ApacheCon this year?

We will be supporting ApacheCon NA, Vancouver BC, May 9th - 13th 2016.

TAC exists to help those that would like to attend ApacheCon events, but
are unable to do so for financial reasons. For more info on this year's
applications and qualifying criteria, please visit the TAC website at
<http://www.apache.org/travel/>. Applications are already open, so don't
delay!

This year ApacheCon NA is split into two separate themed events - Apache
BigData
<http://events.linuxfoundation.org/events/apache-big-data-north-america>
and *ApacheCon*
<http://events.linuxfoundation.org/events/apachecon-north-america>. Due to
the small time frame of each event (3 days a

Re: [Fatal Error] :20:79: Element type "arguments.length" must be followed by either attribute specifications, ">" or "/>".

2015-01-20 Thread Lewis John Mcgibbney
Hi Meraj,

On Tue, Jan 13, 2015 at 11:51 PM, user-digest-h...@any23.apache.org wrote:


 The web page is http://www.adorama.com/ICAE135B.html


OK, I just ran it through any23-vm.apache.org and got the following:

@prefix foaf: http://xmlns.com/foaf/0.1/ .
@prefix rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns# .
@prefix doac: http://ramonantonio.net/doac/0.1/# .
@prefix dcterms: http://purl.org/dc/terms/ .

http://any23.org/tmp/ dcterms:title Canon PowerShot ELPH 135
Digital Camera, Black 9150B001 .

_:node7aec2e3e2888d5324234d4d14c4767b a http://schema.org/WebPage ;
http://schema.org/WebPage/breadcrumb 




Cameras








Canon Digital Point  Shoot Cameras








Canon PowerShot ELPH 135 Digital Camera, Black





 .

http://any23.org/tmp/ http://www.w3.org/1999/xhtml/microdata#item
_:node7aec2e3e2888d5324234d4d14c4767b ;
dcterms:title Canon PowerShot ELPH 135 Digital Camera, Black 9150B001 
;
http://www.w3.org/1999/xhtml/vocab#quickshippromo
http://any23.org/tmp//searchsite/popups/global.aspx?type=homepage_click_by_7
;
http://www.w3.org/1999/xhtml/vocab#productslistpop
http://any23.org/tmp//SearchSite/NewDesign/Controls/Popups/Configurations.aspx?sku=ICAE135BDebug=FType=Other
;
http://www.w3.org/1999/xhtml/vocab#popup500
http://any23.org/tmp//SearchSite/NewDesign/Controls/Popups/PriceNotificationControl.aspx?sku=ICAE135B
;
http://www.w3.org/1999/xhtml/vocab#pricealertpop
http://any23.org/tmp//SearchSite/NewDesign/Controls/Popups/PriceNotificationControl.aspx?sku=ICAE135B
;
http://www.w3.org/1999/xhtml/vocab#whyadoramapop
http://any23.org/tmp//SearchSite/Popups/Global.aspx?type=whyadorama-returns
, http://any23.org/tmp//SearchSite/Popups/Global.aspx?type=whyadorama-shipping
, http://any23.org/tmp//SearchSite/Popups/Global.aspx?type=whyadorama-trusted
;
http://www.w3.org/1999/xhtml/vocab#popup800
http://any23.org/tmp//SearchSite/Popups/Global.aspx?type=flexShopperInfo
;
http://www.w3.org/1999/xhtml/vocab#popup870
http://any23.org/tmp//searchsite/newdesign/controls/warrantyModal.aspx?sku=NLRPL1U100
;
http://www.w3.org/1999/xhtml/vocab#mediapopup
http://any23.org/tmp//SearchSite/NewDesign/Controls/Popups/Media.aspx?sku=ICAE135B
, 
http://any23.org/tmp//SearchSite/NewDesign/Controls/Popups/Media.aspx?sku=ICAE135B
;
http://www.w3.org/1999/xhtml/vocab#quickshippromo
http://any23.org/tmp//searchsite/popups/global.aspx?type=homepage_click_by_7
;
http://www.w3.org/1999/xhtml/vocab#nofollow
http://pinterest.com/pin/create/button/ ;
http://www.w3.org/1999/xhtml/vocab#publisher
https://plus.google.com/110837982014303057310 ;
http://www.w3.org/1999/xhtml/vocab#nofollow
https://facebook.com/Adorama , https://twitter.com/adorama ,
https://twitter.com/adorama ,
https://plus.google.com/110837982014303057310/posts ,
http://www.youtube.com/user/adoramatv ,
http://pinterest.com/adorama/ , http://instagram.com/adorama/ ,
http://instagram.com/adorama/ ;
http://www.w3.org/1999/xhtml/vocab#popup500
http://any23.org/tmp//catalog.tpl?op=privacy_policy_popup ;
http://www.w3.org/1999/xhtml/vocab#privacypop
http://any23.org/tmp//catalog.tpl?op=privacy_policy_popup ;
http://www.w3.org/1999/xhtml/vocab#popup500
http://any23.org/tmp//searchsite/popups/global.aspx?type=generalFeedback
;
http://www.w3.org/1999/xhtml/vocab#feedbackpopup
http://any23.org/tmp//searchsite/popups/global.aspx?type=generalFeedback
;
http://www.w3.org/1999/xhtml/vocab#nofollow
http://any23.org/tmp///www.scanalert.com/RatingVerify?ref=www.adorama.com
, https://www.bbb.org/online/consumer/cks.aspx?id=121000109 ,
http://www.bizrate.com/ratings_guide/cust_reviews__mid--22495.html ;
http://www.w3.org/1999/xhtml/vocab#ALTERNATE-STYLESHEET
http://any23.org/tmp//SearchSite/combres.axd/CurrentHeaderCss/-1495139106/
, http://any23.org/tmp//SearchSite/combres.axd/NewProductPageCss/1799281587/
;
http://www.w3.org/1999/xhtml/vocab#canonical
http://www.adorama.com/ICAE135B.html ;
http://www.w3.org/1999/xhtml/vocab#keywords buy, shop Canon
PowerShot ELPH 135 Digital Camera, 16MP, 8x Optical Zoom, 720p HD
Video, Digital Image Stabilization, Black, MPN: 9150B001 SKU:
ICAE135B ;
http://www.w3.org/1999/xhtml/vocab#description Same Day Shipping
till 8PM on new Canon PowerShot ELPH 135 Digital Camera, 16MP, 8x
Optical Zoom, 720p HD Video, Digital Image Stabilization, Black. MPN
9150B001 SKU ICAE135B. From Adorama.com - more than a camera store. ;
http://www.w3.org/1999/xhtml/vocab#twitter:site @adorama ;
http://www.w3.org/1999/xhtml/vocab#twitter:creator @adorama ;
http://www.w3.org/1999/xhtml/vocab#twitter:card product ;
http://www.w3.org/1999/xhtml/vocab#twitter:data1 30.00 ;
http://www.w3.org/1999/xhtml/vocab#twitter:label1 Instant Rebate ;

Re: Unable to connect to any23 svn repo

2014-12-03 Thread Lewis John Mcgibbney
Hi Jaikit,

On Wed, Dec 3, 2014 at 7:48 PM, user-digest-h...@any23.apache.org wrote:


 Team,

 Since this afternoon we have been unable to resolve the any23 svn repo and
 hence our builds are failing. Does anyone have a workaround or solution for
 this?

  Could not transfer artifact 
 org.apache.commons:commons-csv:pom:1.0-SNAPSHOT-rev1148315 from/to 
 any23-repository-external (http://svn.apache.org/repos/asf/any23/repo-ext/): 
 Connect to svn.apache.org:80 [svn.apache.org/140.211.11.4] failed: Connection 
 refused - [Help 1]


 Appreciate any help.


There was a failure of the SVN master node on the Apache infrastructure
earlier today [0].
In the meantime, I would hack your pom.xml and use the <scope>system</scope>
and <systemPath> elements for the dependency.
An example can be seen here
https://github.com/maestros/gora-oraclenosql/blob/master/gora-oracle/pom.xml#L120

Hope this helps
Lewis

[0] https://twitter.com/infrabot/status/540192293572337664


[VOTE] Release Apache Any23 1.1

2014-10-16 Thread Lewis John Mcgibbney
Evening Folks,

I would like to open a VOTE on the following Apache Any23 1.1 release
candidate.

We solved a number of issues which can be seen in our release report:
http://s.apache.org/any231.1

Git source tag signature (4d5a022f71d2199c2d2cf83f4c51397249973052):
http://s.apache.org/any231.1tag

Staging repo:
https://repository.apache.org/content/repositories/orgapacheany23-1001/

Staging binaries:
https://dist.apache.org/repos/dist/dev/any23/1.1/

PGP release keys (signed using 48BAEBF6 Lewis John McGibbney (CODE SIGNING
KEY) lewi...@apache.org):
http://any23.apache.org/dist/KEYS

I would like to say thank you to everyone that contributed to this minor
release of Any23.
Vote will be open for 72 hours.

[ ] +1, let's get it rmblee!!!
[ ] +/-0, fine, but consider to fix few issues before...
[ ] -1, nope, because... (and please explain why)


p.s. here is my +1

-- 
*Lewis*


Re: Extracting Blank Nodes instead of IRIs

2014-07-14 Thread Lewis John Mcgibbney
Hi Bianca,

On Mon, Jul 14, 2014 at 5:55 AM, user-digest-h...@any23.apache.org wrote:

  In order to reproduce this specific case I used the following commands:

   wget http://www.imdb.com/title/tt0286560/?ref_=fn_al_tt_4

   ./apache-any23-core-1.0/bin/rover  -f ntriples -o
 index.html?ref_=fn_al_tt_4.nt  index.html?ref_=fn_al_tt_4


OK, I will look into this later today. Thanks for the example this time around.


 I tried to look into another website (Rotten Tomatoes) and I found the
 same pattern.

  Again, IMHO, the url could be used as the subject of the triples. I am
 not sure if it is valid for all triples in all websites but in those
 examples it seems to work fine. Here goes one example from the webpage
 http://www.rottentomatoes.com/m/sex_tape_2014/


OK, I doubt I would be able to navigate to that URL whilst on my work
laptop ;)
However, I wonder if you have discovered the XPATH extractor?
http://any23.apache.org/apidocs/org/apache/any23/extractor/xpath/XPathExtractor.html
This is marked as experimental but might do the trick for you if this is
something that needs to be addressed for your ongoing work.




 I don't know if it is a better way or not. Actually I was hoping that
 someone could tell me if it is a reasonable idea or not =) As it is the
 first time I really work with data which is not already in triples format.


Yeah, unfortunately this is the real-life scenario and we need to accept
that not all data is going to be in the form we want or need. Some
preprocessing is sometimes required before the data meets your target
requirements. I am therefore interested in hearing how we can address this
one.
For me, having a subject value referenced as a blank node e.g.
node_0974638e093e23 (or something similar) is difficult to both interpret
and relate to unless it can be visualized within the web page.
We 'can' do this type of thing with Any23 but from what I can see, you are
using the command line tools and not the Java API directly.
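If you would rather stay with the command line tools, one option is to post-process the N-Triples output from rover. The sketch below rewrites blank-node subjects and objects on each line into IRIs built from the page URL. The naming scheme (page URL, then "#", then the blank node label) is a hypothetical choice for illustration, not something Any23 does; the sketch assumes one triple per line with whitespace before the final "." and a page URL that has no fragment of its own. Because blank node labels are only unique within one document, the resulting IRIs are only meaningful per page.

```java
class BlankNodeSkolemizer {

    /** Rewrites blank-node subjects/objects in one N-Triples line as IRIs under pageIri. */
    static String skolemize(String line, String pageIri) {
        String trimmed = line.trim();
        if (trimmed.isEmpty() || trimmed.startsWith("#")) {
            return line; // blank lines and comments pass through unchanged
        }
        // subject, predicate, then "object ." (a literal object may itself contain spaces)
        String[] parts = trimmed.split("[ \\t]+", 3);
        if (parts.length < 3) {
            return line;
        }
        String subject = parts[0].startsWith("_:") ? toIri(parts[0], pageIri) : parts[0];
        String object = parts[2];
        if (object.startsWith("_:")) {
            int end = 0;
            while (end < object.length() && !Character.isWhitespace(object.charAt(end))) {
                end++;
            }
            object = toIri(object.substring(0, end), pageIri) + object.substring(end);
        }
        return subject + " " + parts[1] + " " + object;
    }

    // Hypothetical naming scheme: _:node7 -> <pageIri#node7>.
    private static String toIri(String blankNode, String pageIri) {
        return "<" + pageIri + "#" + blankNode.substring(2) + ">";
    }

    public static void main(String[] args) {
        String page = "http://example.org/movie";
        System.out.println(skolemize("_:node1 <http://schema.org/name> \"Example Title\" .", page));
        System.out.println(skolemize("<http://example.org/movie> <http://schema.org/actor> _:node2 .", page));
    }
}
```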




 Sorry for my ignorance, but I don't know which extractor was used =/ I just
 used the rover asking the format to be given in ntriples. How can I know
 which extractor was used?


Well, a quick and easy way to do this would be to navigate to
http://any23.org and run an extraction with validation and fixing set to
true; this will generate a report listing the extractors which have been used.
Although this doesn't report the exact extractor for each triple, it will save
you time in narrowing down which one was used. You will most likely need to step
through the code in a debugger to see which extractor extracted which triple.

Thanks
Lewis


[RESULT] WAS Re: [VOTE] Apache Any23 Release 1.0

2014-05-16 Thread Lewis John Mcgibbney
Hi Folks,

OK so 72 hours has come and gone.

Please see below for VOTE'ing stats:

[6] +1, let's get it rmblee!!!
Davide Palmisano
Michele Mostarda
Szymon Danielczyk
Andy Seaborne
Tommaso Teofili
Lewis John McGibbney

[0] +/-0, fine, but consider to fix few issues before...
[0] -1, nope, because... (and please explain why)

I am really happy to say that the VOTE has passed ;)
Thank you to everyone who was able to VOTE; it is excellent to see so many
people taking part.
I'll proceed with the remainder of the release process and make the relevant
announcements.
Thank you everyone
Lewis

On Tue, May 13, 2014 at 11:33 AM, Lewis John Mcgibbney 
lewis.mcgibb...@gmail.com wrote:

 Hi Folks,

 I apologize for taking a while to get this release candidate together. I
 found some bugs in Maven plugins and it took a while to debug where they
 were and which aspects of the release procedure they affected. Now, however,
 it is fixed, so that is all good. Please VOTE below on releasing Apache
 Any23 1.0 (notice the jump from 0.9.1 to 1.0). This will represent a MAJOR
 release for Any23. ...



Re: [ANNOUNCE] Apache Any23 1.0 Release

2014-05-16 Thread Lewis John Mcgibbney
Hi Gregg,

Thanks for your email.
Please see my replies inline.

On Fri, May 16, 2014 at 2:51 PM, Gregg Kellogg gr...@greggkellogg.net wrote:

 Do you ever expect to fully support RDFa 1.1?


Yeah we do. Eventually someone will get around to it. If this is an itch
you have, we would very much encourage you to itch it and push it back into
the Any23 codebase ;)


 Looks like it's just RDFa 1.0 with prefix support, and even that fails many
 tests.


Based on which version of Any23? Have you tried this new release or are you
just stating this based on your previous experience/assumptions?
Please refer to the following ticket for a new RDFa parser proposal which
can be considered as the flagship issue of the 1.0 release.
https://issues.apache.org/jira/browse/ANY23-137


I think in the past, we've suggested using alternative Java RDFa
 implementations; both Semargl [1] and clj-rdfa [2] are quite performant
 (substantially faster than the any23 endpoint [3]) and fully conform to
 the spec.


The endpoint you refer to has recently been overhauled and will be updated
with nightly SNAPSHOTs of stable Any23 code from now on. I've just taken
the lead on this initiative, so please bear with us until our scripts pull
the next build some time tonight.



 Running the basic test suite[4] shows you passing only 58 out of 170 tests.


Again, based on what version of Any23?
We are currently at 1.0. I would however really like to work with whoever
runs this service to improve the stats as much as possible... once I know
what the stats are actually quantifying and representing.

Thanks
Lewis


Re: # of Triples extracted available via reporting?

2014-03-30 Thread Lewis John Mcgibbney
Hi S.L.,

On Sun, Mar 30, 2014 at 11:19 PM, user-digest-h...@any23.apache.org wrote:

 Thanks for sharing this information with us Lewis, earlier I was using the
 embedded server and it was giving me the number of triples extracted on the
 console for each URL



No probs :)
If you're _just_ looking for triples extracted then use the relevant one
above. It does just the job.
Ta
Lewis
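If the output is serialized as N-Triples (for example via rover's "-f ntriples" option, as used elsewhere in this archive), the number of triples can also be taken straight from the file, since N-Triples puts one triple per line. A minimal sketch in plain Java, assuming blank lines and "#" comment lines are the only non-triple lines; the class name is hypothetical and this is not an Any23 reporting feature.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.stream.Stream;

class NTriplesCounter {

    /** Counts triples, assuming N-Triples: one triple per line, plus blank and "#" comment lines. */
    static long count(Stream<String> lines) {
        return lines.map(String::trim)
                .filter(line -> !line.isEmpty() && !line.startsWith("#"))
                .count();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java NTriplesCounter output.nt");
            return;
        }
        try (Stream<String> lines = Files.lines(Paths.get(args[0]))) {
            System.out.println(count(lines) + " triples");
        }
    }
}
```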


Re: [VOTE] Release Apache ANY23 0.9.0

2013-11-02 Thread Lewis John Mcgibbney
p.s. here's my +1 :)


On Mon, Oct 28, 2013 at 10:49 PM, Lewis John Mcgibbney 
lewis.mcgibb...@gmail.com wrote:

 Hi Everyone,
 (Hi dev@tika, I hope you don't mind me cross-pollinating this thread)

 This thread is a formal VOTE to release Apache Any23 0.9.0.

 In this release cycle we solved 11 issues:
 http://s.apache.org/6l1

 Git source tag (86daaf897513efce88573c10e06b71d5eb36ad1a):
 http://s.apache.org/ORc

 Staging repo:
 https://repository.apache.org/content/repositories/orgapacheany23-041/

 Staging binaries and maven artifacts:
 https://dist.apache.org/repos/dist/dev/any23/

 PGP release keys (all artifacts signed using 48BAEBF6 - lewismc):
 http://any23.apache.org/dist/KEYS

 This VOTE will be open for 72 hours.
 Thank you in advance to everyone able to review.

 Best

 Lewis

 [ ] +1, let's ship it :)
 [ ] +/-0, fine, but consider to fix few issues before...
 [ ] -1, nope, because... (and please explain why)

 --
 *Lewis*




-- 
*Lewis*


Re: user Digest 20 Oct 2013 15:17:20 -0000 Issue 48

2013-10-22 Thread Lewis John Mcgibbney
On Sun, Oct 20, 2013 at 4:17 PM, user-digest-h...@any23.apache.org wrote:


 user Digest 20 Oct 2013 15:17:20 - Issue 48

 Topics (messages 110 through 111)

 Re: URL Encoding Issues in Apache Any23
 110 by: S.L

 Foaf:Depiction Values in Any23 getting transformed.
 111 by: S.L

 Administrivia:

 -
 To post to the list, e-mail: user@any23.apache.org
 To unsubscribe, e-mail: user-digest-unsubscr...@any23.apache.org
 For additional commands, e-mail: user-digest-h...@any23.apache.org

 --



 Lewis,

 That is correct; that is the only discrepancy that I have noticed so far.
 I think what's happening here is that any23 is encoding an already encoded
 URL. I have not found a way to avoid that in Java, i.e. avoid encoding an
 already encoded URL. Is there a way to do so? Does any23 consider the
 possibility of the URL being already encoded?

 Thanks.


 On Wed, Oct 2, 2013 at 8:39 PM, Lewis John Mcgibbney 
 lewis.mcgibb...@gmail.com wrote:

 Hi,

 On Sun, Sep 29, 2013 at 6:44 PM, user-digest-h...@any23.apache.org wrote:


 I seem to be running into issues where the URL that is submitted to
 Any23 is being encoded in a format that is causing the URL to become
 invalid. I am not sure whether the URL that is being encoded was already
 encoded by Any23 or if Any23 just encoded the URL in a wrong format.


 From what I can see below, the 2nd (encoded) URL differs only in that it
 includes a hash (#). Is this correct? Are there any other discrepancies
 which you've noticed?
 I checked our Jira instance and nothing like this has been reported
 before.
 Thanks
 Lewis



 Please see the example below and advise.

 URL (submitted to Any23 i.e before encoding happens) :


 http://www.xxx.com/site/searchpage.jsp?_dyncharset=ISO-8859-1id=pcat17071type=pageks=960st=Just_Dance_Disney_Party_67055sc=Globalcp=1sp=qp=crootcategoryid%23%23-1%23%23-1~~q4a7573745f44616e63655f4469736e65795f50617274795f3637303535~~ncabcat070%23%231%23%231list=yusc=All+Categoriesnrp=15iht=n

 URL after encoding (I know this by printing the URL from
 DefaultHttpClient.java):


 http://www.xxx.com/site/searchpage.jsp?_dyncharset=ISO-8859-1id=pcat17071type=pageks=960st=Just_Dance_Disney_Party_67055sc=Globalcp=1sp=qp=crootcategoryid#%23-1%23%23-1~~q4a7573745f44616e63655f4469736e65795f50617274795f3637303535~~ncabcat070%23%231%23%231list=yusc=All%20Categoriesnrp=15iht=n





 --
 *Lewis*
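On the question of avoiding encoding an already-encoded URL: one common approach is an encoder that leaves existing, valid %XX escapes untouched and only percent-encodes characters that are not allowed in a URI (RFC 3986 unreserved and reserved characters pass through). The sketch below is a hypothetical helper in plain Java, not an Any23 or HttpClient API. It treats every reserved delimiter, including "#", as structural, so a "#" that belongs inside a query value must already be written as %23.

```java
import java.nio.charset.StandardCharsets;

class SafeUrlEncoder {
    // RFC 3986 unreserved characters plus the reserved delimiters, left as-is so URL structure survives.
    private static final String ALLOWED =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
            + "-._~:/?#[]@!$&'()*+,;=";

    /** Percent-encodes disallowed characters but never re-encodes an existing %XX escape. */
    static String encodeIfNeeded(String url) {
        byte[] bytes = url.getBytes(StandardCharsets.UTF_8);
        StringBuilder out = new StringBuilder(bytes.length);
        for (int i = 0; i < bytes.length; i++) {
            int b = bytes[i] & 0xFF;
            if (b == '%' && i + 2 < bytes.length && isHex(bytes[i + 1]) && isHex(bytes[i + 2])) {
                // Already a valid escape such as %23: copy it instead of producing %2523.
                out.append('%').append((char) bytes[i + 1]).append((char) bytes[i + 2]);
                i += 2;
            } else if (b < 0x80 && ALLOWED.indexOf(b) >= 0) {
                out.append((char) b);
            } else {
                // Anything else: spaces, a '%' without two hex digits, non-ASCII UTF-8 bytes.
                out.append('%').append(String.format("%02X", b));
            }
        }
        return out.toString();
    }

    private static boolean isHex(byte c) {
        int v = c & 0xFF;
        return v < 0x80 && Character.digit(v, 16) >= 0;
    }

    public static void main(String[] args) {
        String once = encodeIfNeeded("http://example.com/s?cat=a%23%23-1&c=All Categories");
        System.out.println(once);
        System.out.println(once.equals(encodeIfNeeded(once))); // prints true: a second pass changes nothing
    }
}
```

Applying it twice gives the same result, which is the property the question asks about; it does not by itself explain why %23 turned into # in the URL above.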



 I am parsing the URL below and I am interested in the foaf:depiction


 http://www.kmart.com/canon-eos-rebel-t3i-18-55mm-is-ii/p-00339693000P?prdNo=1blockNo=1blockType=G1

 The foaf:depiction I get is the following:


 http://s.shld.net/is/image/Sears/http://c.shld.net/rpx/i/s/i/spin/image/spin_prod_ec_463043901?hei=wid=op_sharpen=resMode=sharpop_usm=0.9,0.5,0,0

 However from the HTML markup the foaf:depiction is


 http://s.shld.net/is/image/Sears/http://c.shld.net/rpx/i/s/i/spin/image/spin_prod_ec_463043901?hei=amp;wid=amp;op_sharpen=amp;resMode=sharpamp;op_usm=0.9,0.5,0,0http://s.shld.net/is/image/Sears/http://c.shld.net/rpx/i/s/i/spin/image/spin_prod_ec_463043901?hei=wid=op_sharpen=resMode=sharpop_usm=0.9,0.5,0,0

 Looks like the foaf:depiction URL is getting transformed during Any23
 parsing. I am using the latest Any23 trunk code.

 The & is getting replaced with &amp;

 Please advise.
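For what it's worth, in HTML (as in XML) "&amp;" inside an attribute value is an escaped "&", so a parser is expected to hand back the URL with plain "&" characters. The sketch below shows this with the JDK's XML DOM parser on a made-up, well-formed img element; an HTML parser applies the same rule to "&amp;". The class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class AttributeDecoding {

    /** Parses a single well-formed element and returns the decoded value of one attribute. */
    static String attributeValue(String element, String attribute) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(element)));
            // getAttribute returns the value after entity references such as &amp; are expanded.
            return doc.getDocumentElement().getAttribute(attribute);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String img = "<img src=\"http://example.com/image?hei=&amp;wid=&amp;resMode=sharp\"/>";
        System.out.println(attributeValue(img, "src"));
    }
}
```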




-- 
*Lewis*


Re: Status of BUG ANY23-115

2013-08-26 Thread Lewis John Mcgibbney
Hi,

On Mon, Aug 26, 2013 at 10:51 AM, user-digest-h...@any23.apache.org wrote:


 1. Remove any whitespace characters after the first one.
 2. If there is a tab or a series of tabs, convert it to a single whitespace
 character as well?


I pushed a fix for the issue in ANY23-115 yesterday. The fix basically
stops the MicrodataParser breaking on empty spans. The other issue we
noticed, however, regarding replacing tabs and redundant whitespace,
would be a further improvement.



 Please also let me know how I can build the embedded server tar file from
 the modified code , is there a script for that ?


You can simply run mvn clean install from the command line. This will package
the server into server/target; you can then grab whichever artifact suits you
and you will be good to go. Please let us know how you get on.



 Also there are two folders in SVN under trunk core and service , which
 folder should I check out for me to be able to use it as an embedded server
 to test out?


We no longer use SVN for Any23; we switched to git.
You can checkout the git source from here https://github.com/apache/any23



 Can you please answer my questions ? I can start working on it today.


I apologise for the delay. I receive messages as a batch digest, and as
the list is not so busy these days, the delay can sometimes be days. Sorry.
Thanks for your persistence on this one.
Lewis


Re: Status of BUG ANY23-115

2013-08-23 Thread Lewis John Mcgibbney
Hi,

On Fri, Aug 23, 2013 at 11:53 AM, user-digest-h...@any23.apache.org wrote:

 Lewis,

 Looks like you put in a patch for this bug and rolled it back,


Yep, my fix was not the correct one and, IIRC, removed all spaces and tabs.
This was not desired at all and broke tests as well!


 you also commented on JIRA that you only want to replace spaces after the
 first space and need a regex pattern for that. Is that the only reason you
 rolled back the changes for this bug?


Yes, this is all that should need to be done here.
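The whitespace rule discussed in this thread (collapse any run of whitespace, including a tab or a series of tabs, into a single space) can be expressed with one regex. The following is a minimal Java sketch under that assumption. collapseWhitespace is a hypothetical helper, not the actual ANY23-115 patch, and because \s also matches newlines it collapses line breaks too.

```java
import java.util.regex.Pattern;

public class WhitespaceCollapseSketch {

    // One or more consecutive whitespace characters (space, tab, newline, ...).
    private static final Pattern WHITESPACE_RUN = Pattern.compile("\\s+");

    // Hypothetical helper: replaces each whitespace run with a single space,
    // so "a  b" and "a\t\tb" both become "a b". Leading and trailing runs
    // are collapsed to one space, not trimmed.
    static String collapseWhitespace(String text) {
        return WHITESPACE_RUN.matcher(text).replaceAll(" ");
    }

    public static void main(String[] args) {
        String raw = "Apache\t\tAny23   extracts  RDF";
        System.out.println(collapseWhitespace(raw));    // Apache Any23 extracts RDF
        // Deleting every whitespace run instead glues the words together:
        System.out.println(raw.replaceAll("\\s+", "")); // ApacheAny23extractsRDF
    }
}
```

The second print statement shows why removing all spaces and tabs is the wrong behaviour: word boundaries in the extracted values are lost.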


 If that's not the case, can you please let me know what changes need to be
 made and I'll be more than happy to make them. I need some guidance from you
 on what changes need to be made.


If you are able to fork the code and send a pull request, I will make every
effort to review, test and commit it ASAP. I am keen to get this one
sorted out.



 Another unrelated question regarding the 0.8.0 distribution: there is no
 embedded server distributed anymore, or at least not in 0.8.0. Why has it
 been removed? It was a handy jar that you could just untar and run out of
 the box.


This was an error on my part as release manager for the 0.8.0 release.
We will certainly be releasing the above artifacts in the next version of
Apache Any23. I agree with what you are saying, and it is a shame that we
did not release the artifact.

Thanks
Lewis


Re: Status of BUG ANY23-115

2013-07-21 Thread Lewis John Mcgibbney
Hi,

On Sat, Jul 20, 2013 at 9:53 PM, user-digest-h...@any23.apache.org wrote:


 Unfortunately I am not familiar enough with the code to submit a
 patch. Can you please give me a few pointers? Is this a non-trivial task
 that requires a significant amount of development, and is that why it is
 being postponed rather than addressed?


Well, you can check out the Any23 code from
https://git-wip-us.apache.org/repos/asf/any23-committers.git
This will let you play around with it.
IIRC, this particular problem stems from the parsing/extraction of
microdata from (X)HTML pages.
There are actually a number of (suspiciously similar) issues open in the
Jira tracker:
https://issues.apache.org/jira/browse/ANY23-154
https://issues.apache.org/jira/browse/ANY23-111
https://issues.apache.org/jira/browse/ANY23-115
https://issues.apache.org/jira/browse/ANY23-131

The problem is that people come and go, and in most cases, as you can see
from the commentary, the information that would enable us to reproduce the
bugs is thin.



 This looks like a critical piece of functionality to me. I am not sure
 what other use cases Any23 addresses if parsing schema.org fails. Can
 you please enlighten me?

I agree with you here. I use Any23 purely for RDFa and RDF/XML stuff; I am
not bothered with Microdata right now.

If you are keen, then I am certainly keen to work on this with you. It
would be nice to clear it up as it is annoying me now.
Thanks


[RESULT] WAS Re: [VOTE] Apache Any23 0.8.0 RC#2

2013-06-25 Thread Lewis John Mcgibbney
Hi All,

OK, I'm going to close this thread as we have achieved a binding quorum of
VOTEs from Any23 PMC members.

The VOTEs tally as follows:

[3] +1, let's release
Chris Mattmann
Simone Tripodi
Lewis John McGibbney

[0] +/-0, fine, but consider to fix few issues before...
[0] -1, nope, because... (and please explain why)
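The "binding quorum" above refers to the Apache majority-approval rule for release votes: at least three binding +1 votes and more binding +1 than binding -1 votes (a -1 is not a veto on a release). A minimal Java sketch of that check, using a hypothetical passes method, applied to the RC#2 tally:

```java
public class ReleaseVoteSketch {

    // Majority approval for an Apache release vote: at least three binding
    // +1 votes and more binding +1 votes than binding -1 votes.
    // Binding 0 votes are recorded but do not change the outcome.
    static boolean passes(int bindingPlusOne, int bindingMinusOne) {
        return bindingPlusOne >= 3 && bindingPlusOne > bindingMinusOne;
    }

    public static void main(String[] args) {
        // 0.8.0 RC#2 tally: three binding +1, no 0, no -1.
        System.out.println(passes(3, 0)); // true
        // Two binding +1 votes would fall short of the three-vote minimum.
        System.out.println(passes(2, 0)); // false
    }
}
```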

This means I am very glad to say that we are able to release for the
first time since Apache Any23 graduated from the Apache Incubator.
I will deal with pushing the release tonight.
Thank you very much to those who reviewed and VOTEd this time around
and previously on RC#1. It has not exactly been an easy journey to get
here, but persistence has paid off and we can now release the improved code
for others to use.

Thank you

Best
Lewis

On Fri, Jun 21, 2013 at 12:43 PM, Lewis John Mcgibbney 
lewis.mcgibb...@gmail.com wrote:

 Hi All,





[VOTE] Apache Any23 0.8.0 RC#2

2013-06-21 Thread Lewis John Mcgibbney
Hi All,

OK, after a bit of trouble, we are getting back on track here. Please review
the release candidate and provide comments/VOTEs based on what is below.

Issues resolved within 0.8.0:
http://s.apache.org/iO

Git source tag (220224f5f06106c983d92b27d19c648d78335fd4):
http://s.apache.org/el

Staging repo:
https://repository.apache.org/content/repositories/orgapacheany23-028/

Staging binaries:
http://people.apache.org/~lewismc/any23/0.8.0

PGP release keys (signed using BEF70CB4):
http://www.apache.org/dist/incubator/any23/KEYS
http://any23.apache.org/dist/KEYS

Please double-check the signatures. Admittedly, I have been moving between
workplaces recently; I am trying to keep a single key, but I may have made
a mistake.

I have not produced a staging site as the release procedure has been
tweaked recently. I missed the boat on this one and will need to push the
new site once the VOTE is closed.

Vote will be open for at least 72 hours.

[ ] +1, let's release
[ ] +/-0, fine, but consider to fix few issues before...
[ ] -1, nope, because... (and please explain why)

-- 
*Lewis*