Re: Status of BUG ANY23-115

2013-08-26 Thread Lewis John Mcgibbney
Hi,

On Mon, Aug 26, 2013 at 10:51 AM, user-digest-h...@any23.apache.org wrote:


 1. Remove any whitespace characters after the first one.
 2. If there is a tab or a series of tabs, convert it to a single space
 character as well?


I pushed a fix for ANY23-115 yesterday. It stops the MicrodataParser from
breaking on empty spans. The other issue we noticed, however, replacing tabs
and redundant whitespace, would be a further improvement.
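For illustration, the two rules quoted above (collapse any run of whitespace after the first character into one space, and treat tabs the same way) can be expressed with a single regular expression. This is a minimal sketch assuming plain java.util.regex; the class and method names are hypothetical, are not part of the Any23 API, and this is not the actual ANY23-115 patch.

```java
import java.util.regex.Pattern;

public class WhitespaceNormalizer {
    // Matches one or more consecutive whitespace characters
    // (space, tab, newline, carriage return, form feed, vertical tab).
    private static final Pattern WHITESPACE_RUN = Pattern.compile("\\s+");

    /**
     * Hypothetical helper: collapses every run of whitespace into a single
     * space, keeping the first separator between words and dropping the rest.
     */
    public static String collapseWhitespace(String text) {
        if (text == null) {
            return null;
        }
        return WHITESPACE_RUN.matcher(text).replaceAll(" ");
    }

    public static void main(String[] args) {
        String input = "Apache\t\tAny23   microdata \n parser";
        // Collapsing keeps the words separated by exactly one space.
        System.out.println("[" + collapseWhitespace(input) + "]");
        // Deleting every whitespace character instead merges the words.
        System.out.println("[" + input.replaceAll("\\s", "") + "]");
    }
}
```

Running main prints "[Apache Any23 microdata parser]" followed by "[ApacheAny23microdataparser]". Note that in Java, \s matches only ASCII whitespace unless Pattern.UNICODE_CHARACTER_CLASS is set, so non-breaking spaces (U+00A0) would pass through this sketch unchanged.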



 Please also let me know how I can build the embedded server tar file from
 the modified code. Is there a script for that?


You can simply run mvn clean install from the command line. This packages
the server into server/target; you can then grab whichever artifact suits
you and you will be good to go. Please let us know how you get on.



 Also, there are two folders in SVN under trunk, core and service. Which
 folder should I check out to use it as an embedded server for testing?


Don't use SVN for Any23 anymore; we have switched to Git.
You can check out the Git source from https://github.com/apache/any23



 Can you please answer my questions? I can start working on it today.


I apologise for the delay. I receive messages as a batch digest, and as the
list is not so busy these days, the delay can sometimes be days. Sorry.
Thanks for your persistence on this one.
Lewis


Re: Status of BUG ANY23-115

2013-08-23 Thread S.L
Lewis,

It looks like you put in a patch for this bug and then rolled it back. You
also commented on JIRA that you want to replace only the spaces after the
first space and need a regex pattern for that. Is that the only reason you
rolled back the changes for this bug? If not, can you please let me know
what changes need to be made? I'll be more than happy to make them, but I
need some guidance from you on what they are.

Another, unrelated question about the 0.8.0 distribution: there is no
embedded server distributed anymore, or at least not in 0.8.0. Why has it
been removed? It was a handy jar that you could just untar and run out of
the box.

Thanks.


On Tue, Jul 16, 2013 at 12:52 PM, S.L simpleliving...@gmail.com wrote:

 I'm just wondering whether the latest GA candidate for Apache Any23 has bug
 ANY23-115 (https://issues.apache.org/jira/browse/ANY23-115) fixed. This is
 a major flaw, because many pages are not being parsed at all.


 Thanks.



Re: Status of BUG ANY23-115

2013-08-23 Thread Lewis John Mcgibbney
Hi,

On Fri, Aug 23, 2013 at 11:53 AM, user-digest-h...@any23.apache.org wrote:

 Lewis,

 It looks like you put in a patch for this bug and then rolled it back.


Yep, my fix was not the correct one and, IIRC, removed all spaces and tabs.
This was not desired at all and broke tests as well!


 You also commented on JIRA that you want to replace only the spaces after
 the first space and need a regex pattern for that. Is that the only reason
 you rolled back the changes for this bug?


Yes, this is all that should need to be done here.


 If not, can you please let me know what changes need to be made? I'll be
 more than happy to make them, but I need some guidance from you on what
 they are.


If you are able to fork the code and send a pull request, I will make every
effort to review, test and commit it ASAP. I am keen to get this one sorted
out.



 Another, unrelated question about the 0.8.0 distribution: there is no
 embedded server distributed anymore, or at least not in 0.8.0. Why has it
 been removed? It was a handy jar that you could just untar and run out of
 the box.


This was an error on my part as release manager for the 0.8.0 release.
We will certainly be releasing the above artifacts in the next version of
Apache Any23. I agree with what you are saying, and it is a shame that we
did not release the artifact.

Thanks
Lewis


Re: Status of BUG ANY23-115

2013-07-21 Thread Lewis John Mcgibbney
Hi,

On Sat, Jul 20, 2013 at 9:53 PM, user-digest-h...@any23.apache.org wrote:


 Unfortunately, I am not familiar enough with the code to submit a patch.
 Can you please give me a few pointers? Is this a non-trivial task that
 requires a significant amount of development, and is that why it is being
 postponed rather than addressed?


Well, you can check the Any23 code out from
https://git-wip-us.apache.org/repos/asf/any23-committers.git
This will let you play around with it.
IIRC, this particular problem stems from the parsing/extraction of
microdata from (X)HTML pages.
There are actually a number of (suspiciously similar) issues open in the
Jira tracker:
https://issues.apache.org/jira/browse/ANY23-154
https://issues.apache.org/jira/browse/ANY23-111
https://issues.apache.org/jira/browse/ANY23-115
https://issues.apache.org/jira/browse/ANY23-131

The problem is that people come and go, and in most cases, as you can see
from the commentary, the information that would let us reproduce the bugs
is sparse.



 This looks like a critical piece of functionality to me. I am not sure
 what other use cases Any23 addresses if parsing schema.org fails. Can
 you please enlighten me?

I agree with you here. I use Any23 purely for RDFa and RDF/XML stuff. I am
not bothered with Microdata right now.

If you are keen, then I am certainly keen to work on this with you. It
would be nice to clear it up as it is annoying me now.
Thanks