Re: Status of BUG ANY23-115
Hi,

On Mon, Aug 26, 2013 at 10:51 AM, user-digest-h...@any23.apache.org wrote:

> 1. Remove any white space characters after the first one.
> 2. If there is a tab or a series of tabs, convert it to a single white space character as well?

I pushed a fix for the issue in ANY23-115 yesterday. The fix addressed the MicrodataParser breaking on empty spans. The other issue we noticed, regarding replacing tabs and excess white space, would be a further improvement.

> Please also let me know how I can build the embedded server tar file from the modified code. Is there a script for that?

You can simply run mvn clean install from the CLI. This will package the server into server/target; you can then grab whichever artifact suits you and you will be good to go. Please let us know how you get on.

> Also, there are two folders in SVN under trunk, core and service. Which folder should I check out to be able to use it as an embedded server to test out?

Don't use SVN for Any23 anymore. We switched to git. You can check out the git source from here:
https://github.com/apache/any23

> Can you please answer my questions? I can start working on it today.

I apologise for the delay. I get the messages as a batch digest. As the list is not so busy these days, the delay can sometimes be days. Sorry.

Thanks for your persistence on this one.

Lewis
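
For reference, the two whitespace steps proposed in this message (keep only the first whitespace character of a run, and treat a tab or series of tabs the same way) could be sketched roughly as below. This is only an illustration under assumptions, not the fix committed for ANY23-115: the class name WhitespaceCollapser and the method collapse are hypothetical and do not exist in Any23, and the sketch assumes that replacing each run of whitespace with one space character is the desired behaviour.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Any23: collapses whitespace runs in extracted text.
public class WhitespaceCollapser {

    // Matches one or more consecutive whitespace characters (spaces, tabs, line breaks).
    private static final Pattern WHITESPACE_RUN = Pattern.compile("\\s+");

    // Replaces every run of whitespace with a single space character.
    // Leading and trailing whitespace is reduced to one space, not removed.
    public static String collapse(String text) {
        return WHITESPACE_RUN.matcher(text).replaceAll(" ");
    }

    public static void main(String[] args) {
        // Prints: [Apache Any23 parser]
        System.out.println("[" + collapse("Apache \t\t  Any23\tparser") + "]");
    }
}
```

Compiling the pattern once and reusing it avoids recompiling the regex for every property value. Whether leading and trailing whitespace should also be trimmed is a separate decision that this sketch leaves open.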
Re: Status of BUG ANY23-115
Lewis,

It looks like you put in a patch for this bug and rolled it back. You also commented on JIRA that you want to replace only the spaces after the first space and need a regex pattern for that. Is that the only reason you rolled back the changes for this bug? If that's not the case, can you please let me know what changes need to be made? I'll be more than happy to make them. I need some guidance from you on what changes are required.

Another, unrelated question regarding the 0.8.0 distribution: there is no embedded server distributed anymore, or at least not in 0.8.0. Why has this been removed? It was a handy jar that you could just untar and run out of the box.

Thanks.

On Tue, Jul 16, 2013 at 12:52 PM, S.L simpleliving...@gmail.com wrote:

> Just wondering if the latest GA candidate for Apache Any23 has the bug ANY23-115
> https://issues.apache.org/jira/browse/ANY23-115
> fixed in it. This is a major flaw, because many pages are not being parsed at all because of it.
>
> Thanks.
Re: Status of BUG ANY23-115
Hi,

On Fri, Aug 23, 2013 at 11:53 AM, user-digest-h...@any23.apache.org wrote:

> Lewis, Looks like you put in a patch for this bug and rolled it back,

Yep, my fix was not the correct one and, IIRC, removed all spaces and tabs. This was not desired at all and broke tests as well!

> you also commented on JIRA that you want to replace only the spaces after the first space and need a regex pattern for that. Is that the only reason you rolled back the changes for this bug?

Yes, this is all that should need to be done here.

> If that's not the case, can you please let me know what changes need to be made? I'll be more than happy to make them. I need some guidance from you on what changes are required.

If you are able to fork the code and send a pull request, then I will make best efforts to review, test and commit ASAP. I am keen to get this one sorted out.

> Another, unrelated question regarding the 0.8.0 distribution: there is no embedded server distributed anymore, or at least not in 0.8.0. Why has this been removed? It was a handy jar that you could just untar and run out of the box.

This was an error on my part as release manager for the 0.8.0 release. We will certainly be releasing the above artifacts in the next version of Apache Any23. I agree with what you are saying, and it is a shame that we did not release the artifact.

Thanks

Lewis
Re: Status of BUG ANY23-115
Hi,

On Sat, Jul 20, 2013 at 9:53 PM, user-digest-h...@any23.apache.org wrote:

> Unfortunately I do not have the familiarity with the code to submit a patch. Can you please give me a few pointers? Is this a non-trivial task which requires a significant amount of dev, and is that why it is being postponed and not addressed?

Well, you can check the Any23 code out from
https://git-wip-us.apache.org/repos/asf/any23-committers.git
This will let you play around with it. IIRC, this particular problem stems from the parsing/extraction of microdata from (X)HTML pages. There are actually a number of (suspiciously similar) issues open in the Jira tracker:

https://issues.apache.org/jira/browse/ANY23-154
https://issues.apache.org/jira/browse/ANY23-111
https://issues.apache.org/jira/browse/ANY23-115
https://issues.apache.org/jira/browse/ANY23-131

The problem is that people come and go, and in most cases, as you can see from the commentary, the information that would enable us to reproduce the bugs is sparse.

> This looks like a critical piece of functionality to me. I am not sure what other use cases Any23 addresses if parsing schema.org fails. Can you please enlighten me?

I agree with you here. I use Any23 purely for RDFa and RDF/XML stuff. I am not bothered with Microdata right now. If you are keen, then I am certainly keen to work on this with you. It would be nice to clear it up, as it is annoying me now.

Thanks