Re: [Pharo-project] [squeak-dev] Re: HTML parser (again)
That would be really great. As I mentioned before, I have been using the CogVM since its release and it is pretty stable (with the exception of crashes due to this socket problem). Is there a place to report possible bugs related to it, or is this mailing list the most appropriate place?

Cheers,
Doru

On 19 Aug 2010, at 01:26, John M McIntosh wrote:
> I will try to push a CogVM for the Mac this weekend; Eliot and I are planning some time then to get this out the door.
>
> On 2010-08-18, at 2:05 PM, stephane ducasse wrote:
>> No, CogVM is not ready for us.
>
> --
> John M. McIntosh john...@smalltalkconsulting.com
> Twitter: squeaker68882
> Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com

--
www.tudorgirba.com
Speaking louder won't make the point worthier.

___
Pharo-project mailing list
Pharo-project@lists.gforge.inria.fr
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
Re: [Pharo-project] [squeak-dev] Re: HTML parser (again)
On Thu, Aug 19, 2010 at 8:14 AM, Tudor Girba tudor.gi...@gmail.com wrote:
> That would be really great. As I mentioned before, I have been using the CogVM since its release and it is pretty stable (with the exception of crashes due to this socket problem).

The socket problem was fixed by Henrik.

> Is there a place to report possible bugs related to it, or is this mailing list the most appropriate place?

Eliot said vm-...@lists.squeakfoundation.org was better.
Re: [Pharo-project] [squeak-dev] Re: HTML parser (again)
Web page scraping. The XML parser chokes on bad HTML input.

On Wed, Aug 18, 2010 at 2:34 AM, laurent laffont laurent.laff...@gmail.com wrote:
> On Wed, Aug 18, 2010 at 7:50 AM, Andrei Stebakov lisper...@gmail.com wrote:
>> I've been looking for a nice and fast HTML parser. I've found Zulq Alam's Soup (http://www.squeaksource.com/@vHckXt8_6gVtXFxy/XMrjDbIs). It looks nice, but it's way too slow for me (it takes 5 sec to parse the page; my current Lisp parser takes about 1 sec for that).
>> I found another one, Todd Blanchard's HTML and CSS parser (http://www.squeaksource.com/@iMgHmTKVxU00wEdz/A0jkqk71), but I couldn't load it into Pharo 1.1 or Squeak 4.1. It complains about some syntax error and leaves the progress bar, which I can't kill... I wonder if anyone (Todd?) can take a look at the parser and figure out how to fix it?
>> What other options do I have for an HTML parser?
>> Looking at Pharo's speed, I wonder if there is any way to optimize it. Is JIT or some other speed optimization planned for Pharo/Squeak?
>
> What do you need to do?
> There's XMLSupport: http://www.squeaksource.com/XMLSupport.html
> Scamper might have a standalone HTML parser: http://www.squeaksource.com/Scamper.html
> The CogVM has JIT.
>
> Laurent
>
>> Thank you,
>> Andrei
Re: [Pharo-project] [squeak-dev] Re: HTML parser (again)
On 18 August 2010 16:01, Andrei Stebakov lisper...@gmail.com wrote:
> Web page scraping. XML parser chokes on bad html input.

How about using Selenium: http://pharocasts.blogspot.com/2010/08/web-application-testing-through.html
Re: [Pharo-project] [squeak-dev] Re: HTML parser (again)
I tried to load Scamper's Network-HTML and got a syntax error during reloading:

HtmlTokenizer private-initialization initialize:

initialize: s
    text _ s withSqueakLineEndings.
    pos _ Nothing more expected -1.
    textAreaLevel _ 0.
Re: [Pharo-project] [squeak-dev] Re: HTML parser (again)
On Wed, Aug 18, 2010 at 5:55 PM, Andrei Stebakov lisper...@gmail.com wrote:
> I tried to load Scamper's Network-HTML, I got a Syntax Error during reloading:
> HtmlTokenizer private-initialization initialize:
> initialize: s
>     text _ s withSqueakLineEndings.
>     pos _ Nothing more expected -1.
>     textAreaLevel _ 0.

That code uses underscore as assignment, which is no longer allowed in Pharo 1.1 unless you explicitly enable a specific setting. So either enable that setting or update the code (in another image).

Cheers,
Mariano
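For anyone hitting the same syntax error, the difference between the two assignment styles can be sketched as below. This is only an illustrative workspace snippet, not the actual Scamper source; the variable name `count` is made up for the example.

```smalltalk
"Illustrative only: 'count' is a hypothetical variable, not taken from Scamper."

"Old Squeak style: underscore as assignment. Pharo 1.1 reports a syntax
 error for this unless the underscore-assignment setting is enabled:

     count _ 0.
"

"Portable style: := is accepted by both Squeak and Pharo."
| count |
count := 0.
count := count + 1.
count
```

Updating a package by hand then amounts to replacing each assignment underscore with `:=` before loading it into Pharo 1.1.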
Re: [Pharo-project] [squeak-dev] Re: HTML parser (again)
Where can I make this setting?

On Wed, Aug 18, 2010 at 1:14 PM, Mariano Martinez Peck marianop...@gmail.com wrote:
> That code uses underscore as assignment, which is no longer allowed in Pharo 1.1 unless you explicitly enable a specific setting. So either enable that setting or update the code (in another image).
>
> Cheers,
> Mariano
Re: [Pharo-project] [squeak-dev] Re: HTML parser (again)
On Wed, Aug 18, 2010 at 7:34 PM, Andrei Stebakov lisper...@gmail.com wrote:
> Where can I make this setting?

In Pharo, go to System - Settings. In the search field, type "underscore" and hit Enter. You will see a setting that says "allow underscore as assignment".
Re: [Pharo-project] [squeak-dev] Re: HTML parser (again)
Is there a one-click image for CogVM somewhere so I can download it?
Re: [Pharo-project] [squeak-dev] Re: HTML parser (again)
As for Scamper, when I try to evaluate (in Pharo 1.1)

    tok := HtmlTokenizer on: 'html /'.

there is an error:

Error: My subclass should have overridden #contents
Proceed Abandon Debug
HtmlTokenizer(Object)error:
HtmlTokenizer(Object)subclassResponsibility
HtmlTokenizer(Stream)contents
HtmlTokenizer(Stream)printOn:
[] in HtmlTokenizer(Object)printStringLimitedTo:
String class(SequenceableCollection class)streamContents:limitedTo:
HtmlTokenizer(Object)printStringLimitedTo:
HtmlTokenizer(Object)printString
TextMorphForShoutEditor(ParagraphEditor)printIt
[] in TextMorphForShoutEditor(ParagraphEditor)printIt:
TextMorphForShoutEditor(ParagraphEditor)terminateAndInitializeAround:
TextMorphForShoutEditor(ParagraphEditor)printIt:
TextMorphForShoutEditor(ParagraphEditor)dispatchOnKeyEvent:with:
TextMorphForShoutEditor(TextMorphEditor)dispatchOnKeyEvent:with:
TextMorphForShoutEditor(ParagraphEditor)keystroke:
TextMorphForShoutEditor(TextMorphEditor)keystroke:
[] in [] in TextMorphForShout(TextMorph)keyStroke:
TextMorphForShout(TextMorph)handleInteraction:
TextMorphForShout(TextMorphForEditView)handleInteraction:
[] in TextMorphForShout(TextMorph)keyStroke:
Re: [Pharo-project] [squeak-dev] Re: HTML parser (again)
I am sorry, this error only happens when I "print it" instead of "do it".

On Wed, Aug 18, 2010 at 2:10 PM, Andrei Stebakov lisper...@gmail.com wrote:
> As for Scamper when I try to evaluate (in Pharo 1.1)
>     tok := HtmlTokenizer on: 'html /'.
> There is an error:
> Error: My subclass should have overridden #contents
Re: [Pharo-project] [squeak-dev] Re: HTML parser (again)
On Wed, Aug 18, 2010 at 7:48 PM, Andrei Stebakov lisper...@gmail.com wrote:
> Is there a one-click image for CogVM somewhere so I can download it?

It's planned, but for now it seems you have to build it yourself.

Laurent
Re: [Pharo-project] [squeak-dev] Re: HTML parser (again)
I will try to push a CogVM for the Mac this weekend; Eliot and I are planning some time then to get this out the door.

On 2010-08-18, at 2:05 PM, stephane ducasse wrote:
> No, CogVM is not ready for us.

--
John M. McIntosh john...@smalltalkconsulting.com
Twitter: squeaker68882
Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com