Re: [Pharo-users] [ANN] XMLParserHTML moved to GitHub

2020-01-08 Thread Ben Coman
On Wed, 8 Jan 2020 at 06:32, LawsonEnglish  wrote:

> “Simple inspect” works fine.
>
> THe trace is:
>
> UndefinedObject(Object)>>doesNotUnderstand: #new
> Message>>sentTo:
> UndefinedObject(Object)>>doesNotUnderstand: #new
> XMLDocumentHighlightDefaults class(XMLHighlightDefaults
> class)>>textColorForShoutProperty:
> XMLDocumentHighlightDefaults class(XMLHighlightDefaults
> class)>>defaultDefaultColor
> XMLDocumentHighlightDefaults(XMLHighlightDefaults)>>defaultColor
> XMLDocumentHighlighter(XMLHighlighter)>>initializeColorsWithDefaults:
> XMLDocumentHighlighter>>initializeColorsWithDefaults:
> XMLDocumentHighlighter(XMLHighlighter)>>initialize
> XMLDocumentHighlighter class(Behavior)>>new
> XMLHighlightingWriter>>on:
> XMLHighlightingWriter class(XMLWriter class)>>on:
> XMLHighlightingWriter class(XMLWriter class)>>new
> XMLDocument(XMLNode)>>asHighlightedTextWrittenWith:
> XMLDocument(XMLNode)>>treeViewLabelText
> [ :each | each treeViewLabelText ] in
> XMLDocument(XMLNode)>>gtInspectorTreeIn:inContext: in Block: [ :each | each
> treeViewLabelText ]
> BlockClosure>>glamourValueWithArgs:
> BlockClosure(ProtoObject)>>glamourValue:
> GLMTreePresentation(GLMFormatedPresentation)>>formatedDisplayValueOf:
> GLMTreeMorphNodeModel>>displayText
> GLMTreeMorphNodeModel>>elementColumn
> [ :node :cont | node perform: self rowMorphGetSelector ] in
> MorphTreeColumn>>rowMorphGetterBlock in Block: [ :node :cont | node
> perform: self rowMorphGetSele...etc...
> MorphTreeColumn>>rowMorphFor:
> [ :col |
> | v |
> v := col rowMorphFor: complexContents.
> controls add: v.
> col -> v ] in MorphTreeNodeMorph>>buildRowMorph in Block: [ :col | ...
> OrderedCollection>>collect:
> MorphTreeNodeMorph>>buildRowMorph
> MorphTreeNodeMorph>>initRow
> MorphTreeNodeMorph>>initWithContents:prior:forList:indentLevel:
> [ :item :idx |
> priorMorph := self indentingItemClass new
> initWithContents: item
> prior: priorMorph
> forList: self
> indentLevel: newIndent.
> firstAddition ifNil: [ firstAddition := priorMorph ].
> morphList add: priorMorph.
> "Was this row expanded ? if true -> expand it
> again "
> ((item hasEquivalentIn: expandedItems) or: [ priorMorph isExpanded ])
> ifTrue: [ priorMorph isExpanded: true.
> priorMorph
> addChildrenForList: self
> addingTo: morphList
> withExpandedItems: expandedItems ] ] in
> GLMPaginatedMorphTreeMorph(MorphTreeMorph)>>addMorphsTo:from:withExpandedItems:atLevel:
> in Block: [ :item :idx | ...
> OrderedCollection(SequenceableCollection)>>withIndexDo:
>
>
> If it isn’t obvious what is going wrong from the above, I guess the thing
> to do is to reinstall Pharo and go through the steps of installing the
> various packages while recording them. If I get the same error, I’ll post
> the video of what I did on YouTube. If I don’t get an error, then it was
> operator error from the start, obviously.
>

When it works for others but not for you, something is probably
different in your environment.
One part of that environment is your saved Settings, so try the following
experiment.

Assuming you are using Pharo Launcher:
1. Right-click Pharo 7.0 64bit (stable), then > Create Image.

2. Right-click that new image, then > LAUNCH WITHOUT SETTING.

3. In Playground, evaluate the following all together:
Metacello new
	baseline: 'XMLParserHTML';
	repository: 'github://pharo-contributions/XML-XMLParserHTML/src';
	load.
"#asClass looks the class up when this line runs, i.e. after the load above."
(#XMLHTMLParser asClass parseURL:
	'https://ndb.nal.usda.gov/ndb/search/list?sort=ndb=Standard+Reference')
	inspect.

4. Report whether the error still occurs.

=

That said, reviewing the method near the top of your stack:
XMLHighlightDefaults class >> textColorForShoutProperty: aShoutProperty
	self haltOnce.  "opens a debugger the first time this method runs"
	^ TextColor color:
		(((SHTextStylerST80 new attributesFor: aShoutProperty)
			detect: [ :each | each respondsTo: #color ]
			ifNone: [ ^ nil ]) color)

The syntax highlighter indicates that SHTextStylerST80 is an unknown class.
A reference to an undeclared class evaluates to nil, so in that scenario
sending it #new produces exactly your DNU error:
UndefinedObject(Object)>>doesNotUnderstand: #new.
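A quick way to confirm this diagnosis in a Playground, sketched for a stock Pharo 7 image: the first expression asks the system dictionary whether the two styler classes named here are defined, and the second sends #new to nil to show the same MessageNotUnderstood selector as the trace. Nothing in it depends on the XML packages.

```smalltalk
"Select each expression and Print It.
 This one answers an Array of two Booleans: whether SHTextStylerST80
 and SHTextStyler are defined as globals in this image."
{ Smalltalk globals includesKey: #SHTextStylerST80.
  Smalltalk globals includesKey: #SHTextStyler }.

"Sending #new to nil raises MessageNotUnderstood; this answers the
 selector that was not understood, the same #new as in the trace."
[ nil new ]
	on: MessageNotUnderstood
	do: [ :ex | ex message selector ]
```

If the first expression answers false for SHTextStylerST80, the method above is referring to a class that does not exist in your image.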

I notice there is an SHTextStyler class. I'm not sure how it relates
to SHTextStylerST80, but have a go at changing #textColorForShoutProperty:
to use it, then evaluate again:
(#XMLHTMLParser asClass parseURL:
	'https://ndb.nal.usda.gov/ndb/search/list?sort=ndb=Standard+Reference')
	inspect.
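If SHTextStyler turns out not to be a drop-in replacement, another option is to make the method tolerate the missing class. The following is a hypothetical, untested sketch, not the package's code: it looks SHTextStylerST80 up by name with `Smalltalk globals at:ifAbsent:` and answers nil when it is absent, the same value the original `ifNone:` branch already answers. It only removes this particular DNU; whether the highlighter then copes with a nil color is not established in this thread.

```smalltalk
XMLHighlightDefaults class >> textColorForShoutProperty: aShoutProperty
	"Hypothetical guarded variant. Look the Squeak-era styler class up by
	 name instead of referencing it directly, and answer nil (as the
	 original ifNone: branch already does) when it is not in the image."
	| stylerClass colorAttribute |
	stylerClass := Smalltalk globals
		at: #SHTextStylerST80
		ifAbsent: [ ^ nil ].
	colorAttribute := (stylerClass new attributesFor: aShoutProperty)
		detect: [ :each | each respondsTo: #color ]
		ifNone: [ ^ nil ].
	^ TextColor color: colorAttribute color
```

Looking the class up at run time, rather than naming it in the source, keeps the method compiling cleanly in images where the class is missing.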

cheers -ben


Re: [Pharo-users] [ANN] XMLParserHTML moved to GitHub

2020-01-07 Thread LawsonEnglish
> 
> You can copy that from the debugger extra menu (top right).
> 
> Try 'Basic Inspect It' instead of 'Inspect It'; this will use an older, less 
> complex inspector that will probably work.
> 
>> On 7 Jan 2020, at 23:06, LawsonEnglish  wrote:
>> 
>> Well, as you can see in my response elsewhere, none of that actually works 
>> as you describe.
>> 
>>>ingredientsXML := XMLHTMLParser parseURL: 
>>> 'https://ndb.nal.usda.gov/ndb/search/list?sort=ndb=Standard+Reference'.
>> 
>> Doesn’t raise any errors, with or without the local variable declaration.
>> 
>>> ingredientsXML = nil returns false
>> 
>>> ingredientsXML inspect
>> 
>> Raises the message “#new on nil”.
>> 
>> I “do it” on the entire text or on each line in the order entered. It 
>> doesn’t matter.
>> 
>> I’m using a Mac with Mac OS X Catalina, using Pharo 7.
>> 
>> 
>> L
>> 
>> 
>>> On Jan 7, 2020, at 5:31 AM, Torsten Bergmann  wrote:
>>> 
>>> Agree with Peter - but "screw things up" then means the user screws up.
>>> 
>>> Pharo and the Playground work fine with them. But one has to know the 
>>> difference when working with the Playground:
>>> 
>>> 1. If you evaluate with an explicit variable declaration, then the variable 
>>> is freshly defined and used like a temporary variable in a method:
>>> 
>>>| ingredientsXML |
>>>ingredientsXML := XMLHTMLParser parseURL: 
>>> 'https://ndb.nal.usda.gov/ndb/search/list?sort=ndb=Standard+Reference'.
>>>ingredientsXML inspect
>>> 
>>> You have to select the full text and evaluate it (with either "Do It" or 
>>> "Print It") to get the result. 
>>> 
>>> If you select only the "ingredientsXML inspect" part and evaluate it, then 
>>> the variable "ingredientsXML" is unknown, undefined 
>>> and uninitialized, and therefore evaluates to nil.
>>> 
>>> 2. If you do not give an explicit variable declaration on the first line 
>>> in the playground, as for example in
>>> 
>>>ingredientsXML := XMLHTMLParser parseURL: 
>>> 'https://ndb.nal.usda.gov/ndb/search/list?sort=ndb=Standard+Reference'.
>>>ingredientsXML inspect
>>> 
>>>  then a "workspace local variable" is implicitly created by the playground 
>>> as soon as you evaluate, which means:
>>> 
>>>    - "ingredientsXML" is defined as a workspace variable as soon as you 
>>> evaluate
>>>    - the contents of "ingredientsXML" are preserved across different 
>>> evaluations within the workspace / playground
>>>    - you can use "ingredientsXML" only within this playground (not in 
>>> another playground)
>>> 
>>>  So you can evaluate the first line doing the assignment (this initializes 
>>> the workspace variable "ingredientsXML" for the current playground) 
>>>  and when you later want to use it again you can just inspect it or 
>>> evaluate the second line in the same playground.
>>> 
>>>  If you like you can open a second playground which can have its own 
>>> "ingredientsXML" workspace variable.
>>> 
>>> Workspace variables (or "playground variables") are convenient for 
>>> experimenting - as they are preserved - but
>>> yes, they might confuse you when you can't remember what was done with them 
>>> last.
>>> 
>>> Bye
>>> T.
>>> 
>>>> Sent: Tuesday, 07 January 2020 at 09:55
>>>> From: "PBKResearch" 
>>>> To: "'Any question about pharo is welcome'" 
>>>> Subject: Re: [Pharo-users] [ANN] XMLParserHTML moved to GitHub
>>>> 
>>>> It may be a quirk of how Pharo Playground works. It doesn't need local 
>>>> variable declarations - which is convenient - but putting them in can 
>>>> screw things up. Try your snippet again without the first line. Compare 
>>>> Torsten's code.
>>>> 
>>>> HTH
>>>> 
>>>> Peter Kenny
>>>> 
>>>> -Original Message-
>>>> From: Pharo-users  On Behalf Of 
>>>> Torsten Bergmann
>>>> Sent: 07 January 2020 07:47
>>>> To: pharo-users@lists.pharo.org
>>>> Cc: pharo-users@lists.pharo.org
>>>> Subject: Re: [Pharo-users] [ANN] XMLParserHTM

Re: [Pharo-users] [ANN] XMLParserHTML moved to GitHub

2020-01-07 Thread Sven Van Caekenberghe
Well, I am trying to help you ;-)

Did you read everything that I wrote ?

Did you try the 'Basic Inspect It' menu item ?

> On 7 Jan 2020, at 23:19, LawsonEnglish  wrote:
> 
> Thanks for responding. I won’t say that I’m not screwing up as I’ve had 
> severe health problems (impacting both physical and cognitive abilities to 
> the point that I am permanently on US government disability).
> 
> Even so, I did the "Squeak from the very start" 
> [https://www.youtube.com/playlist?list=PL6601A198DF14788D] videos some years 
> ago, and as far as I can tell, I can still understand Smalltalk and its 
> features to that level.
> 
> The “do it and go” yields the same “#new was sent to nil” error.
> 
> Note that “do it” doesn’t give an error; “inspect” does: the “#new was sent 
> to nil” error.
> 
> As before, 
> 
> ingredientsXML = nil.
> 
> returns “false”
> 
> This could be a plugin issue in Mac Catalina as Apple has added all sorts of 
> arcane security features with the new OS.
> 
> 
> Or it just could be my literally crippled brain not seeing something obvious 
> due to the fallout from my health issues.
> 
> I have no way of knowing (obviously)
> 
> 
> Thanks for responding.
> 
> I had hoped to do new videos discussing the neat features of Pharo similar to 
> the “very start” videos, but since I can’t get things started, obviously I 
> can’t make new "from the very start” videos either.
> 
> L
> 
> 
>> On Jan 7, 2020, at 3:04 PM, PBKResearch  wrote:
>> 
>> I agree it makes no sense. I repeated exactly what you describe in a new 
>> playground (in Pharo 6.1 on Windows 10) and all worked as expected – 
>> essentially the same result as Torsten reported in his first post. I wonder 
>> if it might be something Mac related in the operation of Playground.
>>  
>> As a desperate try to explain it, please see what happens if you open a 
>> Playground with just your single line
>> ingredientsXML := XMLHTMLParser parseURL: 
>> 'https://ndb.nal.usda.gov/ndb/search/list?sort=ndb=Standard+Reference'
>> and then select ‘do it and go’. You should find an inspector pane opening to 
>> the right in the Playground, with the result of the parse. If this fails, 
>> the standard suggestion is to open a debugger on your error message and try 
>> to work back through the stack to see how execution got there.
>>  
>> Just to discourage you further, when you do get to read the contents of the 
>> URL, you will find that the USDA have changed everything. All the data are 
>> now on a separate web site, probably in a new layout. This is one of the 
>> perpetual hassles of web scraping – the web site authors have to justify 
>> their existence by rewriting everything. I wrote this section of the 
>> scraping booklet, working up something I had done as a one-off a year or so 
>> earlier, and then I found that the USDA had changed the layout in the 
>> interim and much needed to be rewritten.
>>  
>> HTH – in part at least.
>>  
>> Peter Kenny
>>  
>> To Torsten – I agree I was slipshod in my drafting – I was in a hurry. 
>> Instead of saying ‘can screw things up’ I should have said ‘can produce 
>> counter-intuitive results’, as exemplified by the fact that, in your first 
>> example, ‘ingredientsXML’ can mean different things depending on whether you 
>> execute it all in one go or a line at a time.
>>  
>> From: Pharo-users  On Behalf Of 
>> LawsonEnglish
>> Sent: 07 January 2020 20:55
>> To: Any question about pharo is welcome 
>> Subject: Re: [Pharo-users] [ANN] XMLParserHTML moved to GitHub
>>  
>> I deleted the playground and entered the text thusly
>>  
>> ingredientsXML := XMLHTMLParser parseURL: 
>> 'https://ndb.nal.usda.gov/ndb/search/list?sort=ndb=Standard+Reference'. 
>>  
>> “do it” has no complaints
>>  
>> ingredientsXML = nil 
>>  
>> yields “false"
>>  
>> ingredientsXML inspect
>>  
>> has errors: #new sent to nil
>>  
>>  
>>  
>> This makes no sense at all.
>>  
>>  
>> L
>>  
>> 
>> 
>>> On Jan 7, 2020, at 1:55 AM, PBKResearch  wrote:
>>>  
>>> It may be a quirk of how Pharo Playground works. It doesn't need local 
>>> variable declarations - which is convenient - but putting them in can screw 
>>> things up. Try your snippet again without the first line. Compare Torsten's 
>>> code.
>>> 
>>> HTH
>>> 
>>> Peter Kenny
>>> 
>>> -Original Message-
>>> From: Pharo-users  On Beha

Re: [Pharo-users] [ANN] XMLParserHTML moved to GitHub

2020-01-07 Thread LawsonEnglish
Thanks for responding. I won’t say that I’m not screwing up as I’ve had severe 
health problems (impacting both physical and cognitive abilities to the point 
that I am permanently on US government disability).

Even so, I did the "Squeak from the very start" 
[https://www.youtube.com/playlist?list=PL6601A198DF14788D] videos some years 
ago, and as far as I can tell, I can still understand Smalltalk and its 
features to that level.

The “do it and go” yields the same “#new was sent to nil” error.

Note that “do it” doesn’t give an error; “inspect” does: the “#new was sent to 
nil” error.

As before, 

ingredientsXML = nil.

returns “false”

This could be a plugin issue in Mac Catalina as Apple has added all sorts of 
arcane security features with the new OS.


Or it just could be my literally crippled brain not seeing something obvious 
due to the fallout from my health issues.

I have no way of knowing (obviously)


Thanks for responding.

I had hoped to do new videos discussing the neat features of Pharo similar to 
the “very start” videos, but since I can’t get things started, obviously I 
can’t make new "from the very start” videos either.

L


> On Jan 7, 2020, at 3:04 PM, PBKResearch  wrote:
> 
> I agree it makes no sense. I repeated exactly what you describe in a new 
> playground (in Pharo 6.1 on Windows 10) and all worked as expected – 
> essentially the same result as Torsten reported in his first post. I wonder 
> if it might be something Mac related in the operation of Playground.
>  
> As a desperate try to explain it, please see what happens if you open a 
> Playground with just your single line
> ingredientsXML := XMLHTMLParser parseURL: 
> 'https://ndb.nal.usda.gov/ndb/search/list?sort=ndb=Standard+Reference'
> and then select ‘do it and go’. You should find an inspector pane opening to 
> the right in the Playground, with the result of the parse. If this fails, the 
> standard suggestion is to open a debugger on your error message and try to 
> work back through the stack to see how execution got there.
>  
> Just to discourage you further, when you do get to read the contents of the 
> URL, you will find that the USDA have changed everything. All the data are 
> now on a separate web site, probably in a new layout. This is one of the 
> perpetual hassles of web scraping – the web site authors have to justify 
> their existence by rewriting everything. I wrote this section of the scraping 
> booklet, working up something I had done as a one-off a year or so earlier, 
> and then I found that the USDA had changed the layout in the interim and much 
> needed to be rewritten.
>  
> HTH – in part at least.
>  
> Peter Kenny
>  
> To Torsten – I agree I was slipshod in my drafting – I was in a hurry. 
> Instead of saying ‘can screw things up’ I should have said ‘can produce 
> counter-intuitive results’, as exemplified by the fact that, in your first 
> example, ‘ingredientsXML’ can mean different things depending on whether you 
> execute it all in one go or a line at a time.
>  
> From: Pharo-users  On Behalf Of 
> LawsonEnglish
> Sent: 07 January 2020 20:55
> To: Any question about pharo is welcome 
> Subject: Re: [Pharo-users] [ANN] XMLParserHTML moved to GitHub
>  
> I deleted the playground and entered the text thusly
>  
> ingredientsXML := XMLHTMLParser parseURL: 
> 'https://ndb.nal.usda.gov/ndb/search/list?sort=ndb=Standard+Reference'.
>  
>  
> “do it” has no complaints
>  
> ingredientsXML = nil 
>  
> yields “false"
>  
> ingredientsXML inspect
>  
> has errors: #new sent to nil
>  
>  
>  
> This makes no sense at all.
>  
>  
> L
>  
> 
> 
>> On Jan 7, 2020, at 1:55 AM, PBKResearch <pe...@pbkresearch.co.uk> wrote:
>>  
>> It may be a quirk of how Pharo Playground works. It doesn't need local 
>> variable declarations - which is convenient - but putting them in can screw 
>> things up. Try your snippet again without the first line. Compare Torsten's 
>> code.
>> 
>> HTH
>> 
>> Peter Kenny
>> 
>> -Original Message-
>> From: Pharo-users <pharo-users-boun...@lists.pharo.org> On Behalf Of Torsten Bergmann
>> Sent: 07 January 2020 07:47
>> To: pharo-users@lists.pharo.org
>> Cc: pharo-users@lists.pharo.org
>> Subject: Re: [Pharo-users] [ANN] XMLParserHTML moved to GitHub
>> 
>> Works without

Re: [Pharo-users] [ANN] XMLParserHTML moved to GitHub

2020-01-07 Thread Sven Van Caekenberghe
Something is wrong in your image. 
The XML package adds special GT inspector views, the error is probably there.
This has nothing to do with the platform.
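To check that the failure lies in the GT view rather than in the parse, one option is to call the frame from the stack trace directly. This sketch assumes ingredientsXML already holds the parsed document as a Playground variable; XMLNode>>treeViewLabelText is taken from the trace quoted at the top of this thread, where the GT tree presentation calls it.

```smalltalk
"Assumes ingredientsXML was assigned earlier in this Playground.
 Answers the label text if the GT label code works, otherwise a
 description of the MessageNotUnderstood it raised."
[ ingredientsXML treeViewLabelText ]
	on: MessageNotUnderstood
	do: [ :ex |
		ex receiver printString , ' does not understand #'
			, ex message selector ]
```

If this answers 'nil does not understand #new', the parse itself succeeded and the problem is confined to the highlighting code the inspector uses.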

BTW, a stack trace would be much appreciated, like:

ZeroDivide when doing: 1/0

SmallInteger>>/
UndefinedObject>>DoIt
OpalCompiler>>evaluate
RubSmalltalkEditor>>evaluate:andDo:
RubSmalltalkEditor>>highlightEvaluateAndDo:
[ textMorph textArea editor highlightEvaluateAndDo: ann action.
textMorph shoutStyler style: textMorph text ] in [ textMorph textArea
handleEdit: [ textMorph textArea editor highlightEvaluateAndDo: ann 
action.
textMorph shoutStyler style: textMorph text ] ] in 
GLMMorphicPharoScriptRenderer(GLMMorphicPharoCodeRenderer)>>actOnHighlightAndEvaluate:
 in Block: [ textMorph textArea editor highlightEvaluateAndDo...etc...
RubEditingArea(RubAbstractTextArea)>>handleEdit:
[ textMorph textArea
handleEdit: [ textMorph textArea editor highlightEvaluateAndDo: ann 
action.
textMorph shoutStyler style: textMorph text ] ] in 
GLMMorphicPharoScriptRenderer(GLMMorphicPharoCodeRenderer)>>actOnHighlightAndEvaluate:
 in Block: [ textMorph textArea...
WorldState>>runStepMethodsIn:
WorldMorph>>runStepMethods
WorldState>>doOneCycleNowFor:
WorldState>>doOneCycleFor:
WorldMorph>>doOneCycle
WorldMorph class>>doOneCycle
[ [ WorldMorph doOneCycle.
Processor yield.
false ] whileFalse: [  ] ] in MorphicUIManager>>spawnNewProcess in Block: [ [ 
WorldMorph doOneCycle
[ self value.
Processor terminateActive ] in BlockClosure>>newProcess in Block: [ self 
value

You can copy that from the debugger extra menu (top right).

Try 'Basic Inspect It' instead of 'Inspect It'; this will use an older, less 
complex inspector that will probably work.
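The Playground equivalent, as a sketch: it assumes ingredientsXML already holds the parsed document, and that Object>>basicInspect, assumed here to be what the 'Basic Inspect It' menu item sends in Pharo 7, is available in your image.

```smalltalk
"Assumes ingredientsXML was assigned earlier in this Playground.
 Select each line and evaluate it on its own."
ingredientsXML isNil.          "Print It: false means the parse answered an object"
ingredientsXML class.          "Print It: XMLDocument, going by the stack trace"
ingredientsXML basicInspect.   "Do It: opens the basic inspector instead of the GT one"
```

If the basic inspector opens while the normal one fails, that points at the GT extension views rather than at the parser or the platform.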

> On 7 Jan 2020, at 23:06, LawsonEnglish  wrote:
> 
> Well, as you can see in my response elsewhere, none of that actually works as 
> you describe.
> 
>> ingredientsXML := XMLHTMLParser parseURL: 
>> 'https://ndb.nal.usda.gov/ndb/search/list?sort=ndb=Standard+Reference'.
> 
> Doesn’t raise any errors, with or without the local variable declaration.
> 
>> ingredientsXML = nil returns false
> 
>> ingredientsXML inspect
> 
> Raises the message “#new on nil”.
> 
> I “do it” on the entire text or on each line in the order entered. It doesn’t 
> matter.
> 
> I’m using a Mac with Mac OS X Catalina, using Pharo 7.
> 
> 
> L
> 
> 
>> On Jan 7, 2020, at 5:31 AM, Torsten Bergmann  wrote:
>> 
>> Agree with Peter - but "screw things up" then means the user screws up.
>> 
>> Pharo and the Playground work fine with them. But one has to know the 
>> difference when working with the Playground:
>> 
>> 1. If you evaluate with an explicit variable declaration, then the variable 
>> is freshly defined and used like a temporary variable in a method:
>> 
>> | ingredientsXML |
>> ingredientsXML := XMLHTMLParser parseURL: 
>> 'https://ndb.nal.usda.gov/ndb/search/list?sort=ndb=Standard+Reference'.
>> ingredientsXML inspect
>> 
>>  You have to select the full text and evaluate it (with either "Do It" or 
>> "Print It") to get the result. 
>> 
>>  If you select only the "ingredientsXML inspect" part and evaluate it, then 
>> the variable "ingredientsXML" is unknown, undefined 
>>  and uninitialized, and therefore evaluates to nil.
>> 
>> 2. If you do not give an explicit variable declaration on the first line 
>> in the playground, as for example in
>> 
>> ingredientsXML := XMLHTMLParser parseURL: 
>> 'https://ndb.nal.usda.gov/ndb/search/list?sort=ndb=Standard+Reference'.
>> ingredientsXML inspect
>> 
>>   then a "workspace local variable" is implicitly created by the playground 
>> as soon as you evaluate, which means:
>> 
>> - "ingredientsXML" is defined as a workspace variable as soon as you 
>> evaluate
>> - the contents of "ingredientsXML" are preserved across different 
>> evaluations within the workspace / playground
>> - you can use "ingredientsXML" only within this playground (not in 
>> another playground)
>> 
>>   So you can evaluate the first line doing the assignment (this initializes 
>> the workspace variable "ingredientsXML" for the current playground) 
>>   and when you later want to use it again you can just inspect it or 
>> evaluate the second line in the same playground.
>> 
>>   If you like you can open a second playground which can have its own 
>> "ing

Re: [Pharo-users] [ANN] XMLParserHTML moved to GitHub

2020-01-07 Thread LawsonEnglish
Well, as you can see in my response elsewhere, none of that actually works as 
you describe.

>  ingredientsXML := XMLHTMLParser parseURL: 
> 'https://ndb.nal.usda.gov/ndb/search/list?sort=ndb=Standard+Reference'.

Doesn’t raise any errors, with or without the local variable declaration.

> ingredientsXML = nil returns false

> ingredientsXML inspect

Raises the message “#new on nil”.

I “do it” on the entire text or on each line in the order entered. It doesn’t 
matter.

I’m using a Mac with Mac OS X Catalina, using Pharo 7.


L


> On Jan 7, 2020, at 5:31 AM, Torsten Bergmann  wrote:
> 
> Agree with Peter - but "screw things up" then means the user screws up.
> 
> Pharo and the Playground work fine with them. But one has to know the 
> difference when working with the Playground:
> 
> 1. If you evaluate with an explicit variable declaration, then the variable is 
> freshly defined and used like a temporary variable in a method:
> 
>  | ingredientsXML |
>  ingredientsXML := XMLHTMLParser parseURL: 
> 'https://ndb.nal.usda.gov/ndb/search/list?sort=ndb=Standard+Reference'.
>  ingredientsXML inspect
> 
>   You have to select the full text and evaluate it (with either "Do It" or 
> "Print It") to get the result. 
> 
>   If you select only the "ingredientsXML inspect" part and evaluate it, then 
> the variable "ingredientsXML" is unknown, undefined 
>   and uninitialized, and therefore evaluates to nil.
> 
> 2. If you do not give an explicit variable declaration on the first line in 
> the playground, as for example in
> 
>  ingredientsXML := XMLHTMLParser parseURL: 
> 'https://ndb.nal.usda.gov/ndb/search/list?sort=ndb=Standard+Reference'.
>  ingredientsXML inspect
> 
>    then a "workspace local variable" is implicitly created by the playground 
> as soon as you evaluate, which means:
> 
>  - "ingredientsXML" is defined as a workspace variable as soon as you 
> evaluate
>  - the contents of "ingredientsXML" are preserved across different 
> evaluations within the workspace / playground
>  - you can use "ingredientsXML" only within this playground (not in 
> another playground)
> 
>So you can evaluate the first line doing the assignment (this initializes 
> the workspace variable "ingredientsXML" for the current playground) 
>and when you later want to use it again you can just inspect it or 
> evaluate the second line in the same playground.
> 
>If you like you can open a second playground which can have its own 
> "ingredientsXML" workspace variable.
> 
> Workspace variables (or "playground variables") are convenient for 
> experimenting - as they are preserved - but
> yes, they might confuse you when you can't remember what was done with them 
> last.
> 
> Bye
> T.
> 
>> Sent: Tuesday, 07 January 2020 at 09:55
>> From: "PBKResearch" 
>> To: "'Any question about pharo is welcome'" 
>> Subject: Re: [Pharo-users] [ANN] XMLParserHTML moved to GitHub
>> 
>> It may be a quirk of how Pharo Playground works. It doesn't need local 
>> variable declarations - which is convenient - but putting them in can screw 
>> things up. Try your snippet again without the first line. Compare Torsten's 
>> code.
>> 
>> HTH
>> 
>> Peter Kenny
>> 
>> -Original Message-
>> From: Pharo-users  On Behalf Of Torsten 
>> Bergmann
>> Sent: 07 January 2020 07:47
>> To: pharo-users@lists.pharo.org
>> Cc: pharo-users@lists.pharo.org
>> Subject: Re: [Pharo-users] [ANN] XMLParserHTML moved to GitHub
>> 
>> Works without a problem (Pharo 8 on Windows), see attached. So it looks like 
>> a local problem.
>> 
>> Just check the debugger and compare with the Squeak version where you run into 
>> trouble.
>> Maybe the document could not be retrieved on your machine.
>> 
>> Bye
>> T.
>> 
>>> Sent: Tuesday, 07 January 2020 at 04:42
>>> From: "LawsonEnglish" 
>>> To: pharo-users@lists.pharo.org
>>> Subject: Re: [Pharo-users] [ANN] XMLParserHTML moved to GitHub
>>> 
>>> Torsten Bergmann wrote
>>>> Hi,
>>>> 
>>>> 
>>>> You can load using
>>>> 
>>>>   Metacello new
>>>>baseline: 'XMLParserHTML';
>>>>repository: 'github://pharo-contributions/XML-XMLParserHTML/src';
>>>>load.
>>>> 
>>>> 
>>>> Bye
>>>> T.
>>> 
>>> Hi,

Re: [Pharo-users] [ANN] XMLParserHTML moved to GitHub

2020-01-07 Thread PBKResearch
I agree it makes no sense. I repeated exactly what you describe in a new 
playground (in Pharo 6.1 on Windows 10) and all worked as expected – 
essentially the same result as Torsten reported in his first post. I wonder if 
it might be something Mac related in the operation of Playground.

 

As a desperate try to explain it, please see what happens if you open a 
Playground with just your single line

ingredientsXML := XMLHTMLParser parseURL: 'https://ndb.nal.usda.gov/ndb/search/list?sort=ndb=Standard+Reference'

and then select ‘do it and go’. You should find an inspector pane opening to 
the right in the Playground, with the result of the parse. If this fails, the 
standard suggestion is to open a debugger on your error message and try to work 
back through the stack to see how execution got there.

 

Just to discourage you further, when you do get to read the contents of the 
URL, you will find that the USDA have changed everything. All the data are now 
on a separate web site, probably in a new layout. This is one of the perpetual 
hassles of web scraping – the web site authors have to justify their existence 
by rewriting everything. I wrote this section of the scraping booklet, working 
up something I had done as a one-off a year or so earlier, and then I found 
that the USDA had changed the layout in the interim and much needed to be 
rewritten.

 

HTH – in part at least.

 

Peter Kenny

 

To Torsten – I agree I was slipshod in my drafting – I was in a hurry. Instead 
of saying ‘can screw things up’ I should have said ‘can produce 
counter-intuitive results’, as exemplified by the fact that, in your first 
example, ‘ingredientsXML’ can mean different things depending on whether you 
execute it all in one go or a line at a time.

 

From: Pharo-users  On Behalf Of 
LawsonEnglish
Sent: 07 January 2020 20:55
To: Any question about pharo is welcome 
Subject: Re: [Pharo-users] [ANN] XMLParserHTML moved to GitHub

 

I deleted the playground and entered the text thusly

 

ingredientsXML := XMLHTMLParser parseURL: 
'https://ndb.nal.usda.gov/ndb/search/list?sort=ndb=Standard+Reference'. 

 

“do it” has no complaints

 

ingredientsXML = nil 

 

yields “false"

 

ingredientsXML inspect

 

has errors: #new sent to nil

 

 


 

This makes no sense at all.

 

 

L

 





On Jan 7, 2020, at 1:55 AM, PBKResearch <pe...@pbkresearch.co.uk> wrote:

 

It may be a quirk of how Pharo Playground works. It doesn't need local variable 
declarations - which is convenient - but putting them in can screw things up. 
Try your snippet again without the first line. Compare Torsten's code.

HTH

Peter Kenny

-Original Message-
From: Pharo-users <pharo-users-boun...@lists.pharo.org> On Behalf Of Torsten Bergmann
Sent: 07 January 2020 07:47
To: pharo-users@lists.pharo.org
Cc: pharo-users@lists.pharo.org
Subject: Re: [Pharo-users] [ANN] XMLParserHTML moved to GitHub

Works without a problem (Pharo 8 on Windows), see attached. So it looks like a 
local problem.

Just check the debugger and compare with the Squeak version where you run into 
trouble.
Maybe the document could not be retrieved on your machine.
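To test this point directly in Pharo, a rough counterpart of the Squeak HtmlParser check below is to fetch the page with Zinc's HTTP client, which ships with Pharo, and skip the XML parser entirely. This is a sketch: by default ZnClient>>get: answers the body even for an error status, so it sets enforceHttpSuccess: true to signal an error instead. Given that the USDA data has since moved, the page may not be the expected listing even when the request succeeds.

```smalltalk
"Fetch the raw HTML with Zinc, bypassing XMLHTMLParser.
 enforceHttpSuccess: true makes ZnClient signal an error for a
 non-success status instead of quietly answering the error page."
| client html |
client := ZnClient new.
client enforceHttpSuccess: true.
html := [ client get: 'https://ndb.nal.usda.gov/ndb/search/list?sort=ndb=Standard+Reference' ]
	ensure: [ client close ].
html size
```

A non-zero size means Pharo itself can reach the site, which rules out the macOS network-permission theory.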

Bye
T.




Sent: Tuesday, 07 January 2020 at 04:42
From: "LawsonEnglish" <lengli...@cox.net>
To: pharo-users@lists.pharo.org
Subject: Re: [Pharo-users] [ANN] XMLParserHTML moved to GitHub

Torsten Bergmann wrote



Hi,


You can load using

  Metacello new
   baseline: 'XMLParserHTML';
   repository: 'github://pharo-contributions/XML-XMLParserHTML/src';
   load.


Bye
T.


Hi,

I'm trying to use the sample code in the Pharo screen-scraping booklet 
(http://books.pharo.org/booklet-Scraping/pdf/2018-09-02-scrapingbook.pdf), but 
while everything appears to load, I'm getting odd behavior from:

| ingredientsXML |
ingredientsXML := XMLHTMLParser parseURL:
'https://ndb.nal.usda.gov/ndb/search/list?sort=ndb=Standard+Reference'.
ingredientsXML inspect

"#new was sent to nil"

No matter what URL I use, I get the same message.

I'm using Mac OS Catalina so I thought I might have some strange Mac 
OS security issue (like it was quietly refusing to allow Pharo to 
access the internet), but I tested with squeak and the old

html := (HtmlParser parse:
'https://ndb.nal.usda.gov/ndb/search/list?sort=ndb=Standard+Reference'
asUrl retrieveContents content)

and that returns actual html without any problems.


Suggestions?


Thanks.

L




--
Sent from: http://forum.world.st/Pharo-Smalltalk-Users-f1310670.html



 

 



Re: [Pharo-users] [ANN] XMLParserHTML moved to GitHub

2020-01-07 Thread LawsonEnglish
I deleted the playground and entered the text thusly

ingredientsXML := XMLHTMLParser parseURL: 
'https://ndb.nal.usda.gov/ndb/search/list?sort=ndb=Standard+Reference’. 

“do it” has no complaints

ingredientsXML = nil 

yields “false"

ingredientsXML inspect

has errors: #new sent to nil



This makes no sense at all.


L


> On Jan 7, 2020, at 1:55 AM, PBKResearch  wrote:
> 
> It may be a quirk of how Pharo Playground works. It doesn't need local 
> variable declarations - which is convenient - but putting them in can screw 
> things up. Try your snippet again without the first line. Compare Torsten's 
> code.
> 
> HTH
> 
> Peter Kenny



Re: [Pharo-users] [ANN] XMLParserHTML moved to GitHub

2020-01-07 Thread Torsten Bergmann
I agree with Peter, but "screw things up" then means that the user screws up.

Pharo and the Playground are working fine here. But one has to know the 
difference when working with the Playground:

 1. If you evaluate with an explicit variable declaration, then the variable is 
freshly defined and used like a temporary variable in a method:

  | ingredientsXML |
  ingredientsXML := XMLHTMLParser parseURL: 
    'https://ndb.nal.usda.gov/ndb/search/list?sort=ndb=Standard+Reference'.
  ingredientsXML inspect

   You have to select the full text and evaluate it (with either "Do it" or 
"Print it") to get the result. 

   If you first select and evaluate only the "ingredientsXML inspect" part, then 
the variable "ingredientsXML" is not known: it is undefined 
   and uninitialized and therefore results in nil.

 2. If you do not give an explicit variable declaration on the first line in 
the Playground, as for example in

  ingredientsXML := XMLHTMLParser parseURL: 
    'https://ndb.nal.usda.gov/ndb/search/list?sort=ndb=Standard+Reference'.
  ingredientsXML inspect

then a "workspace local variable" is implicitly created by the Playground 
as soon as you evaluate, which means that

  - "ingredientsXML" is defined as a workspace variable as soon as you 
evaluate
  - the contents of "ingredientsXML" are preserved across different 
evaluations within the workspace / Playground
  - you can use "ingredientsXML" only within this Playground (not in 
another Playground)

So you can evaluate the first line doing the assignment (this initializes 
the workspace variable "ingredientsXML" for the current Playground), 
and when you later want to use it again, you can just inspect it or evaluate 
the second line in the same Playground.

If you like, you can open a second Playground, which can have its own 
"ingredientsXML" workspace variable.

Workspace variables (or "Playground variables") are convenient for 
experimenting, as they are preserved, but
yes, they might confuse you when you can't remember what was done with them last.

Bye
T.

> Gesendet: Dienstag, 07. Januar 2020 um 09:55 Uhr
> Von: "PBKResearch" 
> An: "'Any question about pharo is welcome'" 
> Betreff: Re: [Pharo-users] [ANN] XMLParserHTML moved to GitHub
>
> It may be a quirk of how Pharo Playground works. It doesn't need local 
> variable declarations - which is convenient - but putting them in can screw 
> things up. Try your snippet again without the first line. Compare Torsten's 
> code.
> 
> HTH
> 
> Peter Kenny



Re: [Pharo-users] [ANN] XMLParserHTML moved to GitHub

2020-01-07 Thread PBKResearch
It may be a quirk of how Pharo Playground works. It doesn't need local variable 
declarations - which is convenient - but putting them in can screw things up. 
Try your snippet again without the first line. Compare Torsten's code.

HTH

Peter Kenny

-----Original Message-----
From: Pharo-users On Behalf Of Torsten Bergmann
Sent: 07 January 2020 07:47
To: pharo-users@lists.pharo.org
Cc: pharo-users@lists.pharo.org
Subject: Re: [Pharo-users] [ANN] XMLParserHTML moved to GitHub

Works without a problem (Pharo 8 on Windows), see attached. So it looks like a 
local problem.

Just check the debugger where you run into trouble and compare with the
Squeak version.
Maybe the document could not be retrieved on your machine.

Bye
T.





Re: [Pharo-users] [ANN] XMLParserHTML moved to GitHub

2020-01-06 Thread LawsonEnglish
Torsten Bergmann wrote
> Hi,
> 
> 
> You can load using
> 
>   Metacello new
>     baseline: 'XMLParserHTML';
>     repository: 'github://pharo-contributions/XML-XMLParserHTML/src';
>     load.
> 
> 
> Bye
> T.

Hi, 

I'm trying to use the sample code in the Pharo screen-scraping booklet
(http://books.pharo.org/booklet-Scraping/pdf/2018-09-02-scrapingbook.pdf), 
but while everything appears to load, I'm getting odd behavior from:

| ingredientsXML |
ingredientsXML := XMLHTMLParser parseURL:
'https://ndb.nal.usda.gov/ndb/search/list?sort=ndb=Standard+Reference'.
ingredientsXML inspect

"#new was sent to nil"

No matter what URL I use, I get the same message.

I'm using macOS Catalina, so I thought I might have some strange macOS
security issue (like it quietly refusing to allow Pharo to access the
internet), but I tested in Squeak with the old 

html := (HtmlParser parse:
'https://ndb.nal.usda.gov/ndb/search/list?sort=ndb=Standard+Reference'
asUrl retrieveContents content)

and that returns actual html without any problems.


Suggestions?


Thanks.

L




--
Sent from: http://forum.world.st/Pharo-Smalltalk-Users-f1310670.html
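A Pharo counterpart to the Squeak check above may help separate a retrieval problem from a parsing problem: fetch the page with Zinc first, then hand the resulting string to the parser. A minimal sketch, assuming Pharo's bundled `ZnClient` and the class-side `parse:` that `XMLHTMLParser` inherits from the XMLParser package; evaluate it with "Print it" to see the parsed document (or nil if the request failed).

```smalltalk
"Sketch: fetch the page with Zinc first and only then parse it,
 so a retrieval failure and a parse failure show up separately."
| client html doc |
client := ZnClient new.
[ client url: 'https://ndb.nal.usda.gov/ndb/search/list?sort=ndb=Standard+Reference'.
  html := client get.
  client isSuccess
	ifTrue: [
		Transcript show: 'Fetched ', html size printString, ' characters'; cr.
		"Parse the string we already have; no network access happens here."
		doc := XMLHTMLParser parse: html ]
	ifFalse: [
		Transcript show: 'HTTP request failed with code ', client response code printString; cr ] ]
	ensure: [ client close ].
doc
```

If the fetch succeeds but parsing fails, the problem is in the parser setup rather than in network access from the image.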



Re: [Pharo-users] [ANN] XMLParserHTML moved to GitHub

2019-11-30 Thread PBKResearch
Sean

I used Soup a few times, but found it difficult to interpret the output,
because the parse did not seem to reflect the hierarchy of the nodes in the
original; in particular, sibling nodes were not necessarily at the same
level in the Soup. XMLHTMLParser always gets the structure right, in my
experience. I think this is essential if you want to use XPath to process
the parse. The worked examples in the scraping booklet show how the parser
and XPath can work together.

HTH

Peter Kenny

-----Original Message-----
From: Pharo-users On Behalf Of Sean P. DeNigris
Sent: 30 November 2019 16:43
To: pharo-users@lists.pharo.org
Subject: Re: [Pharo-users] [ANN] XMLParserHTML moved to GitHub

cedreek wrote
> To me, far better than using Soup. 

Ah, interesting! I use Soup almost exclusively. What did you find superior
about XMLParserHTML? I may give it a try...


cedreek wrote
> Google chrome pharo integration helps to scrape complex full JS web 
> sites like google ;)

Also interesting! Any publicly available examples? How does one load "Google
chrome pharo integration"? Also, there is often the "poor man's" way (albeit
requiring manual intervention) by inspecting the Ajax http requests in a
developer console and then recreating directly in Pharo.



-
Cheers,
Sean
--
Sent from: http://forum.world.st/Pharo-Smalltalk-Users-f1310670.html




Re: [Pharo-users] [ANN] XMLParserHTML moved to GitHub

2019-11-30 Thread Cédrick Béler
I couldn't get it with Zn because (I think) some JS libraries defer the 
full rendering.

I have the same problem with a site in France (leboncoin). They use 
https://datadome.co to make web scraping harder. So a 
headless browser is the only solution I know. 

Cheers,

Cédrick

> On 1 Dec 2019, at 00:23, Esteban Maringolo wrote:
> 
> Why use Chrome instead of ZnClient? To get a "real" render of the
> content? (including JS and whatnot).
> 
> Regards!
> 
> 
> Esteban A. Maringolo
> 
> 



Re: [Pharo-users] [ANN] XMLParserHTML moved to GitHub

2019-11-30 Thread Esteban Maringolo
Why use Chrome instead of ZnClient? To get a "real" render of the
content? (including JS and whatnot).

Regards!


Esteban A. Maringolo

On Sat, Nov 30, 2019 at 8:11 PM Cédrick Béler  wrote:
>
>
> >
> > Also interesting! Any publicly available examples? How does one load "Google
> > chrome pharo integration"?
>
> https://github.com/astares/Pharo-Chrome
> https://github.com/akgrant43/Pharo-Chrome
>
> Cheers,
>
> Cédrick
>
>



Re: [Pharo-users] [ANN] XMLParserHTML moved to GitHub

2019-11-30 Thread Cédrick Béler


> 
> Also interesting! Any publicly available examples? How does one load "Google
> chrome pharo integration"?

https://github.com/astares/Pharo-Chrome
https://github.com/akgrant43/Pharo-Chrome

Cheers,

Cédrick




Re: [Pharo-users] [ANN] XMLParserHTML moved to GitHub

2019-11-30 Thread Cédrick Béler

> cedreek wrote
>> To me, far better than using Soup. 
> 
> Ah, interesting! I use Soup almost exclusively. What did you find superior
> about XMLParserHTML? I may give it a try...
> 

It's mainly XPath, which I find easier than navigating the HTML tree with Soup 
or even with XMLHTMLParser alone. 

I usually copy the XPath from a web inspector. I have to tweak it a bit, though.
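A minimal sketch of the XMLHTMLParser + XPath combination described here. It assumes the XPath extension for Pharo's XMLParser is loaded alongside XMLParserHTML and provides `xpath:` on parsed nodes; the URL and the expression are placeholders, and in practice the expression might be one copied from a browser's web inspector and then adjusted.

```smalltalk
"Sketch: parse a page, then let an XPath expression do the navigation.
 Assumes the XPath extension for XMLParser is loaded; the URL and the
 expression are placeholders."
| doc links |
doc := XMLHTMLParser parseURL: 'https://example.com/'.
"Select every a element that carries an href attribute."
links := doc xpath: '//a[@href]'.
links do: [ :each |
	Transcript show: (each attributeAt: 'href'); cr ]
```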

> 
> cedreek wrote
>> Google chrome pharo integration helps to scrape complex full JS web
>> sites like google ;)
> 
> Also interesting! Any publicly available examples? How does one load "Google
> chrome pharo integration"? Also, there is often the "poor man's" way (albeit
> requiring manual intervention) by inspecting the Ajax http requests in a
> developer console and then recreating directly in Pharo.
> 

I just tried it once. 

There is a Google Chrome plugin that lets you use headless Chrome to get the 
fully loaded HTML page. 

I need to try it again. A simple example I'd like to do is to scrape Google and 
remove advertised content ^^

This is Torsten's package, by the way:

https://github.com/astares/Pharo-Chrome

Happy scraping ;-)

And thx to Torsten for everything ^^

Cedrick 

> 
> 
> -
> Cheers,
> Sean
> --
> Sent from: http://forum.world.st/Pharo-Smalltalk-Users-f1310670.html
> 


Re: [Pharo-users] [ANN] XMLParserHTML moved to GitHub

2019-11-30 Thread Sean P. DeNigris
cedreek wrote
> To me, far better than using Soup. 

Ah, interesting! I use Soup almost exclusively. What did you find superior
about XMLParserHTML? I may give it a try...


cedreek wrote
> Google chrome pharo integration helps to scrape complex full JS web
> sites like google ;)

Also interesting! Any publicly available examples? How does one load "Google
chrome pharo integration"? Also, there is often the "poor man's" way (albeit
requiring manual intervention) by inspecting the Ajax http requests in a
developer console and then recreating directly in Pharo.
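The "poor man's" approach can be sketched with Zinc: rebuild the XHR request seen in the browser's developer console using `ZnClient`. A minimal sketch; the endpoint, the query parameter and the header value are hypothetical placeholders, not a real API, and a real site may also require cookies or other headers copied from the console.

```smalltalk
"Sketch: rebuild an Ajax request seen in the browser's developer console.
 The endpoint, query parameter and header value are hypothetical."
| client body |
client := ZnClient new.
[ client
	url: 'https://example.com/api/search';
	queryAt: 'q' put: 'pharo';
	headerAt: 'X-Requested-With' put: 'XMLHttpRequest';
	accept: ZnMimeType applicationJson.
  body := client get ]
	ensure: [ client close ].
body
```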



-
Cheers,
Sean
--
Sent from: http://forum.world.st/Pharo-Smalltalk-Users-f1310670.html



Re: [Pharo-users] [ANN] XMLParserHTML moved to GitHub

2019-11-29 Thread Esteban Maringolo
Great!

I just added a link to the project's README.md and created a PR,
because if you're parsing HTML, it's very likely that you're doing
some scraping. :-)

Esteban A. Maringolo


On Fri, Nov 29, 2019 at 2:18 PM Cédrick Béler  wrote:
>
> Stef and others wrote this book a while ago:
>
> http://books.pharo.org/booklet-Scraping/html/scrapingbook.html
>
> Basically XMLHTMLParser + XPath
>
> To me, far better than using Soup.
> Google chrome pharo integration helps to scrape complex full JS web sites 
> like google ;)



Re: [Pharo-users] [ANN] XMLParserHTML moved to GitHub

2019-11-29 Thread Cédrick Béler
Stef and others wrote this book a while ago:

http://books.pharo.org/booklet-Scraping/html/scrapingbook.html

Basically XMLHTMLParser + XPath

To me, far better than using Soup. 
Google chrome pharo integration helps to scrape complex full JS web sites 
like google ;)


Cheers,

Cedrick 

> On 29 Nov 2019, at 15:41, Esteban Maringolo wrote:
> 
> Thank you Torsten,
> 
> I wasn't aware of this tool; I'm already using it to scrape content
> from a website and feed a Pharo-driven system :)
> 
> The XML integration in the Inspector is great too.
> 
> Regards!
> 
> Esteban A. Maringolo
> 


Re: [Pharo-users] [ANN] XMLParserHTML moved to GitHub

2019-11-29 Thread Esteban Maringolo
Thank you Torsten,

I wasn't aware of this tool; I'm already using it to scrape content
from a website and feed a Pharo-driven system :)

The XML integration in the Inspector is great too.

Regards!

Esteban A. Maringolo

On Tue, Nov 19, 2019 at 8:40 AM Torsten Bergmann  wrote:
>
> Hi,
>
> the STHub -> PharoExtras project "XMLParserHTML"
>
> has now been moved from http://smalltalkhub.com/#!/~PharoExtras/XMLParserHTML to
> https://github.com/pharo-contributions/XML-XMLParserHTML including the FULL 
> HISTORY.
>
> The old STHub repo was marked as obsolete, but it links to the new one. 
> I've also set up a CI job:  https://travis-ci.org/pharo-contributions/XML-XMLParserHTML
> which is green for Pharo 7. Some cleanups, class comments and documentation were 
> applied, as you can see from the commit history.
>
> The new version is tagged in git as version 1.6.0 (with a movable tag 1.6.x 
> in case further hotfixes are required).
> hotfixes are required).
>
> You can load using
>
>   Metacello new
>     baseline: 'XMLParserHTML';
>     repository: 'github://pharo-contributions/XML-XMLParserHTML/src';
>     load.
>
> or from catalog in Pharo 7 or 8.
>
> Attached is current dependency graph.
>
> More to come soon ...
>
> Bye
> T.