Roberto Aguilar wrote:
>
> On Oct 24, 2009, at 6:17 PM, elca wrote:
>> Hello!
>> Thanks for your reply.
>> For example, I want to extract some text from the CNN website,
>> such as the 'Sponsored links' and 'Money' text.
>> Below is a sample of the script I want to make.
>> I want to add a function to my script that can extract text like
>> this.
>> Thanks in advance! :)
>
> Unless I'm missing something, why do you need Internet Explorer at
> all? You can get the HTML using urllib2:
>
> import urllib2
> response = urllib2.urlopen('http://cnn.com/')
> html = response.read()
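In Python 3, urllib2 was folded into urllib.request. Here is a minimal sketch of the same fetch. It assumes the server declares a charset in its Content-Type header and falls back to UTF-8 when it does not. The helper name fetch_html is hypothetical.

```python
import urllib.request


def fetch_html(url, timeout=30):
    """Download url and return its body decoded to str (Python 3 urllib.request)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        raw = response.read()
        # Prefer the charset from the Content-Type header; UTF-8 is only an
        # assumed fallback for pages that do not declare one.
        charset = response.headers.get_content_charset() or "utf-8"
    return raw.decode(charset, errors="replace")


# Usage (needs network access):
# html = fetch_html("http://cnn.com/")
```

Like the urllib2 version, this only returns the HTML the server sends. No JavaScript on the page is executed.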
>
> Then extract what you're looking for with Beautiful Soup:
>
> from BeautifulSoup import BeautifulSoup
> soup = BeautifulSoup(html)
>
> # 'class' is a reserved word in Python, so it has to go in the attrs dict
> for content in soup.findAll('div', attrs={'class': 'cnn_sectbincntnt2'}):
>     if ' /money?cnn=yes
>> from time import sleep
>> from win32com.client import Dispatch
>> import urllib, urllib2
>> from BeautifulSoup import BeautifulSoup
>>
>> ie = Dispatch("InternetExplorer.Application")
>> ie.Visible = 1
>> ie.Navigate("http://www.cnn.com")
>> sleep(15)  # crude fixed wait for the page to finish loading
>> ie.Quit()
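The text-extraction function asked for above could look like the following. This is a sketch using the standard library's html.parser instead of BeautifulSoup, which is a swapped-in technique and not code from this thread. It collects the text of every div carrying a given class. The class name cnn_sectbincntnt2 comes from the findAll call in the reply. Whether CNN still uses it is not known. The helper names are hypothetical. Unlike BeautifulSoup or lxml, this parser does not repair badly nested markup.

```python
from html.parser import HTMLParser


class DivTextExtractor(HTMLParser):
    """Collect the text inside every <div> whose class list includes target_class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0      # how many <div>s deep we are inside a matching div
        self.current = []   # text fragments of the div being captured
        self.results = []   # one string per matching div, in document order

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth:
            # A <div> nested inside the captured one: count it so its
            # closing tag does not end the capture early.
            self.depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.target_class in classes:
            self.depth = 1
            self.current = []

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.results.append(" ".join(self.current))

    def handle_data(self, data):
        if self.depth and data.strip():
            self.current.append(data.strip())


def extract_div_text(html, target_class):
    """Return the whitespace-joined text of each matching <div>."""
    parser = DivTextExtractor(target_class)
    parser.feed(html)
    parser.close()
    return parser.results
```

Call it as extract_div_text(html, "cnn_sectbincntnt2") on whatever HTML string the script obtained, whether it came from urllib or from IE.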
>>
>>
>> ccurvey wrote:
>>>
>>> You can definitely use IE and its innerHTML property to get the HTML,
>>> then use BeautifulSoup to parse it. What are you having trouble with?
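The fixed sleep(15) in the script above either wastes time or returns before the page is done. Below is a sketch of the IE-plus-innerHTML idea that polls the browser instead. It assumes the InternetExplorer.Application properties Busy, ReadyState (4 meaning READYSTATE_COMPLETE) and Document.body.innerHTML. The helper names are hypothetical. The functions only read attributes, so they can be tried with a stand-in object on a machine without IE.

```python
import time

READYSTATE_COMPLETE = 4  # IE's ReadyState value once the document has loaded


def wait_until_loaded(ie, timeout=30.0, poll=0.25):
    """Poll an InternetExplorer.Application object until it is idle and its
    ReadyState is READYSTATE_COMPLETE. Returns True if that happens before
    timeout seconds pass, False otherwise."""
    deadline = time.monotonic() + timeout
    while True:
        if not ie.Busy and ie.ReadyState == READYSTATE_COMPLETE:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)


def rendered_html(ie):
    """Return the <body> markup as IE currently holds it, including changes
    made by scripts that have already run."""
    return ie.Document.body.innerHTML


# Usage on Windows with pywin32 installed:
# from win32com.client import Dispatch
# ie = Dispatch("InternetExplorer.Application")
# ie.Navigate("http://www.cnn.com")
# if wait_until_loaded(ie):
#     html = rendered_html(ie)  # hand this string to BeautifulSoup or lxml
# ie.Quit()
```

The returned string is ordinary HTML, so the BeautifulSoup or lxml parsing code does not need to know it came from IE.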
>>>
>>>
>>>
>>> On Sat, Oct 24, 2009 at 8:34 PM, elca <high...@gmail.com> wrote:
>>>
>>>>
>>>> Hello,
>>>> if anyone knows, please help me!
>>>> I really want to know. I have searched Google many times
>>>> but couldn't find a clear solution, partly because of my limited
>>>> Python knowledge.
>>>> I want to use the IE.Navigate function with BeautifulSoup or lxml.
>>>> If anyone knows about this or has a sample,
>>>> please help me!
>>>> Thanks in advance
>>>> --
>>>> View this message in context:
>>>> http://www.nabble.com/how-to-use-win32com-with-beautifulsoup-or-lxml--tp26044332p26044332.html
>>>> Sent from the Python - python-win32 mailing list archive at Nabble.com.
>>>>
>>>> _______________________________________________
>>>> python-win32 mailing list
>>>> python-win32@python.org
>>>> http://mail.python.org/mailman/listinfo/python-win32
>>>>
>>>
>>>
>>>
>>> --
>>> The source of your stress might be a moron
>>>
>>>
>>>
>>
>> --
>> View this message in context:
>> http://www.nabble.com/how-to-use-win32com-with-beautifulsoup-or-lxml--tp26044332p26044523.html
>> Sent from the Python - python-win32 mailing list archive at Nabble.com.
>>
>
>
>
Hello,
Sorry for the late reply.
Actually, I'm making a web scraper, and for now
the scraping has no problem with JavaScript.
After the scraper is finished I will add some other functions, and at that
point I will run into a lot of JavaScript,
which is why I am trying to use PAMIE or IE.
http://elca.pastebin.com/m52e7d8e0
I have put my current scraper script source there.
In particular, I want to change 'thepage = urllib.urlopen(theurl).read()' to a
PAMIE method.
If possible, could you check it and correct me?
Thanks in advance.
Regards
--
View this message in context:
http://www.nabble.com/how-to-use-win32com-with-beautifulsoup-or-lxml--tp26044332p26053433.html
Sent from the Python - python-win32 mailing list archive at Nabble.com.
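One way to make the change asked about here is to keep the parsing code working on a string and pass the download step in as a function. The line 'thepage = urllib.urlopen(theurl).read()' then becomes 'thepage = fetch(theurl)'. Below is a minimal Python 3 sketch of that design. The function names and the parse argument are hypothetical. The IE variant drives InternetExplorer.Application directly through pywin32 rather than PAMIE, whose API does not appear in this thread. The pastebin script is not reproduced, so parsing is left to whatever code the scraper already has.

```python
import time
import urllib.request


def fetch_with_urllib(url):
    """Current behaviour: raw HTML bytes from a plain request (no JavaScript)."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def fetch_with_ie(url, timeout=30.0):
    """Load url in Internet Explorer (Windows + pywin32 only) and return the
    body markup once IE reports that loading has finished or timeout expires."""
    from win32com.client import Dispatch  # imported here so the module loads elsewhere

    ie = Dispatch("InternetExplorer.Application")
    try:
        ie.Navigate(url)
        deadline = time.monotonic() + timeout
        # Busy / ReadyState == 4 (READYSTATE_COMPLETE) signal the end of loading.
        while (ie.Busy or ie.ReadyState != 4) and time.monotonic() < deadline:
            time.sleep(0.25)
        return ie.Document.body.innerHTML
    finally:
        ie.Quit()


def scrape(theurl, parse, fetch=fetch_with_urllib):
    """Hypothetical scraper entry point: only the fetch step is swappable;
    the existing parsing code (parse) stays the same."""
    thepage = fetch(theurl)  # was: thepage = urllib.urlopen(theurl).read()
    return parse(thepage)


# Usage, where my_parser stands for the scraper's existing parsing code:
# results = scrape("http://www.cnn.com", parse=my_parser)                       # urllib
# results = scrape("http://www.cnn.com", parse=my_parser, fetch=fetch_with_ie)  # IE
```

One design caveat: the urllib fetcher returns bytes while the IE fetcher returns text. BeautifulSoup accepts either, but hand-written parsing code may need to handle both.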