Re: Python parser problem

2012-12-13 Thread Chris Angelico
On Fri, Dec 14, 2012 at 6:30 PM, Paul Rudin  wrote:
> Chris Angelico  writes:
>
>> On Fri, Dec 14, 2012 at 6:12 AM, RCU  wrote:
>>>   Dave,
>>> Thanks for the reply.
>>> The script was originally edited on Windows with proper \r\n endings,
>>
>> It's worth noting that many Windows-based editors and interpreters are
>> quite happy with \n line endings. You may be able to save yourself
>> some trouble by switching everything to Unix newlines.
>
> ... and in particular emacs for windows works well.
>
> (holy wars ensue...)

So do lots of others. In fact, the only editor I've seen that
consistently fails is Notepad.

Actually... that sentence stands alone. The only editor I've seen that
consistently fails is Notepad.

(holy wars are deflected into heaping scorn on an Acceptable Target)

ChrisA
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Python parser problem

2012-12-13 Thread Paul Rudin
Chris Angelico  writes:

> On Fri, Dec 14, 2012 at 6:12 AM, RCU  wrote:
>>   Dave,
>> Thanks for the reply.
>> The script was originally edited on Windows with proper \r\n endings,
>
> It's worth noting that many Windows-based editors and interpreters are
> quite happy with \n line endings. You may be able to save yourself
> some trouble by switching everything to Unix newlines.

... and in particular emacs for windows works well.

(holy wars ensue...)



Re: Python parser problem

2012-12-13 Thread Chris Angelico
On Fri, Dec 14, 2012 at 6:12 AM, RCU  wrote:
>   Dave,
> Thanks for the reply.
> The script was originally edited on Windows with proper \r\n endings,

It's worth noting that many Windows-based editors and interpreters are
quite happy with \n line endings. You may be able to save yourself
some trouble by switching everything to Unix newlines.

ChrisA


Re: Python parser problem

2012-12-13 Thread RCU

  Dave,
Thanks for the reply.
The script was originally edited on Windows with proper \r\n endings, but the 
PythonTidy script somehow does the doubling (I guess it assumes UNIX format only), i.e., 
\r\r\n . So indeed, that's kind of messy (and the Python Lang Reference specifies clearly 
it interprets \r as a newline, as well) and I didn't realize it with my editor. After 
running dos2unix (twice) on the script I cleaned all \r and it went OK.


I guess Python is complaining at line 30 and not at the previous lines, because of 
the line-breaking backslash.


  Best regards,
Alex

On 12/12/2012 9:59 PM, Dave Angel wrote:

On 12/12/2012 02:10 PM, RCU wrote:

   Hello.
 I would like to report a parser bug manifesting on Python 2.5, 2.7
(but not on 2.2) and 3.3.
 Please see the attached script.
 Basically this bug appeared after applying PythonTidy on a valid
script.

 More exactly, when running:
 python -c "import iCam_GIT5_5"
   I get:
 Traceback (most recent call last):
   File "", line 1, in
   File "iCam_GIT5_5.py", line 60

 ^
 SyntaxError: invalid syntax

 Actually, the error reported by Python is a bug, as far as I see:
the line 60 reported in the script does not actually contain the text
reported in the error, and this makes quite difficult locating the
so-called error.


No, the error is on line 60.  You have a blank line between each line, but
your editor apparently doesn't show you that.

Your line-endings are messed up.  Here's a dump of the first two lines.
(using hexdump -C)

0000  43 55 52 52 45 4e 54 5f  52 45 4c 45 41 53 45 5f  |CURRENT_RELEASE_|
0010  54 49 4d 45 20 3d 20 27  32 30 31 32 5f 31 32 5f  |TIME = '2012_12_|
0020  31 30 5f 31 33 5f 30 30  5f 30 30 27 0d 0d 0a 4e  |10_13_00_00'...N|

Notice that the line ends with 0d0d0a, or \r\r\n.  That's not valid.
Apparently python's logic considers that as a line ending with \r,
followed by a blank line ending with \r\n.



 In fact the error is at script line 30: we should have all the
code on one line, like this
 playlistToUse = youtubeClient.AddPlaylist(playlistTitle,
playlistTitle, playlist_private=False).
 The "\" used in the script to break the line in 2 is a
reminiscence of running PythonTidy-1.22.python (so fixing this bug
would be directly relevant when using PythonTidy).


Nothing wrong with ending with a backslash for continuation.  Backslash
continues the line onto the next one, which is blank.  Remove the extra
\r there and it'll be fine.



 With this occasion I would like to ask also what are the limits of
the Python 2.x and 3.x parser. Where can I find what are the limits on
the size/lines of the parsed script?


Can't help there.







Re: Python parser problem

2012-12-12 Thread Terry Reedy

On 12/12/2012 2:10 PM, RCU wrote:

 I would like to report a parser bug manifesting on Python 2.5, 2.7
(but not on 2.2) and 3.3.


You are not the first to erroneously attribute a problem to Python 
itself. But seriously, the interpreter itself is so thoroughly tested on 
a daily basis that you should assume that a reported SyntaxError is real.



 Please see the attached script.
 Basically this bug appeared after applying PythonTidy on a valid
script.


PythonTidy is much more likely to be buggy than Python itself.


 More exactly, when running:
 python -c "import iCam_GIT5_5"
   I get:
 Traceback (most recent call last):
   File "", line 1, in 
   File "iCam_GIT5_5.py", line 60

 ^
 SyntaxError: invalid syntax


SyntaxErrors are sometimes reported on the line after they occur, 
especially when the error is at the very end of the line and not obvious 
until \n is read.



 The "\" used in the script to break the line in 2 is a reminiscence
of running PythonTidy-1.22.python (so fixing this bug would be directly
relevant when using PythonTidy).


A '\' used to break a line MUST be the last character in the line. Dave 
explained how your editor and PythonTidy together made a bug.



 With this occasion I would like to ask also what are the limits of
the Python 2.x and 3.x parser. Where can I find what are the limits on
the size/lines of the parsed script?


Python, the language, has no limits. Implementations will, but they are 
larger than you will ever write by hand. Auto-generated code that, for 
instance, nests a tuple more than 2**16 levels deep may have problems.


--
Terry Jan Reedy



Re: Python parser problem

2012-12-12 Thread Jerry Hill
On Wed, Dec 12, 2012 at 2:10 PM, RCU  wrote:
> With this occasion I would like to ask also what are the limits of the
> Python 2.x and 3.x parser. Where can I find what are the limits on the
> size/lines of the parsed script?

The Python Language Reference is probably what you're looking for:
http://docs.python.org/2/reference/index.html

See, particularly, section 2 about lexical analysis and possibly
section 9 for the grammar.  The short answer though, is that python
doesn't have any limits on the line length or the size of a script,
other than that execution will obviously fail if you run out of memory
while parsing or compiling the script.

PEP 8 (http://www.python.org/dev/peps/pep-0008/) is the Python style
guide, and it does have some recommendations about line length
(http://www.python.org/dev/peps/pep-0008/#maximum-line-length).  That
document suggests a maximum length of 79 characters per line, and
that's probably what PythonTidy was trying to accomplish by splitting
your line.

-- 
Jerry


Re: Python parser problem

2012-12-12 Thread Dave Angel
On 12/12/2012 02:10 PM, RCU wrote:
>   Hello.
> I would like to report a parser bug manifesting on Python 2.5, 2.7
> (but not on 2.2) and 3.3.
> Please see the attached script.
> Basically this bug appeared after applying PythonTidy on a valid
> script.
>
> More exactly, when running:
> python -c "import iCam_GIT5_5"
>   I get:
> Traceback (most recent call last):
>   File "", line 1, in 
>   File "iCam_GIT5_5.py", line 60
>
> ^
> SyntaxError: invalid syntax
>
> Actually, the error reported by Python is a bug, as far as I see:
> the line 60 reported in the script does not actually contain the text
> reported in the error, and this makes quite difficult locating the
> so-called error.

No, the error is on line 60.  You have a blank line between each line, but
your editor apparently doesn't show you that.

Your line-endings are messed up.  Here's a dump of the first two lines. 
(using hexdump -C)

0000  43 55 52 52 45 4e 54 5f  52 45 4c 45 41 53 45 5f  |CURRENT_RELEASE_|
0010  54 49 4d 45 20 3d 20 27  32 30 31 32 5f 31 32 5f  |TIME = '2012_12_|
0020  31 30 5f 31 33 5f 30 30  5f 30 30 27 0d 0d 0a 4e  |10_13_00_00'...N|

Notice that the line ends with 0d0d0a, or \r\r\n.  That's not valid. 
Apparently python's logic considers that as a line ending with \r,
followed by a blank line ending with \r\n.
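
For anyone who wants to check a file for this without reaching for hexdump, here is a minimal sketch in Python 3 (not the 2.x used elsewhere in this thread). The function names are made up for illustration, and the normalisation assumes that every \r\r\n was meant to be a single line ending, which matches the dump above but may not hold for other files.

```python
import re

# Longest alternative first, so \r\r\n is not counted as \r plus \r\n.
_EOL = re.compile(rb"\r\r\n|\r\n|\r|\n")


def count_line_endings(data):
    """Count each kind of line terminator found in raw bytes."""
    counts = {}
    for match in _EOL.finditer(data):
        counts[match.group()] = counts.get(match.group(), 0) + 1
    return counts


def normalize_line_endings(data):
    """Rewrite every terminator (including the doubled one) as a single \\n."""
    return _EOL.sub(b"\n", data)


sample = b"A = 1\r\r\nB = 2\r\r\n"
print(count_line_endings(sample))
print(normalize_line_endings(sample))
```

Read the file in binary mode (open(path, "rb")) before passing it in; text mode would translate the line endings before they can be counted.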


> In fact the error is at script line 30: we should have all the
> code on one line, like this
> playlistToUse = youtubeClient.AddPlaylist(playlistTitle,
> playlistTitle, playlist_private=False).
> The "\" used in the script to break the line in 2 is a
> reminiscence of running PythonTidy-1.22.python (so fixing this bug
> would be directly relevant when using PythonTidy).

Nothing wrong with ending with a backslash for continuation.  Backslash
continues the line onto the next one, which is blank.  Remove the extra
\r there and it'll be fine.
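
To see the effect in isolation, here is a small sketch (Python 3, with a hypothetical helper name) that compiles two tiny sources: one where the backslash is followed directly by the continuation, and one where a blank line sits in between, as it effectively did in the script.

```python
def compiles(source):
    """Return True if the source text compiles as a module, False otherwise."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False


# The backslash joins the line with the one that immediately follows it.
good = "x = \\\n    1 + 2\n"
# Same statement, but the line after the backslash is blank.
broken = "x = \\\n\n    1 + 2\n"

print(compiles(good))
print(compiles(broken))
```

The blank line is what the continuation ends up joining with, so the statement stops at "x =" and the parser objects.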

>
> With this occasion I would like to ask also what are the limits of
> the Python 2.x and 3.x parser. Where can I find what are the limits on
> the size/lines of the parsed script?
>
Can't help there.




-- 

DaveA



Python parser problem

2012-12-12 Thread RCU

  Hello.
I would like to report a parser bug manifesting on Python 2.5, 2.7 (but not on 2.2) 
and 3.3.

Please see the attached script.
Basically this bug appeared after applying PythonTidy on a valid script.

More exactly, when running:
python -c "import iCam_GIT5_5"
  I get:
Traceback (most recent call last):
  File "", line 1, in 
  File "iCam_GIT5_5.py", line 60

^
SyntaxError: invalid syntax

Actually, the error reported by Python is a bug, as far as I can see: line 60 
of the script does not actually contain the text reported in the error, which 
makes it quite difficult to locate the so-called error.
In fact the error is at script line 30: we should have all the code on one line, like 
this:
	playlistToUse = youtubeClient.AddPlaylist(playlistTitle, playlistTitle, playlist_private=False)
The "\" used in the script to break the line in two is a remnant of running 
PythonTidy-1.22.python (so fixing this bug would be directly relevant when using PythonTidy).


I would also like to ask what the limits of the Python 2.x and 3.x parser are. 
Where can I find the limits on the size/number of lines of the parsed script?


  Best regards,
Alex
CURRENT_RELEASE_TIME = '2012_12_10_13_00_00'

NEW_BT_FORMAT_TO_ALLOW_PLAYING_FILE_EVEN_IN_INBOX = True

def SendAlarmMessageToYouTubePlaylist(message):

global youtubeClient, youtubeClientAlreadyConnected

global YOUTUBE_TEST_CLIENT_ID, googleUsername, youtubeDeveloperKey

global uploadMediaToYouTube

global deviceId

if MY_DEBUG_STDOUT:

print 'Entered SendAlarmMessageToYouTubePlaylist() at %s.' % 
GetCurrentDateTimeStringWithMilliseconds()

sys.stdout.flush()

if uploadMediaToYouTube == 0:

uploadMediaToYouTube = 1

if youtubeClientAlreadyConnected == False:

if gdataModulesImported == False:

ImportGdataModules()

connResult = ConnectToYouTubeGData()

try:

playlistTitle = 'iCam_alarm_' + deviceId

if False:

playlistDescription = playlistTitle

playlistToUse = None

feed = youtubeClient.GetYouTubePlaylistFeed()

for myEntry in feed.entry:

myEntryTitle = myEntry.title.text

myEntryIdStr = myEntry.id.text.split('/')[-1]

if playlistTitle == myEntryTitle:

playlistToUse = myEntry

break

if playlistToUse is None:

playlistToUse = \

youtubeClient.AddPlaylist(playlistTitle, playlistTitle, 
playlist_private=False)

playlistDescription = ''

newPlaylistDescription = 'Alarm... motion degree... audio degree... 
%s.' % message

playlistToUse = None

feed = youtubeClient.GetYouTubePlaylistFeed()

for myEntry in feed.entry:

myEntryTitle = myEntry.title.text

myEntryIdStr = myEntry.id.text.split('/')[-1]

if myEntryTitle == playlistTitle:

if MY_DEBUG_STDOUT:

print 'SendAlarmMessageToYouTubePlaylist(): Feed matched 
myEntry =', myEntry

print 'SendAlarmMessageToYouTubePlaylist(): myEntry.content 
=', myEntry.content

print 'SendAlarmMessageToYouTubePlaylist(): 
myEntry.description = %s' % str(myEntry.description)

sys.stdout.flush()

playlistDescription = 
str(myEntry.description).split('>')[-2].split('


Re: python-parser running Beautiful Soup only spits out one line of 10. What i have gotten wrong here?

2010-12-25 Thread John Nagle

   Your program is doing what you asked it to do.  It finds the
first table with class 'bp_ergebnis_tab_info'.  Then it ignores
that result.  Then it finds the first "td" item in the document,
and prints the contents of that.  Then it exits.  What did
you want it to do?

   Try this.  It prints out the TD items on each
row of the table, in order.

import urllib2
from BeautifulSoup import BeautifulSoup
page = urllib2.urlopen("http://www.schulministerium.nrw.de/BP/SchuleSuchen?action=799.601437941842&SchulAdresseMapDO=142323")

soup = BeautifulSoup(page)
table = soup.find('table', attrs={'class': 'bp_ergebnis_tab_info'})
for row in table.findAll('tr'):  # for all TR items (table rows)
    for td in row.findAll('td'):  # for TD items in row
        text = td.renderContents().strip()
        print(text)
    print('-')  # mark end of row
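
If BeautifulSoup is not at hand, roughly the same row-by-row walk can be sketched with the standard library's html.parser in Python 3. This is a swapped-in technique, not what the code above uses; the class and function names are invented for the example, and SAMPLE is an assumed stand-in for the real page, which is not reproduced in this thread. Only the class name bp_ergebnis_tab_info comes from the original post.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect the stripped text of every <td>, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []      # one list of cell texts per table row
        self._cell = None   # text chunks while inside a <td>, else None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td":
            if not self.rows:       # tolerate a <td> with no enclosing <tr>
                self.rows.append([])
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self.rows[-1].append("".join(self._cell).strip())
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def extract_rows(html):
    """Return a list of rows, each a list of TD texts."""
    collector = CellCollector()
    collector.feed(html)
    collector.close()
    return collector.rows


# Assumed sample markup, for illustration only.
SAMPLE = """
<table class="bp_ergebnis_tab_info">
 <tr><td> This is a sample text </td></tr>
 <tr><td> This is the second sample text </td></tr>
</table>
"""

for row in extract_rows(SAMPLE):
    for text in row:
        print(text)
    print('-')  # mark end of row
```

Unlike the BeautifulSoup version, this sketch does not filter on the table's class and does not cope with nested tables.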

John Nagle

On 12/25/2010 9:58 AM, Martin Kaspar wrote:

Hello dear Community,
I am trying to get a scraper up and running, and I keep running into
problems.

When I try what I have learned so far, I only get:
Schuldaten

Here is the code that I used:

import urllib2
from BeautifulSoup import BeautifulSoup
page = urllib2.urlopen("http://www.schulministerium.nrw.de/BP/SchuleSuchen?action=799.601437941842&SchulAdresseMapDO=142323")
soup = BeautifulSoup(page)
table = soup.find('table' ,attrs={'class':'bp_ergebnis_tab_info'})
first_td = soup.find('td')
text = first_td.renderContents()
trimmed_text = text.strip()
print trimmed_text


i run it in the template at http://scraperwiki.com/scrapers/new/python

see the target: 
http://www.schulministerium.nrw.de/BP/SchuleSuchen?action=799.601437941842&SchulAdresseMapDO=142323

What have I gotten wrong?

Can anybody review the code -

many thanks in Advance

regards
matze




python-parser running Beautiful Soup only spits out one line of 10. What i have gotten wrong here?

2010-12-25 Thread Martin Kaspar
Hello dear Community,


I am trying to get a scraper up and running, and I keep running into
problems.

When I try what I have learned so far, I only get:
Schuldaten

Here is the code that I used:

import urllib2
from BeautifulSoup import BeautifulSoup
page = urllib2.urlopen("http://www.schulministerium.nrw.de/BP/SchuleSuchen?action=799.601437941842&SchulAdresseMapDO=142323")
soup = BeautifulSoup(page)
table = soup.find('table' ,attrs={'class':'bp_ergebnis_tab_info'})
first_td = soup.find('td')
text = first_td.renderContents()
trimmed_text = text.strip()
print trimmed_text


i run it in the template at http://scraperwiki.com/scrapers/new/python

see the target: 
http://www.schulministerium.nrw.de/BP/SchuleSuchen?action=799.601437941842&SchulAdresseMapDO=142323

What have I gotten wrong?

Can anybody review the code -

many thanks in Advance

regards
matze


Re: python-parser running Beautiful Soup needs to be reviewed

2010-12-12 Thread Stef Mientki
>> I've no opinion.
>> I'm just struggling with BeautifulSoup myself, finding it one of the 
>> toughest libs I've seen ;-)
>
> Really? While I'm by no means an expert, I find it very easy to work with.
> It's very well structured IMHO.
I think the cause lies in the documentation.
The PySide documentation is much easier to understand (at least for me)

http://www.pyside.org/docs/pyside/PySide/QtWebKit/QWebElement.html

cheers,
Stef


Re: python-parser running Beautiful Soup needs to be reviewed

2010-12-11 Thread Alexander Kapps

On 11.12.2010 22:38, Stef Mientki wrote:

On 11-12-2010 17:24, Martin Kaspar wrote:

Hello commnity

i am new to Python and to Beatiful Soup also!
It is told to be a great tool to parse and extract content. So here i
am...:

I want to take the content of a-tag of a table in a html
document. For example, i have this table


 
 
  This is a sample text
 

 
  This is the second sample text
 
 


How can i use beautifulsoup to take the text "This is a sample text"?

Should i make use
soup.findAll('table' ,attrs={'class':'bp_ergebnis_tab_info'}) to get
the whole table.

See the target 
http://www.schulministerium.nrw.de/BP/SchuleSuchen?action=799.601437941842&SchulAdresseMapDO=142323

Well - what have we to do first:

The first thing is t o find the table:

i do this with Using find rather than findall returns the first item
in the list
(rather than returning a list of all finds - in which case we'd have
to add an extra [0]
to take the first element of the list):


table = soup.find('table' ,attrs={'class':'bp_ergebnis_tab_info'})

Then use find again to find the first td:

first_td = soup.find('td')

Then we have to use renderContents() to extract the textual contents:

text = first_td.renderContents()

... and the job is done (though we may also want to use strip() to
remove leading and trailing spaces:

trimmed_text = text.strip()

This should give us:


print trimmed_text
This is a sample text

as desired.


What do you think about the code? I love to hear from you!?

I've no opinion.
I'm just struggling with BeautifulSoup myself, finding it one of the toughest 
libs I've seen ;-)


Really? While I'm by no means an expert, I find it very easy to work 
with. It's very well structured IMHO.



So the simplest solution I came up with:

Text = """

 
 
  This is a sample text
 

 
  This is the second sample text
 
 

"""
Content = BeautifulSoup ( Text )
print Content.find('td').contents[0].strip()

This is a sample text


And now I wonder how to get the next contents !!


Content = BeautifulSoup ( Text )
for td in Content.findAll('td'):
    print td.string.strip() # or td.renderContents().strip()


Re: python-parser running Beautiful Soup needs to be reviewed

2010-12-11 Thread Peter Pearson
On Sat, 11 Dec 2010 22:38:43 +0100, Stef Mientki wrote:
[snip]
> So the simplest solution I came up with:
>
> Text = """
>
> 
> 
>  This is a sample text
> 
>
> 
>  This is the second sample text
> 
> 
>
> """
> Content = BeautifulSoup ( Text )
> print Content.find('td').contents[0].strip()
 This is a sample text
>
> And now I wonder how to get the next contents !!

Here's a suggestion:

pe...@eleodes:~$ python
Python 2.5.2 (r252:60911, Jul 22 2009, 15:35:03) 
[GCC 4.2.4 (Ubuntu 4.2.4-1ubuntu3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from BeautifulSoup import BeautifulSoup
>>> Text = """
... 
... 
... 
...  This is a sample text
... 
... 
... 
...  This is the second sample text
... 
... 
... 
... """
>>> Content = BeautifulSoup ( Text )
>>> for xx in Content.findAll('td'):
...   print xx.contents[0].strip()
... 
This is a sample text
This is the second sample text
>>> 

-- 
To email me, substitute nowhere->spamcop, invalid->net.


Re: python-parser running Beautiful Soup needs to be reviewed

2010-12-11 Thread Stef Mientki
On 11-12-2010 17:24, Martin Kaspar wrote:
> Hello commnity
>
> i am new to Python and to Beatiful Soup also!
> It is told to be a great tool to parse and extract content. So here i
> am...:
>
> I want to take the content of a -tag of a table in a html
> document. For example, i have this table
>
> 
> 
> 
>  This is a sample text
> 
>
> 
>  This is the second sample text
> 
> 
> 
>
> How can i use beautifulsoup to take the text "This is a sample text"?
>
> Should i make use
> soup.findAll('table' ,attrs={'class':'bp_ergebnis_tab_info'}) to get
> the whole table.
>
> See the target 
> http://www.schulministerium.nrw.de/BP/SchuleSuchen?action=799.601437941842&SchulAdresseMapDO=142323
>
> Well - what have we to do first:
>
> The first thing is t o find the table:
>
> i do this with Using find rather than findall returns the first item
> in the list
> (rather than returning a list of all finds - in which case we'd have
> to add an extra [0]
> to take the first element of the list):
>
>
> table = soup.find('table' ,attrs={'class':'bp_ergebnis_tab_info'})
>
> Then use find again to find the first td:
>
> first_td = soup.find('td')
>
> Then we have to use renderContents() to extract the textual contents:
>
> text = first_td.renderContents()
>
> ... and the job is done (though we may also want to use strip() to
> remove leading and trailing spaces:
>
> trimmed_text = text.strip()
>
> This should give us:
>
>
> print trimmed_text
> This is a sample text
>
> as desired.
>
>
> What do you think about the code? I love to hear from you!?
I've no opinion.
I'm just struggling with BeautifulSoup myself, finding it one of the toughest 
libs I've seen ;-)

So the simplest solution I came up with:

Text = """



 This is a sample text



 This is the second sample text



"""
Content = BeautifulSoup ( Text )
print Content.find('td').contents[0].strip()
>>> This is a sample text

And now I wonder how to get the next contents !!

cheers,
Stef
> greetings
> matze



Re: python-parser running Beautiful Soup needs to be reviewed

2010-12-11 Thread Nitin Pawar
try using lxml ... its very useful

On Sat, Dec 11, 2010 at 11:24 AM, Martin Kaspar  wrote:

> Hello commnity
>
> i am new to Python and to Beatiful Soup also!
> It is told to be a great tool to parse and extract content. So here i
> am...:
>
> I want to take the content of a -tag of a table in a html
> document. For example, i have this table
>
> 
>
>
> This is a sample text
>
>
>
> This is the second sample text
>
>
> 
>
> How can i use beautifulsoup to take the text "This is a sample text"?
>
> Should i make use
> soup.findAll('table' ,attrs={'class':'bp_ergebnis_tab_info'}) to get
> the whole table.
>
> See the target
> http://www.schulministerium.nrw.de/BP/SchuleSuchen?action=799.601437941842&SchulAdresseMapDO=142323
>
> Well - what have we to do first:
>
> The first thing is t o find the table:
>
> i do this with Using find rather than findall returns the first item
> in the list
> (rather than returning a list of all finds - in which case we'd have
> to add an extra [0]
> to take the first element of the list):
>
>
> table = soup.find('table' ,attrs={'class':'bp_ergebnis_tab_info'})
>
> Then use find again to find the first td:
>
> first_td = soup.find('td')
>
> Then we have to use renderContents() to extract the textual contents:
>
> text = first_td.renderContents()
>
> ... and the job is done (though we may also want to use strip() to
> remove leading and trailing spaces:
>
> trimmed_text = text.strip()
>
> This should give us:
>
>
> print trimmed_text
> This is a sample text
>
> as desired.
>
>
> What do you think about the code? I love to hear from you!?
>
> greetings
> matze
> --
> http://mail.python.org/mailman/listinfo/python-list
>



-- 
Nitin Pawar


python-parser running Beautiful Soup needs to be reviewed

2010-12-11 Thread Martin Kaspar
Hello community

I am new to Python and to Beautiful Soup also!
It is said to be a great tool to parse and extract content. So here I
am...:

I want to take the content of a -tag of a table in a html
document. For example, i have this table




 This is a sample text



 This is the second sample text




How can i use beautifulsoup to take the text "This is a sample text"?

Should i make use
soup.findAll('table' ,attrs={'class':'bp_ergebnis_tab_info'}) to get
the whole table.

See the target 
http://www.schulministerium.nrw.de/BP/SchuleSuchen?action=799.601437941842&SchulAdresseMapDO=142323

Well - what have we to do first:

The first thing is to find the table:

I do this with find. Using find rather than findAll returns the first item
in the list (rather than returning a list of all finds, in which case we'd
have to add an extra [0] to take the first element of the list):


table = soup.find('table' ,attrs={'class':'bp_ergebnis_tab_info'})

Then use find again to find the first td:

first_td = soup.find('td')

Then we have to use renderContents() to extract the textual contents:

text = first_td.renderContents()

... and the job is done (though we may also want to use strip() to
remove leading and trailing spaces):

trimmed_text = text.strip()

This should give us:


print trimmed_text
This is a sample text

as desired.


What do you think about the code? I'd love to hear from you!

greetings
matze


Re: python parser overridden by pymol

2009-11-12 Thread Dave Angel



Robert Kern wrote:

Jeremiah H. Savage wrote:



To use pymol and numpy together, I now do the following:

To ~/.bashrc add:
PYMOL_PATH=/usr/lib/pymodules/python2.5/pymol
export PYMOL_PATH

Then I can do the following in python:

 import numpy
 numpy.save('123',numpy.array([1,2,3]))
 numpy.load('123.npy')
   array([1, 2, 3])
 import sys
 sys.path.append( "/usr/lib/pymodules/python2.5/pymol")
 import pymol
 pymol.finish_launching()
 pymol.importing.load("/path/to/file.pdb")


No, do not do this. Add /usr/lib/pymodules/python2.5/ to your 
$PYTHONPATH, *not* /usr/lib/pymodules/python2.5/pymol/. You will 
continue to run into problems if you do it this way. You are not 
supposed to put the directory *of* the package onto sys.path but 
rather the directory that *contains* the package directory.


As I said before, I don't know pymol.  But if that is the package name, 
then Robert is certainly right.  You need to read the docs on pymol to 
see what they require.  For example, it's surprising they require a 
separate PYMOL_PATH environment variable, since they can find their own 
directory path with the __file__ attribute of one of the modules.


Anyway, one more generic comment.  Rather than having that directory in 
both the bashrc file  AND in your python source, I'd consider deriving 
the latter from the environment variable, once you determine that it's 
actually necessary.  And of course you could strip the last node from 
the path in the environment variable before appending it to sys.path, if 
that's what's appropriate.
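
A minimal sketch of that idea, assuming PYMOL_PATH is set as in the ~/.bashrc lines above and points at the package directory itself. The helper name is made up, and whether pymol really needs the variable is still the open question mentioned above.

```python
import os
import sys


def add_package_parent(env_var="PYMOL_PATH"):
    """Append the directory that contains the package to sys.path.

    Reads the package directory from the environment and strips its last
    node. Returns the directory added, or None if the variable is unset.
    """
    pkg_dir = os.environ.get(env_var)
    if not pkg_dir:
        return None
    parent = os.path.dirname(pkg_dir.rstrip(os.sep))
    if parent not in sys.path:
        sys.path.append(parent)
    return parent
```

With PYMOL_PATH=/usr/lib/pymodules/python2.5/pymol this appends /usr/lib/pymodules/python2.5, so the path is written down in one place only.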


DaveA






Re: python parser overridden by pymol

2009-11-12 Thread Robert Kern

Jeremiah H. Savage wrote:


To use pymol and numpy together, I now do the following:

To ~/.bashrc add:
PYMOL_PATH=/usr/lib/pymodules/python2.5/pymol
export PYMOL_PATH

Then I can do the following in python:

 import numpy
 numpy.save('123',numpy.array([1,2,3]))
 numpy.load('123.npy')
   array([1, 2, 3])
 import sys
 sys.path.append( "/usr/lib/pymodules/python2.5/pymol")
 import pymol
 pymol.finish_launching()
 pymol.importing.load("/path/to/file.pdb")


No, do not do this. Add /usr/lib/pymodules/python2.5/ to your $PYTHONPATH, *not* 
/usr/lib/pymodules/python2.5/pymol/. You will continue to run into problems if 
you do it this way. You are not supposed to put the directory *of* the package 
onto sys.path but rather the directory that *contains* the package directory.
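
The difference can be shown without pymol. The sketch below (Python 3) builds a throwaway package in a temporary directory and asks the import machinery which file a name would resolve to under each search path. The package name demo_pkg is hypothetical, and colorsys merely stands in for any standard-library module that a package happens to share a file name with, as pymol's parser.py did.

```python
import os
import sys
import tempfile
from importlib.machinery import PathFinder


def resolve(name, search_path):
    """Return the file that `import name` would load given search_path."""
    spec = PathFinder.find_spec(name, search_path)
    return spec.origin if spec is not None else None


def demo():
    # Hypothetical layout: <root>/demo_pkg/__init__.py and
    # <root>/demo_pkg/colorsys.py (same name as a stdlib module).
    root = tempfile.mkdtemp()
    pkg_dir = os.path.join(root, "demo_pkg")
    os.mkdir(pkg_dir)
    for filename in ("__init__.py", "colorsys.py"):
        with open(os.path.join(pkg_dir, filename), "w") as handle:
            handle.write("# placeholder\n")

    # Directory *of* the package first on the path: its module shadows stdlib.
    shadowed = resolve("colorsys", [pkg_dir] + sys.path)
    # Directory that *contains* the package: stdlib wins, package still found.
    stdlib = resolve("colorsys", [root] + sys.path)
    package = resolve("demo_pkg", [root] + sys.path)
    return pkg_dir, shadowed, stdlib, package


pkg_dir, shadowed, stdlib, package = demo()
print("package dir on path ->", shadowed)
print("parent dir on path  ->", stdlib)
print("package itself      ->", package)
```

Nothing is actually imported here, so the running interpreter's sys.modules is left alone.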


--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
 that is made terrible by our own mad attempt to interpret it as though it had
 an underlying truth."
  -- Umberto Eco



Re: python parser overridden by pymol

2009-11-12 Thread Jeremiah H. Savage
On Wed, Nov 11, 2009 at 7:48 PM, Dave Angel  wrote:
>
>
> Jeremiah wrote:
>>
>> Hello,
>>
>> I'm fairly new to python (version 2.5.4), and am writing a program
>> which uses both pymol (version 1.2r1) and numpy (version 1.3.0) from
>> debian.
>>
>> It appears that when I add pymol to $PYTHONPATH, that parser.expr() is
>> no longer available, and so I am unable to use numpy.load(). I have
>> looked for where parser.expr() is defined in the python system so I
>> could place that directory first in $PYTHONPATH, but I have not been
>> able to find the file that defines expr().
>>
>> My reason for using numpy.load() is that I have a numpy array which
>> takes an hour to generate. Therefore, I'd like to use numpy.save() so
>> I could generate the array one time, and then load it later as needed
>> with numpy.load().
>>
>> I've successfully tested the use of numpy.save() and numpy.load() with
>> a small example when the pymol path is not defined in $PYTHONPATH  :
>>
>>   >>> import numpy
>>   >>> numpy.save('123',numpy.array([1,2,3]))
>>   >>> numpy.load('123.npy')
>>   array([1, 2, 3])
>>
>>
>> However, a problem arises once $PYTHONPATH includes the pymol
>> directory. To use the pymol api, I add the following to ~/.bashrc:
>>
>>   PYMOL_PATH=/usr/lib/pymodules/python2.5/pymol
>>   export PYMOL_PATH
>>   PYTHONPATH=$PYMOL_PATH
>>   export PYTHONPATH
>>
>> Once this is done, numpy.load() no longer works correctly, as pymol
>> contains a file named parser.py ( /usr/lib/pymodules/python2.5/pymol/
>> parser.py ), which apparently prevents python from using its native
>> parser.
>>
>>   >>> numpy.load('123.npy')
>>   Traceback (most recent call last):
>>     File "", line 1, in 
>>     File "/usr/lib/python2.5/site-packages/numpy/lib/io.py", line
>> 195, in load
>>       return format.read_array(fid)
>>     File "/usr/lib/python2.5/site-packages/numpy/lib/format.py",
>> line 353, in read_array
>>       shape, fortran_order, dtype = read_array_header_1_0(fp)
>>     File "/usr/lib/python2.5/site-packages/numpy/lib/format.py",
>> line 250, in read_array_header_1_0
>>       d = safe_eval(header)
>>     File "/usr/lib/python2.5/site-packages/numpy/lib/utils.py", line
>> 840, in safe_eval
>>       ast = compiler.parse(source, "eval")
>>     File "/usr/lib/python2.5/compiler/transformer.py", line 54, in
>> parse
>>       return Transformer().parseexpr(buf)
>>     File "/usr/lib/python2.5/compiler/transformer.py", line 133, in
>> parseexpr
>>       return self.transform(parser.expr(text))
>>   AttributeError: 'module' object has no attribute 'expr'
>>
>> If I understand the problem correctly, can anyone tell me where
>> parser.expr() is defined, or suggest a better method to fix this
>> problem?
>>
>> Thanks,
>> Jeremiah
>>
>>
>
> Generic answers, I have no experience with pymol
>
> If pymol really needs that parser.py, you have a problem, as there can only
> be one module by that name in the application.  But assuming it's needed for
> some obscure feature that you don't need, you could try the following
> sequence.
>
> 1) temporarily rename the pymol's  parser.py  file to something else, like
> pymolparser.py, and see what runs.
> 2) rather than changing the PYTHONPATH, fix  up  sys.path during your script
> initialization.
>   In particular, do an    import parser    near the beginning of the script.
>  This gets it loaded, even though you might not need to use it from this
> module.
>   After that import, then add the following line (which could be generalized
> later)
>   sys.path.append( "/usr/lib/pymodules/python2.5/pymol")
>
>
> If this works, then you can experiment a bit more, perhaps you don't need
> the extra import parser, just putting the pymol directory at the end of the
> sys.path rather than the beginning may be good enough.
>
> If the parser.py in the pymol is actually needed, you might need to rename
> its internal references to some other name, like pymolparser.
>
> HTH,
> DaveA
>
>

Thank you. Your second suggestion really helped.

To use pymol and numpy together, I now do the following:

To ~/.bashrc add:
PYMOL_PATH=/usr/lib/pymodules/python2.5/pymol
export PYMOL_PATH

Then I can do the following in python:

   >>> import numpy
   >>> numpy.save('123',numpy.array([1,2,3]))
   >>> numpy.load('123.npy')
   array([1, 2, 3])
   >>> import sys
   >>> sys.path.append( "/usr/lib/pymodules/python2.5/pymol")
   >>> import pymol
   >>> pymol.finish_launching()
   >>> pymol.importing.load("/path/to/file.pdb")

Thanks,
Jeremiah


Re: python parser overridden by pymol

2009-11-11 Thread Dave Angel



Jeremiah wrote:

> Hello,
>
> I'm fairly new to python (version 2.5.4), and am writing a program
> which uses both pymol (version 1.2r1) and numpy (version 1.3.0) from
> debian.
>
> It appears that when I add pymol to $PYTHONPATH, that parser.expr() is
> no longer available, and so I am unable to use numpy.load(). I have
> looked for where parser.expr() is defined in the python system so I
> could place that directory first in $PYTHONPATH, but I have not been
> able to find the file that defines expr().
>
> My reason for using numpy.load() is that I have a numpy array which
> takes an hour to generate. Therefore, I'd like to use numpy.save() so
> I could generate the array one time, and then load it later as needed
> with numpy.load().
>
> I've successfully tested the use of numpy.save() and numpy.load() with
> a small example when the pymol path is not defined in $PYTHONPATH  :
>
>    >>> import numpy
>    >>> numpy.save('123',numpy.array([1,2,3]))
>    >>> numpy.load('123.npy')
>    array([1, 2, 3])
>
> However, a problem arises once $PYTHONPATH includes the pymol
> directory. To use the pymol api, I add the following to ~/.bashrc:
>
>    PYMOL_PATH=/usr/lib/pymodules/python2.5/pymol
>    export PYMOL_PATH
>    PYTHONPATH=$PYMOL_PATH
>    export PYTHONPATH
>
> Once this is done, numpy.load() no longer works correctly, as pymol
> contains a file named parser.py
> ( /usr/lib/pymodules/python2.5/pymol/parser.py ), which apparently
> prevents python from using its native parser.
>
>    >>> numpy.load('123.npy')
>    Traceback (most recent call last):
>      File "<stdin>", line 1, in <module>
>      File "/usr/lib/python2.5/site-packages/numpy/lib/io.py", line 195, in load
>        return format.read_array(fid)
>      File "/usr/lib/python2.5/site-packages/numpy/lib/format.py", line 353, in read_array
>        shape, fortran_order, dtype = read_array_header_1_0(fp)
>      File "/usr/lib/python2.5/site-packages/numpy/lib/format.py", line 250, in read_array_header_1_0
>        d = safe_eval(header)
>      File "/usr/lib/python2.5/site-packages/numpy/lib/utils.py", line 840, in safe_eval
>        ast = compiler.parse(source, "eval")
>      File "/usr/lib/python2.5/compiler/transformer.py", line 54, in parse
>        return Transformer().parseexpr(buf)
>      File "/usr/lib/python2.5/compiler/transformer.py", line 133, in parseexpr
>        return self.transform(parser.expr(text))
>    AttributeError: 'module' object has no attribute 'expr'
>
> If I understand the problem correctly, can anyone tell me where
> python.expr() is defined, or suggest a better method to fix this
> problem?
>
> Thanks,
> Jeremiah

  

Generic answers, I have no experience with pymol

If pymol really needs that parser.py, you have a problem, as there can 
only be one module by that name in the application.  But assuming it's 
needed for some obscure feature that you don't need, you could try the 
following sequence.


1) temporarily rename the pymol's  parser.py  file to something else, 
like pymolparser.py, and see what runs.
2) rather than changing the PYTHONPATH, fix  up  sys.path during your 
script initialization.
   In particular, do an    import parser    near the beginning of the 
script.  This gets it loaded, even though you might not need to use it 
from this module.
   After that import, then add the following line (which could be 
generalized later)

   sys.path.append( "/usr/lib/pymodules/python2.5/pymol")


If this works, then you can experiment a bit more, perhaps you don't 
need the extra import parser, just putting the pymol directory at the 
end of the sys.path rather than the beginning may be good enough.


If the parser.py in the pymol is actually needed, you might need to 
rename its internal references to some other name, like pymolparser.
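
A minimal sketch of the search-order point above, written for a current Python 3 rather than the 2.5 used in this thread. It only illustrates that the first directory on the search path containing a matching module wins, which is why putting the pymol directory at the end of sys.path leaves the standard parser module reachable. The module name shadow_demo, the directory names and both helper functions are made up for the illustration; nothing here touches pymol itself.

```python
import importlib.machinery
import os
import tempfile


def resolve(name, search_path):
    """Return the file that `name` would be loaded from on `search_path`."""
    spec = importlib.machinery.PathFinder.find_spec(name, search_path)
    return None if spec is None else spec.origin


def demo_search_order():
    """Create two directories holding the same module name and resolve both ways."""
    with tempfile.TemporaryDirectory() as tmp:
        first = os.path.join(tmp, "first")
        second = os.path.join(tmp, "second")
        for directory in (first, second):
            os.mkdir(directory)
            # shadow_demo is a hypothetical module name used only in this sketch.
            with open(os.path.join(directory, "shadow_demo.py"), "w") as handle:
                handle.write("# placeholder module\n")
        winner_a = resolve("shadow_demo", [first, second])
        winner_b = resolve("shadow_demo", [second, first])
        # Report only the name of the directory each lookup picked.
        return (os.path.basename(os.path.dirname(winner_a)),
                os.path.basename(os.path.dirname(winner_b)))


print(demo_search_order())  # ('first', 'second')
```

The same ordering applies to sys.path: entries taken from PYTHONPATH are searched ahead of the standard library directories, while sys.path.append() places a directory behind them.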


HTH,
DaveA



Re: python parser overridden by pymol

2009-11-11 Thread Steven D'Aprano
On Wed, 11 Nov 2009 17:41:07 -0800, Jeremiah wrote:

> Hello,
> 
> I'm fairly new to python (version 2.5.4), and am writing a program which
> uses both pymol (version 1.2r1) and numpy (version 1.3.0) from debian.
> 
> It appears that when I add pymol to $PYTHONPATH, that parser.expr() is
> no longer available, and so I am unable to use numpy.load(). I have
> looked for where parser.expr() is defined in the python system so I
> could place that directory first in $PYTHONPATH, but I have not been
> able to find the file that defines expr().


>>> import parser
>>> parser.__file__
'/usr/lib/python2.5/lib-dynload/parsermodule.so'
>>> parser.expr
<built-in function expr>
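
The same check can be scripted. This is a small sketch for a current Python 3, where the parser module no longer exists in recent releases, so json stands in for it. The helper name module_origin is hypothetical.

```python
import importlib.util


def module_origin(name):
    """Return where `name` would be imported from, or None if it is not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


for name in ("json", "no_such_module_for_this_demo"):
    print(name, "->", module_origin(name))
```

If the reported location is inside an application directory rather than the Python installation, something on the path is shadowing the standard module.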




[...]
> However, a problem arises once $PYTHONPATH includes the pymol
> directory. To use the pymol api, I add the following to ~/.bashrc:
> 
>    PYMOL_PATH=/usr/lib/pymodules/python2.5/pymol
>    export PYMOL_PATH
>    PYTHONPATH=$PYMOL_PATH
>    export PYTHONPATH


Change that to 

PYMOL_PATH=/usr/lib/pymodules/python2.5


and it should work, assuming pymol uses a package, as it should. If it 
doesn't, if it's just a hodge-podge of loose modules in a directory, then 
they should be slapped with a wet fish for shadowing a standard library 
module.



-- 
Steven


Re: python parser overridden by pymol

2009-11-11 Thread Robert Kern

Jeremiah wrote:


> However, a problem arises once $PYTHONPATH includes the pymol
> directory. To use the pymol api, I add the following to ~/.bashrc:
>
>    PYMOL_PATH=/usr/lib/pymodules/python2.5/pymol
>    export PYMOL_PATH
>    PYTHONPATH=$PYMOL_PATH
>    export PYTHONPATH


Don't change your PYTHONPATH like that. You want to put 
/usr/lib/pymodules/python2.5 onto your PYTHONPATH and import PyMOL's stuff from 
the pymol package. I.e., instead of


  import api

Do

  from pymol import api

pymol is a package for precisely this reason.
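
To make the package point concrete, here is a hedged sketch for a current Python 3. It builds a throwaway package whose submodule has the same name as a standard library module, puts only the parent directory on sys.path, and shows that both modules stay reachable under different names. The package name pkgdemo and the function demo_package_import are invented for the example, and colorsys merely stands in for parser.

```python
import importlib
import os
import sys
import tempfile


def demo_package_import():
    """Import a stdlib module and a same-named package submodule side by side."""
    with tempfile.TemporaryDirectory() as parent:
        pkg_dir = os.path.join(parent, "pkgdemo")  # hypothetical package
        os.mkdir(pkg_dir)
        open(os.path.join(pkg_dir, "__init__.py"), "w").close()
        with open(os.path.join(pkg_dir, "colorsys.py"), "w") as handle:
            handle.write("WHO = 'pkgdemo version'\n")

        # Only the parent directory goes on the path, never pkg_dir itself.
        sys.path.insert(0, parent)
        try:
            importlib.invalidate_caches()
            stdlib_mod = importlib.import_module("colorsys")
            package_mod = importlib.import_module("pkgdemo.colorsys")
            return hasattr(stdlib_mod, "rgb_to_hsv"), package_mod.WHO
        finally:
            # Undo the changes so the demo leaves the interpreter as it found it.
            sys.path.remove(parent)
            sys.modules.pop("pkgdemo.colorsys", None)
            sys.modules.pop("pkgdemo", None)


print(demo_package_import())  # (True, 'pkgdemo version')
```

Had pkg_dir itself been placed at the front of the path, a plain import of colorsys could have picked up the package's file instead, which is the situation described in the original post.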

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
 that is made terrible by our own mad attempt to interpret it as though it had
 an underlying truth."
  -- Umberto Eco



python parser overridden by pymol

2009-11-11 Thread Jeremiah
Hello,

I'm fairly new to python (version 2.5.4), and am writing a program
which uses both pymol (version 1.2r1) and numpy (version 1.3.0) from
debian.

It appears that when I add pymol to $PYTHONPATH, that parser.expr() is
no longer available, and so I am unable to use numpy.load(). I have
looked for where parser.expr() is defined in the python system so I
could place that directory first in $PYTHONPATH, but I have not been
able to find the file that defines expr().

My reason for using numpy.load() is that I have a numpy array which
takes an hour to generate. Therefore, I'd like to use numpy.save() so
I could generate the array one time, and then load it later as needed
with numpy.load().

I've successfully tested the use of numpy.save() and numpy.load() with
a small example when the pymol path is not defined in $PYTHONPATH  :

   >>> import numpy
   >>> numpy.save('123',numpy.array([1,2,3]))
   >>> numpy.load('123.npy')
   array([1, 2, 3])


However, a problem arises once $PYTHONPATH includes the pymol
directory. To use the pymol api, I add the following to ~/.bashrc:

   PYMOL_PATH=/usr/lib/pymodules/python2.5/pymol
   export PYMOL_PATH
   PYTHONPATH=$PYMOL_PATH
   export PYTHONPATH

Once this is done, numpy.load() no longer works correctly, as pymol
contains a file named parser.py
( /usr/lib/pymodules/python2.5/pymol/parser.py ), which apparently
prevents python from using its native parser.

   >>> numpy.load('123.npy')
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
     File "/usr/lib/python2.5/site-packages/numpy/lib/io.py", line 195, in load
       return format.read_array(fid)
     File "/usr/lib/python2.5/site-packages/numpy/lib/format.py", line 353, in read_array
       shape, fortran_order, dtype = read_array_header_1_0(fp)
     File "/usr/lib/python2.5/site-packages/numpy/lib/format.py", line 250, in read_array_header_1_0
       d = safe_eval(header)
     File "/usr/lib/python2.5/site-packages/numpy/lib/utils.py", line 840, in safe_eval
       ast = compiler.parse(source, "eval")
     File "/usr/lib/python2.5/compiler/transformer.py", line 54, in parse
       return Transformer().parseexpr(buf)
     File "/usr/lib/python2.5/compiler/transformer.py", line 133, in parseexpr
       return self.transform(parser.expr(text))
   AttributeError: 'module' object has no attribute 'expr'

If I understand the problem correctly, can anyone tell me where
python.expr() is defined, or suggest a better method to fix this
problem?

Thanks,
Jeremiah


Re: Python parser

2009-03-04 Thread Alan G Isaac

Gabriel Genellina wrote:

> Do you mean the simpleparser project in Sourceforge?



http://simpleparse.sourceforge.net/

I thought this to be one of the most famous
and useful Python parsers, because of its
combination of simplicity and speed.
Anyway, it is *very* good, and not having
a version for 2.6 is quite unfortunate.

Alan


Re: Python parser

2009-03-04 Thread andrew cooke
Kay Schluehr wrote:
> You'll most likely need a GLR parser.

i'm not sure why you think this.  as far as i can tell, the OP needs a
parser that is suitable for whatever grammar they find (and the grammar
will probably be written for a particular parser, which may not be GLR).

however, if you are saying that only a GLR parser can parse natural
languages then i think you are wrong.  not only can grammars be rewritten
in different ways, but a recursive descent parser with appropriate
memoisation is capable of parsing "any" grammar.

see, for example, the second example at
http://www.acooke.org/lepl/advanced.html#memoisation - that is a
left-recursive, highly ambiguous grammar, and is parsed successfully with
recursive descent (as far as i can tell).  for more info see
http://www.acooke.org/lepl/implementation.html#memoisation and
http://www.cs.uwindsor.ca/~hafiz/p46-frost.pdf

in theory (if not currently in practice for my code, at least) it is also
efficient (in a "big O" sense).
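
A rough sketch of the general idea, for readers who want something to run. It is not lepl's implementation and not the algorithm from the linked paper: it is a simplified variant (an assumption made for this example) in which a memoised top-down pass is repeated until the memo table stops changing, so a left-recursive call simply reuses whatever end positions are already known. The grammar encoding (a dict of alternatives, with any symbol absent from the dict treated as a literal) and the function names are invented for the illustration.

```python
def end_positions(grammar, start, text):
    """Return every index at which `start` can finish matching from index 0."""
    memo = {}        # (symbol, position) -> set of end positions found so far
    visited = set()  # keys already expanded during the current pass
    changed = False

    def parse(symbol, pos):
        nonlocal changed
        if symbol not in grammar:  # anything not in the grammar is a literal
            return {pos + len(symbol)} if text.startswith(symbol, pos) else set()
        key = (symbol, pos)
        if key in visited:
            # Re-entrant (left-recursive) or repeated call: reuse known results.
            return memo.get(key, set())
        visited.add(key)
        found = set(memo.get(key, ()))
        for alternative in grammar[symbol]:
            positions = {pos}
            for part in alternative:
                positions = {end for p in positions for end in parse(part, p)}
            found |= positions
        if found != memo.get(key, set()):
            memo[key] = found
            changed = True
        return found

    while True:  # repeat the pass until no memo entry grows any further
        visited.clear()
        changed = False
        result = parse(start, 0)
        if not changed:
            return result


def recognises(grammar, start, text):
    """True if `start` can match the whole of `text`."""
    return len(text) in end_positions(grammar, start, text)


# S -> S S | 'a'   (left-recursive and highly ambiguous)
grammar = {"S": [["S", "S"], ["a"]]}
print(recognises(grammar, "S", "aaa"))  # True
print(recognises(grammar, "S", "aab"))  # False
```

This only recognises input; it builds no parse trees, and no claim is made here about how its cost compares with the approaches cited above.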

disclaimer - the lepl parser linked to is my own and that functionality is
very new (there's a beta version released, but it is buggy).  however,
that doesn't mean this is not possible

andrew




Re: Python parser

2009-03-03 Thread Kay Schluehr
On 2 Mar., 23:14, Clarendon wrote:
> Thank you, Lie and Andrew for your help.
>
> I have studied NLTK quite closely but its parsers seem to be only for
> demo. It has a very limited grammar set, and even a parser that is
> supposed to be "large" does not have enough grammar to cover common
> words like "I".
>
> I need to parse a large amount of texts collected from the web (around
> a couple hundred sentences at a time) very quickly, so I need a parser
> with a broad scope of grammar, enough to cover all these texts. This
> is what I mean by 'random'.
>
> An advanced programmer has advised me that Python is rather slow in
> processing large data, and so there are not many parsers written in
> Python. He recommends that I use Jython to use parsers written in
> Java. What are your views about this?
>
> Thank you very much.

You'll most likely need a GLR parser.

There is

http://www.lava.net/~newsham/pyggy/

which I tried once and found to be broken.

Then there is the Spark toolkit

http://pages.cpsc.ucalgary.ca/~aycock/spark/

I checked it out years ago and found it was very slow.

Then there is bison which can be used with a %glr-parser declaration
and PyBison bindings

http://www.freenet.org.nz/python/pybison/

Bison might be solid and fast. I can't say anything about the quality
of the bindings though.


Re: Python parser

2009-03-03 Thread Gabriel Genellina
On Tue, 03 Mar 2009 22:39:19 -0200, Alan G Isaac wrote:

> This reminds me: the SimpleParse developers ran into
> some troubles porting to Python 2.6.  It would be
> great if someone could give them a hand.


Do you mean the simpleparser project in Sourceforge? Latest alpha released  
in 2003? Or what?


--
Gabriel Genellina



Re: Python parser

2009-03-03 Thread Alan G Isaac

This reminds me: the SimpleParse developers ran into
some troubles porting to Python 2.6.  It would be
great if someone could give them a hand.

Alan Isaac


Re: Python parser

2009-03-02 Thread andrew cooke
Clarendon wrote:
[...]
> I need to parse a large amount of texts collected from the web (around
> a couple hundred sentences at a time) very quickly, so I need a parser
> with a broad scope of grammar, enough to cover all these texts. This
> is what I mean by 'random'.

so the most important things are that (1) the grammar be as large as
possible and (2) the parser be as fast as possible.  for something that
specific i would suggest you start by looking at what solutions exist for
*any* programming language and then choosing from what you find.

in short: you should be asking "natural language parsing people" and not
"python people".

sorry i can't be more help,
andrew




Re: Python parser

2009-03-02 Thread Robert Kern

On 2009-03-02 16:14, Clarendon wrote:

> Thank you, Lie and Andrew for your help.
>
> I have studied NLTK quite closely but its parsers seem to be only for
> demo. It has a very limited grammar set, and even a parser that is
> supposed to be "large" does not have enough grammar to cover common
> words like "I".
>
> I need to parse a large amount of texts collected from the web (around
> a couple hundred sentences at a time) very quickly, so I need a parser
> with a broad scope of grammar, enough to cover all these texts. This
> is what I mean by 'random'.
>
> An advanced programmer has advised me that Python is rather slow in
> processing large data, and so there are not many parsers written in
> Python. He recommends that I use Jython to use parsers written in
> Java. What are your views about this?


Let me clarify your request: you are asking for a parser of the English 
language, yes? Not just parsers in general? Not many English-language parsers 
are written in *any* language.


AFAIK, there is no English-language parser written in Python beyond those 
available in NLTK. There are probably none (in any language) which will robustly 
parse all of the grammatically correct English texts you will encounter by 
scraping the web, much less all of the incorrect English you will encounter.


Python can be rather slow for certain kinds of processing of large volumes (and 
really quite speedy for others). In this case, it's neither here nor there; the 
algorithms are reasonably slow in any language.


You may try your luck with link-grammar, which is implemented in C:

  http://www.abisource.com/projects/link-grammar/

Or The Stanford Parser, implemented in Java:

  http://nlp.stanford.edu/software/lex-parser.shtml

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
 that is made terrible by our own mad attempt to interpret it as though it had
 an underlying truth."
  -- Umberto Eco



Re: Python parser

2009-03-02 Thread Clarendon
Thank you, Lie and Andrew for your help.

I have studied NLTK quite closely but its parsers seem to be only for
demo. It has a very limited grammar set, and even a parser that is
supposed to be "large" does not have enough grammar to cover common
words like "I".

I need to parse a large amount of texts collected from the web (around
a couple hundred sentences at a time) very quickly, so I need a parser
with a broad scope of grammar, enough to cover all these texts. This
is what I mean by 'random'.

An advanced programmer has advised me that Python is rather slow in
processing large data, and so there are not many parsers written in
Python. He recommends that I use Jython to use parsers written in
Java. What are your views about this?

Thank you very much.





Re: Python parser

2009-03-02 Thread andrew cooke

if this is for natural language texts you may want to look at
http://www.nltk.org/

andrew

Clarendon wrote:
> Can somebody recommend a good parser that can be used in Python
> programs? I need a parser with large grammar that can cover a large
> amount of random texts.
>




Re: Python parser

2009-03-02 Thread Lie Ryan
Clarendon wrote:
> Can somebody recommend a good parser that can be used in Python
> programs?

Do you want a parser that can parse python source code or a parser that
works in python? If the latter, pyparsing is a popular choice. Ply is
another. There are many choices:
http://nedbatchelder.com/text/python-parsers.html

For simple parsing, the re module might be enough.
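
As a hedged illustration of how far the re module alone can go, here is a small tokenizer sketch for a current Python 3. The token categories and the tokenize function are invented for the example and would need adapting to whatever input is actually being parsed.

```python
import re

# One alternative per token kind; leading whitespace is skipped.
TOKEN = re.compile(
    r"\s*(?:(?P<number>\d+)|(?P<name>[A-Za-z_]\w*)|(?P<op>[-+*/()=]))"
)


def tokenize(text):
    """Split `text` into (kind, value) pairs, or raise ValueError."""
    tokens = []
    pos = 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError("unexpected input near position %d" % pos)
        # lastgroup is the name of the alternative that matched.
        tokens.append((match.lastgroup, match.group(match.lastgroup)))
        pos = match.end()
    return tokens


print(tokenize("total = 3 * (x + 42)"))
```

A token list like this is usually the point where either a few hand-written functions or one of the libraries listed above takes over.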

> I need a parser with large grammar that can cover a large
> amount of random texts.

Random text? Uh... what's the purpose of parsing random text?


Python parser

2009-03-02 Thread Clarendon
Can somebody recommend a good parser that can be used in Python
programs? I need a parser with large grammar that can cover a large
amount of random texts.

Thank you very much.


Re: Help needed with controlling the python parser

2006-01-10 Thread André
vinjvinj wrote:
> I use python to script my application. Users will be able to write
> their own python scripts which are then run on a grid of servers. I want
> to be able to capture syntax errors in submitted user scripts and then
> display them (with line numbers) back to the user.

I was going to wait for others, familiar with pylint, etc., to answer,
but it seems like no-one is attempting to answer your query.  So, here
goes a poor attempt ;-)

From all I have heard, if you are going to be concerned about safety,
you are pretty much out of luck.  However, assuming you still want to
try it, here's one potential (untested) way to do it:

    # Python 2 syntax, as used elsewhere in this thread
    try:
        exec self.code in MyGlobals  # Define your own dictionary for added safety...
    except Exception, info:
        if "invalid syntax" in info:   # to catch it and change the default message
            linenumber = info[1][1]
            print "An error was found on (or before) line: %d" % linenumber
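
For reference, a hedged sketch of the same idea in a current Python 3: compile() reports syntax errors without running the submitted code at all, and the SyntaxError it raises carries the line number. The function name check_syntax and the placeholder filename are assumptions made for the example.

```python
def check_syntax(source, filename="<user script>"):
    """Compile `source` without executing it; return (ok, message)."""
    try:
        compile(source, filename, "exec")
    except SyntaxError as err:
        message = "Syntax error on (or before) line %s: %s" % (err.lineno, err.msg)
        return False, message
    return True, ""


print(check_syntax("x = 1\n"))
print(check_syntax("x = 1\ny = = 2\n"))
```

This only answers the syntax question; it says nothing about whether the script is safe to run.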

>
> I also want to check for obvious things which I'm going to restrict in
> the code. Initially I would like to disallow any imports, and access to
> __* names. I understand that it is near impossible to make the scripts
> run in a completely restricted env.
You could try something like the following untested function:

    def ParseProgram(contents):
        bad_keywords = ["chr", "exec", "eval", "input", "raw_input", "import"]
        for word in bad_keywords:
            if word in contents:
                mesg = "Keyword or function not allowed: " + str(word)
                return False, mesg
        return True, ''

I would augment it with a regular expression to catch "__*".  [This is
left as an exercise to the reader ;-)]
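
A substring test like the one above also rejects harmless names that merely contain a listed word (for example a variable called "important"). As a hedged alternative for a current Python 3, the ast module can walk the parsed tree and flag only real import statements and double-underscore names. The function find_violations is invented for this sketch, and it is a lint-style check, not a sandbox: a determined user can still get around it.

```python
import ast


def find_violations(source):
    """Return sorted (line, description) pairs for imports and __ names."""
    problems = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            problems.append((node.lineno, "import statement"))
        elif isinstance(node, ast.Attribute) and node.attr.startswith("__"):
            problems.append((node.lineno, "access to attribute %r" % node.attr))
        elif isinstance(node, ast.Name) and node.id.startswith("__"):
            problems.append((node.lineno, "use of name %r" % node.id))
    return sorted(problems)


script = "import os\nimportant = 1\nx = ().__class__\n"
for line, what in find_violations(script):
    print("line %d: %s" % (line, what))
```

ast.parse() raises SyntaxError for scripts that do not compile, so the same call can also serve as the syntax check.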

>
> Is scripting a tool like pylint the way to go? Or is it fairly easy to
> control the python parser to do this?
>
I don't know what pylint can do for you in that regard.
As far as I know, it is near impossible to ensure that you can restrict
a determined user from doing nasty stuff.

> Thanks,
> 
> VJ
André



Help needed with controlling the python parser

2006-01-10 Thread vinjvinj
I use python to script my application. Users will be able to write
their own python scripts which are then run on a grid of servers. I want
to be able to capture syntax errors in submitted user scripts and then
display them (with line numbers) back to the user.

I also want to check for obvious things which I'm going to restrict in
the code. Initially I would like to disallow any imports, and access to
__* names. I understand that it is near impossible to make the scripts
run in a completely restricted env.

Is scripting a tool like pylint the way to go? Or is it fairly easy to
control the python parser to do this?

Thanks,

VJ



Re: seek python parser

2005-08-07 Thread Paul Boddie
I tend to use the compiler module:

http://docs.python.org/lib/compiler.html

With the output of the parsing functions I selectively inspect the AST
nodes using the named attributes listed in the documentation:

http://docs.python.org/lib/module-compiler.ast.html

For cases where one just needs to traverse the child nodes of an AST
node, the getChildNodes method is probably more helpful than the
getChildren method, which returns things like name strings and other
miscellaneous information mixed up with the list of nodes.
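
The compiler module described here belongs to Python 2. As a hedged sketch of the same kind of traversal in a current Python 3, the standard ast module can be turned into plain dictionaries and lists, which gets fairly close to the "dom-like" structure asked for. The function name to_plain and the "type" key are choices made for this example only.

```python
import ast


def to_plain(node):
    """Convert an ast node into nested dicts, lists and plain values."""
    if isinstance(node, ast.AST):
        result = {"type": type(node).__name__}
        for field, value in ast.iter_fields(node):
            result[field] = to_plain(value)
        return result
    if isinstance(node, list):
        return [to_plain(item) for item in node]
    return node  # strings, numbers, None and similar leaf values


tree = to_plain(ast.parse("x = 1"))
print(tree["type"])
print(tree["body"][0]["type"])
print(tree["body"][0]["targets"][0]["id"])
```

When only the child nodes matter and the scalar fields are noise, ast.iter_child_nodes() plays roughly the role that getChildNodes plays above.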

If you want a genuine XML DOM version of a Python AST, the following
project might provide you with a solution:

http://pysch.sourceforge.net/ast.html

See also:

http://uucode.com/texts/genxml/genxml.html

Paul



seek python parser

2005-08-06 Thread Evil Bastard
Hi,

Does anyone know of a python source parser program which can return the
parsed source file as a coherent dom-like data structure?

I'm playing around with ast, and the output of ast.tolist(), but it's
got a lot of chaff, and would need a lot of hacking to break it down to
simple data structures

All recommendations appreciated

-- 
Cheers
EB

--

One who is not a conservative by age 20 has no brain.
One who is not a liberal by age 40 has no heart.


Re: python parser

2005-07-12 Thread Bengt Richter
On Tue, 12 Jul 2005 13:30:14 -0700, Robert Kern <[EMAIL PROTECTED]> wrote:

>tuxlover wrote:
>> Hello everyone
>> 
>> I have to write a verilog parser in python for a class project. I was
>> wondering if all you folks could advise me on choosing the right python
>> parser module. I am not comfortable with lex/yacc and as a result find
>> myself struggling with any module which uses lex/yacc syntax/philosophy.
>> pyparser looks good to me, but before I dive into it, I would really
>> appreciate feedback from members of this group
>
>A Verilog parser has been written using pyparsing at least once before, 
>so I imagine that it shouldn't be too difficult to do so again. Of 
>course, if you just need *a* Verilog parser, not necessarily one written 
>by you, you could just email the guy who wrote it and ask him for a 
>copy. Grep
>
>   http://pyparsing.sourceforge.net/
>
>for "Verilog".
>
or google for
verilog site:sourceforge.net

BTW googling for
verilog site:pyparsing.sourceforge.net
will only get one hit (maybe less if I typoed again ;-)

Regards,
Bengt Richter


Re: python parser

2005-07-12 Thread Robert Kern
tuxlover wrote:
> Hello everyone
> 
> I have to write a verilog parser in python for a class project. I was
> wondering if all you folks could advise me on choosing the right python
> parser module. I am not comfortable with lex/yacc and as a result find
> myself struggling with any module which uses lex/yacc syntax/philosophy.
> pyparser looks good to me, but before I dive into it, I would really
> appreciate feedback from members of this group

A Verilog parser has been written using pyparsing at least once before, 
so I imagine that it shouldn't be too difficult to do so again. Of 
course, if you just need *a* Verilog parser, not necessarily one written 
by you, you could just email the guy who wrote it and ask him for a 
copy. Grep

   http://pyparsing.sourceforge.net/

for "Verilog".

-- 
Robert Kern
[EMAIL PROTECTED]

"In the fields of hell where the grass grows high
  Are the graves of dreams allowed to die."
   -- Richard Harter



Re: python parser

2005-07-12 Thread matt
I recently was successful using pyparsing after messing around with ply
for a few hours.  See my blog for more details (
http://panela.blog-city.com/icfp_contest_implementation_in_python_notes.htm
).

I personally corresponded with the author and he was very helpful as
well, giving me useful critiques and feedback.  The next time I'm
parsing something more complex than a tab-delimited file (excluding xml
:)) I'll probably use pyparsing.  I found it very pythonic and easy to
use.

good luck parsing...
matt



Re: python parser

2005-07-12 Thread Christopher Subich
tuxlover wrote:
> I have to write a verilog parser in python for a class project. I was
> wondering if all you folks could advise me on choosing the right python
> parser module. I am not comfortable with lex/yacc and as a result find
> myself struggling with any module which uses lex/yacc syntax/philosophy.
> pyparser looks good to me, but before I dive into it, I would really
> appreciate feedback from members of this group

I've had good luck with DParser for Python 
(http://staff.washington.edu/sabbey/dy_parser/index.html); in fact, it 
might even be a very easy translation from a premade Verilog grammar to 
a DParser grammar (Google search if you don't have BNF for Verilog already).

Two caveats come to mind, though; documentation isn't as newbie-friendly 
as it could be, and DParser requires a binary library -- it's not 
Python-only, which might matter for your project.


python parser

2005-07-12 Thread tuxlover
Hello everyone

I have to write a verilog parser in python for a class project. I was
wondering if all you folks could advise me on choosing the right python
parser module. I am not comfortable with lex/yacc and as a result find
myself struggling with any module which uses lex/yacc syntax/philosophy.
pyparser looks good to me, but before I dive into it, I would really
appreciate feedback from members of this group

Thanks
Tuxlover
