[Tutor] extracting information (images and text) from a PDF and creating a database from it
I need to build a database from some PDFs. I need to extract the logos as well as the information (i.e. name, address) beneath each logo and store it in the database. The logo can be text as well as a picture, as shown in two screenshots of one of the sample PDF files: http://imagebin.org/77378 http://imagebin.org/77379

Would converting to HTML be a good option? Later on I need to apply some image processing too. What would be the ideal way to approach this?

___
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
[Tutor] computer basics
I am learning Python slowly. I would like to begin learning all about how computers work from the bottom up. I have an understanding of binary code. Where should I go from here? Can you suggest further reading, online or off, to continue my education?

Thank you,
Richard
Re: [Tutor] Python and Computational Geometry
Googling "python computational geometry" points to
http://www.cgal.org/
and
http://cgal-python.gforge.inria.fr/

Kent

On Mon, Dec 28, 2009 at 6:13 PM, Abdulhafid Igor Ryabchuk wrote:
> Dear Pythonistas,
>
> I am starting a small project that centres around implementation of
> computational geometry algorithms. I was wondering if there are any
> particular Python modules I should have a look at.
>
> Regards,
>
> AH
[Tutor] Python and Computational Geometry
Dear Pythonistas,

I am starting a small project that centres around implementation of computational geometry algorithms. I was wondering if there are any particular Python modules I should have a look at.

Regards,

AH
Re: [Tutor] using mechanize to authenticate and pull data out of site
Hello, thank you all for the replies.

On Mon, Dec 28, 2009 at 10:21 AM, Rich Lovely wrote:
> 2009/12/26 Norman Khine :
>> Hello,
>>
>> I am trying to authenticate on http://commerce.sage.com/Solidarmonde/
>> using urllib but have a problem in that there are some hidden fields
>> that use javascript to create a security token which then is passed to
>> the submit button and to the header.
>>
>> Here is the output of the LiveHeader during authentication
>>
>> http://paste.lisp.org/display/92656
>>
>> Here is what I have so far:
>>
>> http://paste.lisp.org/+1ZHS/1
>>
>> print results
>> But the page returned prints out that the session is out of time.
>>
>> Here are details of the forms:
>>
>> http://paste.lisp.org/+1ZHS/2
>>
>> Any help much appreciated.
>>
>> Norman
>
> The first thing to try is to attempt to login with javascript
> disabled. If it will let you do that, transfer the relevant form info
> to the mechanize browser, and it should be fine.

It does not work; I need JavaScript enabled in order to log in.

> If not, you will need to look through all of the javascript files, to
> find out which one generates/receives the security token. Looking at
> it, the element will be called "_xmlToken".

Looking at the JavaScript (http://paste.lisp.org/+1ZHS/4), the function 'browser_localForm_form_onsubmit' has a contextKey that is passed to it.

I think the verification between the two tokens comes from:

    securityToken = _browser.getElement("_xmlToken");
    document.localForm.__sgx_contextSecurity.value = securityToken.value;

Also, there seem to be a lot of hash keys being generated at the beginning of the JavaScript files; here are some examples: http://paste.lisp.org/+1ZHS/3

> The "xml" suggests that it might be received over ajax, which means
> you will need to find the page that it comes from, and fake an ajax
> request to it - fortunately, this is just a simple http request, much
> like you are already doing - it's just handled under the surface by
> javascript.

How would I fake the AJAX request before I submit the form? Everything seems to come from this page: /solidarmonde/defaultsgx.asp

Thanks

> --
> Rich "Roadie Rich" Lovely
>
> There are 10 types of people in the world: those who know binary,
> those who do not, and those who are off by one.

--
%>>> "".join( [ {'*':'@','^':'.'}.get(c,None) or chr(97+(ord(c)-83)%26) for c in ",adym,*)&uzq^zqf" ] )
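
The two JavaScript lines quoted in the message above copy the value of the `_xmlToken` element into the hidden `__sgx_contextSecurity` form field before the form is submitted. A minimal Python 3 sketch of doing the same copy by hand is shown below. It assumes the token value is already present in the fetched HTML as a `value` attribute, which may not hold if the site fills it in from script. The sample HTML and the `user` field name are hypothetical and only there to make the example runnable.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class FormFieldParser(HTMLParser):
    """Collect named <input> fields and the value of the token element."""

    def __init__(self, token_id="_xmlToken"):
        super().__init__()
        self.token_id = token_id
        self.fields = {}
        self.token = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Any element whose id or name matches is treated as the token holder.
        if self.token_id in (attrs.get("id"), attrs.get("name")):
            self.token = attrs.get("value")
        # Only inputs with a name attribute are submitted with a form.
        if tag == "input" and attrs.get("name"):
            self.fields[attrs["name"]] = attrs.get("value") or ""


def build_form_data(html, extra_fields):
    """Return the form fields with the token copied into the hidden field."""
    parser = FormFieldParser()
    parser.feed(html)
    fields = dict(parser.fields)
    if parser.token is not None:
        # Same effect as the quoted JavaScript assignment.
        fields["__sgx_contextSecurity"] = parser.token
    fields.update(extra_fields)
    return fields


# Hypothetical page fragment, not taken from the real site.
SAMPLE_HTML = """
<form name="localForm">
  <input type="hidden" id="_xmlToken" value="abc123">
  <input type="hidden" name="__sgx_contextSecurity" value="">
  <input type="text" name="user">
</form>
"""

form_data = build_form_data(SAMPLE_HTML, {"user": "norman"})
encoded = urlencode(form_data)  # body for a POST request
print(encoded)
```

If the token turns out to be generated or fetched by script rather than embedded in the page, the parser will find no value and the hidden field is left as it was, so the output is worth checking before submitting anything.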
Re: [Tutor] using mechanize to authenticate and pull data out of site
2009/12/26 Norman Khine :
> Hello,
>
> I am trying to authenticate on http://commerce.sage.com/Solidarmonde/
> using urllib but have a problem in that there are some hidden fields
> that use javascript to create a security token which then is passed to
> the submit button and to the header.
>
> Here is the output of the LiveHeader during authentication
>
> http://paste.lisp.org/display/92656
>
> Here is what I have so far:
>
> http://paste.lisp.org/+1ZHS/1
> print results
> But the page returned prints out that the session is out of time.
>
> Here are details of the forms:
>
> http://paste.lisp.org/+1ZHS/2
>
> Any help much appreciated.
>
> Norman

The first thing to try is to attempt to log in with JavaScript disabled. If it will let you do that, transfer the relevant form info to the mechanize browser, and it should be fine.

If not, you will need to look through all of the JavaScript files to find out which one generates/receives the security token. Looking at it, the element will be called "_xmlToken".

The "xml" suggests that it might be received over AJAX, which means you will need to find the page that it comes from, and fake an AJAX request to it. Fortunately, this is just a simple HTTP request, much like you are already doing; it's just handled under the surface by JavaScript.

--
Rich "Roadie Rich" Lovely

There are 10 types of people in the world: those who know binary, those who do not, and those who are off by one.
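
As a rough illustration of "faking" an AJAX request, here is a minimal Python 3 sketch using only the standard library. It builds an ordinary HTTP request and adds the `X-Requested-With: XMLHttpRequest` header that many JavaScript libraries send; whether this particular site checks that header is an assumption. The cookie-aware opener is there so that the session cookie from the token request is reused for the later login POST. `TOKEN_URL` and the helper names are hypothetical placeholders, and the network call is left commented out.

```python
import http.cookiejar
import urllib.request
from urllib.parse import urlencode


def make_session():
    """Return an opener that keeps cookies across requests, plus its jar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def ajax_request(url, data=None, referer=None):
    """Build a request that looks like one sent by browser-side script.

    With data=None this is a GET; with a dict of fields it becomes a POST.
    """
    body = urlencode(data).encode("ascii") if data else None
    req = urllib.request.Request(url, data=body)
    # Header commonly added by JavaScript libraries; the server may ignore it.
    req.add_header("X-Requested-With", "XMLHttpRequest")
    if referer:
        req.add_header("Referer", referer)
    return req


# Hypothetical placeholder: the real page has to be found in the JavaScript.
TOKEN_URL = "http://example.com/token-page"

opener, jar = make_session()
token_req = ajax_request(TOKEN_URL, referer="http://example.com/")

# Sending the request needs network access, so it is not done here:
# response = opener.open(token_req)
# token_page = response.read()
```

Using the same `opener` for the token request and the login POST matters because a "session is out of time" page is often what a server returns when the second request arrives without the cookie set by the first.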