Re: [Tutor] Python C extension - which method?
Hi,

Brad M wrote on 04.05.2018 at 11:30:
> I want to create a C-based memory scanner for Python, and so far this is
> how I do it:
>
> Python:
>
> from ctypes import cdll
> mydll = cdll.LoadLibrary('hello.dll')
> print(mydll.say_something())
>
> and hello.dll:
>
> #include
> __declspec(dllexport) int say_something()
> {
>     return 1980;
> }
>
> so the printout is "1980"
>
> Is this alright?

Depends on your needs and your C/C++ knowledge.

If you have a shared library that provides the ready-made functionality,
and accessing that native code at all is more important than calling it
very quickly (e.g. you only do a few longish-running calls into it), then
wrapping a shared library with ctypes (or preferably cffi) is a good way
to do it.

Otherwise, try either a native wrapper generator like pybind11, or write
your wrapper in Cython.

Specifically, if you are not just calling into an external library 1:1,
but need to do (or can benefit from doing) non-trivial operations in
native code, definitely use Cython.

http://cython.org

> I am aware that there is another much more complicated
> method such as this:
>
> https://tutorialedge.net/python/python-c-extensions-tutorial/#building-and-installing-our-module

Well, yes, it exists, but I advise against wrapping C code manually that
way. It's just too cumbersome and error-prone. Leave it to the experts who
have already written their tools for you.

Stefan

Disclosure: I'm a Cython core dev, so I'm biased and I absolutely know
what I'm talking about.
___
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
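For anyone following the ctypes route discussed in this message, here is a minimal sketch, assuming a POSIX or Windows system. Since hello.dll is not available outside the original poster's machine, the sketch loads the platform's C runtime as a stand-in and calls strlen from it. The helper name load_c_runtime is hypothetical, not part of ctypes. Declaring argtypes and restype is not something the original post does, but it avoids relying on the ctypes default of treating return values as C int.

```python
import ctypes
import ctypes.util
import sys


def load_c_runtime():
    """Load the platform's C runtime (hypothetical helper, a stand-in for hello.dll)."""
    if sys.platform == "win32":
        # Assumption: the Microsoft C runtime is available under this name.
        return ctypes.CDLL("msvcrt")
    # find_library may return None; CDLL(None) then exposes the symbols
    # already loaded into the current process, which include the C library.
    return ctypes.CDLL(ctypes.util.find_library("c"))


libc = load_c_runtime()

# Tell ctypes the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

# bytes objects are passed as NUL-terminated char pointers.
length = libc.strlen(b"hello")
print(length)
```

The same two declarations would apply to a function exported from your own DLL: set argtypes to the list of parameter types and restype to the return type before the first call.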
Re: [Tutor] Extract main text from HTML document
On Sat, May 5, 2018 at 12:59 PM, Simon Connah wrote:

> I was wondering if there was a way in which I could download a web
> page and then just extract the main body of text without all of the
> HTML.

I do not have any experience with this, but I like to collect books. One
of them [1] says on page 245:

"Beautiful Soup is a module for extracting information from an HTML page
(and is much better for this purpose than regular expressions)."

I believe this topic has come up before on this list as well as the main
Python list. You may want to check it out. It can be installed with pip.

[1] "Automate the Boring Stuff with Python -- Practical Programming for
Total Beginners" by Al Sweigart.

HTH!

-- 
boB
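To give a rough idea of what stripping the markup involves, here is a small sketch that uses only the standard library's html.parser instead of Beautiful Soup, so it runs without installing anything. The names TextExtractor and extract_text and the sample page are made up for illustration. It collects text nodes and skips everything inside script and style elements. It does not try to work out which part of a page is the "main" body, which is the harder problem.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, ignoring the contents of <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0   # > 0 while inside a skipped element
        self._chunks = []      # non-empty pieces of text, in document order

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            text = data.strip()
            if text:
                self._chunks.append(text)

    def get_text(self):
        # Join with spaces so words split by inline tags stay separated.
        return " ".join(self._chunks)


def extract_text(html):
    """Return the text content of an HTML string (hypothetical helper)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.get_text()


sample = """<html><head><title>Demo</title>
<style>p { color: red; }</style>
<script>var x = 1;</script></head>
<body><h1>Heading</h1><p>First <b>bold</b> paragraph.</p></body></html>"""

text = extract_text(sample)
print(text)
```

With Beautiful Soup installed, the parsing class would not be needed, but the idea of dropping script and style content before taking the text is the same.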
Re: [Tutor] Extract main text from HTML document
On 05/05/2018 11:59 AM, Simon Connah wrote:
> Hi,
>
> I'm writing a very simple web scraper. It'll download a page from a
> website and then store the result in a database of some sort. The
> problem is that this will obviously include a whole heap of HTML,
> JavaScript and maybe even some CSS. None of which is useful to me.
>
> I was wondering if there was a way in which I could download a web
> page and then just extract the main body of text without all of the
> HTML.
>
> The title is obviously easy but the main body of text could contain
> all sorts of HTML and I'm interested to know how I might go about
> removing the bits that are not needed but still keep the meaning of
> the document intact.
>
> Does anyone have any suggestions on this front at all?

There's so much prior art in this space that it's not really worth
reinventing this, unless you're using it as an exercise to teach yourself
more Python (always a worthy goal!).

Here's one guy's summary of _some_ of the existing practice, albeit
probably the best known:

https://elitedatascience.com/python-web-scraping-libraries
[Tutor] Extract main text from HTML document
Hi,

I'm writing a very simple web scraper. It'll download a page from a
website and then store the result in a database of some sort. The
problem is that this will obviously include a whole heap of HTML,
JavaScript and maybe even some CSS. None of which is useful to me.

I was wondering if there was a way in which I could download a web
page and then just extract the main body of text without all of the
HTML.

The title is obviously easy but the main body of text could contain
all sorts of HTML and I'm interested to know how I might go about
removing the bits that are not needed but still keep the meaning of
the document intact.

Does anyone have any suggestions on this front at all?

Thanks for any help.

Simon.
[Tutor] Figuring out selective actions in Python
Hello,

I'm trying to figure out how to do "blank in blank" things. For example,
if I want to delete 5 MB (or anything) for every 20 MB, what would the
code look like? I'm essentially trying to perform an action on one part
of a sequence out of the entire sequence.

Thank you for your help
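The question leaves some room for interpretation, so the following is only a sketch of two possible readings. The function names are invented for illustration. The first walks over the data in fixed-size blocks and drops the leading part of each block (5 out of every 20 in the example, using single bytes in place of megabytes). The second picks every n-th item of a sequence by counting positions with enumerate and the modulo operator.

```python
def drop_from_each_block(data, block_size, drop_size):
    """Return data with the first drop_size bytes of every block removed."""
    if block_size <= 0 or not 0 <= drop_size <= block_size:
        raise ValueError("need block_size > 0 and 0 <= drop_size <= block_size")
    kept = bytearray()
    for start in range(0, len(data), block_size):
        block = data[start:start + block_size]
        kept += block[drop_size:]   # keep what follows the dropped part
    return bytes(kept)


def every_nth(items, n):
    """Return every n-th item (the n-th, 2n-th, ...), counting from 1."""
    return [item for index, item in enumerate(items, start=1) if index % n == 0]


# 40 bytes in blocks of 20, dropping 5 from each block, leaves 30 bytes.
result = drop_from_each_block(bytes(range(40)), block_size=20, drop_size=5)
print(len(result))

# Act on one position out of every four.
picked = every_nth(list(range(1, 11)), 4)
print(picked)
```

For real files of that size you would read and write block by block instead of holding everything in memory, but the counting logic stays the same.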
[Tutor] Python C extension - which method?
Hi all:

I want to create a C-based memory scanner for Python, and so far this is
how I do it:

Python:

    from ctypes import cdll
    mydll = cdll.LoadLibrary('hello.dll')
    print(mydll.say_something())

and hello.dll:

    #include
    __declspec(dllexport) int say_something()
    {
        return 1980;
    }

so the printout is "1980"

Is this alright? I am aware that there is another much more complicated
method such as this:

https://tutorialedge.net/python/python-c-extensions-tutorial/#building-and-installing-our-module

Is my method alright?

Thanks!