Re: [Tutor] Python C extension - which method?

2018-05-05 Thread Stefan Behnel
Hi,

Brad M wrote on 04.05.2018 at 11:30:
> I want to create a C-based memory scanner for Python, and so far this is
> how I do it:
> 
> Python:
> 
> from ctypes import cdll
> mydll = cdll.LoadLibrary('hello.dll')
> print(mydll.say_something())
> 
> and hello.dll:
> 
> #include 
> __declspec(dllexport) int say_something()
> {
>     return 1980;
> }
> 
> so the printout is "1980"
> 
> Is this alright?


Depends on your needs and your C/C++ knowledge.

If you have a shared library that provides the ready-made functionality,
and accessing that native code at all is more important than calling it
very quickly (e.g. you only do a few longish-running calls into it), then
wrapping a shared library with ctypes (or preferably cffi) is a good way to
do it.
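
As a minimal sketch of the ctypes route, it helps to declare the C signature
explicitly instead of relying on the default int return type. Since hello.dll
is not available here, the example uses ctypes.pythonapi (the already-loaded
CPython library) purely as a stand-in; with your own library you would use
cdll.LoadLibrary('hello.dll') and set restype/argtypes on say_something in
the same way.

```python
import ctypes

# Stand-in for a library such as hello.dll: ctypes.pythonapi is a handle to
# the CPython shared library that is already loaded in this process.
lib = ctypes.pythonapi

# Declare the C signature explicitly: const char *Py_GetVersion(void).
# Without this, ctypes would assume the function returns a C int.
lib.Py_GetVersion.argtypes = []
lib.Py_GetVersion.restype = ctypes.c_char_p

# The call returns bytes because restype is c_char_p; decode for printing.
version = lib.Py_GetVersion().decode("ascii")
print(version)
```

For a function like say_something() that really does return an int, the
default happens to be right, but anything returning a pointer, a 64-bit
value or a double needs the explicit declaration.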

Otherwise, try either a native wrapper generator like pybind11, or write
your wrapper in Cython.

Specifically, if you are not just calling into an external library 1:1, but
need to do (or can benefit from doing) non-trivial operations in native
code, definitely use Cython.

http://cython.org


> I am aware that there is another much more complicated
> method such as this:
> 
> https://tutorialedge.net/python/python-c-extensions-tutorial/#building-and-installing-our-module

Well, yes, it exists, but I advise against wrapping C code manually that
way. It's just too cumbersome and error prone. Leave it to the experts who
have already written their tools for you.

Stefan


Disclosure: I'm a Cython core dev, so I'm biased and I absolutely know what
I'm talking about.

___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Extract main text from HTML document

2018-05-05 Thread boB Stepp
On Sat, May 5, 2018 at 12:59 PM, Simon Connah wrote:

> I was wondering if there was a way in which I could download a web
> page and then just extract the main body of text without all of the
> HTML.

I do not have any experience with this, but I like to collect books.
One of them [1] says on page 245:

"Beautiful Soup is a module for extracting information from an HTML
page (and is much better for this purpose than regular expressions)."

I believe this topic has come up before on this list as well as the
main Python list.  You may want to check Beautiful Soup out.  It can be
installed with pip.
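
To give a rough idea of what "extracting the text" involves, here is a small
sketch that uses only the standard library's html.parser instead of Beautiful
Soup (a deliberate substitution so that it runs without installing anything).
The class name TextExtractor, the helper extract_text and the sample HTML are
made up for illustration. It simply collects text nodes and skips the contents
of <script> and <style>; it makes no attempt to find the "main" body of a real
page.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, ignoring anything inside <script> or <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0   # > 0 while inside a skipped element
        self.chunks = []       # non-empty pieces of text, in document order

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def extract_text(html):
    """Return the text of an HTML string, one text node per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


# Hypothetical sample page, standing in for a downloaded document.
sample = """<html><head><title>Demo</title>
<style>p { color: red; }</style></head>
<body><h1>Heading</h1><p>First <b>bold</b> paragraph.</p>
<script>var x = 1;</script></body></html>"""

result = extract_text(sample)
print(result)
```

Real pages also carry navigation, adverts and footers as ordinary text, which
is where the dedicated libraries earn their keep.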

[1] "Automate the Boring Stuff with Python -- Practical Programming
for Total Beginners" by Al Sweigart.

HTH!
-- 
boB


Re: [Tutor] Extract main text from HTML document

2018-05-05 Thread Mats Wichmann
On 05/05/2018 11:59 AM, Simon Connah wrote:
> Hi,
> 
> I'm writing a very simple web scraper. It'll download a page from a
> website and then store the result in a database of some sort. The
> problem is that this will obviously include a whole heap of HTML,
> JavaScript and maybe even some CSS. None of which is useful to me.
> 
> I was wondering if there was a way in which I could download a web
> page and then just extract the main body of text without all of the
> HTML.
> 
> The title is obviously easy but the main body of text could contain
> all sorts of HTML and I'm interested to know how I might go about
> removing the bits that are not needed but still keep the meaning of
> the document intact.
> 
> Does anyone have any suggestions on this front at all?

There's so much prior art in this space that it's not really worth
reinventing, unless you're using it as an exercise to teach
yourself more Python (always a worthy goal!)

Here's one person's summary of _some_ of the existing libraries,
probably the best-known ones.

https://elitedatascience.com/python-web-scraping-libraries




[Tutor] Extract main text from HTML document

2018-05-05 Thread Simon Connah
Hi,

I'm writing a very simple web scraper. It'll download a page from a
website and then store the result in a database of some sort. The
problem is that this will obviously include a whole heap of HTML,
JavaScript and maybe even some CSS. None of which is useful to me.

I was wondering if there was a way in which I could download a web
page and then just extract the main body of text without all of the
HTML.

The title is obviously easy but the main body of text could contain
all sorts of HTML and I'm interested to know how I might go about
removing the bits that are not needed but still keep the meaning of
the document intact.

Does anyone have any suggestions on this front at all?

Thanks for any help.

Simon.


[Tutor] Figuring out selective actions in Python

2018-05-05 Thread Daniel Bosah
Hello,

I'm trying to figure out how to do "X out of every Y" things. For example, if I
want to delete 5 MB (or any amount) out of every 20 MB, what would the code
look like? I'm essentially trying to perform an action on one part of each
cycle within a longer sequence.
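
One possible reading of this, sketched under the assumption that the data can
be treated as a sequence of equal-sized units (say, 1 MB chunks): use the
remainder operator to mark the first 5 units of every block of 20. The
function name select_every and its parameters are invented for the example.

```python
def select_every(items, cycle=20, take=5):
    """Yield (item, selected) pairs.

    selected is True for the first `take` items of every `cycle` items.
    """
    for index, item in enumerate(items):
        # index % cycle counts 0..cycle-1 and then starts over.
        yield item, index % cycle < take


# Pretend each number is one 1 MB chunk; drop the selected ones, keep the rest.
chunks = range(40)
kept = [chunk for chunk, selected in select_every(chunks) if not selected]
print(len(kept), "of", len(chunks), "chunks kept")
```

Swapping the condition (for example, index % cycle >= cycle - take) would act
on the last 5 of every 20 instead.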


Thank you for your help


[Tutor] Python C extension - which method?

2018-05-05 Thread Brad M
Hi all:

I want to create a C-based memory scanner for Python, and so far this is
how I do it:

Python:

from ctypes import cdll
mydll = cdll.LoadLibrary('hello.dll')
print(mydll.say_something())



and hello.dll:

#include 
__declspec(dllexport) int say_something()
{
    return 1980;
}


so the printout is "1980"


Is this alright? I am aware that there is another much more complicated
method such as this:

https://tutorialedge.net/python/python-c-extensions-tutorial/#building-and-installing-our-module



Is my method alright?

Thanks!