[Tutor] Python Websocket Server

2019-02-14 Thread Simon Connah

Hi,

I was wondering what the best practice is for writing WebSocket servers in
Python in 2019. I found an old example on the web that used the tornado
library, but it talked about Chrome 22 as the client, which is ancient now,
so I'm not sure whether things have changed.


Any suggestions on the best library to use would be gratefully accepted.
I'd ideally like to be able to do it in an asynchronous manner.


___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Extract main text from HTML document

2018-05-07 Thread Simon Connah
That looks like a useful combination. Thanks.

On 6 May 2018 at 17:32, Mark Lawrence wrote:
> On 05/05/18 18:59, Simon Connah wrote:
>>
>> Hi,
>>
>> I'm writing a very simple web scraper. It'll download a page from a
>> website and then store the result in a database of some sort. The
>> problem is that this will obviously include a whole heap of HTML,
>> JavaScript and maybe even some CSS. None of which is useful to me.
>>
>> I was wondering if there was a way in which I could download a web
>> page and then just extract the main body of text without all of the
>> HTML.
>>
>> The title is obviously easy but the main body of text could contain
>> all sorts of HTML and I'm interested to know how I might go about
>> removing the bits that are not needed but still keep the meaning of
>> the document intact.
>>
>> Does anyone have any suggestions on this front at all?
>>
>> Thanks for any help.
>>
>> Simon.
>
>
> A combination of requests http://docs.python-requests.org/en/master/ and
> beautiful soup https://www.crummy.com/software/BeautifulSoup/bs4/doc/ should
> fit the bill.  Both are installable with pip and are regarded as best of
> breed.
>
> --
> My fellow Pythonistas, ask not what our language can do for you, ask
> what you can do for our language.
>
> Mark Lawrence


Re: [Tutor] Extract main text from HTML document

2018-05-06 Thread Simon Connah
Thanks for the replies, everyone. Beautiful Soup looks like a good option.

My primary goal is to extract the main body text, the title and the
meta description from a web page and run them through one of the cloud
natural language processing services to pull out the information I'm
after, and I'd like to do that for quite a few websites.

This is all for a little project I have in mind. I'm not even sure if
it'll work but it'll be fun to try. I might have to do some custom
work on top of what Beautiful Soup offers though as I need to get very
specific data in a certain format.
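
A rough sketch of the extraction described above (title, meta description, visible text). It uses the standard library's html.parser instead of Beautiful Soup purely so that it runs with nothing installed; the TextExtractor class, the extract() helper and the sample HTML are all made up for illustration. It only strips markup and makes no attempt to tell the "main" content apart from navigation or sidebars.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the title, meta description and visible text of a page."""

    # Contents of these tags are never shown to a reader.
    SKIPPED = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self.chunks = []
        self._skip_depth = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True
        elif tag == "meta":
            attrs = dict(attrs)
            if (attrs.get("name") or "").lower() == "description":
                self.description = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        text = data.strip()
        if not text or self._skip_depth:
            return
        if self._in_title:
            self.title += text
        else:
            self.chunks.append(text)


def extract(html):
    """Return (title, description, body_text) for an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.title, parser.description, " ".join(parser.chunks)


sample = """<html><head><title>Example page</title>
<meta name="description" content="A short summary.">
<style>p { color: red; }</style></head>
<body><script>var x = 1;</script>
<p>Hello world.</p>
<p>Second paragraph.</p></body></html>"""

title, description, body = extract(sample)
print(title)
print(description)
print(body)
```

With Beautiful Soup the same idea is usually much shorter, but the shape is the same: drop script and style content, read the title and meta tag, and keep the remaining text.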

On 5 May 2018 at 22:43, boB Stepp wrote:
> On Sat, May 5, 2018 at 12:59 PM, Simon Connah wrote:
>
>> I was wondering if there was a way in which I could download a web
>> page and then just extract the main body of text without all of the
>> HTML.
>
> I do not have any experience with this, but I like to collect books.
> One of them [1] says on page 245:
>
> "Beautiful Soup is a module for extracting information from an HTML
> page (and is much better for this purpose than regular expressions)."
>
> I believe this topic has come up before on this list as well as the
> main Python list.  You may want to check it out.  It can be installed
> with pip.
>
> [1] "Automate the Boring Stuff with Python -- Practical Programming
> for Total Beginners" by Al Sweigart.
>
> HTH!
> --
> boB


[Tutor] Extract main text from HTML document

2018-05-05 Thread Simon Connah
Hi,

I'm writing a very simple web scraper. It'll download a page from a
website and then store the result in a database of some sort. The
problem is that this will obviously include a whole heap of HTML,
JavaScript and maybe even some CSS. None of which is useful to me.

I was wondering if there was a way in which I could download a web
page and then just extract the main body of text without all of the
HTML.

The title is obviously easy but the main body of text could contain
all sorts of HTML and I'm interested to know how I might go about
removing the bits that are not needed but still keep the meaning of
the document intact.

Does anyone have any suggestions on this front at all?

Thanks for any help.

Simon.


[Tutor] Async TCP Server

2018-04-25 Thread Simon Connah
Hi,

I've come up with an idea for a new protocol I want to implement in
Python using 3.6 (or maybe 3.7 when that comes out), but I'm somewhat
confused about how to do it in an async way.

The way I understand it is that you have a loop that waits for an
incoming request and then calls a function/method asynchronously which
handles the incoming request. While that is happening the main event
loop is still listening for incoming connections.

Is that roughly correct?
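
For illustration, the model described above might look roughly like this with asyncio streams. This is a minimal sketch, not a chat protocol: handle_client and demo are invented names, the "echo: " reply is a placeholder for real message handling, and asyncio.run() needs Python 3.7 or later (on 3.6 the loop would have to be driven with loop.run_until_complete()).

```python
import asyncio


async def handle_client(reader, writer):
    """Runs once per connection; many of these can be in flight at once."""
    while True:
        line = await reader.readline()  # yields to the event loop while waiting
        if not line:                    # empty bytes means the client hung up
            break
        writer.write(b"echo: " + line)
        await writer.drain()
    writer.close()


async def demo():
    # Port 0 asks the OS for any free port; a real server would use a fixed one.
    server = await asyncio.start_server(handle_client, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]

    # Act as a client against our own server to show one round trip.
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(b"hello\n")
    await writer.drain()
    reply = await reader.readline()
    writer.close()

    server.close()
    await server.wait_closed()
    return reply


reply = asyncio.run(demo())
print(reply)
```

The event loop accepts connections itself; start_server() schedules handle_client() for each new client, and every await inside the handler is a point where other clients get a turn.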

The idea is to have a chat application that can at least handle a few
hundred clients if not more in the future. I'm planning on using
Python because I am pretty up-to-date with it, but I've never written
a network server before.

Also another quick question. Does Python support async database
operations? I'm thinking of the psycopg2-binary database driver. That
way I can offload the storage in the database while still handling
incoming connections.
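
One general way to keep blocking database calls from stalling the event loop is to hand them to a thread pool with run_in_executor(). The sketch below assumes the driver's calls block; sqlite3 stands in for PostgreSQL only so that it runs without a database server, and save_message, the messages table and the chat.db file name are invented for the example.

```python
import asyncio
import os
import sqlite3
import tempfile


def save_message(db_path, text):
    """Ordinary blocking database code; runs in a worker thread."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute("CREATE TABLE IF NOT EXISTS messages (body TEXT)")
        conn.execute("INSERT INTO messages (body) VALUES (?)", (text,))
        conn.commit()
        return conn.execute("SELECT COUNT(*) FROM messages").fetchone()[0]
    finally:
        conn.close()


async def main(db_path):
    loop = asyncio.get_running_loop()
    # None selects the loop's default thread pool. The await hands control
    # back to the event loop until the worker thread has finished.
    return await loop.run_in_executor(None, save_message, db_path, "hello")


with tempfile.TemporaryDirectory() as tmp:
    count = asyncio.run(main(os.path.join(tmp, "chat.db")))
print(count)
```

Third-party async drivers exist as well, but the executor approach works with any blocking library and is often enough for a few hundred clients.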

If I have misunderstood anything, any clarification would be much appreciated.

Simon.


Re: [Tutor] Proper way to unit test the raising of exceptions?

2018-04-01 Thread Simon Connah via Tutor
Awesome. Thank you all. Your solutions are great and should make the whole
process a lot simpler.

The only problem is that some_func() on my end is a Django model with about 8
named arguments, so it might be a bit of a pain passing all of those arguments.
The context manager example seems like a perfect fit for that particular
problem.

Thanks again. All of your help is much appreciated.

On Sunday, 1 April 2018, 16:32:11 BST, Mats Wichmann wrote:

On 04/01/2018 09:10 AM, Peter Otten wrote:
> Simon Connah via Tutor wrote:
> 
>> Hi,
>> I'm just wondering what the accepted way to handle unit testing exceptions
>> is? I know you are meant to use assertRaises, but my code seems a little
>> off. 
> 
>> try:
>>    some_func()
>> except SomeException:  
>>    self.assertRaises(SomeException) 
> 
> The logic is wrong here as you surmise below. If you catch the exception 
> explicitly you have to write
> 
> try:
>    some_func()
> except SomeException:
>    pass  # OK
> else:
>    self.fail("no exception raised")
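
For comparison, a minimal sketch of the assertRaises context manager form with unittest. SomeException and some_func are stand-ins for the names in the original post, and the keyword arguments are invented to show that any number of them can be passed without special handling.

```python
import unittest


class SomeException(Exception):
    """Stand-in for the exception type in the original post."""


def some_func(**kwargs):
    # Hypothetical function under test; it always raises for this demo.
    raise SomeException("bad input")


class SomeFuncTest(unittest.TestCase):
    def test_raises(self):
        # The test fails automatically if the block finishes without raising.
        with self.assertRaises(SomeException) as ctx:
            some_func(name="x", size=3)
        # The caught exception is available for further checks.
        self.assertEqual(str(ctx.exception), "bad input")


suite = unittest.defaultTestLoader.loadTestsFromTestCase(SomeFuncTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Because the call sits inside the with block as ordinary code, there is no need for the try/except/else scaffolding or an explicit self.fail().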


If you use pytest, the procedure is well documented:

https://docs.pytest.org/en/latest/assert.html



[Tutor] Proper way to unit test the raising of exceptions?

2018-04-01 Thread Simon Connah via Tutor
Hi,
I'm just wondering what the accepted way to handle unit testing exceptions is? 
I know you are meant to use assertRaises, but my code seems a little off.
try:
    some_func()
except SomeException:
    self.assertRaises(SomeException)
Is there a better way to do this at all? The problem with the above code is
that if no exception is raised, the except block is skipped entirely and the
unit test passes. So I considered adding:
    self.fail('No exception raised')
at the end, outside of the except block, but I'm not sure if that is what I
need to do.
Any help is appreciated.


[Tutor] Does the secrets module in Python 3.6 use a hardware RNG like that provided in Intel CPUs?

2018-03-09 Thread Simon Connah via Tutor
Hi,
I was reading through the secrets documentation in Python 3.6 and noticed that
it uses /dev/urandom, but I'm unsure if that means it'll use a hardware RNG or
just one provided by the operating system (Linux / Windows / etc.) in software.

My question is: is it possible to determine the source of the randomness from
os.urandom, in case a flaw is ever found in a particular hardware RNG? Plus,
systems could have a third-party hardware RNG on an external add-on card
or similar, which might be better than the one found in Intel CPUs.

I'm just a bit curious about the whole "will always use the strongest source
for pseudo-random numbers" claim, when research could change that assumption
overnight based on discovered flaws.
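
For what it's worth, the layering can be seen from Python itself. The short sketch below only shows that secrets sits on top of the operating system's random source via os.urandom(); whether the kernel mixes in a hardware RNG is decided below Python and is not reported through any of these calls. The variable names are just for the example.

```python
import os
import random
import secrets

# secrets re-exports random.SystemRandom, which is built on os.urandom().
same_class = secrets.SystemRandom is random.SystemRandom

token = secrets.token_bytes(16)    # 16 bytes from the OS random source
raw = os.urandom(16)               # the same kind of bytes, requested directly
hex_token = secrets.token_hex(16)  # 16 random bytes rendered as 32 hex digits

print(same_class, len(token), len(raw), len(hex_token))
```

So any question about where the entropy comes from, or how to swap in a different source, is really a question about the operating system's configuration.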
This is probably a really stupid question and if it is I apologise but I'm 
somewhat confused.
Thanks for any help.