Re: Need advice on writing better test cases.

2017-08-27 Thread Ben Finney
Anubhav Yadav writes: > I want to write more test cases, specially that rely on database > insertions and reads and file IO. Thanks for taking seriously the importance of test cases for your code! One important thing to recognise is that a unit test is only one type of test. It tests one unit o

Re: BeautifulSoup doesn't work with a threaded input queue?

2017-08-27 Thread Christopher Reimer via Python-list
Ah, shoot me. I had a .join() statement on the output queue but not on the input queue. So the threads for the input queue got terminated before BeautifulSoup could get started. I went down that same rabbit hole with CSVWriter the other day. *sigh* Thanks for everyone's help. Chris R. -- h
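
For readers landing on this message from the archive, here is a minimal sketch of the fix described above: calling .join() on the input queue as well as the output queue, so the main thread does not exit while pages are still waiting to be parsed. This is not Chris's actual code. The names TitleParser, parse_worker, write_worker and run_pipeline are hypothetical, and the standard library's html.parser stands in for BeautifulSoup so the sketch has no third-party dependencies.

```python
import queue
import threading
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text inside <title> (a stand-in for BeautifulSoup)."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def parse_worker(input_queue, output_queue):
    """Take raw pages off the input queue and put parsed titles on the output queue."""
    while True:
        page = input_queue.get()
        try:
            parser = TitleParser()
            parser.feed(page)
            output_queue.put(parser.title)
        finally:
            # Mark the page as handled so input_queue.join() can return.
            input_queue.task_done()


def write_worker(output_queue, results):
    """Single writer thread; appending to a list stands in for writing CSV rows."""
    while True:
        title = output_queue.get()
        try:
            results.append(title)
        finally:
            output_queue.task_done()


def run_pipeline(pages, num_parsers=4):
    input_queue = queue.Queue()
    output_queue = queue.Queue()
    results = []
    for _ in range(num_parsers):
        threading.Thread(target=parse_worker,
                         args=(input_queue, output_queue),
                         daemon=True).start()
    threading.Thread(target=write_worker,
                     args=(output_queue, results),
                     daemon=True).start()
    for page in pages:
        input_queue.put(page)
    input_queue.join()   # wait until every page has been parsed
    output_queue.join()  # then wait until every result has been written
    return results
```

The worker threads are daemon threads, so without the first .join() the main thread could finish, and the workers be killed, before any parsing had happened. Joining the input queue first and the output queue second guarantees that every parsed result is already on the output queue when the second .join() starts counting.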

Re: BeautifulSoup doesn't work with a threaded input queue?

2017-08-27 Thread Peter Otten
Christopher Reimer via Python-list wrote: > On 8/27/2017 1:31 PM, Peter Otten wrote: > >> Here's a simple example that extracts titles from generated html. It >> seems to work. Does it resemble what you do? > Your example is similar to my code when I'm using a list for the input > to the parser.

Re: BeautifulSoup doesn't work with a threaded input queue?

2017-08-27 Thread Paul Rubin
Christopher Reimer writes: > I have 20 read_threads requesting and putting pages into the output > queue that is the input_queue for the parser. Given how slow parsing is, you probably want to scrape the pages into disk files, and then run the parser in parallel processes that read from the disk.
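
A rough sketch of the approach suggested here, under stated assumptions: the fetching threads save each page to a file, and a separate step parses the saved files in parallel processes. The helper names save_page, parse_file and parse_directory, the .html file naming, and the regular-expression title extraction (a placeholder for the real, slower parsing) are all hypothetical and do not come from the thread.

```python
import re
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path

# Placeholder for real parsing: pull out the <title> text with a regex.
TITLE_RE = re.compile(r"<title>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def save_page(directory, name, html):
    """Write one fetched page to disk; the fetching threads would call this."""
    path = Path(directory) / name
    path.write_text(html, encoding="utf-8")
    return path


def parse_file(path):
    """Read one saved page and return its title, or None if it has none."""
    html = Path(path).read_text(encoding="utf-8")
    match = TITLE_RE.search(html)
    return match.group(1).strip() if match else None


def parse_directory(directory, workers=4):
    """Parse every saved .html file in the directory using separate processes."""
    paths = sorted(str(p) for p in Path(directory).glob("*.html"))
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the sorted paths.
        return list(pool.map(parse_file, paths))
```

On Windows, which the original poster is using, a call such as parse_directory("pages") needs to sit under an `if __name__ == "__main__":` guard, because worker processes are started by re-importing the main module.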

Re: BeautifulSoup doesn't work with a threaded input queue?

2017-08-27 Thread Christopher Reimer via Python-list
On 8/27/2017 1:50 PM, MRAB wrote: What if you don't sort the list? I ask because it sounds like you're changing 2 variables (i.e. list->queue, sorted->unsorted) at the same time, so you can't be sure that it's the queue that's the problem. If I'm using a list, I'm using a for loop to input ite

Re: BeautifulSoup doesn't work with a threaded input queue?

2017-08-27 Thread Christopher Reimer via Python-list
On 8/27/2017 1:31 PM, Peter Otten wrote: Here's a simple example that extracts titles from generated html. It seems to work. Does it resemble what you do? Your example is similar to my code when I'm using a list for the input to the parser. You have soup_threads and write_threads, but no read_t

Re: BeautifulSoup doesn't work with a threaded input queue?

2017-08-27 Thread MRAB
On 2017-08-27 21:35, Christopher Reimer via Python-list wrote: On 8/27/2017 1:12 PM, MRAB wrote: What do you mean by "queue (random order)"? A queue is sequential order, first-in-first-out. With 20 threads requesting 20 different pages, they're not going into the queue in sequential order (i

Re: Express thanks

2017-08-27 Thread Abdur-Rahmaan Janhangeer
Hi, I like Python and follow Python discussions in quite a few places. I can say that, up to now, the Python mailing lists are awesome; just drop in on IRC ... Keep it up, guys! Abdur-Rahmaan Janhangeer, Mauritius abdurrahmaanjanhangeer.wordpress.com On 21 Aug 2017 18:38, "Hamish MacDonald" wrote: I wanted to

Need advice on writing better test cases.

2017-08-27 Thread Anubhav Yadav
Hello, I am a (self-taught) Python developer and I write a lot of Python code every day. I try to do as much unit testing as possible, but I want to be better at it: I want to write more test cases, especially ones that rely on database insertions and reads and file IO. Here are my use-cases for test

Re: BeautifulSoup doesn't work with a threaded input queue?

2017-08-27 Thread Christopher Reimer via Python-list
On 8/27/2017 1:12 PM, MRAB wrote: What do you mean by "queue (random order)"? A queue is sequential order, first-in-first-out. With 20 threads requesting 20 different pages, they're not going into the queue in sequential order (i.e., 0, 1, 2, ..., 17, 18, 19) and coming in at different time

Re: BeautifulSoup doesn't work with a threaded input queue?

2017-08-27 Thread Peter Otten
Christopher Reimer via Python-list wrote: > On 8/27/2017 11:54 AM, Peter Otten wrote: > >> The documentation >> >> https://www.crummy.com/software/BeautifulSoup/bs4/doc/#making-the-soup >> >> says you can make the BeautifulSoup object from a string or file. >> Can you give a few more details wher

Re: BeautifulSoup doesn't work with a threaded input queue?

2017-08-27 Thread MRAB
On 2017-08-27 20:35, Christopher Reimer via Python-list wrote: On 8/27/2017 11:54 AM, Peter Otten wrote: The documentation https://www.crummy.com/software/BeautifulSoup/bs4/doc/#making-the-soup says you can make the BeautifulSoup object from a string or file. Can you give a few more details w

Re: BeautifulSoup doesn't work with a threaded input queue?

2017-08-27 Thread Christopher Reimer via Python-list
On 8/27/2017 11:54 AM, Peter Otten wrote: The documentation https://www.crummy.com/software/BeautifulSoup/bs4/doc/#making-the-soup says you can make the BeautifulSoup object from a string or file. Can you give a few more details where the queue comes into play? A small code sample would be ide

Re: BeautifulSoup doesn't work with a threaded input queue?

2017-08-27 Thread Peter Otten
Christopher Reimer via Python-list wrote: > Greetings, > > I have a Python 3.6 script on Windows to scrape comment history from a > website. It's currently set up this way: > > Requestor (threads) -> list -> Parser (threads) -> queue -> CSVWriter > (single thread) > > It takes 15 minutes to proce

BeautifulSoup doesn't work with a threaded input queue?

2017-08-27 Thread Christopher Reimer via Python-list
Greetings, I have a Python 3.6 script on Windows to scrape comment history from a website. It's currently set up this way: Requestor (threads) -> list -> Parser (threads) -> queue -> CSVWriter (single thread). It takes 15 minutes to process ~11,000 comments. When I replaced the list with a qu