Re: Long running process - how to speed up?

2022-02-23 Thread Robert Latest via Python-list
Shaozhong SHI wrote:
> Can it be divided into several processes?

I'd do it like this:

from time import sleep
from threading import Thread

t = Thread(target=lambda: sleep(1))
t.start()   # start(), not run(): run() would execute the target in the calling thread

# do your work here

t.join()    # Thread objects have join(), not wait()

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Long running process - how to speed up?

2022-02-20 Thread Dennis Lee Bieber
On Sun, 20 Feb 2022 18:05:33 +, Shaozhong SHI 
declaimed the following:


>I am trying this approach,
>
>import multiprocessing as mp
>
>def my_func(x):
>  print(x**x)
>

Not much of a processing load there, especially for your small set of
integers. I suspect you are spending significant time just having the OS
create and tear down each process.

In truth, for that example, I suspect plain threads will run faster
because you don't have the overhead of setting up a process with new
stdin/stdout/stderr. The exponential operation probably runs within one
Python threading quantum, and the print() will trigger a thread switch to
allow the next one to compute.
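
A minimal sketch of that plain-threads variant, for comparison (same toy
workload as your example; a sketch for illustration, not a recommendation
for the real job):

import threading

def my_func(x):
    print(x**x)

threads = [threading.Thread(target=my_func, args=(n,)) for n in (4, 2, 3)]
for t in threads:
    t.start()    # run each call in its own thread
for t in threads:
    t.join()     # wait for all of them to finish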

>def main():
>  pool = mp.Pool(mp.cpu_count())
>  result = pool.map(my_func, [4,2,3])

-=-=-
>>> import multiprocessing as mp
>>> mp.cpu_count()
8
>>> 
-=-=-

Really under-loaded on my Win10 system (hyperthreaded processors count
as 2 CPUs, so a quad-core HT reports as 8 CPUs). Even an older
Raspberry-Pi 3B (quad core) reports

-=-=-
md_admin@microdiversity:~$ python3
Python 3.7.3 (default, Jan 22 2021, 20:04:44)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import multiprocessing as mp
>>> mp.cpu_count()
4
>>> exit()
md_admin@microdiversity:~$
-=-=-


-- 
Wulfraed Dennis Lee Bieber AF6VN
wlfr...@ix.netcom.com    http://wlfraed.microdiversity.freeddns.org/
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Long running process - how to speed up?

2022-02-20 Thread Avi Gross via Python-list
dAVId,
I would like to assume that the processing needed is quite a bit more than
calculating x**x for each value. But as we are getting negligible information
on anything useful, why continue interacting?
I am trying to imagine a scenario with a million rows of sorts in a CSV
(typically not used for a single column of data, as that layout has no commas)
where you read in the data and want to generate just x**x for each number, in
what may be a disorganized order, as threads are not guaranteed to do anything
but interleave!
Parallelism can be used in cases where the data is segmented properly and the
algorithm is adjusted to fit. But the overhead can be substantial. For the
trivial task mentioned, which I have to hope is not the actual task, you can
get quite decent speed by reading the data into a numpy array and producing
the results in one vectorized operation, then simply printing them.
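
For what it's worth, a minimal sketch of that vectorized idea, assuming a
one-column numeric file named data.csv (the file name and layout are made up
here):

import numpy as np

# load the single numeric column, then compute x**x for the whole array
# in one vectorized operation -- no Python-level loop
values = np.loadtxt("data.csv", delimiter=",")
print(values ** values)
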
The original question here is turning out to be mysterious, as it began by
asking how to speed up some slow, not-yet-explained process, with some mumbling
about calling a sleep function. I note that some parallel algorithms require a
variant of that, in that some parts must wait for other parts to complete and
arrange to be dormant until signaled, or schedule themselves to be woken
regularly and check whether things are ready for them to resume. Sleeping is a
very common occurrence in systems that are time-shared.


-Original Message-
From: Shaozhong SHI 
To: Mats Wichmann 
Cc: python-list@python.org
Sent: Sun, Feb 20, 2022 1:05 pm
Subject: Re: Long running process - how to speed up?

On Sat, 19 Feb 2022 at 19:44, Mats Wichmann  wrote:

> On 2/19/22 05:09, Shaozhong SHI wrote:
> > Can it be divided into several processes?
> > Regards,
> > David
>
> The answer is: "maybe".  Multiprocessing doesn't happen for free, you
> have to figure out how to divide the task up, requiring thought and
> effort. We can't guess to what extent the problem you have is amenable
> to multiprocessing.
>
> Google for "dataframe" and "multiprocessing" and you should get some
> hits (in my somewhat limited experience in this area, people usually
> load the csv data into Pandas before they get started working with it).
>
>
> --
> https://mail.python.org/mailman/listinfo/python-list


I am trying this approach,

import multiprocessing as mp

def my_func(x):
  print(x**x)

def main():
  pool = mp.Pool(mp.cpu_count())
  result = pool.map(my_func, [4,2,3])

if __name__ == "__main__":
  main()

I modified the script and set off a test run.

However, I have no idea whether this approach will be faster than
conventional approach.

Any one has idea?

Regards,

David
-- 
https://mail.python.org/mailman/listinfo/python-list
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Long running process - how to speed up?

2022-02-20 Thread Chris Angelico
On Mon, 21 Feb 2022 at 05:07, Shaozhong SHI  wrote:
> However, I have no idea whether this approach will be faster than
> conventional approach.
>
> Any one has idea?

Try it. Find out. The only way to know is to measure.
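
For example, something along these lines gives a first number (the range(1000)
workload here is made up, and on toy inputs the pool's startup cost usually
dominates):

import time
import multiprocessing as mp

def my_func(x):
    return x ** x   # return instead of print, so I/O doesn't skew the timing

if __name__ == "__main__":
    data = list(range(1000))
    t0 = time.perf_counter()
    serial = list(map(my_func, data))
    t1 = time.perf_counter()
    with mp.Pool(mp.cpu_count()) as pool:
        parallel = pool.map(my_func, data)
    t2 = time.perf_counter()
    print(f"serial: {t1 - t0:.3f}s  pool: {t2 - t1:.3f}s")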

I can't see the sleep call though. You may need to post your actual
code instead of trivial examples.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Re: Long running process - how to speed up?

2022-02-20 Thread Shaozhong SHI
On Sat, 19 Feb 2022 at 18:51, Alan Gauld  wrote:

> On 19/02/2022 11:28, Shaozhong SHI wrote:
>
> > I have a csv file of 932956 rows
>
> That's not a lot in modern computing terms.
>
> > and have to have time.sleep in a Python
> > script.
>
> Why? Is it a requirement by your customer? Your manager?
> time.sleep() is not usually helpful if you want to do
> things quickly.
>
> > It takes a long time to process.
>
> What is a "long time"? minutes? hours? days? weeks?
>
> It should take a million times as long as it takes to
> process one row. But you have given no clue what you
> are doing in each row.
> - reading a database?
> - reading from the network? or the internet?
> - writing to a database? or the internet?
> - performing highly complex math operations?
>
> Or perhaps the processing load is in analyzing the totality
> of the data after reading it all? A very different type
> of problem. But we just don't know.
>
> All of these factors will affect performance.
>
> > How can I speed up the processing?
>
> It all depends on the processing.
> You could try profiling your code to see where the time is spent.
>
> > Can I do multi-processing?
>
> Of course. But there is no guarantee that will speed things
> up if there is a bottleneck on a single resource somewhere.
> But it might be possible to divide and conquer and get better
> speed. It all depends on what you are doing. We can't tell.
>
> We cannot answer such a vague question with any specific
> solution.
>
> --
> Alan G
> Author of the Learn to Program web site
> http://www.alan-g.me.uk/
> http://www.amazon.com/author/alan_gauld
> Follow my photo-blog on Flickr at:
> http://www.flickr.com/photos/alangauldphotos
>
> --
> https://mail.python.org/mailman/listinfo/python-list


I do not know the answers to these yet.  For now, the script appears to hang
at a certain point and does not move on.

Regards,

David
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Long running process - how to speed up?

2022-02-20 Thread Shaozhong SHI
On Sat, 19 Feb 2022 at 19:44, Mats Wichmann  wrote:

> On 2/19/22 05:09, Shaozhong SHI wrote:
> > Can it be divided into several processes?
> > Regards,
> > David
>
> The answer is: "maybe".  Multiprocessing doesn't happen for free, you
> have to figure out how to divide the task up, requiring thought and
> effort. We can't guess to what extent the problem you have is amenable
> to multiprocessing.
>
> Google for "dataframe" and "multiprocessing" and you should get some
> hits (in my somewhat limited experience in this area, people usually
> load the csv data into Pandas before they get started working with it).
>
>
> --
> https://mail.python.org/mailman/listinfo/python-list


I am trying this approach,

import multiprocessing as mp

def my_func(x):
  print(x**x)

def main():
  pool = mp.Pool(mp.cpu_count())
  result = pool.map(my_func, [4,2,3])

if __name__ == "__main__":
  main()

I modified the script and set off a test run.

However, I have no idea whether this approach will be faster than
conventional approach.

Any one has idea?

Regards,

David
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Long running process - how to speed up?

2022-02-19 Thread Avi Gross via Python-list
Indeed not a clear request. Timing is everything, but there are times ...

For many purposes, people may read the entire CSV in one gulp into some data
structure like a pandas DataFrame. The file is then closed, and any processing
done later can do whatever you want.

Of course you can easily read one line at a time in Python, parse it on the
commas, do any other processing, and act on one row at a time or in small
batches, so you never need huge amounts of memory. But other methods that read
in the entire set of data are often better optimized and faster, and some
operations on the data are faster still if done in vectorized fashion using
add-ons like numpy and pandas. We have no idea what is being used, and none of
this explains a need to use some form of sleep.

Multi-processing helps only if you can make steps in the processing run in
parallel without interfering with each other or making things happen out of
order. Yes, you could read in the data and assign, say, 10,000 rows to a thread
to process, then get more and assign again, if done quite carefully. The
results might need to be carefully combined, and any shared variables might
need locks and so on. Not necessarily worth it if the data is not too large and
the calculations are small.
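
A rough sketch of that batching idea (here with processes rather than the
threads mentioned above, and with made-up row data standing in for the parsed
CSV):

import multiprocessing as mp

def process_batch(rows):
    # placeholder for the real per-row work; each batch returns its results
    return [int(r[0]) ** 2 for r in rows]

def batches(seq, size=10_000):
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

if __name__ == "__main__":
    rows = [(str(n),) for n in range(100_000)]   # stand-in for csv.reader output
    with mp.Pool() as pool:
        partial = pool.map(process_batch, batches(rows))
    results = [x for batch in partial for x in batch]   # recombined in order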

And it remains unclear where you want to sleep or why. Parallelism can be
important if the sleep is to wait for the user to respond to something while
processing continues in the background. Is it possible that whatever you are
calling to do the processing has some kind of sleep within it, and that you may
be calling it as often as once per row? In that case, ask why it does that and
whether you can avoid it. Yes, running in parallel may let you move forward,
but again, it has to be done carefully, and having thousands of processes
sleeping at the same time may be worse!

I note badly defined questions get horrible answers. Mine included.
-Original Message-
From: Alan Gauld 
To: python-list@python.org
Sent: Sat, Feb 19, 2022 7:33 am
Subject: Fwd: Re: Long running process - how to speed up?

On 19/02/2022 11:28, Shaozhong SHI wrote:

> I have a csv file of 932956 rows

That's not a lot in modern computing terms.

> and have to have time.sleep in a Python
> script. 

Why? Is it a requirement by your customer? Your manager?
time.sleep() is not usually helpful if you want to do
things quickly.

> It takes a long time to process.

What is a "long time"? minutes? hours? days? weeks?

It should take a million times as long as it takes to
process one row. But you have given no clue what you
are doing in each row.
- reading a database?
- reading from the network? or the internet?
- writing to a database? or the internet?
- performing highly complex math operations?

Or perhaps the processing load is in analyzing the totality
of the data after reading it all? A very different type
of problem. But we just don't know.

All of these factors will affect performance.

> How can I speed up the processing? 

It all depends on the processing.
You could try profiling your code to see where the time is spent.

> Can I do multi-processing?

Of course. But there is no guarantee that will speed things
up if there is a bottleneck on a single resource somewhere.
But it might be possible to divide and conquer and get better
speed. It all depends on what you are doing. We can't tell.

We cannot answer such a vague question with any specific
solution.

-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos

-- 
https://mail.python.org/mailman/listinfo/python-list
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Long running process - how to speed up?

2022-02-19 Thread Mats Wichmann
On 2/19/22 05:09, Shaozhong SHI wrote:
> Can it be divided into several processes?
> Regards,
> David

The answer is: "maybe".  Multiprocessing doesn't happen for free, you
have to figure out how to divide the task up, requiring thought and
effort. We can't guess to what extent the problem you have is amenable
to multiprocessing.

Google for "dataframe" and "multiprocessing" and you should get some
hits (in my somewhat limited experience in this area, people usually
load the csv data into Pandas before they get started working with it).
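
In that spirit, a minimal pandas sketch (the file name and column name are
inventions for illustration):

import pandas as pd

df = pd.read_csv("data.csv")        # one optimized parse of the whole file
df["result"] = df["x"] ** df["x"]   # vectorized x**x across the column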


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Long running process - how to speed up?

2022-02-19 Thread Dennis Lee Bieber
On Sat, 19 Feb 2022 11:28:31 +, Shaozhong SHI 
declaimed the following:

>I have a csv file of 932956 rows and have to have time.sleep in a Python
>script.  It takes a long time to process.
>
I'd echo the others... Unless you better explain WHY you have .sleep()
(along with how often it is called, and what duration you sleep) the first
recommendation would be to remove it.

The most common justification for .sleep() is that one has CPU-BOUND
processing and needs to force context switches to let other operations
proceed more often than the system quantum time. Not normally a concern
given Python's GIL and the presence of multi-core chips.

How are you processing the (near) million rows of that CSV? If you are
loading all of them into a large list, you could be triggering repeated Python
list reallocations, or OS page swapping (though I wouldn't expect that on most
modern systems -- maybe on a Raspberry-Pi/BeagleBone Black). Note:

>>> import sys
>>> sys.getsizeof("a")
50
>>> 

even a one-character string expands to 50 bytes, and an empty string still
takes up 49 bytes... So a string comes to about 49+<#chars> IF all characters
fit an 8-bit encoding -- if any non-8-bit characters are in the string, the
#chars needs to be multiplied by either 2 or 4, depending upon the widest
representation needed.

If you are doing read-one-record, process-one-record, repeat -- and
have the .sleep() inside that loop... definitely remove the .sleep(). That
loop is already I/O bound; the fastest you can obtain is determined by how
rapidly the OS can transfer records from the file system to your program.
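
That streaming shape looks something like this (process_row is a hypothetical
stand-in for whatever the real per-row work is):

import csv

def process_row(row):
    pass   # the real work goes here

with open("data.csv", newline="") as f:
    for row in csv.reader(f):
        process_row(row)   # no sleep() needed anywhere in this loop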


-- 
Wulfraed Dennis Lee Bieber AF6VN
wlfr...@ix.netcom.com    http://wlfraed.microdiversity.freeddns.org/
-- 
https://mail.python.org/mailman/listinfo/python-list


Fwd: Re: Long running process - how to speed up?

2022-02-19 Thread Alan Gauld
On 19/02/2022 11:28, Shaozhong SHI wrote:

> I have a csv file of 932956 rows

That's not a lot in modern computing terms.

> and have to have time.sleep in a Python
> script. 

Why? Is it a requirement by your customer? Your manager?
time.sleep() is not usually helpful if you want to do
things quickly.

> It takes a long time to process.

What is a "long time"? minutes? hours? days? weeks?

It should take a million times as long as it takes to
process one row. But you have given no clue what you
are doing in each row.
- reading a database?
- reading from the network? or the internet?
- writing to a database? or the internet?
- performing highly complex math operations?

Or perhaps the processing load is in analyzing the totality
of the data after reading it all? A very different type
of problem. But we just don't know.

All of these factors will affect performance.

> How can I speed up the processing? 

It all depends on the processing.
You could try profiling your code to see where the time is spent.
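
For example (assuming your entry point is a function called main):

import cProfile

cProfile.run("main()", sort="cumulative")   # prints hotspots, slowest cumulative first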

> Can I do multi-processing?

Of course. But there is no guarantee that will speed things
up if there is a bottleneck on a single resource somewhere.
But it might be possible to divide and conquer and get better
speed. It all depends on what you are doing. We can't tell.

We cannot answer such a vague question with any specific
solution.

-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Long running process - how to speed up?

2022-02-19 Thread Dan Stromberg
On Sat, Feb 19, 2022 at 3:29 AM Shaozhong SHI 
wrote:

> I have a csv file of 932956 rows and have to have time.sleep in a Python
> script.  It takes a long time to process.
>
> How can I speed up the processing?  Can I do multi-processing?
>

How are you doing it right now?

Are you using the csv module?

You might be able to use the GNU "split" command as a prelude to using the
csv module in combination with multiprocessing.  GNU split comes with
Linuxes, but I'm sure you can get it for Windows.  MacOS comes with a
rather less powerful "split" command, but it still might work for you.
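
If you go that route, a rough sketch of the Python side (this assumes the file
was split into pieces named chunk_aa, chunk_ab, ... by giving split a chunk_
prefix; process_chunk's body is a placeholder):

import csv
import glob
import multiprocessing as mp

def process_chunk(path):
    # parse one chunk with the csv module and do the real per-row work here
    count = 0
    with open(path, newline="") as f:
        for row in csv.reader(f):
            count += 1
    return count

if __name__ == "__main__":
    with mp.Pool() as pool:
        counts = pool.map(process_chunk, sorted(glob.glob("chunk_*")))
    print(sum(counts), "rows processed")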

You could also try PyPy3.

HTH.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Long running process - how to speed up?

2022-02-19 Thread Kirill Ratkin

Hi,

As I understand it, your script starts another script and should wait until
the second one completes its job. Right?


If so, you have several options, depending on how your first script is written.

If your script is async then ... there is a good asyncio option:

# (inside an async function; execcmd/execargs hold the worker command line)
proc = await asyncio.create_subprocess_shell(
    f"{execcmd}{execargs}", stdin=None, stdout=None
)
await proc.wait()

In this way you can start many workers, and you don't need to wait for them
in a synchronous manner.
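
For instance, a sketch of launching several workers at once (worker.py and the
argument scheme are made up):

import asyncio

async def run_worker(cmd):
    proc = await asyncio.create_subprocess_shell(cmd, stdin=None, stdout=None)
    await proc.wait()

async def main():
    # start four workers concurrently and wait for all of them together
    await asyncio.gather(*(run_worker(f"python3 worker.py {i}") for i in range(4)))

asyncio.run(main())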


Anyway, just please give more info about what problem you face.

On 19.02.2022 14:28, Shaozhong SHI wrote:

I have a csv file of 932956 rows and have to have time.sleep in a Python
script.  It takes a long time to process.

How can I speed up the processing?  Can I do multi-processing?

Regards,

David

--
https://mail.python.org/mailman/listinfo/python-list


Re: Long running process - how to speed up?

2022-02-19 Thread Shaozhong SHI
Can it be divided into several processes?
Regards,
David

On Saturday, 19 February 2022, Chris Angelico  wrote:

> On Sat, 19 Feb 2022 at 22:59, Karsten Hilbert 
> wrote:
> >
> > > > I have a csv file of 932956 rows and have to have time.sleep in a
> > > > Python script.  It takes a long time to process.
> > > >
> > > > How can I speed up the processing?  Can I do multi-processing?
> > > >
> > > Remove the time.sleep()?
> >
> > He's attesting to only having "time.sleep" in there...
> >
> > I doubt removing that will help much ;-)
>
> I honestly don't understand the question, hence offering the
> stupidly-obvious suggestion in the hope that it would result in a
> better question. A million rows of CSV, on its own, isn't all that
> much to process, so it must be the processing itself (of which we have
> no information other than this reference to time.sleep) that takes all
> the time.
>
> ChrisA
> --
> https://mail.python.org/mailman/listinfo/python-list
>
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Long running process - how to speed up?

2022-02-19 Thread Albert-Jan Roskam
On Feb 19, 2022 12:28, Shaozhong SHI  wrote:

> I have a csv file of 932956 rows and have to have time.sleep in a Python
> script.  It takes a long time to process.
>
> How can I speed up the processing?  Can I do multi-processing?

Perhaps a dask dataframe:
https://docs.dask.org/en/latest/generated/dask.dataframe.read_csv.html
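
A minimal sketch of that (the file name and column name are placeholders):

import dask.dataframe as dd

df = dd.read_csv("data.csv")        # lazy, partitioned read
result = (df["x"] ** 2).compute()   # partitions are processed in parallel
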
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Re: Long running process - how to speed up?

2022-02-19 Thread Chris Angelico
On Sat, 19 Feb 2022 at 22:59, Karsten Hilbert  wrote:
>
> > > I have a csv file of 932956 rows and have to have time.sleep in a Python
> > > script.  It takes a long time to process.
> > >
> > > How can I speed up the processing?  Can I do multi-processing?
> > >
> > Remove the time.sleep()?
>
> He's attesting to only having "time.sleep" in there...
>
> I doubt removing that will help much ;-)

I honestly don't understand the question, hence offering the
stupidly-obvious suggestion in the hope that it would result in a
better question. A million rows of CSV, on its own, isn't all that
much to process, so it must be the processing itself (of which we have
no information other than this reference to time.sleep) that takes all
the time.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Aw: Re: Long running process - how to speed up?

2022-02-19 Thread Karsten Hilbert
> > I have a csv file of 932956 rows and have to have time.sleep in a Python
> > script.  It takes a long time to process.
> >
> > How can I speed up the processing?  Can I do multi-processing?
> >
> Remove the time.sleep()?

He's attesting to only having "time.sleep" in there...

I doubt removing that will help much ;-)

Karsten
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Long running process - how to speed up?

2022-02-19 Thread Chris Angelico
On Sat, 19 Feb 2022 at 22:30, Shaozhong SHI  wrote:
>
> I have a csv file of 932956 rows and have to have time.sleep in a Python
> script.  It takes a long time to process.
>
> How can I speed up the processing?  Can I do multi-processing?
>
Remove the time.sleep()?

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Long running process - how to speed up?

2022-02-19 Thread Shaozhong SHI
I have a csv file of 932956 rows and have to have time.sleep in a Python
script.  It takes a long time to process.

How can I speed up the processing?  Can I do multi-processing?

Regards,

David
-- 
https://mail.python.org/mailman/listinfo/python-list