Re: Create many objects using threads
On 05/06/2014 05:46 AM, hardcoremore wrote: > But what does exactly means that Garbage Collector blocks? What > does it blocks and in which way? I know this much: The current GC that comes in D runtime is a single-threaded GC (aka "a stop-the-world GC"), meaning that all threads are stopped when the GC is running a garbage collection cycle. > And can I use threads to create multiple instance faster or that is just > not possible? My example program that did nothing but constructed objects on the GC heap cannot be an indicator of the performance of all multi-threaded programs. In real programs there will be computation-intensive parts; there will be parts blocked on I/O; etc. There is no way of knowing without measuring. Ali
Re: Create many objects using threads
On Tuesday, 6 May 2014 at 15:56:11 UTC, Kapps wrote: On Monday, 5 May 2014 at 22:11:39 UTC, Ali Çehreli wrote: On 05/05/2014 02:38 PM, Kapps wrote: > I think that the GC actually blocks when > creating objects, and thus multiple threads creating instances would not > provide a significant speedup, possibly even a slowdown. Wow! That is the case. :) > You'd want to benchmark this to be certain it helps. I did: import std.range; import std.parallelism; class C {} void foo() { auto c = new C; } void main(string[] args) { enum totalElements = 10_000_000; if (args.length > 1) { foreach (i; iota(totalElements).parallel) { foo(); } } else { foreach (i; iota(totalElements)) { foo(); } } } Typical run on my system for "-O -noboundscheck -inline": $ time ./deneme parallel real0m4.236s user0m4.325s sys 0m9.795s $ time ./deneme real0m0.753s user0m0.748s sys 0m0.003s Ali Huh, that's a much, much, higher impact than I'd expected. I tried with GDC as well (the one in Debian stable, which is unfortunately still 2.055...) and got similar results. I also tried creating only totalCPUs threads and having each of them create NUM_ELEMENTS / totalCPUs objects rather than risking that each creation was a task, and it still seems to be the same. snip I tried with using an allocator that never releases memory, rounds up to a power of 2, and is lock-free. The results are quite a bit better. shardsoft:~$ ./test 1 sec, 47 ms, 474 μs, and 4 hnsecs shardsoft:~$ ./test 1 sec, 43 ms, 588 μs, and 2 hnsecs shardsoft:~$ ./test tasks 692 ms, 769 μs, and 8 hnsecs shardsoft:~$ ./test tasks 692 ms, 686 μs, and 8 hnsecs shardsoft:~$ ./test parallel 691 ms, 856 μs, and 9 hnsecs shardsoft:~$ ./test parallel 690 ms, 22 μs, and 3 hnsecs I get similar results on my laptop (which is much faster than the results I got on it using DMD's malloc): test 1 sec, 125 ms, and 847 ╬╝s test 1 sec, 125 ms, 741 ╬╝s, and 6 hnsecs test tasks 556 ms, 613 ╬╝s, and 8 hnsecs test tasks 552 ms and 287 ╬╝s test parallel 554 ms, 542 ╬╝s, and 6 hnsecs test parallel 551 ms, 514 ╬╝s, and 9 hnsecs Code: http://pastie.org/9146326 Unfortunately it doesn't compile with the ancient version of gdc available in Debian, so I couldn't test with that. The results should be quite a bit better since core.atomic would be faster. And frankly, I'm not sure if the allocator actually works properly, but it's just for testing purposes anyways.
Re: Create many objects using threads
On Monday, 5 May 2014 at 22:11:39 UTC, Ali Çehreli wrote: On 05/05/2014 02:38 PM, Kapps wrote: > I think that the GC actually blocks when > creating objects, and thus multiple threads creating instances would not > provide a significant speedup, possibly even a slowdown. Wow! That is the case. :) > You'd want to benchmark this to be certain it helps. I did: import std.range; import std.parallelism; class C {} void foo() { auto c = new C; } void main(string[] args) { enum totalElements = 10_000_000; if (args.length > 1) { foreach (i; iota(totalElements).parallel) { foo(); } } else { foreach (i; iota(totalElements)) { foo(); } } } Typical run on my system for "-O -noboundscheck -inline": $ time ./deneme parallel real0m4.236s user0m4.325s sys 0m9.795s $ time ./deneme real0m0.753s user0m0.748s sys 0m0.003s Ali Huh, that's a much, much, higher impact than I'd expected. I tried with GDC as well (the one in Debian stable, which is unfortunately still 2.055...) and got similar results. I also tried creating only totalCPUs threads and having each of them create NUM_ELEMENTS / totalCPUs objects rather than risking that each creation was a task, and it still seems to be the same. Using malloc and emplace instead of new D, results are about 50% faster for single-threadeded and ~3-4 times faster for multi-threaded (4 cpu 8 thread machine, Linux 64-bit). The multi-threaded version is still twice as slow though. On my Windows laptop (with the program compiled for 32-bit), it did not make a significant difference and the multi-threaded version is still 4 times slower. That being said, I think most malloc implementations while being thread-safe, usually use locks or do not scale well. Code: import std.range; import std.parallelism; import std.datetime; import std.stdio; import core.stdc.stdlib; import std.conv; class C {} void foo() { //auto c = new C; enum size = __traits(classInstanceSize, C); void[] mem = malloc(size)[0..size]; emplace!C(mem); } void createFoos(size_t count) { foreach(i; 0 .. count) { foo(); } } void main(string[] args) { StopWatch sw = StopWatch(AutoStart.yes); enum totalElements = 10_000_000; if (args.length <= 1) { foreach (i; iota(totalElements)) { foo(); } } else if(args[1] == "tasks") { foreach (i; parallel(iota(totalElements))) { foo(); } } else if(args[1] == "parallel") { for(int i = 0; i < totalCPUs; i++) { taskPool.put(task(&createFoos, totalElements / totalCPUs)); } taskPool.finish(true); } else writeln("Unknown argument '", args[1], "'."); sw.stop(); writeln(cast(Duration)sw.peek); } Results (Linux 64-bit): shardsoft:~$ dmd -O -inline -release test.d shardsoft:~$ ./test 552 ms, 729 μs, and 7 hnsecs shardsoft:~$ ./test 532 ms, 139 μs, and 5 hnsecs shardsoft:~$ ./test tasks 1 sec, 171 ms, 126 μs, and 4 hnsecs shardsoft:~$ ./test tasks 1 sec, 38 ms, 468 μs, and 6 hnsecs shardsoft:~$ ./test parallel 1 sec, 146 ms, 738 μs, and 2 hnsecs shardsoft:~$ ./test parallel 1 sec, 268 ms, 195 μs, and 3 hnsecs
Re: Create many objects using threads
On Tuesday, 6 May 2014 at 03:26:52 UTC, Ali Çehreli wrote: On 05/05/2014 04:32 PM, Caslav Sabani wrote: > So basically using threads in D for creating multiple instances of class is > actually slower. Not at all! That statement can be true only in certain programs. :) Ali But what does exactly means that Garbage Collector blocks? What does it blocks and in which way? And can I use threads to create multiple instance faster or that is just not possible? Thanks
Re: Create many objects using threads
On 05/05/2014 04:32 PM, Caslav Sabani wrote: > So basically using threads in D for creating multiple instances of class is > actually slower. Not at all! That statement can be true only in certain programs. :) Ali
Re: Create many objects using threads
Hi all, Thanks for your reply. So basically using threads in D for creating multiple instances of class is actually slower. But what does exactly means that Garbage Collector blocks? What does it blocks and in which way? Thanks
Re: Create many objects using threads
On 05/05/2014 02:38 PM, Kapps wrote: > I think that the GC actually blocks when > creating objects, and thus multiple threads creating instances would not > provide a significant speedup, possibly even a slowdown. Wow! That is the case. :) > You'd want to benchmark this to be certain it helps. I did: import std.range; import std.parallelism; class C {} void foo() { auto c = new C; } void main(string[] args) { enum totalElements = 10_000_000; if (args.length > 1) { foreach (i; iota(totalElements).parallel) { foo(); } } else { foreach (i; iota(totalElements)) { foo(); } } } Typical run on my system for "-O -noboundscheck -inline": $ time ./deneme parallel real0m4.236s user0m4.325s sys 0m9.795s $ time ./deneme real0m0.753s user0m0.748s sys 0m0.003s Ali
Re: Create many objects using threads
On Monday, 5 May 2014 at 17:14:54 UTC, Caslav Sabani wrote: Hi, I have just started to learn D. Its a great language. I am trying to achieve the following but I am not sure is it possible or should be done at all: I want to have one array where I will store like 10 objects. But I want to use 4 threads where each thread will create 25000 objects and store them in array above mentioned. And all 4 threads should be working in parallel because I have 4 core processor for example. I do not care in which order objects are created nor objects should be aware of one another. I just need them stored in array. Can threading help in creating many objects at once? Note that I am beginner at working with threads so any help is welcome :) Thanks I could be wrong here, but I think that the GC actually blocks when creating objects, and thus multiple threads creating instances would not provide a significant speedup, possibly even a slowdown. You'd want to benchmark this to be certain it helps.
Re: Create many objects using threads
On 05/05/2014 01:38 PM, Caslav Sabani wrote: > I am struggling to understand from your example where is the code that > creates or spawns new thread. The .parallel in the foreach loop makes the body of the loop be executed in parallel. > How do you create new thread and fill array with instantiated objects in > that thread? It is automatic in that example but you can created thread explicitly by std.concurrency or core.thread as well. Ali
Re: Create many objects using threads
Hi Ali, Thanks for your reply. But I am struggling to understand from your example where is the code that creates or spawns new thread. How do you create new thread and fill array with instantiated objects in that thread? Thanks
Re: Create many objects using threads
On 05/05/2014 10:25 AM, Ali Çehreli wrote: > On 05/05/2014 10:14 AM, Caslav Sabani wrote: > > > I want to have one array where I will store like 10 objects. > > > > But I want to use 4 threads where each thread will create 25000 objects > > and store them in array above mentioned. > > 1) If it has to be a single array, meaning that all of the objects are > in consecutive memory, you can create the array and give four slices of > it to the four tasks. > > To do that, you can either create a proper D array filled with objects > with .init values; or you can allocate any type of memory and create > objects in place there. Here is an example: import std.stdio; import std.parallelism; import core.thread; import std.conv; enum elementCount = 8; size_t elementPerThread; static this () { assert((elementCount % totalCPUs) == 0, "Cannot distribute tasks to cores evenly"); elementPerThread = elementCount / totalCPUs; } void main() { auto arr = new int[](elementCount); foreach (i; 0 .. totalCPUs) { const beg = i * elementPerThread; const end = beg + elementPerThread; arr[beg .. end] = i.to!int; } thread_joinAll();// (I don't think this is necessary with std.parallelism) writeln(arr);// [ 0, 0, 1, 1, 2, 2, 3, 3 ] } > 2) If it doesn't have to a single array, you can have the four tasks > create four separate arrays. You can then use them as a single range by > std.range.chain. That is a lie. :) chain would work but it had to know the number of total cores at compile time. Instead, joiner or join can be used: import std.stdio; import std.parallelism; import core.thread; enum elementCount = 8; size_t elementPerThread; static this () { assert((elementCount % totalCPUs) == 0, "Cannot distribute tasks to cores evenly"); elementPerThread = elementCount / totalCPUs; } void main() { auto arr = new int[][](totalCPUs); foreach (i; 0 .. totalCPUs) { foreach (e; 0 .. elementPerThread) { arr[i] ~= i; } } thread_joinAll();// (I don't think this is necessary with std.parallelism) writeln(arr); // [[0, 0], [1, 1], [2, 2], [3, 3]] import std.range; writeln(arr.joiner); // [ 0, 0, 1, 1, 2, 2, 3, 3 ] import std.algorithm; auto arr2 = arr.joiner.array; static assert(is (typeof(arr2) == int[])); writeln(arr2); // [ 0, 0, 1, 1, 2, 2, 3, 3 ] auto arr3 = arr.join; static assert(is (typeof(arr3) == int[])); writeln(arr3); // [ 0, 0, 1, 1, 2, 2, 3, 3 ] } > This option allows you to have a single array as well. arr2 and arr3 above are examples of that. Ali