Re: New library: open multi-methods
On Tuesday, 18 July 2017 at 00:47:04 UTC, Jean-Louis Leroy wrote: I don't know R but after a trip to Wikipedia it looks like it. J-L

R is listed as one of the languages with built-in support on that wiki page. I searched for multiple dispatch because I was familiar with the similar feature in Julia, and that's how they refer to it.

https://en.wikipedia.org/wiki/Multiple_dispatch

An excerpt from the wiki page: "dynamically dispatched based on the run-time (dynamic) type or, in the more general case, some other attribute, of more than one of its arguments"

Based on the "some other attribute" part, I wonder if the library could conceivably be extended to dispatch based on the User Defined Attribute info:

https://dlang.org/spec/attribute.html

@('c') string s;
pragma(msg, __traits(getAttributes, s)); // prints tuple('c')
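For what it's worth, here is a toy sketch of inspecting UDA values (my own illustration, not anything openmethods provides); the tuple that __traits(getAttributes, ...) yields is compile-time only, so any dispatch on it would have to be generated during compilation:

import std.stdio;

@('c') string s;

void main()
{
    // foreach over a compile-time tuple is unrolled during compilation
    foreach (attr; __traits(getAttributes, s))
    {
        static if (is(typeof(attr) == char))
            writeln("char attribute: ", attr); // prints: char attribute: c
    }
}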
Re: Release fluent-asserts 0.6.0
On Sunday, 2 July 2017 at 13:34:25 UTC, Szabo Bogdan wrote: Any feedback is appreciated. Thanks, Bogdan

Hi, if you're just looking for other ideas, you might want to look at adding capabilities like the Java Hamcrest matchers. You might also want to support regular expression matches in the string matchers.

http://hamcrest.org/JavaHamcrest/javadoc/1.3/org/hamcrest/Matchers.html

These were used in SWTBot, which made a very nice testing environment for their SWT GUI widgets. SWTBot added filtering for the context of the match as well. You can get a feel for it in this article. There is a DWT library translated from Java SWT, but this testing app wasn't ported.

http://www.vogella.com/tutorials/SWTBot/article.html
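As a sketch of the regex suggestion, a helper along these lines (hypothetical, built on std.regex; not fluent-asserts' actual API) would cover many of the Hamcrest string-matcher cases:

import std.exception : enforce;
import std.regex : matchFirst;

void shouldMatch(string actual, string pattern)
{
    // fail when the regex finds no match in the actual value
    enforce(!matchFirst(actual, pattern).empty,
            "expected `" ~ actual ~ "` to match /" ~ pattern ~ "/");
}

unittest
{
    "fluent-asserts 0.6.0".shouldMatch(`\d+\.\d+\.\d+`);
}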
Re: Cap'n Proto for D v0.1.2
On Wednesday, 19 April 2017 at 16:52:14 UTC, Thomas Brix Larsen wrote: Take a look at FileDescriptor[1]. It is a class I've added to support read/write using File from std.stdio. You can create a similar streamer using std.mmfile. I believe that this would be enough for memory mapped reading. [1]: https://github.com/ThomasBrixLarsen/capnproto-dlang/blob/master/source/capnproto/FileDescriptor.d

Ok, thanks. I took a look at several capnproto implementations just now, and didn't see any tests for an mmap 'feature'. The roadmap doc below indicates it doesn't exist yet, and perhaps there are some details still to be resolved to make the format mmap-'friendly'.

https://capnproto.org/roadmap.html

mmap-friendly mutable storage format: Define a standard storage format that is friendly to mmap-based use while allowing modification. (With the current serialization format, mmap is only useful for read-only structures.) Possibly based on the ORM interface, updates only possible at the granularity of a whole ORM entry.

In Java, a MappedByteBuffer can be used with a RandomAccessFile channel, and accesses of the loaded map can then be random, without requiring sequential access to the whole mapped file. So Java nio already has some higher-level classes in place that would make it easier to develop a first implementation of the mmap features.
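For the read-only case the roadmap describes, a std.mmfile-based streamer could be as small as this sketch (my assumption of the shape, not tested against capnproto-dlang):

import std.mmfile : MmFile;

void readMapped(string path)
{
    scope mm = new MmFile(path);     // read-only mapping by default
    auto bytes = cast(ubyte[]) mm[]; // view of the whole file
    // hand `bytes` to the message reader; the OS pages in only
    // the regions that are actually touched
}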
Re: Cap'n Proto for D v0.1.2
On Tuesday, 18 April 2017 at 18:09:54 UTC, Thomas Brix Larsen wrote: "Cap’n Proto is an insanely fast data interchange format and capability-based RPC system. Think JSON, except binary. Or think Protocol Buffers, except faster."

The features below, from the capnproto.org description, interest me. In Java, however, a MappedByteBuffer would be used for the mmap feature.

https://capnproto.org/

mmap: Read a large Cap’n Proto file by memory-mapping it. The OS won’t even read in the parts that you don’t access.

Inter-language communication: Calling C++ code from, say, Java or Python tends to be painful or slow. With Cap’n Proto, the two languages can easily operate on the same in-memory data structure.

Inter-process communication: Multiple processes running on the same machine can share a Cap’n Proto message via shared memory. No need to pipe data through the kernel. Calling another process can be just as fast and easy as calling another thread.

This info from stackoverflow also seems to imply that MappedByteBuffer would be required for some of the capnproto features. So, could you explain a little more about the capabilities of the current D library implementation, with just the ByteBuffer implemented from the Java nio code? Thanks, Jay

http://stackoverflow.com/questions/29361058/read-proto-partly-instead-of-full-parsing-in-java

'If you are willing to consider using a different protocol framework, Cap'n Proto is extremely similar in design to Protobufs but features in the ability to read only the part of the message you care about. Cap'n Proto incurs no overhead for the fields you don't examine, other than obviously the bandwidth and memory to receive the raw message bytes. If you are reading from a file, and you use memory mapping (MappedByteBuffer in Java), then only the parts of the message you actually use will be read from disk.'
Re: 4x faster strlen with 4 char sentinel
On Tuesday, 28 June 2016 at 09:18:34 UTC, qznc wrote: Did you also compare to strlen from libc? I'd guess GNU libc uses a lot more tricks like vector instructions.

I did test with the libc strlen, although the D libraries did not have a strlen for dchar or wchar. I'm currently using this for comparison, and playing around with shorter string lengths:

nothrow pure size_t strlen(const(char)* c)
{
    if (c is null)
        return 0;
    const(char)* c_save = c;
    while (*c) { c++; }
    return c - c_save;
}

I'm also trying some tests on a PA device where I have tools to look at cache hits, misses, and mispredicted branches, using similar C code.
Re: 4x faster strlen with 4 char sentinel
On Tuesday, 28 June 2016 at 09:31:46 UTC, Sebastiaan Koppe wrote: If we were in interview, I'd ask you "what does this returns if you pass it an empty string ?" Since no one is answering: It depends on the memory right before c. But if there is at least one 0 right before it - which is quite likely - then you get some crazy big number returned.

Yes, the test checked for 0 length but not with a preceding 0. I posted the fix:

if (c is null || *c == 0)
Re: 4x faster strlen with 4 char sentinel
On Tuesday, 28 June 2016 at 03:11:26 UTC, Jay Norwood wrote: On Tuesday, 28 June 2016 at 01:53:22 UTC, deadalnix wrote: If we were in interview, I'd ask you "what does this returns if you pass it an empty string ?"

oops. I see ... need to test for empty string.

nothrow pure size_t strlen2(const(char)* c)
{
    if (c is null || *c == 0)
        return 0;
    const(char)* c_save = c;
    while (*c) { c += 4; }
    while (*c == 0) { c--; }
    c++;
    return c - c_save;
}
Re: 4x faster strlen with 4 char sentinel
On Tuesday, 28 June 2016 at 01:53:22 UTC, deadalnix wrote: If we were in interview, I'd ask you "what does this returns if you pass it an empty string ?"

I'd say use this one instead, to avoid a negative size_t. It is also a little faster for the same measurement.

nothrow pure size_t strlen2(const(char)* c)
{
    if (c is null)
        return 0;
    const(char)* c_save = c;
    while (*c) { c += 4; }
    while (*c == 0) { c--; }
    c++;
    return c - c_save;
}

2738
540
2744
Re: 4x faster strlen with 4 char sentinel
On Monday, 27 June 2016 at 20:43:40 UTC, Ola Fosheim Grøstad wrote: Just keep in mind that the major bottleneck now is loading 64 bytes from memory into cache. So if you test performance you have to make sure to invalidate the caches before you test and test with spurious reads over a very large memory area to get realistic results. But essentially, the operation is not heavy, so to speed it up you need to predict and prefetch from memory in time, meaning no library solution is sufficient. (you need to prefetch memory way before your library function is called)

I doubt external memory accesses are involved in these measurements. I'm using a 100KB char array terminated by four zeros, and doing strlen on substring pointers into it, incremented by 1, 100K times. The middle of the three timings is for strlen2, while the two outer timings are for strlen during the same program execution. I'm initializing the 100KB immediately prior to the measurement, so the whole array should be in L1 or L2 cache by the time I make even the first of the three time measurements. The prefetcher shouldn't have a problem predicting this.

2749 688 2783
2741 683 2738
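The harness itself wasn't posted; the following is a minimal sketch of the loop described above (the 100KB buffer, four-zero sentinel, and 100K substring starts are from the post; everything else is reconstruction):

import std.datetime : Clock;
import std.stdio : writeln;

void bench(size_t function(const(char)*) pure nothrow f)
{
    enum N = 100_000;
    auto buf = new char[N + 4];
    buf[0 .. N] = 'a';           // initialize right before measuring
    buf[N .. N + 4] = 0;         // four-zero sentinel at the end
    size_t total;
    auto t0 = Clock.currTime();
    foreach (i; 0 .. N)
        total += f(buf.ptr + i); // length of each substring start
    writeln(Clock.currTime() - t0, " total=", total);
}

// usage: bench(&strlen2); bench(&strlen);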
Re: 4x faster strlen with 4 char sentinel
On Monday, 27 June 2016 at 16:38:58 UTC, Ola Fosheim Grøstad wrote: Yes, and the idea of speeding up strings by padding out with zeros is not new. ;-) I recall suggesting it back in 1999 when discussing the benefits of having a big endian cpu when sorting strings. If it is big endian you can compare ascii as 32/64 bit integers, so if you align the string and pad out with zeros then you can speed up strcmp() by a significant factor. Oh, here it is:

The padding in your link uses a variable number of zeros, so that a larger data type can be used for the compare operations. That isn't the same as my example, which is simpler because it doesn't have to fiddle with alignment or cast to another data type.

I didn't find a strlen implementation for dchar or wchar in the D libraries. I also found the non-zero initialization values for char, dchar, wchar strange. I suppose there's some reason?

int[100] initializes to zeros.
char[100] to 0xFF;
dchar[100] to 0x0000FFFF;
wchar[100] to 0xFFFF;
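The likely reason is that those defaults are invalid code units in the corresponding UTF encodings, so uninitialized character data is easy to spot. The values can be checked directly:

static assert(char.init == 0xFF);    // not a valid UTF-8 code unit
static assert(wchar.init == 0xFFFF); // not a valid UTF-16 code unit
static assert(dchar.init == 0xFFFF); // U+FFFF is a Unicode noncharacter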
Re: 4x faster strlen with 4 char sentinel
On Monday, 27 June 2016 at 06:31:49 UTC, Ola Fosheim Grøstad wrote: Besides there are plenty of other advantages to using a terminating sentinel depending on the use scenario. E.g. if you want many versions of the same tail or if you are splitting a string at white space (overwrite a white space char with a zero).

This strlen2 doesn't require special alignment or casting of char pointer types to some larger type, which keeps the implementation fairly simple. It tests only one char per increment, and doesn't require the extra xor processing used in some of the examples. I haven't checked whether there is a strlen for dchar or wchar, but it should speed those up as well.
Re: 4x faster strlen with 4 char sentinel
On Sunday, 26 June 2016 at 16:59:54 UTC, David Nadlinger wrote: Please keep general discussions like this off the announce list, which would e.g. be suitable for announcing a fleshed out collection of high-performance string handling routines. A couple of quick hints: - This is not a correct implementation of strlen, as it already assumes that the array is terminated by four zero bytes. That iterating memory with a stride of 4 instead of 1 will be faster is a self-evident truth. - You should be benchmarking against a "proper" SIMD-optimised strlen implementation. — David

This is more of an observation that the choice of a single zero sentinel for C string termination comes at a cost of 4x in strlen speed versus using four terminating zeros. I don't see a SIMD strlen implementation in the D libraries. The strlen2 function I posted works on any string that is terminated by four zeros, and returns the same len as strlen in that case, but much faster. How to get strings initialized with four terminating zeros at compile time is a separate issue; I don't know the solution, else I might consider doing more with this.
4x faster strlen with 4 char sentinel
After watching Andrei's sentinel thing, I'm playing with strlen on char strings with 4 terminating 0s instead of a single one. It seems to work and is 4x faster compared to the runtime version.

nothrow pure size_t strlen2(const(char)* c)
{
    if (c is null)
        return 0;
    size_t l = 0;
    while (*c) { c += 4; l += 4; }
    while (*c == 0) { c--; l--; }
    return l + 1;
}

This is the timing of my test case, which I can post if anyone is interested.

strlen\Release>strlen
2738
681
Re: SDLang-D v0.9.0
Very nice. I wonder about representation of references, and perhaps replication, inheritance. Does SDL just punt on those?
Re: D is for Data Science
On Monday, 24 November 2014 at 23:32:14 UTC, Jay Norwood wrote: Is this related? https://github.com/dscience-developers/dscience

This seems good too. Why, then, the comments in the discussion about a lack of libraries?

https://github.com/kyllingstad/scid/wiki
Re: D is for Data Science
On Monday, 24 November 2014 at 15:27:19 UTC, Gary Willoughby wrote: Just browsing reddit and found this article posted about D. Written by Andrew Pascoe of AdRoll. From the article: "The D programming language has quickly become our language of choice on the Data Science team for any task that requires efficiency, and is now the keystone language for our critical infrastructure. Why? Because D has a lot to offer." Article: http://tech.adroll.com/blog/data/2014/11/17/d-is-for-data-science.html Reddit: http://www.reddit.com/r/programming/comments/2n9gfb/d_is_for_data_science/ Is this related? https://github.com/dscience-developers/dscience
Re: Experimental win32 OMF linker written in D now on github
On Sunday, 23 March 2014 at 20:33:15 UTC, Daniel Murphy wrote: It still needs a lot of work, but it's functional. Is there a test suite that you have to pass to declare it fully functional?
Re: DDT 0.10.0 released - featuring DUB support.
On Friday, 14 March 2014 at 15:44:24 UTC, Bruno Medeiros wrote: A new version of DDT - D Development tools is out.

This has really nice source browsing... much better than VisualD's. I end up using both because the debugging support is still better in VisualD. One browsing issue I noticed, though: it has a problem finding the source for properties, for example if you try to find the isDir definition from line 2216 of file.d.
Re: DDT 0.10.0 release - featuring DUB support.
I ran into this Kepler bug trying to update. The bug report gives the work-around, which involves renaming your eclipse.exe. It worked for me.

https://bugs.eclipse.org/bugs/show_bug.cgi?id=55
Re: Facebook open sources flint, a C++ linter written in D
http://www.reddit.com/r/programming/comments/1yts5n/facebook_open_sources_flint_a_c_linter_written_in/

Somewhere in that thread was a mention of Facebook moving away from git because it was too slow. I thought it was interesting and found this info on the topic: they rewrote some sections of Mercurial to make it scale better, and got it working faster than git in their environment.

https://code.facebook.com/posts/218678814984400/scaling-mercurial-at-facebook/
Re: Revamp of CandyDOC
On Wednesday, 11 April 2012 at 22:17:16 UTC, Eldar Insafutdinov wrote: example http://eldar.me/candydoc/algorithm.html ...

The outline panel links work fine in Google Chrome, but not in IE8.
Re: Modern COM Programming in D
On Tuesday, 3 April 2012 at 14:10:32 UTC, Jesse Phillips wrote: Most of his code isn't available as it was kind of under Microsoft. However I revived Juno for D2 awhile ago (still need to play with it myself). Juno provides some nice tools and API. https://github.com/JesseKPhillips/Juno-Windows-Class-Library http://dsource.org/projects/juno

There was a discussion about D support for XPCOM in the thread below a few years back, where the conclusion seems to have been that XPCOM usage was ABI compatible. Have you tested communicating with XPCOM objects using the Juno library?

http://forum.dlang.org/thread/fqkfsu$25g5$1...@digitalmars.com#post-fqph1i:241kk3:241:40digitalmars.com
code to get LCN from filename
I hacked up one of the file.d functions to create a function that returns the first Logical Cluster Number for a regular file. I've tested it on the 2GB layout that was defragged with the myDefrag sortByName() operation, and it works as expected. Values of 0 mean the file was small enough to fit in the MFT. The LCN numbers would be a good thing to sort by before doing accesses on entries coming from any large directory operations ... for example zip, copy, or delete of directories.

enum
{
    FILE_DEVICE_FILE_SYSTEM = 9,
    METHOD_NEITHER = 3,
    FILE_ANY_ACCESS = 0
}

uint CTL_CODE(uint t, uint f, uint m, uint a)
{
    return (t << 16) | (a << 14) | (f << 2) | m;
}

const FSCTL_GET_RETRIEVAL_POINTERS =
    CTL_CODE(FILE_DEVICE_FILE_SYSTEM, 28, METHOD_NEITHER, FILE_ANY_ACCESS);

/*
extern (Windows) int DeviceIoControl(void*, uint, void*, uint, void*, uint, uint*, _OVERLAPPED*);

from WinIoCtl.h in SDK:
#define FSCTL_GET_RETRIEVAL_POINTERS CTL_CODE(FILE_DEVICE_FILE_SYSTEM, 28, METHOD_NEITHER, FILE_ANY_ACCESS)
// STARTING_VCN_INPUT_BUFFER, RETRIEVAL_POINTERS_BUFFER
*/

ulong getStartLCN(in char[] name)
{
    version(Windows)
    {
        alias TypeTuple!(GENERIC_READ, FILE_SHARE_READ, null, OPEN_EXISTING,
                         FILE_ATTRIBUTE_NORMAL, HANDLE.init) defaults;
        auto h = useWfuncs ? CreateFileW(std.utf.toUTF16z(name), defaults)
                           : CreateFileA(toMBSz(name), defaults);
        cenforce(h != INVALID_HANDLE_VALUE, name);
        scope(exit) cenforce(CloseHandle(h), name);

        alias long LARGE_INTEGER;

        struct STARTING_VCN_INPUT_BUFFER
        {
            LARGE_INTEGER StartingVcn;
        }
        STARTING_VCN_INPUT_BUFFER inputVcn;
        inputVcn.StartingVcn = 0;

        struct RPExtents
        {
            LARGE_INTEGER NextVcn;
            LARGE_INTEGER Lcn;
        }

        struct RETRIEVAL_POINTERS_BUFFER
        {
            DWORD ExtentCount;
            LARGE_INTEGER StartingVcn;
            RPExtents rpExtents[1];
        }
        RETRIEVAL_POINTERS_BUFFER rpBuf;
        DWORD numBytes;

        // expect only a partial return of one rpExtent
        DeviceIoControl(h, FSCTL_GET_RETRIEVAL_POINTERS,
                        cast(void*)&inputVcn, inputVcn.sizeof,
                        cast(void*)&rpBuf, rpBuf.sizeof,
                        &numBytes, null);

        return cast(ulong)rpBuf.rpExtents[0].Lcn;
    }
    else version(Posix)
        return 0; // not implemented
}
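For example, directory scans could then be ordered by starting LCN before the bulk operation begins; a hypothetical (untested) use:

import std.algorithm : sort;
import std.array : array;
import std.file : SpanMode, dirEntries;

void visitInDiskOrder(string root)
{
    auto entries = dirEntries(root, SpanMode.depth).array;
    // order by the on-disk starting cluster returned above
    sort!((a, b) => getStartLCN(a.name) < getStartLCN(b.name))(entries);
    // now zip/copy/delete the entries in this order
}

Since this comparator reopens files on every comparison, std.algorithm.schwartzSort would be the better fit in practice; it computes getStartLCN once per entry.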
Re: unzip parallel, 3x faster than 7zip
On Sunday, 8 April 2012 at 13:55:21 UTC, Marco Leise wrote: Maybe the kernel caches writes, but synchronizes deletes? (So the seek times become apparent there, and not in the writes) Also check the file creation flags, maybe you can hint Windows to the final file size and they wont be fragmented?

My understanding is that a delete operation occurs after all the file handles associated with a file are closed, assuming the other handles were opened with file_share_delete. I believe otherwise you get an error from the attempt to delete.

I'm doing some experiments with myDefrag sortByName(), and they indicate to me that there are huge improvements in delete efficiency available on a hard drive if you can figure out some way to get the os to arrange the files and directories in LCNs in that by-name order. Below are the delete times from win7 rmdir on the same 2GB folder, with and without a myDefrag sortByName() defrag.

This is win7 rmdir following the myDefrag sortByName() defrag ... less than 7 seconds:

G:\>cmd /v:on /c "echo !TIME! & rmdir /q /s tz & echo !TIME!"
9:06:33.79
9:06:40.47

This is the same rmdir without defrag of the folder: 2 minutes 14 secs.

G:\>cmd /v:on /c "echo !TIME! & rmdir /q /s tz & echo !TIME!"
14:34:09.06
14:36:23.36

This is all on win7 ntfs, and I have no idea if similar gains are available for linux. So, yes, whatever tricks you can play with the win api to get it to organize the unzipped archive in this particular order are going to make huge improvements in the speed of delete.
Re: unzip parallel, 3x faster than 7zip
On Saturday, 7 April 2012 at 17:08:33 UTC, Jay Norwood wrote: The mydefrag program uses the ntfs defrag api. There is an article at the following link showing how to access it to get the Logical Cluster Numbers on disk for a file. I suppose you could sort your file operations by start LCN of the file, for example during compression, and that might reduce the seek related delays. http://blogs.msdn.com/b/jeffrey_wall/archive/2004/09/13/229137.aspx

I did a complete defrag of the G: hard drive, then did a parallel remove of the tz folder with rmd, then unzipped it again with the parallel uzp. Then I analyzed the disk again with mydefrag. The analysis shows the unzip resulted in over 300 fragmented files, even though I wrote each expanded file in a single operation.

So, I did a complete defrag again, then removed the folder again, and got about the same 109 secs for the delete operation on the hd (vs about 17 sec on the ssd for the same operation). The uzp parallel unzip is about 85 secs vs about 17.5 sec on the ssd.

G:\>rmd tz
removing: .\tz
finished! time:109817 ms

G:\>uzp tzip.zip tz
unzipping: .\tzip.zip
finished! time: 85405 ms

G:\>rmd tz
removing: .\tz
finished! time:108387 ms

So ... it looks like the defrag helps, as the 109 sec values are at the low end of the range I've seen previously. Still, it is totally surprising to me that deleting files should take longer than creating the same files.

btw, here are the windows rmdir times on the defragged hd and on the ssd drive; the third measurement is the D parallel rmd on the ssd ... much faster in D.

G:\>cmd /v:on /c "echo !TIME! & rmdir /q /s tz & echo !TIME!"
14:34:09.06
14:36:23.36

H:\>cmd /v:on /c "echo !TIME! & rmdir /q /s tz & echo !TIME!"
14:38:44.69
14:40:02.16

H:\>rmd tz
removing: .\tz
finished! time:17536 ms
Re: unzip parallel, 3x faster than 7zip
On Saturday, 7 April 2012 at 11:41:41 UTC, Rainer Schuetze wrote: > Maybe it is the trim command being executed on the sectors previously occupied by the file.

No, perhaps I didn't make it clear that the rmdir slowness is only an issue on hard drives. I can unzip the 2GB archive in about 17.5 sec on the ssd drive, and delete it using the rmd multi-thread delete example program in less than 17 secs on the ssd drive. The same operations on a hard drive take around 60 seconds to extract, but 1.5 to 3 minutes to delete.

H:\>uzp tzip.zip tz
unzipping: .\tzip.zip
finished! time: 17405 ms

H:\>rmd tz
removing: .\tz
finished! time:16671 ms

I've been doing some reading on the web and studying the procmon logs. I am convinced the slow hard drive delete is an issue with seek times, since it is not an issue on the ssd. It may be caused by fragmentation of the stored data or of the mft itself, or else ntfs is doing some book-keeping journaling. You are right that it could be doing delete notifications to any application watching the disk activity. I've already turned off the virus checker and the indexing, but I'm going to try the tweaks in the second link, and also the mydefrag program in the third link, and see if anything improves the hd delete times.

http://ixbtlabs.com/articles/ntfs/index3.html
http://www.gilsmethod.com/speed-up-vista-with-these-simple-ntfs-tweaks
http://www.mydefrag.com/index.html

That mydefrag has some interesting ideas about sorting folders by full pathname on the disk as one of the defrag algorithms. Perhaps using it, together with unzip and zip algorithms that match the defrag algorithm, would be a nice combination. In other words, if the zip algorithm processes the files in sorted-by-pathname order, and the defrag algorithm has created folders that are sorted on disk in the same order, then you would expect optimally short seeks while processing the files in the order they are stored.

The mydefrag program uses the ntfs defrag api. There is an article at the following link showing how to access it to get the Logical Cluster Numbers on disk for a file. I suppose you could sort your file operations by the start LCN of the file, for example during compression, and that might reduce the seek-related delays.

http://blogs.msdn.com/b/jeffrey_wall/archive/2004/09/13/229137.aspx
Re: unzip parallel, 3x faster than 7zip
On Saturday, 7 April 2012 at 05:02:04 UTC, dennis luehring wrote: 7zip took 55 secs _on the same file_. that is ok but he still compares different implementations

7zip is the program; it unzips many formats, the standard zip format being one of them. The parallel D program is three times faster at decoding the zip format than 7zip decodes the same file on the same ssd drive. That is an appropriate comparison, since 7zip has been my utility of choice for unzipping zip-format files on windows for many years. I provided the source code in the examples folder for the complete command-line utility that I used, so you may build it and compare it to whatever you like and report the results.
Re: unzip parallel, 3x faster than 7zip
On Friday, 6 April 2012 at 14:55:14 UTC, Sean Cavanaugh wrote: If you delete a directory containing several hundred thousand directories (each with 4-5 files inside, don't ask), you can see windows freeze for long periods (10+ seconds) of time until it is finished, which affects everything up to and including the audio mixing (it starts looping etc).

Yeah, I saw posts by people doing video complaining about such things. One good suggestion was to create many small volumes for separate projects and just do a fast format on them, rather than trying to delete folders.

I got procmon to see what is going on. Win7 was doing indexing and thumbnails, and there was some virus checker going on, but you can get rid of those. Still, most of the problem just boils down to the duration of the delete-on-close being proportional to the size of the file, and apparently related to the access times of the disk. I sometimes see 0.25 sec duration for a single file during the close of the delete operations on the hard drive.

I've been using an intel 510 series 120GB drive for recording concerts. It is hooked up with an ineo usb3 adaptor to the front panel port of an rme ufx recorder. The laptop is just used as a controller ... the ufx does all the mixing and recording to the hard drive.
Re: unzip parallel, 3x faster than 7zip
I think he is talking about 7zip the standalone software, not 7zip the compression algorithm. 7zip took 55 secs _on the same file_.

Yes, that's right; both 7zip and this uzp program are using the same deflate standard format of zip for this test. It is the only expand format that is supported in std.zip. 7zip was used to create the zip file used in the test.

7zip already has multi-core compression capability, but no multi-core uncompress. I haven't seen any multi-core uncompress for the deflate format, but I did see one for bzip2, named pbzip2. In general, though, inflate/deflate are the fastest algorithms I've seen when comparing the ones that are available in 7zip. I'm happy with the 7zip performance on compress with the deflate format, but not on the uncompress, so I will be using this uzp app.

I'm curious why win7 is such a dog when removing directories. I see a lot of disk read activity going on which seems to dominate the delete time. This doesn't make any sense to me unless there is some file caching being triggered on the files being deleted. I don't see any virus checker app being triggered ... it all seems to be system read activity. Maybe I'll try non-cached flags, or truncating files to 0 length before deleting, and see if that results in faster execution when the files are deleted...
Re: unzip parallel, 3x faster than 7zip
On Thursday, 5 April 2012 at 15:07:47 UTC, Jay Norwood wrote: so, a few comments about std.zip...

I attempted to use it and found that its way of unzipping is a memory hog, keeping the full original and all the unzipped data in memory. It quickly ran out of memory on my test case. The classes didn't lend themselves to parallel execution, so I broke them into a few pieces: one creates the directory structure, one reads in compressed archive entries, one expands archive entries.

The app creates the directory structure non-parallel using the recursive mkdir. I found that creating the directory structure only took about 0.4 secs of the total time in that 2GB test. Creating the directory structure, reading the zip entries, and expanding the data, without writing to disk, took less than 4 secs, with the expansion done in parallel. The other 13 to 14 secs were all taken up by writing out the files, with less than half a sec of that required to update the timestamps. This is on about 39k directory entries.

The 17 sec result is on the intel 510 series ssd drive. On a hard drive, 7zip took 128 secs and uzp took about 70 secs:

G:\>uzp tzip.zip tz
unzipping: .\tzip.zip
finished! time: 69440 ms

It is interesting that win7 takes longer to delete these directories than it does to create them.
Re: unzip parallel, 3x faster than 7zip
On Thursday, 5 April 2012 at 14:04:57 UTC, Jay Norwood wrote: I uploaded a parallel unzip here, and the main in the examples folder.

So, below is a demo of how to use the example app in windows, where I unzipped a 2GB directory structure from a 1GB zip file, tzip.zip.

02/18/2012 03:23 PM <DIR> test
03/30/2012 11:28 AM 968,727,390 tzip.zip
04/05/2012 08:07 AM 462,364 uzp.exe
03/21/2012 10:26 AM 1,603,584 wc.exe
03/06/2012 12:20 AM <DIR> xx8
13 File(s) 1,071,302,938 bytes
14 Dir(s) 49,315,860,480 bytes free

H:\>uzp tzip.zip tz
unzipping: .\tzip.zip
finished! time: 17183 ms

02/18/2012 03:23 PM <DIR> test
04/05/2012 08:12 AM <DIR> tz
03/30/2012 11:28 AM 968,727,390 tzip.zip
04/05/2012 08:07 AM 462,364 uzp.exe
03/21/2012 10:26 AM 1,603,584 wc.exe
03/06/2012 12:20 AM <DIR> xx8
13 File(s) 1,071,302,938 bytes
15 Dir(s) 47,078,543,360 bytes free

The example supports several forms of command line:

uzp zipFilename (unzip in the current folder)
uzp zipFilename destFoldername (unzip into the destination folder)
uzp zipf1 zipf2 zipf3 destFoldername (unzip multiple zip files to the dest folder)
uzp zipf* destFoldername (unzip multiple zip files, with wildarg expansion, to the dest folder)

It overwrites existing directory entries without asking in the current form.
unzip parallel, 3x faster than 7zip
I uploaded a parallel unzip here, with the main in the examples folder. Testing on my ssd drive, it unzips a 2GB directory structure in 17.5 secs. 7zip took 55 secs on the same file. This restores timestamps on the regular files. There is also a loop which will restore timestamps on folders; it can be uncommented if the fix is added to std.file.setTimes that allows timestamp updates on folders. I documented a fix that I tested in issue 7819.

https://github.com/jnorwood/file_parallel
http://d.puremagic.com/issues/show_bug.cgi?id=7819

This has similar limitations to std.zip: it only does inflate or store, and doesn't do decryption. There is a 4GB limit based on the 32-bit offsets of the zip format used. It processes 40MB blocks of files, and uses a std.parallelism foreach loop. If an archived entry is larger than 40MB it will attempt to load it into memory, but there is currently no expansion technique in there to split a large single entry into blocks. I used the streams I/O to avoid the 2GB file limits still in stdio.
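The core shape of the approach is the std.zip structures fed to a taskPool loop; the sketch below shows only that shape (std.zip's expand isn't documented as thread-safe, which is part of why the real code splits the classes apart, and the 40MB blocking and streamed writes are omitted):

import std.algorithm : endsWith;
import std.file : mkdirRecurse, read, write;
import std.parallelism : taskPool;
import std.zip : ArchiveMember, ZipArchive;

void unzipSketch(string zipName, string destDir)
{
    auto zip = new ZipArchive(read(zipName));
    ArchiveMember[] files;

    // pass 1: create the directory tree, collect the regular files
    foreach (name, am; zip.directory)
    {
        if (name.endsWith("/"))
            mkdirRecurse(destDir ~ "/" ~ name);
        else
            files ~= am;
    }

    // pass 2: inflate and write the regular files in parallel
    foreach (am; taskPool.parallel(files))
    {
        zip.expand(am); // fills am.expandedData
        write(destDir ~ "/" ~ am.name, am.expandedData);
    }
}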
Re: Low feature GNUPlot controller for D2, problem solved on intel box
On Wednesday, 14 March 2012 at 07:33:52 UTC, Jay Norwood wrote: On Wednesday, 14 March 2012 at 07:16:39 UTC, Jay Norwood wrote: > I just tried this on Win7 64 bit using the latest TangoD2 and the gnuplot from this link http://sourceforge.net/projects/gnuplot/files/ I had to substitute pgnuplot.exe, which is one of the windows gnuplot exe versions that accepts the piped input GNUPlot = new Process(true, "pgnuplot.exe -persist"); With that change one of the graphs displayed, the one with title "Raw gnuplot commands". The rest all failed, maybe due to the binary record input, which they try to echo to their text command shell. The use of pgnuplot.exe wasn't necessary. The problem turns out to be that it needed "endian=little" in the format string in C2DPlot Plot(...) for my box. After that, all the plots ran perfectly well using the original "gnuplot -persist" string for the process.
Re: Pegged, a Parsing Expression Grammar (PEG) generator in D
On Saturday, 10 March 2012 at 23:28:42 UTC, Philippe Sigaud wrote: * Grammars can be dumped in a file to create a D module.

In reading the D spec I've seen a few instances where there are inferred items, such as auto for variables and various parameters in templates. I presume your D grammar will have to have some rules to be able to infer these as well. It seems like it would not be a big step to output the inferred proper statements as the .capture output. Is that correct? I think that would be helpful in some cases, to be able to view the fully expanded expression.
Re: Low feature GNUPlot controller for D2
On Wednesday, 14 March 2012 at 07:16:39 UTC, Jay Norwood wrote: > I just tried this on Win7 64 bit using the latest TangoD2 and the gnuplot from this link http://sourceforge.net/projects/gnuplot/files/ I had to substitute pgnuplot.exe, which is one of the windows gnuplot exe versions that accepts the piped input GNUPlot = new Process(true, "pgnuplot.exe -persist"); With that change one of the graphs displayed, the one with title "Raw gnuplot commands". The rest all failed, maybe due to the binary record input, which they try to echo to their text command shell. Maybe if there were an option to use text format input, or maybe there is some terminator expected. Their command shell fails to return to the prompt after echoing the binary data, and doesn't create the graph shell on these others that use the binary record input. There was discussion of windows problems with the binary data input elsewhere, including some discussion of the input being opened in text mode. Looks like the cygwin version might be a solution. http://octave.1599824.n4.nabble.com/Gnuplot-scripts-as-output-td1680624.html http://sourceforge.net/tracker/?func=detail&atid=102055&aid=2981027&group_id=2055
Re: Low feature GNUPlot controller for D2
On Sunday, 11 March 2012 at 21:45:02 UTC, SiegeLord wrote: > Anyway, the repository for it is here: https://github.com/SiegeLord/DGnuplot It requires TangoD2 to build and gnuplot 4.4.3 to run (unless you're saving commands to a file as described above). It works on Linux, and maybe on Windows (untested). -SiegeLord I just tried this on Win7 64 bit using the latest TangoD2 and the gnuplot from this link http://sourceforge.net/projects/gnuplot/files/ I had to substitute pgnuplot.exe, which is one of the windows gnuplot exe versions that accepts the piped input GNUPlot = new Process(true, "pgnuplot.exe -persist"); With that change one of the graphs displayed, the one with title "Raw gnuplot commands". The rest all failed, maybe due to the binary record input, which they try to echo to their text command shell. Maybe if there were an option to use text format input, or maybe there is some terminator expected. Their command shell fails to return to the prompt after echoing the binary data, and doesn't create the graph shell on these others that use the binary record input.
Re: Pegged, a Parsing Expression Grammar (PEG) generator in D
On Tuesday, 13 March 2012 at 05:25:38 UTC, Jay Norwood wrote: Admittedly I have not heard of PEGs before, so I'm curious: Is this powerful enough to parse a language such as C?

I've just read a few articles referenced from this page; the second link below is by someone who had done Java 1.5:

http://bford.info/packrat/
http://www.romanredz.se/papers/FI2007.pdf

In a later paper he also did a C parser, so I suppose that is the answer ...

http://www.romanredz.se/papers/FI2008.pdf
Re: Pegged, a Parsing Expression Grammar (PEG) generator in D
Admittedly I have not heard of PEGs before, so I'm curious: Is this powerful enough to parse a language such as C?

I've just read a few articles referenced from this page; the second link below is by someone who had done Java 1.5:

http://bford.info/packrat/
http://www.romanredz.se/papers/FI2007.pdf

It is interesting, but that article left me with some questions about the implementation in order to make it useful for my needs. I had done an experiment with mvel expression evaluation in Java and gotten good improvements relative to homebrew expression evaluators. However, the mvel expressions are missing the ability to express array operations clearly, which is something that is very clear in D (see the snippet after this post), and my particular need is to enable the user to express array operations.

With this pegged embedded parser, it appears to me you could provide a group of context symbols as part of a language definition, similar to providing a list of reserved words, so that the parsing of the user's expression would also validate the symbols used.

Also, I've been reading David Simcha's parallel_algorithm.d, here:

https://github.com/dsimcha/parallel_algorithm/blob/master/parallel_algorithm.d

and in the parallelArrayOp portion, he has presented a way to turn the D array expressions into code that is executed in parallel on multicore systems. That is something I'd want to use, but the manual lexing requirement is a bit clunky, and the rules are unclear to me, so it seems to me a combination with the pegged parser could make that more accessible. Thanks, Jay
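For reference, the kind of D array expression meant above is the built-in element-wise form, which needs no explicit loop:

unittest
{
    double[] a = [1.0, 2.0, 3.0];
    double[] b = [4.0, 5.0, 6.0];
    auto c = new double[3];
    c[] = a[] * 2.0 + b[]; // element-wise multiply-add
    assert(c == [6.0, 9.0, 12.0]);
}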
Re: parallel copy directory, faster than robocopy
So here is the output of a batch file I just ran on the ssd drive for the 1.5GB copy. Robocopy reports that it took around 14 secs, while the release build of the D command-line cpd utility took around 12 secs. That's a pretty consistent result on the ssd drive, which is more sensitive to cpu performance.

06:12 PM

H:\xx8>robocopy /E /NDL /NFL /NC /NS /MT:8 xx8c xx8ca

--- ROBOCOPY :: Robust File Copy for Windows ---

Started : Mon Mar 05 18:12:33 2012
Source : H:\xx8\xx8c\
Dest : H:\xx8\xx8ca\
Files : *.*
Options : *.* /NS /NC /NDL /NFL /S /E /COPY:DAT /MT:8 /R:100 /W:30

-- 100% --

          Total   Copied  Skipped  Mismatch  FAILED  Extras
Dirs :     2627     2626        1         0       0       0
Files :   36969    36969        0         0       0       0
Bytes :  1.502 g  1.502 g       0         0       0       0
Times :  0:02:05  0:00:12  0:00:00   0:00:01

Ended : Mon Mar 05 18:12:47 2012

H:\xx8>time /T
06:12 PM

H:\xx8>rmd xx8ca\*
removing: xx8ca\Cross_Tools
removing: xx8ca\eclipse
removing: xx8ca\gnu
removing: xx8ca\PA
finished! time:17889 ms

H:\xx8>time /T
06:13 PM

H:\xx8>cpd xx8c\* xx8ca
copying: xx8c\Cross_Tools
copying: xx8c\eclipse
copying: xx8c\gnu
copying: xx8c\PA
finished! time: 11681 ms

H:\xx8>time /T
06:13 PM

btw, I just ran robocopy with /MT:1, and it took around 42 seconds on the same drive, which is about what I see with the standard windows copy, including the gui copy. So, at least for these ssd drives, the parallel processing results in worthwhile speed-ups.

Started : Mon Mar 05 18:24:31 2012
Ended : Mon Mar 05 18:25:13 2012
Re: parallel copy directory, faster than robocopy
On Monday, 5 March 2012 at 16:35:09 UTC, dennis luehring wrote: do you compare single-threaded robocopy with your implementation or multithreaded? you can command robocopy to use multiple threads with /MT[:n]

Yes, I tested vs multithreaded robocopy. As someone pointed out, robocopy has lots of nice options, which I didn't try to duplicate, and it is only about 10% slower on my test. I was happy to see the D app in the same ballpark as robocopy, which means to me that the very simple and clean std.parallelism taskPool foreach loop can produce very good multi-core results in a very concise and readable piece of code. I've done some projects previously using omp pragmas in C++, and it is just so ugly.
Re: parallel copy directory, faster than robocopy
On Monday, 5 March 2012 at 12:48:54 UTC, Andrei Alexandrescu wrote: Sounds great! Next step, should you be interested, is to create a pull request for phobos so we can integrate your code within. Andrei

I considered that. I suppose the wildArgv code could go in std.path, and the file operations into std.file, with the pull requests against those files. I haven't followed the discussions closely enough to know the rules/politics about adding another std library import into those; it would require adding an import of std.parallelism into std.file.
Re: parallel copy directory, faster than robocopy
I placed the two parallel file operations, rmdir and copy, on github in

https://github.com/jnorwood/file_parallel

These combine the std.parallelism operations with the std.file operations to speed up the processing on Windows.

---

I also put a useful function that does argv pathname wildcard expansion in

https://github.com/jnorwood/file_utils

This makes use of the existing dirEntries call that has the pattern-matching parameter, which enables simple * and ? expansions in windows args. I'm only allowing expansions in the basename, and only expanding one level of the directory.

There are example Windows commandline utilities that use each of the functions in file_parallel/examples. I've only tested these on win7, 64 bit.
Re: parallel copy directory, faster than robocopy
On Wednesday, 15 February 2012 at 00:11:32 UTC, Sean Cavanaugh wrote: > more of an 'FYI/reminder': At a minimum Robocopy does additional work to preserve the timestamps and attributes of the copies of the files (by default) so it can avoid redundant copies of files in the future. This is undoubtedly creating some additional overhead. Its probably also quite a bit worse with /SEC etc to copy permissions. On the plus side you would have windows scheduling the IO which in theory would be able to minimize seeking to some degree, compared to robocopy's serial copying.

Yeah, Robocopy has a lot of nice options. Currently the D library has copy(srcpath, destpath), which goes directly to the OS copy. If it had something like copy(DirectoryEntry, destpath, options), with the options being like the Robocopy options, that might be more efficient.

On the ssd, seeking is on the order of 0.2 msec vs 16 msec on my 7200rpm seagate hard drive. I do think seeks on a hard drive will be a problem with all the small, individual file copies. So is Robocopy bundling these up in some way?

I did find a nice solution in std.file for the argv expansion, btw, and posted an example on D.learn. It uses a version of dirEntries that takes an extra pattern parameter used for the expansion (the glob matching documented in std.path).
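A minimal sketch of that dirEntries overload (the pattern is a glob, matched against the entries it yields):

import std.file : SpanMode, dirEntries;
import std.stdio : writeln;

void listMatches(string dir, string pattern)
{
    // e.g. listMatches(".", "*.d") prints the D sources in dir
    foreach (e; dirEntries(dir, pattern, SpanMode.shallow))
        writeln(e.name);
}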
Re: parallel copy directory, faster than robocopy
An improvement is to change this first mkdir to mkdirRecurse:

if (!exists(dest)){
    mkdir (dest); // makes dest root
}
Re: parallel copy directory, faster than robocopy
ok, I didn't test that first one very well. It worked for directory copies, but I didn't test non-directories. So here is the fixed operation for non-directories, where it just copies the single file. So it now does two cases:

copy regular_file destinationDirectory
copy folder destinationDirectory

What I'd like to add is wildcard support for something like:

copy folder/* destinationDirectory

I suppose also it could be enhanced to handle all the robocopy options, but I'm just trying out the copy speeds for now.

module main;

import std.stdio;
import std.file;
import std.path;
import std.datetime;
import std.parallelism;

int main(string[] argv)
{
    if (argv.length != 3){
        writeln ("need to specify src and dest dir");
        return 0;
    }
    // TODO expand this to handle wildcard
    string dest = argv[$-1];
    foreach(string dir; argv[1..$-1])
    {
        writeln("copying directory: "~ dir );
        auto st1 = Clock.currTime(); // Current time in local time.
        cpdir(dir,dest);
        auto st2 = Clock.currTime(); // Current time in local time.
        auto dif = st2 - st1;
        auto ts = dif.toString();
        writeln("time:"~ts);
    }
    writeln("finished !");
    return 0;
}

void cpdir(in char[] pathname, in char[] dest)
{
    DirEntry deSrc = dirEntry(pathname);
    string[] files;

    if (!exists(dest)){
        mkdir (dest); // makes dest root
    }
    DirEntry destDe = dirEntry(dest);
    if(!destDe.isDir()){
        throw new FileException( destDe.name, " is not a directory");
    }
    string destName = destDe.name ~ '/';
    string destRoot = destName ~ baseName(deSrc.name);

    if(!deSrc.isDir()){
        copy(deSrc.name, destRoot);
    }
    else{
        string srcRoot = deSrc.name;
        int srcLen = srcRoot.length;
        mkdir(destRoot);

        // make an array of the regular files only, also create the
        // directory structure. Since it is SpanMode.breadth, can
        // just use mkdir
        foreach(DirEntry e; dirEntries(deSrc.name, SpanMode.breadth, false)){
            if (attrIsDir(e.linkAttributes)){
                string destDir = destRoot ~ e.name[srcLen..$];
                mkdir(destDir);
            }
            else{
                files ~= e.name;
            }
        }

        // parallel foreach for regular files
        foreach(fn ; taskPool.parallel(files)) {
            string dfn = destRoot ~ fn[srcLen..$];
            copy(fn,dfn);
        }
    }
}
Re: parallel copy directory, faster than robocopy
ok, so I guess the Add File didn't work for some reason, so here's the source.

module main;

import std.stdio;
import std.file;
import std.path;
import std.datetime;
import std.parallelism;

int main(string[] argv)
{
    if (argv.length != 3){
        writeln ("need to specify src and dest dir");
        return 0;
    }
    // TODO expand this to handle wildcard
    string dest = argv[$-1];
    foreach(string dir; argv[1..$-1])
    {
        writeln("copying directory: "~ dir );
        auto st1 = Clock.currTime(); // Current time in local time.
        cpdir(dir,dest);
        auto st2 = Clock.currTime(); // Current time in local time.
        auto dif = st2 - st1;
        auto ts = dif.toString();
        writeln("time:"~ts);
    }
    writeln("finished !");
    return 0;
}

void cpdir(in char[] pathname, in char[] dest)
{
    DirEntry deSrc = dirEntry(pathname);
    string[] files;

    if (!exists(dest)){
        mkdir (dest); // makes dest root
    }
    DirEntry destDe = dirEntry(dest);
    if(!destDe.isDir()){
        throw new FileException( destDe.name, " is not a directory");
    }
    string destName = destDe.name ~ '/';

    if(!deSrc.isDir()){
        copy(deSrc.name, dest);
    }
    else{
        string srcRoot = deSrc.name;
        int srcLen = srcRoot.length;
        string destRoot = destName ~ baseName(deSrc.name);
        mkdir(destRoot);

        // make an array of the regular files only, also create the
        // directory structure. Since it is SpanMode.breadth, can
        // just use mkdir
        foreach(DirEntry e; dirEntries(deSrc.name, SpanMode.breadth, false)){
            if (attrIsDir(e.linkAttributes)){
                string destDir = destRoot ~ e.name[srcLen..$];
                mkdir(destDir);
            }
            else{
                files ~= e.name;
            }
        }

        // parallel foreach for regular files
        foreach(fn ; taskPool.parallel(files)) {
            string dfn = destRoot ~ fn[srcLen..$];
            copy(fn,dfn);
        }
    }
}
parallel copy directory, faster than robocopy
Attached is the source for a small parallel app that copies a source folder to a destination. It creates the directory structure first using the breadth ordering, then uses a parallel foreach loop with the taskPool to copy all the regular files in parallel. On my corei7, this copied a 1.5GB folder with around 36K entries to a destination in about 11.5 secs (src and dest on the same ssd drive). This was about a second better than robocopy, which is the fastest alternative I could find. The regular win7-64 copy takes 41 secs for the same folder. I'd like to add wildcard processing for the sources, but haven't found a good example.
Re: 4x speedup of recursive rmdir in std.file
Andrei Alexandrescu Wrote:
> That's why I'm saying - let's leave the decision to the user. Take a
> uint parameter for the number of threads to be used, where 0 means leave
> it to phobos, and default to 0.
>
> Andrei

ok, here is another version. I was reading about the std.parallelism library, and I see I can do the parallel removes more cleanly. Plus the library figures out the number of cores and limits the taskPool size accordingly. It is only a slight bit slower than the other code. It looks like they choose 7 threads in the taskPool when you have 8 cores. So, I do the regular files in parallel, then pass it back to the original library code, which cleans up the directory-only tree non-parallel. I also added code to get the directory names from argv.

module main;

import std.stdio;
import std.file;
import std.datetime;
import std.parallelism;

int main(string[] argv)
{
    if (argv.length < 2){
        writeln ("need to specify one or more directories to remove");
        return 0;
    }
    foreach(string dir; argv[1..$]){
        writeln("removing directory: "~ dir );
        auto st1 = Clock.currTime(); // Current time in local time.
        rmdirRecurse2(dir);
        auto st2 = Clock.currTime(); // Current time in local time.
        auto dif = st2 - st1;
        auto ts = dif.toString();
        writeln("time:"~ts);
    }
    writeln("finished !");
    return 0;
}

void rmdirRecurse2(in char[] pathname)
{
    DirEntry de = dirEntry(pathname);
    rmdirRecurse2(de);
}

void rmdirRecurse2(ref DirEntry de)
{
    string[] files;

    if(!de.isDir)
        throw new FileException( de.name, " is not a directory");

    if(de.isSymlink())
        remove(de.name);
    else{
        // make an array of the regular files only
        foreach(DirEntry e; dirEntries(de.name, SpanMode.depth, false)){
            if (!attrIsDir(e.linkAttributes)){
                files ~= e.name;
            }
        }

        // parallel foreach for regular files
        foreach(fn ; taskPool.parallel(files, 1000)) {
            remove(fn);
        }

        // let the original code remove the directories only
        rmdirRecurse(de);
    }
}
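For comparison, Andrei's suggested uint parameter could be wired up along these lines (a sketch, untested; 0 falls through to the shared default pool):

import std.file : remove;
import std.parallelism : TaskPool, taskPool;

void removeParallel(string[] files, uint nThreads = 0)
{
    // 0 means leave the worker count to phobos
    auto pool = nThreads == 0 ? taskPool : new TaskPool(nThreads);
    scope(exit) if (nThreads != 0) pool.finish();
    foreach (fn; pool.parallel(files, 1000))
        remove(fn);
}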
Re: 4x speedup of recursive rmdir in std.file
== Quote from Andrei Alexandrescu
> > Suppose all the cores but one are already preoccupied with other stuff, or
> > maybe you're even running on a single-core. Does the threading add enough
> > overhead that it would actually go slower than the original single-threaded
> > version?
> >
> > If not, then this would indeed be a fantastic improvement to phobos.
> > Otherwise, I wonder how such a situation could be mitigated?
> There's a variety of ways, but the simplest approach is to pass a
> parameter to the function telling how many threads it's allowed to
> spawn. Jay?
> Andrei

I can tell you that there are a couple of seconds of improvement in execution time running 16 threads vs 8 on the i7 on the ssd drive, so we aren't keeping all the cores busy with 8 threads. I suppose they are all blocked waiting for file system operations for some portion of the time even with 8 threads. I would guess that even on a single core it would be an advantage to have multiple threads available for the core to work on when it blocks waiting for the fs operations.

The previous results were on an ssd drive. I tried again on a Seagate sata3 7200rpm hard drive: it took 2 minutes 12 sec to delete the same layout using the OS, and never used more than 10% cpu. The one-thread configuration of the D program similarly used less than 10% cpu, but took only 1 minute 50 seconds to delete the same layout. Anything above a one-thread configuration on the sata drive began degrading the D program performance. I'll have to scratch my head on this a while.

This is for an optiplex 790, win7-64, using the board's sata for both the ssd and the hd. The extract of the zip using 7zip takes 1:55 on the seagate disk drive, btw ... vs about 50 secs on the ssd.
4x speedup of recursive rmdir in std.file
It would be good if the std.file operations used the D multi-thread features, since you've done such a nice job of making them easy. I hacked up your std.file recursive remove and got a 4x speed-up on a win7 system with a corei7, using the examples from the D programming language book. Code is below, with a hard-coded file I was using for test. I'm just learning this, so I know you can do better ... Delete time dropped from 1 minute 5 secs to less than 15 secs. This was on an ssd drive.

module main;

import std.stdio;
import std.file;
import std.datetime;
import std.concurrency;

const int THREADS = 16;

int main(string[] argv)
{
    writeln("removing H:/pa10_120130/xx8");
    auto st1 = Clock.currTime(); // Current time in local time.
    rmdirRecurse2("H:/pa10_120130/xx8");
    auto st2 = Clock.currTime(); // Current time in local time.
    auto dif = st2 - st1;
    auto ts = dif.toString();
    writeln("time:");
    writeln(ts);
    writeln("finished !");
    return 0;
}

void rmdirRecurse2(in char[] pathname)
{
    DirEntry de = dirEntry(pathname);
    rmdirRecurse2(de);
}

void rmdirRecurse2(ref DirEntry de)
{
    if(!de.isDir)
        throw new FileException( de.name, " is not a directory");

    if(de.isSymlink())
        remove(de.name);
    else{
        Tid tid[THREADS];
        int i=0;
        for(;i