Re: Redundant "g" flag for regex?
On Saturday, 23 June 2018 at 13:45:32 UTC, Basile B. wrote:
On Saturday, 23 June 2018 at 12:17:08 UTC, biocyberman wrote:
I get the same output with or without "g" flag at line 6: https://run.dlang.io/is/9n7iz6 So I don't understand when I have to use "g" flag.
My bet is that Regex results in D are lazy so "g" doesn't make sense in this context; however, I'm able to see an effect with "match":
    match("12000 + 42100 = 54100", regex(r"(?<=\d)(?=(\d\d\d)+\b)", "")).writeln;
    match("12000 + 42100 = 54100", regex(r"(?<=\d)(?=(\d\d\d)+\b)", "g")).writeln;
matchFirst would be like without "g"; matchAll would be like with "g".
I should have read the doc more thoroughly: https://dlang.org/phobos/std_regex.html#match
"Delegating the kind of operation to "g" flag is soon to be phased out along with the ability to choose the exact matching scheme."
So case closed for me.
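To make the matchFirst / matchAll distinction concrete, here is a minimal sketch that spells out the intent in the function name instead of in the "g" flag. The helper names `firstNumber` and `allNumbers` and the simplified `\d+` pattern are only illustrations, not taken from the linked snippet.

```d
import std.algorithm.iteration : map;
import std.array : array;
import std.regex : matchAll, matchFirst, regex;
import std.stdio : writeln;

/// First run of digits in `text`, or an empty string if there is none.
/// (Hypothetical helper, for illustration only.)
string firstNumber(string text)
{
    auto m = matchFirst(text, regex(r"\d+"));
    return m.empty ? "" : m.hit;
}

/// Every run of digits in `text`, in order of appearance.
/// (Hypothetical helper, for illustration only.)
string[] allNumbers(string text)
{
    return matchAll(text, regex(r"\d+")).map!(m => m.hit).array;
}

void main()
{
    immutable text = "12000 + 42100 = 54100";
    writeln(firstNumber(text)); // stops after the first match
    writeln(allNumbers(text));  // walks over all matches
}
```

With these two functions no flag is needed to choose between "first match" and "all matches".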
Re: why explicitly use "flags" in regex does not work?
On Saturday, 23 June 2018 at 12:20:10 UTC, biocyberman wrote:
I got "Error: undefined identifier flags" in here: https://run.dlang.io/is/wquscz Removing "flags =" works.
I kind of found an answer. It's a bit of a surprise anyway: https://forum.dlang.org/thread/wokfqqbexazcguffw...@forum.dlang.org?page=1
Long story short, "named" parameter function calling still does not work. IMHO, this goes against the readability tendency of D. And I still don't know what to do if I want something like this:
    auto func(string a = "a", string b = "b", string c = "c")
    {
        write("a: ", a, " b: ", b, " c: ", c);
    }
    void main()
    {
        func();
        func(b = "B"); // Changes default for b only
        func(c = "C"); // Changes default for c only
    }
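One workaround that needs no named-argument support is to bundle the defaults into a struct and override single fields with a struct initializer. This is only a sketch: `FuncArgs` is a made-up name, and `func` returns the text instead of writing it so the result is easy to check. As far as I know, newer compiler releases have since gained named arguments, so it is worth checking the changelog of the compiler in use.

```d
import std.format : format;
import std.stdio : writeln;

/// Hypothetical parameter bundle: every field carries its default value.
struct FuncArgs
{
    string a = "a";
    string b = "b";
    string c = "c";
}

string func(FuncArgs args = FuncArgs.init)
{
    return format("a: %s b: %s c: %s", args.a, args.b, args.c);
}

void main()
{
    writeln(func());          // all defaults

    FuncArgs onlyB = { b: "B" }; // changes the default for b only
    writeln(func(onlyB));

    FuncArgs onlyC = { c: "C" }; // changes the default for c only
    writeln(func(onlyC));
}
```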
why explicitly use "flags" in regex does not work?
I got "Error: undefined identifier flags" in here: https://run.dlang.io/is/wquscz Removing "flags =" works.
Redundant "g" flag for regex?
I get the same output with or without "g" flag at line 6: https://run.dlang.io/is/9n7iz6 So I don't understand when I have to use "g" flag.
Re: Convert a huge SQL file to CSV
On Friday, 1 June 2018 at 10:15:11 UTC, Martin Tschierschke wrote:
On Friday, 1 June 2018 at 09:49:23 UTC, biocyberman wrote:
I need to convert a compressed 17GB SQL dump to CSV. A workable solution is to create a temporary mysql database, import the dump, query by python, and export. But I wonder if there is some way in D to parse the SQL file directly and query and export the data. I imagine this will involve both parsing and querying because the data is stored in several tables. I am in the process of downloading the dump now so I can’t give an excerpt of the data.
You don't need python: https://michaelrigart.be/export-directly-mysql-csv/
    SELECT field1, field2 FROM table1
    INTO OUTFILE '/path/to/file.csv'
    FIELDS TERMINATED BY ',' ENCLOSED BY '"' ESCAPED BY '\\'
    LINES TERMINATED BY '\n';
Most important: INTO OUTFILE: here you state the path where you want MySQL to store the CSV file. Keep in mind that the path needs to be writeable for the MySQL user.
You can write a parser for SQL in D, but even if the import into mysql would take some time, it's only compute time and not yours. Regards mt.
Ah yes, thank you Martin. I forgot that we can do a "batch" SQL query where the mysql server parses and runs the query commands. So no need for Python. But I am still waiting for the import of the mysql dump to finish. It has taken 18 hours and is still counting! The whole mysql database is 68GB at the moment. Can we avoid the import and query the database dump directly?
Convert a huge SQL file to CSV
I need to convert a compressed 17GB SQL dump to CSV. A workable solution is to create a temporary mysql database, import the dump, query by python, and export. But I wonder if there is some way in D to parse the SQL file directly and query and export the data. I imagine this will involve both parsing and querying because the data is stored in several tables. I am in the process of downloading the dump now so I can’t give an excerpt of the data.
Re: Logging inside struct?
On Wednesday, 30 May 2018 at 10:07:35 UTC, Simen Kjærås wrote: On Wednesday, 30 May 2018 at 09:58:16 UTC, biocyberman wrote: [...] This line: writeln("got num: %s, of type: %s", num, typeof(num)); [...] Problem solved. Thanks Simen!
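The quoted reply is elided, so the following is only a guess at what was wrong with that line: `writeln` does not interpret `%s` placeholders, and `typeof(num)` is a type rather than a value. A sketch of a formatted version, with `describe` as a made-up helper name:

```d
import std.format : format;
import std.stdio : writeln;

/// Formats a value together with the name of its static type.
/// (Hypothetical helper, for illustration only.)
string describe(T)(T num)
{
    // format (like writefln) understands %s; T.stringof turns the type into text.
    return format("got num: %s, of type: %s", num, T.stringof);
}

void main()
{
    writeln(describe(42));
    writeln(describe(2.5));
}
```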
Logging inside struct?
How do I add logging for this struct? https://run.dlang.io/is/9N6N4o If not possible, what's the alternative?
Re: Translate C/C++ pattern: return a pointer
On Thursday, 24 May 2018 at 17:44:19 UTC, Jacob Carlborg wrote: On 2018-05-24 11:10, biocyberman wrote: Thanks for the hints. `Read` in C++ and D are both classes. And the function is inside the class definition itself. In that case specifying the type as `Read` is the correct thing to do. Note that `new` always allocates on the heap and returns a pointer or reference type. Thanks. I did that and it worked correctly.
Re: return type of std.algorithm.mutation.reverse changed for good?
On Thursday, 24 May 2018 at 12:34:38 UTC, Steven Schveighoffer wrote:
On 5/24/18 8:08 AM, rikki cattermole wrote:
On 25/05/2018 12:06 AM, biocyberman wrote:
I am testing with DMD 2.078.2 locally. This tiny snippet works on dlang's online editor: https://run.dlang.io/is/nb4IV4 But it does not work on my local dmd.
    import std.algorithm.mutation;
    import std.stdio;
    char[] arr = "hello\U00010143\u0100\U00010143".dup;
    writeln(arr.reverse);
Error: template std.stdio.writeln cannot deduce function from argument types !()(void)
The document says reverse returns a range: https://dlang.org/phobos/std_algorithm_mutation.html#reverse
https://docarchives.dlang.io/v2.078.0/phobos/std_algorithm_mutation.html#reverse
This doesn't quite tell the whole story. An array used to have a .reverse property that the compiler implemented, which returned the array after reversing it. So in history, this actually worked without std.algorithm. You get a nice history of what happens using "all dmd versions" on run.dlang.io. If you remove the "mutation" part from the import, you get:
Up to 2.071.2: Success with output:
    𐅃Ā𐅃olleh
2.072.2 to 2.074.1: Success with output:
    onlineapp.d(6): Deprecation: use std.algorithm.reverse instead of .reverse property
    𐅃Ā𐅃olleh
2.075.1: Failure with output:
    onlineapp.d(6): Error: template std.stdio.writeln cannot deduce function from argument types !()(void), candidates are:
    /path/to/dmd.linux/dmd2/linux/bin64/../../src/phobos/std/stdio.d(3553): std.stdio.writeln(T...)(T args)
2.076.1 to 2.077.1: Failure with output:
    onlineapp.d(6): Error: template std.stdio.writeln cannot deduce function from argument types !()(void), candidates are:
    /path/to/dmd.linux/dmd2/linux/bin64/../../src/phobos/std/stdio.d(3571): std.stdio.writeln(T...)(T args)
2.078.1: Failure with output:
    onlineapp.d(6): Error: template std.stdio.writeln cannot deduce function from argument types !()(void), candidates are:
    /path/to/dmd.linux/dmd2/linux/bin64/../../src/phobos/std/stdio.d(3657): std.stdio.writeln(T...)(T args)
Since 2.079.0: Success with output:
    𐅃Ā𐅃olleh
-Steve
@Rikki and Steve: Many thanks for the good tips. I have upgraded to dmd 2.080.0 now, but the server seems to be very slow. It's another story anyway.
    % ./install.sh dmd !9767
    Downloading and unpacking http://downloads.dlang.org/releases/2.x/2.080.0/dmd.2.080.0.linux.tar.xz
    curl: (28) Operation too slow. Less than 1024 bytes/sec transferred the last 30 seconds
    Failed to download 'http://downloads.dlang.org/releases/2.x/2.080.0/dmd.2.080.0.linux.tar.xz'
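Given the version history above, one way to avoid depending on what `reverse` returns is to call it as a plain statement and then use the array itself. A minimal sketch, assuming only that `std.algorithm.mutation.reverse` reverses the array in place; `reversedCopy` is a made-up helper name:

```d
import std.algorithm.mutation : reverse;
import std.stdio : writeln;

/// Returns a reversed copy of `s`. (Hypothetical helper, for illustration.)
char[] reversedCopy(const(char)[] s)
{
    char[] arr = s.dup;
    reverse(arr); // in place; the return value (void or a range) is ignored
    return arr;
}

void main()
{
    writeln(reversedCopy("hello"));
}
```

Ignoring the return value keeps the call valid whether the installed Phobos declares `reverse` as returning `void` or the range.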
return type of std.algorithm.mutation.reverse changed for good?
I am testing with DMD 2.078.2 locally. This tiny snippet works on dlang's online editor: https://run.dlang.io/is/nb4IV4 But it does not work on my local dmd.
    import std.algorithm.mutation;
    import std.stdio;
    char[] arr = "hello\U00010143\u0100\U00010143".dup;
    writeln(arr.reverse);
Error: template std.stdio.writeln cannot deduce function from argument types !()(void)
The document says reverse returns a range: https://dlang.org/phobos/std_algorithm_mutation.html#reverse
Re: Translate C/C++ pattern: return a pointer
On Thursday, 24 May 2018 at 08:58:02 UTC, Nicholas Wilson wrote: On Thursday, 24 May 2018 at 08:16:30 UTC, biocyberman wrote: [...] it looks like Read is a D class? in which case it already returns by reference. If you make Read a struct then all you need do is change the function signature from Read reverseComplement() to Read* reverseComplement() about the function body use mQuality.dup.representation.reverse; [...] does not do what you want it to do. Thanks for the hints. `Read` in C++ and D are both classes. And the function is inside the class definition itself.
Re: Efficient idiom for fastest code
On Wednesday, 23 May 2018 at 03:12:52 UTC, IntegratedDimensions wrote:
On Wednesday, 23 May 2018 at 03:00:17 UTC, Nicholas Wilson wrote: [...]
I knew someone was going to say that and I forgot to say DON'T! Saying to profile when I clearly said these ARE cases where they are slow is just moronic. Please don't use default answers to arguments. This was a general question about cases on how to attack a problem WHEN profiling says I need to optimize. Your SO 101 answer sucks! Sorry!
To prove to you that your answer is invalid: I profile my code, it says that it is very slow and shows that it is due to the decision checking... I then have to come here and write up a post trying to explain how to solve the problem. I then get a post telling me I should profile. I then respond I did profile and that this is my problem. A lot of wasted energy when it is better to know a general attack strategy. Yes, some of us can judge if code needs to be optimized before profiling. It is not difficult. Giving a generic answer that does not always apply and is obvious to anyone trying to do optimization is not helpful. Everyone today pretty much does not even optimize code anymore... this isn't 1979. It's not ok to keep repeating the same mantra. I guess we should turn this into a meme?
The reason I'm getting on to you is that the "profile before optimization" sounds a bit grade school, especially since I wasn't talking anything about profiling but about a general programming pattern to speed up code, which is always valid but not always useful (and hence that is when profiling comes in).
Very challenging. Wish I could help you out with the tough work. People don't share the same context, especially online, so it is necessary to clarify the problem so others can understand and help. I've been beaten on stackoverflow many times for not providing sufficient information for my questions. It seems like one can do the reverse here at forum.dlang.org.
With that said, I think you know what you are doing, and you can do it. Just relax and give it more time and experimentation.
Translate C/C++ pattern: return a pointer
Some C and C++ projects I am working on use pointers and references extensively: to pass as function arguments, and to return from a function. For function arguments I would use `ref`, but for return types, I can't use `ref` and can't return a pointer. What should be the proper way to handle this? Do I have to change the function signature (i.e. the return type)? For example, the following function:
```
//C++ version, from: https://github.com/bioslaD/fastp/blob/orig/src/read.cpp#L69
Read* Read::reverseComplement(){
    Sequence seq = ~mSeq;
    string qual;
    qual.assign(mQuality.rbegin(), mQuality.rend());
    string strand = (mStrand=="+") ? "-" : "+";
    return new Read(mName, seq, strand, qual);
}

// D version:
Read reverseComplement(){
    Sequence seq = ~mSeq;
    dchar[] qual = cast(dchar[])mQuality.dup;
    reverse(qual);
    string strand = (mStrand=="+") ? "-" : "+";
    Read newRead = new Read(mName, seq, strand, cast(string)qual);
    // return &newRead does not work: returning `& newRead` escapes a reference to local variable newRead
    return newRead;
}
```
Let's not focus on the function body; I don't know how to handle the return type in cases like this for the D version.
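As a sketch of the return-type question only: in D a class type is already a reference, so a method declared as returning `Read` hands back a reference to the heap object that `new` created, which corresponds to the `Read*` of the C++ version. The class below is a cut-down stand-in, not the real fastp `Read`; the `Sequence` member is left out and the method is named `reversed` to make clear it is hypothetical.

```d
import std.algorithm.mutation : reverse;
import std.stdio : writeln;

/// Cut-down stand-in for the real class (assumption: no Sequence member).
class Read
{
    string name;
    string strand;
    string quality;

    this(string name, string strand, string quality)
    {
        this.name = name;
        this.strand = strand;
        this.quality = quality;
    }

    /// Returns a reference to a new heap-allocated Read; no pointer syntax needed.
    Read reversed()
    {
        char[] qual = quality.dup; // mutable copy of the quality string
        reverse(qual);             // reversed in place
        string flipped = (strand == "+") ? "-" : "+";
        return new Read(name, flipped, qual.idup);
    }
}

void main()
{
    auto original = new Read("read1", "+", "ABC");
    Read other = original.reversed();
    writeln(other.strand, " ", other.quality);
    assert(other !is original); // two distinct objects on the heap
}
```

If `Read` were a struct instead, the signature would become `Read* reversed()` and `new Read(...)` would yield that pointer directly.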
Re: Coding Challenges at Dconf2018: Implement Needleman–Wunsch and Smith–Waterman algorithms
On Friday, 4 May 2018 at 14:13:19 UTC, Luís Marques wrote:
On Monday, 30 April 2018 at 18:47:21 UTC, biocyberman wrote:
I am attending Dconf 2018 and giving a talk there on May 4. Link: https://dconf.org/2018/talks/le.html. It will be very interesting to talk about the outcome of the following challenges. If we can't have at least 3 solutions by three individuals by 10:00 GMT+2 May 4, I will have to postpone the deadline one week. Please see below for more details.
Too bad this didn't go on announce. I'm looking forward to it. I'll try to send my solution if this is postponed.
Hi Luís, I will wait :)
Re: Coding Challenges at Dconf2018: Implement Needleman–Wunsch and Smith–Waterman algorithms
On Monday, 30 April 2018 at 20:34:41 UTC, Steven Schveighoffer wrote:
On 4/30/18 2:47 PM, biocyberman wrote:
Hello D community. I am attending Dconf 2018 and giving a talk there on May 4. Link: https://dconf.org/2018/talks/le.html. It will be very interesting to talk about the outcome of the following challenges. If we can't have at least 3 solutions by three individuals by 10:00 GMT+2 May 4, I will have to postpone the deadline one week. Please see below for more details.
This should really go in announce ;) -Steve
I thought about that too. But then I imagined that "announce" is for official use by the D dev team and forum admins. These challenges are for learning and for fun, therefore I put this topic here. Anyway, I can't move this to announce now.
Coding Challenges at Dconf2018: Implement Needleman–Wunsch and Smith–Waterman algorithms
Hello D community. I am attending Dconf 2018 and giving a talk there on May 4. Link: https://dconf.org/2018/talks/le.html. It will be very interesting to talk about the outcome of the following challenges. If we can't have at least 3 solutions by three individuals by 10:00 GMT+2 May 4, I will have to postpone the deadline one week. Please see below for more details.
Implement Needleman–Wunsch and Smith–Waterman algorithms
• Introduction to alignment problems: [http://bit.do/seqalign] (notes) and [http://bit.do/alignslides] (slides) or search on Wikipedia.
• Send zipped source code of the solution via email to "biocyberman at gmail dot com" with "DAC2018" in the subject line.
• Implement in D.
• Output in [MSA] (global alignment) or BAM (local alignments)
• Winning criteria: Combination of readability, reusability, scalability and speed
• License: GPLv3. I will publish the solutions on Github.
• Prize: USD100+ (more if others also want to sponsor?) for each problem (global or local alignment), sent via PayPal or directly if we meet at Dconf2018.
• Deadline: At least 3 solutions until 10:00 GMT+2 May 4, or 24:00 GMT+2, Saturday 12 May 2018
[MSA] https://en.wikipedia.org/wiki/Multiple_sequence_alignment
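For readers new to the problem, the core of Needleman–Wunsch is a dynamic-programming recurrence over two sequences. The sketch below computes only the global alignment score with a linear gap penalty. It does no traceback and produces no MSA or BAM output, so it is not a challenge solution; the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for illustration.

```d
import std.algorithm.comparison : max;
import std.algorithm.mutation : swap;
import std.stdio : writeln;

/// Global alignment score of `a` and `b` (Needleman–Wunsch, score only).
/// Scoring parameters are illustrative defaults, not part of the challenge.
int nwScore(string a, string b, int match = 1, int mismatch = -1, int gap = -1)
{
    // prev[j]: best score for aligning the processed prefix of `a` with b[0 .. j]
    auto prev = new int[b.length + 1];
    auto curr = new int[b.length + 1];

    // Aligning an empty prefix of `a` with b[0 .. j] costs j gaps.
    foreach (j; 0 .. b.length + 1)
        prev[j] = cast(int) j * gap;

    foreach (i; 1 .. a.length + 1)
    {
        curr[0] = cast(int) i * gap; // a[0 .. i] against an empty prefix of `b`
        foreach (j; 1 .. b.length + 1)
        {
            immutable diagonal = prev[j - 1] + (a[i - 1] == b[j - 1] ? match : mismatch);
            immutable gapInB = prev[j] + gap;     // a[i - 1] aligned to a gap
            immutable gapInA = curr[j - 1] + gap; // b[j - 1] aligned to a gap
            curr[j] = max(diagonal, gapInB, gapInA);
        }
        swap(prev, curr); // keep only two rows in memory
    }
    return prev[b.length];
}

void main()
{
    writeln(nwScore("GATTACA", "GATTACA"));
    writeln(nwScore("AB", "B"));
}
```

Keeping two rows instead of the full matrix is enough for the score. Recovering the alignment itself needs the whole matrix or a divide-and-conquer variant.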
Re: Tuts/Articles: Incremental C++-to-D conversion?
On Thursday, 22 February 2018 at 08:43:24 UTC, ketmar wrote:
Nick Sabalausky (Abscissa) wrote: [...]
from my experience (various codebases up to middle size, mostly C, some C++): fsck the "one module at a time" idea! even in D modules are intertwined, and in C and C++ they're even more so. besides, converting tests is tedious, it is much funnier to have something working. so, i'm usually converting a lot of code, up to the whole codebase. it is not fun when the compiler spits 100500 errors, but when it finally stops... oh, joy!
trick: use 'sed' (or your favorite regexp search-and-replace tool) a lot. basically, before HDD crash i almost had a set of scripts that does 80-90 percent of the work translating C to D with sed. ;-) then use an editor with "jump to error line" support, and simply compile your code, fixing errors one by one.
tip: try to not rewrite code in any way until it works. i know how tempting it is to replace "just this tiny thing, it is so ugly, and in D we have a nice idiom!" NEVAR. this is by far the most important thing to remember (at least for me), so i'll repeat it again: no code modifications until it works!
personal memories: C code often uses things like `a == &arr[idx]`, where idx can go just past the last array element. it got me when i was doing the enet conversion. nasty trick. otherwise, sweat and blood, and patience.
These are good starting points so we don't get lost in the process. I still don't have much experience doing this, but I think these pieces of advice are especially true if the codebase is big or complicated, making it difficult to understand what the C/C++ code is doing. When we don't understand the code, re-writing from scratch is not possible.
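The one-past-the-end comparison mentioned above can be kept in D by doing the arithmetic on the array's `.ptr` instead of indexing. A minimal sketch with a made-up helper name `isPastEnd`, assuming default (non-@safe) code:

```d
import std.stdio : writeln;

/// True if `p` points one element past the end of `arr`.
/// (Hypothetical helper, for illustration only.)
bool isPastEnd(const(int)[] arr, const(int)* p)
{
    // arr.ptr + arr.length is only computed and compared, never dereferenced.
    // Writing &arr[arr.length] instead would trip D's array bounds check.
    return p == arr.ptr + arr.length;
}

void main()
{
    int[] arr = [1, 2, 3];
    writeln(isPastEnd(arr, arr.ptr + 3)); // the C idiom `a == &arr[idx]` with idx == 3
    writeln(isPastEnd(arr, &arr[0]));
}
```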
check mountpoint status and send email on timeout/failure?
Anyone using NFS or another remote filesystem has probably experienced the nasty silent hang many times. For example, if I run `ls /mnt/remote/nfsmount` and the remote NFS server is down while /mnt/remote/nfsmount is mounted, it takes a very long time, or forever, for the `ls` command to return an error. If it were not `ls` but a data-producing program, or a user's home directory, it would be very inconvenient. Since I want to learn D, I want to write a program that does the following:
1. Check a path to see whether it is a mount point. If it is not a mount point, try to mount it, and send an email. If it is a mount point, go to step 2.
2. If it is a mount point but fails to respond within a certain time period (e.g. 5 seconds), then send an email.
I know nothing about how to write it in D, or which library to use. So, some help please.
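A possible starting point for the two steps above, using only Phobos. Everything here is a sketch under assumptions: it is Linux-only (it reads /proc/mounts), it probes the path by running the external `stat` command in a child process so that a hang cannot block the checker itself, and the mount and email actions are left as printed placeholders. The names `isMountPoint` and `respondsWithin` are made up.

```d
import core.thread : Thread;
import core.time : Duration, msecs, seconds;
import std.array : split;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.file : readText;
import std.process : kill, spawnProcess, tryWait;
import std.stdio : File, stdin, writeln;
import std.string : lineSplitter;

/// True if `path` is listed as a mount target in a /proc/mounts-style table.
bool isMountPoint(string mountTable, string path)
{
    foreach (line; mountTable.lineSplitter)
    {
        auto fields = line.split; // device, mount point, fs type, options, ...
        if (fields.length >= 2 && fields[1] == path)
            return true;
    }
    return false;
}

/// Runs `stat path` in a child process and reports whether it finished in time.
bool respondsWithin(string path, Duration timeout)
{
    auto devNull = File("/dev/null", "w");
    auto pid = spawnProcess(["stat", path], stdin, devNull, devNull);
    auto watch = StopWatch(AutoStart.yes);
    while (watch.peek < timeout)
    {
        if (tryWait(pid).terminated)
            return true; // stat came back (successfully or not)
        Thread.sleep(50.msecs);
    }
    // Best effort: a process stuck on a hard NFS mount may ignore the signal.
    kill(pid);
    return false;
}

void main()
{
    enum path = "/mnt/remote/nfsmount";
    if (!isMountPoint(readText("/proc/mounts"), path))
        writeln(path, " is not mounted: try to mount it and send an email here");
    else if (!respondsWithin(path, 5.seconds))
        writeln(path, " did not respond within 5 seconds: send an email here");
}
```

Polling a child process is a deliberate choice: calling a filesystem function directly on a dead NFS mount could hang the checker just like `ls` does.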
Re: fasta parser with iopipe?
On Wednesday, 23 August 2017 at 13:06:36 UTC, Steven Schveighoffer wrote:
On 8/23/17 5:53 AM, biocyberman wrote: [...]
I'll respond to all your questions with what I would do, instead of answering each one. I would suggest an approach similar to how I approached parsing JSON data. In your case, the protocol is even simpler, as there is no nesting.
1. The base layer iopipe should be something that tokenizes the input into reference-based structs. If you look at the jsoniopipe library (https://github.com/schveiguy/jsoniopipe), you can see that the lowest level finds the start of the next JSON token. In your case, it should be looking for >[...] This code is pretty straightforward, and roughly corresponds to this:
    while(cannot find start sequence in stream) stream.extend;
Make sure you aren't re-doing work that has already been done (i.e. save the last place you looked). Once you have this, you can deduce each packet by the data between the starts.
2. The next layer should validate and parse the data into structs that contain referencing data from the buffer. I recommend not using actual ranges from the buffer, but information on how to build the ranges. The reason for this is that the buffer can move while being streamed by iopipe, so your data could become invalid if you take actual references to the buffer. If you look in the jsoniopipe library, the struct for storing a json item has a start and length, but not a reference to the buffer. Potentially, you could take this mechanism and build an iopipe on top of the buffered data. This iopipe's elements would be the items themselves, with the underlying buffer hidden in the implementation details. Extending would parse out another set of items, releasing would allow those items to get reclaimed (and the underlying stream data). This is something I actually wanted to explore with jsoniopipe but didn't have time before the conference. I probably will still build it.
3. Build your real code on top of that layer. What do you want to do with the data? Easiest thing to do for proof of concept is build a range out of the functions. That can allow you to test performance with your lower layers. One of the awesome things about iopipe is testing correctness is really easy -- every string is also an iopipe :)
I actually worked with a person at dconf on a similar (maybe identical?) format and explained how it could be done in a very similar way. He was looking to remove data that had a low percentage of correctness (or something like that, not in bioinformatics, so I don't understand the real semantics). With this mechanism in hand, the decompression is pretty easy to chain together with whatever actual stream you have, just use iopipe.zip. Good luck, and email me if you need more help (schvei...@yahoo.com). -Steve
Hi Nic and Steve, thank you both very much for your input. I am trying to make use of it. I will try to adapt jsoniopipe for fasta. This is ongoing and still broken code: https://github.com/biocyberman/fastaq . PRs are welcome.
@Nic: I am also very interested in bringing D to bioinformatics. I will be happy to share the information I have. Feel free to email me at vql(.at.)rn.dk and we can talk further about it.
@Steve: Yes, we talked at dconf 2017. I had to do other things, so my D learning slowed down. I am trying the Fasta format before jumping to Fastq again. jsoniopipe is a full-featured and relatively small project, which makes it a good case study. However, there are some aspects I still haven't fully understood. Would I be lucky enough to have you make the currently broken fastaq code work? :) That would definitely save me time and the headache of dealing with newbie problems.
Re: Parameter File reading
On Wednesday, 23 August 2017 at 10:25:48 UTC, Vino.B wrote: Hi All, Can anyone provide me a example code on how to read a parameter file and use those parameter in the program. From, Vino.B Parameter file is a plain text file, with some structure. I've seen in other languages people use YAML file for configuration. So you can also use YAML in D: https://github.com/dlang-community/D-YAML. Check the examples directory for inspiration.
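If YAML is more than the program needs, a plain `key = value` file can be read with Phobos alone. This is a sketch under assumptions: the file format (one pair per line, `#` for comments) and the helper name `parseParams` are made up for illustration and are not the D-YAML API.

```d
import std.algorithm.searching : findSplit, startsWith;
import std.stdio : writeln;
import std.string : lineSplitter, strip;

/// Parses "key = value" lines; blank lines and lines starting with '#' are skipped.
/// (Hypothetical format and helper, for illustration only.)
string[string] parseParams(string text)
{
    string[string] params;
    foreach (line; text.lineSplitter)
    {
        auto trimmed = line.strip;
        if (trimmed.length == 0 || trimmed.startsWith("#"))
            continue;
        auto parts = trimmed.findSplit("=");
        if (parts[1].length == 0)
            continue; // no '=' on this line
        params[parts[0].strip] = parts[2].strip;
    }
    return params;
}

void main()
{
    auto params = parseParams("# sample parameter file\nname = demo\nthreads=4\n");
    writeln(params["name"], " ", params["threads"]);
}
```

In a real program the text would come from `std.file.readText` on the parameter file, and the values could be converted with `std.conv.to`.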
fasta parser with iopipe?
I lost my momentum to learn D and want to gain it back. Therefore I need some help with this seemingly simple task:
# Fasta sequence
\>Entry1_ID header field1|header field2|...
CAGATATCTTTGATGTCCTGATTGGAAGGACCGTTGGCCACCCTTAGGCAG
TGTATACTCTTCCATAAACGAGCTATTAGTTATGAGGTCCGTAGATTGGGG
TGACGGAATTCGGCCGAACGGGAAAGACGGACATCTAGGTATCCTGAGCACGGTT
GCGCGTCCGTATCAAGCTCCTCTTTATAGGG
\>Entry2_ID header field1|header field4|...
GTTACTGTTGGTCGTAGAGCCCAGAACGGGTTGGGCAGATGTACGACAATATCGCT
TAGTCACCCTTGGGCCACGGTCCGCTACCTTACAGGAATTGAGA
\>Entry3_ID header field1|header field2|...
GGCAGTACGATCGCACGACGTGAACGATTGGTAAACCCTGTGGCCTGTGAGC
GACGCTTTAATGGGAAATACGCGCCCATAACTTGGTGCGA
# Some characteristics:
- Entry_ID is >[[:alphanumeric:]], where '>' marks the entry start. In this post I have to put an escape character (\) to make the '>' visible.
- Headers may contain annotation information separated by some delimiter (i.e. | in this case).
- Entry ID and header are on a single line, which does not contain newline characters.
- The sequence under the header line is [ATCGN\n]* (Perl regex).
- A fasta file can be plain-text or gzip compressed.
# Goals:
Write a parser that uses D ranges with the iopipe library for performance and ease of use. A big fasta file can be dozens of gigabytes.
# Questions:
1. How do I model a fasta entry with a struct or class?
2. How do I implement a range of fasta entries with iopipe? A range in this case can be a forward range, but preferably a random access range.
3. I want to use ranges to explore their power and elegance. But if performance is a big concern, what can I do alternatively?
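As a possible first step for questions 1 and 2, here is a sketch of a lazy input range of entries built with plain Phobos over an in-memory string. It does not use iopipe and it copies each sequence, so it says nothing about performance on files of many gigabytes. The names `FastaEntry`, `FastaRange` and `fastaEntries` are made up, and the header is assumed to be split at the first space.

```d
import std.algorithm.searching : findSplit, startsWith;
import std.array : appender, array;
import std.stdio : writeln;
import std.string : lineSplitter, strip;

/// One record: ID, the rest of the header line, and the joined sequence lines.
struct FastaEntry
{
    string id;
    string description;
    string sequence;
}

/// Lazy input range of entries over any range of lines (assumed layout).
struct FastaRange(Lines)
{
    private Lines lines;
    private FastaEntry current;
    private bool exhausted;

    this(Lines lines)
    {
        this.lines = lines;
        // Skip anything that precedes the first header line.
        while (!this.lines.empty && !this.lines.front.startsWith(">"))
            this.lines.popFront();
        popFront(); // load the first entry
    }

    bool empty() { return exhausted; }
    FastaEntry front() { return current; }

    void popFront()
    {
        if (lines.empty)
        {
            exhausted = true;
            return;
        }
        // Header line: ">ID description"
        auto header = lines.front[1 .. $].findSplit(" ");
        lines.popFront();

        // Sequence lines run until the next header or the end of input.
        auto sequence = appender!string();
        while (!lines.empty && !lines.front.startsWith(">"))
        {
            sequence.put(lines.front.strip);
            lines.popFront();
        }
        current = FastaEntry(header[0], header[2], sequence.data);
    }
}

auto fastaEntries(string text)
{
    auto lines = text.lineSplitter;
    return FastaRange!(typeof(lines))(lines);
}

void main()
{
    enum sample = ">Entry1_ID header field1|header field2\nCAGA\nTGTA\n>Entry2_ID x\nGTTA\n";
    foreach (entry; fastaEntries(sample))
        writeln(entry.id, ": ", entry.sequence);
}
```

Because entries have variable length, random access would need an index of entry offsets built in a first pass. This sketch provides only a one-pass input range.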
Lazy range, extract only Nth element, set range size constraint?
Following is the code for a more generalized Fibonacci range. Questions:
1. How do I get only the value of the Nth (i.e. N = 25) element in an idiomatic way?
2. Can I set constraints in the range so that the user gets a warning if he asks for an Nth element greater than a limit, say N > 30; or if the actual range value at N is greater than the datatype limit (e.g. max long)? Maybe this should be done outside of the range, i.e. do the check before accessing the range?
    #!/usr/bin/env rdmd
    import std.stdio : writeln;
    long multifactor = 4;
    int elemth = 25;
    struct FibonacciRange
    {
        long first = 1;
        long second = 1;
        bool empty() const @property
        {
            // how to stop at n = 30?
            return false;
        }
        void popFront()
        {
            long tmp = 0;
            tmp = first*multifactor + second;
            first = second;
            second = tmp;
        }
        long front() const @property
        {
            return first;
        }
    }
    void main()
    {
        import std.range : take;
        import std.array : array;
        FibonacciRange fib;
        auto fib10 = take(fib, elemth);
        long[] the10Fibs = array(fib10);
    }
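One possible answer to both questions, as a sketch: `drop(n - 1).front` picks the Nth element without building an array, and the range itself can become empty once a limit is reached or once the next value no longer fits in a `long`. The fields `limit` and `factor` and the helper `nthElement` are additions for illustration; the overflow detection relies on `core.checkedint`.

```d
import core.checkedint : adds, muls;
import std.exception : enforce;
import std.range : drop;
import std.stdio : writeln;

struct FibonacciRange
{
    long first = 1;
    long second = 1;
    long factor = 4;    // plays the role of `multifactor`
    size_t limit = 30;  // maximum number of elements the range yields
    private size_t popped = 0;
    private bool frontOverflowed = false;  // `first` is not a valid value
    private bool secondOverflowed = false; // `second` is not a valid value

    bool empty() const @property { return popped >= limit || frontOverflowed; }
    long front() const @property { return first; }

    void popFront()
    {
        // The flag is sticky: once set, every later value is invalid too.
        bool overflow = secondOverflowed;
        immutable next = adds(muls(first, factor, overflow), second, overflow);
        first = second;
        second = next;
        frontOverflowed = secondOverflowed;
        secondOverflowed = overflow;
        ++popped;
    }
}

/// 1-based Nth element; throws if N is past the limit or the value overflows.
long nthElement(size_t n, size_t limit = 30)
{
    enforce(n >= 1, "n is 1-based");
    FibonacciRange fib;
    fib.limit = limit;
    auto rest = fib.drop(n - 1);
    enforce(!rest.empty, "element is past the limit or does not fit in a long");
    return rest.front;
}

void main()
{
    writeln(nthElement(25));
}
```

Throwing from a helper outside the range keeps the range itself simple: it just ends early, and the caller decides whether that is an error.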
Re: Which editor to use for editing DDOCs?
On Tuesday, 23 May 2017 at 10:10:24 UTC, Russel Winder wrote:
On Tue, 2017-05-23 at 07:40 +0000, biocyberman via Digitalmars-d-learn wrote: […]
Adding DDOC support for D Mode requires some more work obviously. I will see if I can make some changes to that. For the time being, I would like to know which editors people are using. Or is it a plain black and white editor?
Until IntelliJ IDEA and/or CLion works for D, Emacs is my only D editor.
Actually Emacs is my favorite editor. What's your setup to work with DDOCs?
Re: Which editor to use for editing DDOCs?
On Monday, 22 May 2017 at 15:33:36 UTC, Russel Winder wrote:
On Mon, 2017-05-22 at 14:14 +0000, biocyberman via Digitalmars-d-learn wrote:
Which one do you use? I am using Linux and Emacs for editing other D source files. But the DDOC syntax and keywords are not well highlighted.
There has been no work on handling DDOC comments specially in the Emacs D Mode as far as I know. There is some attempt to do things for Doxygen, but I am not sure how successful that is as I am not using it. It is not clear to me that all D's comment mechanisms are handled as they should be. It should be possible, albeit non-trivial I suspect, to add support for all the comment forms and the DDOC macro markup. The question is: does anyone have the energy to get stuck into the E-Lisp to achieve the goal – and write the tests to prove it?
Adding DDOC support for D Mode requires some more work obviously. I will see if I can make some changes to that. For the time being, I would like to know which editors people are using. Or is it a plain black and white editor?
Which editor to use for editing DDOCs?
Which one do you use? I am using Linux and Emacs for editing other D source files. But the DDOC syntax and keywords are not well highlighted.
Re: [OT] #define
On Monday, 22 May 2017 at 13:11:15 UTC, Andrew Edwards wrote:
Sorry if this is a stupid question but it eludes me. In the following, what is THING? What is SOME_THING?
    #ifndef THING
    #define THING
    #endif
    #ifndef SOME_THING
    #define SOME_THING THING *
    #endif
Is this equivalent to:
    alias thing = void;
    alias someThing = thing*;
Thanks, Andrew
Hi Andrew, this is why I need to learn more about C and C++ when I want to port them to D. You can read a bit about the C preprocessor here: https://www.tutorialspoint.com/cprogramming/c_preprocessors.htm Regarding your question: I've been porting some C code with macros, and they can be translated into D as aliases, functions, structs, templates, mixins etc. So maybe an excerpt from the real code would be more straightforward.
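Without the real code this is only a guess at the intent: since `THING` expands to nothing, `SOME_THING` expands to a bare `*`, so a C declaration such as `int SOME_THING p;` would read `int * p;`. Under that assumption, one way to express "pointer to whatever type precedes it" in D is an alias template. The name `SomeThing` is made up.

```d
import std.stdio : writeln;

/// Hypothetical D counterpart of `T SOME_THING`: a pointer to T.
alias SomeThing(T) = T*;

void main()
{
    int value = 42;
    SomeThing!int p = &value; // same type as int*
    static assert(is(SomeThing!int == int*));
    writeln(*p);
}
```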
Re: Code improvement for DNA reverse complement?
On Monday, 22 May 2017 at 10:35:36 UTC, ag0aep6g wrote:
On 05/22/2017 10:58 AM, biocyberman wrote: [...]
For reference, here is the version of revComp3 I commented on:
    string revComp3(string bps) {
        const N = bps.length;
        enum chars = [Repeat!('A'-'\0', '\0'), 'T', Repeat!('C'-'A'-1, '\0'), 'G', Repeat!('G'-'C'-1, '\0'), 'C', Repeat!('T'-'G'-1, '\0'), 'A'];
    [...]
Very illustrative. I could easily miss, and I did miss, this subtle but important aspect. I wonder how D could widen the 'pit of success' that Scott Meyers has mentioned more than once. A take-home message for myself: if one ever uses an array as a lookup table, make it 'static immutable'; an enum array does not make sense for that. And in Ali's book: "Consider the hidden cost of enum arrays and enum associative arrays. Define them as immutable variables if the arrays are large and they are used more than once in the program."
One thing also became clear: 'is' is not '=='. Therefore:
    writeln([10,20] is [10,20]); /* false */
    writeln([10,20] == [10,20]); /* true */
I did not notice that because I haven't come across 'is' so often.
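The take-home message can be sketched as follows: the lookup table is computed once at compile time and stored as `static immutable` data, so using it inside the loop does not create a new array. This is an illustration, not the revComp3 from the thread. The table layout (256 entries, unknown characters mapped to themselves) and the names are assumptions.

```d
import std.exception : assumeUnique;
import std.stdio : writeln;

/// Builds the complement table; called once, at compile time.
private char[256] buildComplementTable()
{
    char[256] table;
    foreach (i, ref c; table)
        c = cast(char) i; // characters other than ACGT map to themselves
    table['A'] = 'T';
    table['T'] = 'A';
    table['C'] = 'G';
    table['G'] = 'C';
    return table;
}

// One shared table in static storage; not re-created on each use as an enum array would be.
static immutable char[256] complementTable = buildComplementTable();

string revComp(string bps)
{
    auto result = new char[bps.length];
    foreach (i, c; bps) // iterates over code units, no UTF decoding
        result[$ - 1 - i] = complementTable[c];
    return result.assumeUnique;
}

void main()
{
    writeln(revComp("AAAACCCGGT"));
}
```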
Re: Code improvement for DNA reverse complement?
On Monday, 22 May 2017 at 06:50:45 UTC, Biotronic wrote:
On Friday, 19 May 2017 at 22:53:39 UTC, crimaniak wrote:
On Friday, 19 May 2017 at 12:55:05 UTC, Biotronic wrote:
revComp6 seems to be the fastest, but it's probably also the least readable (a common trade-off).
Try revComp7 with -release :)
    string revComp7(string bps)
    {
        char[] result = new char[bps.length];
        auto p1 = result.ptr;
        auto p2 = &bps[$ - 1];
        enum AT = 'A'^'T';
        enum CG = 'C'^'G';
        while (p2 > bps.ptr)
        {
            *p1 = *p2 ^ ((*p2 == 'A' || *p2 == 'T') ? AT : CG);
            p1++;
            p2--;
        }
        return result.assumeUnique;
    }
In fact, when the size of the sequence is growing, the time difference between procedures is shrinking, so it's much more important to use a memory-efficient representation than to optimize logic.
revComp7 is pretty fast, but I followed ag0aep6g's advice:
On Friday, 19 May 2017 at 13:38:20 UTC, ag0aep6g wrote:
Use `static immutable` instead. It still forces compile-time calculation, but it doesn't have copy/paste behavior. Speeds up revComp3 a lot.
Also, with DMD (2.073.0) -release -O instead of -debug from this point. I'd blame someone else, but this is my fault. :p
Anyways, full collection of the various versions I've written, plus crimaniak's revComp7 (rebranded as revComp8, because I already had 7 at the time): https://gist.github.com/Biotronic/20daaf0ed1262d313830bc8cd4199896
Timings:
    revComp0: 158 ms, 926 us
    revComp1: 1 sec, 157 ms, 15 us
    revComp2: 604 ms, 37 us
    revComp3: 18 ms, 545 us
    revComp4: 92 ms, 293 us
    revComp5: 86 ms, 731 us
    revComp6: 94 ms, 56 us
    revComp7: 917 ms, 576 us
    revComp8: 62 ms, 917 us
This actually matches my expectations - the table lookup version should be crazy fast, and it is. It beats even your revComp7 (revComp8 in the table).
LDC (-release -O3) timings:
    revComp0: 166 ms, 190 us
    revComp1: 352 ms, 917 us
    revComp2: 300 ms, 493 us
    revComp3: 10 ms, 950 us
    revComp4: 148 ms, 106 us
    revComp5: 144 ms, 152 us
    revComp6: 142 ms, 307 us
    revComp7: 604 ms, 274 us
    revComp8: 26 ms, 612 us
Interesting how revComp4-6 got slower. What I really wanted to see with this though, was the effect on revComp1, which uses ranges all the way.
Wow!!! Someone grab me a chair, I need to sit down. I can't say enough how grateful I am to all you guys. This is so much fun to learn. Some specific comments and replies:
@Nicholas Wilson: Your explanation of the enum is clear and very helpful. I recall the same technique being used in kh_hash in samtools and associated code. With that said, the chars enum only goes up to 'T' (85) elements. Regarding BioD, I plan to work on it to add some more functionality. But first I need to sharpen my D skills a bit more.
@Laeeth Isharc: I do like ldc as well. I've come across several projects that use ldc, and learnt that it is a good choice for speed in general.
@ag0aep6g:
You fell into a trap there. The value is calculated at compile time, but it has copy/paste-like behavior. That is, whenever you use `chars`, the code behaves as if you typed out the array literal. That means, the whole array is re-created on every iteration. Use `static immutable` instead. It still forces compile-time calculation, but it doesn't have copy/paste behavior. Speeds up revComp3 a lot.
By 'iteration' here, do you mean the running lifetime of the function, or in other words, each one of the 10_000 cycles in the benchmark? Could you point me to some more reading about what you are describing here? I can only guess it is intrinsic behavior of an 'enum'.
@crimaniak, Nicholas Wilson and Biotronic: I had noticed the reversible/negate property of XOR before: 'A'^'T'^'T' = 'A' and 'A'^'T'^'A' = 'T'. To help myself and see it in bit patterns, I wrote this snippet:
    import std.meta : Repeat;
    import std.stdio : writef;
    void main(){
        enum AT = 'A'^'T';
        enum CG = 'C'^'G';
        enum chars = [Repeat!('A'-'\0', '\0'), 'T', Repeat!('C'-'A'-1, '\0'), 'G', Repeat!('G'-'C'-1, '\0'), 'C', Repeat!('T'-'G'-1, '\0'), 'A'];
        writef("BIN %0 8b DEC %d\n", 'A', 'A');
        writef("BIN %0 8b DEC %d\n", 'T', 'T');
        writef("XOR %0 8b DEC %d\n", AT, AT);
        writef("TOR %0 8b DEC %d\n", AT^'T', AT^'T', AT^'T');
        writef("AOR %0 8b DEC %d\n", AT^'A', AT^'A', AT^'A');
        foreach (i, c; chars){
            if (i >= 60) writef("%02d: %0 8b, %d\n",i, c, c); // elements before 60 are all \0
        }
    }
    // Output
    BIN 01000001 DEC 65
    BIN 01010100 DEC 84
    XOR 00010101 DEC 21
    TOR 01000001 DEC 65
    AOR 01010100 DEC 84
    60: 00000000, 0
    61: 00000000, 0
    62: 00000000, 0
    63: 00000000, 0
    64: 00000000, 0
    65: 01010100, 84
    66: 00000000, 0
    67: 01000111, 71
    68: 00000000, 0
    69: 00000000, 0
    70: 00000000, 0
    71: 01000011, 67
    72: 00000000, 0
    73: 00000000, 0
    74: 00000000, 0
    75: 00000000, 0
    76: 00000000, 0
    77: 00000000, 0
    78: 00000000, 0
    79: 00000000, 0
    80: 00000000, 0
    81: 00000000, 0
    82: 00000000, 0
    83: 00000000, 0
    84: 01000001, 65
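The XOR property can also be checked with a small index-based sketch. It assumes the input contains only the four letters A, C, G and T, since any other character would be XOR-ed with the C/G mask and come out garbled. The names `complementXor` and `revCompXor` are made up.

```d
import std.exception : assumeUnique;
import std.stdio : writeln;

/// Complements one base by XOR-ing it with the mask of its pair.
/// Assumes `base` is one of 'A', 'C', 'G', 'T'.
char complementXor(char base)
{
    enum AT = 'A' ^ 'T'; // 0b0001_0101: turns 'A' into 'T' and back
    enum CG = 'C' ^ 'G'; // 0b0000_0100: turns 'C' into 'G' and back
    return cast(char)(base ^ ((base == 'A' || base == 'T') ? AT : CG));
}

string revCompXor(string bps)
{
    auto result = new char[bps.length];
    foreach (i, c; bps)
        result[$ - 1 - i] = complementXor(c); // every position is written, last to first
    return result.assumeUnique;
}

void main()
{
    writeln(revCompXor("AAAACCCGGT"));
}
```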
Re: Code improvement for DNA reverse complement?
On Friday, 19 May 2017 at 09:17:04 UTC, Biotronic wrote:
> On Friday, 19 May 2017 at 07:29:44 UTC, biocyberman wrote:
>> [...]
> Question about your implementation: you assume the input may contain newlines, but don't handle any other non-ACGT characters. The problem definition states 'DNA string' and the sample dataset contains no non-ACGT chars. Is this an oversight on my part or yours, or did you just decide to support more than the problem requires?
> [...]

Firstly, thank you for showing me various solutions, and even cool benchmark code. To answer your questions: Yes, I assume the input file would realistically contain newlines, even though the problem does not care about them. I also thought about non-ACGT bases, but haven't taken care of those cases. In reality we should deal with at least ambiguous bases (N). I ran your code and also see that switch is faster than AA (i.e. revComp0 is the fastest). And Stefan is right about this.

Some follow-up questions:
1. Why do we need to use assumeUnique in 'revComp0' and 'revComp3'?
2. What is going on with the trick of making chars an enum like that in 'revComp3'?
Re: Code improvement for DNA reverse complement?
On Friday, 19 May 2017 at 07:46:13 UTC, Stefan Koch wrote:
> On Friday, 19 May 2017 at 07:29:44 UTC, biocyberman wrote:
>> I am solving this problem http://rosalind.info/problems/revc/ as an exercise to learn D. This is my solution: https://dpaste.dzfl.pl/8aa667f962b7 Are there any D tricks I can use to make the `reverseComplement` function more concise and speedy? Any other comments for improvement of the whole solution are also much appreciated.
> I think doing a switch or even an if-else chain would be faster than using an AA.

Thank you Stefan. I will try that and benchmark the two implementations. I used the AA approach because it looks more readable to me.
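For reference, a switch-based version along the lines Stefan suggests could look roughly like the sketch below. It is not the code from the dpaste link; `complementBase` and this particular `reverseComplement` body are illustrative, and unknown characters are simply copied through.

```d
import std.stdio : writeln;

// Maps one base to its complement with a plain switch instead of an
// associative array lookup.
char complementBase(char c)
{
    switch (c)
    {
        case 'A': return 'T';
        case 'T': return 'A';
        case 'C': return 'G';
        case 'G': return 'C';
        default:  return c; // unknown characters are copied unchanged
    }
}

// Builds the reverse complement by filling the result from the back.
string reverseComplement(string dna)
{
    auto result = new char[dna.length];
    foreach (i, c; dna)
        result[$ - 1 - i] = complementBase(c);
    return result.idup; // immutable copy, safe to return as string
}

void main()
{
    // Sample dataset from the Rosalind problem page.
    writeln(reverseComplement("AAAACCCGGT")); // ACCGGGTTTT
}
```

The `idup` at the end costs one extra copy; that trade-off keeps the sketch simple and type-safe.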
Code improvement for DNA reverse complement?
I am solving this problem http://rosalind.info/problems/revc/ as an exercise to learn D. This is my solution: https://dpaste.dzfl.pl/8aa667f962b7 Are there any D tricks I can use to make the `reverseComplement` function more concise and speedy? Any other comments for improvement of the whole solution are also much appreciated.
Re: Fails to use testFilename in unittest
On Thursday, 18 May 2017 at 10:05:41 UTC, Jonathan M Davis wrote:
> On Thursday, May 18, 2017 09:56:36 biocyberman via Digitalmars-d-learn wrote:
>> [...]
> My point is that it's a private function for testing std.stdio and not intended to be part of the public API or be used by anyone else (it's not even used anywhere else in Phobos). None of the functions in Phobos that do that sort of thing are in the public API. You can copy-paste testFilename (and std.file.deleteme, since it uses that) into your own code and use them if you like, but the ones in Phobos are just there for Phobos. The only unit testing-specific functionality that Phobos provides beyond what the language itself has is in std.exception with functions such as assertThrown.
> - Jonathan M Davis

Understood. I copied the code. Thanks
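As an alternative to copying the private Phobos helper, a small home-grown function can serve the same purpose. The sketch below is an assumption on my part, not the Phobos implementation: `tempTestFile` is a hypothetical name, and the file name scheme (source file, line, process ID) is just one plausible way to avoid collisions.

```d
import std.conv : to;
import std.file : exists, readText, remove, tempDir, write;
import std.path : baseName, buildPath;
import std.process : thisProcessID;

// Hypothetical helper, not part of Phobos: builds a file name inside the
// system temp directory that depends on the call site and the process ID.
string tempTestFile(string file = __FILE__, size_t line = __LINE__)
{
    return buildPath(tempDir(),
        "test." ~ baseName(file) ~ "." ~ line.to!string
                ~ "." ~ thisProcessID.to!string);
}

void main()
{
    auto path = tempTestFile();
    // Clean up the file when leaving the scope, whether or not an
    // assertion fails.
    scope (exit) if (path.exists) remove(path);

    write(path, "ACGT\n");              // std.file.write
    assert(readText(path) == "ACGT\n");
}
```

Inside a real `unittest` block the body of `main` would go where the "do some stuff" comment sits in the original example.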
Re: Fails to use testFilename in unittest
On Thursday, 18 May 2017 at 09:49:26 UTC, Jonathan M Davis wrote:
> On Thursday, May 18, 2017 09:40:33 biocyberman via Digitalmars-d-learn wrote:
>> [...]
> Actually, it's not used all over the place in Phobos. It's only used in std.stdio, where it's a private function in a version(unittest) block. It's not part of the public API. And other modules that need something similar have their own solution.
> [...]

That's exactly the code I looked at. And yes, I checked std.stdio and saw many occurrences of testFilename.
Re: Fails to use testFilename in unittest
This is the compile error message by the way:

    dmd -unittest ./testFile.d
    testFile.o: In function `_D8testFile14__unittestL4_1FZv':
    ./testFile.d:(.text._D8testFile14__unittestL4_1FZv+0x1a): undefined reference to `_D3std5stdio12testFilenameFNfAyamZAya'
    collect2: error: ld returned 1 exit status
    Error: linker exited with status 1
Fails to use testFilename in unittest
There is an ongoing discussion about temp files over here: http://forum.dlang.org/thread/sbehcxusxxibmpkae...@forum.dlang.org

I have a question about generating a temporary file to write test data. I can create my own file and use it, but I just want to use the existing tool for convenience. testFilename() is used all over Phobos, so I don't understand why it does not work in my code. The following code fails to compile.

    % cat testFile.d
    #!/usr/bin/env rdmd
    import std.stdio;

    unittest{
        static import std.file;
        auto deleteme = testFilename();
        scope(failure) printf("Failed test at line %d\n", __LINE__);
        scope(exit) std.file.remove(deleteme);
        // Do some stuff with opening, writing and reading of the temp file.
        assert(true);
    }

    void main(string [] args){
        writeln("Main");
    }
Re: How to move append to an array?
On Monday, 15 May 2017 at 21:38:52 UTC, Yuxuan Shui wrote:
> Suppose I have a struct A { @disable this(this); } x; How do I append it into an array? Do I have to do array.length++; moveEmplace(x, array[$-1]); ?

Judging from the way you write the struct, it is in C/C++ style. With that said, it's not clear what you are trying to do. There is a basic reference about arrays here: http://dlang.org/spec/arrays.html

And this works:

    cat arrayappend.d
    // arrayappend.d content
    unittest {
        auto a = [1, 2];
        a ~= 3;
        assert( a == [1, 2, 3]);
    }
    // Finish content

Running test: rdmd -unittest -main arrayappend.d

No error message means the test passes.
Re: Convert this C macro kroundup32 to D mixin?
On Saturday, 8 April 2017 at 21:34:30 UTC, Ali Çehreli wrote:
> You can mixin declarations with a template but I don't see how it can help here. A string mixin would work but it's really ugly at the use site:
>
>     string roundUp(alias x)()
>     if (is (typeof(x) == uint)) {
>         import std.string : format;
>         return format(q{
>             --%1$s;
>             %1$s |= %1$s >> 1;
>             %1$s |= %1$s >> 2;
>             %1$s |= %1$s >> 4;
>             %1$s |= %1$s >> 8;
>             %1$s |= %1$s >> 16;
>             ++%1$s;
>         }, x.stringof);
>     }
>
>     void main() {
>         uint i = 42;
>         mixin (roundUp!i);    // <-- Ugly
>         assert(i == 64);
>     }
>
> Compare that to the following natural syntax that a function provides:
>
>     void roundUp(ref uint x) {
>         // ...
>     }
>
>     void main() {
>         uint i = 42;
>         i.roundUp();    // <-- Natural
>     }
>
> Ali

You made the point, it looks really ugly :). However, if this ugliness offered better performance, I would, in a desperate mood, sometimes use it. That's only an 'if'. I put it to a test against two other variants, and this ugly version also performs worst. You can check it out here: https://gist.github.com/biocyberman/0ad27721780e66546cbb6a39c0770d99 Maybe it is because of the string formatting cost. Moving the import statement out of the function does not speed things up.
Re: Convert this C macro kroundup32 to D mixin?
On Saturday, 8 April 2017 at 11:24:02 UTC, Nicholas Wilson wrote:
> The ':' means that it applies to everything that follows it, so while it doesn't matter in this example, if you had
>
>     pragma( inline, true ):
>     int kroundup32( int x) { ... }
>     auto someVeryLargeFunction( Args args)
>     {
>         // ...
>     }
>
> and then you used someVeryLargeFunction in a bunch of places then that would cause a lot of binary bloat.

That's a big difference! Thank you for pointing this out to me.

> if you want the function to affect the variable use a 'ref' as in
>
>     import std.stdio;
>
>     void kroundup32(T)(ref T x) {
>         pragma(inline, true);
>         --(x);
>         (x)|=(x)>>1;
>         (x)|=(x)>>2;
>         (x)|=(x)>>4;
>         (x)|=(x)>>8;
>         (x)|=(x)>>16;
>         ++(x); // a void function modifies x in place, nothing to return
>     }
>
>     int main(){
>         int num = 31;
>         writeln("Before: ",num); // 31
>         kroundup32(num);
>         writeln("After: ", num); //32
>         return 0;
>     }
>
> is it a good idea? I would not think it is necessary. As an aside the C version has parentheses around the "x" because it is a macro and it is substituted as text not symbolically, they are not needed in D.

This is now clear and settled, while I try to navigate my mind around many new things. I really appreciate your help, Nicholas.
Re: Convert this C macro kroundup32 to D mixin?
On Saturday, 8 April 2017 at 10:09:47 UTC, Mike Parker wrote:
>     T kroundup32(T)(T x) {
>         pragma(inline, true);
>         --(x);
>         (x)|=(x)>>1;
>         (x)|=(x)>>2;
>         (x)|=(x)>>4;
>         (x)|=(x)>>8;
>         (x)|=(x)>>16;
>         return ++(x);
>     }

I also came up with this:

    import std.stdio;
    pragma( inline, true ):
    static int kroundup32( int x){
        --(x);
        writeln("X: ",x);
        (x)|=(x)>>1;
        writeln("X: ",x);
        (x)|=(x)>>2;
        writeln("X: ",x);
        (x)|=(x)>>4;
        writeln("X: ",x);
        (x)|=(x)>>8;
        writeln("X: ",x);
        (x)|=(x)>>16;
        writeln("X: ",x);
        ++(x);
        writeln("X: ",x);
        return x;
    }

    int main(){
        int num = 31;
        num = kroundup32(num);
        writeln("Num:", num);
        return 0;
    }

Is this way of using pragma the same as your way? I am still new to this so I want to understand more. And is it a good idea to manipulate 'num' directly, so I can omit 'return' and avoid the re-assignment? That's what the C version does.
Re: Convert this C macro kroundup32 to D mixin?
On Saturday, 8 April 2017 at 10:02:01 UTC, Mike Parker wrote:
> I would expect if you implement it as a function the compiler will inline it. You can always use the pragma(inline, true) [1] with -inline to verify.
> [1] https://dlang.org/spec/pragma.html#inline

Thanks for mentioning pragma. However, is there any way to do it with a mixin? It's so cool that I want to do more stuff with it :)
Convert this C macro kroundup32 to D mixin?
What is the D mixin version equivalent to this macro:

    #define kroundup32(x) (--(x), (x)|=(x)>>1, (x)|=(x)>>2, (x)|=(x)>>4, (x)|=(x)>>8, (x)|=(x)>>16, ++(x))

The macro looks cryptic. What the macro does has been explained here: http://stackoverflow.com/questions/3384852/could-someone-help-explain-what-this-c-one-liner-does But I still don't know how to convert that to a D mixin. I would like a 'mixin' instead of a function to avoid function call overhead. Also, because it is short, I think a plain mixin is enough, not a 'template mixin'.
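For comparison with the macro, here is a minimal sketch of the same bit trick written as an ordinary D function on `uint`. The name `roundUpPow2` is made up for this example. It rounds a value up to the next power of two by smearing the highest set bit of `x - 1` into all lower positions and then adding one.

```d
import std.stdio : writeln;

// Rounds x up to the next power of two (values that already are a power
// of two are returned unchanged). Same steps as the C macro kroundup32.
uint roundUpPow2(uint x)
{
    pragma(inline, true); // ask the compiler to inline calls
    --x;
    x |= x >> 1;
    x |= x >> 2;
    x |= x >> 4;
    x |= x >> 8;
    x |= x >> 16;
    ++x;
    return x;
}

void main()
{
    writeln(roundUpPow2(31)); // 32
    writeln(roundUpPow2(42)); // 64
    writeln(roundUpPow2(64)); // 64
}
```

Because the function takes its argument by value, the call site reads `num = roundUpPow2(num);` rather than modifying `num` in place as the macro does.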
Re: Using template mixin, with or without mixin ?
On Friday, 7 April 2017 at 23:53:12 UTC, Ali Çehreli wrote:
> The difference is that you can't use funcgen as a regular template:
>
>     funcgen!(void, void);
>
> Error: template instance funcgen!(void, void) mixin templates are not regular templates
>
> I think it's good practice to use 'mixin template' if it's intended to be so.
> Ali

Thanks for a very concise answer.
Using template mixin, with or without mixin ?
I want to use a mixin to generate functions in-place. In the template declaration, I can see that the 'mixin' keyword is optional. Is that true? What is the difference, and when must I use one way over the other? This is my program:

    import std.stdio;

    // This works with and without the 'mixin' attribute.
    mixin template funcgen(T, U){
        T func1(string pr2){
            writeln("Func1: ", pr2);
        }
        U func2(string pr3){
            writeln("Func2: ", pr3);
        }
    }

    int main(string[] args){
        mixin funcgen!(void, void);
        func1("func1");
        func2("func2");
        return 0;
    }
Re: Covert a complex C header to D
On Monday, 3 April 2017 at 23:10:49 UTC, Stefan Koch wrote:
> On Monday, 3 April 2017 at 11:18:21 UTC, Nicholas Wilson wrote:
>> prefer template over string mixins where possible. This will make the code much more readable.
> My advice would be the opposite. Templates put much more pressure on the compiler than string mixins do. Also the code that templates expand to is hard to get. Whereas the code that string mixins expand to can always be printed one way or another.

Could you elaborate more on this (i.e. show where mixins are more readable, more debuggable and less stressful to the compiler)? This kind of information is good for the tuning stage later. My goal now is to finish the conversion and get the header and the test code running (https://github.com/attractivechaos/klib/blob/master/test/khash_test.c).

@Ali: I noticed the -E option recently but haven't really used it. I have now generated the pre-processed source and am trying to make use of it.
Re: OT: It is convert, not covert
On Tuesday, 4 April 2017 at 05:29:42 UTC, Ali Çehreli wrote:
> Covert has a very different meaning. :)
> Ali

Thanks Ali. My fingers argued they are the same :) And I can't find a way to edit my post after posting. I would love to have your input. I have revisited your book several times to read the relevant sections. But these complex macros are still holding me back.
Re: Covert a complex C header to D
On Monday, 3 April 2017 at 00:00:04 UTC, Nicholas Wilson wrote:
> On Sunday, 2 April 2017 at 21:43:52 UTC, biocyberman wrote:
>>     template __KHASH_TYPE(string name){
>>         "struct kh_" ~ name ~"_t { " ~
>>         "khint_t n_buckets, size, n_occupied, upper_bound; " ~
>>         "khint32_t *flags; " ~
>>         "khkey_t *keys; " ~
>>         "khval_t *vals; " ~
>>         "}"
>>     }
> Not that you'll get bitten by it in this case but in D the pointer declarator * is left associative. i.e. in C
>
>     int *pInt, Int;     // "Int" is int not an int*
>     int *pInt, Int[3];  // Int is a static array of 3 ints.
>
> but in D
>
>     misleading:
>     int *pInt, Int;             // Int is an int*!!
>     wrong:
>     int *pInt, three_Ints[3];   // Error cannot mix declared types
>     not misleading:
>     int* pInt, pInt2;           // BOTH int*
>     int*pInt;                   // pointer to int
>     int[3] three_Ints;          // static array of 3 ints.

Thank you for some excellent tips, Nicholas Wilson. I made this repo: https://github.com/biocyberman/klibD. You are more than welcome to make direct contributions with PRs there. The next milestone I want to reach is to complete the conversion of khash.d and get the test code working with it.
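The declaration rule Nicholas describes can be checked by the compiler itself. The following is a small sketch with made-up variable names; the `static assert` lines only compile if the types are what the comments say.

```d
void main()
{
    // In D the '*' belongs to the type, so both names are pointers.
    int* p, q;
    static assert(is(typeof(p) == int*));
    static assert(is(typeof(q) == int*));

    // Array dimensions are also part of the type and go before the name.
    int[3] threeInts;
    static assert(is(typeof(threeInts) == int[3]));

    // A static array of two pointers to int.
    int*[2] twoPointers;
    static assert(is(typeof(twoPointers) == int*[2]));
}
```

When porting C headers, this means a C line such as `int *a, b;` has to become two separate D declarations.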
Covert a complex C header to D
khash.h (http://attractivechaos.github.io/klib/#Khash%3A%20generic%20hash%20table) is a part of the klib library in C. I want to convert it to D in the process of learning D more deeply. First I tried Dstep (https://github.com/jacob-carlborg/dstep) and read the C to D article (https://dlang.org/ctod.html). I managed to convert the basic statements to D, but all multiline 'define' macros are stripped off. So I am trying to recreate them in the D way. For example:

    #define __KHASH_TYPE(name, khkey_t, khval_t) \
        typedef struct kh_##name##_s { \
            khint_t n_buckets, size, n_occupied, upper_bound; \
            khint32_t *flags; \
            khkey_t *keys; \
            khval_t *vals; \
        } kh_##name##_t;

I changed it to:

    template __KHASH_TYPE(string name){
        "struct kh_" ~ name ~"_t { " ~
        "khint_t n_buckets, size, n_occupied, upper_bound; " ~
        "khint32_t *flags; " ~
        "khkey_t *keys; " ~
        "khval_t *vals; " ~
        "}"
    }
    // NEXT: use mixin with this template.

I am currently a bit intimidated looking at the KHASH_INIT2 macro in khash.h. How do I convert this to the equivalent, idiomatic D?
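Two possible directions for the `__KHASH_TYPE` macro are sketched below, without any claim that either is how the final khash.d should look. The first uses a templated struct, the second a function that builds the declaration as a string for a string mixin. `KHash`, `khashTypeDecl` and the `i2d` instance name are hypothetical, and the two aliases assume `khint_t` and `khint32_t` are 32-bit unsigned integers as in the C header.

```d
import std.stdio : writeln;

// Assumed to mirror the typedefs in khash.h.
alias khint32_t = uint;
alias khint_t = khint32_t;

// Variant 1: a templated struct. The key and value types become
// template parameters instead of macro arguments.
struct KHash(KeyT, ValT)
{
    khint_t n_buckets, size, n_occupied, upper_bound;
    khint32_t* flags;
    KeyT* keys;
    ValT* vals;
}

// Variant 2: a function evaluated at compile time that returns the
// struct declaration as a string, to be used with a string mixin.
string khashTypeDecl(string name, string keyT, string valT)
{
    return "struct kh_" ~ name ~ "_t { "
         ~ "khint_t n_buckets, size, n_occupied, upper_bound; "
         ~ "khint32_t* flags; "
         ~ keyT ~ "* keys; "
         ~ valT ~ "* vals; }";
}

// Declares the type kh_i2d_t at module scope.
mixin(khashTypeDecl("i2d", "int", "double"));

void main()
{
    KHash!(int, double) a;
    kh_i2d_t b;
    writeln(a.n_buckets, " ", b.n_buckets); // 0 0
}
```

The templated struct avoids the name pasting entirely, while the string mixin keeps the `kh_NAME_t` naming of the C code; which one fits better probably depends on how closely the D port should follow the C API.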
Re: high performance client server solution in D?
@Laeeth Isharc and rikki cattermole: Thank you for your input. Msgpack is definitely something I will consider. I tried searching for showcases and open-source projects of this kind in D but still haven't found one. Such applications would give me clearer ideas.
high performance client server solution in D?
I am considering using D and its libraries to build a high performance client-server application. The client will be a cross-platform (Windows, Mac, Linux) GUI program that can synchronize analysis results with the remote central server, and analyze data locally. It will also visualize big data files (about 10GB of binary data each). The term 'high performance' means it can serve several hundred users with desktop application speed. Furthermore, heavy computation tasks will be done locally on the client side. This description is still vague, I know. But that's the best I can give for now. I would choose 'dlangui' and check 'vibe.d' for a start. However, I do not need to access the central server via web browsers. I hope that you can give some thoughts about this design, what GUI library to use, and what back-end library to use.
Re: Is it possbile to specify a remote git repo as dub dependency?
On Monday, 19 December 2016 at 16:23:45 UTC, Guillaume Piolat wrote:
> What you can do for private package as of today is:
> - use path-based dependencies and put your packages in the same repo
> - use git submodules and path-based dub dependencies together
> If it's a public package, you can register yourself on the DUB repository.

Regarding path-based dependencies, how can I use a C project, or any project that is not a dub package, as a dependency? I defined it in dub.json like this:

    "importPaths": ["non-dub-pkg"],
    "dependencies": {
        "non-dub-pkg": {"versions": "~master", "path": "./non-dub-pkg"}
    }

And I got an error saying something like "dub could not find dub.json or package.json in 'non-dub-pkg' directory". That's true because it is a Makefile project.

@Eugene Wissner: I think a good official central registry is more reliable for production use. I didn't know that js has ten registries. But if that is true, it reflects poor performance by the maintainer of the first official registry. Have one, do it well, and recruit more talent. That will benefit people more than scattering resources to reinvent the wheel. Nonetheless, I agree that support for ad-hoc dependencies is a good idea for development.
Re: Is it possbile to specify a remote git repo as dub dependency?
On Monday, 19 December 2016 at 14:18:17 UTC, Jacob Carlborg wrote:
> On 2016-12-19 13:11, biocyberman wrote:
>> I can write a short script to clone the remote git repo and use it as a submodule. But if it is possible to do with dub, it will be more convenient.
> It's not currently possible.

I see, it is both a good thing and a bad thing. The good thing is that it encourages developers to submit packages to the central dub registry. The bad thing is that, when that does not happen soon enough, other developers who use the package will have to do something for themselves.
Is it possbile to specify a remote git repo as dub dependency?
I can write a short script to clone the remote git repo and use it as a submodule. But if it is possible to do with dub, it will be more convenient.