Re: AA rehash threshold
Steven Schveighoffer schvei...@yahoo.com writes:
> On 11/18/14 9:46 PM, deadalnix wrote:
>> After all, unless your hash table is very small, the first hit is most likely a cache miss, meaning ~300 cycles. At this point the cache line is in L1, with a 3-cycle access time, so accessing several elements in the same cache line is often preferable.
> In the case of AAs, all bucket elements are pointers, so this point is moot for our purposes -- one may have to jump outside the cache to find the first element. A good improvement (this is likely to come with a full library type) is to instead store an inline element for each bucket entry instead of just a pointer to an element. I recall when writing dcollections, this added a significant speedup.

This works nicely for small types, but has gotchas. For example, if you've got an AA of ints, what value indicates that this is a value folded into the bucket entry vs. actually being a pointer? You'll need extra space to make sure it's safe. Alignment is another concern: say you have a bool for the entry/element flag; with ints you've doubled the size of the bucket array. If you have large objects as the elements, you'll waste space. Probably the implementation needs to be switchable depending on the size of the object.

As an aside, I find it invaluable to have statistics about hash table buckets, a la C++11. When building custom hash functions and hash table implementations, it's almost necessary. I'd like to see bucket stats available in some manner for built-in AAs. If they're going away in favor of a library type, we should have such stat info as part of the API. Jerry
Re: AA rehash threshold
Steven Schveighoffer schvei...@yahoo.com writes:
> On 11/20/14 5:30 PM, Jerry Quinn wrote:
>> This works nicely for small types, but has gotchas. For example, if you've got an AA of ints, what value indicates that this is a value folded into the bucket entry vs. actually being a pointer? You'll need extra space to make sure it's safe. Alignment is another concern. Let's say you have a bool for the entry/element flag. With ints you've doubled the size of the bucket array.
> Hm... the bucket entry will simply be what a node is now. You are actually saving space, because you don't need to store that extra pointer to the first node.

Almost. You do need a way to indicate that the bucket is empty. So you need another flag, though it can be smaller than a pointer (for 64-bit). But the overhead is low except for things like ints.

> Where it gets dicey is when you rehash. Some of the nodes will be moved into the bucket space, some will move out of it. But I don't think it's a big deal, as a rehash doesn't happen very often.

True. Umm, that does raise a question: are addresses of AA contents required to be stable? C++11 requires that of its hashtables, but it ties your hands when trying to write customized hash tables.
Re: std.benchmark is in reviewable state
Walter Bright Wrote: We already have std.perf: http://www.d-programming-language.org/phobos/std_perf.html A review should compare the two. std.perf doesn't show up in the nav bar to the left. I didn't know it existed until your post :-) Jerry
Re: Multithreaded file IO?
Jonathan M Davis Wrote:
> On Saturday, September 24, 2011 01:05:52 Jerry Quinn wrote:
>> Jonathan M Davis Wrote:
>>> On Friday, September 23, 2011 23:01:17 Jerry Quinn wrote:
>>> A direct rewrite would involve using shared and synchronized (either on the class or a synchronized block around the code that you want to lock). However, the more idiomatic way to do it would be to use std.concurrency and have the threads pass messages to each other using send and receive.
>> I'm trying the direct rewrite but having problems with shared and synchronized.
>>
>>     class queue {
>>         File file;
>>         this(string infile) { file.open(infile); }
>>         synchronized void put(string s) { file.writeln(s); }
>>     }
>>
>>     queue.d(10): Error: template std.stdio.File.writeln(S...) does not match any function template declaration
>>     queue.d(10): Error: template std.stdio.File.writeln(S...) cannot deduce template function from argument types !()(string)
>>
>> Remove the synchronized and it compiles fine with 2.055.
> Technically, synchronized should go on the _class_ and not the function, but I'm not sure if dmd is currently correctly implemented in that respect (since if it is, it should actually be an error to stick synchronized on a function). Regardless of that, however, unless you use shared (I don't know if you are), each instance of queue is going to be on its own thread. So, no mutexes will be necessary, but you won't be able to have multiple threads writing to the same File. It could get messy.

I get similar errors if I put synchronized on the class, or shared. My best guess is that a File struct cannot currently (2.055) be shared or accessed in a synchronized context. If I make the class synchronized and use __gshared on the File, then it compiles. I don't know if it works, though.

> You could use a synchronized block instead of synchronizing the class:
>
>     void put(string s) {
>         synchronized(this) { file.writeln(s); }
>     }
>
> and see if that works. But you still need the variable to be shared.

I'll try that.

> So, what you'd probably do is spawn 3 threads from the main thread. One would read the file and send the data to another thread. That second thread would process the data, then it would send it to the third thread, which would write it to disk.

I think that would become messy when you have multiple processing threads. The reader and writer would have to handshake with all the processors.

> The reader thread would be free to read at whatever rate it can. And the processing thread would be free to process at whatever rate it can. The writer thread would be free to write at whatever rate it can. I/O issues would be reduced (especially if the files being read and written to are on separate drives), since the reading and writing threads would be separate. I don't see why it would be particularly messy. It's essentially what TDPL suggests a file copy function should do, except that there's a thread in the middle which does some processing of the data before sending it to the thread doing the writing. Really, this is a classic example of the sort of situation which std.concurrency's message passing is intended for.

If you only have one processing thread, this is true. However, if you have, say, 16 processing threads, the code becomes messy. The reader has to handshake with each processor and select which processor to send the next chunk of input data to. The same dance has to happen when the processors are ready to write, since I want all the output ordered correctly. Broadcast (which isn't present in std.concurrency) wouldn't work because only one processor should get an input chunk. The problem is really a single-writer, multiple-reader problem.

>> std.parallelism actually looks the closest to what I want. Not sure if I can make it work easily though.
> For std.parallelism to work, each iteration of a parallel foreach _must_ be _completely_ separate from the others. They can't access the same data. So, for instance, they could access separate elements in an array, but they can't ever access the same element (unless none of them write to it), and something like a file is not going to work at all, since then you'd be trying to write to the file from multiple threads at once with no synchronization whatsoever. std.parallelism is for when you do the same thing many times _in parallel_ with each other, and your use case does not sound like that at all. I really think that std.concurrency is what you should be using if you don't want to do it the C/C++ way and use shared.

Well, I would have used the C++ shared way if I could get it to compile :-) I actually am doing the same thing many times in parallel. I have a single input file, but each line is processed independently by a worker thread. The results are output to a single file in order. Is map() what I want, perhaps?
Re: Multithreaded file IO?
Lutger Blijdestijn Wrote:
> If you didn't know, the concurrency chapter of TDPL is a free chapter: http://www.informit.com/articles/article.aspx?p=1609144 It has an example of file copying with message passing: http://www.informit.com/articles/article.aspx?p=1609144&seqNum=7

What I really want is a shared fifo where the input is lines from a file, and many workers grab something from the fifo. They then push their results into a shared reordering output queue.
Re: Why do we have transitive const, again?
Adam D. Ruppe Wrote:
> IMO immutable has taken a step *backward* a few releases ago, and that's about all I've thought about it.
>
>     immutable string a = "lol";
>     auto b = replace(a, "lol", "rofl"); // used to work, now doesn't.

This works for me in 2.055. Jerry
Multithreaded file IO?
Hi folks, I wasn't sure whether this should go here or in the D devel list... I'm trying to port a program where threads read from a file, process the data, then write the output data. The program is cpu-bound. In C++ I can do something like this:

    class QueueIn {
        ifstream in;
        mutex m;
        string get() {
            string s;
            m.lock();
            getline(in, s);
            m.unlock();
            return s;
        }
    };

    class QueueOut {
        ofstream out;
        mutex m;
        void put(string s) {
            m.lock();
            out << s;
            m.unlock();
        }
    };

In D, I'm so far having trouble figuring out the right idiom to do what I want. I looked at std.parallelism, but it seems a bit tricky to make my stuff work in this setting. A std.stdio File cannot be part of a shared class. How would you do this with the latest D2? Thanks, Jerry
Re: Multithreaded file IO?
Jonathan M Davis Wrote:
> On Friday, September 23, 2011 23:01:17 Jerry Quinn wrote:
> A direct rewrite would involve using shared and synchronized (either on the class or a synchronized block around the code that you want to lock). However, the more idiomatic way to do it would be to use std.concurrency and have the threads pass messages to each other using send and receive.

I'm trying the direct rewrite but having problems with shared and synchronized.

    class queue {
        File file;
        this(string infile) { file.open(infile); }
        synchronized void put(string s) { file.writeln(s); }
    }

    queue.d(10): Error: template std.stdio.File.writeln(S...) does not match any function template declaration
    queue.d(10): Error: template std.stdio.File.writeln(S...) cannot deduce template function from argument types !()(string)

Remove the synchronized and it compiles fine with 2.055.

> So, what you'd probably do is spawn 3 threads from the main thread. One would read the file and send the data to another thread. That second thread would process the data, then it would send it to the third thread, which would write it to disk.

I think that would become messy when you have multiple processing threads. The reader and writer would have to handshake with all the processors.

> Unfortunately, I'm not aware of any good code examples of this sort of thing online. TDPL has some good examples, but obviously you'd have to have the book to read it. Given some time, I could probably cook up an example, but I don't have anything on hand.

std.parallelism actually looks the closest to what I want. Not sure if I can make it work easily though.
Support for feeding data to engine threads?
I'm looking at porting an app that maintains a work queue to be processed by one of N engines and written out in order. At first glance, std.parallelism already provides the queue, but the Task concept appears to assume that there's no startup cost per thread. Am I missing something or do I need to roll a shared queue object?
Re: Support for feeding data to engine threads?
Timon Gehr Wrote:
> On 09/12/2011 07:23 PM, Jerry Quinn wrote:
>> I'm looking at porting an app that maintains a work queue to be processed by one of N engines and written out in order. At first glance, std.parallelism already provides the queue, but the Task concept appears to assume that there's no startup cost per thread. Am I missing something or do I need to roll a shared queue object?
> I don't know if I get you right, but std.parallelism uses a task pool. Usually no threads are started or stopped during processing.

OK, I guess what I'm looking for is WorkerLocalStorage. I can create an engine per thread. However, I probably need to have each thread do the initialization work. If I create the engine on the main thread, it won't be properly accessed by the worker thread, right? I.e. I need each thread to run engine.init(), which will do a whole pile of loading and setup first before I can start feeding data to the pool. The impression I get is that

    for (int i = 0; i < nthreads; i++)
        taskPool.workerLocalStorage(new engine(datafile));

will not get me what I want.
Re: Support for feeding data to engine threads?
Timon Gehr Wrote:
> On 09/12/2011 08:01 PM, Jerry Quinn wrote:
>> Timon Gehr Wrote:
>>> On 09/12/2011 07:23 PM, Jerry Quinn wrote:
>>>> I'm looking at porting an app that maintains a work queue to be processed by one of N engines and written out in order. At first glance, std.parallelism already provides the queue, but the Task concept appears to assume that there's no startup cost per thread. Am I missing something or do I need to roll a shared queue object?
>>> I don't know if I get you right, but std.parallelism uses a task pool. Usually no threads are started or stopped during processing.
>> OK, I guess what I'm looking for is WorkerLocalStorage. I can create an engine per thread. However, I probably need to have each thread do the initialization work. If I create the engine on the main thread, it won't be properly accessed by the worker thread, right? I.e. I need each thread to run engine.init(), which will do a whole pile of loading and setup first before I can start feeding data to the pool. The impression I get is that
>>
>>     for (int i = 0; i < nthreads; i++)
>>         taskPool.workerLocalStorage(new engine(datafile));
>>
>> will not get me what I want.
> Calling workerLocalStorage once suffices:
>
>     auto engine = taskPool.workerLocalStorage(new Engine(datafile));
>
> This will create one engine per worker thread, all with the same datafile. You can access the engine from each thread with engine.get. What is the exact role of datafile? Does it have to be distinct for each engine?

datafile is just a set of parameters for configuring the engine, i.e. location of data, parameter values, etc. In this setting it would be the same for each engine.
Re: Support for feeding data to engine threads?
dsimcha Wrote:
> == Quote from Jerry Quinn (jlqu...@optonline.net)'s article
>> I'm looking at porting an app that maintains a work queue to be processed by one of N engines and written out in order. At first glance, std.parallelism already provides the queue, but the Task concept appears to assume that there's no startup cost per thread. Am I missing something or do I need to roll a shared queue object?
> Not sure if I'm misunderstanding what you're asking, but I'll answer the question I think you're asking. std.parallelism's Task can be executed in two ways. executeInNewThread() does start a new thread for every Task. However, you can also submit a Task to a TaskPool and have it executed by an existing worker thread. The whole point of using TaskPool to execute a Task is to avoid paying the thread start-up cost for every Task.

The question I was asking was how to execute a huge amount of per-thread initialization once per thread in the TaskPool framework.
Re: Support for feeding data to engine threads?
dsimcha Wrote:
> == Quote from Jerry Quinn (jlqu...@optonline.net)'s article
>> Timon Gehr Wrote:
>>> On 09/12/2011 07:23 PM, Jerry Quinn wrote:
>>>> I'm looking at porting an app that maintains a work queue to be processed by one of N engines and written out in order. At first glance, std.parallelism already provides the queue, but the Task concept appears to assume that there's no startup cost per thread. Am I missing something or do I need to roll a shared queue object?
>>> I don't know if I get you right, but std.parallelism uses a task pool. Usually no threads are started or stopped during processing.
>> OK, I guess what I'm looking for is WorkerLocalStorage. I can create an engine per thread. However, I probably need to have each thread do the initialization work. If I create the engine on the main thread, it won't be properly accessed by the worker thread, right? I.e. I need each thread to run engine.init(), which will do a whole pile of loading and setup first before I can start feeding data to the pool. The impression I get is that
>>
>>     for (int i = 0; i < nthreads; i++)
>>         taskPool.workerLocalStorage(new engine(datafile));
>>
>> will not get me what I want.
> The parameter for taskPool.workerLocalStorage is lazy, so it's even easier:
>
>     taskPool.workerLocalStorage(new engine(datafile));
>
> will create one new engine **per thread** because the lazy parameter is passed to it and evaluated **once per thread**. I'm not sure what an engine is in this context, though.

An engine is simply a complex class that takes a lot of memory and setup work to prepare. I'm working on language processing, so we use large multi-GB models. Great, I'm getting closer. So the evaluation happens in the worker thread, not the main thread, right?

The other thing I think I have to do is sequence the initializations. Otherwise the disk/network will thrash trying to load N copies simultaneously. I imagine this can be done with a shared variable inside the engine.

> As far as initialization, you could do something like:
>
>     auto isInitialized = taskPool.workerLocalStorage(false);
>     void doStuff() {
>         if (!isInitialized.get) {
>             engines.get.initialize();
>             isInitialized.get = true;
>         }
>         // Do some real processing.
>     }

I prefer to do all the initializations before work starts so that timing can be more accurate. These things take a lot to get fully loaded and ready to run. Thanks for all the help! It might be useful to indicate in the workerLocalStorage docs that the lazy evaluation happens in the worker's thread. Jerry
Re: Support for feeding data to engine threads?
dsimcha Wrote:
> == Quote from Jerry Quinn (jlqu...@optonline.net)'s article
>> dsimcha Wrote:
>>> == Quote from Jerry Quinn (jlqu...@optonline.net)'s article
>>>> I'm looking at porting an app that maintains a work queue to be processed by one of N engines and written out in order. At first glance, std.parallelism already provides the queue, but the Task concept appears to assume that there's no startup cost per thread. Am I missing something or do I need to roll a shared queue object?
>>> Not sure if I'm misunderstanding what you're asking, but I'll answer the question I think you're asking. std.parallelism's Task can be executed in two ways. executeInNewThread() does start a new thread for every Task. However, you can also submit a Task to a TaskPool and have it executed by an existing worker thread. The whole point of using TaskPool to execute a Task is to avoid paying the thread start-up cost for every Task.
>> The question I was asking was how to execute a huge amount of per-thread initialization once per thread in the TaskPool framework.
> Hmm, currently there isn't an easy/obvious way, though I'm thinking maybe one should be added. I've kind of needed it, too, and I don't anticipate it being very hard to implement. If you can suggest a good API for this, I'll work on it for next release.

I'd probably naturally want to do something like:

    auto engines = taskPool.init(new Engine(params));
    auto lines = File("foo.txt").byLine();
    auto results = taskPool.map!(engines.process)(lines);
    foreach (r; results)
        writeln(r);

This would create one new Engine object per worker thread, initialized within each thread as thread-local data. Each Engine object has a process() call taking a line as input and returning a string in this case. Any free engine can process more work as long as there is still work available. For the large-scale use case, it could take an extra argument and allow only N simultaneous init's under the covers.
Google +1 button for dpl.org?
Hi folks, This seemed like the right place to toss this out... What do folks think of adding a google +1 button to d-programming-language.org? I just created a google+ account and discovered it. Jerry
Re: Question about D, garbage collection and fork()
Steven Schveighoffer Wrote:
> Do you know what causes the OS to regard that memory as read-only? Since fork() is a C system call, and D gets its heap memory the same as any other unix process (brk()), I can't see why it wouldn't work. As long as you do the same thing you do in C, I think it will work.

It's not that the OS considers the memory actually read-only. It uses copy-on-write, so the pages will be shared between the processes until one or the other attempts to write to a page. So if the garbage collector moves things around, it will cause the pages to be copied and unshared. So my question is really whether the garbage collector will tend to dirty shared pages or not. Jerry
Question about D, garbage collection and fork()
Where I work, we find it very useful to start a process, load data, then fork() to parallelize. Our data is large, such that we'd run out of memory trying to run a complete copy on each core. Once the process is loaded, we don't need that much writable memory, so fork is appealing as a way to share the loaded pages. It's possible to use mmap for some of the data, but inconvenient for other data, even though it's read-only at runtime.

So here's my question: in D, if I create a lot of data in the garbage-collected heap that will be read-only, then fork the process, will I get the benefit of the operating system's copy-on-write and only use a small amount of additional memory per process?

In case you're wondering why I wouldn't use threading, one argument is that if you have a bug and the process crashes, you only lose one process instead of N threads. That's actually useful for robustness. Thoughts?
Re: Proposal for std.path replacement
Lars T. Kyllingstad Wrote:
> As mentioned in the "std.path.getName(): Screwy by design?" thread, I started working on a rewrite of std.path a long time ago, but I got sidetracked by other things. The recent discussion got me working on it again, and it turned out there wasn't that much left to be done. So here it is, please comment: http://kyllingen.net/code/ltk/doc/path.html https://github.com/kyllingstad/ltk/blob/master/ltk/path.d

Rather than:

    drive() / stripDrive()
    extension() / stripExtension()

would it make sense to combine them?

    string[2] splitDrive(path)
    string[2] splitExtension(path)

Just a thought. Jerry
Re: xxxInPlace or xxxCopy?
Andrei Alexandrescu Wrote:
> On 1/19/11 6:53 PM, Jonathan M Davis wrote:
>> On Wednesday, January 19, 2011 15:33:16 Andrei Alexandrescu wrote:
>>> I'm consolidating some routines from std.string into std.array. They are specialized for operating on arrays, and include the likes of insert, remove, replace. One question is whether operations should be performed in place or on a copy. For example: So I guess vote stays unchanged :o). Thoughts?
>> Haven't we been using the approach that string operations generally make copies (in many cases slices) and marking functions that do it in place with InPlace?
> Problem is, even though the example uses strings, the functions apply to all arrays.

The big difference is operating on immutable arrays vs. mutable ones. For immutable arrays, you have to make copies, but mutable ones allow in-place editing. If I'm working with mutable arrays of ints, I don't want to have to type InPlace after every function, and I *really* don't want the array to be copied or efficiency will go down the tubes. Nor do I want to add Copy to every string operation. This might be an argument to leave the string functions where they are. To a certain extent, strings are special, even though they really aren't. Is it too ugly to contemplate algorithms doing in-place operations on mutable arrays and returning a copy instead for immutable ones?
Re: eliminate junk from std.string?
Andrei Alexandrescu Wrote:
> On 1/9/11 4:51 PM, Andrei Alexandrescu wrote:
>> There's a lot of junk in std.string that should be gone. I'm trying to motivate myself to port some functions to different string widths and... it's not worth it. What functions do you think we should remove from std.string? Let's make a string and then send them the way of the dino. Thanks, Andrei
> I have uploaded a preview of the changed APIs here: http://d-programming-language.org/cutting-edge/phobos/std_string.html

Unclear if iswhite() refers to ASCII whitespace or Unicode. If Unicode, which version of the standard? Same comment for icmp(). Also, in the Unicode standard, case folding can depend on the specific language. There is room for ASCII-only functions, but unless a D version of ICU is going to be done separately, it would be nice to have full Unicode-aware functions available.

You've got chop() marked as deprecated. Is popBack() going to make sense as something that removes a variable number of chars from a string in the CR-LF case? That might be a bit too magical.

Rather than zfill, what about modifying ljustify, rjustify, and center to take an optional fill character?

One set of functions I'd like to see is startsWith() and endsWith(). I find them frequently useful in Java and an irritating lack in the C++ standard library. Jerry
Re: eliminate junk from std.string?
Jerry Quinn Wrote:
> One set of functions I'd like to see is startsWith() and endsWith(). I find them frequently useful in Java and an irritating lack in the C++ standard library.

Just adding that these functions are useful because they're more efficient than doing a find and checking that the match is in the first position. Jerry
Re: eliminate junk from std.string?
Andrei Alexandrescu Wrote:
> On 1/11/11 1:45 PM, Jerry Quinn wrote:
>> Unclear if iswhite() refers to ASCII whitespace or Unicode. If Unicode, which version of the standard?
> Not sure.
>
>     enum dchar LS = '\u2028'; /// UTF line separator
>     enum dchar PS = '\u2029'; /// UTF paragraph separator
>
>     bool iswhite(dchar c) {
>         return c <= 0x7F ? indexOf(whitespace, c) != -1 : (c == PS || c == LS);
>     }

Which version? This looks pretty incomplete if the goal is to return true for any Unicode whitespace character. My comment was really that if we're going to offer things like this, they need to be more completely defined.

>> Same comment for icmp(). Also, in the Unicode standard, case folding can depend on the specific language.
> That uses toUniLower.

Not sure how that works. And it doesn't mention details about the Unicode standard version it implements.

>> You've got chop() marked as deprecated. Is popBack() going to make sense as something that removes a variable number of chars from a string in the CR-LF case? That might be a bit too magical.
> Well, I found little use for chop in e.g. Perl. People either use chomp or want to remove the last character. I think chop is useless.

Agreed, chomp is more useful. My question is whether popBack() should automatically act like Perl's chomp() for strings or not?

>> One set of functions I'd like to see are startsWith() and endsWith(). I find them frequently useful in Java and an irritating lack in the C++ standard library.
> Yah, those are in std.algorithm. Ideally we'd move everything that's applicable beyond strings to std.algorithm.

Ah, missed those. Jerry
Re: eliminate junk from std.string?
Jerry Quinn Wrote:
>>> Same comment for icmp(). Also, in the Unicode standard, case folding can depend on the specific language.
>> That uses toUniLower.
> Not sure how that works. And it doesn't mention details about the Unicode standard version it implements.

Actually it does. *munch* *munch* My words are delicious. It would be good to have better docs on what icmp() does. Also, it might make sense to do icmp() using Unicode case folding and normalization rather than simple lowercasing. Thinking out loud here.
Re: [review] new string type
Steven Schveighoffer Wrote:
> On Fri, 03 Dec 2010 11:15:36 -0500, spir denis.s...@gmail.com wrote:
>> On Thu, 02 Dec 2010 16:24:03 -0500, Steven Schveighoffer schvei...@yahoo.com wrote:
>>> Yes, it does seem odd, but then again, how often do you need the individual characters of a string? I wrote php code for about 6 months as a full time job before I found I needed to access individual characters, and then I had to look up how to do it :) It's just not a common thing.
>> Well, I wonder whether I have ever written any piece of code without string indexing ;-) Surely depends on your app domain(s)...
> Indexing a string is rare, unless you are parsing something (yes, it does truly depend on the domain), but even rarer is indexing a string based on a hard-coded value. Really, this last case is where the type would behave in a confusing way.

I tend to do a lot of transforming strings, but I need to track offsets back to the original text to maintain alignment between the results and the input. For that, indexes are necessary and we use them a lot. Probably the right thing to do in this case is just pay for the cost of using dchar everywhere, but if you're working with large enough quantities of data, storage efficiency matters. Jerry
Re: [review] new string type
Steven Schveighoffer Wrote:
> On Fri, 03 Dec 2010 14:40:30 -0500, Jerry Quinn jlqu...@optonline.net wrote:
>> I tend to do a lot of transforming strings, but I need to track offsets back to the original text to maintain alignment between the results and the input. For that, indexes are necessary and we use them a lot.
> In my daily usage of strings, I generally use a string as a whole, not individual characters. But I do occasionally use it. Let's also understand that indexing is still present; what is deactivated is the ability to index to arbitrary code units. It sounds to me like this new type would not affect your ability to store offsets (you can store an index, use it later when referring to the string, etc., just like you can now). My string type does not allow for writeable strings. My plan was to allow you access to the underlying char[] and let you edit that way. Letting someone write a dchar into the middle of a utf-8 string could cause lots of problems, so I just disabled it by default.

That's reasonable. I'm not trying to create invalid strings. I'm actually working in C++ but keeping an eye on things going on in D-land.

The kind of stuff we do is to normalize text in preparation for natural language processing. As a simple example, let's say you want to use a set of regexes to identify patterns in text, and you want to return the offsets of each regex that matches. However, before the regexes run, you replace all HTML tags with a placeholder, so they can easily span tags without worrying about the contents. Before you return the results to the user, though, you need to translate those offsets back to ones for the original string. Everything is Unicode, of course, and we care about processing Unicode code points, but we want to maintain UTF-8 storage underneath to keep size down. In reality, we're often doing things like single-character normalizations as well as larger spans, but we still need to maintain alignment to the original data.

As long as this is reasonable to do, I'm fine. I just wasn't sure from the descriptions I was seeing. Jerry
Re: FloatLiteral 1f
On 27/07/2010 21:46, Philip Trettner wrote:
> Hello. When I came across the lexical definition of a FloatLiteral, I wondered how '1f' gets recognized. The definition says:
>
>     FloatLiteral:
>         Float
>         Float Suffix
>         Integer ImaginarySuffix
>         Integer FloatSuffix ImaginarySuffix
>         Integer RealSuffix ImaginarySuffix
>
> Float contains a dot or an exponent, so this cannot be it. The suffixes for integers are only imaginary, so '1f' won't be recognized either. But the D compiler compiles it fine (as it should). Am I missing something or is this a flaw in the FloatLiteral definition? kind regards, Philip

This was reported in bug 949: http://d.puremagic.com/issues/show_bug.cgi?id=949
Re: container stuff
Andrei Alexandrescu Wrote:
> On 05/25/2010 06:04 PM, Steven Schveighoffer wrote:
>> On Tue, 25 May 2010 18:27:32 -0400, Andrei Alexandrescu seewebsiteforem...@erdani.org wrote:
>>> I've uploaded a work in progress on the container design here: http://erdani.com/d/phobos/std_container.html It's deceptively simple - the entire design is a nomenclature, really. Any given container may choose to implement whichever primitives it can, if (and only if) it can satisfy the requirements. Beyond that, each container can define its own primitives. There are a bunch of soft primitives. Those are meant to put a stop to the iterator invalidation problems experienced in the STL. The container implementor may alias softXyz to xyz if she knows the operation won't mess with the ranges currently iterating the container (which is the case for most node-based containers). I haven't yet discussed subtler cases in which a range starts with a removed element etc., but I plan to.
>> softXXX might be better named safeXXX or stableXXX. The names might be more suggestive of preserving iteration. The doc isn't quite clear whether make is a member function or standalone. I assume it's standalone, but that's worth firming up. I don't like insertInstead, can we make it replace?
> replace was the original name. insertInstead is consistent with the other two, so we have (softI|i)nsert[Before|After|Instead].

I second the request for the name replace. Despite the consistency, I think replace is clear to programmers as well as being more familiar and comfortable. Also, the operation really isn't "insert instead"; it's "delete, then insert instead." So there is a lack of symmetry, because it removes elements while the others do not.

Another issue I see is that opIndex and its brethren take KeyType, but for something like vector, this should be size_t. I don't think you want ElementType to be Tuple!(size_t, ElementType) in this case. Related to this, do you intend removal of a single element to use remove(range) or removeKey?

Finally, opSlice(begin, end) is not there. Was this intentional or overlooked? Jerry
Is this a bug with goto?
In the spec it says that it's illegal to skip an initialization using goto. Unless I'm mistaken, the code below does that for b, s, and c. However, it compiles without complaint. So, should the compiler be complaining, or is the text about goto really saying that behavior is undefined? Obviously the latter makes it easier for the compiler writer :-) void foo() { goto L0; int b; struct S { int a; this(int c) { a=c; } } S s = S(5); class C { int a; } C c = new C; L0: ; s.a++; c.a++; } thanks, Jerry
Re: Is this a bug with goto?
Andrei Alexandrescu Wrote: (Hi Jerry! Glad to see you're ramping up participation lately.) Yes, that's a bug. There are many other bugs related to jumps, including in switch statements. http://d.puremagic.com/issues/show_bug.cgi?id=4101 cheers, Jerry
Re: Does D allow paradoxical code?
Daniel Keep Wrote: This generally raises the question of what should be the order of evaluation for constructs like mixins, conditional compilation, and CTFE. Each has the potential to modify the other. What should be the rule to break the ambiguity? Or do we leave it implementation-defined? Thoughts? Jerry The D spec, as far as I know, says nothing on this. Then again, the D spec says nothing about a lot of things it should. Personally, I just ignore the spec; DMD is the only reliable reference for the language. That's why I'm bringing this issue up. The spec needs to become the reliable reference rather than the DMD implementation. Otherwise, D will be difficult to use seriously because you'll never know exactly what the language really allows. Now, being as I'm supposed to go off and do my walk around the block now, I'll be lazy and start making educated guesses. I don't think the DMD frontend is smart enough to handle this; nor do I think it should. If you were to allow this sort of thing, you'd never be able to bloody well compile it. The simplest solution is to simply make canonical what I think DMD already does: static code [1] can be arranged in any order [2], but meta code [3] is processed strictly in lexical order. Indeed, version actually already does this: you're not allowed to set a version after you've already used it. I agree with you that this is a bit silly and I'm not looking to have the compiler properly run forever. I'm raising the issue to get the spec to be firmer about how these meta constructs get evaluated at compile time. Lexical order makes sense to me. Another variant of the problem I described is if you were to mixin a protection attribute where the attribute makes the string that builds it no longer visible. Jerry
Does D allow paradoxical code?
What should happen when the following code is compiled? mixin(vs); version(v1) { const string vs = "version = v2;"; pragma(msg, "yes"); } else { const string vs = "version = v1;"; pragma(msg, "no"); } Currently, there's an error saying that vs must be a string. However, as far as I can tell, the spec says nothing about order affecting module-scope declarations. So vs should be valid, and once it substitutes in the mixin, you have a cycle where no matter what vs is, vs should be the other value. If you don't have the conditional compilation and put the declaration of vs after the mixin, it works as expected. This generally raises the question of what should be the order of evaluation for constructs like mixins, conditional compilation, and CTFE. Each has the potential to modify the other. What should be the rule to break the ambiguity? Or do we leave it implementation-defined? Thoughts? Jerry
Re: An example of Clang error messages
Walter Bright Wrote: grauzone wrote: How many man years would you assume writing a full D compiler takes? (Frontend starting from scratch, using an existing backend.) But there's little reason to implement a D compiler from scratch. For open source uses, you can just go ahead and use dmd. For closed, proprietary uses, we can work something out. Hi, me again. Following up on this comment, can I bring up the proposal of contributing a snapshot of the dmd frontend to the GCC project? I think for D to be really viable in GCC, it must be a part of the source tree. I hope I'm not being a pest. Thanks, Jerry
Re: Container hierarchy vs. container types
Andrei Alexandrescu Wrote: I decided to let no assumption unchallenged and got back to the question: do we really need a container _hierarchy_? How much joy and usefulness do people derive from dynamically changing an array with a list under the same interface? Or a rb-tree vs. a binary tree vs. a left-leaning binary tree? As far as I can tell such changes of containers are a design-time decision, and extremely rarely (never as far as I remember) a runtime decision to hide behind a binary interface. I'm starting to think that it's quite the other way around, and a Sapir-Whorf kind of thing: The fact that Java lacks symbol aliasing capability conditions the thinking that a hierarchy is the right thing to do, whereas the true solution is to design a flat federation of container types with name- and semantics-compatible primitives, to then use them directly or via design-time aliases. I've rarely found it helpful that an ArrayList is a List, or that HashMap is a Map. If an API I'm passing to takes the base class, it could easily have been done as a template. And dynamic switching hasn't ever been useful to me. As long as the API is the same, recompiling will work and the containers will continue to behave well. If dynamic switching is really necessary for an application, it can be done by wrapping the container with a class hierarchy as needed. Obviously I might be wrong in any number of ways. Switching containers dynamically might be very useful. The precision, flexibility, and beauty I'm aiming for may be useless. Container hierarchies may have some other awesomeness I'm failing to see. But for the time being I'm very seriously considering a simple design: * Each specific container is an unconnected class (or struct, haven't decided yet - probably structs to accommodate reference-counted containers); Also, structs would reduce the indirections required to access the data, when placing them inside other aggregates as well as reduce memory allocation traffic. 
I'd rather not pay for 2 allocations to create an instance of: class C { vector!(int) v; } * Naming matches dictate semantic matches. There should be no false friends - if a container defines removeFront, that should mean the same thing as for other containers (i.e. O(1) removal of the first element). This I would hesitate a bit on. I recently swapped STL maps for custom hashmaps, and not having to change operator[] across the code was useful. I'm changing the computational complexity of an operation, but conceptually it's the same kind of access. I'm doing so because I know that for my application, the tradeoffs elsewhere are acceptable. This design follows and enriches STL's design even after we have, as a community, seen many hierarchy-based designs, and even though it looks like hierarchies have won. The main improvements I want to bring to STL's design are (a) a better categorization of containers, (b) of course replace iterators with ranges, (c) either eliminate allocators or make them work. Needless to say, I'd be very curious to hear other opinions in this matter. Please speak up - the future of D depends, in small part, on this too. The concept of allocators is nice in principle, though I've never had to put them into action. A couple of places I could see having used STL allocators: wrapping JNI memory or GC memory in an STL container. Neither is particularly necessary in D. Another place I could envision an allocator is if I load a large static data structure with mmap and want to wrap an STL structure around it. I've wanted to do this in particular with strings. This is where C++ allocators fall down at least. Trying to mix custom-allocated data structures with standard ones is going to be painful. Jerry
Re: Making all strings UTF ranges has some risk of WTF
Don Wrote: We seem to be approaching the point where char[], wchar[] and dchar[] are all arrays of dchar, but with different levels of compression. It makes me wonder if the char, wchar types actually make any sense. If char[] is actually a UTF string, then char[] ~ char should be permitted ONLY if char can be implicitly converted to dchar. Otherwise, you're performing cast(char[])(cast(ubyte[])s ~ cast(ubyte)c) which will not necessarily result in a valid unicode string. Well, if you're working with a LOT of text, you may be mmapping GB's of UTF-8 text. Yes, this does happen. You better be able to handle it in a sane manner, i.e. not reallocating the memory to read the data in. So, there is a definite need for casting to array of char, and dealing with the inevitable stray non-unicode char in that mess. Real-world text processing can be a messy affair. It probably requires walking such an array and returning slices cast to char after they've been validated.
Re: D compiler as part of GCC
Brad Roberts Wrote: On 1/23/2010 4:15 PM, Walter Bright wrote: Leandro Lucarella wrote: Walter Bright, el 23 de enero a las 12:54 me escribiste: Jerry Quinn wrote: Walter Bright Wrote: Will they take a fork of the dmd source, such that they own the copyright to the fork and Digital Mars still has copyright to the original? Hi, Walter, The answer appears to be yes: http://gcc.gnu.org/ml/gcc/2010-01/msg00430.html Jerry That's great news. I suppose I should look over the forms they talk about! Great news indeed! Since DMD FE is GPL I think it won't be any trouble to fold in the new changes back to GDC as they did (and LDC too), so it won't be really a *fork*, right? Well, still I won't be supporting gdc directly. It would mean a team that would be willing to take new DMD FE updates and fold them into GDC, and then follow whatever gcc's build and release conventions are. I don't think you got the answer you were looking for. You got an answer to a different question. If you assign the copyright over to the FSF, they then own the code. You'd have a license to use it as you like in return, but you would no longer be the owner. Additionally, as pointed out in the gcc@ thread, contributions coming into the gcc tree wouldn't have anything other than the gpl license attached to them and that would likely make them problematic to re-distribute from your tree with the dual gpl/artistic license. In simpler words, this is still far from straightforward. I think you're slightly incorrect, Brad. DigitalMars still owns the copyright to the original source (call it copy A). A fork (called copy B) is donated to the FSF. DigitalMars still gets to make changes to copy A and license them as it sees fit. Copy B is part of the GCC codebase and would evolve separately. Moving changes between them would require the same kind of donation process as the original transfer. 
Folks making changes to the DMD FE would have to contribute those changes to FSF as well to get them into copy B and vice versa. My apologies if that's the same as what you said. I read your comment a couple of times and was a bit confused. I'd still love for there to be fewer split efforts on the compiler front, so I do encourage trying to find a workable solution, but tread carefully. The GCC Java front end currently uses the Eclipse compiler to produce bytecode and then compiles the bytecode to native. But that's the only front end I'm aware of that isn't fully integrated into the GCC tree. The D front end doesn't produce a portable intermediate representation like that, so I think it would be harder to always use the latest DMD front end. I see this possibility as Walter giving GCC a running start so that the D ecosystem has another viable compiler option available with relatively low effort. In the end, the language spec should be the thing that unifies the D community rather than the ad hoc definition provided by a particular front end implementation. It's just a matter of how to get there. Later, Jerry
Re: D compiler as part of GCC
Walter Bright Wrote: Will they take a fork of the dmd source, such that they own the copyright to the fork and Digital Mars still has copyright to the original? Hi, Walter, The answer appears to be yes: http://gcc.gnu.org/ml/gcc/2010-01/msg00430.html Jerry
Re: D compiler as part of GCC
Eldar Insafutdinov Wrote: bearophile Wrote: Jerry Quinn: I'm interested in creating a D front end for GCC that would be part of the GCC codebase. What about helping LDC devs create a good D2 implementation instead? It's probably 1/5 or 1/10 of the work you think about, because lot of work is already done, and surely some people will help you (me too). One reason is that I'm already positioned to contribute code to GCC, and it is more difficult for me to become an LDC dev. Another is that GCC has very broad backend support. I know LLVM backend support is expanding but it still has some distance to go. GCC is also the default compiler for many Linux distributions, and D being part of that may help it propagate. I also do think that the construction of another front end would provide positive benefit to the D community by improving the language specification and separating it from implementation. Of course that's only true if this succeeds :-) There's also a benefit to the GCC project in terms of improving the docs on the frontend interface. That's already happened as I've tried to figure out how it works :-) But that's not so relevant to the folks here. There's Dil, DMD, GDC, LDC, D#, etc, but one good, debugged and well optimizing fully open source D2 compiler is much better than ten broken and/or badly optimizing D compilers. Bye, bearophile I agree that, with such good intent, the author of the post would do better to concentrate his effort on helping GDC/LDC. LDC took a couple of years to become usable, and you have to consider that they took an existing front-end. Also, I think that even when you complete this project, licensing issues are not the only thing preventing GDC from being included into GCC. They will do that only if they are interested in this project, as it requires maintenance. They will not update the GCC-D frontend with every release of GCC just because it is a part of it. This is very true.
It would require people to be interested in continuing its existence. With a solid GDC implementation you can be sure that it will be included in distributions (Debian had GDC for quite a long time). In my mind, the endgame for GDC would be to have the work integrated into the official GCC sources. That would provide similar benefits to the ones I'm chasing. Everyone who touched GDC would have to assign their code to the FSF, and the DMD sources would also have to be assigned. Perhaps that's the right answer in the end, but I don't know. It does seem to be a substantial effort to make that happen. Jerry
Re: Tidy auto [Was: Re: @disable]
bearophile Wrote: myself). If this is true then a syntax like: auto immutable x = y * 2; can be seen as too long and boring to write all the time, so the immutable keyword may need to be changed again :-) For example into val (also as retard has said), so it becomes shorter (here auto is present if and only if the programmer wants type inference, as I have written in other posts, to clean up its semantics): auto val x = y * 2; How about fixed? It's a little longer than val, but still carries set-in-stone semantics. auto fixed x = y * 2;
Re: one suggestion for improving the performance of gc and memroy management
Michel Fortin Wrote: My opinion is that immutable and shared immutable should be two different type modifiers. Contrary to other shared values, shared immutable values can be cast to non-shared immutable (and non-shared const) since they do not require memory barriers, but the reverse should be forbidden (sending a non-shared immutable value to another thread) because it would allow thread-local objects to escape. You don't need the language to prevent the non-shared to shared cast as long as the object can be moved to shared GC once it crosses the barrier. The compiler could do this when it sees the cast happening. Jerry
Re: yank '>>>'?
Simen kjaeraas Wrote: On Mon, 07 Dec 2009 02:11:16 +0100, Jerry Quinn jlqu...@optonline.net wrote: Well, I could see the value of providing a rotate operator. Since >>> is tainted, what about @ and @ for integral rotation? I was thinking and . They represent the fact that some bits end up on the wrong side. Still, I don't think there're enough use cases for an operator. I'd argue against and since they'd be very easy to misread and mistype. I can believe that an operator isn't necessary, but there should definitely be a standard way for folks to end up with single-asm instructions for rotation without resorting to ugliness. Consider PowerPC, which has a whole collection of powerful rotation instructions.
Re: yank '>>>'?
Walter Bright Wrote: dsimcha wrote: == Quote from KennyTM~ (kenn...@gmail.com)'s article No, it will _silently_ break code that uses >>> as unsigned right shift. Well, we could get around this by making >>> an error for a few releases, and then only after everyone's removed their >>>s that mean unsigned shift, we could drop in the rotate semantics. It'll still silently break code moving from D1 to D2. Well, I could see the value of providing a rotate operator. Since >>> is tainted, what about @ and @ for integral rotation? Jerry
Re: Array, AA Implementations
dsimcha Wrote: If anyone can think of any more, please let me know. Also, just thought of this now: I wonder if it would make sense to use some polymorphism tricks (AA operations are slow enough that an extra pointer dereference or virtual function call isn't going to make or break their performance) to allow implementations to be automatically and dynamically changed depending on some of the attributes listed above. This is not necessarily true. If you are hammering a map of int to int, function call overhead will matter a great deal and you would like the compiler to be able to inline it. Size will also matter for huge AA tables. On the other hand, perhaps you're willing to say that built-in AA's will not be suitable for every case. For example, maybe binary trees are fastest for small N. (Hypothetical, haven't tested it.) Obviously for sufficiently large N hash tables will be fastest. We could create an abstract AA type that automatically switches from binary trees to hash tables when it makes sense, make this the default builtin AA, and expose both of the underlying implementations in std.containers for when the user wants more control over the details rather than DWIM. One thing I think is important is to codify the interface to the AA implementation, so that files built with different compilers could be linked together. I'd also like to see a way to allow the user to implement a custom AA for some uses and still be able to take advantage of the nice syntactic sugar the language provides.
Follow-on question about delegates
OK, having thought a bit more about delegates, I now have another question. The ABI shows a delegate as consisting of a context ptr and a function ptr. The context ptr can be a class reference, ptr to struct, ptr to closure or ptr to stack frame. If you're passing one of these delegates into another function, how does the receiving function figure out what kind of context it's looking at? Each of these things will look different in memory. Also, structs at least don't have a header that can be used to disambiguate the situation. Thanks, Jerry
question about foreach, opApply, and delegates
Hi, all. I find myself a little confused about how foreach, opApply, and delegates interact according to the docs. Foreach on an aggregate will use the opApply call (assuming ranges aren't being used). So if we use a simple example below, what exactly is the delegate that is passed to opApply? The docs say a delegate is a pairing of an object reference and a function, where the object is passed as the 'this' parameter to the function. But that doesn't seem to be the case here. Is a virtual object with a function encompassing the body of the foreach being silently created and passed in? Thanks, Jerry class C { uint[] a; int opApply(int delegate(ref uint) dg) { int result = 0; for (size_t i=0; i < a.length; i++) { result = dg(a[i]); if (result) break; } return result; } } void foo() { C c = new C; foreach (uint v; c) { writefln(v); } }
Re: question about foreach, opApply, and delegates
downs Wrote: Jerry Quinn wrote: Hi, all. I find myself a little confused about how foreach, opApply, and delegates interact according to the docs. Foreach on an aggregate will use the opApply call (assuming ranges aren't being used). So if we use a simple example below, what exactly is the delegate that is passed to opApply? The docs say a delegate is a pairing of an object reference and a function, where the object is passed as the 'this' parameter to the function. But that doesn't seem to be the case here. If the docs say that, they're wrong. Generally speaking, a delegate is a pairing of a function pointer and a context, where the context can be a struct pointer, a class reference *or a stackframe*, as is the case with opApply. Hope that clears things up. Yes, that helps. All 3 replies are basically the same, and the docs are clearly insufficient to describe what's actually happening in a delegate. I'll file a bug. Thanks!
Re: const?? When and why? This is ugly!
Walter Bright Wrote: When we first got into what to do with strings and const/immutable/mutable, I was definitely in the camp that strings should be mutable char[], or at worst const(char)[]. The thing is, Andrei pointed out to me, languages that are considered very good at dealing with strings (like Perl) use immutable strings. The fascinating thing about strings in such languages is: Nobody notices they are immutable, they just work. So what is it about immutability that makes strings just work in a natural and intuitive manner? The insight is that it enables strings, which are reference types, to behave exactly as if they were value types. After all, it never occurs to anyone to think that the integer 123 could be a mutable integer and perhaps be 133 the next time you look at it. If you put 123 into a variable, it stays 123. It's immutable. People intuitively expect strings to behave the same way. Only C programmers expect that once they assign a string to a variable, that string may change in place. C has it backwards by making strings mutable, and it's one of the main reasons why dealing with strings in C is such a gigantic pain. But as a longtime C programmer, I was so used to that I didn't notice what a pain it was until I started using other languages where string manipulation was a breeze. The way to do strings in D is to have them be immutable. If you are building a string by manipulating its parts, start with mutable, when finished then convert it to immutable and 'publish' it to the rest of the program. Mutable char[] arrays should only exist as temporaries. This is exactly the opposite of the way one does it in C, but if you do it this way, you'll find you never need to defensively dup the string just in case and things just seem to naturally work out. 
So your suggestion is to do something like: string manipulate() { char[] buf = read_20M_string_data(); initialization_mangling(buf); return cast(string)buf; } string data_20M = manipulate(); As long as we don't want to mangle the storage later in the program.
Re: const?? When and why? This is ugly!
Andrei Alexandrescu Wrote: Steven Schveighoffer wrote: I think what Burton is saying is by anointing immutable(char)[] as the type string, you are essentially sending a message to developers that all strings should be immutable, and all *string parameters* should be declared immutable. What this does is force developers who want to deal in const or mutable chars to do lots of duplication or casting, which either makes your code dog slow, or makes your code break const. Evidence is how (at least in previous releases) anything in Phobos that took an argument that was a utf8 string of characters used the parameter type string, making it very difficult to use when you don't have string types. If you want to find a substring in a string, it makes no sense that you first have to make the argument invariant. a substring function isn't saving a pointer to that data. I think the complaint is simply that string is defined as immutable(char)[] and therefore is promoted as *the only* string type to use. Using other forms (such as in char[] or const(char)[] or char[]) doesn't look like the argument is a string, when the word string is already taken to mean something else. -Steve I see. Phobos is being changed to accept in char[] instead of string wherever applicable. As far as what the default string ought to be, immutable(char)[] is the safest of the three so I think it should be that. So far this discussion has no examples. Let me toss one out as a dart board to see what folks think: char[] s = get_some_data(); // accesses a large buffer that I don't want to copy size_t pos = my_find(s, "blah"); s[pos] = 'c'; string s2 = "another string"; size_t pos2 = my_find(s2, "blah"); another_func(s2[pos2 .. pos2+4]); size_t my_find(string haystack, string needle) { // Perform some kind of search that is read-only } void another_func(string s) {} --- It seems like string is the obvious parameter type to use for my_find, but it doesn't work because we're passing mutable data in.
Instead we have to use const(char)[], which is less intuitive and uglier. Or did I miss the thrust of the argument?
Re: problem with declaration grammar?
Ellery Newcomer Wrote: jerry quinn wrote: Ellery Newcomer Wrote: Maybe I'm missing something. The grammar shown in http://www.digitalmars.com/d/2.0/declaration.html has the following rules: BasicType2: * [ ] [ Expression ] [ Expression .. Expression ] [ Type ] delegate Parameters FunctionAttributesopt function Parameters FunctionAttributesopt Declarator: BasicType2 Declarator DeclaratorSuffixesopt BasicType2 Identifier DeclaratorSuffixesopt With this definition, I don't see how you can get Declarator-Identifier. Jerry You are correct. BasicType2 can match nothing. It should also be able to match what it does above multiple times. As I'm looking at this further, there seem to be more issues. In particular, I don't think the grammar can parse: int (*x)(char); as specified. Doing so gives (my best attempt) Decl - BasicType Declarators ; BasicType - int Declarators - DeclaratorInitializer DeclaratorInitializer - Declarator Declarator - BasicType2 Identifier DeclaratorSuffixes BasicType2 - NULL (assuming that the grammar should be revised like this) Identifier - BAD PARSE yeah, if you haven't figured out by now, the grammar is a bunch of hooey. I spent like a month building an ANTLR grammar based on the above, and then realized much of it was garbage. Then I spent two months going through the source code and rewriting most of the rules. Just got done with it a week or two ago :) That was all version 1, but it looks the same, so if memory serves the above rules should look something like this: BasicType2_x: * [ ] [ Expression ] [ Expression .. Expression ] [ Type ] delegate Parameters FunctionAttributesopt function Parameters FunctionAttributesopt BasicType2: BasicType2_x BasicType2 BasicType2_x epsilon Declarator: BasicType2 Identifier DeclaratorSuffixesopt BasicType2 ( Declarator ) DeclaratorSuffixesopt Apologies for any BNF misuse. Cool. Do you feel like posting the whole thing somewhere?
As an aside comment, it might be better from a grammar perspective to make usage of BasicType2 optional, rather than have the epsilon in the BasicType2 rule itself. Then every rule would consume at least 1 token, and _opt is the only expansion shortcut needed. In the form shown, you can simplify the BasicType2 rule to BasicType2: BasicType2 BasicType2_x epsilon
Re: problem with declaration grammar?
Sergey Gromov Wrote: Thu, 19 Feb 2009 01:30:36 -0500, jerry quinn wrote: Christopher Wright Wrote: jerry quinn wrote: Hi there, I'm not sure if I'm missing something, but I'm having trouble seeing that a simple declaration will parse correctly with the D grammar. If we take a declaration statement like: int x = 3; we have (my best guess): DeclarationStatement - Declaration Declaration - Decl Decl - BasicType Declarators ; BasicType - int Declarators - DeclaratorInitializer DeclaratorInitializer - Declarator = Initializer Declarator - BasicType2 Identifier BasicType2 - I'm thinking that BasicType2 is optional here, rather than required as the grammar shows. Is that correct? Thanks Jerry . Declaration - Decl . Decl - BasicType Declarators . BasicType - int . Declarators - DeclaratorInitializer . DeclaratorInitializer - Declarator = Initializer We agree up to here. . Declarator - Identifier Here, you don't need BasicType2, and if you use it, you recurse, so using the rule Declarator - BasicType2 Declarator here is useless. What you describe sounds like what I'd expect. Maybe I'm missing something. The grammar shown in http://www.digitalmars.com/d/2.0/declaration.html has the following rules: BasicType2: * [ ] [ Expression ] [ Expression .. Expression ] [ Type ] delegate Parameters FunctionAttributesopt function Parameters FunctionAttributesopt Declarator: BasicType2 Declarator DeclaratorSuffixesopt BasicType2 Identifier DeclaratorSuffixesopt With this definition, I don't see how you can get Declarator-Identifier.
Jerry The grammar works the other way around: int x = 3 ; int - BasicType(int) // this is either Decl or Type, need more tokens, expect Declarators, // Declarator, or Declarator2 - x - Identifier(x) // either DeclaratorInitializer (Declarators), Declarator, // IdentifierList (not expecting), StructMemberInitializer (not // expecting), or PrimaryExpression (not expecting) // therefore expecting '=' or DeclaratorSuffixes - = - = // token // Identifier(x) = - definitely DeclaratorInitializer, expecting // Initializer, that is , either void, AssignExpression, // ArrayInitializer, or StructInitializer This is incorrect. We have BasicType(int) Identifier(x) '= ' You're suggesting the Identifier begins DeclaratorInitializer, but it must start with a Declarator. We don't have one, because Declarator must start with BasicType2. This is where I think the bug in the grammar is. If BasicType2 were optional, then the parse would complete as you showed. Finita la comedia. Unfortunately, not yet fini
Re: problem with declaration grammar?
Ellery Newcomer Wrote: Maybe I'm missing something. The grammar shown in http://www.digitalmars.com/d/2.0/declaration.html has the following rules: BasicType2: * [ ] [ Expression ] [ Expression .. Expression ] [ Type ] delegate Parameters FunctionAttributesopt function Parameters FunctionAttributesopt Declarator: BasicType2 Declarator DeclaratorSuffixesopt BasicType2 Identifier DeclaratorSuffixesopt With this definition, I don't see how you can get Declarator-Identifier. Jerry You are correct. BasicType2 can match nothing. It should also be able to match what it does above multiple times. As I'm looking at this further, there seem to be more issues. In particular, I don't think the grammar can parse: int (*x)(char); as specified. Doing so gives (my best attempt) Decl - BasicType Declarators ; BasicType - int Declarators - DeclaratorInitializer DeclaratorInitializer - Declarator Declarator - BasicType2 Identifier DeclaratorSuffixes BasicType2 - NULL (assuming that the grammar should be revised like this) Identifier - BAD PARSE
problem with declaration grammar?
Hi there, I'm not sure if I'm missing something, but I'm having trouble seeing that a simple declaration will parse correctly with the D grammar. If we take a declaration statement like: int x = 3; we have (my best guess): DeclarationStatement - Declaration Declaration - Decl Decl - BasicType Declarators ; BasicType - int Declarators - DeclaratorInitializer DeclaratorInitializer - Declarator = Initializer Declarator - BasicType2 Identifier BasicType2 - I'm thinking that BasicType2 is optional here, rather than required as the grammar shows. Is that correct? Thanks Jerry
Re: problem with declaration grammar?
Christopher Wright Wrote: jerry quinn wrote:

Hi there, I'm not sure if I'm missing something, but I'm having trouble seeing that a simple declaration will parse correctly with the D grammar. If we take a declaration statement like:

    int x = 3;

we have (my best guess):

    DeclarationStatement -> Declaration
    Declaration -> Decl
    Decl -> BasicType Declarators ;
    BasicType -> int
    Declarators -> DeclaratorInitializer
    DeclaratorInitializer -> Declarator = Initializer
    Declarator -> BasicType2 Identifier
    BasicType2 ->

I'm thinking that BasicType2 is optional here, rather than required as the grammar shows. Is that correct? Thanks Jerry

. Declaration -> Decl
. Decl -> BasicType Declarators ;
. BasicType -> int
. Declarators -> DeclaratorInitializer
. DeclaratorInitializer -> Declarator = Initializer

We agree up to here.

. Declarator -> Identifier

Here, you don't need BasicType2, and if you use it, you recurse, so using the rule Declarator -> BasicType2 Declarator here is useless.

What you describe sounds like what I'd expect. Maybe I'm missing something. The grammar shown in http://www.digitalmars.com/d/2.0/declaration.html has the following rules:

    BasicType2:
        *
        [ ]
        [ Expression ]
        [ Expression .. Expression ]
        [ Type ]
        delegate Parameters FunctionAttributes_opt
        function Parameters FunctionAttributes_opt

    Declarator:
        BasicType2 Declarator DeclaratorSuffixes_opt
        BasicType2 Identifier DeclaratorSuffixes_opt

With this definition, I don't see how you can get Declarator -> Identifier.

Jerry
Re: ref?
dsimcha Wrote: == Quote from Andrei Alexandrescu (seewebsiteforem...@erdani.org)'s article Walter and I have been discussing what the regime of statically-sized arrays should be. In short, the behavior that's most consistent with everything else is to treat them as values. This also brings the problems of e.g. containers - should they have consistent value semantics (like in STL) Oh God no! I always thought the value semantics of STL containers were largely a kludge to work around the fact that C++ doesn't have garbage collection. They make it easier to use RAII for memory management, since every object has a clear owner. This leads to tons and tons of copying if code is written the obvious way, and lots of kludges to prevent it. On the other hand, for relatively small objects, the last thing I want is to have everything be a reference. This is one of the major flaws of Java, I think, from a memory efficiency standpoint. I'd hate to have to have references to 16 byte objects if I don't need the sharing behavior. Having classes with reference semantics in D largely solves this issue, doesn't it? Use a class when you have things that make sense to share, and use structs for when it makes more sense to have distinct copies. Jerry
Re: ref?
Nick Sabalausky Wrote: Jerry Quinn jlqu...@optonline.net wrote in message news:gn9i7n$2uj...@digitalmars.com... Use a class when you have things that make sense to share, and use structs for when it makes more sense to have distinct copies. Couldn't that rule conflict with cases where you'd want distinct copies but need features of classes that aren't available for structs (like inheritance)? However, with reference semantics, you have no way to achieve objects laid out in a contiguous array, unless I'm missing something. If you want inheritance, copy semantics are an issue. For example, if you have struct A : B, and an array of B, you can't put an A in it, since there's not enough space for an A (unless A adds 0 storage). If you have an array of A, inheritance isn't really buying you anything over having a B as the first member of A. Any place where you'd want to use A as a B, you can get to the member B struct directly.
Re: random cover of a range
Andrei Alexandrescu Wrote: Bill Baxter wrote: On Sat, Feb 14, 2009 at 1:03 PM, Andrei Alexandrescu seewebsiteforem...@erdani.org mailto:seewebsiteforem...@erdani.org wrote: bearophile wrote: Andrei Alexandrescu:

Say at some point there are k available (not taken) slots out of n. There is a k/n chance that a random selection finds an unoccupied slot. The average number of random trials needed to find an unoccupied slot is proportional to 1/(k/n) = n/k. So the total number of random trials to span the entire array is quadratic. Multiplying that by 0.9 leaves it quadratic.

It's like in hashing: if you want to fill 99% of the available space in a hash, then you take ages to find empty slots. But if you fill it only at 75-90%, then on average you need only one or two tries to find an empty slot. So your time is linear, with a small multiplicative constant. When the slots start to get mostly full, you change algorithm, copying the empty slots elsewhere.

Well I don't buy it. If you make a point, you need to be more precise than such hand-waving. It's not like in hashing. It's like in the algorithm we discuss. If you make a clear point that your performance is better than O(n*n) by stopping at 90% then make it. I didn't go through much formalism, but my napkin says you're firmly in quadratic territory.

Well he has a point that the number of trials required to find an empty slot depends not on the absolute number of empty items, but only on the ratio of empties to fulls. Even your own claim about average number of trials was n/k -- not sure how you got that though.

If you toss an N-sided die hoping for a specific face to show up (and stopping afterwards), how many times do you have to toss it on average? I recall (without being sure) that you need to toss it a number of times proportional to N. Could anyone confirm or deny?

    avg_trials(N) = sum_{t=1..inf} t * (1/N) * ((N-1)/N)^(t-1)

where t is the number of trials. Experimentally, it converges to N
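The die question is the mean of a geometric distribution with success probability 1/N, which is exactly N. A quick numeric check of the series above (a sketch; the function name and truncation point are my own, not from the thread):

```python
def avg_trials_series(n, terms=100_000):
    # Truncated version of sum_{t>=1} t * (1/n) * ((n-1)/n)^(t-1),
    # the expected number of tosses until a specific face of an
    # n-sided die first shows up. The tail beyond `terms` is negligible
    # for the n used here.
    p = 1.0 / n
    return sum(t * p * (1 - p) ** (t - 1) for t in range(1, terms + 1))

for n in (2, 6, 20):
    print(n, round(avg_trials_series(n), 6))
```

Each printed value matches N, confirming the "proportional to N" intuition (the constant is 1). Note this also explains bearophile's hashing point: with a fraction k/n of slots empty, the expected number of probes is n/k, which stays a small constant as long as the table is only 75-90% full.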
Re: Optimizing Immutable and Purity
Walter Bright Wrote: I've been working on improving the optimizer to take advantage of immutability and purity.

http://www.reddit.com/r/reddit.com/comments/7l5x4/optimizing_immutable_and_purity/
http://dobbscodetalk.com/index.php?option=com_myblog&show=Optimizing-Immutable-and-Purity.html&Itemid=29

This was an interesting read. It would be nice to see a discussion of how const is going to fit in in terms of optimization potential, since you can always cast it away.

Jerry
Re: Initializing const member post-construction?
Steven Schveighoffer Wrote: Jerry Quinn wrote:

Hi. I'm trying to port a C++ program to D as an exercise in exploring D. As I'm doing this, I've run into a bit of confusion with the const system. I have something like

    class A {}
    class B {
        const A a;
        void init(A aa) { a = aa; }
    }

This doesn't work, because dmd (2.020) complains that you can't initialize a const member after the constructor. The catch is that the value of aa is not available at construction time, but only later on. However, I'd still like to declare that once set, the object referred to by a is const. The C++ code used a pointer, but it seemed to me like D's references were more capable than C++'s, so I'm trying to use them. To me it seems like this should still be allowed. Even though the object referred to by a is const, the reference itself shouldn't need to be. This seems morally equivalent to:

    const(A)* a;

which is allowed by dmd. In both cases I'm trying to tell the compiler that the object referred to by a is const. Is there a way to do what I'm trying to do?

You can hide a behind a property:

    class B {
        private A aPriv;
        void init(A aa) { aPriv = aa; }
        const(A) a() const { return aPriv; }
    }

Now, do not use aPriv anywhere else in your code, and you should be all set. Use -inline when compiling and you should see no performance penalty.

Yes, that seems to be a reasonable workaround. Thanks.

What's the reason for not allowing this?

I was unaware you could even set a in the constructor. I don't think there's any general 'set once' type modifier.

No, I didn't see one either. In fact what I was asking for is not a set once. It's to allow a reference to a const object to be reassigned, since it wasn't obvious to me that the reference itself should be kept const.

Jerry
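The "hide the field behind a read-only accessor" workaround is language-agnostic. Here is a minimal sketch of the same idea in Python, purely for illustration (the class and method names mirror the post; the write-once guard is my addition and goes slightly beyond what the D version enforces, which relies on const and -inline instead):

```python
class B:
    def __init__(self):
        self._a = None  # plays the role of the private aPriv field

    def init(self, aa):
        # Reject a second assignment to approximate "set once" behavior.
        if self._a is not None:
            raise ValueError("a has already been set")
        self._a = aa

    @property
    def a(self):
        # Read-only accessor: callers can read a but not rebind it
        # through this property.
        return self._a

b = B()
b.init("some A instance")
print(b.a)
```

The trade-off is the same as in the D thread: the field itself is mutable internally, so the discipline of "do not use the private field anywhere else" is what keeps the const-like guarantee.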
Initializing const member post-construction?
Hi. I'm trying to port a C++ program to D as an exercise in exploring D. As I'm doing this, I've run into a bit of confusion with the const system. I have something like

    class A {}
    class B {
        const A a;
        void init(A aa) { a = aa; }
    }

This doesn't work, because dmd (2.020) complains that you can't initialize a const member after the constructor. The catch is that the value of aa is not available at construction time, but only later on. However, I'd still like to declare that once set, the object referred to by a is const. The C++ code used a pointer, but it seemed to me like D's references were more capable than C++'s, so I'm trying to use them. To me it seems like this should still be allowed. Even though the object referred to by a is const, the reference itself shouldn't need to be. This seems morally equivalent to:

    const(A)* a;

which is allowed by dmd. In both cases I'm trying to tell the compiler that the object referred to by a is const. Is there a way to do what I'm trying to do? What's the reason for not allowing this?

Thanks,
Jerry