Video Codecs?
Does anybody know of any D libraries implementing or wrapping video codecs? I need to read video files (AVI or MPEG would be fine) using DIVX, XVID, or any other popular codec. In addition to playing those files in a media player control, I need to extract individual frames and perform various filtration and processing operations on them, for a computer vision project I'm about to start working on. I looked around at DSource but didn't find anything there. Any ideas? --benji
Re: Template Metaprogramming Made Easy (Huh?)
Rainer Deyke wrote: I'm not entirely happy with the way Scala handles the division between statements - Scala's rules seem arbitrary and complex - but semicolons *are* noise, no matter how habitually I use them and how much time I waste removing them afterwards. I don't know anything about scala, but I've been working on an Actionscript compiler recently (the language is based on ECMAScript, so it's very much like JavaScript in this respect) and the optional semicolon rules are completely maddening. The ECMAScript spec basically says: virtual semicolons must be inserted at end-of-line whenever the non-insertion of semicolons would result in an erroneous parse. So there are really only three ways to handle it, and all of them are insane: 1) Treat the newline character as a token (rather than as skippable whitespace) and include that token as an optional construct in every single production where it can legally occur. This results in hundreds of optional newline tokens throughout the grammar, and makes the whole thing a nightmare to read, but at least it still uses a one-pass CFG. CLASS := "class" NEWLINE? IDENTIFIER NEWLINE? "{" NEWLINE? ( MEMBER NEWLINE? )* "}" 2) Use lexical lookahead, dispatched from the parser. The tokenizer determines whether to treat a newline as a statement terminator based on the current parse state (are we in the middle of a parenthesized expression?) and the upcoming tokens on the next line. This is nasty because the grammar becomes context-sensitive and conflates lexical analysis with parsing. 3) Whenever the parser encounters an error, have it back up to the beginning of the previous production and insert a virtual semicolon into the token stream. Then try reparsing. Since there might be multiple newlines contained in a single multiline expression, it might take arbitrarily many rewrite attempts before reaching a correct parse.
The thing about most compiler construction tools is that they don't allow interaction between the tokenizer and the parser for context-guided tokenization, and they're not designed for the creation of backup-and-retry processing, or the insertion of virtual tokens into the token stream. Ugly stuff. Anyhoo, I know this is waaay off topic. But I think any language designer including optional semicolons in their language desperately deserves a good swift punch in the teeth. --benji
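To make strategy 3 concrete, here's a hypothetical toy sketch (Python stand-in, not any real ECMAScript implementation) of the back-up-and-retry approach, using a deliberately tiny grammar of NAME '=' NUMBER ';' statements: parse, and on each failure convert the newline at the error position into a virtual semicolon, then reparse from scratch.

```python
import re

def tokenize(src):
    # Newlines are real tokens here, so the retry step can find them.
    return re.findall(r"[A-Za-z_]\w*|\d+|[=;]|\n", src)

def parse(tokens):
    """Toy grammar: program := (NAME '=' NUMBER ';')*, newlines skipped.
    Returns the index of the first offending token, or None on success."""
    i = 0
    expect = [str.isidentifier, "=", str.isdigit, ";"]
    while i < len(tokens):
        if tokens[i] == "\n":
            i += 1
            continue
        for want in expect:
            if i >= len(tokens):
                return i
            tok = tokens[i]
            ok = want(tok) if callable(want) else tok == want
            if not ok:
                return i
            i += 1
    return None

def parse_with_asi(src):
    """Strategy 3: reparse with a virtual semicolon after each failure.
    Returns how many reparse attempts were needed."""
    tokens = tokenize(src)
    retries = 0
    while (err := parse(tokens)) is not None:
        if err >= len(tokens) or tokens[err] != "\n":
            raise SyntaxError("a real syntax error, not a missing semicolon")
        tokens[err] = ";"  # splice in the virtual semicolon
        retries += 1
    return retries
```

Even in this toy, every missing semicolon costs a full reparse of the token stream, which is exactly the "arbitrarily many rewrite attempts" problem described above.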
Re: reddit.com: first Chapter of TDPL available for free
Andrei Alexandrescu wrote: Daniel Keep wrote: Andrei Alexandrescu wrote: Michel Fortin wrote: On 2009-08-09 11:10:48 -0400, Andrei Alexandrescu said: It's also arguable that all functions in std.string should take const(char)[]. Or, you know, const(T)[], since D supports encodings other than UTF-8, despite what std.string leads you to believe. Yah, I think they should all be parameterized so they can work with various character widths and even encodings. But shouldn't they work with *ranges* in general, a string being only a specific case? That's true as well! In my dreams, me and the famous actress... oh wait, wrong dream. In my dreams, I eliminate std.string and put all of its algorithms, properly generalized, in std.algorithm, to work on more than just arrays, and more than just characters. Andrei How do you define 'tolower' on non-characters? That and others would remain specific for characters. I do hope to be able to abstract functions such as e.g. strip(). Andrei How would you generalize the string functions into ordinary array functions while still taking into account the different character types? For example... dchar needle = 'f'; char[] haystack = "abcdefg"; auto index = haystack.indexOf(needle); That code is roughly equivalent to this code for generalized arrays, which seems reasonable enough... float needle = 2.0; double[] haystack = [ 1.0, 2.0, 3.0 ]; auto index = haystack.indexOf(needle); ...since "float" is implicitly castable to "double". But the string example has weird monkey-business going on under the covers, since dchar is wider than char, and therefore a single dchar element might consume multiple slots within the char[] array. Are there any analogous examples of that behavior with other types, where you'd search for a single element striding multiple indexes within an array of narrower values? --benji
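The striding behavior is easy to see in any language once you drop down to the encoded bytes. A hypothetical Python illustration of a one-"element" needle that spans two slots of the narrower array:

```python
# 'é' is a single code point, but it occupies two byte-slots in UTF-8,
# so searching for one logical element means matching a two-slot stride.
haystack = "abcé fg".encode("utf-8")
needle = "é".encode("utf-8")

assert len(needle) == 2          # one element, two slots
index = haystack.find(needle)    # a byte index, not a code-point index
assert index == 3
```

There's no analogue in the float/double case: widening 2.0f to 2.0 never changes how many slots the needle occupies in the haystack, which is exactly the asymmetry being pointed out.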
Re: Naming things in Phobos - std.algorithm and writefln
Daniel Keep wrote: That way, if someone writes logging functions one day that takes formatted strings in the same way, he can reuse the convention: log logLine logFormat logLineFormat instead of "log", "logln", "logf", and "logfln". If you create a hash function, you can reuse the pattern too: hash hashLine hashFormat hashLineFormat instead of "hash", "hashln", "hashf" and "hashfln". And it goes on. How is this an improvement? If we accept that people know what the "f" and "ln" suffixes mean (and given that they will be exposed to this in the course of writing a Hello, World! program), what benefit is gained from increasing the length and complexity of the identifiers? Saying you can re-use the convention is irrelevant because the exact same thing can be said of the shorter suffixes. The thing about one-letter abbreviations is that they mean different things in different contexts. An "f" might mean "formatted" in a "writefln" function, but it means "file" in an "ifstream" and "floating point" in the "fenv" module. In those cases (and in many more), there's no convention that can be reused. You just have to memorize stuff. Memorization was a perfectly acceptable solution back in the days of C, when standard libraries were small. But I think any modern standard library, with scores of modules and hundreds (or thousands) of functions, needs a better strategy. Coming from a Java background, I much prefer to give up terseness in favor of clarity. Though I recognize that verbosity has its own pitfalls, I think it's the lesser evil. --benji
Re: DIP6: Attributes
Frank Benoit wrote: Andrei Alexandrescu schrieb: Ary Borenszweig wrote: call!(foo)(5, "hello") with variadic args? Well some don't like to need to remember the order of arguments. Andrei Assigning the argument by name instead of order has two other benefits, I can think of... 1. on the call side, it is documented for what the given values are used. 2. it may be possible to let all parameters have default values and for example just give a value for the last parameter. This is not possible with just the parameter order. But these aren't issues with reflection. These are just the same function calling rules applied elsewhere in the language: 1) If you want to call a function: you must know its name. 2) If you want to pass parameters: you must know the correct order. I can't imagine a circumstance where someone uses reflection to call a function and knows how to create the correct set of arguments, but doesn't know what order to put them in. --benji
Re: property syntax strawman
Andrei Alexandrescu wrote: Jarrett Billingsley wrote: I think it's funny that for a week, Andrei has been arguing against throwing around new syntax to solve this problem, and that's exactly what you guys have come up with. Really, how much more complicated would this make the parser, compared to adding a new attribute? We couldn't find a good solution without adding new syntax, so this is now on the table. Adding syntax or keywords is the next thing to look at. I'd still be unsatisfied if: (a) there would be significant syntactic noise to defining a read-only property (b) we had to add a keyword Andrei The nice thing about a keyword (or an @attribute) is that it's greppable. Syntax, not so much. --b
Re: property syntax strawman
Steven Schveighoffer wrote: On Mon, 03 Aug 2009 11:18:26 -0400, Daniel Keep wrote: You can't trivially disambiguate between the getter and the setter with the current system, either. How is this a new issue? You can't *trivially* but you can do it (that's another issue that probably should be addressed in general for overloaded functions). Agreed. I don't think this is so much an issue with properties as it's an issue with overloads. A good solution that works really well for overloads will work well for properties too. Besides which, why can't you just add this: __traits(getter, aggregate.property) Problem solved. That works too. That's probably the most sensible solution I've seen. Has my vote. -Steve Me too. --b
Re: property syntax strawman
Andrei Alexandrescu wrote: Michiel Helvensteijn wrote: void empty.set(bool value) { ... } bool empty.get() { ... } and have the same meaning as my earlier example. Yah, I was thinking the same. This is my #1 fave so far. Andrei Agreed! I see the appeal of putting getter/setter pairs within a single pair of braces, since it groups them together as one logical unit. BUT... I think it's more valuable to define them as completely separate, since you sometimes want to define get/set properties with different access modifiers (protected setter & public getter == very nice). And then the brace-enclosed syntax looks kinda goofy to my eyes: property MyProperty int { public get; protected set; // WEIRD } --benji
Re: DIP6: Attributes
Steven Schveighoffer wrote: Annotations have more usages than just how to serialize. Some uses I've seen in C#: * hints to an IDE about a GUI component (what it does, what properties to allow setting via the visual builder) * hints to the debugger about when to skip debugging certain functions (to avoid stepping into mundane crap such as property getters). * hints to another program about which classes would be interesting when dynamically loading a library In Actionscript (and the Flex framework), one very handy use of annotations is to mark a public field as "bindable". class MyClass { [Bindable] public var MyField:int = 0; } In this example, whenever the "MyField" value is updated, a property-change event will be sent to all listeners. The XML-based Flex framework uses those annotations to create (unidirectional or bidirectional) bindings between variables. Imagine an MXML layout that creates a window with two controls, a horizontal numeric slider and an image, with the image's width and height bound to the slider's value. Whenever the user drags the slider control, the width and height of the image automatically update themselves. The reason this works is that the "value" field of the "HSlider" object is marked with the "Bindable" annotation. The compiler silently converts the field into a property getter/setter pair, and the setter sends out property-change events whenever called. (Good thing Actionscript properties exist, with a syntax identical to normal fields, or else the automatic data binding wouldn't work!) The cool thing that makes this work is that the compiler can perform code transformation based on the existence of various annotations. --benji
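The transformation described here is the classic observer pattern; a hedged sketch in Python (a stand-in, not Flex's actual generated code) of what a [Bindable] field roughly compiles into: the field becomes a getter/setter pair, and the setter broadcasts a property-change event to registered listeners.

```python
class BindableField:
    """A field turned into a property whose setter notifies listeners."""

    def __init__(self, value=0):
        self._value = value
        self._listeners = []

    def bind(self, listener):
        self._listeners.append(listener)

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, new_value):
        self._value = new_value
        for listener in self._listeners:
            listener(new_value)  # the property-change event

# A slider's value driving an image's width, as in the Flex example:
image = {"width": 0}
slider = BindableField()
slider.bind(lambda v: image.update(width=v))
slider.value = 150
assert image["width"] == 150
```

The point of the annotation is that the programmer writes only the plain field plus the [Bindable] marker; the compiler generates the equivalent of all the listener plumbing above.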
Re: DIP6: Attributes
Don wrote: Ary Borenszweig wrote: http://www.prowiki.org/wiki4d/wiki.cgi?LanguageDevel/DIPs/DIP6 This looks like a solution in search of a problem. What's the problem being solved? Keyword proliferation for a zillion tiny features? Annotations would help with that very nicely. --benji
Re: Omissible Parentheses...
Denis Koroskin wrote: Stdout("Hello, World!").newline.newline.newline; Ugh. This is one of the few things about Tango that really drives me nuts. I hate all the usage of the opCall overload and non-parenthesized function calls. At first glance, that code doesn't make any sense to me. My brain just doesn't grok what's going on. It takes me a split second to mentally parse it. --benji
Re: The XML module in Phobos
Michel Fortin wrote: On 2009-08-01 00:04:01 -0400, Benji Smith said: But XML documents aren't really lists. They're trees. Do ranges provide an abstraction for working with trees (other than the obvious flattening algorithms, like breadth-first or depth-first traversal)? Well, it depends at what level you look. An XML document you read is first a list of bytes, then a list of Unicode characters, then you convert those characters to a list of tokens -- the Tango pull-parser sees each tag and each attribute as a token, SAX defines each tag (including attributes) as a token and calls it an event -- and from that list of tokens you can construct a tree. The tree isn't a list though, and a range is a unidimensional list of something. You need another interface to work with the tree. But then, from the tree, create a list in one way or another (flattening, or performing an XPath query for instance) and then you can have a range representing the list of subtrees for the query if you want. That's pretty good since with a range you can lazily iterate over the results. Oh sure. I agree that a range-based way of iterating over tokens is cool. And a range-based API for walking through the results of an XPath query would be great. But the real meat and potatoes of an XML API would need to be something more DOM-like, with a tree structure. The only reason I chimed in, in the first place, was Andrei's post saying that a replacement XML parser "ideally outputs ranges". I don't think that's right. Ideally, an XML parser outputs a tree structure. Though a range-based mechanism for traversing that tree would be nice too. --benji
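For the flattening case Michel mentions, a lazy depth-first "range" over a tree is just a short generator. A hypothetical Python sketch, using the standard library's ElementTree as the tree:

```python
import xml.etree.ElementTree as ET

def depth_first(node):
    """Flatten a tree into a lazy, one-dimensional range of nodes."""
    yield node
    for child in node:  # ElementTree nodes iterate over their children
        yield from depth_first(child)

doc = ET.fromstring("<a><b><c/></b><d/></a>")
assert [n.tag for n in depth_first(doc)] == ["a", "b", "c", "d"]
```

This illustrates both halves of the argument: the flattened traversal fits the range abstraction naturally, but the tree itself still needs its own interface; the generator only hands you one particular linearization of it.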
Re: new DIP5: Properties 2
Bill Baxter wrote: On Fri, Jul 31, 2009 at 10:09 PM, Andrei Alexandrescu wrote: Benji Smith wrote: So the clusterfuck of unenforceable and useless conventions is already here. Here's my suggestion: if you think putting parentheses on a no-arg function is stupid, then it should be a syntax error for them to exist. That wouldn't be my first choice, but it'd be a thousand times better than the situation with optional parens. --benji I agree that it's not good to have two ways of doing the same thing. Now think of it for a second: a full-blown language feature has been proposed to not fix that, but reify it. D already has a *truckload* of such features. Aliases, typedefs, renamed imports, and overloaded operators all exist solely so that a programmer can pretend that one thing is another thing, so that an API designer can more precisely express the *intent* of the code, and with semantics that are enforced by the compiler. Compared with those other features, I don't see what's so different about the properties proposals. --benji
Re: new DIP5: Properties 2
Andrei Alexandrescu wrote: Thanks for these great points. As an additional example, most ranges define the method bool empty() { ... } whereas infinite ranges define the enum enum bool empty = false; It follows that if a range user wants to be compatible with finite and infinite ranges, they must always omit the "()". It would be nice if the range's definition could enforce that. Andrei Huh. How does this reconcile with your previous posts, where you said it'd probably be a bad idea for the API designer to mandate the function calling style of the API consumer? Is this the same issue, and you've changed your mind? Or do you see this as a different issue? --benji
Re: Omissible Parentheses...
Andrei Alexandrescu wrote: Denis Koroskin wrote: On Sat, 01 Aug 2009 21:04:43 +0400, Chad J wrote: Omissible Parentheses Could someone remind me why we don't remove these? So far I have - They save typing. - Removing them breaks backwards compatibility. - They allow some features of properties, but with a list of limitations and gotchas. This is not intended to be a deep discussion. I'm writing a piece on properties, so I'm gathering information. Andrei likes them. http://igsoft.net/dpolls/poll/results.php?pollid=1 http://igsoft.net/dpolls/poll/results.php?pollid=2 Andrei If I'm not mistaken, each of those polls shows a two-to-one preference for getting rid of omissible parentheses and introducing a dedicated property syntax of some kind. --benji
Re: new DIP5: Properties 2
Andrei Alexandrescu wrote: Steven Schveighoffer wrote: So to sum up, with this feature lack of parentheses would imply no action, but would not be enforced. However, it would be considered incorrect logic if the rule was not followed, similar to naming your functions something other than what they do. I am leery of such a feature. It essentially introduces a way to define conventions that are in no way useful to, or checked by, language rules. In my experience this has been a bad idea more often than not. Like it or not, that's exactly the situation we have now, with the (sometimes)-optional parentheses. Some people are using a convention of never using the optional parens. Other people use the parens only when a function performs an action, and avoid them otherwise. And some other people (like me) always use the parens. So the clusterfuck of unenforceable and useless conventions is already here. Here's my suggestion: if you think putting parentheses on a no-arg function is stupid, then it should be a syntax error for them to exist. That wouldn't be my first choice, but it'd be a thousand times better than the situation with optional parens. --benji
Re: The XML module in Phobos
Michel Fortin wrote: > Benji Smith wrote: Usually, I use something like XPath to extract information from an XML doc. Something like this: auto doc = parser.parse(xml); auto nodes = doc.select("/root//whatever[...@id]"); I can see how you might do depth-first or breadth-first traversal of the DOM tree, or inorder traversal of the SAX events, with a range. But that's not how most people use XML. Are there other range tricks up your sleeve that would support a DOM or XPath kind of model? A range is mostly a list of things. In the example above, doc.select could return a range to lazily evaluate the query instead of computing the whole query and returning all the elements. This way, if you only care about the first result you just take the first and don't have to compute them all. Ranges can be used everywhere there are lists, and are especially useful for lazy lists that compute things as you go. I made an XML tokenizer (similar to Tango's pull parser) with a range API. Basically, you iterate over various kinds of tokens made available through an Algebraic, and as you advance it parses the document to get you the next token. (It'd be more useful if you could switch on various kinds of tokens with an Algebraic -- right now you need to use "if (token.peek!OpenElementToken)" -- but that's a problem with Algebraic that should get fixed I believe, or else I'll have to use something else.) But XML documents aren't really lists. They're trees. Do ranges provide an abstraction for working with trees (other than the obvious flattening algorithms, like breadth-first or depth-first traversal)? --benji
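Michel's point about lazy query evaluation is easy to demonstrate in any language with iterators. In Python's ElementTree, for instance, iterfind returns an iterator that walks the tree on demand, so taking the first match doesn't compute the rest (a stand-in for the hypothetical doc.select):

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("<root><item id='1'/><item id='2'/><other/></root>")
matches = doc.iterfind("item")   # lazy: nothing has been searched yet
first = next(matches)            # walks only far enough for one match
assert first.get("id") == "1"
```

This is exactly the "range over query results" idea: the consumer pulls results one at a time, and the query engine does no more work than the consumer asks for.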
Re: new DIP5: Properties 2
Nick Sabalausky wrote: "Andrei Alexandrescu" wrote in message news:h4lsuo$au...@digitalmars.com... For me, I get a breath of fresh air whenever I get to not write "()". I can't figure how some are missing it. Every time I call a parameterless function in D, I curse under my breath at how incredibly sloppy it is. Great, just what I need: Yet another thing that forces me to make the completely unnecessary choice between using something inconsistently or making up and sticking to a completely arbitrary convention that can't be enforced. Sloppy, sloppy, sloppy. Especially considering it's all for the sake of a "feature" that doesn't accomplish a damn thing, doesn't solve any problem, not even a trivial one, doesn't do anything but clutter the language. My thoughts exactly. --benji
Re: properties
Andrei Alexandrescu wrote: Steven Schveighoffer wrote: On Tue, 28 Jul 2009 16:08:58 -0400, Andrei Alexandrescu wrote: Steven Schveighoffer wrote: However, when I see: x.empty; I can't tell what is implied here. You can. In either C# or D language it could execute arbitrary code that you better know what it's supposed to do. D simply doesn't make it "bad style" as C# stupidly does. still not getting it, are you... Just forget it, I think this is a lost cause, I keep making the same points over and over again, and you keep not reading them. I do read them and understand them. I mean, it's not rocket surgery. At the end of the day you say "x = a.b;" looks more like sheer access because that's what happens for fields already. Then you say "a.b()" in any context looks more like an action because it's clear that there's a function call involved. But your arguments are not convincing to me, and in turn I explained why. What would you do if you were me? Andrei I totally agree with Steven's arguments (and have enjoyed reading the discussion). I think the reason he says you're "not getting it" is because your examples tend to be "a.b" whereas his examples tend to be "a.empty". In your examples, you've stripped away the distinct function/field names and presented the argument from the compiler's perspective: in terms of arbitrary symbols that might either perform a pointer dereference or a function invocation. Steve's arguments, on the other hand, are all from the perspective of the programmer. The parentheses following the identifier act as *punctuation* that clarify intent. Good? Good. --benji
Re: The XML module in Phobos
Michael Rynn wrote: I did look at the code for the xml module, and posted a suggested bug fix to the empty elements problem. I do not have access rights to update the source repository, and at the time was too busy for this. Andrei Alexandrescu wrote: It would be great if you could contribute to Phobos. Two things I hope from any replacement (a) works with ranges and ideally outputs ranges, (b) uses alias functions instead of delegates if necessary. Interesting. Most XML parsers either produce a "Document" object, or they just execute SAX callbacks. If an XML parser returned a range object, how would you use it? Usually, I use something like XPath to extract information from an XML doc. Something like this: auto doc = parser.parse(xml); auto nodes = doc.select("/root//whatever[...@id]"); I can see how you might do depth-first or breadth-first traversal of the DOM tree, or inorder traversal of the SAX events, with a range. But that's not how most people use XML. Are there other range tricks up your sleeve that would support a DOM or XPath kind of model? --benji
Re: Properties: problems
John C wrote: Chad J wrote: John C wrote: Here's a couple of annoying problems I encounter quite often with D's properties. Would having some form of property syntax fix them? 1) Array extensions: class Person { string name_; string name() { return name_; } } auto person = getPerson(); auto firstAndLast = person.name.split(' '); The above line currently requires parentheses after 'name' to compile. This one is weird. After defining getPerson() I was able to rewrite the last line into this and make it compile: auto firstAndLast = split(person.name," "); Yes, that's D's special array syntax, where free functions can be called as if they were "methods" of an array. Yeah, this is one of those nasty cases where several different features (optional parentheses on functions & automatic extension method syntax on arrays) work ok in isolation, but where they have weird wonky behavior when combined. I've seen this one before. --benji
Re: Properties: a.b.c = 3
Jarrett Billingsley wrote: The issue is that the compiler accepts no-effect modifications of temporary values as valid statements. There is no setter being invoked here, nor should there be. Or should there? In the face of a value type, should the compiler rewrite this code as auto t = a.b(); t.c = 3; a.b = t; ? The last line of the rewrite is unnecessary if a.b() returns a reference type or a byref struct. But is this what people would expect to happen? I think the compiler should only rewrite the code (as above) if a.b() returns a struct, by value. The compiler can figure that out easily enough. Depending on the return types of all the different properties in a.b.c.d.e.f = 3, there might be a few ref types and a few value types returned. Each of those subexpressions would be rewritten with the appropriate semantics. --benji
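The value-type pitfall and the proposed rewrite can be simulated in any language by making the getter return a copy. A hypothetical Python sketch (copy.copy standing in for D's by-value struct return):

```python
import copy

class Rect:  # stands in for a value-type struct
    def __init__(self, w=0):
        self.w = w

class Widget:
    def __init__(self):
        self._rect = Rect()

    def get_rect(self):              # by-value getter: returns a copy
        return copy.copy(self._rect)

    def set_rect(self, r):           # setter: stores a copy
        self._rect = copy.copy(r)

widget = Widget()

# Naive form of  widget.rect.w = 200 : it mutates a throwaway copy,
# so the widget's stored rect is untouched.
widget.get_rect().w = 200
assert widget.get_rect().w == 0

# The proposed rewrite:  t = a.b; t.c = 3; a.b = t
t = widget.get_rect()
t.w = 200
widget.set_rect(t)
assert widget.get_rect().w == 200
```

If get_rect returned a reference instead of a copy, the naive form would already work and the write-back in the last step would be unnecessary, which is why the rewrite only needs to fire for by-value returns.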
Re: Properties: a.b.c = 3
Chad J wrote: Steven Schveighoffer wrote: struct Rectangle { float x,y,w,h; } class Widget { Rectangle _rect; Rectangle rect() { return _rect; } Rectangle rect(Rectangle r) { return _rect = r; } // etc } void main() { auto widget = new Widget(); // DOES WORK: auto tmp = widget.rect; tmp.w = 200; tmp.h = 100; widget.rect = tmp; // DOES NOT WORK: // widget.rect.w = 200; // widget.rect.h = 100; } Wouldn't the compiler write: //widget.rect.w = 200 translates to auto tmp1 = widget.rect; tmp1.w = 200; widget.rect = tmp1; //widget.rect.h = 100 translates to auto tmp2 = widget.rect; tmp2.h = 100; widget.rect = tmp2; ??? Unless you want some serious optimization requirements... -Steve It would. The optimization you speak of is reference caching. I often do it by hand in deeply nested loops where it actually means a damn. It's also an optimization I think compilers should do, because it is useful in many more cases than just this. Using some manner of property syntax would not preclude the programmer from writing the optimized version of the code by hand. And, in fact, this exact kind of optimization is made very simple if the compiler uses a static single assignment (SSA) form for its internal code representation. The LLVM suite already does it. --benji
Re: Properties: a.b.c = 3
Nick Sabalausky wrote: "Zhenyu Zhou" wrote in message news:h4rfif$2os...@digitalmars.com... e.g. Rectangle rect(Rectangle r) { _rect = r; redraw(); return _rect; } If you allow widget.rect.w = 200; widget.rect.h = 100; you will have to write much more code to handle the painting correctly. and we don't want to call redraw twice here I've dealt with that sort of thing in C# and it's a trivial issue. When you write code such as the above, it's very clear that you're changing the rect twice. If that's a problem, you just do this: widget.rect = Rect(200, 100); Easy. It's kind of a moot point anyhow, because most respectable graphics frameworks will defer any rendering until all properties have been set. Something like this: class Rect { private int _w; private int _h; private boolean _dirty; property set w(int value) { _w = value; _dirty = true; } property set h(int value) { _h = value; _dirty = true; } void draw() { if (_dirty) { // rendering code _dirty = false; } } } Rendering code is *never* invoked from within a property-setter, and property values are never changed during rendering code. (Also, there's usually a separate "measurement" phase, following the manual property-setting phase, within which properties can be changed to suit the positional constraints but where no rendering occurs.) Anyhow, I think those kinds of considerations are mostly orthogonal to a discussion of properties, in the general sense, except insofar as the existence of a property syntax makes it more convenient to implement things like dirty-flag marking, property-change listeners, and the like. --benji
Re: Properties: a.b.c = 3
Chad J wrote: Chad J wrote: Bill Baxter wrote: On Wed, Jul 29, 2009 at 1:14 PM, grauzone wrote: Chad J wrote: Thinking about it a little more, the extra temporaries could run you out of registers. That still sounds like a negligible cost in most code. Temporaries can be on the stack. That's not a problem. How is that not a performance issue? The stack is in main memory. --bb This is where my knowledge starts to run a bit thin. So correct me if I'm wrong, but isn't something like the stack (or at least the top/bottom/end in use) extremely likely to be in the nearest cache (L1)? If that's the case, then this kind of dereference is going to be of the cheaper variety. Also, really deep dot chains are unlikely to happen. I just feel like this won't create many more memory accesses than there were already. Especially for people with 64 bit OSes on x86_64 that are not register starved like the 32 bit x86. On x86 you are hitting the stack all the time anyways, and the extra access or two will go unnoticed. Especially especially because, if you prevent the a.b.c = x syntax, the only thing that'll happen is you'll cause people to write all that code themselves. The same number of assignments will happen anyhow, but the user will have to write them all manually. I'm all for having the compiler automate the boilerplate stuff. Also, note that the double-assignment case only happens when assigning to value types. Assigning to reference type properties will be unaffected. --benji
Re: new DIP5: Properties 2
Andrei Alexandrescu wrote: Benji Smith wrote: 3) The existence of "magical" identifiers complicates the language design. Because the rules that apply to those magical identifiers are different from the rules that apply to non-magical identifiers. Well I agree with some of your points but this is factually incorrect. There's nothing special about opXxx identifiers. The compiler simply rewrites certain operations into regular calls to those operators. That's all. I happen to find that very elegant, and in fact I'd want D to rely more often on simple rewrites instead of sophisticated special casing. Andrei I should have been more clear. I understand the rewriting part of the proposal. What I was referring to was the fact that an opGet_x identifier would shadow the declaration of a variable named "x", making it impossible, within the type itself, to reference the variable directly. So, in this case... class MyClass { private int x; public int opGet_x() { return x; } } ...either the compiler would issue an error (my preference) or the private field would take precedence (within the class) in any name resolution logic. From outside the class, there would be no problem. Also related, is this case: class MyClass { public int x; public int opGet_x(); } I assume the compiler would have to throw an error. Eventually, people would learn to give their fields different names than their properties (probably with an underscore prefix or something). Anyhow, in both cases, I'd consider these to be changes to the language's identifier semantics. They're not *huge* changes, but the introduction of those magical rewriting rules is still something a programmer would have to be aware of. And those are the reasons I'd rather shy away from magical name-rewriting mechanisms. (NOTE: I have no problem with the implementation of the other operator overloading names. They work exactly as expected.) --benji
Re: new DIP5: Properties 2
On Mon, Jul 27, 2009 at 4:34 PM, Chad J wrote: This seems to me like it adds more syntactic clutter than adding a keyword would: PropertyDecl: PropertyGetter PropertySetter PropertyGetter: Type 'opGet_' Identifier '(' ')' PropertySetter: Type 'opSet_' Identifier '(' Type ')' Jarrett Billingsley wrote: Nono, they're just functions with "magical" names. I agree with Chad. The opGet_X syntax is terrible, with both syntactic and semantic clutter. To wit: 1) This convention has four syntactic parts: "op", "Get|Set", "_", and an identifier. Adding a new keyword (like "property") would only add one syntactic element to the declaration. 2) A property is not an operator. So the "op" prefix is lying to you. 3) The existence of "magical" identifiers complicates the language design. Because the rules that apply to those magical identifiers are different from the rules that apply to non-magical identifiers. There's nothing wrong with the mechanics of the proposal. I especially like how it allows the getter/setter to have different protection attributes, and that it allows each function to be overridden separately. You could even implement the getter in a read-only superclass and implement the setter in a read-write subclass. Nice! But I think the same thing can be more elegantly written using the "property" keyword: private int _x; public property int X() { return _x; } protected property X(int value) { _x = value; } The only disadvantage I see there is the introduction of a keyword. And that's definitely a disadvantage. But, compared to the "op" syntax, I think it's the lesser of two evils. --benji
Re: Reddit: why aren't people using D?
Andrei Alexandrescu wrote: Rainer Deyke wrote: Nick Sabalausky wrote: I can't be nice about this: Any programmer who has *any* aggravation learning any even remotely sane property syntax is an idiot, period. They'd have to be incompetent to not be able to look at an example like this: // Fine, I'll throw DRY away: int _width; int width { get { return _width; } set(v) { _width = v; } } And immediately know exactly how the property syntax works. I don't know exactly how this is supposed to work. The basic idea is obvious, but: - How does it interact with inheritance? Can I override properties? Can I partially override properties (setter but not getter)? - Can I write a setter that accepts another type? - Can I write a templated setter that accepts *all* types? If so, how? - Can I create a delegate from a setter/getter? If so, how? - I assume that getters/setters can have individual access specifiers (i.e. private/protected/public), but is that really the case? Dedicated property syntax isn't hard to learn, but it's not as obvious as you make it out to be. Note that none of these issues exist with opGet_foo, which follows the same rules as all functions. +1 Andrei Also agree. The C# syntax is a little too complex for my taste, and it makes some things ugly or impossible (like, what if I want a public getter but a protected setter?) I like the mechanics of the opGet_Xxx proposal, but aesthetically, it just makes my eyes bleed (as do the other "op" functions, like opApply, that don't technically overload any "op"erators). For my money, the best solution is a simple "property" keyword as a function modifier. Only functions with the "property" modifier would be allowed to pose as fields (getters called without parens, setters called using assignment syntax). But, in all other respects, they should act just like functions. --benji
Re: Developing a plan for D2.0: Getting everything on the table
Andrei Alexandrescu wrote: Benji Smith wrote: Maybe if Andrei put together a list of missing Phobos functionality, we could get people from the community to flesh out the libs. I think we'd need at a minimum: That would be great. In general, it would be awesome to gather more contributions from the community. There's a thirst to contribute and we'll be glad to involve this group into some serious design e.g. for concurrency support, as well as accept code for functionality that belongs to the standard library. In the bulleted list above there are many mini-projects that are confined enough to be done by one willing individual in a relatively short time. Are there contributor guidelines somewhere? For example, should the author of a container library prefer classes or structs? Should other (non-container) modules accept container classes as arguments? Or only container interfaces (if there are any such things) or just ranges? Is it appropriate to use an empty struct purely as a namespace for the introduction of free functions? Or should free functions be placed at the module level? Is it appropriate to define multiple classes, structs, templates, etc within a single module? What considerations should inform the decision regarding the placement of module boundaries? What constitutes appropriate/inappropriate usage of opCall? Anyhoo... Point being, Phobos_1 was a hodgepodge of different conventions and styles. Tango_1 was considerably better, in terms of stylistic uniformity. But it used a very different set of idioms than Phobos_1 (lots of predicate functions, "sink" delegates, etc). Probably any author contributing code to Phobos_2 should spend a little time getting up to speed with the preferred idioms before writing code. I suspect that my humble little JSON parser uses styles and idioms that would clash with the majority of Phobos_2 (since my programming pedigree comes from Java, C#, JavaScript, and Perl much more so than C or C++). --benji
Re: Developing a plan for D2.0: Getting everything on the table
Jason House wrote: Other, less technical items: • A clear and "finalized" spec. If it isn't implemented, it should be yanked (or clearly marked as pending) • A plan for library support. Not just Tango, but also Phobos. D1 Phobos could not evolve. In D1, I enthusiastically used Tango. I haven't used D2 yet (because all my code is heavily tied to the Tango libs), but I suspect that when D2 is finalized, I'll port everything over to Phobos. I've read all the Phobos2 development discussions here (most notably the range discussions), but what about the feature disparities between the two libraries? What types of functionality are currently present in Tango but absent in Phobos? Maybe if Andrei put together a list of missing Phobos functionality, we could get people from the community to flesh out the libs. For example, I have a JSON parser implementation that I'd be happy to contribute. --benji
Re: Dynamic D Library
Nick Sabalausky wrote: "BCS" wrote in message news:78ccfa2d4382d8cbd4ffb8875...@news.digitalmars.com... Reply to teo, Well, to some extent this will do the job, but at some point you would need to extract some stuff and put it in libraries, so that it can be reused by other applications. Think about an application which consists of several executables which work together and should share common stuff. Wouldn't you extract it into a library? Yes, as a static .lib type library that is statically linked in as part of the .exe. Exactly, and it doesn't even have to be a compiled .lib, it could just be a source-library. I do that all the time. I really don't see any reason to think that modularity and code-reuse would require linking to be dynamic. At least certainly not in the general case. I agree that source-level modularity, and static linking are preferable most of the time (especially given D's dependency on templates, which don't work so well in compiled libraries). But there are plenty of legitimate situations that mandate dynamic linking, and I think the standard library needs a better solution than what it currently has. --benji
Re: Dynamic D Library
Daniel Keep wrote: If we have, for example, a C app that is using D code as plugins, each plugin will ask the system for "dmdrt.dll" using its minimal embedded DDL stub. But since they're system calls, we should only get one copy. I'm not sure exactly how the system will share that library, though; whether it's per-process or system-wide. In any case, the DDL stub should be able to pull in the full DDL from dmdrt.dll and then use that to link everything together. The nice bonus of this is that DDL just becomes an implementation detail AND we can say "yes, we can do DLLs in D!" even if we're only using them to contain a DDL payload. The one downside I can think of is that if you DID want to distribute a D plugin for a C/C++ program, you'd also need to ship dmdrt.dll alongside it. Although, in that case, it probably wouldn't hurt anything (aside from memory usage) to simply statically link the runtime and standard library in; if the host app is C/C++, then the plugins probably won't be able to stomp all over each other. My primary use of D right now is to build DLLs for C++ applications, so I'd be very annoyed if the standard Windows DLL functionality became more convoluted. For custom loading into D applications, why even bother using a DLL as a container? Why not design a file format (maybe even DDL as it currently exists) and use that as the primary dynamic loading & linking mechanism, on all platforms? --benji
Re: Dynamic D Library
Jarrett Billingsley wrote: On Thu, Jul 16, 2009 at 4:44 PM, teo wrote: For two, there is *no problem* with creating D libraries on any platform other than Windows, and it is entirely through Windows' fault that it has the problems it does with DLLs. Well, let us assume that you can create dynamic libraries in D and you need to include in each of them Phobos (later maybe just the D Runtime). What is the benefit of that? Can you imagine all your nice dynamic libraries (DLLs, SOs, etc.) written in D and all of them including a huge “payload”? Wouldn't it be better just a simple library only containing the stuff you need? I don't think you're getting it. ON WINDOWS, DLLs are not allowed to have unresolved externals. So if you create a DLL in D, yes, Phobos will be linked in. THERE IS NOTHING THAT CAN BE DONE ABOUT THAT. It's a limitation on the way DLLs work. ON EVERY OTHER OPERATING SYSTEM (Linux, Unix, OSX, *whatever*), shared libraries CAN have unresolved externals, so Phobos *does not* have to be included in the shared libraries. Shared libraries ALREADY work the way you expect them to on every OS besides Windows. The ONLY way to solve the problem with DLLs on Windows is to not use DLLs. Java solves it by not using any platform-dependent libraries, instead using its own .class files. This is *exactly* what DDL does. So, I'm not sure what you see as the problem here. DDL works fine on Windows. Use it. You learn something new every day. That's pretty cool. Incidentally, this is exactly the kind of stuff that I'd love to see built right into DRuntime or Phobos. I don't have a use for it right now (cuz my project is simple enough not to need dynamic loading), but in the future, I'd be reluctant to use DDL because: 1) Dynamic loading is something that, to me, seems completely fundamental to the runtime system, and I'd be hesitant to trust a third-party library to keep up-to-date with the current compiler & standard library. 2) DDL isn't even really a third-party library.
It's more like a fourth-party, since (I assume) it really requires the h3r3tic patch to work correctly. Building this kind of functionality into the standard library would make those issues irrelevant. These kinds of issues are the ones that excite me the most and are the things I'd like to see D pay the most attention to. From my perspective, features of the runtime and standard library are often much more compelling than new language features. --benji
Re: Number literals (Was: Re: Case Range Statement ..)
Andrei Alexandrescu wrote: Benji Smith wrote: Andrei Alexandrescu wrote: Anyhow... it would be a bummer if the negative atmosphere as of late in the group would cause people like you to just lose interest. I can't understand what's going on. I think it would help if you weren't so condescending to people all the time. People don't like that much. I understand. My perception is that negativity predates my being condescending, which stems from exasperation. For every annoying message of mine there are dozens of patient messages making a similar point. But you're right, if a point is made the wrong way its correctness is not that relevant anymore. I empathize. I enjoy issuing a sly and well-worded skewer just as much as the next guy. But, when those kinds of retorts are perceived as coming from the top down, they create resentment. Like it or not, you're "the man". :) --benji
Re: Number literals (Was: Re: Case Range Statement ..)
Andrei Alexandrescu wrote: Anyhow... it would be a bummer if the negative atmosphere as of late in the group would cause people like you to just lose interest. I can't understand what's going on. I think it would help if you weren't so condescending to people all the time. People don't like that much.
Re: optlink on multicore machines
Derek Parnell wrote: On Tue, 30 Jun 2009 20:54:55 +0200, dennis luehring wrote: Walter Bright schrieb: BCS wrote: It IS running fine on 3 or 4 multicore machines around here. That's a mystery, then. that's the wonderful world of hard to catch and reproduce multithreading problems - hope D will help here in the future Ok then ... so optlink is going to be rewritten in D - excellent! And good luck to the brave developer too. Just out of curiosity... Why is a linker so hard to write? A few years ago, I developed a small domain specific language and implemented its compiler, outputting bytecode for a very specialized (and limited purpose) virtual machine. In my case, I decided it was easier to give good error messages if the compiler & linker were a single entity. I've always been annoyed by the discrepancy between compilers and linkers (mostly because build tools have their own special languages, pointlessly different from the development language). So my compiler combined compilation and linking into a single step. Every time the compiler encountered an "import" statement, it checked to see whether a symbol table existed for the imported module and, if not, it added the module to the parse queue. After processing a new module, it would add the resultant code into a namespace-aware symbol table for the given module. Once the parse queue was empty, I checked for unresolved symbols, cyclic dependency errors, etc. If there were no other referential errors (and if all the other semantic checks passed), then I'd start the code-generation process at the main entry point. The whole program was represented as a DAG, and writing bytecode was as simple as traversing that graph. Since the "linking" behavior was built right into the compiler, it was a piece of cake. Anyhow... Whenever someone on the NG complains about optlink, the inevitable conclusion is that it would be a huge undertaking to produce a new or improved linker. Why?
Seems to me that a new linker implementation would be relatively straightforward. There are really only three steps: 1) Parse object files. 2) Create DAG structures using references in those object files. 3) Walk the graph, copying the code (with rewritten addresses) into the final executable. Is it really more complex than that? What am I missing? (Caveat: I don't know much about Windows PE, or any of the many other object file formats. Still, though... it doesn't seem like it could be THAT difficult. The compiler has already done most of the tricky stuff.) --benji
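For what it's worth, those three steps are easy to sketch. Everything below is hypothetical: `Symbol` and `ObjectFile` are stand-ins for a real object-file parser's output, not any actual format like PE or OMF.

```d
// Hypothetical in-memory form of a parsed object file (step 1's output).
struct Symbol { string name; string[] references; }
struct ObjectFile { string name; Symbol[] symbols; }

// Step 2: build a global symbol table and verify that every reference
// resolves somewhere -- these references are the edges of the DAG.
Symbol[string] buildSymbolTable(ObjectFile[] objects)
{
    Symbol[string] table;
    foreach (obj; objects)
        foreach (sym; obj.symbols)
        {
            if (sym.name in table)
                throw new Exception("duplicate symbol: " ~ sym.name);
            table[sym.name] = sym;
        }
    foreach (name, sym; table)
        foreach (refName; sym.references)
            if (refName !in table)
                throw new Exception("unresolved symbol: " ~ refName);
    return table;
}
```

Step 3 -- walking from the entry point and copying code with rewritten addresses -- is where real formats bite: relocations, section alignment, and library search order are the parts a toy sketch like this glosses over.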
Re: std.string and std.algorithm: what to do?
Andrei Alexandrescu wrote: Yah, I defined enum CaseSensitive { no, yes } Minor nitpick: there are lots of different ways to canonicalize text before performing a comparison. Ascii case conversions are just one way. Instead of an enum with a yes/no value, what about future-proofing it with something more along the lines of... enum CaseSensitivity { None, Ascii, UnicodeChar, UnicodeSurrogatePair } ...or something like that. The yes/no enum will outlive its usefulness before long. --benji
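A minimal sketch of what that future-proofed enum might look like, with a hypothetical comparison function dispatching on the mode (the Unicode cases are deliberately left unimplemented, and `toLowerAscii` is an illustrative helper, not a Phobos function):

```d
enum CaseSensitivity { None, Ascii, UnicodeChar, UnicodeSurrogatePair }

// Hypothetical helper: ASCII-only lowercasing.
string toLowerAscii(string s)
{
    char[] r = s.dup;
    foreach (ref c; r)
        if (c >= 'A' && c <= 'Z') c += 'a' - 'A';
    return cast(string) r;
}

// Hypothetical comparison dispatching on the canonicalization mode.
bool equalUnder(string a, string b,
                CaseSensitivity cs = CaseSensitivity.None)
{
    final switch (cs)
    {
        case CaseSensitivity.None:
            return a == b;
        case CaseSensitivity.Ascii:
            return toLowerAscii(a) == toLowerAscii(b);
        case CaseSensitivity.UnicodeChar:
        case CaseSensitivity.UnicodeSurrogatePair:
            assert(0, "full Unicode case folding not sketched here");
    }
}
```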
Re: I wish I could use D for everything
Brad Roberts wrote: I'm going to play devil's advocate too... struct ctor/dtor's are simplifiers. They remove a hard to explain difference and aren't even a little bit hard to understand. Ideally, that would be true. But there are some wonky rules around struct ctors, static opCall, and struct literals that I can never quite remember. --benji
Re: RFC: naming for FrontTransversal and Transversal ranges
Andrei Alexandrescu wrote: Also something that wasn't discussed that much is the connection of whatever design we devise, with the GC. I am mightily excited by the possibility to operate without GC or with tight reference counting, and I thought many around here would share that excitement. If we go for non-gc ways of memory management, that will probably affect container design. Just out of curiosity, why do you like reference counting more than mark/sweep for containers? --benji
Re: Splitter quiz / survey
Brad Roberts wrote: Actually, perl is a risky language to take _syntax_ from, but _semantics_ aren't nearly as dangerous. Obviously there's some semantics that are horrible (see it's OOP mechanisms), but parts of the rest are quite good. I gripe and groan every time I find myself having to touch perl code, but it's rarely due to non-syntactical issues. This is one of my favorite rants, anywhere on the world wide internets: http://steve.yegge.googlepages.com/ancient-languages-perl If nothing else, at least read the "Snake Eyes" section. It's not the syntax that makes perl so bad. Sure, it takes some getting used to. But when the rubber hits the road, it's just syntax, and anyone can learn it. The semantics, though, are a complete and utter trainwreck. Even after two years of working at a company where perl was the primary development language, I still never felt comfortable unless I had the camel book within arm's reach. But amid that insanity there are a few gems. Most notably: regular expressions. And string splitting is largely based on the regex engine. So it's not too shocking to me that D might be influenced by it. On the other hand, I agree with most of the other people in this thread, that option (4) was the best of the possible splitting behaviors. --benji
Re: Keyword 'dynamic' of C#4
Unknown W. Brackets wrote: I wonder what the overhead times were. He should've timed them both and listed them separately. For example, is DynamicMethod a complete win, or is the dynamic keyword cheaper as far as base cost? Actually, he does. It's at the bottom of the "second look" post: Compile Time Bound: 6 ms. Dynamically Bound with dynamic keyword: 45 ms. Dynamically Bound with MethodInfo.Invoke: 10943 ms. Dynamically Bound with DynamicMethod: 8 ms. --benji
Re: Fully dynamic d by opDotExp overloading
Danny Wilson wrote: Now let's go from that obvious observation to opDotExp() You know the class uses opDotExp() because it said so in the docs. Examples that could really benefit from this are: - XMLRPC and other kinds of remoting - Quick access to: XML / JSON / Yaml / Config files / DB access - Calling DLLs without bindings - Lots more All these would mention it in their docs, guaranteed. Because they use opDotExp it's implicitly mentioned. I don't think anyone would tell a documentation generator to list all public methods except opDotExp .. that would be just braindead. And you could generate the docs yourself if you have the code.. Incidentally, one ugly problem with using opDotExp is that the underlying invocation might allow characters that aren't legal in D identifiers. For example, let's say I have a dynamic object wrapping a JavaScript library, and I want to access a JQuery object. JavaScript allows the '$' character to appear in identifiers, and the JQuery people cleverly used that name for one of their core objects (which, I think, acts as an ID registry, or something like that). So, this is a perfectly legal JQuery expression: var a = $("hello"); Using the opDotExp syntax, I'd ideally prefer to call it like this: auto a = js.$("hello"); But the compiler will reject that syntax, since '$' isn't a legal D identifier. Of course, in cases like that, we'll just use some sort of dynamic invocation method: auto a = js.invoke("$", "hello"); Which makes me think this whole discussion is kind of a waste of time, since every single implementation of opDotExp is going to end up delegating to a string-based dispatcher method anyhow. THAT'S the really interesting discussion. In fact, I think I'll start a new topic... --benji
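To make that point concrete, here is roughly what such a wrapper would look like under the opDotExp proposal. Note the proposal was never adopted in this exact form (D later shipped the same idea as opDispatch), and `invoke` here is a hypothetical dispatcher, not a real API:

```d
import std.variant;

class JsObject
{
    // The real work: string-based dynamic dispatch into the wrapped
    // JavaScript object. Also callable directly for names like "$"
    // that aren't legal D identifiers.
    Variant invoke(string name, Variant[] args...)
    {
        // ... look up `name` in the underlying JS object and call it ...
        return Variant.init;
    }

    // The sugar: under the proposal, js.foo(x) would be rewritten by
    // the compiler into js.opDotExp!("foo")(x), which just boxes the
    // arguments and forwards to the string-based dispatcher.
    Variant opDotExp(string name, Args...)(Args args)
    {
        Variant[] boxed;
        foreach (arg; args)
            boxed ~= Variant(arg);
        return invoke(name, boxed);
    }
}
```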
Re: The new, new phobos sneak preview
Andrei Alexandrescu wrote: Daniel Keep wrote: Actually, I've been thinking and I realised that in 95% of cases, you can assume a range is resumable if it has no references. Well I'm not so sure. How about a range around an integral file handle or socket? If ranges can advertise their resumability, it wouldn't be hard to write a simple template wrapper that provides resumability to an underlying non-resumable range. --benji
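Such a wrapper is straightforward to sketch, assuming the usual input-range interface. It buys resumability by buffering everything consumed so far, which is exactly the price a socket- or file-handle-backed range would have to pay:

```d
import std.range;

struct Resumable(R) if (isInputRange!R)
{
    private R source;
    private ElementType!R[] buffer; // everything consumed so far
    private size_t pos;

    this(R source) { this.source = source; }

    // Pull one more element from the source into the buffer if needed.
    private void fill()
    {
        if (pos == buffer.length)
        {
            buffer ~= source.front;
            source.popFront();
        }
    }

    @property bool empty()
    {
        return pos >= buffer.length && source.empty;
    }

    @property ElementType!R front()
    {
        fill();
        return buffer[pos];
    }

    void popFront() { fill(); ++pos; }

    // The operation the underlying range couldn't offer: start over.
    void reset() { pos = 0; }
}
```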
Re: Associative arrays with void values
bearophile wrote: Benji Smith: Especially since an associative array should have a .keys property that returns a set. I don't agree. I think associative arrays should have .keys/.values/.items that return a lazy view that acts like a .set/.list/.list of pairs. Such "lazy views" don't actually store anything, they are very light. This design is now present in Python3, Java and I have done very similar things in my dlibs (named xkeys/xvalues/xitems in my dlibs, but xkeys isn't a set-like thing yet). Actually I think we do agree. From an API perspective (rather than an implementation perspective), I think the .keys property should generally return a lazily constructed result (object? struct? I don't really care). But I think it should conform to some standardized notion of "set-ness" (interface? concept? again, I don't care). HashSets are a perfectly acceptable implementation for me, as are Set interfaces, but I know some people won't like them, and those impl details aren't a big deal to me. But whatever notion the language uses for its "Set" construct should be the same dohickey used by the AA .keys property. (Incidentally, I also think the natural set operations, like intersection and mutual exclusion, are just as handy for maps as for sets.) It's less semantically clean to define certain set operations on AAs, because for example you have to decide what to do when keys are equal but their values are not. You can avoid such semantic troubles altogether by performing set operations only on the lazy view of the keys. You just have to define those operations on pairs rather than just on single values (for example, the union of two maps is naturally a multimap). --benji
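For illustration, here is one way such a lazy view could look in D. Nothing is copied; the view just holds a reference to the AA (`lazyKeys` is a hypothetical helper, not an actual Phobos function):

```d
// A zero-copy, set-like view over an AA's keys, iterable with foreach.
struct KeysView(K, V)
{
    private V[K] aa;

    // Set-style membership test, delegated to the AA's own lookup.
    bool contains(K k) { return (k in aa) !is null; }

    int opApply(int delegate(ref K) dg)
    {
        foreach (k, v; aa)
        {
            auto key = k; // dg needs an lvalue
            if (auto r = dg(key)) return r;
        }
        return 0;
    }
}

KeysView!(K, V) lazyKeys(K, V)(V[K] aa)
{
    return KeysView!(K, V)(aa);
}
```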
Re: Associative arrays with void values
dsimcha wrote: On the other hand, I'm not sure if it makes sense from a consistency perspective to have AAs as a builtin, first class type and sets as a library type. I'm not sure whether this argues more for AAs being a library type or sets being builtin, but the inconsistency is just weird. Especially since an associative array should have a .keys property that returns a set. (Incidentally, I also think the natural set operations, like intersection and mutual exclusion, are just as handy for maps as for sets.) The natural conclusion is that AAs should be library types. I like the fact that D provides literal syntax for AAs, but I think the correct implementation is for the compiler to pass the values from those literal expressions into a library type constructor. --benji
Re: bigfloat
bearophile wrote: Benji Smith: // Defaults to using built-in associative array type auto assocArray = [ "hello" : "world" ]; // Uses my own custom type. auto hashtable = MyHashTableType!(string, string) [ "hello" : "world" ]; In the second case the type inference of the compiler may find the types from the AA literal itself: auto hashtable = MyHashTableType [ "hello" : "world" ]; Bye, bearophile If that were the case, I'd want the compiler to scan *all* the key/value pairs for instances of derived types (rather than just being based on the first K/V pair, like is currently the case with other array literals). For example (using tango classes, where HttpGet and HttpPost are both subclasses of HttpClient): // Type is: MyHashTableType!(string, HttpClient) auto hashtable = MyHashTableType [ "get" : new HttpGet(), "post" : new HttpPost() ];
Re: bigfloat
Daniel Keep wrote: Andrei Alexandrescu wrote: dsimcha wrote: Well, now that I understand your proposal a little better, it makes sense. I had wondered why the current AA implementation uses RTTI instead of templates. Even better would be if only the default implementation were in Object, and a user could somehow override which implementation of AA is given the blessing of pretty syntax by some pragma or export alias or something, as long as the implementation conforms to some specified compile-time interface. Great! For now, I'd be happy if at least the user could hack their import path to include their own object.d before the stock object.d. Then people can use straight D to implement the AssocArray they prefer. Further improvements of the scheme will then become within reach! Andrei dmd -object=myobject.d stuff.d That would require the user to duplicate everything in object, which is a little messy. Maybe it would be a good idea to break object itself into a bunch of public imports to core.internal.* modules, then allow this: dmd -sub=core.internal.aa=myaa stuff.d Of course, it's probably simpler still to have this: dmd -aatype=myaa.AAType stuff.d -- Daniel Instead, what if the literal syntax was amended to take an optional type name, like this: // Defaults to using built-in associative array type auto assocArray = [ "hello" : "world" ]; // Uses my own custom type. auto hashtable = MyHashTableType!(string, string) [ "hello" : "world" ]; You could accomplish that pretty easily, as long as the custom type had a no-arg constructor and a function with the signature: void add(K key, V val) --benji
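Under that scheme, the typed literal would just be sugar for a construction plus a series of add() calls. A sketch of what the compiler's rewrite could look like (the custom type piggybacks on the builtin AA purely to keep the example short; only the add() signature comes from the post above):

```d
struct MyHashTableType(K, V)
{
    private V[K] impl; // stand-in storage for the sketch

    // The one signature the proposal requires of a custom AA type.
    void add(K key, V val) { impl[key] = val; }

    V opIndex(K key) { return impl[key]; }
}

// What the compiler could rewrite
//     auto hashtable = MyHashTableType!(string, string) [ "hello" : "world" ];
// into:
auto makeHashtable()
{
    auto t = MyHashTableType!(string, string)();
    t.add("hello", "world");
    return t;
}
```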
Re: Tango: Out of Date Installation Instructions
Christopher Wright wrote: Benji Smith wrote: Anyhow, the particular error I'm getting when I try to compile my code (using "dsss build") is this: module FileConduit cannot read file 'tango\io\device\FileConduit.d' Does tango.io.device.FileConduit still exist? It doesn't in my copy of tango. You're right! Problem solved! I could have sworn I was using the 0.99.7 version of tango before, but I guess I had been using an older release. You don't need to compile FileConduit, but the frontend needs to know a lot of stuff that would be difficult or impossible to get from a .lib file -- things like function return types and parameter types, or templates. It's basically the same as needing a C header file, even though you have the compiled library. Gotcha. I keep forgetting how much metainformation is lost in the d compilation process. Thanks for your help! --benji
Re: Tango: Out of Date Installation Instructions
Moritz Warning wrote: On Sat, 21 Feb 2009 13:46:48 -0500, Benji Smith wrote: I just set up a new (Windows) computer, after working with the same DMD/Tango/DWin/DSSS installation for the last six or eight months. And for the life of me, I can't get my code to compile on the new machine. The Tango installation instructions seem to be somewhat out of date, since they describe installing tango on top of an existing DMD installation, while the tango distributions for DMD all include the compiler and don't require a pre-existing DMD installation: http://dsource.org/projects/tango/wiki/WindowsInstall Anyhow, the particular error I'm getting when I try to compile my code (using "dsss build") is this: module FileConduit cannot read file 'tango\io\device\FileConduit.d' This is my sc.ini file (unmodified from the tango install): [Environment] LIB="%@P%\..\lib" DFLAGS="-I%@P%\..\import" -version=Tango -defaultlib=tango-base-dmd.lib -debuglib=tango-base-dmd.lib -L+tango-user-dmd.lib LINKCMD=%@P%\link.exe Since it references the "tango-user-dmd.lib" file, I wonder why it even needs to include the FileConduit.d source file. Why doesn't it just use the lib? Much appreciation to anyone who can help get me rolling again! And I'd be happy to help rewrite the Tango installation instructions once I understand the correct installation procedure. --benji I think the best is to join #d.tango on Freenode IRC. Aha. Is that where all the tango-related discussion happens these days? I considered posting to the dsource tango forum, but it's such a low-volume group, I might not get a response for a week or more. I posted here because of the high-volume. Assuming for a moment that I don't want to install an IRC client just to resolve this one issue, where is the best place to ask Tango questions? --benji
Tango: Out of Date Installation Instructions
I just set up a new (Windows) computer, after working with the same DMD/Tango/DWin/DSSS installation for the last six or eight months. And for the life of me, I can't get my code to compile on the new machine. The Tango installation instructions seem to be somewhat out of date, since they describe installing tango on top of an existing DMD installation, while the tango distributions for DMD all include the compiler and don't require a pre-existing DMD installation: http://dsource.org/projects/tango/wiki/WindowsInstall Anyhow, the particular error I'm getting when I try to compile my code (using "dsss build") is this: module FileConduit cannot read file 'tango\io\device\FileConduit.d' This is my sc.ini file (unmodified from the tango install): [Environment] LIB="%@P%\..\lib" DFLAGS="-I%@P%\..\import" -version=Tango -defaultlib=tango-base-dmd.lib -debuglib=tango-base-dmd.lib -L+tango-user-dmd.lib LINKCMD=%@P%\link.exe Since it references the "tango-user-dmd.lib" file, I wonder why it even needs to include the FileConduit.d source file. Why doesn't it just use the lib? Much appreciation to anyone who can help get me rolling again! And I'd be happy to help rewrite the Tango installation instructions once I understand the correct installation procedure. --benji
Re: Is str ~ regex the root of all evil, or the leaf of all good?
Some of the things I'd like to see in the regex implementation: All functions accepting a compiled regex object/struct should also accept a string version of the pattern (and vice versa). Some implementations (Java) only accept the compiled version in some places and the string pattern in other places. That's annoying. Just like with ordinary string-searching functions, you should be able to specify a start position (and maybe an end position) for the search. Even if the match exists somewhere in the string, it fails if not found within the target slice. Something like this: auto text = "ABCDEFG"; auto pattern = regex("[ABCEFG]"); // returns false, because the char at position 3 does not match auto result = match(text, pattern, 3); // this should be exactly equivalent (but the previous version // uses less memory, and ought to work with infinite ranges, whereas // the slice version wouldn't make any sense) auto equivalent = match(text[3..$], pattern); I've needed to use this technique in a few cases to implement a simple lexical scanner, and it's a godsend, if the regex engine supports it (though most don't). Finally, it'd be extremely cool if the regex compiler automatically eliminated redundant nodes from its NFA, converting as much of it as possible to a DFA. I did some work on this a few years ago, and it's actually remarkably simple to implement using prefix trees. // These two expressions produce an identical set of matches, // but the first one is functionally an NFA, while the second // one is a DFA.
auto a = regex("(cat|car|cry|dog|door|dry)"); auto b = regex("(c(?:a[tr]|ry)|d(?:o(?:g|or)|ry))"); In cases where the expression can only be partially simplified, you can leave some NFA nodes deep within the tree, while still DFA-ifying the rest of it: auto a = regex("(attitude|attribute|att.+ion)"); auto b = regex("(att(?:itude|ribute|.+ion))"); It's a very simple transformation, increases speed (dramatically) for complex regular expressions (especially those produced dynamically at runtime by combining large sets of unrelated target expressions), and it reliably produces results equivalent to the inefficient version. The only really tricky part is if the subexpressions have their own capturing groups, in which case the DFA transformation screws up the ordinal-numbering of the resultant captures. Anyhoo... I don't have any strong feelings about the function names (though I'd rather have functions than operators, like "~", for searching and matching). And I don't have any strong feelings about whether the compiled regex is an object or a struct (though I prefer reference semantics over value semantics for regexen, and right now, I think that makes objects the (slightly) better choice). Thanks for your hard work! I've implemented a small regex engine before, so I know it's no small chunk of effort. Regular expressions are my personal favorite "tiny language", and I'm glad to see them get some special attention in phobos2. --benji
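The prefix-tree transformation is easy to sketch for purely literal alternatives. The sketch below deliberately ignores metacharacters, capturing groups, and the ordinal-renumbering problem mentioned above, and it assumes all the input words are non-empty:

```d
import std.algorithm, std.array;

// Factor a set of literal alternatives by shared leading characters:
// ["cat", "car", "cry"] becomes something like "c(?:a(?:t|r)|ry)".
// (Output ordering follows AA iteration order, which is unspecified.)
string factor(string[] words)
{
    if (words.length <= 1)
        return words.length ? words[0] : "";

    // Group the words by first character; keep the remaining tails.
    string[][char] byFirst;
    foreach (w; words)
        byFirst[w[0]] ~= w[1 .. $];

    string[] parts;
    foreach (c, rests; byFirst)
    {
        auto tails = rests.filter!(s => s.length > 0).array;
        string part = "" ~ c;
        if (tails.length)
        {
            part ~= "(?:" ~ factor(tails) ~ ")";
            if (tails.length < rests.length)
                part ~= "?"; // one alternative was a prefix of another
        }
        parts ~= part;
    }
    return parts.length == 1 ? parts[0] : "(?:" ~ parts.join("|") ~ ")";
}
```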
Re: Is str ~ regex the root of all evil, or the leaf of all good?
And how do you combine them? "repeat, ignorecase"? Writing and parsing such options becomes a little adventure in itself. I think the "g", "i", and "m" flags are popular enough if you've done any amount of regex programming. If not, you'll look up the manual regardless. Perhaps, string.match("a[b-e]", Regex.Repeat | Regex.IgnoreCase); might be better? I don't find "gmi" immediately clear nor self-documenting. I prefer the enum options too. But not vociferously. I could live with the single-char flags. --benji
Re: default random object?
Don wrote: Benji Smith wrote: Don wrote: Andrei Alexandrescu wrote: Benji Smith wrote: Benji Smith wrote: Maybe a NumericInterval struct would be a good idea. It could be specialized to any numeric type (float, double, int, etc), it would know its own boundaries, and it'd keep track of whether those boundaries were open or closed. The random functions would take an RND and an interval (with some reasonable default intervals for common tasks like choosing elements from arrays and random-access ranges). I have a Java implementation around here somewhere that I could port to D if anyone is interested. --benji Incidentally, the NumericInterval has lots of other interesting applications. For example auto i = NumericInterval.UBYTE.intersect(NumericInterval.SBYTE); bool safelyPolysemious = i.contains(someByteValue); auto array = new double[123]; auto i = NumericInterval.indexInterval(array); bool indexIsLegal = i.contains(someIndex); Using a numeric interval for generating random numbers would be, in my opinion, totally ideal. double d = uniform(NumericInterval.DOUBLE); // Any double value I've never been in a situation in my life where I thought, hey, a random double is exactly what I'd need right now. It's a ginormous interval! Andrei It's worse than that. Since the range of double includes infinity, a uniform distribution must return +-infinity with probability 1. It's nonsense. Way to miss the forest for the trees. You guys are telling me you can't see any legitimate use for a NumericInterval type? And that it wouldn't be convenient to use for random number generation within that interval? So the full double range was a dumb example. But that wasn't really the point, was it? --benji On the contrary, I've been giving NumericInterval considerable thought.
One key issue is whether a NumericInterval(x1, x2) must satisfy x1 <= x2 (the _strict_ definition), or whether it is also permissible to have x2<=x1 (ie, you can specify the two endpoints in reverse order; the interval is then between min(x1,x2) and max(x1, x2)). This is an issue because I've noticed that when I want to use it, I often have related pairs of values. eg. Suppose u is the interval {x1, x2}. There's a related v = {f(x1), f(x2)}. Unfortunately, although x1<=x2, f(x1) may not be <= f(x2). So v is not an interval in the _strict_ sense. But it satisfies the _relaxed_ definition. I don't see any ideological reason for requiring x2 >= x1. But the public API of the interval will probably have functions or properties returning the "lowerBound" and "upperBound". And the implementations of the "containsValue", "intersect", and "overlap" functions are all more straightforward to write if you know in advance which value is which, potentially switching them in the constructor. Of course, if you switch the values, do you also switch the open/closed boundaries? What about this case: auto i = Interval!("[)")(1000, -1000); Which side of the range is open, and which is closed? Does the "[)" argument apply to the natural order of the range (closed on its lower bound) or does it apply to the order of the arguments in the function (closed on its leftmost argument)? As long as the behavior is well documented, I think it'd be fine either way. But I also think it'd be reasonable to throw an exception if the arguments are in the wrong order. --benji
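For concreteness, here is a sketch (in Python, with invented names; nothing here is an existing library) of an interval type following the _relaxed_ definition: the constructor normalizes reversed endpoints, and each open/closed flag travels with its endpoint when the arguments are swapped, which is one answer to the `Interval!("[)")(1000, -1000)` question above.

```python
class NumericInterval:
    """Relaxed interval: endpoints may be given in either order; the
    constructor normalizes so lo <= hi, and each open/closed flag is
    swapped along with its endpoint."""

    def __init__(self, x1, x2, x1_closed=True, x2_closed=True):
        if x2 < x1:
            x1, x2 = x2, x1
            x1_closed, x2_closed = x2_closed, x1_closed
        self.lo, self.hi = x1, x2
        self.lo_closed, self.hi_closed = x1_closed, x2_closed

    def contains(self, x):
        above = x >= self.lo if self.lo_closed else x > self.lo
        below = x <= self.hi if self.hi_closed else x < self.hi
        return above and below

    def intersect(self, other):
        # Tighter bound wins; on a tie, the open flag wins.
        if self.lo > other.lo or (self.lo == other.lo and not self.lo_closed):
            lo, lo_c = self.lo, self.lo_closed
        else:
            lo, lo_c = other.lo, other.lo_closed
        if self.hi < other.hi or (self.hi == other.hi and not self.hi_closed):
            hi, hi_c = self.hi, self.hi_closed
        else:
            hi, hi_c = other.hi, other.hi_closed
        if hi < lo or (hi == lo and not (lo_c and hi_c)):
            raise ValueError("empty intersection")
        return NumericInterval(lo, hi, lo_c, hi_c)

# The UBYTE/SBYTE example from the thread:
ubyte = NumericInterval(0, 255)
sbyte = NumericInterval(-128, 127)
i = ubyte.intersect(sbyte)               # [0, 127]
assert i.contains(100) and not i.contains(200)
```

Under this convention, `NumericInterval(10, 0, x1_closed=True, x2_closed=False)` normalizes to the interval (0, 10]: the closed bracket stays attached to the endpoint 10, not to the leftmost argument position.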
Re: memory-mapped files
Andrei Alexandrescu wrote: This all would make perfect sense if the performance was about the same in the two cases. But in fact memory mapping introduced a large *pessimization*. Why? I am supposedly copying less data and doing less Pessimization? What a great word! I've never heard that before! --benji
Re: default random object?
Don wrote: Andrei Alexandrescu wrote: Benji Smith wrote: Benji Smith wrote: Maybe a NumericInterval struct would be a good idea. It could be specialized to any numeric type (float, double, int, etc), it would know its own boundaries, and it'd keep track of whether those boundaries were open or closed. The random functions would take an RND and an interval (with some reasonable default intervals for common tasks like choosing elements from arrays and random-access ranges). I have a Java implementation around here somewhere that I could port to D if anyone is interested. --benji Incidentally, the NumericInterval has lots of other interesting applications. For example auto i = NumericInterval.UBYTE.intersect(NumericInterval.SBYTE); bool safelyPolysemious = i.contains(someByteValue); auto array = new double[123]; auto i = NumericInterval.indexInterval(array); bool indexIsLegal = i.contains(someIndex); Using a numeric interval for generating random numbers would be, in my opinion, totally ideal. double d = uniform(NumericInterval.DOUBLE); // Any double value I've never been in a situation in my life where I thought, hey, a random double is exactly what I'd need right now. It's a ginormous interval! Andrei It's worse than that. Since the range of double includes infinity, a uniform distribution must return +-infinity with probability 1. It's nonsense. Way to miss the forest for the trees. You guys are telling me you can't see any legitimate use for a NumericInterval type? And that it wouldn't be convenient to use for random number generation within that interval? So the full double range was a dumb example. But that wasn't really the point, was it? --benji
Re: default random object?
Benji Smith wrote: Maybe a NumericInterval struct would be a good idea. It could be specialized to any numeric type (float, double, int, etc), it would know its own boundaries, and it'd keep track of whether those boundaries were open or closed. The random functions would take an RND and an interval (with some reasonable default intervals for common tasks like choosing elements from arrays and random-access ranges). I have a Java implementation around here somewhere that I could port to D if anyone is interested. --benji Incidentally, the NumericInterval has lots of other interesting applications. For example auto i = NumericInterval.UBYTE.intersect(NumericInterval.SBYTE); bool safelyPolysemious = i.contains(someByteValue); auto array = new double[123]; auto i = NumericInterval.indexInterval(array); bool indexIsLegal = i.contains(someIndex); Using a numeric interval for generating random numbers would be, in my opinion, totally ideal. double d = uniform(NumericInterval.DOUBLE); // Any double value auto i = NumericInterval.parse("[ 123, 456.789 )"); double random = uniform!(double)(i, rng); --benji
Re: default random object?
Andrei Alexandrescu wrote: Steve Schveighoffer wrote: 4. While we're at it, should uniform(a, b) generate by default something in [a, b] or [a, b)? [a,b) Every other piece of range-like code is zero based, and excludes the upper bound. This should be no different. It makes the code simpler too. I tried both versions, and it turns out my code is almost never simpler with open integral intervals. Most of the time I need something like: auto x = uniform(rng, -100, 100); auto y = uniform(rng, 0, 100); and I need to remember to actually ask for 101 instead of 100. True, when you want a random index in an array, open intervals are more convenient. One purity-based argument is that in a random number you may actually ask for the total range: auto big = uniform(rng, uint.max / 2, uint.max); If the interval is open I can't generate uint.max. Anyway, I checked the C++ API and it turns out they use closed intervals for integers and open intervals for reals. I know there's been a lot of expert scrutiny there, so I suppose I better copy their design. Andrei Maybe a NumericInterval struct would be a good idea. It could be specialized to any numeric type (float, double, int, etc), it would know its own boundaries, and it'd keep track of whether those boundaries were open or closed. The random functions would take an RND and an interval (with some reasonable default intervals for common tasks like choosing elements from arrays and random-access ranges). I have a Java implementation around here somewhere that I could port to D if anyone is interested. --benji
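The C++ convention Andrei mentions copying is the one C++11 eventually standardized: `uniform_int_distribution` draws from the closed interval [a, b], while `uniform_real_distribution` draws from the half-open interval [a, b). A sketch of both conventions (function names here are illustrative, not Phobos APIs):

```python
import random

def uniform_int(rng, a, b):
    """Closed integer interval [a, b]: both endpoints reachable, so
    uniform_int(rng, uint_max / 2, uint_max) can actually yield uint_max."""
    return a + rng.randrange(b - a + 1)

def uniform_real(rng, a, b):
    """Half-open real interval [a, b): b itself is never returned."""
    return a + (b - a) * rng.random()

rng = random.Random(42)
# The examples from the post: no need to remember to ask for 101.
assert all(-100 <= uniform_int(rng, -100, 100) <= 100 for _ in range(1000))
assert all(0.0 <= uniform_real(rng, 0.0, 100.0) < 100.0 for _ in range(1000))
```

The asymmetry is deliberate: with reals, the probability of drawing exactly b is zero anyway, so excluding it costs nothing; with integers, excluding b silently makes the type's maximum value unreachable.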
Re: std.string and ranges
bearophile wrote: I have taken a look at the docs for dice(); I don't like its name, because it isn't intuitive at all, but its usage is easy. The usage of the function I have suggested is a bit more high-level. A possible alternative design for such a function is to take as input an already sorted array of the weights (beside the iterable of the items); this may speed up the function a bit (it just needs to call the algorithm for bisect search, I presume). FWIW, I've implemented this sort of thing before. In my implementation, it was called ProbabilisticChooser(T), and I could instantiate it either with a pair of parallel arrays or with a HashMap(T, double). In my case, I didn't lump it in with my other random-number related code, because I had a set of other classes implementing the Chooser(T) interface. Some of them were random and some were deterministic, but they all provided the same "choose" function on top of a "choice strategy" implementation. --benji
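The bisect-based design bearophile alludes to is standard cumulative-weight sampling. Here is a sketch (the class name borrows Benji's ProbabilisticChooser; the implementation itself is an assumption): the cumulative weights are computed once, so each draw costs one random number plus an O(log n) binary search.

```python
import bisect
import itertools
import random

class ProbabilisticChooser:
    def __init__(self, items, weights):
        self.items = list(items)
        # Running totals, e.g. weights [2, 1, 3] -> cumulative [2, 3, 6].
        self.cumulative = list(itertools.accumulate(weights))

    def choose(self, rng):
        # Pick a point on [0, total) and bisect into the cumulative table;
        # the slot it lands in selects the item, proportionally to weight.
        point = rng.random() * self.cumulative[-1]
        return self.items[bisect.bisect_right(self.cumulative, point)]

rng = random.Random(1)
chooser = ProbabilisticChooser(["a", "b", "c"], [0.0, 1.0, 0.0])
# With all the weight on "b", every draw returns "b".
assert all(chooser.choose(rng) == "b" for _ in range(100))
```

A deterministic Chooser(T), as described in the post, would implement the same `choose` signature with a different strategy underneath (round-robin, fixed order, etc.), which is what makes the interface worth separating from the randomness.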
Re: Compiler as dll
BCS wrote: Hello Walter, Instead, what you can do is simply dude up command line arguments, spawn the command line compiler, and collect the result. The one main thing I see not working there is memory-to-memory compiles. I'd love to be able to build a function as a string, call the compiler and get back a function pointer. I think also, with a compiler-as-dll, it'd have separate modules for lexing, parsing, optimizing, code-generation, and linking. As a user of that compiler DLL, I might like to write my own AST visitor, wrapping all function calls (or scope blocks) with tracing statements before sending them into the rest of the pipeline. Those are the kinds of things that I think would be especially cool with a CompilerServices module in the standard library. Also, consider this: someone could implement AST macros as a library! --benji
Re: Any chance to call Tango as Extended Standard Library
Don wrote: Lars Ivar Igesund wrote: Don wrote: druntime should certainly not become any bigger (in scope), as that would defeat the purpose of separating the runtime from userspace in the first place. The topic of common userspace functionality should be kept separate from the topic of druntime. I think you are confusing druntime (the project) with the D runtime. druntime includes the gc as well as the runtime, though they are separate. I see no reason why including core modules in the druntime project would destroy the separation. Really, this is entirely a question of naming. core.XXX seems to me to be the perfect namespace, certainly for the key math modules which I'm most concerned about (std.math/(tango.math.Math, tango.math.IEEE), and possibly also the low-level bigint routines. These are all functionality which is closely tied to the compiler). Totally agree. Although the name 'druntime' implies it'll only contain the runtime, I think it ought to contain all the common functionality that virtually all applications and libraries will absolutely need: the runtime itself, gc, TypeInfo, math, containers (including ranges), algorithms, string processing, date/time, and IO. Without those commonalities, any "compatibility" between Phobos and Tango will be purely illusory. Whether the commonality is realized within druntime, or within some other low-level common library (like "dcore"), is immaterial to me. And actually, I don't really care whether Phobos and Tango have their own implementations. But there should be an API (interfaces? concepts? some new template-interface mechanism? doesn't matter.) that both Phobos and Tango implement, so that library consumers can seamlessly pass low-level objects between Phobos- and Tango-dependent libraries. --benji
Re: Any chance to call Tango as Extended Standard Library
IUnknown wrote: Agree. Which is why I said the problems you are facing seem to be non-technical. I'm suggesting that the D library developers should pick one and axe the other. *I* think what's more important is to have one single set of containers in a single style rather than two separate ones. There is going to be complaining for sure from the current developers, but in my opinion, the target of having a single standard library (with core and advanced modules to suit system/app programming) is more important than having to make a difficult choice. Totally agree. While I personally prefer the Java-style containers, I'd gladly accept the STL-style containers if it meant unification of Phobos and Tango. Having druntime is nice, sure, but application-level code and high-level libraries will bake the container API into their public interfaces, and any code that uses both the Phobos and Tango libraries would have to perform a zillion tedious conversions. In my mind, the things that need a unified API are (in order of importance): 1. GC and TypeInfo 2. Data structures 3. Algorithms 4. String processing 5. Date & Time 6. IO Everything else (encryption, compression, sockets, regular expressions) could have a totally different API in Tango & Phobos and I wouldn't care much. Having a common runtime (GC and TypeInfo) is a neat trick, but pretty useless if the data structures and algorithms are entirely different. And, while I'm perfectly willing to accept either Java-style or STL-style containers, I'd also really appreciate it if the design anticipates and supports custom implementations (because I almost always end up implementing my own multimaps, multisets, circular queues, etc). --benji
Re: new principle of division between structures and classes
Andrei Alexandrescu wrote: Benji Smith wrote: Actually, memory allocated in the JVM is very cache-friendly, since two subsequent allocations will always be adjacent to one another in physical memory. And, since the JVM uses a moving GC, long-lived objects move closer and closer together. Well the problem is that the allocation size grows quickly. Allocate and dispose one object per loop -> pages will be quickly eaten. for (...) { JavaClassWithAReallyLongNameAsTheyUsuallyAre o = factory.giveMeOne(); o.method(); } The escape analyzer could catch that the variable doesn't survive the pass through the loop, but the call to method makes things rather tricky (virtual, source unavailable...). So then we're facing a quickly growing allocation block and consequently less cache friendliness and more frequent collections. Andrei Good point. I remember five years ago when people were buzzing about the possible implementation of escape analysis in the next Java version, and how it'd move a boatload of intermediate object allocations from the heap to the stack. Personally, I don't think it'll ever happen. They can't even agree on how to get *closures* into the language. I personally think the JVM and the HotSpot compiler are two of the greatest accomplishments of computer science. But the Java community has long since jumped the shark, and I don't expect much innovation from that neighborhood anymore. --benji
Re: Properties
Nick Sabalausky wrote: "John Reimer" wrote in message news:28b70f8c119528cb42154f5d1...@news.digitalmars.com... Hello Nick, But, of course, adjectives (just like "direct/indirect objects") are themselves nouns. Umm... May I make a little correction here? Adjectives are not nouns. They are used to /describe/ nouns. -JJR Maybe there are examples I'm not thinking of, and I'm certainly no natural language expert, but consider these: "red" "ball" "red ball" By themselves, "red" and "ball" are both nouns. Stick the noun "red" in front of ball and "red" becomes an adjective. (FWIW, "dictionary.reference.com" lists "red" as both a noun and an adjective). The only adjectives I can think of at the moment (in my admittedly quite tired state) are words that are ordinarily nouns on their own. I would think that the distinguishing characteristic of an adjective vs a noun would be the context in which it's used. Maybe I am mixed up though; it's not really an area of expertise for me. Incidentally... I used to do a lot of work in natural language processing, and our parsing heuristics were built to handle a lot of adjective/noun ambiguity. For example, in the phrase "car dealership", the word "car" is an adjective that modifies "dealership". For the most part, you can treat adjectives and nouns as being functionally identical, and the final word in a sequence of adjectives and nouns becomes the primary noun of the noun-phrase. --benji
Re: new principle of division between structures and classes
Andrei Alexandrescu wrote: Weed wrote: Weed wrote: 4. Java and C# also use objects by reference? But both of these languages are interpreted. I assume that an interpreter generally allocates memory in a heap and in a stack at the same speed, which is why the authors of these languages used the reference model. Neither of these languages is interpreted; they both are compiled into native code at runtime. Oh! :) but I suspect such a class scheme somehow corresponds with JIT compilation. I guess allocation in Java is fast because it uses its own memory manager. I do not know how fair it is, but: http://www.ibm.com/developerworks/java/library/j-jtp09275.html "Pop quiz: Which language boasts faster raw allocation performance, the Java language, or C/C++? The answer may surprise you -- allocation in modern JVMs is far faster than the best performing malloc implementations. The common code path for new Object() in HotSpot 1.4.2 and later is approximately 10 machine instructions (data provided by Sun; see Resources), whereas the best performing malloc implementations in C require on average between 60 and 100 instructions per call (Detlefs, et. al.; see Resources)." Meh, that should be taken with a grain of salt. An allocator that only bumps a pointer will simply eat more memory and be less cache-friendly. Many applications aren't that thrilled with the costs of such a model. Andrei Actually, memory allocated in the JVM is very cache-friendly, since two subsequent allocations will always be adjacent to one another in physical memory. And, since the JVM uses a moving GC, long-lived objects move closer and closer together. Of course, Java programmers tend to be less careful about memory allocation, so they usually consume **way** too much memory and lose the benefits of the moving GC. Java-the-language and Java-the-platform are very efficient, even if the Java frameworks and Java patterns tend to be bloated and nasty. --benji
Re: Properties
Miles wrote: dsimcha wrote: I figure the vast majority of cases are going to be primitive types anyhow (mostly ints), Yes, this is very true. and if someone defines operator overloads such that foo += 1 produces totally different observable behavior than foo = foo + 1, that's just too ridiculously bad a design to even take seriously. Sure. It is bad coding style, it is ugly, and the programmer who does this should be called in for a meeting with his boss. But there are still ways to have sane behavior, even in such situations. See below. What do you think? Is it worth ignoring a few hard cases in exchange for solving most cases simply and elegantly and without adding any new constructs? Instead, I think it is more sane to use temporaries. -- { auto tmp = __get_foo(); tmp += 1; __set_foo(tmp); } -- It is the safest this way, principle of least surprise. If the caller does foo += 1, it will get that; if it does foo = foo + 1, it will still get that; if it does foo.call(), again, the behavior is still sane. We must first attack the semantics. This has sane semantics. Then let the compiler optimize that as far as possible. The compiler inlines the getter and setter calls, then optimizes away the temporary, etc. Or the compiler could prevent properties from returning mutable structs? class MyClass { private MyStruct _a; private MyStruct _b; public property a { const get { return _a; } // legal } public property b { get { return _b; } // compile-time error } } On the flip side, the compiler could intervene at the call site, preventing modification of structs when directly accessed via a property invocation. Though I think the first solution is better. --benji
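As it happens, Python properties already implement exactly the get/modify/set desugaring Miles describes: `obj.foo += 1` becomes a get, an in-place add on the temporary, then a set, so the setter always runs. A small sketch (the Temperature class is invented for illustration):

```python
class Temperature:
    def __init__(self):
        self._celsius = 0.0
        self.sets = 0  # count how often the setter fires

    @property
    def celsius(self):
        return self._celsius

    @celsius.setter
    def celsius(self, value):
        self.sets += 1
        self._celsius = value

t = Temperature()
t.celsius += 1              # desugars to: tmp = get; tmp += 1; set(tmp)
t.celsius = t.celsius + 1   # the explicit spelling behaves identically
assert t.celsius == 2.0 and t.sets == 2
```

This is the "principle of least surprise" outcome: `foo += 1` and `foo = foo + 1` are observably identical, and the compiler (or interpreter) is free to optimize the temporary away when the getter and setter are trivial.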
Re: foreach ... else statement
Walter Bright wrote: I keep thinking I should put on a "Compiler Construction" seminar! Sign me up!
Re: Randomness in built-in .sort
dsimcha wrote: == Quote from Bill Baxter (wbax...@gmail.com)'s article Actually, a function to sort multiple arrays in parallel was exactly what I was implementing using .sort. So that doesn't sound like a limitation to me at all. :-) --bb Am I (and possibly you) the only one(s) who think that sorting multiple arrays in parallel should be standard library functionality? The standard rebuttal might be "use arrays of structs instead of parallel arrays". This is a good idea in some situations, but for others, parallel arrays are just plain better. Furthermore, with D's handling of variadic functions, generalizing any sort to handle parallel arrays is easy. I've written my own parallel-array quicksort implementation (several times over, in many different languages). Parallel sorting is one of my favorite tricks, and I think it definitely belongs in the standard library. --benji
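The trick being discussed is easy to sketch: compute the sort permutation of the key array once, then apply it to every companion array. A Python version (the function name `sort_parallel` is an assumption, not a standard-library API):

```python
def sort_parallel(keys, *companions):
    """Sort `keys` and reorder any number of companion arrays by the
    same permutation, keeping parallel arrays in sync."""
    order = sorted(range(len(keys)), key=keys.__getitem__)
    sorted_keys = [keys[i] for i in order]
    sorted_companions = [[arr[i] for i in order] for arr in companions]
    return sorted_keys, *sorted_companions

ages, names = sort_parallel([30, 20, 25], ["carol", "alice", "bob"])
assert ages == [20, 25, 30]
assert names == ["alice", "bob", "carol"]
```

D's variadic functions would let the same idea generalize over an arbitrary tuple of slices, which is presumably why the poster considers it such a natural fit for the standard library.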
Re: Non-nullable references, again
Michel Fortin wrote: On 2009-01-02 10:37:50 -0500, Benji Smith said: case a?.b:c: break; is this case ((a?).b): c: break; or is it case (a ? b : c ) : break; How's this different from case a*.b: is this: case ((a*).b): or is it: case ((a) * (.b)): Think of it like this: MyClass?.myProperty It's a static field of the nullable MyClass type. --benji

Re: Improvement to switch-case statement
Yigal Chripun wrote: Maybe it's just me, but all those C-style statements seem so arcane and unnecessary. Real OOP languages do not need control structures to be part of the language - they're part of the class library instead. Here are some Smalltalk examples: (and D-like comparable code) Interesting... Assuming the core language had no control structures, how would library authors implement them? If the language itself lacked IF, ELSE, SWITCH, CASE, DO, WHILE, FOR, and presumably GOTO... how exactly would you go about implementing them in a library? --benji
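One answer to that question, sketched in Python rather than Smalltalk: Smalltalk builds control flow out of closures (blocks). Given first-class functions, `if` and `while` become ordinary library functions, with recursion (or a host primitive) supplying the looping. Function names here are invented for illustration.

```python
def if_then_else(cond, then_block, else_block):
    """Evaluate one of two zero-argument blocks based on cond,
    like Smalltalk's ifTrue:ifFalse:."""
    return (then_block if cond else else_block)()

def while_true(cond_block, body_block):
    """Re-evaluate body_block as long as cond_block yields true,
    like Smalltalk's whileTrue: (here via recursion)."""
    if cond_block():
        body_block()
        while_true(cond_block, body_block)

counter = {"n": 0}
while_true(lambda: counter["n"] < 5,
           lambda: counter.update(n=counter["n"] + 1))
assert counter["n"] == 5
assert if_then_else(1 > 0, lambda: "yes", lambda: "no") == "yes"
```

The catch is that every "block" must be a deferred closure rather than an eagerly evaluated expression, which is why languages without terse closure syntax (C, early Java) keep control structures in the core language.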
Re: Improvement to switch-case statement
Yigal Chripun wrote: also, some thought should be spent on getting rid of the ternary op syntax since it interferes with other things that could be added to the language (nullable types, for instance) Heresy! The ternary operator is one of my favorite tools. If you want to get rid of it, I think you'd have to make the 'if' statement into an expression (which would open up a whole other can of worms). As I showed earlier, there's no ambiguity between the ternary operator and the nullable type suffix. The ambiguity comes from the case statement. In my opinion, the best way to resolve that ambiguity is to add braces around case statements, like this: switch (x) { case 1 { ... } case 2 { ... } default { ... } } But that might make it impossible to implement Duff's Device (blessing or curse? personally, I don't care). And it might imply the creation of a new scope with each case. Currently, a case statement doesn't introduce its own lexical scope. Anyhoo... Don't mess with the ternary operator!! :) --benji
Re: Non-nullable references, again
Don wrote: Benji Smith wrote: Daniel Keep wrote: Benji Smith wrote: Don wrote: Denis Koroskin wrote: Foo nonNull = new Foo(); Foo? possiblyNull = null; > Wouldn't this cause ambiguity with the "?:" operator? At first, I thought you might be right, and that there would be some ambiguity calling constructors of nullable classes (especially given optional parentheses). But for the life of me, I couldn't come up with a truly ambiguous example that couldn't be resolved with an extra token or two of lookahead. The '?' nullable-type operator is only used in type declarations, not in expressions, and the '?:' operator always consumes a few trailing expressions. Also (at least in C#) the null-coalesce operator (which converts nullable objects to either a non-null instance or a default value) looks like this: MyClass? myNullableObj = getNullableFromSomewhere(); MyClass myNonNullObj = myNullableObj ?? DEFAULT_VALUE; Since the double-hook is a single token, it's also unambiguous to parse. --benji Disclaimer: I'm not an expert on compilers. Plus, I just got up. :P The key is that the parser has to know what "MyClass" means before it can figure out what the "?" is for; that's why it's context-dependent. D avoids this dependency between compilation stages, because it complicates the compiler. When the parser sees "MyClass", it *doesn't know* that it's a type, so it can't distinguish between a nullable type and an invalid ?: expression. At least, I think that's how it works; someone feel free to correct me if it's not. :P -- Daniel I could be wrong too. I've done a fair bit of this stuff, but I'm no expert either :) Nevertheless, I still don't think there's any ambiguity, as long as the parser can perform syntactic lookahead predicates. The grammar would look something like this: DECLARATION := IDENTIFIER // Type name ( HOOK )? // Is nullable?
IDENTIFIER // Var name ( SEMICOLON// End of declaration | ( OP_ASSIGN // Assignment operator EXPRESSION // Assigned value ) ) Whereas the ternary expression grammar would look something like this: TERNARY_EXPRESSION := IDENTIFIER // Type name HOOK // Start of '?:' operator EXPRESSION // Value if true COLON // End of '?:' operator EXPRESSION // Value if false The only potential ambiguity arises because the "value if true" expression could also just be an identifier. But if the parser can construct syntactic predicates to perform LL(k) lookahead with arbitrary k, then it can just keep consuming tokens until it finds either a SEMICOLON, an OP_ASSIGN, or a COLON (potentially, recursively, if it encounters another identifier and hook within the expression). Still, though, once it finds one of those tokens, the syntax has been successfully disambiguated, without resorting to a semantic predicate. It requires arbitrary lookahead, but it can be done within a context-free grammar, and all within the syntax-processing portion of the parser. Of course, I could be completely wrong too :) --benji case a?.b:c: break; is this case ((a?).b): c: break; or is it case (a ? b : c ) : break; Damn. I got so distracted with the ternary operator, I forgot about case statements. --benji
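The arbitrary-lookahead predicate described above can be sketched as a tiny token scanner (a toy illustration, not real D parsing): after `IDENTIFIER HOOK`, keep consuming tokens until one of them settles the question. A `;` or `=` means the hook was a nullable-type suffix; a `:` at nesting depth zero means it was a ternary operator.

```python
def classify(tokens):
    """Decide what `IDENT ?` begins, given the tokens that follow it.
    `;` or `=` settles it as a nullable-type declaration; a `:` at
    nesting depth zero settles it as a ternary expression."""
    depth = 0
    for tok in tokens:
        if tok == "?":
            depth += 1        # a nested ternary opened inside the expression
        elif tok == ":":
            if depth == 0:
                return "ternary"
            depth -= 1        # that nested ternary just closed
        elif tok in (";", "="):
            return "declaration"
    return "declaration"      # ran out of tokens: a bare declaration

# `MyClass? x = null;`  -> tokens remaining after `MyClass ?`:
assert classify(["x", "=", "null", ";"]) == "declaration"
# `cond ? a : b;`       -> tokens remaining after `cond ?`:
assert classify(["a", ":", "b", ";"]) == "ternary"
```

This is exactly the unbounded-k lookahead the post describes: no semantic information about whether `MyClass` names a type is consulted, only token shapes, though (as the case-label example below shows) statements like `case a?.b:c:` defeat this scheme because the `:` is doing double duty.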
Re: Non-nullable references, again
Daniel Keep wrote: Benji Smith wrote: Don wrote: Denis Koroskin wrote: Foo nonNull = new Foo(); Foo? possiblyNull = null; > Wouldn't this cause ambiguity with the "?:" operator? At first, I thought you might be right, and that there would be some ambiguity calling constructors of nullable classes (especially given optional parentheses). But for the life of me, I couldn't come up with a truly ambiguous example that couldn't be resolved with an extra token or two of lookahead. The '?' nullable-type operator is only used in type declarations, not in expressions, and the '?:' operator always consumes a few trailing expressions. Also (at least in C#) the null-coalesce operator (which converts nullable objects to either a non-null instance or a default value) looks like this: MyClass? myNullableObj = getNullableFromSomewhere(); MyClass myNonNullObj = myNullableObj ?? DEFAULT_VALUE; Since the double-hook is a single token, it's also unambiguous to parse. --benji Disclaimer: I'm not an expert on compilers. Plus, I just got up. :P The key is that the parser has to know what "MyClass" means before it can figure out what the "?" is for; that's why it's context-dependent. D avoids this dependency between compilation stages, because it complicates the compiler. When the parser sees "MyClass", it *doesn't know* that it's a type, so it can't distinguish between a nullable type and an invalid ?: expression. At least, I think that's how it works; someone feel free to correct me if it's not. :P -- Daniel I could be wrong too. I've done a fair bit of this stuff, but I'm no expert either :) Nevertheless, I still don't think there's any ambiguity, as long as the parser can perform syntactic lookahead predicates. The grammar would look something like this: DECLARATION := IDENTIFIER // Type name ( HOOK )? // Is nullable?
IDENTIFIER // Var name ( SEMICOLON// End of declaration | ( OP_ASSIGN // Assignment operator EXPRESSION // Assigned value ) ) Whereas the ternary expression grammar would look something like this: TERNARY_EXPRESSION := IDENTIFIER // Type name HOOK // Start of '?:' operator EXPRESSION // Value if true COLON // End of '?:' operator EXPRESSION // Value if false The only potential ambiguity arises because the "value if true" expression could also just be an identifier. But if the parser can construct syntactic predicates to perform LL(k) lookahead with arbitrary k, then it can just keep consuming tokens until it finds either a SEMICOLON, an OP_ASSIGN, or a COLON (potentially, recursively, if it encounters another identifier and hook within the expression). Still, though, once it finds one of those tokens, the syntax has been successfully disambiguated, without resorting to a semantic predicate. It requires arbitrary lookahead, but it can be done within a context-free grammar, and all within the syntax-processing portion of the parser. Of course, I could be completely wrong too :) --benji
Re: Non-nullable references, again
Don wrote: Denis Koroskin wrote: Foo nonNull = new Foo(); Foo? possiblyNull = null; > Wouldn't this cause ambiguity with the "?:" operator? At first, I thought you might be right, and that there would be some ambiguity calling constructors of nullable classes (especially given optional parentheses). But for the life of me, I couldn't come up with a truly ambiguous example that couldn't be resolved with an extra token or two of lookahead. The '?' nullable-type operator is only used in type declarations, not in expressions, and the '?:' operator always consumes a few trailing expressions. Also (at least in C#) the null-coalesce operator (which converts nullable objects to either a non-null instance or a default value) looks like this: MyClass? myNullableObj = getNullableFromSomewhere(); MyClass myNonNullObj = myNullableObj ?? DEFAULT_VALUE; Since the double-hook is a single token, it's also unambiguous to parse. --benji
Re: dmd platform support - poll
Walter Bright wrote: What platforms for dmd would you be most interested in using? .net jvm mac osx 32 bit intel mac osx 64 bit intel linux 64 bit windows 64 bit freebsd 32 bit netbsd 32 bit other? My choice, BY FAR, would be Mac OSX 32 bit. When I started my current D project, six months ago or so, it looked like GDC mac support was on a steady, healthy incline, and that choosing D as a development platform would yield full mac compatibility in the very near future. Supporting the mac platform is absolutely essential for my product, so without a viable D compiler, I'll have to rewrite a bunch of code in C, which would make me very sad. The 64-bit win/lin/mac platforms would also be nice to have. But as long as every 64-bit OS provides legacy support for 32-bit apps, I consider a 64-bit D compiler pretty low priority, for the type of work I'm currently doing. The bsd platform is completely off my radar screen, and given Walter's limited resources, I'd be disappointed to see these given much attention. .NET and the JVM would be compelling for the marketing of D, making the language seem more mainstream and widely accessible. But I personally wouldn't find much use in them. The primary benefit of D, for me, is escaping from the confines of the VMs and being able to do system-level stuff. I frequently develop for both the CLR and the JVM, but when I do so, I prefer C# and Java, respectively. I can't think of a single reason I'd ever elect to write D for a VM platform. --benji PS -- Game console platforms would be very very cool as well. For me, I'd be interested in the cell processor, for the PS3. HOWEVER, since the native PS3 SDK is proprietary (with a $10,000 licensing fee), and since linux on the PS3 uses artificially crippled hardware, my interest in developing anything on the PS3 is little more than casual curiosity.