Video Codecs?

2009-09-30 Thread Benji Smith

Does anybody know of any D libraries implementing or wrapping video codecs?

I need to read video files (AVI or MPEG would be fine) using DIVX,
XVID, or any other popular codec. In addition to playing those files in 
a media player control, I need to extract individual frames and perform 
various filtration and processing operations on them, for a computer 
vision project I'm about to start working on.


I looked around at DSource but didn't find anything there. Any ideas?

--benji


Re: Template Metaprogramming Made Easy (Huh?)

2009-09-11 Thread Benji Smith

Rainer Deyke wrote:

I'm not entirely happy with the way Scala handles the division between
statements - Scala's rules seem arbitrary and complex - but semicolons
*are* noise, no matter how habitually I use them and how much time I
waste removing them afterwards.


I don't know anything about Scala, but I've been working on an 
Actionscript compiler recently (the language is based on ECMAScript, so 
it's very much like JavaScript in this respect) and the optional 
semicolon rules are completely maddening.


The ECMAScript spec basically says: virtual semicolons must be inserted 
at end-of-line whenever the non-insertion of semicolons would result in 
an erroneous parse.


So there are really only three ways to handle it, and all of them are 
insane:


1) Treat the newline character as a token (rather than as skippable 
whitespace) and include that token as an optional construct in every 
single production where it can legally occur. This results in hundreds 
of optional semicolons throughout the grammar, and makes the whole thing 
a nightmare to read, but at least it still uses a one-pass CFG.


CLASS :=
  "class"
  NEWLINE?
  IDENTIFIER
  NEWLINE?
  "{"
  NEWLINE?
  (
MEMBER
NEWLINE?
  )*
  "}"

2) Use lexical lookahead, dispatched from the parser. The tokenizer 
determines whether to treat a newline as a statement terminator based on 
the current parse state (are we in the middle of a parenthesized 
expression?) and the upcoming tokens on the next line. This is nasty 
because the grammar becomes context-sensitive and conflates lexical 
analysis with parsing.


3) Whenever the parser encounters an error, have it back up to the 
beginning of the previous production and insert a virtual semicolon into 
the token stream. Then try reparsing. Since there might be multiple 
newlines contained in a single multiline expression, it might take 
arbitrarily many rewrite attempts before reaching a correct parse.
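
Roughly, the retry loop looks like this (just a sketch in D; TokenStream, 
parseStatement, and the rest are made-up names):

    // Hypothetical sketch of the backup-and-retry strategy.
    Statement parseWithVirtualSemicolons(TokenStream tokens)
    {
        size_t mark = tokens.position;
        while (true)
        {
            try
            {
                return parseStatement(tokens); // attempt a normal parse
            }
            catch (ParseException e)
            {
                // Back up to the start of the production and promote the
                // next newline to a virtual semicolon, then re-parse.
                tokens.rewindTo(mark);
                if (!tokens.promoteNextNewlineToSemicolon())
                    throw e; // no newlines left to convert: a real error
            }
        }
    }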


The thing about most compiler construction tools is that they don't 
allow the parser to feed contextual information back into the tokenizer, 
and they're not designed for backup-and-retry processing or for the 
insertion of virtual tokens into the token stream.


Ugly stuff.

Anyhoo, I know this is waaay off topic. But I think any language 
designer including optional semicolons in their language desperately 
deserves a good swift punch in the teeth.


--benji


Re: reddit.com: first Chapter of TDPL available for free

2009-08-10 Thread Benji Smith

Andrei Alexandrescu wrote:

Daniel Keep wrote:


Andrei Alexandrescu wrote:

Michel Fortin wrote:

On 2009-08-09 11:10:48 -0400, Andrei Alexandrescu said:


It's also arguable that all functions in std.string should take
const(char)[]. Or, you know, const(T)[], since D supports encodings
other than UTF-8, despite what std.string leads you to believe.

Yah, I think they should all be parameterized so they can work with
various character widths and even encodings.

But shouldn't they work with *ranges* in general, a string being only
a specific case?

That's true as well! In my dreams, me and the famous actress... oh wait,
wrong dream. In my dreams, I eliminate std.string and put all of its
algorithms, properly generalized, in std.algorithm, to work on more than
just arrays, and more than just characters.

Andrei


How do you define 'tolower' on non-characters?


That and others would remain specific to characters. I do hope to be 
able to abstract functions such as e.g. strip().


Andrei


How would you generalize the string functions into ordinary array 
functions while still taking into account the different character types?


For example...

   dchar needle = 'f';
   char[] haystack = "abcdefg".dup;
   auto index = haystack.indexOf(needle);

That code is roughly equivalent to this code for generalized arrays, 
which seems reasonable enough...


   float needle = 2.0;
   double[] haystack = [ 1.0, 2.0, 3.0 ];
   auto index = haystack.indexOf(needle);

...since "float" is implicitly castable to "double".

But the string example has weird monkey-business going on under the 
covers, since dchar is wider than char, and therefore a single dchar 
element might consume multiple slots within the char[] array.
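
You can see the striding directly (a quick sketch using Phobos's std.utf):

    import std.stdio;
    import std.utf;

    void main()
    {
        char[] haystack = "caf\u00E9s".dup; // "cafés" encoded as UTF-8

        // One dchar, but two slots in the char[] array:
        writeln(haystack.length);     // 6 code units for 5 code points
        writeln(stride(haystack, 3)); // 2: the 'é' spans indexes 3 and 4
    }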


Are there any analogous examples of that behavior with other types, 
where you'd search for a single element striding multiple indexes within 
an array of narrower values?


--benji


Re: Naming things in Phobos - std.algorithm and writefln

2009-08-05 Thread Benji Smith

Daniel Keep wrote:

That way, if someone writes logging functions one day that takes
formatted strings in the same way, he can reuse the convention:

log
logLine
logFormat
logLineFormat

instead of "log", "logln", "logf", and "logfln". If you create a hash
function, you can reuse the pattern too:

hash
hashLine
hashFormat
hashLineFormat

instead of "hash", "hashln", "hashf" and "hashfln". And it goes on.


How is this an improvement?  If we accept that people know what the "f"
and "ln" suffixes mean (and given that they will be exposed to this in
the course of writing a Hello, World! program), what benefit is gained
from increasing the length and complexity of the identifiers?

Saying you can re-use the convention is irrelevant because the exact
same thing can be said of the shorter suffixes.


The thing about one-letter abbreviations is that they mean different 
things in different contexts. An "f" might mean "formatted" in a 
"writefln" function, but it means "file" in an "ifstream" and "floating 
point" in the "fenv" module.


In those cases (and in many more), there's no convention that can be 
reused. You just have to memorize stuff. Memorization was a perfectly 
acceptable solution back in the days of C, when standard libraries were 
small. But I think any modern standard library, with scores of modules 
and hundreds (or thousands) of functions, needs a better strategy.


Coming from a Java background, I much prefer to give up terseness in 
favor of clarity. Though I recognize that verbosity has its own 
pitfalls, I think it's the lesser evil.


--benji


Re: DIP6: Attributes

2009-08-04 Thread Benji Smith

Frank Benoit wrote:

Andrei Alexandrescu schrieb:

Ary Borenszweig wrote:

call!(foo)(5, "hello")

with variadic args?

Well some don't like to need to remember the order of arguments.

Andrei


Assigning the argument by name instead of by order has two other benefits
I can think of...
1. On the call side, it documents what the given values are used for.
2. It may be possible to let all parameters have default values and, for
example, just give a value for the last parameter. This is not possible
with just the parameter order.


But these aren't issues with reflection. These are just the same 
function calling rules applied elsewhere in the language:


1) If you want to call a function: you must know its name.

2) If you want to pass parameters: you must know the correct order.

I can't imagine a circumstance where someone uses reflection to call a 
function and knows how to create the correct set of arguments, but 
doesn't know what order to put them in.


--benji


Re: property syntax strawman

2009-08-03 Thread Benji Smith

Andrei Alexandrescu wrote:

Jarrett Billingsley wrote:

I think it's funny that for a week, Andrei has been arguing against
throwing around new syntax to solve this problem, and that's exactly
what you guys have come up with.  Really, how much more complicated
would this make the parser, compared to adding a new attribute?


We couldn't find a good solution without adding new syntax, so this is 
now on the table. Adding syntax or keywords is the next thing to look 
at. I'd still be unsatisfied if:


(a) there would be significant syntactic noise in defining a read-only 
property


(b) we had to add a keyword


Andrei


The nice thing about a keyword (or an @attribute) is that it's 
greppable. Syntax, not so much.


--b


Re: property syntax strawman

2009-08-03 Thread Benji Smith

Steven Schveighoffer wrote:
On Mon, 03 Aug 2009 11:18:26 -0400, Daniel Keep wrote:

You can't trivially disambiguate between the getter and the setter with
the current system, either.  How is this a new issue?


You can't do it *trivially*, but you can do it (that's another issue that 
probably should be addressed in general for overloaded functions).  


Agreed. I don't think this is so much an issue with properties as it's 
an issue with overloads. A good solution that works really well for 
overloads will work well for properties too.




Besides which, why can't you just add this:

  __traits(getter, aggregate.property)

Problem solved.


That works too.  That's probably the most sensible solution I've seen.  
Has my vote.


-Steve


Me too.

--b


Re: property syntax strawman

2009-08-03 Thread Benji Smith

Andrei Alexandrescu wrote:

Michiel Helvensteijn wrote:

void empty.set(bool value) { ... }
bool empty.get() { ... }

and have the same meaning as my earlier example.


Yah, I was thinking the same. This is my #1 fave so far.

Andrei


Agreed!

I see the appeal of putting getter/setter pairs within a single pair of 
braces, since it groups them together as one logical unit.


BUT...

I think it's more valuable to define them as completely separate, since 
you sometimes want to define get/set properties with different access 
modifiers (protected setter & public getter == very nice). And then the 
brace-enclosed syntax looks kinda goofy to my eyes:


   property MyProperty int {
  public get; protected set; // WEIRD
   }


--benji


Re: DIP6: Attributes

2009-08-03 Thread Benji Smith

Steven Schveighoffer wrote:
Annotations have more usages than just how to serialize.  Some uses I've 
seen in C#:


* hints to an IDE about a GUI component (what it does, what properties 
to allow setting via the visual builder)
* hints to the debugger about when to skip debugging certain functions 
(to avoid stepping into mundane crap such as property getters).
* hints to another program about which classes would be interesting when 
dynamically loading a library


In Actionscript (and the Flex framework), one very handy use of 
annotations is to mark a public field as "bindable".


   class MyClass {

  [Bindable]
  public var MyField:int = 0;

   }

In this example, whenever the "MyField" value is updated, a 
property-change event will be sent to all listeners. The XML-based Flex 
framework uses those annotations to create (unidirectional or 
bidirectional) bindings between variables.


   <mx:Application xmlns:mx="http://www.adobe.com/2006/mxml">

      <mx:HSlider id="slider" minimum="50" maximum="300"/>

      <mx:Image source="photo.jpg"
         width="{slider.value}"
         height="{slider.value}"/>

   </mx:Application>

This creates a window with two controls, a horizontal numeric slider and 
an image. Whenever the user drags the slider control, the width and 
height of the image automatically update themselves.


The reason this works is that the "value" field of the "HSlider"
object is marked with the "Bindable" annotation. The compiler silently 
converts the field into a property getter/setter pair, and the setter 
sends out property-change events whenever called.


(Good thing Actionscript properties exist, with a syntax identical to 
normal fields, or else the automatic data binding wouldn't work!)


The cool thing that makes this work is that the compiler can perform 
code transformation based on the existence of various annotations.


--benji


Re: DIP6: Attributes

2009-08-03 Thread Benji Smith

Don wrote:

Ary Borenszweig wrote:

http://www.prowiki.org/wiki4d/wiki.cgi?LanguageDevel/DIPs/DIP6


This looks like a solution in search of a problem. What's the problem 
being solved?


Keyword proliferation for a zillion tiny features? Annotations would 
help with that very nicely.


--benji


Re: Omissible Parentheses...

2009-08-03 Thread Benji Smith

Denis Koroskin wrote:

Stdout("Hello, World!").newline.newline.newline;


Ugh. This is one of the few things about Tango that really drives me 
nuts. I hate all the usage of the opCall overload and non-parenthesized 
function calls.


At first glance, that code doesn't make any sense to me. My brain just 
doesn't grok what's going on. It takes me a split second to mentally 
parse it.


--benji


Re: The XML module in Phobos

2009-08-01 Thread Benji Smith

Michel Fortin wrote:

On 2009-08-01 00:04:01 -0400, Benji Smith  said:


But XML documents aren't really lists. They're trees.

Do ranges provide an abstraction for working with trees (other than 
the obvious flattening algorithms, like breadth-first or depth-first 
traversal)?


Well, it depends at what level you look. An XML document you read is 
first a list of bytes, then a list of Unicode characters, then you 
convert those characters to a list of tokens -- the Tango pull-parser 
sees each tag and each attribute as a token, SAX defines each tag 
(including attributes) as a token and calls it an event -- and from that 
list of tokens you can construct a tree.


The tree isn't a list though, and a range is a unidimensional list of 
something. You need another interface to work with the tree.


But then, from the tree, you can create a list in one way or another 
(flattening, or performing an XPath query for instance) and then you can 
have a range representing the list of subtrees for the query if you 
want. That's pretty good since with a range you can lazily iterate over 
the results.


Oh sure. I agree that a range-based way of iterating over tokens is 
cool. And a range-based API for walking through the results of an XPath 
query would be great. But the real meat and potatoes of an XML API would 
need to be something more DOM-like, with a tree structure.


The only reason I chimed in, in the first place, was Andrei's post 
saying that a replacement XML parser "ideally outputs ranges".


I don't think that's right. Ideally, an XML parser outputs a tree structure.

Though a range-based mechanism for traversing that tree would be nice too.

--benji


Re: new DIP5: Properties 2

2009-08-01 Thread Benji Smith

Bill Baxter wrote:

On Fri, Jul 31, 2009 at 10:09 PM, Andrei Alexandrescu wrote:

Benji Smith wrote:

So the clusterfuck of unenforceable and useless conventions is already
here. Here's my suggestion: if you think putting parentheses on a no-arg
function is stupid, then it should be a syntax error for them to exist. That
wouldn't be my first choice, but it'd be a thousand times better than the
situation with optional parens.

--benji

I agree that it's not good to have two ways of doing the same thing. Now
think of it for a second: a full-blown language feature has been proposed to
not fix that, but reify it.


D already has a *truckload* of such features. Aliases, typedefs, renamed 
imports, and overloaded operators all exist solely so that a programmer 
can pretend that one thing is another thing, so that an API designer can 
more precisely express the *intent* of the code, and with semantics that 
are enforced by the compiler.
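
A couple of trivial examples of what I mean (just a sketch):

    // Each of these lets a programmer present one thing as another:
    alias int UserId;      // same type, but the intent is explicit
    typedef int Celsius;   // a distinct type carved out of int
    import io = std.stdio; // a whole module under a new name

    void greet() { io.writeln("hello"); }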


Compared with those other features, I don't see what's so different 
about the properties proposals.


--benji


Re: new DIP5: Properties 2

2009-08-01 Thread Benji Smith

Andrei Alexandrescu wrote:

Thanks for these great points. As an additional example, most ranges 
define the method


bool empty() { ... }

whereas infinite ranges define the enum

enum bool empty = false;

It follows that if a range user wants to be compatible with finite and 
infinite ranges, they must always omit the "()". It would be nice if the 
range's definition could enforce that.



Andrei


Huh. How does this reconcile with your previous posts, where you said 
it'd probably be a bad idea for the API designer to mandate the function 
calling style of the API consumer?


Is this the same issue, and you've changed your mind? Or do you see this 
as a different issue?


--benji


Re: Omissible Parentheses...

2009-08-01 Thread Benji Smith

Andrei Alexandrescu wrote:

Denis Koroskin wrote:
On Sat, 01 Aug 2009 21:04:43 +0400, Chad J wrote:



Omissible Parentheses

Could someone remind me why we don't remove these?

So far I have
- They save typing.
- Removing them breaks backwards compatibility.
- They allow some features of properties, but with a list of limitations
and gotchas.

This is not intended to be a deep discussion.  I'm writing a piece on
properties, so I'm gathering information.


Andrei likes them.


http://igsoft.net/dpolls/poll/results.php?pollid=1
http://igsoft.net/dpolls/poll/results.php?pollid=2


Andrei


If I'm not mistaken, each of those polls shows a two-to-one preference 
for getting rid of omissible parentheses and introducing a dedicated 
property syntax of some kind.


--benji


Re: new DIP5: Properties 2

2009-07-31 Thread Benji Smith

Andrei Alexandrescu wrote:

Steven Schveighoffer wrote:
So to sum up, with this feature lack of parentheses would imply no 
action, but would not be enforced.  However, it would be considered 
incorrect logic if the rule was not followed, similar to naming your 
functions something other than what they do.


I am leery of such a feature. It essentially introduces a way to define 
conventions that are in no way useful to, or checked by, language rules. 
In my experience this has been a bad idea more often than not.


Like it or not, that's exactly the situation we have now, with the 
(sometimes)-optional parentheses. Some people are using a convention of 
never using the optional parens. Other people use the parens only when a 
function performs an action, and avoid them otherwise. And some other people 
(like me) always use the parens.


So the clusterfuck of unenforceable and useless conventions is already 
here. Here's my suggestion: if you think putting parentheses on a 
no-arg function is stupid, then it should be a syntax error for them to 
exist. That wouldn't be my first choice, but it'd be a thousand times 
better than the situation with optional parens.


--benji


Re: The XML module in Phobos

2009-07-31 Thread Benji Smith

Michel Fortin wrote:
> Benji Smith wrote:


Usually, I use something like XPath to extract information from an XML 
doc. Something like this:


auto doc = parser.parse(xml);
auto nodes = doc.select("/root//whatever[@id]");

I can see how you might do depth-first or breadth-first traversal of 
the DOM tree, or inorder traversal of the SAX events, with a range. 
But that's now how most people use XML. Are there are other range 
tricks up your sleeve that would support the a DOM or XPath kind of 
model?


A range is mostly a list of things. In the example above, doc.select 
could return a range to lazily evaluate the query instead of computing 
the whole query and returning all the elements. This way, if you only 
care about the first result you just take the first and don't have to 
compute them all.


Ranges can be used everywhere there are lists, and are especially 
useful for lazy lists that compute things as you go. I made an XML 
tokenizer (similar to Tango's pull parser) with a range API. Basically, 
you iterate over various kinds of token made available through an 
Algebraic, and as you advance it parses the document to get you the next 
token. (It'd be more useful if you could switch on various kinds of 
tokens with an Algebraic -- right now you need to use "if 
(token.peek!OpenElementToken)" -- but that's a problem with Algebraic 
that should get fixed I believe, or else I'll have to use something else.)


But XML documents aren't really lists. They're trees.

Do ranges provide an abstraction for working with trees (other than the 
obvious flattening algorithms, like breadth-first or depth-first traversal)?


--benji


Re: new DIP5: Properties 2

2009-07-30 Thread Benji Smith

Nick Sabalausky wrote:
"Andrei Alexandrescu"  wrote in message 
news:h4lsuo$au...@digitalmars.com...
For me, I get a breath of fresh air whenever I get to not write "()". I 
can't figure how some are missing it.




Every time I call a parameterless function in D, I curse under my breath at 
how incredibly sloppy it is. Great, just what I need: Yet another thing that 
forces me to make the completely unnecessary choice between using something 
inconsistently or making up and sticking to a completely arbitrary 
convention that can't be enforced. Sloppy, sloppy, sloppy. Especially 
considering it's all for the sake of a "feature" that doesn't accomplish a 
damn thing, doesn't solve any problem, not even a trivial one, doesn't do 
anything but clutter the language. 


My thoughts exactly.

--benji


Re: properties

2009-07-30 Thread Benji Smith

Andrei Alexandrescu wrote:

Steven Schveighoffer wrote:
On Tue, 28 Jul 2009 16:08:58 -0400, Andrei Alexandrescu wrote:



Steven Schveighoffer wrote:

However, when I see:
 x.empty;
 I can't tell what is implied here.


You can. In either C# or D it could execute arbitrary code 
that you'd better know what it's supposed to do. D simply doesn't make 
it "bad style" as C# stupidly does.


still not getting it, are you...

Just forget it, I think this is a lost cause, I keep making the same 
points over and over again, and you keep not reading them.


I do read them and understand them. I mean, it's not rocket surgery. At 
the end of the day you say "x = a.b;" looks more like sheer access 
because that's what happens for fields already. Then you say "a.b()" in 
any context looks more like an action because it's clear that there's a 
function call involved. But your arguments are not convincing to me, and 
in turn I explained why. What would you do if you were me?


Andrei


I totally agree with Steven's arguments (and have enjoyed reading the 
discussion).


I think the reason he says you're "not getting it" is because your 
examples tend to be "a.b" whereas his examples tend to be "a.empty". In 
your examples, you've stripped away the distinct function/field names 
and presented the argument from the compiler's perspective: in terms of 
arbitrary symbols that might either perform a pointer dereference or a 
function invocation.


Steve's arguments, on the other hand, are all from the perspective of 
the programmer. The parentheses following the identifier act as 
*punctuation* that clarify intent.


Good?

Good.

--benji



Re: The XML module in Phobos

2009-07-30 Thread Benji Smith

Michael Rynn wrote:

I did look at the code for the xml module, and posted a suggested bug
fix to the empty elements problem. I do not have access rights to
updating the source repository, and at the time was too busy for this.


Andrei Alexandrescu wrote:
It would be great if you could contribute to Phobos. Two things I hope 
from any replacement (a) works with ranges and ideally outputs ranges, 
(b) uses alias functions instead of delegates if necessary.


Interesting. Most XML parsers either produce a "Document" object, or 
they just execute SAX callbacks. If an XML parser returned a range 
object, how would you use it?


Usually, I use something like XPath to extract information from an XML 
doc. Something like this:


   auto doc = parser.parse(xml);
   auto nodes = doc.select("/root//whatever[@id]");

I can see how you might do depth-first or breadth-first traversal of the 
DOM tree, or inorder traversal of the SAX events, with a range. But 
that's not how most people use XML. Are there other range tricks up 
your sleeve that would support a DOM or XPath kind of model?


--benji


Re: Properties: problems

2009-07-30 Thread Benji Smith

John C wrote:

Chad J wrote:

John C wrote:

Here's a couple of annoying problems I encounter quite often with D's
properties. Would having some form of property syntax fix them?

1) Array extensions:

  class Person {

string name_;

string name() {
  return name_;
}

  }

  auto person = getPerson();
  auto firstAndLast = person.name.split(' ');

The above line currently requires parentheses after 'name' to compile.



This one is weird.  After defining getPerson() I was able to rewrite the
last line into this and make it compile:

auto firstAndLast = split(person.name," ");


Yes, that's D's special array syntax, where free functions can be called 
as if they were "methods" of an array.


Yeah, this is one of those nasty cases where several different features 
(optional parentheses on functions & automatic extension method syntax 
on arrays) work ok in isolation, but where they have weird wonky 
behavior when combined. I've seen this one before.
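
A self-contained illustration of the combination (sketch):

    import std.stdio;

    // A free function whose first parameter is an array:
    int sumOf(int[] values)
    {
        int total = 0;
        foreach (v; values) total += v;
        return total;
    }

    void main()
    {
        int[] nums = [1, 2, 3];
        writeln(sumOf(nums));  // an ordinary call
        writeln(nums.sumOf()); // same function, array-"method" syntax
        // (and with the optional parens dropped, "nums.sumOf" reads
        // like a field access)
    }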


--benji


Re: Properties: a.b.c = 3

2009-07-30 Thread Benji Smith

Jarrett Billingsley wrote:

The issue is that the compiler accepts
no-effect modifications of temporary values as valid statements.
There is no setter being invoked here, nor should there be.

Or should there?  In the face of a value type, should the compiler
rewrite this code as

auto t = a.b();
t.c = 3;
a.b = t;

?  The last line of the rewrite is unnecessary if a.b() returns a
reference type or a byref struct.  But is this what people would
expect to happen?


I think the compiler should only rewrite the code (as above) if a.b() 
returns a struct, by value. The compiler can figure that out easily 
enough. Depending on the return types of all the different properties in 
a.b.c.d.e.f = 3, there might be a few ref types and a few value types 
returned. Each of those subexpressions would be rewritten with the 
appropriate semantics.


--benji


Re: Properties: a.b.c = 3

2009-07-30 Thread Benji Smith

Chad J wrote:

Steven Schveighoffer wrote:

struct Rectangle
{
float x,y,w,h;
}

class Widget
{
Rectangle _rect;
Rectangle rect() { return _rect; }
Rectangle rect(Rectangle r) { return _rect = r; }
// etc
}

void main()
{
auto widget = new Widget();

// DOES WORK:
auto tmp = widget.rect;
tmp.w = 200;
tmp.h = 100;
widget.rect = tmp;

// DOES NOT WORK:
// widget.rect.w = 200;
// widget.rect.h = 100;
}

Wouldn't the compiler write:

//widget.rect.w = 200 translates to
auto tmp1 = widget.rect;
tmp1.w = 200;
widget.rect = tmp1;

//widget.rect.h = 100 translates to
auto tmp2 = widget.rect;
tmp2.h = 100;
widget.rect = tmp2;

???

Unless you want some serious optimization requirements...

-Steve


It would.

The optimization you speak of is reference caching.  I often do it by
hand in deeply nested loops where it actually means a damn.  It's also
an optimization I think compilers should do, because it is useful in
many more cases than just this.

Using some manner of property syntax would not preclude the programmer
from writing the optimized version of the code by hand.


And, in fact, this exact kind of optimization is made very simple if the 
compiler uses a static single assignment (SSA) form for its internal 
code representation. The LLVM suite already does it.


--benji


Re: Properties: a.b.c = 3

2009-07-30 Thread Benji Smith

Nick Sabalausky wrote:
"Zhenyu Zhou"  wrote in message 
news:h4rfif$2os...@digitalmars.com...

e.g.
Rectangle rect(Rectangle r) {
 _rect = r;
 redraw();
 return _rect;
}

If you allow
widget.rect.w = 200;
widget.rect.h = 100;
you will have to write much more code to handle the painting correctly.
and we don't want to call redraw twice here



I've dealt with that sort of thing in C# and it's a trivial issue. When you 
write code such as the above, it's very clear that you're changing the rect 
twice. If that's a problem, you just do this:


widget.rect = Rect(200, 100);

Easy. 


It's kind of a moot point anyhow, because most respectable graphics 
frameworks will defer any rendering until all properties have been set. 
Something like this:


   class Rect {

  private int _w;
  private int _h;
  private bool _dirty;

  property set w(int value) {
 _w = value;
 _dirty = true;
  }

  property set h(int value) {
 _h = value;
 _dirty = true;
  }

  void draw() {
 if (_dirty) {
// rendering code
_dirty = false;
 }
  }

   }

Rendering code is *never* invoked from within a property-setter, and 
property values are never changed during rendering code. (Also, there's 
usually a separate "measurement" phase, following the manual 
property-setting phase, within which properties can be changed to suit 
the positional constraints but where no rendering occurs.)


Anyhow, I think those kinds of considerations are mostly orthogonal to a 
discussion of properties, in the general sense, except insofar as the 
existence of a property syntax makes it more convenient to implement 
things like dirty-flag marking, property-change listeners, and the like.


--benji


Re: Properties: a.b.c = 3

2009-07-30 Thread Benji Smith

Chad J wrote:

Chad J wrote:

Bill Baxter wrote:

On Wed, Jul 29, 2009 at 1:14 PM, grauzone wrote:

Chad J wrote:

Thinking about it a little more, the extra temporaries could run you out
of registers.  That still sounds like a negligible cost in most code.

Temporaries can be on the stack. That's not a problem.


How is that not a performance issue?  The stack is in main memory.

--bb

This is where my knowledge starts to run a bit thin.

So correct me if I'm wrong, but isn't something like the stack (or at
least the top/bottom/end in use) extremely likely to be in the nearest
cache (L1)?

If that's the case, then this kind of dereference is going to be of the
cheaper variety.


Also, really deep dot chains are unlikely to happen.  I just feel like
this won't create many more memory accesses than there were already.
Especially for people with 64 bit OSes on x86_64 that are not register
starved like the 32 bit x86.  On x86 you are hitting the stack all the
time anyways, and the extra access or two will go unnoticed.


Especially especially because, if you prevent the a.b.c = x syntax, the 
only thing that'll happen is you'll cause people to write all that code 
themselves. The same number of assignments will happen anyhow, but the 
user will have to write them all manually. I'm all for having the 
compiler automate the boilerplate stuff.


Also, note that the double-assignment case only happens when assigning 
to value types. Assigning to reference type properties will be unaffected.


--benji


Re: new DIP5: Properties 2

2009-07-27 Thread Benji Smith

Andrei Alexandrescu wrote:

Benji Smith wrote:
3) The existence of "magical" identifiers complicates the language 
design. Because the rules that apply to those magical identifiers are 
different from the rules that apply to non-magical identifiers.


Well I agree with some of your points but this is factually incorrect. 
There's nothing special about opXxx identifiers. The compiler simply 
rewrites certain operations into regular calls to those operators. 
That's all. I happen to find that very elegant, and in fact I'd want D 
to rely more often on simple rewrites instead of sophisticated special 
casing.


Andrei


I should have been more clear. I understand the rewriting part of the 
proposal. What I was referring to was the fact that an opGet_x 
identifier would shadow the declaration of a variable named "x", making 
it impossible, within the type itself, to directly reference the 
variable itself. So, in this case...


   class MyClass {

  private int x;
  public int opGet_x() {
return x;
  }

   }

...either the compiler would issue an error (my preference) or the 
private field would take precedence (within the class) in any name 
resolution logic. From outside the class, there would be no problem.


Also related, is this case:

   class MyClass {

  public int x;
  public int opGet_x();

   }

I assume the compiler would have to throw an error. Eventually, people 
would learn to give their fields different names than their properties 
(probably with an underscore prefix or something).


Anyhow, in both cases, I'd consider these to be changes to the 
language's identifier semantics. They're not *huge* changes, but the 
introduction of those magical rewriting rules is still something a 
programmer would have to be aware of. And those are the reasons I'd 
rather shy away from magical name-rewriting mechanisms. (NOTE: I have no 
problem with the implementation of the other operator overloading names. 
They work exactly as expected.)


--benji


Re: new DIP5: Properties 2

2009-07-27 Thread Benji Smith

On Mon, Jul 27, 2009 at 4:34 PM, Chad J wrote:

This seems to me like it adds more syntactic clutter than adding a
keyword would:

PropertyDecl:
   PropertyGetter
   PropertySetter

PropertyGetter:
   Type 'opGet_' Identifier '(' ')'

PropertySetter:
   Type 'opSet_' Identifier '(' Type ')'




Jarrett Billingsley wrote:

Nono, they're just functions with "magical" names.


I agree with Chad. The opGet_X syntax is terrible, with both 
syntactic and semantic clutter. To wit:


1) This convention has four syntactic parts: "op", "Get|Set", "_", and 
an identifier. Adding a new keyword (like "property") would only add one 
syntactic element to the declaration.


2) A property is not an operator. So the "op" prefix is lying to you.

3) The existence of "magical" identifiers complicates the language 
design. Because the rules that apply to those magical identifiers are 
different from the rules that apply to non-magical identifiers.


There's nothing wrong with the mechanics of the proposal. I especially 
like how it allows the getter/setter to have different protection 
attributes, and that it allows each function to be overridden 
separately. You could even implement the getter in a read-only 
superclass and implement the setter in a read-write subclass. Nice!


But I think the same thing can be more elegantly written using the 
"property" keyword:


  private int _x;
  public property int X() { return _x; }
  protected property void X(int value) { _x = value; }

The only disadvantage I see there is the introduction of a keyword. And 
that's definitely a disadvantage. But, compared to the "op" syntax, I 
think it's the lesser of two evils.


--benji


Re: Reddit: why aren't people using D?

2009-07-27 Thread Benji Smith

Andrei Alexandrescu wrote:

Rainer Deyke wrote:

Nick Sabalausky wrote:
I can't be nice about this: Any programmer who has *any* aggravation 
learning any even remotely sane property syntax is an idiot, period. 
They'd have to be incompetent to not be able to look at an example 
like this:


// Fine, I'll throw DRY away:
int _width;
int width
{
get { return _width; }
set(v) { _width = v; }
}

And immediately know exactly how the property syntax works.


I don't know exactly how this is supposed to work.  The basic idea is
obvious, but:
  - How does it interact with inheritance?  Can I override properties?
Can I partially override properties (setter but not getter)?
  - Can I write a setter that accepts another type?
  - Can I write a templated setter that accepts *all* types?  If so, how?
  - Can I create a delegate from a setter/getter?  If so, how?
  - I assume that getters/setters can have individual access specifiers
(i.e. private/protected/public), but is that really the case?

Dedicated property syntax isn't hard to learn, but it's not as obvious
as you make it out to be.  Note that none of these issues exist with
opGet_foo, which follows the same rules as all functions.



+1

Andrei


Also agree. The C# syntax is a little too complex for my taste, and it 
makes some things ugly or impossible (like, what if I want a public 
getter but a protected setter?)


I like the mechanics of the opGet_Xxx proposal, but aesthetically, it 
just makes my eyes bleed (as do the other "op" functions, like opApply, 
that don't technically overload any "op"erators).


For my money, the best solution is a simple "property" keyword as a 
function modifier. Only functions with the "property" modifier would be 
allowed to pose as fields (getters called without parens, setters called 
using assignment syntax). But, in all other respects, they should act 
just like functions.
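
Something like this (hypothetical syntax, obviously):

    private int _width;

    property int width() { return _width; }
    property void width(int value) { _width = value; }

    // Usage poses as a field:
    auto w = widget.width; // calls the getter; no parens allowed
    widget.width = 640;    // calls the setter via assignment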


--benji


Re: Developing a plan for D2.0: Getting everything on the table

2009-07-26 Thread Benji Smith

Andrei Alexandrescu wrote:

Benji Smith wrote:
Maybe if Andrei put together a list of 
missing Phobos functionality, we could get people from the community 
to flesh out the libs.


I think we'd need at a minimum:




That would be great. In general, it would be awesome to gather more 
contributions from the community. There's a thirst to contribute and 
we'll be glad to involve this group in some serious design, e.g. for 
concurrency support, as well as accept code for functionality that 
belongs to the standard library.


In the bulleted list above there are many mini-projects that are 
confined enough to be done by one willing individual in a relatively 
short time.


Are there contributor guidelines somewhere?

For example, should the author of a container library prefer classes or 
structs? Should other (non-container) modules accept container classes 
as arguments? Or only container interfaces (if there are any such 
things) or just ranges?


Is it appropriate to use an empty struct purely as a namespace for the 
introduction of free functions? Or should free functions be placed at 
the module level?


Is it appropriate to define multiple classes, structs, templates, etc 
within a single module? What considerations should inform the decision 
regarding the placement of module boundaries?


What constitutes appropriate/inappropriate usage of opCall?

Anyhoo...

Point being, Phobos_1 was a hodgepodge of different conventions and 
styles. Tango_1 was considerably better, in terms of stylistic 
uniformity. But it used a very different set of idioms than Phobos_1 
(lots of predicate functions, "sink" delegates, etc). Probably any 
author contributing code to Phobos_2 should spend a little time getting 
up to speed with the preferred idioms before writing code.


I suspect that my humble little JSON parser uses styles and idioms that 
would clash with the majority of Phobos_2 (since my programming pedigree 
comes from Java, C#, JavaScript, and Perl much more so than C or C++).


--benji


Re: Developing a plan for D2.0: Getting everything on the table

2009-07-22 Thread Benji Smith

Jason House wrote:

Other, less technical items:
• A clear and "finalized" spec. If it isn't implemented, it should be yanked 
(or clearly marked as pending)
• A plan for library support. Not just Tango, but also Phobos. D1 Phobos could 
not evolve.


In D1, I enthusiastically used Tango. I haven't used D2 yet (because all 
my code is heavily tied to the Tango libs), but I suspect that when D2 
is finalized, I'll port everything over to Phobos.


I've read all the Phobos2 development discussions here (most notably the 
range discussions), but what about the feature disparities between the 
two libraries. What types of functionality are currently present in 
Tango but absent in Phobos? Maybe if Andrei put together a list of 
missing Phobos functionality, we could get people from the community to 
flesh out the libs.


For example, I have a JSON parser implementation that I'd be happy to 
contribute.


--benji


Re: Dynamic D Library

2009-07-19 Thread Benji Smith

Nick Sabalausky wrote:
"BCS"  wrote in message 
news:78ccfa2d4382d8cbd4ffb8875...@news.digitalmars.com...

Reply to teo,


Well, to some extent this will do the job, but at some point you would
need to extract some stuff and put it in libraries, so that it can be
reused by other applications. Think about an application which
consists of several executables which work together and should share
common stuff. Wouldn't you extract it into a library?

Yes, as a static .lib type library that is statically linked in as part of 
the .exe.


Exactly, and it doesn't even have to be a compiled .lib, it could just be a 
source-library. I do that all the time. I really don't see any reason to 
think that modularity and code-reuse would require linking to be dynamic. At 
least certainly not in the general case.


I agree that source-level modularity, and static linking are preferable 
most of the time (especially given D's dependency on templates, which 
don't work so well in compiled libraries).


But there are plenty of legitimate situations that mandate dynamic 
linking, and I think the standard library needs a better solution than 
what it currently has.


--benji


Re: Dynamic D Library

2009-07-17 Thread Benji Smith

Daniel Keep wrote:

If we have, for example, a C app that is using D code as plugins, each
plugin will ask the system for "dmdrt.dll" using its minimal embedded
DDL stub.  But since they're system calls, we should only get one copy.
 I'm not sure exactly how the system will share that library, though;
whether it's per-process or system-wide.

In any case, the DDL stub should be able to pull in the full DDL from
dmdrt.dll and then use that to link everything together.

The nice bonus of this is that DDL just becomes an implementation detail
AND we can say "yes, we can do DLLs in D!" even if we're only using them
to contain a DDL payload.

The one downside I can think of is that if you DID want to distribute a
D plugin for a C/C++ program, you'd also need to ship dmdrt.dll
alongside it.  Although, in that case, it probably wouldn't hurt
anything (aside from memory usage) to simply statically link the runtime
and standard library in; if the host app is C/C++, then the plugins
probably won't be able to stomp all over each other.


My primary use of D right now is to build DLLs for C++ applications, so 
I'd be very annoyed if the standard Windows DLL functionality became 
more convoluted.


For custom loading into D applications, why even bother using a DLL as a 
container? Why not design a file format (maybe even DDL as it currently 
exists) and use that as the primary dynamic loading & linking mechanism 
on all platforms?


--benji


Re: Dynamic D Library

2009-07-16 Thread Benji Smith

Jarrett Billingsley wrote:

On Thu, Jul 16, 2009 at 4:44 PM, teo wrote:


 For two, there is *no problem*
with creating D libraries on any platform other than Windows, and it
is entirely through Windows' fault that it has the problems it does
with DLLs.


Well, let us assume that you can create dynamic libraries in D and you need to 
include in each of them Phobos (later maybe just the D Runtime). What is the 
benefit of that? Can you imagine all your nice dynamic libraries (DLLs, SOs, 
etc.) written in D and all of them including a huge “payload”? Wouldn't it be 
better to have just a simple library containing only the stuff you need?


I don't think you're getting it.

ON WINDOWS, DLLs are not allowed to have unresolved externals.  So if
you create a DLL in D, yes, Phobos will be linked in.  THERE IS
NOTHING THAT CAN BE DONE ABOUT THAT.  It's a limitation on the way
DLLs work.

ON EVERY OTHER OPERATING SYSTEM (Linux, Unix, OSX, *whatever*), shared
libraries CAN have unresolved externals, so Phobos *does not* have to
be included in the shared libraries.  Shared libraries ALREADY work
the way you expect them to on every OS besides Windows.

The ONLY way to solve the problem with DLLs on Windows is to not use
DLLs.  Java solves it by not using any platform-dependent libraries,
instead using its own .class files.  This is *exactly* what DDL does.

So, I'm not sure what you see as the problem here.  DDL works fine on
Windows.  Use it.


You learn something new every day. That's pretty cool.

Incidentally, this is exactly the kind of stuff that I'd love to see 
built right into DRuntime or Phobos.


I don't have a use for it right now (cuz my project is simple enough not 
to need dynamic loading), but in the future, I'd be reluctant to use DDL 
because:


1) Dynamic loading is something that, to me, seems completely 
fundamental to the runtime system, and I'd be hesitant to trust a 
third-party library to keep up-to-date with the current compiler & 
standard library.


2) DDL isn't even really a third-party library. It's more like a 
fourth-party, since (I assume) it really requires the h3r3tic patch to 
work correctly.


Building this kind of functionality into the standard library would make 
those issues irrelevant.


These kinds of issues are the ones that excite me the most and are the 
things I'd like to see D pay the most attention to. From my perspective, 
features of the runtime and standard library are often much more 
compelling than new language features.


--benji


Re: Number literals (Was: Re: Case Range Statement ..)

2009-07-12 Thread Benji Smith

Andrei Alexandrescu wrote:

Benji Smith wrote:

Andrei Alexandrescu wrote:
Anyhow... it would be a bummer if the negative atmosphere as of late 
in the group would cause people like you to just lose interest. I can't 
understand what's going on.


I think it would help if you weren't so condescending to people all 
the time. People don't like that much.


I understand. My perception is that negativity predates my being 
condescending, which roots from exasperation. For every annoying message 
of mine there are dozens of patient messages making a similar point. But 
you're right, if a point is made the wrong way its correctness is not 
that relevant anymore.



I empathize. I enjoy issuing a sly and well-worded skewer just as much 
as the next guy. But, when those kinds of retorts are perceived as 
coming from the top down, they create resentment.


Like it or not, you're "the man".

:)

--benji


Re: Number literals (Was: Re: Case Range Statement ..)

2009-07-11 Thread Benji Smith

Andrei Alexandrescu wrote:
Anyhow... it would be a bummer if the negative atmosphere as of late in 
the group would cause people like you to just lose interest. I can't 
understand what's going on.


I think it would help if you weren't so condescending to people all the 
time. People don't like that much.


Re: optlink on multicore machines

2009-06-30 Thread Benji Smith

Derek Parnell wrote:

On Tue, 30 Jun 2009 20:54:55 +0200, dennis luehring wrote:


Walter Bright schrieb:

BCS wrote:

It IS running fine on 3 or 4 multicore machines around here.

That's a mystery, then.
that's the wonderful world of hard-to-catch-and-reproduce multithreading 
problems - hope D will help here in the future


Ok then ... so optlink is going to be rewritten in D - excellent! And good
luck to the brave developer too.



Just out of curiosity... Why is a linker so hard to write?

A few years ago, I developed a small domain specific language and 
implemented its compiler, outputting bytecode for a very specialized 
(and limited purpose) virtual machine.


In my case, I decided it was easier to give good error messages if the 
compiler & linker were a single entity. I've always been annoyed by the 
discrepancy between compilers and linkers (mostly because build tools 
have their own special languages, pointlessly different than the 
development language). So my compiler combined compilation and linking 
into a single step.


Every time the compiler encountered an "import" statement, it checked to 
see whether a symbol table existed for the imported module and, if not, 
it added the module to the parse queue. After processing a new module, 
it would add the resultant code into a namespace-aware symbol table for 
the given module.


Once the parse queue was empty, I checked for unresolved symbols, cyclic 
dependency errors, etc. If there were no other referential errors (and 
if all the other semantic checks passed), then I'd start the 
code-generation process at the main entry point. The whole program was 
represented as a DAG, and writing bytecode was as simple as traversing 
that graph. Since the "linking" behavior was built right into the 
compiler, it was a piece of cake.
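
The core of it was just a worklist (a sketch; the helper names are made up):

    // Import-driven parse queue: parse each module once, then enqueue
    // any imported modules that haven't been seen yet.
    void compileAll(string entryModule)
    {
        SymbolTable[string] tables;
        string[] queue = [entryModule];

        while (queue.length > 0)
        {
            string mod = queue[0];
            queue = queue[1 .. $];
            if (mod in tables) continue;

            auto ast = parseModule(mod);         // hypothetical parser
            tables[mod] = buildSymbolTable(ast); // hypothetical

            foreach (imp; ast.imports)
                if (!(imp in tables))
                    queue ~= imp;
        }

        // ...then the referential checks, then code generation from main.
    }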


Anyhow...

Whenever someone on the NG complains about optlink, the inevitable 
conclusion is that it would be a huge undertaking to produce a new or 
improved linker.


Why?

Seems to me that a new linker implementation would be relatively 
straightforward. There are really only three steps:


1) Parse object files.
2) Create DAG structures using references in those object files.
3) Walk the graph, copying the code (with rewritten addresses) into the 
final executable.
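
In sketch form, step 3 might look like this (hypothetical types, with 
relocation records glossed over entirely):

    struct ObjSymbol
    {
        string name;
        ubyte[] code;
        string[] references; // symbols this code refers to
    }

    ubyte[] link(ObjSymbol[string] symbols, string entry)
    {
        ubyte[] image;
        size_t[string] addresses;

        void visit(string name)
        {
            if (name in addresses) return;
            auto sym = symbols[name];
            addresses[name] = image.length; // lay this symbol out
            image ~= sym.code;
            foreach (dep; sym.references)
                visit(dep); // pull in everything reachable
        }

        visit(entry);
        // A real linker would now patch every reference site using the
        // addresses table; that's the part elided here.
        return image;
    }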


Is it really more complex than that? What am I missing?

(Caveat: I don't know much about Windows PE, or any of the many other 
object file formats. Still, though... it doesn't seem like it could be 
THAT difficult. The compiler has already done most of the tricky stuff.)


--benji


Re: std.string and std.algorithm: what to do?

2009-05-16 Thread Benji Smith

Andrei Alexandrescu wrote:

Yah, I defined

enum CaseSensitive { no, yes }


Minor nitpick: there are lots of different ways to canonicalize text 
before performing a comparison. ASCII case conversions are just one way.


Instead of an enum with a yes/no value, what about future-proofing it 
with something more along the lines of...


   enum CaseSensitivity {
  None, Ascii, UnicodeChar, UnicodeSurrogatePair
   }

...or something like that.

The yes/no enum will outlive its usefulness before long.

--benji


Re: I wish I could use D for everything

2009-05-02 Thread Benji Smith

Brad Roberts wrote:

I'm going to play devils advocate too...

struct ctor/dtor's are simplifiers.  They remove a hard to explain difference
and aren't even a little bit hard to understand.


Ideally, that would be true.

But there are some wonky rules around struct ctors, static opCall, and 
struct literals that I can never quite remember.


--benji


Re: RFC: naming for FrontTransversal and Transversal ranges

2009-05-01 Thread Benji Smith

Andrei Alexandrescu wrote:
Also something that wasn't discussed that much is the connection of 
whatever design we devise, with the GC. I am mightily excited by the 
possibility to operate without GC or with tight reference counting, and 
I thought many around here would share that excitement. If we go for 
non-gc ways of memory management, that will probably affect container 
design.


Just out of curiosity, why do you like reference counting more than 
mark/sweep for containers?


--benji


Re: Splitter quiz / survey

2009-04-28 Thread Benji Smith

Brad Roberts wrote:
Actually, perl is a risky language to take _syntax_ from, but _semantics_ 
aren't nearly as dangerous.  Obviously there's some semantics that are 
horrible (see it's OOP mechanisms), but parts of the rest are quite good.  
I gripe and groan every time I find myself having to touch perl code, but 
it's rarely due to non-syntactical issues.


This is one of my favorite rants, anywhere on the world wide internets:

http://steve.yegge.googlepages.com/ancient-languages-perl

If nothing else, at least read the "Snake Eyes" section.

It's not the syntax that makes perl so bad. Sure, it takes some getting 
used to. But when the rubber hits the road, it's just syntax, and anyone 
can learn it.


The semantics, though, are a complete and utter trainwreck. Even after 
two years of working at a company where perl was the primary development 
language, I still never felt comfortable unless I had the camel book 
within arm's reach.


But amid that insanity there are a few gems. Most notably: regular 
expressions. And string splitting is largely based on the regex engine. 
So it's not too shocking to me that D might be influenced by it.


On the other hand, I agree with most of the other people in this thread, 
that option (4) was the best of the possible splitting behaviors.


--benji


Re: Keyword 'dynamic' of C#4

2009-04-28 Thread Benji Smith

Unknown W. Brackets wrote:
I wonder what the overhead times were.  He should've timed them both and 
listed them separately.  For example, is DynamicMethod a complete win, 
or is the dynamic keyword cheaper as far as base cost?


Actually, he does. It's at the bottom of the "second look" post:

  Compile Time Bound: 6 ms
  Dynamically Bound with dynamic keyword: 45 ms
  Dynamically Bound with MethodInfo.Invoke: 10943 ms
  Dynamically Bound with DynamicMethod: 8 ms

--benji


Re: Fully dynamic d by opDotExp overloading

2009-04-27 Thread Benji Smith

Danny Wilson wrote:

Now let's go from that obvious observation to opDotExp()

You know the class uses opDotExp() because it said so in the docs. 
Examples that could really benifit from this are:

- XMLRPC and other kinds of remoting
- Quick access to: XML / JSON / Yaml / Config files / DB access
- Calling DLLs without bindings
- Lots more

All these would mention it in their docs, guaranteed. Because they use 
opDotExp it's implicitly mentioned. I don't think anyone would tell a 
documentation generator to list all public methods except opDotExp .. 
that would be just braindead. And you could generate the docs yourself 
if you have the code.


Incidentally, one ugly problem with using opDotExp is that the 
underlying invocation might allow characters that aren't legal in D 
identifiers.


For example, let's say I have a dynamic object wrapping a JavaScript 
library, and I want to access a JQuery object. JavaScript allows the '$' 
character to appear in identifiers, and the JQuery people cleverly used 
that name for one of their core objects (which, I think, acts as an ID 
registry, or something like that).


So, this is a perfectly legal JQuery expression:

   var a = $("hello");

Using the opDotExp syntax, I'd ideally prefer to call it like this:

   auto a = js.$("hello");

But the compiler will reject that syntax, since '$' isn't a legal D 
identifier. Of course, in cases like that, we'll just use some sort of 
dynamic invocation method:


   auto a = js.invoke("$", "hello");

Which makes me think this whole discussion is kind of a waste of time, 
since every single implementation of opDotExp is going to end up 
delegating to a string-based dispatcher method anyhow.
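
In other words, every opDotExp would end up looking something like this (a 
sketch, assuming the lowering proposed in this thread; the dispatch table 
is hypothetical):

    import std.stdio;

    struct Dynamic
    {
        string delegate(string)[string] methods;

        // Under the proposal, js.greet("world") would lower to
        // js.opDotExp!("greet")("world") when "greet" isn't a member...
        string opDotExp(string name)(string arg)
        {
            return invoke(name, arg);
        }

        // ...which just forwards to a string-based dispatcher anyway.
        string invoke(string name, string arg)
        {
            if (auto method = name in methods)
                return (*method)(arg);
            throw new Exception("no such method: " ~ name);
        }
    }

    void main()
    {
        Dynamic js;
        js.methods["greet"] = (string s) { return "hello, " ~ s; };
        js.methods["$"]     = (string s) { return "selected " ~ s; };

        writeln(js.invoke("greet", "world")); // same as js.greet("world")
        writeln(js.invoke("$", "hello"));     // '$' is only reachable here
    }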


THAT'S the really interesting discussion. In fact, I think I'll start a 
new topic...


--benji


Re: The new, new phobos sneak preview

2009-04-13 Thread Benji Smith

Andrei Alexandrescu wrote:

Daniel Keep wrote:

Actually, I've been thinking and I realised that in 95% of cases, you
can assume a range is resumable if it has no references.


Well I'm not so sure. How about a range around an integral file handle 
or socket?


If ranges can advertise their resumability, it wouldn't be hard to write 
a simple template wrapper that provides resumability to an underlying 
non-resumable range.
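
Something along these lines (a sketch; "resumable" here just means consumed 
elements get buffered so the iteration can be rewound):

    // Wraps a non-resumable range and records everything popped from it,
    // so the whole sequence can be replayed from the start.
    struct Resumable(Range)
    {
        alias typeof(Range.init.front()) E; // assumes front() is a function

        private Range source;
        private E[] consumed;  // everything popped so far
        private size_t cursor; // position while replaying the buffer

        bool empty() { return cursor == consumed.length && source.empty; }

        E front()
        {
            return cursor < consumed.length ? consumed[cursor] : source.front;
        }

        void popFront()
        {
            if (cursor == consumed.length)
            {
                consumed ~= source.front; // remember the element...
                source.popFront();        // ...before discarding it
            }
            ++cursor;
        }

        void rewind() { cursor = 0; } // resume from the beginning
    }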


--benji


Re: Associative arrays with void values

2009-04-13 Thread Benji Smith

bearophile wrote:

Benji Smith:

Especially since an associative array should have a .keys property that 
returns a set.


I don't agree.  I think associative arrays should have .keys/.values/.items that return a 
lazy view that acts like a .set/.list/.list of pairs. Such "lazy views" don't 
actually store anything, they are very light. This design is now present in Python3, Java 
and I have done very similar things in my dlibs (named xkeys/xvalues/xitems in my dlibs, 
but xkeys isn't a set-like thing yet).


Actually I think we do agree. From an API perspective (rather than an 
implementation perspective), I think the .keys property should generally 
return a lazily constructed result (object? struct? I don't really 
care). But I think it should conform to some standardized notion of 
"set-ness" (interface? concept? again, I don't care).


HashSets are a perfectly acceptable implementation for me, as are Set 
interfaces, but I know some people won't like them, and those impl 
details aren't a big deal to me.


But whatever notion the language uses for its "Set" construct should be 
the same dohickey used by the AA .keys property.


(Incidentally, I also think the natural set operations, like 
intersection and mutual exclusion, are just as handy for maps as for sets.)


It's less semantically clean to define certain set operations on AAs, because 
for example you have to decide what to do when keys are equal but their values 
are not. You can avoid such semantic troubles altogether by performing set 
operations only on the lazy view of the keys.


You just have to define those operations on pairs rather than just on 
single values (for example, the union of two maps is naturally a multimap).
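
For instance (a sketch, with arrays of values standing in for a multimap):

    import std.stdio;

    // Union of two maps: a key present in both keeps both values,
    // so the result's values become arrays.
    string[][string] mapUnion(string[string] a, string[string] b)
    {
        string[][string] result;
        foreach (k, v; a)
            result[k] = [v];
        foreach (k, v; b)
        {
            if (auto existing = k in result)
                *existing ~= v;
            else
                result[k] = [v];
        }
        return result;
    }

    void main()
    {
        auto u = mapUnion(["color" : "red"], ["color" : "blue"]);
        writeln(u); // "color" maps to both "red" and "blue"
    }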


--benji


Re: Associative arrays with void values

2009-04-12 Thread Benji Smith

dsimcha wrote:

On the other hand, I'm not sure if it makes sense from a consistency perspective
to have AAs as a builtin, first class type and sets as a library type.  I'm not
sure whether this argues more for AAs being a library type or sets being 
builtin,
but the inconsistency is just weird.


Especially since an associative array should have a .keys property that 
returns a set.


(Incidentally, I also think the natural set operations, like 
intersection and mutual exclusion, are just as handy for maps as for sets.)


The natural conclusion is that AAs should be library types.

I like the fact that D provides literal syntax for AAs, but I think the 
correct implementation is for the compiler to pass the values from those 
  literal expressions into a library type constructor.
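
Something along these lines, say (AssocArray and its constructor are 
hypothetical; the compiler would just collect the literal's keys and 
values and hand them over):

   struct AssocArray(K, V) {
       private V[K] impl;   // sketch: piggyback on the built-in AA

       this(K[] keys, V[] vals) {
           assert(keys.length == vals.length);
           foreach (i, key; keys)
               impl[key] = vals[i];
       }
   }

   // the literal ["hello" : "world"] could then lower to:
   auto aa = AssocArray!(string, string)(["hello"], ["world"]);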


--benji


Re: bigfloat

2009-04-12 Thread Benji Smith

bearophile wrote:

Benji Smith:

// Defaults to using built-in associative array type
auto assocArray = [
   "hello" : "world
];

// Uses my own custom type.
auto hashtable = MyHashTableType!(string, string) [
   "hello" : "world
];


In the second case the type inference of the compiler may find the types from 
the AA literal itself:

auto hashtable = MyHashTableType ["hello" : "world"];

Bye,
bearophile


If that were the case, I'd want the compiler to scan *all* the key/value 
pairs for instances of derived types (rather than just being based on 
the first K/V pair, as is currently the case with other array literals).


For example (using tango classes, where HttpGet and HttpPost are both 
subclasses of HttpClient):


   // Type is: MyHashTableType!(string, HttpClient)
   auto hashtable = MyHashTableType [
  "get"  : new HttpGet(),
  "post" : new HttpPost()
   ];


Re: bigfloat

2009-04-12 Thread Benji Smith

Daniel Keep wrote:


Andrei Alexandrescu wrote:

dsimcha wrote:

Well, now that I understand your proposal a little better, it makes
sense.  I had
wondered why the current AA implementation uses RTTI instead of
templates.  Even
better would be if only the default implementation were in Object, and
a user
could somehow override which implementation of AA is given the
blessing of pretty
syntax by some pragma or export alias or something, as long as the
implementation
conforms to some specified compile-time interface.

Great! For now, I'd be happy if at least the user could hack their
import path to include their own object.d before the stock object.d.
Then people can use straight D to implement the AssocArray they prefer.
Further improvements of the scheme will then become within reach!

Andrei


dmd -object=myobject.d stuff.d

That would require the user to duplicate everything in object, which is
a little messy.  Maybe it would be a good idea to break object itself
into a bunch of public imports to core.internal.* modules, then allow this:

dmd -sub=core.internal.aa=myaa stuff.d

Of course, it's probably simpler still to have this:

dmd -aatype=myaa.AAType stuff.d

  -- Daniel


Instead, what if the literal syntax was amended to take an optional type 
name, like this:


   // Defaults to using built-in associative array type
   auto assocArray = [
  "hello" : "world
   ];

   // Uses my own custom type.
   auto hashtable = MyHashTableType!(string, string) [
  "hello" : "world
   ];

You could accomplish that pretty easily, as long as the custom type had 
a no-arg constructor and a function with the signature:


   void add(K key, V val)
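
To be concrete, the lowering could be something like this sketch, with 
MyHashTableType standing in for any conforming type:

   // what the second literal might be rewritten into:
   auto hashtable = {
       auto tmp = new MyHashTableType!(string, string)();
       tmp.add("hello", "world");
       return tmp;
   }();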

--benji


Re: Tango: Out of Date Installation Instructions

2009-02-21 Thread Benji Smith

Christopher Wright wrote:

Benji Smith wrote:
Anyhow, the particular error I'm getting when I try to compile my code 
(using "dsss build") is this:


   module FileConduit cannot read file 'tango\io\device\FileConduit.d'


Does tango.io.device.FileConduit still exist? It doesn't in my copy of 
tango.


You're right! Problem solved!

I could have sworn I was using the 0.99.7 version of tango before, but I 
guess I had been using an older release.


You don't need to compile FileConduit, but the frontend needs to know a 
lot of stuff that would be difficult or impossible to get from a .lib 
file -- things like function return types and parameter types, or 
templates. It's basically the same as needing a C header file, even 
though you have the compiled library.


Gotcha. I keep forgetting how much metainformation is lost in the d 
compilation process.


Thanks for your help!

--benji


Re: Tango: Out of Date Installation Instructions

2009-02-21 Thread Benji Smith

Moritz Warning wrote:

On Sat, 21 Feb 2009 13:46:48 -0500, Benji Smith wrote:


I just set up a new (Windows) computer, after working with the same
DMD/Tango/DWin/DSSS installation for the last six or eight months. And
for the life of me, I can't get my code to compile on the new machine.

The Tango installation instructions seem to be somewhat out of date,
since they describe installing tango on top of an existing DMD
installation, while the tango distributions for DMD all include the
compiler and don't require a pre-existing DMD installation:

http://dsource.org/projects/tango/wiki/WindowsInstall

Anyhow, the particular error I'm getting when I try to compile my code
(using "dsss build") is this:

module FileConduit cannot read file 'tango\io\device\FileConduit.d'

This is my sc.ini file (unmodified from the tango install):

[Environment]
LIB="%...@p%\..\lib"
DFLAGS="-...@p%\..\import" -version=Tango -defaultlib=tango-base-dmd.lib
-debuglib=tango-base-dmd.lib -L+tango-user-dmd.lib linkcm...@p%\link.exe

Since it references the "tango-user-dmd.lib" file, I wonder why it even
needs to include the FileConduit.d source file. Why doesn't it just use
the lib?

Much appreciation to anyone who can help get me rolling again! And I'd
be happy to help rewrite the Tango installation instructions once I
understand the correct installation procedure.

--benji


I think the best is to join #d.tango on Freenode IRC.


Aha. Is that where all the tango-related discussion happens these days?

I considered posting to the dsource tango forum, but it's such a 
low-volume group, I might not get a response for a week or more. I 
posted here because of the high volume.


Assuming for a moment that I don't want to install an IRC client just to 
resolve this one issue, where is the best place to ask Tango questions?


--benji


Tango: Out of Date Installation Instructions

2009-02-21 Thread Benji Smith
I just set up a new (Windows) computer, after working with the same 
DMD/Tango/DWin/DSSS installation for the last six or eight months. And 
for the life of me, I can't get my code to compile on the new machine.


The Tango installation instructions seem to be somewhat out of date, 
since they describe installing tango on top of an existing DMD 
installation, while the tango distributions for DMD all include the 
compiler and don't require a pre-existing DMD installation:


http://dsource.org/projects/tango/wiki/WindowsInstall

Anyhow, the particular error I'm getting when I try to compile my code 
(using "dsss build") is this:


   module FileConduit cannot read file 'tango\io\device\FileConduit.d'

This is my sc.ini file (unmodified from the tango install):

[Environment]
LIB="%...@p%\..\lib"
DFLAGS="-...@p%\..\import" -version=Tango -defaultlib=tango-base-dmd.lib 
-debuglib=tango-base-dmd.lib -L+tango-user-dmd.lib

linkcm...@p%\link.exe

Since it references the "tango-user-dmd.lib" file, I wonder why it even 
needs to include the FileConduit.d source file. Why doesn't it just use 
the lib?


Much appreciation to anyone who can help get me rolling again! And I'd 
be happy to help rewrite the Tango installation instructions once I 
understand the correct installation procedure.


--benji


Re: Is str ~ regex the root of all evil, or the leaf of all good?

2009-02-19 Thread Benji Smith

Some of the things I'd like to see in the regex implementation:

All functions accepting a compiled regex object/struct should also 
accept a string version of the pattern (and vice versa). Some 
implementations (Java) only accept the compiled version in some places 
and the string pattern in other places. That's annoying.


Just like with ordinary string-searching functions, you should be able 
to specify a start position (and maybe an end position) for the search. 
Even if the match exists somewhere in the string, it fails if not found 
within the target slice. Something like this:


   auto text = "ABCDEFG";
   auto pattern = regex("[ABCEFG]");

   // returns false, because the char at position 3 does not match
   auto result = match(text, pattern, 3);

   // this should be exactly equivalent (but the previous version
   // uses less memory, and ought to work with infinite ranges, whereas
   // the slice version wouldn't make any sense)
   auto equivalent = match(text[3..$], pattern);

I've needed to use this technique in a few cases to implement a simple 
lexical scanner, and it's a godsend, if the regex engine supports it 
(though most don't).


Finally, it'd be extremely cool if the regex compiler automatically 
eliminated redundant nodes from its NFA, converting as much of it as 
possible to a DFA. I did some work on this a few years ago, and it's 
actually remarkably simple to implement using prefix trees.


   // These two expressions produce an identical set of matches,
   // but the first one is functionally an NFA, while the second
   // one is a DFA.
   auto a = regex("(cat|car|cry|dog|door|dry)");
   auto b = regex("(c(?:a[tr]|ry)|d(?:o(?:g|or)|ry)");

In cases where the expression can only be partially simplified, you can 
leave some NFA nodes deep within the tree, while still DFA-ifying the 
rest of it:


   auto a = regex("(attitude|attribute|att.+ion");
   auto b = regex("(att(?:itude|ribute|.+ion)");

It's a very simple transformation, increases speed (dramatically) for 
complex regular expressions (especially those produced dynamically at 
runtime by combining large sets of unrelated target expressions), and it 
reliably produces results equivalent to those of the inefficient version.
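
For purely literal alternatives, the factoring really is just a 
recursive grouping by first character. A minimal sketch (the function 
name is made up; no character-class merging, and capture groups are 
ignored):

   import std.array : join;

   // factor(["cat", "car", "cry"]) -> "c(?:a(?:t|r)|ry)"
   string factor(string[] words) {
       if (words.length == 1)
           return words[0];
       string[][char] byFirstChar;   // group alternatives by first char
       bool hasEmpty = false;
       foreach (w; words) {
           if (w.length == 0) { hasEmpty = true; continue; }
           byFirstChar[w[0]] ~= w[1 .. $];
       }
       string[] alts;
       foreach (c, tails; byFirstChar)   // AA order is unspecified; harmless here
           alts ~= c ~ factor(tails);
       auto grouped = alts.length == 1
           ? alts[0]
           : "(?:" ~ alts.join("|") ~ ")";
       return hasEmpty ? "(?:" ~ grouped ~ ")?" : grouped;
   }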


The only really tricky part is if the subexpressions have their own 
capturing groups, in which case the DFA transformation screws up the 
ordinal-numbering of the resultant captures.


Anyhoo...

I don't have any strong feelings about the function names (though I'd 
rather have functions than operators, like "~", for searching and matching).


And I don't have any strong feelings about whether the compiled regex is 
an object or a struct (though I prefer reference semantics over value 
semantics for regexen, and right now, I think that makes objects the 
(slightly) better choice).


Thanks for your hard work! I've implemented a small regex engine before, 
so I know it's no small chunk of effort. Regular expressions are my 
personal favorite "tiny language", and I'm glad to see them get some 
special attention in phobos2.


--benji


Re: Is str ~ regex the root of all evil, or the leaf of all good?

2009-02-19 Thread Benji Smith
And how do you combine them? "repeat, ignorecase"? Writing and parsing 
such options becomes a little adventure in itself. I think the "g", 
"i", and "m" flags are popular enough if you've done any amount of 
regex programming. If not, you'll look up the manual regardless.




Perhaps, string.match("a[b-e]", Regex.Repeat | Regex.IgnoreCase); might 
be better? I don't find "gmi" immediately clear or self-documenting.


I prefer the enum options too. But not vociferously. I could live with 
the single-char flags.


--benji


Re: default random object?

2009-02-19 Thread Benji Smith

Don wrote:

Benji Smith wrote:

Don wrote:

Andrei Alexandrescu wrote:

Benji Smith wrote:

Benji Smith wrote:
Maybe a NumericInterval struct would be a good idea. It could be 
specialized to any numeric type (float, double, int, etc), it 
would know its own boundaries, and it'd keep track of whether 
those boundaries were open or closed.


The random functions would take an RND and an interval (with some 
reasonable default intervals for common tasks like choosing 
elements from arrays and random-access ranges).


I have a Java implementation around here somewhere that I could 
port to D if anyone is interested.


--benji


Incidentally, the NumericInterval has lots of other interesting 
applications. For example


   auto i = NumericInterval.UBYTE.intersect(NumericInterval.SBYTE);
   bool safelyPolysemious = i.contains(someByteValue);

   auto array = new double[123];
   auto i = NumericInterval.indexInterval(array);
   bool indexIsLegal = i.contains(someIndex);

Using a numeric interval for generating random numbers would be, in 
my opinion, totally ideal.


   double d = uniform(NumericInterval.DOUBLE); // Any double value


I've never been in a situation in my life where I thought, hey, a 
random double is exactly what I'd need right now. It's a ginormous 
interval!


Andrei


It's worse than that. Since the range of double includes infinity, a 
uniform distribution must return +-infinity with probability 1. It's 
nonsense.


Way to miss the forest for the trees.

You guys are telling me you can't see any legitimate use for a NumericInterval 
type? And that it wouldn't be convenient to use for random number 
generation within that interval?


So the full double range was a dumb example. But that wasn't really 
the point, was it?


--benji


On the contrary, I've been giving NumericInterval considerable thought.
One key issue is whether a NumericInterval(x1, x2) must satisfy x1 <= x2 
(the _strict_ definition), or whether it is also permissible to have 
x2<=x1 (ie, you can specify the two endpoints in reverse order; the 
interval is then between min(x1,x2) and max(x1, x2)).


This is an issue because I've noticed that when I want to use it, I 
often have related pairs of values.

eg.
Suppose u is the interval {x1, x2}. There's a related v = {f(x1), f(x2)}.
Unfortunately although x1<=x2, f(x1) may not be <= f(x2). So v is not an 
interval in the _strict_ sense. But it satisfies the _relaxed_ definition.


I don't see any ideological reason for requiring x2 >= x1.

But the public API of the interval will probably have functions or 
properties returning the "lowerBound" and "upperBound". And the 
implementations of the "containsValue", "intersect", and "overlap" 
functions are all more straightforward to write if you know in advance 
which value is which, potentially switching them in the constructor.


Of course, if you switch the values, do you also switch the 
open/closed boundaries? What about this case:


   auto i = Interval!("[)")(1000, -1000);

Which side of the range is open, and which is closed? Does the "[)" 
argument apply to the natural order of the range (closed on its lower 
bound) or does it apply to the order of the arguments in the function 
(closed on its leftmost argument)?


As long as the behavior is well documented, I think it'd be fine either 
way. But I also think it'd be reasonable to throw an exception if the 
arguments are in the wrong order.
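
For what it's worth, a sketch of the normalize-in-the-constructor 
approach, with the boundary spec applying to the natural low/high order 
(the API is hypothetical):

   struct Interval(T, string bounds = "[]") {
       static assert(bounds.length == 2);

       T lo, hi;

       this(T a, T b) {
           // normalize: accept the endpoints in either order
           lo = a < b ? a : b;
           hi = a < b ? b : a;
       }

       bool contains(T x) const {
           // "[)" applies to (lo, hi), not to the argument order
           immutable loOk = bounds[0] == '[' ? x >= lo : x > lo;
           immutable hiOk = bounds[1] == ']' ? x <= hi : x < hi;
           return loOk && hiOk;
       }
   }

   // closed at -1000, open at 1000, despite the reversed arguments
   auto i = Interval!(int, "[)")(1000, -1000);
   assert(i.contains(-1000) && !i.contains(1000));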


--benji


Re: memory-mapped files

2009-02-18 Thread Benji Smith

Andrei Alexandrescu wrote:
This all would make perfect sense if the performance was about the same 
in the two cases. But in fact memory mapping introduced a large 
*pessimization*. Why? I am supposedly copying less data and doing less 


Pessimization? What a great word! I've never heard that before!

--benji


Re: default random object?

2009-02-18 Thread Benji Smith

Don wrote:

Andrei Alexandrescu wrote:

Benji Smith wrote:

Benji Smith wrote:
Maybe a NumericInterval struct would be a good idea. It could be 
specialized to any numeric type (float, double, int, etc), it would 
know its own boundaries, and it'd keep track of whether those 
boundaries were open or closed.


The random functions would take an RND and an interval (with some 
reasonable default intervals for common tasks like choosing elements 
from arrays and random-access ranges).


I have a Java implementation around here somewhere that I could port 
to D if anyone is interested.


--benji


Incidentally, the NumericInterval has lots of other interesting 
applications. For example


   auto i = NumericInterval.UBYTE.intersect(NumericInterval.SBYTE);
   bool safelyPolysemious = i.contains(someByteValue);

   auto array = new double[123];
   auto i = NumericInterval.indexInterval(array);
   bool indexIsLegal = i.contains(someIndex);

Using a numeric interval for generating random numbers would be, in 
my opinion, totally ideal.


   double d = uniform(NumericInterval.DOUBLE); // Any double value


I've never been in a situation in my life where I thought, hey, a 
random double is exactly what I'd need right now. It's a ginormous 
interval!


Andrei


It's worse than that. Since the range of double includes infinity, a 
uniform distribution must return +-infinity with probability 1. It's 
nonsense.


Way to miss the forest for the trees.

You guys are telling me you can't see any legitimate use for a NumericInterval 
type? And that it wouldn't be convenient to use for random number 
generation within that interval?


So the full double range was a dumb example. But that wasn't really the 
point, was it?


--benji


Re: default random object?

2009-02-15 Thread Benji Smith

Benji Smith wrote:
Maybe a NumericInterval struct would be a good idea. It could be 
specialized to any numeric type (float, double, int, etc), it would know 
its own boundaries, and it'd keep track of whether those boundaries were 
open or closed.


The random functions would take an RND and an interval (with some 
reasonable default intervals for common tasks like choosing elements 
from arrays and random-access ranges).


I have a Java implementation around here somewhere that I could port to 
D if anyone is interested.


--benji


Incidentally, the NumericInterval has lots of other interesting 
applications. For example


   auto i = NumericInterval.UBYTE.intersect(NumericInterval.SBYTE);
   bool safelyPolysemious = i.contains(someByteValue);

   auto array = new double[123];
   auto i = NumericInterval.indexInterval(array);
   bool indexIsLegal = i.contains(someIndex);

Using a numeric interval for generating random numbers would be, in my 
opinion, totally ideal.


   double d = uniform(NumericInterval.DOUBLE); // Any double value

   auto i = NumericInterval.parse("[ 123, 456.789 )");
   double random = uniform!(double)(i, rng);

--benji


Re: default random object?

2009-02-15 Thread Benji Smith

Andrei Alexandrescu wrote:

Steve Schveighoffer wrote:

4. While we're at it, should uniform(a, b) generate by default something
in [a, b] or [a, b)?


[a,b)

Every other piece of range-like code is zero based, and excludes the 
upper bound.  This should be no different.  It makes the code simpler 
too.


I tried both versions, and it turns out my code is almost never simpler 
with open integral intervals. Most of the time I need something like:


auto x = uniform(rng, -100, 100);
auto y = uniform(rng, 0, 100);

and I need to remember to actually ask for 101 instead of 100. True, 
when you want a random index in an array, open intervals are more 
convenient.


One purity-based argument is that in a random number you may actually 
ask for the total range:


auto big = uniform(rng, uint.max / 2, uint.max);

If the interval is open I can't generate uint.max.

Anyway, I checked the C++ API and it turns out they use closed intervals 
for integers and open intervals for reals. I know there's been a lot of 
expert scrutiny there, so I suppose I better copy their design.



Andrei


Maybe a NumericInterval struct would be a good idea. It could be 
specialized to any numeric type (float, double, int, etc), it would know 
its own boundaries, and it'd keep track of whether those boundaries were 
open or closed.


The random functions would take an RND and an interval (with some 
reasonable default intervals for common tasks like choosing elements 
from arrays and random-access ranges).


I have a Java implementation around here somewhere that I could port to 
D if anyone is interested.


--benji


Re: std.string and ranges

2009-02-11 Thread Benji Smith

bearophile wrote:

I have taken a look at the docs for dice(); I don't like its name because it isn't 
intuitive at all, but its usage is easy. The usage of the function I have 
suggested is a bit more higher level.

A possible alternative design for such a function is to take as input an already 
sorted array of the weights (besides the iterable of the items); this may speed 
up this function a bit (it just needs to call the algorithm for bisect search, 
I presume).


FWIW, I've implemented this sort of thing before. In my implementation, 
it was called ProbabilisticChooser(T), and I could instantiate it either 
with a pair of parallel arrays or with a HashMap(T, double).


In my case, I didn't lump it in with my other random-number related 
code, because I had a set of other classes implementing the Chooser(T) 
interface. Some of them were random and some were deterministic, but 
they all provided the same "choose" function on top of a "choice 
strategy" implementation.



--benji


Re: Compiler as dll

2009-01-27 Thread Benji Smith

BCS wrote:

Hello Walter,



Instead, what you can do is simply dude up command line arguments,
spawn the command line compiler, and collect the result.



The one main thing I see not working there is memory-to-memory compiles. 
I'd love to be able to build a function as a string, call the compiler 
and get back a function pointer.


I think also, with a compiler-as-dll, it'd have separate modules for 
lexing, parsing, optimizing, code-generation, and linking.


As a user of that compiler DLL, I might like to write my own AST 
visitor, wrapping all function calls (or scope blocks) with tracing 
statements before sending them into the rest of the pipeline.


Those are the kinds of things that I think would be especially cool with 
a CompilerServices module in the standard library.


Also, consider this: someone could implement AST macros as a library!

--benji


Re: Any chance to call Tango as Extended Standard Library

2009-01-24 Thread Benji Smith

Don wrote:

Lars Ivar Igesund wrote:

Don wrote:
druntime should certainly not become any bigger (in scope), as that 
would defeat the purpose of separating the runtime from userspace in 
the first place. The topic of common userspace functionality should be 
kept separate from the topic of druntime.




I think you are confusing druntime (the project) with the D runtime. 
druntime includes the gc as well the runtime, though they are seperate.
I see no reason why including core modules in the druntime project would 
destroy the separation.


Really, this is entirely a question of naming.

core.XXX seems to me to be the perfect namespace, certainly for the key 
math modules which I'm most concerned about (std.math/(tango.math.Math, 
tango.math.IEEE), and possibly also the low-level bigint routines. These 
are all functionality which is closely tied to the compiler).


Totally agree.

Although the name 'druntime' implies it'll only contain the runtime, I 
think it ought to contain all the common functionality that virtually 
all applications and libraries will absolutely need: the runtime itself, 
gc, TypeInfo, math, containers (including ranges), algorithms, string 
processing, date/time, and IO.


Without those commonalities, any "compatibility" between Phobos and 
Tango will be purely illusory.


Whether the commonality is realized within druntime, or within some 
other low-level common library (like "dcore"), is immaterial to me. And 
actually, I don't really care whether Phobos and Tango have their own 
implementations. But there should be an API (interfaces? concepts? some 
new template-interface mechanism? doesn't matter.) that both Phobos and 
Tango implement, so that library consumers can seamlessly pass low-level 
objects between Phobos and Tango dependent libraries.


--benji


Re: Any chance to call Tango as Extended Standard Library

2009-01-21 Thread Benji Smith

IUnknown wrote:
Agree. Which is why I said the problems you are facing seem to be 
non-technical. I'm suggesting that the D library developers should pick 
one and axe the other. *I* think what's more important is to have one 
single set of containers in a single style rather than have two 
separate ones. There is going to be complaining for sure from the 
current developers, but in my opinion, the target of having a single 
standard library (with core and advanced modules to suit system/app 
programming) is more important than having to make a difficult choice.


Totally agree. While I personally prefer the Java-style containers, I'd 
gladly accept the STL-style containers if it meant unification of Phobos 
and Tango.


Having druntime is nice, sure, but application-level code and high-level 
libraries will bake the container API into their public interfaces, and 
any code that uses both the Phobos and Tango libraries would have to 
perform a zillion tedious conversions.


In my mind, the things that need a unified API are (in order of importance):

1. GC and TypeInfo
2. Data structures
3. Algorithms
4. String processing
5. Date & Time
6. IO

Everything else (encryption, compression, sockets, regular expressions) 
could have a totally different API in Tango & Phobos and I wouldn't care 
much.


Having a common runtime (GC and TypeInfo) is a neat trick, but pretty 
useless if the data structures and algorithms are entirely different.


And, while I'm perfectly willing to accept either Java-style or 
STL-style containers, I'd also really appreciate it if the design 
anticipates and supports custom implementations (because I almost always 
end up implementing my own multimaps, multisets, circular queues, etc.).


--benji


Re: new principle of division between structures and classes

2009-01-12 Thread Benji Smith

Andrei Alexandrescu wrote:

Benji Smith wrote:
Actually, memory allocated in the JVM is very cache-friendly, since 
two subsequent allocations will always be adjacent to one another in 
physical memory. And, since the JVM uses a moving GC, long-lived 
objects move closer and closer together.


Well the problem is that the allocation size grows quickly. Allocate and 
dispose one object per loop -> pages will be quickly eaten.


for (...) {
JavaClassWithAReallyLongNameAsTheyUsuallyAre o = factory.giveMeOne();
o.method();
}

The escape analyzer could catch that the variable doesn't survive the 
pass through the loop, but the call to method makes things rather tricky 
(virtual, source unavailable...). So then we're facing a quickly growing 
allocation block and consequently less cache friendliness and more 
frequent collections.



Andrei


Good point. I remember five years ago when people were buzzing about the 
possible implementation of escape analysis in the next Java version, and 
how it'd move a boatload of intermediate object allocations from the 
heap to the stack. Personally, I don't think it'll ever happen. They 
can't even agree on how to get *closures* into the language.


I personally think the JVM and the HotSpot compiler are two of the 
greatest accomplishments of computer science. But the Java community has 
long since jumped the shark, and I don't expect much innovation from 
that neighborhood anymore.


--benji


Re: Properties

2009-01-12 Thread Benji Smith

Nick Sabalausky wrote:
"John Reimer"  wrote in message 
news:28b70f8c119528cb42154f5d1...@news.digitalmars.com...

Hello Nick,


But, of course, adjectives (just like "direct/indirect objects") are
themselves nouns.



Umm... May I make a little correction here?
Adjectives are not nouns.  They are used to /describe/ nouns.

-JJR



Maybe there are examples I'm not thinking of, and I'm certainly no natural 
language expert, but consider these:


"red"
"ball"
"red ball"

By themselves, "red" and "ball" are both nouns. Stick the noun "red" in 
front of ball and "red" becomes an adjectve. (FWIW, 
"dictionary.reference.com" lists "red" as both a noun and an adjective). The 
only adjectives I can think of at the moment (in my admittedly quite tired 
state) are words that are ordinarly nouns on their own.  I would think that 
the distinguishing charactaristic of an adjective vs noun would be the 
context in which it's used.


Maybe I am mixed up though, it's not really an area of expertise for me. 


Incidentally...

I used to do a lot of work in natural language processing, and our 
parsing heuristics were built to handle a lot of adjective/noun ambiguity.


For example, in the phrase "car dealership", the word "car" is an 
adjective that modifies "dealership".


For the most part, you can treat adjectives and nouns as being 
functionally identical, and the final word in a sequence of adjectives 
and nouns becomes the primary noun of the noun-phrase.


--benji


Re: new principle of division between structures and classes

2009-01-12 Thread Benji Smith

Andrei Alexandrescu wrote:

Weed wrote:

Weed wrote:


4. Java and C# also use objects by reference? But both of these
languages are interpreted. I assume that the interpreter allocates
memory on the heap and on the stack at about the same speed, which is
why the authors of these languages used the reference model.


Neither of these languages are interpreted, they both are compiled into
native code at runtime.

Oh! :) But I suspect such a class scheme somehow corresponds with
JIT compilation.



I guess allocation in Java is fast because it uses its own
memory manager.

I do not know how it is fair, but:

http://www.ibm.com/developerworks/java/library/j-jtp09275.html

"Pop quiz: Which language boasts faster raw allocation performance, the
Java language, or C/C++? The answer may surprise you -- allocation in
modern JVMs is far faster than the best performing malloc
implementations. The common code path for new Object() in HotSpot 1.4.2
and later is approximately 10 machine instructions (data provided by
Sun; see Resources), whereas the best performing malloc implementations
in C require on average between 60 and 100 instructions per call
(Detlefs, et. al.; see Resources)."


Meh, that should be taken with a grain of salt. An allocator that only 
bumps a pointer will simply eat more memory and be less cache-friendly. 
Many applications aren't that thrilled with the costs of such a model.


Andrei


Actually, memory allocated in the JVM is very cache-friendly, since two 
subsequent allocations will always be adjacent to one another in 
physical memory. And, since the JVM uses a moving GC, long-lived objects 
move closer and closer together.


Of course, Java programmers tend to be less careful about memory 
allocation, so they usually consume **way** too much memory and lose the 
benefits of the moving GC.


Java-the-language and Java-the-platform are very efficient, even if the 
Java frameworks and Java patterns tend to be bloated and nasty.


--benji


Re: Properties

2009-01-12 Thread Benji Smith

Miles wrote:

dsimcha wrote:

I figure the vast majority of cases are going to be primitive types anyhow 
(mostly
ints),


Yes, this is very true.


and if someone defines operator overloads such that foo += 1 produces
totally different observable behavior than foo = foo + 1, that's just too
ridiculously bad a design to even take seriously.


Sure. It is bad coding style, it is ugly and the programmer who does
this should be called for a meeting with his boss. But there are still
ways to have sane behavior, even in such situations. See below.


What do you think?  Is it worth
ignoring a few hard cases in exchange for solving most cases simply and 
elegantly
and without adding any new constructs?


Instead, I think it is more sane to use temporaries.

--
{
  auto tmp = __get_foo();
  tmp += 1;
  __set_foo(tmp);
}
--

It is the safest this way, principle of least surprise. If the caller
does foo += 1, it will get that; if it does foo = foo + 1, it will still
get that; if it does foo.call(), again, the behavior is still sane.

We must first attack the semantics. This has sane semantics. Then let
the compiler optimize that as far as possible. The compiler inlines the
getter and setter calls, then optimizes away the temporary, etc.


Or the compiler could prevent properties from returning mutable structs?

class MyClass {

  private MyStruct _a;
  private MyStruct _b;

  public property a {
const get { return _a; } // legal
  }

  public property b {
get { return _b; } // compile-time error
  }
}

On the flip-side, the compiler could intervene at the call site, 
preventing modification of structs when directly accessed via a property 
invocation. Though I think the first solution is better.


--benji


Re: foreach ... else statement

2009-01-06 Thread Benji Smith

Walter Bright wrote:

I keep thinking I should put on a "Compiler Construction" seminar!


Sign me up!


Re: Randomness in built-in .sort

2009-01-05 Thread Benji Smith

dsimcha wrote:

== Quote from Bill Baxter (wbax...@gmail.com)'s article

Actually, a function to sort multiple arrays in parallel was exactly
what I was implementing using .sort.  So that doesn't sound like a
limitation to me at all.   :-)
--bb


Am I (and possibly you) the only one(s) who think that sorting multiple arrays 
in
parallel should be standard library functionality?  The standard rebuttal might 
be
"use arrays of structs instead of parallel arrays".  This is a good idea in some
situations, but for others, parallel arrays are just plain better.  Furthermore,
with D's handling of variadic functions, generalizing any sort to handle 
parallel
arrays is easy.


I've written my own parallel-array quicksort implementation (several 
times over, in many different languages).


Parallel sorting is one of my favorite tricks, and I think it definitely 
belongs in the standard library.
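
As a sketch, with std.algorithm's makeIndex the permutation trick takes 
only a few lines (the helper name is made up):

   import std.algorithm : copy, makeIndex, map;
   import std.array : array;

   // sort `keys` in place, and reorder `vals` the same way
   void sortParallel(K, V)(K[] keys, V[] vals) {
       assert(keys.length == vals.length);
       auto index = new size_t[keys.length];
       makeIndex!("a < b")(keys, index);   // the sorting permutation
       auto newKeys = index.map!(i => keys[i]).array;
       auto newVals = index.map!(i => vals[i]).array;
       newKeys.copy(keys);   // write both back in place
       newVals.copy(vals);
   }

After sortParallel(names, ages), ages[i] still belongs to names[i], 
which is the whole point.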


--benji


Re: Non-nullable references, again

2009-01-02 Thread Benji Smith

Michel Fortin wrote:

On 2009-01-02 10:37:50 -0500, Benji Smith  said:


case a?.b:c:
  break;

is this

  case ((a?).b):
c:
  break;

or is it

case (a ? b : c ) :
break;


How's this different from

case a*.b:

is this:

case ((a*).b):

or is it:

case ((a) * (.b)):




Think of it like this:

  MyClass?.myProperty

It's a static field of the nullable MyClass type.

--benji


Re: Improvement to switch-case statement

2009-01-02 Thread Benji Smith

Yigal Chripun wrote:
Maybe it's just me but all those C-style statements seem so arcane and 
unnecessary. Real OOP languages do not need control structures to be 
part of the language - they're part of the class library instead.

Here's some Smalltalk examples: (and D-like comparable code)


Interesting...

Assuming the core language had no control structures, how would library 
authors implement them?


If the language itself lacked IF, ELSE, SWITCH, CASE, DO, WHILE, FOR, 
and presumably GOTO... how exactly would you go about implementing them 
in a library?


--benji


Re: Improvement to switch-case statement

2009-01-02 Thread Benji Smith

Yigal Chripun wrote:
also, some thought should be spent on getting rid of the ternary op 
syntax since it interferes with other things that could be added to the 
language (nullable types, for instance)


Heresy!

The ternary operator is one of my favorite tools. If you want to get rid 
 of it, I think you'd have to make the 'if' statement into an 
expression (which would open up a whole other can of worms).


As I showed earlier, there's no ambiguity between the ternary operator 
and the nullable type suffix. The ambiguity comes from the case 
statement. In my opinion, the best way to resolve that ambiguity is to 
add braces around case statments, like this:


  switch (x) {
case 1 { ... }
case 2 { ... }
default { ... }
  }

But that might make it impossible to implement Duff's Device (blessing 
or curse? personally, I don't care).


And it might imply the creation of a new scope with each case. 
Currently, a case statement doesn't introduce its own lexical scope.


Anyhoo... Don't mess with the ternary operator!!

:)

--benji


Re: Non-nullable references, again

2009-01-02 Thread Benji Smith

Don wrote:

Benji Smith wrote:

Daniel Keep wrote:

Benji Smith wrote:

Don wrote:

Denis Koroskin wrote:

Foo nonNull = new Foo();
Foo? possiblyNull = null;


Wouldn't this cause ambiguity with the "?:" operator?


At first, thought you might be right, and that there would some 
ambiguity calling constructors of nullable classes (especially given 
optional parentheses).


But for the life of me, I couldn't come up with a truly ambiguous 
example, that couldn't be resolved with an extra token or two of 
lookahead.


The '?' nullable-type operator is only used  in type declarations, 
not in expressions, and the '?:' operator always consumes a few 
trailing expressions.


Also (at least in C#) the null-coalesce operator (which converts 
nullable objects to either a non-null instance or a default value) 
looks like this:


  MyClass? myNullableObj = getNullableFromSomewhere();
  MyClass myNonNullObj = myNullableObj ?? DEFAULT_VALUE;

Since the double-hook is a single token, it's also unambiguous to 
parse.


--benji


Disclaimer: I'm not an expert on compilers.  Plus, I just got up.  :P

The key is that the parser has to know what "MyClass" means before it 
can figure out what the "?" is for; that's why it's 
context-dependent. D avoids this dependency between compilation 
stages, because it complicates the compiler.  When the parser sees 
"MyClass", it *doesn't know* that it's a type, so it can't 
distinguish between a nullable type and an invalid ?: expression.


At least, I think that's how it works; someone feel free to correct 
me if it's not.  :P


  -- Daniel


I could be wrong too. I've done a fair bit of this stuff, but I'm no 
expert either :)


Nevertheless, I still don't think there's any ambiguity, as long as 
the parser can perform syntactic lookahead predicates. The grammar 
would look something like this:


DECLARATION :=
  IDENTIFIER // Type name
  ( HOOK )?  // Is nullable?
  IDENTIFIER // Var name
  (
SEMICOLON// End of declaration
|
(
  OP_ASSIGN  // Assignment operator
  EXPRESSION // Assigned value
)
  )

Whereas the ternary expression grammar would look something like this:

TERNARY_EXPRESSION :=
  IDENTIFIER // Type name
  HOOK   // Start of '?:' operator
  EXPRESSION // Value if true
  COLON  // End of '?:' operator
  EXPRESSION // Value if false

The only potential ambiguity arises because the "value if true" 
expression could also just be an identifier. But if the parser can 
construct syntactic predicates to perform LL(k) lookahead with 
arbitrary k, then it can just keep consuming tokens until it finds 
either a SEMICOLON, an OP_ASSIGN, or a COLON (potentially, 
recursively, if it encounters another identifier and hook within the 
expression).


Still, though, once it finds one of those tokens, the syntax has been 
successfully disambiguated, without resorting to a semantic predicate.


It requires arbitrary lookahead, but it can be done within a 
context-free grammar, and all within the syntax-processing portion of 
the parser.


Of course, I could be completely wrong too :)

--benji


case a?.b:c:
  break;

is this

  case ((a?).b):
c:
  break;

or is it

case (a ? b : c ) :
break;



Damn. I got so distracted with the ternary operator, I forgot about case 
statements.


--benji


Re: Non-nullable references, again

2009-01-01 Thread Benji Smith

Daniel Keep wrote:

Benji Smith wrote:

Don wrote:

Denis Koroskin wrote:

Foo nonNull = new Foo();
Foo? possiblyNull = null;


Wouldn't this cause ambiguity with the "?:" operator?


At first, thought you might be right, and that there would some 
ambiguity calling constructors of nullable classes (especially given 
optional parentheses).


But for the life of me, I couldn't come up with a truly ambiguous 
example, that couldn't be resolved with an extra token or two of 
lookahead.


The '?' nullable-type operator is only used  in type declarations, not 
in expressions, and the '?:' operator always consumes a few trailing 
expressions.


Also (at least in C#) the null-coalesce operator (which converts 
nullable objects to either a non-null instance or a default value) 
looks like this:


  MyClass? myNullableObj = getNullableFromSomewhere();
  MyClass myNonNullObj = myNullableObj ?? DEFAULT_VALUE;

Since the double-hook is a single token, it's also unambiguous to parse.

--benji


Disclaimer: I'm not an expert on compilers.  Plus, I just got up.  :P

The key is that the parser has to know what "MyClass" means before it 
can figure out what the "?" is for; that's why it's context-dependant. D 
avoids this dependency between compilation stages, because it 
complicates the compiler.  When the parser sees "MyClass", it *doesn't 
know* that it's a type, so it can't distinguish between a nullable type 
and an invalid ?: expression.


At least, I think that's how it works; someone feel free to correct me 
if it's not.  :P


  -- Daniel


I could be wrong too. I've done a fair bit of this stuff, but I'm no 
expert either :)


Nevertheless, I still don't think there's any ambiguity, as long as the 
parser can perform syntactic lookahead predicates. The grammar would 
look something like this:


DECLARATION :=
  IDENTIFIER // Type name
  ( HOOK )?  // Is nullable?
  IDENTIFIER // Var name
  (
SEMICOLON// End of declaration
|
(
  OP_ASSIGN  // Assignment operator
  EXPRESSION // Assigned value
)
  )

Whereas the ternary expression grammar would look something like this:

TERNARY_EXPRESSION :=
  IDENTIFIER // Type name
  HOOK   // Start of '?:' operator
  EXPRESSION // Value if true
  COLON  // End of '?:' operator
  EXPRESSION // Value if false

The only potential ambiguity arises because the "value if true" 
expression could also just be an identifier. But if the parser can 
construct syntactic predicates to perform LL(k) lookahead with arbitrary 
k, then it can just keep consuming tokens until it finds either a 
SEMICOLON, an OP_ASSIGN, or a COLON (potentially, recursively, if it 
encounters another identifier and hook within the expression).


Still, though, once it finds one of those tokens, the syntax has been 
successfully disambiguated, without resorting to a semantic predicate.


It requires arbitrary lookahead, but it can be done within a 
context-free grammar, and all within the syntax-processing portion of 
the parser.


Of course, I could be completely wrong too :)

--benji


Re: Non-nullable references, again

2008-12-31 Thread Benji Smith

Don wrote:

Denis Koroskin wrote:

Foo nonNull = new Foo();
Foo? possiblyNull = null;


Wouldn't this cause ambiguity with the "?:" operator?


At first, thought you might be right, and that there would some 
ambiguity calling constructors of nullable classes (especially given 
optional parentheses).


But for the life of me, I couldn't come up with a truly ambiguous 
example, that couldn't be resolved with an extra token or two of lookahead.


The '?' nullable-type operator is only used  in type declarations, not 
in expressions, and the '?:' operator always consumes a few trailing 
expressions.


Also (at least in C#) the null-coalesce operator (which converts 
nullable objects to either a non-null instance or a default value) looks 
like this:


  MyClass? myNullableObj = getNullableFromSomewhere();
  MyClass myNonNullObj = myNullableObj ?? DEFAULT_VALUE;

Since the double-hook is a single token, it's also unambiguous to parse.

--benji


Re: dmd platform support - poll

2008-12-28 Thread Benji Smith

Walter Bright wrote:

What platforms for dmd would you be most interested in using?

.net
jvm
mac osx 32 bit intel
mac osx 64 bit intel
linux 64 bit
windows 64 bit
freebsd 32 bit
netbsd 32 bit

other?


My choice, BY FAR, would be Mac OSX 32 bit.

When I started my current D project, six months ago or so, it looked 
like GDC mac support was on a steady, healthy incline, and that choosing 
D as a development platform would yield full mac compatibility in the 
very near future.


Supporting the mac platform is absolutely essential for my product, so 
without a viable D compiler, I'll have to rewrite a bunch of code in C, 
which would make me very sad.


The 64-bit win/lin/mac platforms would also be nice to have. But as long 
as every 64-bit OS provides legacy support for 32-bit apps, I consider a 
64-bit D compiler pretty low priority, for the type of work I'm 
currently doing.


The bsd platform is completely off my radar screen, and given Walter's 
limited resources, I'd be disappointed to see it given much attention.


.NET and the JVM would be compelling for the marketing of D, making the 
language seem more mainstream and widely accessible. But I personally 
wouldn't find much use in them. The primary benefit of D, for me, is 
escaping from the confines of the VMs and being able to do system-level 
stuff.


I frequently develop for both the CLR and the JVM, but when I do so, I 
prefer C# and Java, respectively. I can't think of a single reason I'd 
ever elect to write D for a VM platform.


--benji

PS -- Game console platforms would be very very cool as well. For me, 
I'd be interested in the cell processor, for the PS3. HOWEVER, since the 
native PS3 SDK is proprietary (with a $10,000 licensing fee), and since 
linux on the PS3 uses artificially crippled hardware, my interest in 
developing anything on the PS3 is little more than casual curiosity.