LDC: Constant Folding Across Nested Functions?

2013-05-18 Thread dsimcha
Background:  This came from an attempt to get rid of delegate 
indirection in parallel foreach loops on LDC.  LDC can inline 
delegates that always point to the same code.  This means that it 
can inline opApply delegates after inlining opApply itself, 
effectively constant-folding the delegate.


Simplified case without unnecessarily complex context:

// Assume this function does NOT get inlined.
// In my real use case it's doing something
// much more complicated and in fact does not
// get inlined.
void runDelegate(scope void delegate() dg) {
    dg();
}

// Assume this function gets inlined into main().
uint divSum(uint divisor) {
    uint result = 0;

    // If divisor gets const folded and is a power of 2 then
    // the compiler can optimize the division to a shift.
    void doComputation() {
        foreach (i; 0U .. 1_000_000U) {
            result += i / divisor;
        }
    }

    runDelegate(&doComputation);
    return result;
}

void main() {
    // divSum gets inlined here, but doComputation()
    // can't because it's called through a delegate.
    // Therefore, the 2 is never const folded into
    // doComputation().
    auto ans = divSum(2);
}

The issue I'm dealing with in std.parallelism is conceptually the 
same as this, but with much more context that's irrelevant to 
this discussion.  Would the following be a feasible compiler 
optimization, either in the near future or at least in principle?


When an outer function is inlined, all non-static inner functions 
should be recompiled with the information gained by inlining the 
outer function.  In this case doComputation() would be recompiled 
with divisor const-folded to 2 and the division optimized to a 
shift.  This post-inlining compilation would then be passed to 
runDelegate().


Also, is there any trick I'm not aware of to work around the 
standard compilation model and force this behavior now?
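
One partial workaround (a sketch; I haven't verified how well LDC 
optimizes it) is to lift the divisor into a template value 
parameter, so each instantiation is compiled with the divisor 
known statically and no cross-delegate constant folding is needed 
(runDelegate is the same function as above):

uint divSum(uint divisor)() {
    uint result = 0;

    void doComputation() {
        foreach (i; 0U .. 1_000_000U) {
            // divisor is a compile-time constant in each
            // instantiation, so a power-of-2 division can
            // become a shift with no inlining required.
            result += i / divisor;
        }
    }

    runDelegate(&doComputation);
    return result;
}

void main() {
    auto ans = divSum!2();
}

The cost is one copy of the function per distinct divisor, and it 
only works when the divisor is known at compile time at the call 
site.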


Low-Lock Singletons In D

2013-05-05 Thread dsimcha
On the advice of Walter and Andrei, I've written a blog article 
about the low-lock Singleton pattern in D.  This previously 
obscure pattern uses thread-local storage to make Singletons both 
thread-safe and efficient; it was independently invented by at 
least me and Alexander Terekhov, an IBM researcher.  D's 
first-class treatment of thread-local storage means the time has 
come to move it out of obscurity and possibly make it the 
standard way to do Singletons.
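
In a nutshell (a minimal sketch of the pattern; see the article 
for the full treatment):

class MySingleton {
    // Thread-local by default in D, so reading it needs no lock.
    private static bool instantiated_;

    // The one true instance, shared across all threads.
    private __gshared MySingleton instance_;

    static MySingleton get() {
        // Fast path: after the first call in a given thread, the
        // thread-local flag lets us skip synchronization entirely.
        if (!instantiated_) {
            synchronized (MySingleton.classinfo) {
                if (instance_ is null) {
                    instance_ = new MySingleton;
                }
                instantiated_ = true;
            }
        }
        return instance_;
    }
}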


Article:
http://davesdprogramming.wordpress.com/2013/05/06/low-lock-singletons/

Reddit:
http://www.reddit.com/r/programming/comments/1droaa/lowlock_singletons_in_d_the_singleton_pattern/


Re: From C++14 and Java 1.8

2013-04-21 Thread dsimcha

On Sunday, 21 April 2013 at 12:08:54 UTC, bearophile wrote:
Arrays#parallelSort uses the Fork/Join framework introduced in 
Java 7 to assign the sorting tasks to multiple threads available 
in the thread pool.  This is called eating your own dog food.  
Fork/Join implements a work-stealing algorithm wherein an idle 
thread can steal tasks queued up in another thread.

An overview of Arrays#parallelSort:

The method uses a threshold value; any array smaller than the 
threshold is sorted using the Arrays#sort() API (i.e. sequential 
sorting).  The threshold takes into account the parallelism of 
the machine and the size of the array, and is calculated as:


private static final int getSplitThreshold(int n) {
  int p = ForkJoinPool.getCommonPoolParallelism();
  int t = (p > 1) ? (1 + n / (p << 3)) : n;
  return t < MIN_ARRAY_SORT_GRAN ? MIN_ARRAY_SORT_GRAN : t;
}

Once it's decided whether to sort the array in parallel or in 
serial, it must decide how to divide the array into multiple 
parts, assign each part to a Fork/Join task which will take care 
of sorting it, and then another Fork/Join task which will take 
care of merging the sorted arrays.  The implementation in JDK 8 
uses this approach:

- Divide the array into 4 parts.
- Sort the first two parts and then merge them.
- Sort the next two parts and then merge them.

The above steps are repeated recursively with each part until the 
size of the part to sort falls below the threshold value 
calculated above.





I think it's worth adding something similar as a strategy for 
std.algorithm.sort.


FWIW, I created a parallel sort in D a while back using 
std.parallelism.  It was part of std.parallel_algorithm, a 
half-finished project that I abandoned because I was disappointed 
at how poorly most of it was scaling in practice, probably due to 
memory bandwidth.  If you have some expensive-to-compare types, 
though, it may be worthwhile.


https://github.com/dsimcha/parallel_algorithm/blob/master/parallel_algorithm.d
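
The core of a task-based sort is quite short, for what it's worth 
(a toy sketch; the linked module above is more elaborate):

import std.algorithm, std.parallelism, std.range;

// Fork the left half as a task, sort the right half in this
// thread, then combine.  completeSort isn't a true linear-time
// merge, but it keeps the sketch short.
void parallelSort(T)(T[] data, size_t grainSize = 4096) {
    if (data.length <= grainSize) {
        sort(data);
        return;
    }

    immutable mid = data.length / 2;
    auto left = task!(parallelSort!T)(data[0 .. mid], grainSize);
    taskPool.put(left);
    parallelSort(data[mid .. $], grainSize);
    left.yieldForce();

    completeSort(assumeSorted(data[0 .. mid]), data[mid .. $]);
}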


Re: From C++14 and Java 1.8

2013-04-21 Thread dsimcha

On Sunday, 21 April 2013 at 13:30:32 UTC, bearophile wrote:

dsimcha:

I abandoned because I was disappointed at how poorly most of 
it was scaling in practice, probably due to memory bandwidth.


Then do you know why the Java version seems to be advantageous 
(with four cores)?


Bye,
bearophile


I don't know Java very well, but possibilities include:

1.  Sorting using a virtual or otherwise non-inlined comparison 
function.  This makes the sorting require much more CPU time but 
not a lot more memory bandwidth.  It does raise the question, 
though, of why the comparison function isn't inlined, especially 
since modern JITs can sometimes inline virtual functions.


2.  Different hardware than I tested on, maybe with better memory 
bandwidth.


3.  Expensive comparison functions.  I didn't test this in D 
either because I couldn't think of a good use case.  I tested the 
D parallel sort using small primitive types (ints and floats and 
stuff).


Command Line Order + Linker Errors

2012-10-29 Thread dsimcha
I'm running into some inexplicable linker errors when trying to 
compile a project.  I've tried two command lines to compile the 
project that I thought were equivalent except for the names of 
the output files:


// emptymain.d:
void main() {}

// test.d:
unittest {
    double[double] weights = [1: 1.2, 4: 2.3];
    import std.stdio;
    writeln("PASSED");
}

dmd -unittest emptymain.d test.d  // Linker errors

dmd -unittest test.d emptymain.d  // Works

Additionally, the linker errors only occur under a custom version 
of druntime.  Don't try to reproduce them under the stock 
version.  (For the curious, it's the precise heap scanning fork 
from https://github.com/rainers/druntime/tree/precise_gc2 .  I'm 
trying to get precise heap scanning ready for prime time.)


My real question, though, is why should the order of these files 
on the command line matter and does this suggest a compiler or 
linker bug?


Re: Command Line Order + Linker Errors

2012-10-29 Thread dsimcha
The messages are below.  The exact messages are probably not 
useful but I included them since you asked.  I meant to specify, 
though, that they're all "undefined reference" messages.


Actually, none of these issues occur at all when compilation of 
the two files is done separately, regardless of what order the 
object files are passed to DMD for linking:


dmd -c -unittest test.d
dmd -c -unittest emptymain.d
dmd -unittest test.o emptymain.o  # Works
dmd -unittest emptymain.o test.o  # Works

emptymain.o:(.data._D68TypeInfo_S6object26__T16AssociativeArrayTdTdZ16AssociativeArray4Slot6__initZ+0x80): 
undefined reference to 
`_D11gctemplates77__T11RTInfoImpl2TS6object26__T16AssociativeArrayTdTdZ16AssociativeArray4SlotZ11RTInfoImpl2yG2m'
emptymain.o:(.data._D73TypeInfo_S6object26__T16AssociativeArrayTdTdZ16AssociativeArray9Hashtable6__initZ+0x80): 
undefined reference to 
`_D11gctemplates82__T11RTInfoImpl2TS6object26__T16AssociativeArrayTdTdZ16AssociativeArray9HashtableZ11RTInfoImpl2yG2m'
emptymain.o:(.data._D69TypeInfo_S6object26__T16AssociativeArrayTdTdZ16AssociativeArray5Range6__initZ+0x80): 
undefined reference to 
`_D11gctemplates78__T11RTInfoImpl2TS6object26__T16AssociativeArrayTdTdZ16AssociativeArray5RangeZ11RTInfoImpl2yG2m'
emptymain.o:(.data._D149TypeInfo_S6object26__T16AssociativeArrayTdTdZ16AssociativeArray5byKeyMFNdZS6object26__T16AssociativeArrayTdTdZ16AssociativeArray5byKeyM6Result6Result6__initZ+0x80): 
undefined reference to 
`_D11gctemplates86__T11RTInfoImpl2TS6object26__T16AssociativeArrayTdTdZ16AssociativeArray5byKeyM6ResultZ11RTInfoImpl2yG2m'
emptymain.o:(.data._D153TypeInfo_S6object26__T16AssociativeArrayTdTdZ16AssociativeArray7byValueMFNdZS6object26__T16AssociativeArrayTdTdZ16AssociativeArray7byValueM6Result6Result6__initZ+0x80): 
undefined reference to 
`_D11gctemplates88__T11RTInfoImpl2TS6object26__T16AssociativeArrayTdTdZ16AssociativeArray7byValueM6ResultZ11RTInfoImpl2yG2m'
emptymain.o: In function 
`_D11gctemplates66__T6bitmapTS6object26__T16AssociativeArrayTdTdZ16AssociativeArrayZ6bitmapFZG2m':
test.d:(.text._D11gctemplates66__T6bitmapTS6object26__T16AssociativeArrayTdTdZ16AssociativeArrayZ6bitmapFZG2m+0x1b): 
undefined reference to 
`_D11gctemplates71__T10bitmapImplTS6object26__T16AssociativeArrayTdTdZ16AssociativeArrayZ10bitmapImplFPmZv'


On Monday, 29 October 2012 at 21:08:52 UTC, David Nadlinger wrote:

On Monday, 29 October 2012 at 20:56:02 UTC, dsimcha wrote:
My real question, though, is why should the order of these 
files on the command line matter and does this suggest a 
compiler or linker bug?


What exactly are the errors you are getting? My first guess 
would be templates (maybe the precise GC RTInfo ones?) – 
determining which template instances to emit into what object 
files is non-trivial, and DMD is currently known to contain a 
few related bugs. The fact that the problem also appears when 
compiling all source files at once is somewhat special, though.


David




Re: RFC: Pinning interface for the GC

2012-10-13 Thread dsimcha
We already have a NO_MOVE attribute that can be set or unset.  
What's wrong with that?


http://dlang.org/phobos/core_memory.html#NO_MOVE
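
For reference, using it looks like this (a small sketch; 
someCFunction is a hypothetical callee):

import core.memory;

extern (C) void someCFunction(int* p, size_t n);  // hypothetical

void handOffToC(int[] arr) {
    // Prevent a moving collector from relocating the block while
    // foreign code holds a raw pointer into it.
    GC.setAttr(arr.ptr, GC.BlkAttr.NO_MOVE);
    scope(exit) GC.clrAttr(arr.ptr, GC.BlkAttr.NO_MOVE);

    someCFunction(arr.ptr, arr.length);
}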

On Saturday, 13 October 2012 at 18:58:27 UTC, Alex Rønne 
Petersen wrote:

Hi,

With precise garbage collection coming up, and most likely 
compacting garbage collection in the future, I think it's time 
we start thinking about an API to pin garbage collector-managed 
objects.


A typical approach that people use to 'pin' objects today is to 
allocate a chunk of memory from the C heap, add it as a root 
[range], and store a reference in it. That, or just global 
variables.


This is kind of terrible because adding the chunk of memory as 
a root forces the GC to actually scan it, which is unnecessary 
when what you really want is to pin the object in place and 
tell the GC, "I know what I'm doing, don't touch this."


I propose the following functions in core.memory.GC:

static bool pin(const(void)* p) nothrow;
static bool unpin(const(void)* p) nothrow;

The pin function shall pin the object pointed to by p in place 
such that it is not allowed to be moved nor collected until 
unpinned. The function shall return true if the object was 
successfully pinned or false if the object was already pinned 
or didn't belong to the garbage collector in the first place.


The unpin function shall unpin the object pointed to by p such 
that it is once again eligible for moving and collection as 
usual. The function shall return true if the object was 
successfully unpinned or false if the object was not pinned or 
didn't belong to the garbage collector in the first place.


Destroy!




Re: GC statistics

2012-10-11 Thread dsimcha
On Wednesday, 10 October 2012 at 19:35:33 UTC, Andrei 
Alexandrescu wrote:
This is mostly for GC experts out there - what statistics are 
needed and useful, yet not too expensive to collect?


https://github.com/D-Programming-Language/druntime/pull/236


Andrei


I'd like to see mark, sweep and page-freeing time counted 
separately so that if overall GC performance is slow, the user 
can identify where the bottleneck is.  For example, mark time 
will be slow if there's a lot of total memory to be scanned.  
Sweep time will be slow if there are a lot of blocks allocated, 
even if they're all small.  I'm not sure if this is feasible, 
though, because it assumes that the GC implementation is 
mark-sweep.  I guess we could name the subcategories something 
more generic like "mark" and "process marks".
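
Something like this is what I have in mind (purely illustrative; 
the names are hypothetical, not a proposed druntime API):

import core.time;

struct GCPhaseStats {
    Duration markTime;      // grows with total memory scanned
    Duration sweepTime;     // grows with the number of blocks
    Duration pageFreeTime;  // time spent returning empty pages
}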


Re: openMP

2012-10-04 Thread dsimcha
Ok, I think I see where you're coming from here.  I've replied to 
some points below just to make sure and discuss possible 
solutions.


On Thursday, 4 October 2012 at 16:07:35 UTC, David Nadlinger 
wrote:

On Wednesday, 3 October 2012 at 23:02:25 UTC, dsimcha wrote:


Because you already have a system in place for managing these 
tasks, which is separate from std.parallelism. A reason for 
this could be that you are using a third-party library like 
libevent. Another could be that the type of workload requires 
additional problem knowledge of the scheduler so that different 
tasks don't tread on each other's toes (for example 
communicating with some servers via a pool of sockets, where 
you can handle several concurrent requests to different 
servers, but can't have two tasks read/write to the same socket 
at the same time, because you'd just send garbage).


Really, this issue is just about extensibility and/or 
flexibility. The design of std.parallelism.Task assumes that 
all values which become available at some point in the 
future are the product of a process for which a TaskPool is a 
suitable scheduler. C++ has std::future separate from 
std::promise, C# has Task vs. TaskCompletionSource, etc.


I'll look into these when I have more time, but I guess what it 
boils down to is the need to separate the **abstraction** of 
something that returns a value later (I'll call that 
**abstraction** futures) from the **implementation** provided by 
std.parallelism (I'll call this **implementation** tasks), which 
was designed only with CPU-bound tasks and multicore in mind.


On the other hand, I like std.parallelism's simplicity for 
handling its charter of CPU-bound problems and multicore 
parallelism.  Perhaps the solution is to define another Phobos 
module that models the **abstraction** of futures and provide an 
adapter of some kind to make std.parallelism tasks, which are a 
much lower-level concept, fit this model.  I don't think the 
**general abstraction** of a future should be defined in 
std.parallelism, though.  std.parallelism includes 
parallelism-oriented things besides tasks, e.g. parallel map, 
reduce, foreach.  Including a more abstract model of values that 
become available later would make its charter too unfocused.
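
To make that concrete, the split might look something like this 
(a sketch only; every name here is hypothetical):

// A hypothetical std.future-style abstraction.
interface Future(T) {
    bool isDone();  // Has the value become available yet?
    T get();        // Block until the value is available.
}

// An adapter presenting a std.parallelism Task (the lower-level
// concept) as a Future.
import std.parallelism;

class TaskFuture(T, TaskPtr) : Future!T {
    private TaskPtr task_;

    this(TaskPtr t) { task_ = t; }

    bool isDone() { return task_.done; }
    T get() { return task_.yieldForce(); }
}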




Maybe using the word callback was a bit misleading, but the 
callback would be invoked on the worker thread (or by whoever 
invokes the hypothetical Future.complete(result) method).


Probably the most trivial use case would be to set a condition 
variable in it in order to implement a waitAny(Task[]) method, 
which waits until the first of a set of tasks is completed. 
Ever wanted to wait on multiple condition variables? Or used 
select() with multiple sockets? This is what I mean.


Well, implementing something like ContinueWith or Future.complete 
for std.parallelism tasks would be trivial, and I see how waitAny 
could easily be implemented in terms of this.  I'm not sure I 
want to define an API for this in std.parallelism, though, until 
we have something like a std.future and the **abstraction** of a 
future is better-defined.
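
Just to illustrate (a sketch; onComplete and isDone are assumed 
hooks, not current std.parallelism APIs):

import core.sync.condition, core.sync.mutex;

size_t waitAny(FutureT)(FutureT[] futures) {
    auto m = new Mutex;
    auto c = new Condition(m);

    // Each completion just wakes the waiter, which then scans to
    // see which future finished.  Notifying under the same mutex
    // the waiter holds (except while inside wait()) avoids lost
    // wakeups.
    foreach (f; futures) {
        f.onComplete({ synchronized (m) c.notify(); });
    }

    synchronized (m) {
        while (true) {
            foreach (i, f; futures) {
                if (f.isDone) return i;
            }
            c.wait();
        }
    }
}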




For more advanced/application-level use cases, just look at any 
use of ContinueWith in C#. std::future::then() is also proposed 
for C++, see e.g. 
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3327.pdf.


I didn't really read the N3327 paper in detail, but from a 
brief look it seems to be a nice summary of what you might want 
to do with tasks/asynchronous results – I think you could 
find it an interesting read.


I don't have time to look at these right now, but I'll definitely 
look at them sometime soon.  Thanks for the info.




Re: openMP

2012-10-04 Thread dsimcha
On Thursday, 4 October 2012 at 16:07:35 UTC, David Nadlinger 
wrote:
For more advanced/application-level use cases, just look at any 
use of ContinueWith in C#. std::future::then() is also proposed 
for C++, see e.g. 
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3327.pdf.


I didn't really read the N3327 paper in detail, but from a 
brief look it seems to be a nice summary of what you might want 
to do with tasks/asynchronous results – I think you could 
find it an interesting read.


David


Thanks for posting this.  It was an incredibly useful read for 
me!  Given that the code I write is generally compute-intensive, 
not I/O intensive, I'd never given much thought to the value of 
futures in I/O intensive code before this discussion.  I stand by 
what I said before:  Someone (not me because I'm not intimately 
familiar with the use cases; you might be qualified) should write 
a std.future module for Phobos that properly models the 
**abstraction** of a future.  It's only tangentially relevant to 
std.parallelism's charter, which includes both a special case of 
futures that's useful to SMP parallelism and other parallel 
computing constructs.  Then, we should define an adapter that 
allows std.parallelism Tasks to be modeled more abstractly as 
futures when necessary, once we've nailed down what the future 
**abstraction** should look like.


Re: openMP

2012-10-03 Thread dsimcha
Unless we're using different terminology here, futures are just 
std.parallelism Tasks.


On Wednesday, 3 October 2012 at 10:17:41 UTC, Nick Sabalausky 
wrote:

On Wed, 03 Oct 2012 09:08:47 +0100
Russel Winder rus...@winder.org.uk wrote:

Now that C++ has made the jump to using futures and
asynchronous function calls as an integral part of the 
language,


Speaking of, do we have futures in D yet? IIRC, way back last 
time I asked about it there was something that needed to be 
taken care of first, though I don't remember what. If we don't 
have them ATM, is there currently anything in the way of 
actually creating them?





Re: openMP

2012-10-03 Thread dsimcha
Ok, now I vaguely remember seeing stuff about futures in your 
Thrift code and wondering why it was there.  I'm a little bit 
confused about what you want.  If I understand correctly, 
std.parallelism can already do it pretty easily, but maybe the 
docs need to be improved a little to make it obvious how.


All you have to do is something like this:

auto createFuture() {
    // task() returns a _pointer_ to a Task.
    auto myTask = task!someFun();

    taskPool.put(myTask);  // Or myTask.executeInNewThread();

    // A task created with task() can outlive the scope it was
    // created in.  A scoped task, created with scopedTask(),
    // cannot.  This is safe, since myTask is NOT scoped and is
    // a _pointer_ to a Task.
    return myTask;
}

In this case myTask is already running using the execution 
resources specified in createFuture().  Does this do what you 
wanted?  If so, I'll clarify the documentation.  If not, please 
clarify what you needed and the relevant use cases so that I can 
fix std.parallelism.
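
For completeness, the caller's side would look like this 
(assuming someFun() returns a value):

auto fut = createFuture();
// ... do other work while someFun() runs ...
auto result = fut.yieldForce();  // Blocks until the result is ready.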


On Wednesday, 3 October 2012 at 15:50:38 UTC, David Nadlinger 
wrote:

On Wednesday, 3 October 2012 at 14:10:57 UTC, dsimcha wrote:
Unless we're using different terminology here, futures are 
just std.parallelism Tasks.


No, std.parallelism.Tasks are not really futures – they offer 
a constrained [1] future interface, but couple this with the 
notion that a Task can be executed at some point on a TaskPool 
chosen by the user. Because of this, I had to implement my own 
futures for the Thrift async stuff, where I needed a future as 
a promise [2] by an invoked entity that it kicked off a 
background activity which will eventually return a value, but 
which the users can't »start« or choose to »execute it 
now«, as they can with Tasks.


If TaskPool had an execute() method which took a delegate to 
execute (or a »Task«, for that matter) and returned a new 
object which serves as a »handle« with wait()/get()/… 
methods, _that_ would (likely) be a future.


David


[1] Constrained in the sense that it is only meant for 
short-/synchronous-running tasks and thus e.g. offers no 
callback mechanism.


[2] Let's not get into splitting hairs regarding the exact 
meaning of »Future« vs. »Promise«, especially because C++11 
introduced a new interpretation to the mix.




Re: openMP

2012-10-03 Thread dsimcha
On Wednesday, 3 October 2012 at 21:02:07 UTC, David Nadlinger 
wrote:

On Wednesday, 3 October 2012 at 19:42:07 UTC, dsimcha wrote:
If not, please clarify what you needed and the relevant use 
cases so that I can fix std.parallelism.


In my use case, conflating the notion of a future, i.e. a value 
that becomes available at some point in the future, with the 
process which creates that future makes no sense.


So the process which creates the future is a Task that executes 
in a different thread than the caller?  And an alternative way 
that a value might become available in the future is e.g. if it's 
being retrieved from some slow I/O process like a database or 
network?




For example, let's say you are writing a function which 
computes a complex database query from its parameters and then 
submits it to your query manager/connection pool/… for 
asynchronous execution. You cannot use std.parallelism.Task in 
this case, because there is no way of expressing the process 
which retrieves the result as a delegate running inside a 
TaskPool.


Ok, I'm confused here.  Why can't the process that retrieves the 
result be expressed as a delegate running in a TaskPool or a new 
thread?




Or, say you want to write an aggregator, combining the 
results of several futures together, again offering the same 
future interface (maybe an array of the original result types) 
to consumers. Again, there is no computation-bound part to that 
at all, which would make sense to run on a TaskPool – you are 
only waiting on the other tasks to finish.


Maybe I'm just being naive since I don't understand the use 
cases, but why couldn't you just create an array of Task objects?




The second problem with std.parallelism.Task is that your only 
choice is polling (or blocking, for that matter). Yes, 
callbacks are a hairy thing to do if you can't be sure what 
thread they are executed on, but not having them severely 
limits the power of your abstraction, especially if you are 
dealing with non-CPU-bound tasks (as many of today's modern 
use cases are).


I'm a little confused about how the callbacks would be used here. 
 Is the idea that some callback would be called when the task is 
finished?  Would it be called in the worker thread or the thread 
that submitted the task to the pool?  Can you provide a use case?




For example, something my mentor asked me to implement for Thrift 
during last year's GSoC was a feature which allows sending a 
request out to a pool of servers concurrently, returning the 
first one of the results (apparently, this mechanism is used as 
a sharding mechanism in some situations – if a server doesn't 
have the data, it simply ignores the request).


"First one of the results" == the result produced by the first 
server to return anything?


How would you implement something like that as a function 
Task[] -> Task? For what it's worth, Task in C# (which is quite 
universally praised for its take on the matter) also has a 
»ContinueWith« method which is really just a completion 
callback mechanism.


I'll look into ContinueWith and see if it's implementable in 
std.parallelism without breaking anything.




std.parallelism.Task is great for expressing local 
resource-intensive units of work (and fast!), but I think it is 
too rigid and specialized for that case to be generally useful.


Right.  I wrote std.parallelism with resource-intensive units of 
work in mind because that's the use case I was familiar with.  It 
was designed first and foremost to make using SMP parallelism 
_simple_.  In hindsight I might have erred too much on the side 
of making simple things simple vs. complicated things possible, 
or over-specialized it and avoided solving an important, more 
general problem.  I'll try to understand your use cases and see 
if they can be addressed without making simple things more 
complicated.


I think the best way you could help me understand what I've 
overlooked in std.parallelism's design is to give a quick n' 
dirty example of how an API that does what you want would be 
used.  Even more generally, any _concise, concrete_ use cases, 
even toy use cases, would be a huge help.


Re: Status on Precise GC

2012-09-09 Thread dsimcha

On Sunday, 9 September 2012 at 16:51:15 UTC, Jacob Carlborg wrote:

On 2012-09-08 23:35, Tyler Jameson Little wrote:

Awesome, that's good news. I'd love to test it out, but I've
never built the D runtime (or Phobos for that matter) from
source. Are there any instructions or do I just do something
like make && sudo make install and it'll put itself in the
right places? FWIW, I'm running Linux with the standard DMD
2.060 compiler.


Just run:

make -f posix.mak

Or, for Windows:

make -f win32.mak


You also need to build Phobos, which automatically links the 
druntime objects into a single library file, by going into the 
Phobos directory and doing the same thing.


An annoying issue on Windows, though, is that DMD keeps running 
out of memory when all the precise GC templates are 
instantiated.  I've been meaning to rewrite the makefile to 
separately compile Phobos on Windows, but I've been preoccupied 
with other things.


Re: Status on Precise GC

2012-09-07 Thread dsimcha
Here's the GSoC project I mentored this summer.  A little 
integration work still needs to be done, and I've been meaning to 
ping the student about the status of this.  If you want, I'd 
welcome some beta testers.


https://github.com/Tuna-Fish/druntime/tree/gc_poolwise_bitmap

On Saturday, 8 September 2012 at 01:55:44 UTC, Tyler Jameson 
Little wrote:

This issue on bugzilla hasn't been updated since July 2011, but
it's assigned to Sean Kelly:
http://d.puremagic.com/issues/show_bug.cgi?id=3463

I've found these threads concerning a precise GC:

http://www.digitalmars.com/d/archives/digitalmars/D/learn/Regarding_the_more_precise_GC_35038.html

http://www.digitalmars.com/d/archives/digitalmars/D/How_can_I_properly_import_functions_from_gcx_in_object.di_171815.html

Is this issue obsolete, or is it being worked on?

Reason being, I'm writing a game in D and I plan to write it in
nearly 100% D (with the exception being OpenGL libraries and the
like), but I know I'll run into problems with the GC eventually.
If this is an active project that may get finished in the
relative near term (less than a year), then I'd feel comfortable
knowing that eventually problems may go away.

I want to eventually make this work with ARM (Raspberry PI 
cubieboard), and the GC is a major blocker here (well, and a
cross-compiler, but I'll work that out when I get there).

I'm using dmd atm if that matters.

Thanks!

Jameson




Re: Phobos unittest failure on single-core machines

2012-08-24 Thread dsimcha

On Friday, 24 August 2012 at 02:16:24 UTC, Ed McCardell wrote:
When trying to run the phobos unittests on my 32- and 64-bit 
linux single-processor machines, I get this output:


  Testing generated/linux/debug/64/unittest/std/parallelism
  totalCPUs = 1
  core.exception.AssertError@std.parallelism(4082): unittest 
failure


Has anyone else seen this, or is possible that I have an error 
in my dmd setup? (I'm using dmd/druntime/phobos from git HEAD, 
building in what I thought was the normal manner).


--Ed McCardell


This looks to be a bug in a recently-added feature.  I'll look at 
it in detail tonight, but I think I know what the problem is and 
it's pretty easy to fix.  Can you please file a Bugzilla and note 
whether it always occurs or is non-deterministic?


Re: Antti-Ville Tuuainen Passes GSoC Final Evaluation

2012-08-23 Thread dsimcha

On Thursday, 23 August 2012 at 11:40:22 UTC, Rory McGuire wrote:

On Thu, Aug 23, 2012 at 4:01 AM, Chad J
chadjoan@__spam.is.bad__gmail.com wrote:



Poolwise bitmap... what an interesting name.  I'll look 
forward to

learning about the concepts behind it!



+1


Basically, the idea is to store information about what is and 
isn't a pointer at the pool level instead of at the block level.  
My attempt from a long time ago at precise heap scanning, and 
Antti-Ville's first attempt, stored meta-data at the end of every 
allocated block.  This worked well for large arrays, but was 
terribly inefficient for smaller allocations and made the GC code 
even messier than it already is.  The overhead was a fixed 
(void*).sizeof bits per block.  Now, each pool has a bit array 
that contains one bit for every possible aligned pointer.  The 
overhead is always 1 bit for every (void*).sizeof bytes no matter 
how large or small the block is.
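
To illustrate the layout (a toy sketch, not the actual druntime 
code):

struct Pool {
    void* baseAddr;       // Start of the pool's memory.
    size_t[] isPointer;   // 1 bit per (void*).sizeof-aligned word.

    // True if the word at p must be scanned as a possible pointer.
    bool mustScan(void* p) const {
        immutable idx =
            (cast(ubyte*) p - cast(ubyte*) baseAddr) / (void*).sizeof;
        enum bitsPerWord = 8 * size_t.sizeof;
        return ((isPointer[idx / bitsPerWord] >>
                 (idx % bitsPerWord)) & 1) != 0;
    }
}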


Re: Antti-Ville Tuuainen Passes GSoC Final Evaluation

2012-08-23 Thread dsimcha
On Thursday, 23 August 2012 at 14:38:19 UTC, Alex Rønne Petersen 
wrote:
Yes, but parallelization of the mark phase is fairly trivial, 
and something we should probably look into.


Ironically, Antti-Ville's original proposal involved 
parallelization.  This was scrapped because after rtinfo was 
added, we agreed that precise heap scanning was more important 
and looked newly feasible.


Antti-Ville Tuuainen Passes GSoC Final Evaluation

2012-08-21 Thread dsimcha
Congratulations, Antti-Ville!  This project creates a better 
implementation of precise GC heap scanning than anything that's 
been created so far for D.  The goal is to eventually integrate 
it into standard D distributions.  Any volunteers for beta 
testing?


Code:

https://github.com/Tuna-Fish/druntime/tree/gc_poolwise_bitmap

The code for this project is a fork of druntime.  The master 
branch was a failed (or less successful) experiment.  The version 
we're going with for integration is the gc_poolwise_bitmap branch.


Re: Fragile ABI

2012-08-16 Thread dsimcha

On Thursday, 16 August 2012 at 14:58:23 UTC, R Grocott wrote:
C++'s fragile ABI makes it very difficult to write class 
libraries without some sort of workaround. For example, 
RapidXML and AGG are distributed as source code; GDI+ is a 
header-only wrapper over an underlying C interface; and Qt 
makes heavy use of the Pimpl idiom, which makes its source code 
much more complex than it needs to be. This is also a major 
problem for any program which wants to expose a plugin API.


Since pimpl is useful but messy, given D's metaprogramming 
capabilities, maybe what we need is a Pimpl template in Phobos:


// The implementation struct.
struct SImpl {
    int a, b, c;

    void fun() {}
}

// Automatically generate code for the Pimpl wrapper.
alias Pimpl!SImpl S;

auto s = new S;
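
A bare-bones version could just forward everything through alias 
this (a sketch; create() stands in for new because structs can't 
have default constructors to allocate impl_):

struct Pimpl(Impl) {
    private Impl* impl_;

    static Pimpl create() {
        return Pimpl(new Impl);
    }

    // Forward all member access to the hidden implementation.
    @property ref Impl impl() { return *impl_; }
    alias impl this;
}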

On the other hand, IIUC Pimpl doesn't solve the vtable part of 
the problem, only the data members part.  (Correct me if I'm 
wrong here, since I admit to knowing very little about the 
fragile ABI problem or its workarounds.)


Re: Which D features to emphasize for academic review article

2012-08-12 Thread dsimcha

On Sunday, 12 August 2012 at 03:30:24 UTC, bearophile wrote:

Andrei Alexandrescu:

- The language's superior modeling power and level of control 
comes at an increase in complexity compared to languages such 
as e.g. Python. So the statistician would need a larger 
upfront investment in order to reap the associated benefits.


Statisticians often use the R language 
(http://en.wikipedia.org/wiki/R_language ).
Python contains much more computer science and CS complexity 
compared to R. Not just advanced stuff like coroutines, 
metaclasses, decorators, Abstract Base Classes, operator 
overloading, and so on, but even simpler things, like 
generators, standard library collections like heaps and deques, 
and so on.
For some statisticians I've seen, even several parts of Python 
are too hard to use or understand. I have rewritten several of 
their Python scripts.


Bye,
bearophile



For people with more advanced CS/programming knowledge, though, 
this is an advantage of D.  I find Matlab and R incredibly 
frustrating to use for anything but very standard 
matrix/statistics computations on data that's already structured 
the way I like it.  This is mostly because the standard CS 
concepts you mention are at best awkward and at worst impossible 
to express and, being aware of them, I naturally want to take 
advantage of them.


Using Matlab or R feels like being forced to program with half 
the tools in my toolbox either missing or awkwardly misshapen, so 
I avoid it whenever practical.  (Actually, languages like C and 
Java that don't have much modeling power feel the same way to me 
now that I've primarily used D and to a lesser extent Python for 
the past few years.  Ironically, these are the languages that are 
easy to integrate with R and Matlab respectively.  Do most 
serious programmers who work in problem domains relevant to 
Matlab and R feel this way or is it just me?).  This was my 
motivation for writing Dstats and mentoring Cristi's fork of 
SciD.  D's modeling power is so outstanding that I was able to 
replace R and Matlab for a lot of use cases with plain old 
libraries written in D.


Re: Which D features to emphasize for academic review article

2012-08-12 Thread dsimcha
On Monday, 13 August 2012 at 01:52:28 UTC, Joseph Rushton 
Wakeling wrote:
The main use-case and advantage of both R and MATLAB/Octave 
seems to me to be the plotting functionality -- I've seen some 
exceptionally beautiful stuff done with R in particular, 
although I've not personally explored its capabilities too far.


The annoyance of R in particular is the impenetrable thicket of 
dependencies that can arise among contributed packages; it 
feels very much like some are thrown over the wall and then 
built on without much concern for organization. :-(


I've addressed that, too :).

https://github.com/dsimcha/Plot2kill

Obviously this is a one-man project without nearly the same 
number of features that R and Matlab have, but like Dstats and 
SciD, it has probably the 20% of functionality that handles 80% 
of use cases.  I've used it for the figures in scientific 
articles that I've submitted for publication and in my Ph.D. 
proposal and dissertation.


Unlike SciD and Dstats, Plot2kill doesn't highlight D's modeling 
capabilities that much, but it does get the job done for simple 
2D plots.


Re: MPI Concurrency library update?

2012-08-11 Thread dsimcha
All I have is a very ad-hoc wrapper that does just what I needed 
for my purposes.  It basically has function prototypes for the 
parts of the API I actually care about and a few high-level 
wrappers for passing primitives and arrays to other nodes of the 
same architecture.


On Saturday, 11 August 2012 at 01:12:29 UTC, Andrew wrote:

On Saturday, 11 August 2012 at 00:24:40 UTC, dsimcha wrote:

On Friday, 10 August 2012 at 23:40:43 UTC, Andrew Spott wrote:
A while ago, (August of last year I think), there was talk 
about

a MPI wrapper for D.  Has there been any update on that?


I was considering writing one, but I wanted it to be 
high-level and easy-to-use.  I ended up not doing it, 
initially because I was waiting for serialization to be added 
to Phobos (which I thought was imminent) and then because I 
got busy with unrelated things.


I think that a nice high-level MPI wrapper for D should be 
tightly integrated into a serialization library to encapsulate 
the low-level details of passing non-trivial data structures 
across nodes.  I doubt I'll get around to implementing it when 
serialization is added, though, because I'm probably past the 
MPI-using stage of my life (my Ph.D. research is basically 
finished, I'm just revising my dissertation and preparing to 
defend) so I wouldn't get to eat my own dogfood.


Well, my PhD research is just beginning... :)

Any chance you could pass on what you have?  It might help me 
out

a bit, and reduce my workload toward creating a usable MPI
library.

Thanks.

-Andrew





Re: MPI Concurrency library update?

2012-08-10 Thread dsimcha

On Friday, 10 August 2012 at 23:40:43 UTC, Andrew Spott wrote:

A while ago, (August of last year I think), there was talk about
a MPI wrapper for D.  Has there been any update on that?


I was considering writing one, but I wanted it to be high-level 
and easy-to-use.  I ended up not doing it, initially because I 
was waiting for serialization to be added to Phobos (which I 
thought was imminent) and then because I got busy with unrelated 
things.


I think that a nice high-level MPI wrapper for D should be 
tightly integrated into a serialization library to encapsulate 
the low-level details of passing non-trivial data structures 
across nodes.  I doubt I'll get around to implementing it when 
serialization is added, though, because I'm probably past the 
MPI-using stage of my life (my Ph.D. research is basically 
finished, I'm just revising my dissertation and preparing to 
defend) so I wouldn't get to eat my own dogfood.


Re: Which D features to emphasize for academic review article

2012-08-09 Thread dsimcha
Ok, so IIUC the audience is academic BUT consists of people 
interested in using D as a means to an end, not computer 
scientists?  I use D for bioinformatics, which IIUC has similar 
requirements to econometrics.  From my point of view:


I'd emphasize the following:

Native efficiency.  (Important for large datasets and Monte 
Carlo simulations.)


Garbage collection.  (Important because it makes it much easier 
to write non-trivial data structures that don't leak memory, and 
statistical analyses are a lot easier if the data is structured 
well.)


Ranges/std.range/builtin arrays and associative arrays.  (Again, 
these make data handling a pleasure.)


Templates.  (Makes it easier to write algorithms that aren't 
overly specialized to the data structure they operate on.  This 
can also be done with OO containers but requires more boilerplate 
and compromises on efficiency.)


Disclaimer:  These last two are things I'm the primary designer 
and implementer of.  I intentionally put them last so it doesn't 
look like a shameless plug.


std.parallelism  (Important because you can easily parallelize 
your simulation, etc.  See the short example after this list.)


dstats  (https://github.com/dsimcha/dstats  Important because a 
lot of statistical analysis code is already implemented for you.  
It's admittedly very basic compared to e.g. R or Matlab, but it's 
also in many cases better integrated and more efficient.  I'd say 
that it has the 15% of the functionality that covers ~70% of use 
cases.  I welcome contributors to add more stuff to it.  I 
imagine economists would be interested in time series, which is 
currently a big area of missing functionality.)
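
As a small taste of the range/parallelism style, here's 
essentially the pi-approximation example from the std.parallelism 
documentation:

import std.algorithm : map;
import std.parallelism : taskPool;
import std.range : iota;
import std.stdio : writeln;

void main() {
    // Approximate pi by numerical integration.  The terms are
    // independent, so taskPool.reduce splits them across cores.
    enum n = 1_000_000;
    enum delta = 1.0 / n;

    real getTerm(int i) {
        immutable x = (i - 0.5) * delta;
        return delta / (1.0 + x * x);
    }

    immutable pi = 4.0 * taskPool.reduce!"a + b"(map!getTerm(iota(n)));
    writeln(pi);
}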




Review Queue: Should We Start Reviews Again?

2012-07-28 Thread dsimcha
Apparently nothing's been getting reviewed for inclusion in 
Phobos lately, and the review queue has once again become fairly 
long according to the wiki 
(http://prowiki.org/wiki4d/wiki.cgi?ReviewQueue).  I also noticed 
that a new XML library is already in the review queue.


Is there some reason why none of this stuff is being reviewed?  
If std.xml2 is really ready for review, we should review this 
ASAP since XML processing is fundamental functionality for a 
modern standard library in a high-level language.


IIRC the current std.xml's inadequacy was a major complaint that 
Jacob Carlborg, the author of std.serialize, had.  Perhaps 
when/if std.xml2 is accepted, he should modify std.serialize to 
use it, and std.serialize should be next in the queue.


Hiatus, Improving Participation in D Development

2012-07-15 Thread dsimcha
I've been on somewhat of a hiatus these past few months and have only 
worked on D-related development sporadically.  There are several reasons 
for my absence, some of which will hopefully change soon, and I hope to 
make a comeback.  Below are the reasons why my contributions have 
declined and some suggestions for improvements to the D community where 
the issues aren't specific to me:


0.  There's a lot less stuff that's broken or missing now than a few 
years ago when I started contributing.  This has led to a mild 
complacency as D is already awesome.  For example, it's been a long time 
since I hit a compiler bug that caused me significant hassle.


1.  I'm writing my Ph.D. thesis and looking for jobs.  I still have some 
time to contribute, but D development isn't the top idea in my mind due 
to these distractions.  This is in some ways the root of the problem, as 
I have less time and mental energy to keep up with D through informal 
channels.  I think my job search is over, though, so that's one less 
distraction.


2.  Because I'm writing my thesis, I don't program much for my research 
anymore.  I therefore don't notice little things that are broken or get 
cool ideas from other languages as often.  To make it easier for someone 
to find bugs and enhancement requests in areas he/she is already 
familiar with, I'd like to see an ability to search by module (for 
Phobos/druntime) or feature (for DMD) in Bugzilla.  For example, in 
Phobos I'm most familiar with std.range, std.algorithm, std.parallelism 
and std.functional.  I'd like to be able to easily query a list of bugs 
specific to those modules.


3.  As the community has grown larger, more people besides me have 
stepped up.  Of course this is a good thing.  The only downside is that 
I've lost track of who's working on what, what its status is, what still 
needs to be done, and what the holdups are.  Perhaps we need some 
central place other than this newsgroup where this information is posted 
for key D projects.  For example, we'd have a page that says "Person X 
is working on a new std.xml.  Here's the Github repository.  It's 
stalled because no one can agree on a design for Y."  We should also 
maintain a slightly more formal wishlist of stuff that no one's working 
on that's waiting to be done.


BTW, if no one is working on a new std.xml anymore, I might want to 
start.  I interviewed for a job where they wanted me to do a small 
prototype as part of the hiring process that involved parsing XML.  I 
was allowed to use any language I wanted.  I think my D projects played 
a large role in me getting an offer, but I couldn't use it for the 
prototype because std.xml is so bad.  I ended up using Python instead.


4.  The release cycle has slowed greatly.  What happened here?  The 1-2 
month release cycles were a good motivator because they created mild 
deadline pressure to get features and fixes checked in before the next 
release.


5.  The amount of stuff on this forum and the mailing lists has become 
overwhelming.  I've recently remedied this to a small degree by 
unsubscribing from dmd-internals.  I've never been a contributor to the 
compiler itself and had only subscribed to this list to track bug fixes 
and 64-bit support implementation.  Now, the signal-to-noise ratio of my 
inbox is good enough that I actually read the Phobos and druntime stuff 
again instead of just glossing over all my D-related email.


As far as this forum, I suggest a split something like the following, so 
that it has a better signal-to-noise ratio from the perspective of 
people with specific interests:


D.language-design:  Long, complicated threads about language features 
and the theory behind them belong here.


D.phobos-design:  Since the Phobos mailing list is intended mostly for 
regular contributors and is focused on individual pull requests and 
commits, this is where high-level design stuff would get discussed 
w.r.t. Phobos.


D.ecosystem:  Stuff about third-party libraries, Deimos, toolchains, 
etc. goes here.


D.adoption:  Discussions about why D was or wasn't adopted for a given 
project and how to increase its adoption go here.


D.learn:  Questions about using D.  We already have this, but we need to 
encourage people to use it more instead of posting to the main group.


Re: Hiatus, Improving Participation in D Development

2012-07-15 Thread dsimcha

On 7/15/2012 4:54 AM, Jonathan M Davis wrote:

On Sunday, July 15, 2012 00:02:09 dsimcha wrote:

BTW, if noone is working on a new std.xml anymore, I might want to
start.  I interviewed for a job where they wanted me to do a small
prototype as part of the hiring process that involved parsing XML.  I
was allowed to use any language I wanted.  I think my D projects played
a large role in me getting an offer, but I couldn't use it for the
prototype because std.xml is so bad.  I ended up using Python instead.


Someone was working on it (Tomaz?) and was supposedly making good progress,
but last time I checked, they hadn't posted anything since some time in 2010.
So, as far as I can tell, that project is effectively dead. I have no idea what
state it was in before it stalled or whether the code is available anywhere
online. I expect that anyone who wants to work on it will either have to start
from scratch or grab one of the existing xml parsers floating around and
adjust it (though I suspect that if it's going to be range-based like it's
supposed to be that any existing parsers floating around probably would need
quite a bit of work to get the right API, but I don't know). It's the sort of
thing that I'd love to work on given the time, but I have so much else going
on that it would be ridiculous for me to even consider it. If you want to take
up that baton, then I think that's great. Even if you end up taking a while to
do it, that's better than getting nothing and seeing no progress as we have
been for quite some time now.

- Jonathan M Davis




Ok, well I'll at least try to get a better handle on what's involved and 
how much time I'm going to have over the next few months.  I'm not 
saying I definitely want to take it on yet.


Re: Hiatus, Improving Participation in D Development

2012-07-15 Thread dsimcha

On Sunday, 15 July 2012 at 15:46:38 UTC, David Nadlinger wrote:

On Sunday, 15 July 2012 at 04:02:48 UTC, dsimcha wrote:
5.  The amount of stuff on this forum and the mailing lists 
has become overwhelming.  I've recently remedied this to a 
small degree by unsubscribing from dmd-internals.  I've never 
been a contributor to the compiler itself and had only 
subscribed to this list to track bug fixes and 64-bit support 
implementation.
Now, the signal-to-noise ratio of my inbox is good enough that 
I actually read the Phobos and druntime stuff again instead of 
just glossing over all my D-related email.


I take it you are referring to the GitHub commit messages which 
are relayed to dmd-internals? Because except for those (which I 
just made a filter rule for), the list is really quite 
low-volume. Maybe we should create a dedicated d-commits list 
to which all the GitHub notifications get sent, similar to what 
other projects have? The occasional post-commit discussion 
could then be continued on one of the repository-specific 
lists, just like they are now.


David


Yeah.  The problem is that for a while, D mailing lists became so 
overwhelming that I got into the habit of reflexively ignoring 
them completely due to poor signal-to-noise ratio w.r.t. stuff I 
actually work on and being preoccupied with other things.  Your 
idea may be a good one, since only the core DMD devs care about 
every commit but others might want to participate in higher level 
discussions.


Antti-Ville Tuuainen passes his midterm evaluations for GSoC 2012

2012-07-12 Thread dsimcha
Congratulations to Antti-Ville Tuuainen for passing the GSoC 2012 
midterm evaluation!  Despite going through a steep learning curve 
to learn D's template metaprogramming system, Antti-Ville has 
precise heap scanning for the garbage collector close to working 
using the new rtinfo template that has been added to object.d.  
His Github repository is at:


https://github.com/Tuna-Fish/druntime

The plans for the second half include creating an alternative 
implementation of precise scanning that may be more efficient and 
removing the global lock from malloc() if time permits.


Re: Rational numbers in D

2012-06-09 Thread dsimcha
A long time ago, this was discussed on this forum.  I wrote the current 
candidate for std.rational, and there was talk of Don Clugston 
integrating the GCD function into std.bigint to take advantage of 
knowing BigInt's internals.  According to Don, using a general algorithm 
here results in terrible performance.  As of now, that hasn't happened, 
though.


On 6/7/2012 1:49 PM, Joseph Rushton Wakeling wrote:

Sorry for the double-post -- I already asked this in d-learn, but this
may be a better place to ask.

What's the current state of affairs and roadmap for inclusion of
rational number support in D?  I've come across David Simcha's work:
http://cis.jhu.edu/~dsimcha/d/phobos/std_rational.html

 and a feature request on the bugzilla:
http://d.puremagic.com/issues/show_bug.cgi?id=7885

 but this isn't mentioned at all in the review queue:
http://prowiki.org/wiki4d/wiki.cgi?ReviewQueue

What's the status of work/planning for this feature and is there any
kind of ETA for when it might land in Phobos?

Thanks and best wishes,

 -- Joe





Re: run-time stack-based allocation

2012-05-07 Thread dsimcha

On 5/7/2012 12:08 PM, Gor Gyolchanyan wrote:

Wasn't there an allocator mechanism under development for phobos? I
remember there was a StackAllocator, that can span for arbitrary
scopes. What's up with that?


I wrote one.  It's at https://github.com/dsimcha/TempAlloc .  It hasn't 
been accepted to Phobos, though, because of issues w.r.t. figuring out 
what a more general allocator interface should look like.


Re: GC API: What can change for precise scanning?

2012-04-18 Thread dsimcha

On 4/18/2012 6:46 PM, Sean Kelly wrote:

Leandro's GC (CDGC) is already set up to support precise scanning.  It's in the 
Druntime git repository, but lacks the features added to the Druntime GC 
compared to the Tango GC on which CDGC is based.  Still, it may be easier to 
update CDGC based on a diff between the Druntime and Tango GC than it would to 
add precise scanning to the GC Druntime currently uses.  Worth a look if anyone 
is interested anyway.



Or, failing that, I can look at it to get ideas about how to handle 
various annoying plumbing issues.  The plumbing issues (i.e. getting the 
GCInfo pointers from the allocation routines into the guts of the GC) 
are actually the hard part of this project.  Once the GC has the 
GCInfo pointer, making it use that for precise scanning is trivial in 
that I've done it before and remember roughly how I did it.


GC API: What can change for precise scanning?

2012-04-17 Thread dsimcha
Now that the compiler infrastructure has been implemented, I've 
gotten busy figuring out how to make D's default GC precise.  As 
a first attempt, I think I'm going to adapt my original solution 
from http://d.puremagic.com/issues/show_bug.cgi?id=3463 since 
it's simple and it works except that there previously was no 
clean way to get the offset info into the GC.  As Walter pointed 
out in another thread, the GCInfo template is allowed to 
instantiate to data instead of a function.  IMHO unless/until 
major architectural changes to the GC are made that require a 
function pointer, there's no point in adding this indirection.


I started working on this and I ran into a roadblock.  I need to 
know what parts of the GC API are allowed to change, and discuss 
how to abstract away the implementation of it from the GC API.  I 
assume the stuff in core.memory needs to stay mostly the same, 
though I guess we would need to add a setType() function that 
takes a pointer into a block of memory and a TypeInfo object and 
changes how the GC interprets the bits in the block.


In gc.d, we define a bunch of extern(C) functions and the proxy 
thing.  Since we've given up on the idea of swapping precise GCs 
at link time, can I just rip out all this unnecessary indirection? 
 If not, is it ok to change some of these signatures?  I 
definitely want to avoid allocating (requiring the GC lock) and 
then calling a function to set the type (requiring another lock 
acquisition) so the signature of malloc(), etc. needs to change 
somewhere.
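
Concretely, the kind of signature changes I mean (hypothetical, 
just to anchor the discussion):

// Allocation takes the type metadata up front, so no second lock
// acquisition is needed to set it afterward.
extern (C) void* gc_malloc(size_t sz, uint ba = 0,
                           const TypeInfo ti = null);

// For changing how an already-allocated block is interpreted.
static void setType(in void* p, const TypeInfo ti) nothrow;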


More generally, what is the intended way to get GCInfo pointers 
from TypeInfo into the guts of the GC where they can be acted on?


Re: compiler support added for precise GC

2012-04-16 Thread dsimcha

On 4/15/2012 10:24 PM, Walter Bright wrote:

Just checked it in. Of course, it doesn't actually do precise GC, it is
just thrown over the wall for the library devs who are itching to get
started on it.


Excellent!!  Maybe I'll get started on this soon.


Re: Mono-D GSoC proposal, hopefully the last thread about it

2012-04-04 Thread dsimcha
Yeah, as a mentor, I will reassure both of you that no news 
definitely isn't bad news and may even be good news.  If you 
don't have any feedback, it's because we either haven't gotten 
around to reading your proposal yet or it had all the information 
we wanted and don't have any requests for clarification, etc.  
Don't read too much into the lack of feedback or get discouraged.


Re: D for a Qt developer

2012-03-31 Thread dsimcha

On 3/31/2012 4:23 PM, Davita wrote:

One general comment:  Lots of people ask for the stuff you're asking 
for.  Progress is being made on all the relevant fronts, slowly but surely.



1) Database libs/ORMs.


I think Steve Teale is working on something for this, but I don't know 
the details or how much progress is being made.




2) mature UI library (vector based ,declarative or at least to support
styling like Qt stylesheet).


I think QtD is now usable since the relevant compiler bugs were ironed out.



3) Crypto libs for hashing and with asymmetric algorithm implementations.


You would probably be best off linking to a C library for this.  The 
headers are in Deimos.  https://github.com/D-Programming-Deimos/openssl




4) XML libraries for generating and parsing xml docs, along with 
XSD validation support and XSL transforms.


Phobos has a pretty rudimentary XML lib.  Tango's been ported to D2, 
though.  You could try it. https://github.com/SiegeLord/Tango-D2




5) networking libs with several main protocol implementations such as
Http, FTP and SMTP.


std.net.curl was just added to the latest Phobos release.



6) and of course, RAD styled IDE.


Visual D might do what you want.



Those are the minimum of my requirements in order to start development
for a platform. So guys, what do you think, will D be useful for me? :-)

P.S. what happened to Qt bindings? I saw that it was abandoned. Maybe
working with the Trolltech/Nokia team to integrate D in QtCreator and
creating and maintaining Qt's D bindings would be the most awesome
decision, but how achievable is it? :)


I personally don't use QtD, so I don't know where it's hosted, but a lot 
of stuff that was on dsource has moved to Github.  If it looks abandoned 
on dsource, it may have been migrated.




Re: GSoC: Linear Algebra and the SciD library

2012-03-24 Thread dsimcha

Cullen,

I think the ideas page sums it up pretty well.  Matrix factorizations, 
sparse matrices and general polish and bug fixing are the main goals I 
had in mind, though we're definitely open to any other ideas you may 
have.  As someone with a strong math background, you could add a lot of 
value by helping us figure out what features are worth adding in 
addition to just implementing the features that have been previously 
suggested.


Unfortunately, though, SciD uses template metaprogramming very heavily. 
 If you're not comfortable with template metaprogramming in either C++ 
or D (you imply that you have no experience with either language) then 
you'd need to get up to speed very quickly.  The project will have 
almost zero chance of success if you don't master templates.  If this 
sounds too difficult, we still encourage you to submit a proposal for 
another project that doesn't use templates or other advanced, D-specific 
features so heavily.


--David Simcha


Re: Three Unlikely Successful Features of D

2012-03-20 Thread dsimcha

1.  Scope guards.  The need to execute some code at the end of a
scope is pervasive and has led to about a zillion workarounds,
from gotos in C to RAII in C++ to try/finally in Java and C# to
with statements in Python.  D is the only language I know of that
lets the programmer specify such a simple and common intention
directly.  (See the example after this list.)

2.  CTFE.  There's a certain elegance to having most of the
language available at compile time and it seems like there's no
shortage of creative uses for this.

3.  Static if.  This is the most important feature for converting
template metaprogramming from an esoteric form of sorcery to a
practical, readable tool for the masses.  Most of the compile time
introspection D provides would be almost unusable without it.
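
To make these concrete, here are trivial sketches of each (my own toy examples, not code from anywhere in particular):

// 1. Scope guards:  cleanup is declared right next to the thing it
// cleans up, and runs no matter how the scope is exited.
import std.stdio;

void process() {
    auto f = File("input.txt");  // hypothetical file name
    scope(exit) f.close();
    // ... use f; close() runs even if something below throws.
}

// 2. CTFE:  an ordinary function evaluated at compile time.
ulong factorial(ulong n) {
    return n <= 1 ? 1 : n * factorial(n - 1);
}
enum fact10 = factorial(10);        // forced compile-time evaluation
static assert(fact10 == 3_628_800);

// 3. static if:  compile-time branching inside a template.
import std.traits;

void describe(T)(T value) {
    static if (isFloatingPoint!T)
        pragma(msg, T.stringof ~ " is floating point");
    else static if (isIntegral!T)
        pragma(msg, T.stringof ~ " is integral");
    else
        static assert(0, "unsupported type");
}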


Re: We have a GSoC mentor already: David Simcha

2012-03-03 Thread dsimcha

On 3/2/2012 9:38 PM, Trass3r wrote:

Am 03.03.2012, 00:43 Uhr, schrieb Andrei Alexandrescu
seewebsiteforem...@erdani.org:


David Simcha applied for a second gig as a GSoC mentor. Needless to
say, his application was approved :o). Please join me in welcoming him!


Yay!

Time to ask about the status of the last GSoC project, i.e. the LinAlg one.
If it still needs lots of work, maybe there could be another round on that.


The status is that the debugging and polishing is slowly happening. 
I've used the library for real work and while rough around the edges, 
it's quite good.  I've also worked on adding some Lapack wrapper stuff 
that Cristi (the student I mentored) didn't get to and improving the 
test suite.


The key todos are:

1.  Fix a few nasty bugs that are more design flaws than run-of-the-mill 
bugs.  This is hard for me to do without Cristi's input because I am 
unclear on a few design decisions.


2. Documentation.

3  More LAPACK wrappers.

4.  More real-world testing.  I'm not comfortable submitting something 
this large and complicated for Phobos inclusion until a few people have 
used it extensively for real work and found all the bugs and design flaws.


5.  Some serious profiling/performance optimization.

6.  Get allocators into Phobos, since Cristi's SciD fork depends on them.


Re: We have a GSoC mentor already: David Simcha

2012-03-03 Thread dsimcha

On 3/3/2012 2:04 AM, Andrei Alexandrescu wrote:

Mentors are chosen before students and projects. As we all know, David
has a variety of interests, with scientific programming at the top.

Andrei



I'm open to a variety of projects, but scientific computing and garbage 
collection are at the top of my list.


Re: Inheritance of purity

2012-02-17 Thread dsimcha
On Friday, 17 February 2012 at 03:24:50 UTC, Jonathan M Davis 
wrote:
No. Absolutely not. I hate the fact that C++ does this with 
virtual. It makes it so that you have to constantly look at the 
base classes to figure out what's virtual and what isn't. It 
harms maintenance and code understandability. And now you want 
to do that with @safe, pure, nothrow, and const? Yuck.


I can understand wanting to save some typing, but I really 
think that this harms code maintainability. It's the sort of 
thing that an IDE is good for. It does stuff like generate the 
function signatures for you or fill in the attributes that are 
required but are missing.


Besides the fact that not everyone uses an IDE, my other counter-argument to these "the IDE generates your boilerplate" arguments is that code is read and modified more often than it is written.  I don't like reading or modifying boilerplate code any more than I like writing it.  Besides, if you're using a fancy IDE, can't it show you the attributes that the derived class inherits?


Re: GSoC will open soon

2012-02-12 Thread dsimcha

On 2/6/2012 12:34 AM, Andrei Alexandrescu wrote:


Yah, a wiki page sounds great.

Andrei


Wiki is up, ice is broken.  Let's start adding some ideas!  I also think this year we should have a "possible mentors" line next to each project to keep track of who's interested in mentoring what.  For example, I added garbage collection to the page.  If you're also interested in mentoring a GC project, just append yourself to the list of possible mentors.


http://prowiki.org/wiki4d/wiki.cgi?GSOC_2012_Ideas


Re: Message passing between threads: Java 4 times faster than D

2012-02-09 Thread dsimcha
I wonder how much it helps to just optimize the GC a little.  How 
much does the performance gap close when you use DMD 2.058 beta 
instead of 2.057?  This upcoming release has several new garbage 
collector optimizations.  If the GC is the bottleneck, then it's 
not surprising that anything that relies heavily on it is slow 
because D's GC is still fairly naive.


On Thursday, 9 February 2012 at 15:44:59 UTC, Sean Kelly wrote:
So a queue per message type?  How would ordering be preserved? 
Also, how would this work for interprocess messaging?  An 
array-based queue is an option however (though it would mean 
memmoves on receive), as are free-lists for nodes, etc.  I 
guess the easiest thing there would be a lock-free shared slist 
for the node free-list, though I couldn't weigh the chance of 
cache misses from using old memory blocks vs. just expecting 
the allocator to be fast.


On Feb 9, 2012, at 6:10 AM, Gor Gyolchanyan 
gor.f.gyolchan...@gmail.com wrote:


Generally, D's message passing is implemented in quite an easy-to-use way, but it is far from being fast.

I dislike the Variant structure, because it adds a huge overhead.  I'd rather have a templated message passing system with a type-safe message queue, so no Variant is necessary.  In specific cases Messages can be polymorphic objects.  This will be way faster than Variant.

On Thu, Feb 9, 2012 at 3:12 PM, Alex Dovhal alex 
dov...@yahoo.com wrote:
Sorry, my mistake.  It's strange to have different 'n', but you measure speed as 1000*n/time, so it doesn't matter if n is 10 times bigger.







--
Bye,
Gor Gyolchanyan.





Re: [xmlp] the recent garbage collector performance improvements

2012-02-02 Thread dsimcha
On Thursday, 2 February 2012 at 04:38:49 UTC, Robert Jacques 
wrote:
An XML parser would probably want some kind of stack segment 
growth schedule, which, IIRC isn't supported by RegionAllocator.


I had considered putting that in RegionAllocator but I was 
skeptical of the benefit, at least assuming we're targeting PCs 
and not embedded devices.   The default segment size is 4MB.  
Trying to make the initial size any smaller won't save much 
memory.  Four megabytes is also big enough that new segments 
would be allocated so infrequently that the cost would be 
negligible.  I concluded that the added complexity wasn't 
justified.


Re: [xmlp] the recent garbage collector performance improvements

2012-02-02 Thread dsimcha

On Thursday, 2 February 2012 at 18:06:24 UTC, Manu wrote:

On 2 February 2012 17:40, dsimcha dsim...@yahoo.com wrote:

On Thursday, 2 February 2012 at 04:38:49 UTC, Robert Jacques 
wrote:


An XML parser would probably want some kind of stack segment growth schedule, which, IIRC, isn't supported by RegionAllocator.


at least assuming we're targeting PCs and not embedded devices.


I don't know about the implications of your decision, but that comment makes me feel uneasy.  I don't know how you can possibly make that assumption.  Have you looked around at the devices people actually use these days?  PCs are an endangered and dying species... I couldn't imagine a worse assumption if it influences the application of D on different systems.


I'm not saying that embedded isn't important.  It's just that for 
low level stuff like memory management it requires a completely 
different mindset.  RegionAllocator is meant to be fast and 
simple at the expense of space efficiency.  In embedded you'd 
probably want completely different tradeoffs.  Depending on how 
deeply embedded, space efficiency might be the most important 
thing.  I don't know exactly what tradeoffs you'd want, though, 
since I don't do embedded development.  My guess is that you'd 
want something completely different, not RegionAllocator plus a 
few tweaks that would complicate it for PC use.  Therefore, I 
designed RegionAllocator for PCs with no consideration for 
embedded environments.


Re: [xmlp] the recent garbage collector performance improvements

2012-02-02 Thread dsimcha
On Thursday, 2 February 2012 at 18:55:02 UTC, Andrej Mitrovic 
wrote:

On 2/2/12, Manu turkey...@gmail.com wrote:

PC's are an endangered and dying species...


Kind of like when we got rid of cars and trains and ships once 
we

started making jumbo jets.

Oh wait, that didn't happen.


Agreed.  I just recently got my first smartphone and I love it.  
I see it as a complement to a PC, though, not as a substitute.  
It's great for when I'm on the go, but when I'm at home or at 
work I like a bigger screen, a full keyboard, a faster processor, 
more memory, etc.  Of course smartphones will get more powerful 
but I doubt any will ever have dual 22 inch monitors.


Re: [xmlp] the recent garbage collector performance improvements

2012-02-01 Thread dsimcha
Interesting.  I'm glad my improvements seem to matter in the real 
world, though I'm thoroughly impressed with the amount of 
speedup.  Even the small allocation benchmark that I was 
optimizing only sped up by ~50% from 2.057 to 2.058 overall and 
~2x in collection time.  I'd be very interested if you could make 
a small, self-contained test program to use as a benchmark.


GC performance is one of D's biggest weak spots, so it would 
probably be a good bit of marketing to show that the performance 
is substantially better than it used to be even if it's not great 
yet.  Over the past year I've been working on and off at speeding 
it up.  It's now at least ~2x faster than it was last year at 
this time on every benchmark I've tried and up to several hundred 
times faster in the extreme case of huge allocations.


On Wednesday, 1 February 2012 at 18:33:58 UTC, Richard Webb wrote:
Last night I tried loading a ~20 megabyte xml file using xmlp (via the DocumentBuilder.LoadFile function) and a recent dmd build, and found that it took ~48 seconds to complete, which is rather poor.  I tried running it through a profiler, and that said that almost all the runtime was spent inside the garbage collector.

I then tried the same test using the latest Git versions of dmd/druntime (with pull request 108 merged in), and that took less than 10 seconds.  This is a rather nice improvement, though still somewhat on the slow side.

Some profiler numbers, if anyone is interested:

Old version:
Gcxfullcollect: 31.14 seconds, 69.26% runtime.
Gcxmark: 4.84 seconds, 10.77% runtime.
Gcxfindpool: 2.10 seconds, 4.67% runtime.

New version:
Gcxmark: 11.67 seconds, 50.77% runtime.
Gcxfindpool: 3.58 seconds, 15.55% runtime.
Gcxfullcollect: 1.69 seconds, 7.37% runtime.

(Assuming that Sleepy is giving me accurate numbers.  The new version is definitely faster though.)





Re: [xmlp] the recent garbage collector performance improvements

2012-02-01 Thread dsimcha

On Wednesday, 1 February 2012 at 22:53:11 UTC, Richard Webb wrote:
For reference, the file I was testing with has ~5 root nodes, each of which has several children.  The number of nodes seems to have a much larger effect on the speed than the amount of data.




Sounds about right.  For very small allocations sweeping time 
dominates the total GC time.  You can see the breakdown at 
https://github.com/dsimcha/druntime/wiki/GC-Optimizations-Round-2 
.  The Tree1 benchmark is the very small allocation benchmark.  
Sweeping takes time linear in the number of memory blocks allocated and, for blocks < 1 page, constant time in the size of the blocks.


Re: [xmlp] the recent garbage collector performance improvements

2012-02-01 Thread dsimcha

On Wednesday, 1 February 2012 at 23:43:24 UTC, H. S. Teoh wrote:

Out of curiosity, is there a way to optimize for the many-small-allocations case?  E.g., if a function allocates, as temporary storage, a tree with a large number of nodes, which becomes garbage when it returns.  Perhaps a way to sweep the entire space used by the tree in one go?

Not sure if such a thing is possible.


T


My RegionAllocator is probably the best thing for this if the 
lifetime is deterministic as you describe.  I rewrote the Tree1 
benchmark using RegionAllocator a while back just for comparison. 
 D Tree1 + RegionAllocator had comparable speed to a Java version 
of Tree1 run under HotSpot.  (About 6 seconds on my box vs. in 
the low 30s for Tree1 with the 2.058 GC.)


If all the objects are going to die at the same time but not at a 
deterministic time, you could just allocate a big block from the 
GC and place class instances in it using emplace().
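
A rough sketch of that second approach (the names here are mine, purely for illustration):

import std.conv : emplace;
import core.memory : GC;

class Node { int val; this(int v) { val = v; } }

void main() {
    enum size = __traits(classInstanceSize, Node);
    // One big GC allocation; the instances inside it live and die together.
    void* block = GC.malloc(size * 1000);
    auto n0 = emplace!Node(block[0 .. size], 42);
    auto n1 = emplace!Node(block[size .. 2 * size], 43);
}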


Re: [xmlp] the recent garbage collector performance improvements

2012-02-01 Thread dsimcha

On Thursday, 2 February 2012 at 01:27:44 UTC, bearophile wrote:

Richard Webb:


Parsing the file with DMD 2.057 takes ~25 seconds

Parsing the file with DMD 2.058(Git) takes ~6.1 seconds

Parsing the file with DMD 2.058, with the GC disabled during 
the LoadFile call, takes ~2.2 seconds.



For comparison, MSXML6 takes 1.6 seconds to load the same file.


Not too long ago the Python devs added a heuristic to the Python GC (which is a reference counter + cycle breaker): it switches off if it detects the program is allocating many items in a short time.  Is it possible to add something similar to the D GC?


Bye,
bearophile


I actually tried to add something like this a while back but I 
couldn't find a heuristic that worked reasonably well.  The idea 
was just to create a timeout where the GC can't run for x 
milliseconds after it just ran.


Re: [xmlp] the recent garbage collector performance improvements

2012-02-01 Thread dsimcha
Wait a minute, since when do we even have a std.xml2?  I've never 
heard of it and it's not in the Phobos source tree (I just 
checked).


On Thursday, 2 February 2012 at 00:41:31 UTC, Richard Webb wrote:

On 01/02/2012 19:35, dsimcha wrote:

I'd be very
interested if you could make a small, self-contained test 
program to use

as a benchmark.




The 'test' is just

/
import std.xml2;

void main()
{
    string xmlPath = r"test.xml";

    auto document = DocumentBuilder.LoadFile(xmlPath, false, false);
}
/

It's xmlp that does all the work (and takes all the time).


I'll see about generating a simple test file, but basically:

5 top level nodes
each one has 6 child nodes
each node has a single attribute, and the child nodes each have 
a short text value.



Parsing the file with DMD 2.057 takes ~25 seconds

Parsing the file with DMD 2.058(Git) takes ~6.1 seconds

Parsing the file with DMD 2.058, with the GC disabled during 
the LoadFile call, takes ~2.2 seconds.



For comparison, MSXML6 takes 1.6 seconds to load the same file.





Re: Call site 'ref'

2012-01-15 Thread dsimcha

On 1/15/2012 8:36 AM, Alex Rønne Petersen wrote:

Hi,

I don't know how many times I've made the mistake of passing a local
variable to a function which takes a 'ref' parameter. Suddenly, local
variables/fields are just mutating out of nowhere, because it's not at
all obvious that a function you're calling is taking a 'ref' parameter.
This is particularly true for std.utf.decode().

Yes, I realize I could look at the function declaration. Yes, I could
read the docs too. But that doesn't prevent me from forgetting that a
function takes a 'ref' parameter, and then doing the mistake again. The
damage is done, and the time is wasted.

I think D should allow 'ref' on call sites to prevent these mistakes.
For example:

string str = ...;
size_t pos;
auto chr = std.utf.decode(str, ref pos);

Now it's much more obvious that the parameter is passed by reference and
is going to be mutated.

Ideally, this would not be optional, but rather *required*, but I
realize that such a change would break a *lot* of code, so that's
probably not a good idea.

Thoughts?



This would break UFCS severely.  The following would no longer work:

auto arr = [1, 2, 3, 4, 5];
arr.popFront();  // popFront takes arr by ref


Re: Discussion about D at a C++ forum

2012-01-09 Thread dsimcha

On 1/9/2012 2:56 AM, Gour wrote:

On Sun, 08 Jan 2012 19:26:15 -0500
dsimchadsim...@yahoo.com  wrote:


As someone who does performance-critical scientific work in D, this
comment is absolutely **wrong** because you only need to avoid the GC
in the most performance-critical/realtime parts of your code, i.e.
where you should be avoiding any dynamic allocation, GC or not.


Considering we'd need to do some work for our project involving number
crunching in the form of producing several libs to be (later) used by
GUI part of the app, I'm curious to know do you use ncurses or just
plain console output for your UI?


Pure command line/console.



Re: Discussion about D at a C++ forum

2012-01-08 Thread dsimcha

On 1/8/2012 6:28 PM, Mehrdad wrote:

On 1/7/2012 10:57 PM, Jonathan M Davis wrote:

Not exactly the most informed discussion.


Well, some of their comments _ARE_ spot-on correct...

2. While you can avoid the garbage collector, that basically means you
can't use most of the standard library.
Looks pretty darn correct to me -- from the fixed-size array literal
issue (literals are on the GC heap), to all the string operations (very
little is usable), to associative arrays (heck, they're even part of the
language, but you can't use them without a GC), etc...


As someone who does performance-critical scientific work in D, this 
comment is absolutely **wrong** because you only need to avoid the GC in 
the most performance-critical/realtime parts of your code, i.e. where 
you should be avoiding any dynamic allocation, GC or not.  (Though GC is 
admittedly worse than malloc, at least given D's current quality of 
implementation.)


My style of programming in D is to consciously transition between 
high-level D and low-level D depending on what I'm doing.  Low-level D 
avoids the GC, heavy use of std.range/std.algorithm since the compiler 
doesn't optimize these well yet, and basically anything else where the 
cost isn't clear.  It's a PITA to program in like all low-level 
languages, but not as bad as C or C++.  Nonetheless low-level D is just 
as fast as C or C++.  High-level D is slower than C or C++ but faster 
than Python, and integrates much more cleanly with low-level D than 
Python does with C and C++.  It's only slightly harder to program in 
than Python.


Bottom line:  D doesn't give you a free lunch but it does give you a 
cheaper lunch than C, C++ or even a combination of C/C++ and Python.


Removing the Lock for Small GC Allocations: Clarification of GC Design?

2011-12-31 Thread dsimcha
I have a plan to avoid the GC lock for most small (1 page) GC 
allocations.  I hope to have a pull request within a week or two, 
in time for the next release.  There's one detail I need 
clarified by Sean, Walter or someone who designed the D GC.


Currently small allocations are handled by popping a block off a 
free list, if a block is available.  I plan to make each page 
owned by a single thread, and make the free lists thread-local.  
The array of free lists (one for each power of two size) is 
stored in the Gcx struct.  The easiest way to make this array 
thread-local is to move it out of the Gcx struct and make it 
global.  Is there any reason why 1 instance of Gcx would exist 
(maybe as an implementation detail of shared libraries, etc.)?  
If not, what's to point of having the Gcx struct instead of just 
making its variables global?
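
For concreteness, the shape of what I have in mind (a hypothetical sketch, not the actual Gcx code):

struct FreeNode { FreeNode* next; }

// D module-level variables are thread-local by default, so moving this
// array out of Gcx would give every thread its own set of free lists.
FreeNode*[12] freeLists;  // one list per power-of-two size class

void* tryAllocSmall(size_t sizeClass) {
    auto head = freeLists[sizeClass];
    if (head is null) return null;  // fall back to the locked slow path
    freeLists[sizeClass] = head.next;
    return cast(void*) head;
}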


Re: CURL Wrapper: Congratulations Next up: std.serialize

2011-12-28 Thread dsimcha
On Wednesday, 28 December 2011 at 16:01:50 UTC, Jacob Carlborg 
wrote:

Running the unit tests:
./unittest.sh

Use make to compile the library or create an executable using 
rdmd.


A few things to think about that need to be resolved:

* This is quite a large library and I really don't want to put 
it all into one module. I'm hoping it will be OK with a package


So the package would be std.serialize?



* I would really like to keep the unit tests in their own 
modules because they're quite large and the modules are already 
large without the unit tests in them


Sounds reasonable.  It goes against the Phobos convention, but it 
sounds like you have a good reason to.




* The unit tests use a kind of mini-unit test framework. Should 
that be kept or removed?


I haven't looked at it yet, but if it's generally useful, maybe 
it should be extracted and exposed as part of Phobos.  I'd say 
keep it for now but keep it private, and later make a proposal 
for a full review to make it a public, official part of Phobos.




Note:

The documentation is generated using D1; I don't think that should make a difference though.





Re: A nice way to step into 2012

2011-12-27 Thread dsimcha

On Tuesday, 27 December 2011 at 15:19:07 UTC, dsimcha wrote:
On Tuesday, 27 December 2011 at 15:11:25 UTC, Andrei 
Alexandrescu wrote:
Imagine how bitter I am that the string lambda syntax didn't 
catch on!


Andrei


Please tell me they're not going anywhere.  I **really** don't 
want to deal with those being deprecated.


...and they were kind of useful in that you could introspect the string and apply optimizations depending on what the lambda was.  I wrote a sorting function that introspected the lambda that was passed to it.  If it was "a < b", "a > b", "a <= b", etc., and the array to be sorted was floating point, it punned and bit twiddled the floats/doubles to ints/longs, sorted them and bit twiddled and punned them back.
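
The trick itself looks roughly like this (my own reconstruction, not the original code):

// IEEE floats are sign-magnitude, so remapping the bits like this makes
// them sort in the same order as the float values when compared as uints.
uint floatToSortableUint(float f) {
    uint u = *cast(uint*) &f;
    // Negatives: flip all bits.  Non-negatives: set the sign bit.
    return (u & 0x8000_0000) ? ~u : (u | 0x8000_0000);
}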


Re: A nice way to step into 2012

2011-12-27 Thread dsimcha
On Tuesday, 27 December 2011 at 15:11:25 UTC, Andrei Alexandrescu 
wrote:
Imagine how bitter I am that the string lambda syntax didn't 
catch on!


Andrei


Please tell me they're not going anywhere.  I **really** don't 
want to deal with those being deprecated.


Re: Looking for SciLib

2011-12-26 Thread dsimcha
On Monday, 26 December 2011 at 10:11:01 UTC, Lars T. Kyllingstad 
wrote:
So submitting Cristi's library for inclusion in Phobos is now 
off the table?


-Lars


In the _near_ future, yes.  It's still too much of a work in 
progress.  Submitting to Phobos is still the eventual goal, 
though.


CURL Wrapper: Congratulations Next up: std.serialize

2011-12-26 Thread dsimcha
By a vote of 14-0, Jonas Drewsen's CURL wrapper (std.net.curl) 
has been accepted into Phobos.  Thanks to Jonas for his hard work 
and his persistence through the multiple rounds of review that it 
took to get this module up to Phobos's high and increasing 
quality standard.


Keep the good work coming.  Next in line, if it's ready, is Jacob 
Carlborg's std.serialize.  Jacob, please post here when you've 
got something ready to go.


Re: Looking for SciLib

2011-12-25 Thread dsimcha
On Monday, 26 December 2011 at 00:46:44 UTC, Jonathan M Davis 
wrote:

Sounds like they should probably be merged at some point.

- Jonathan M Davis


Yeah, I've started working on Cristi's fork now that I've built a 
good enough mental model of the implementation details that I can 
modify the code.  This fork is still very much a work in 
progress.  A merge with Lars's code is a good idea at some point, 
but right now debugging and fleshing out the linalg stuff is a 
higher priority.


Re: Binary Size: function-sections, data-sections, etc.

2011-12-21 Thread dsimcha
Indeed, a couple small programs I wrote today behave erratically 
w/ gc-sections.  This only seems to occur on DMD, but I'm not 
sure if this is a bug in DMD or if differences in library build 
configurations between compilers (these are workarounds for bugs 
in GDC and LDC) explain it.


On Wednesday, 21 December 2011 at 04:15:21 UTC, Artur Skawina 
wrote:

On 12/20/11 19:59, Trass3r wrote:

Seems like --gc-sections _can_ have its pitfalls:
http://blog.flameeyes.eu/2009/11/21/garbage-collecting-sections-is-not-for-production

Also I read somewhere that --gc-sections isn't always 
supported (no standard switch or something like that).


The scenario in that link apparently involves a hack, where a completely unused symbol is used to communicate with another program/library (which checks for its presence with dlsym(3)).  The linker will omit that symbol, as nothing else references it - the solution is to simply reference it from somewhere.  Or explicitly place it in a used section.  Or incrementally link in the unused symbols _after_ the gc pass.  Or...

If you use such hacks you have to handle them specially; there's no way for the compiler to magically know which unreferenced symbols are not really unused.  (Which is also why this optimization isn't very useful for shared libs - every visible symbol has to be assumed used, for obvious reasons.)

The one potentially problematic case I mentioned in that gdc bug mentioned above is this: if the D runtime (most likely the GC) needs to know the start/end of the data and bss sections _and_ does it in a way that can confuse it if some unreferenced parts of these sections disappear and/or are reordered, then turning on the section GC could uncover this bug.  From the few simple tests I ran here everything seems to work fine, but I did not check the code to confirm there are no incorrect assumptions present.


I personally see no reason not to use -ffunction-sections and 
-fdata-sections for compiling phobos though, cause a test with 
gdc didn't even result in a much bigger lib file, nor did it 
take significantly longer to compile/link.


A 737k -> 320k executable size reduction is a compelling argument.

That site I linked claims, though, that it means serious overhead even if --gc-sections is omitted.


?


So we have to do tests with huge codebases first.


yes.

artur





auto + Top-level Const/Immutable

2011-12-20 Thread dsimcha
The changes made to IFTI in DMD 2.057 are great, but they reveal another 
hassle with getting generic code to play nice with const.


import std.range, std.array;

ElementType!R sum(R)(R range) {
    if(range.empty) return 0;
    auto ans = range.front;
    range.popFront();

    foreach(elem; range) ans += elem;
    return ans;
}

void main() {
    const double[] nums = [1, 2, 3];
    sum(nums);
}

test.d(8): Error: variable test9.sum!(const(double)[]).sum.ans cannot 
modify const
test.d(14): Error: template instance test9.sum!(const(double)[]) error 
instantiating


Of course this is fixable with an Unqual, but it requires the programmer 
to remember this every time and breaks for structs with indirection. 
Should we make `auto` also strip top-level const from primitives and 
arrays and, if const(Object)ref gets in, from objects?
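
For reference, the Unqual workaround looks like this inside sum():

import std.traits : Unqual;

Unqual!(ElementType!R) ans = range.front;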


Re: Top C++

2011-12-20 Thread dsimcha

On Tuesday, 20 December 2011 at 15:21:46 UTC, deadalnix wrote:

http://www.johndcook.com/blog/2011/06/14/why-do-c-folks-make-things-so-complicated/


Sounds a lot like SafeD vs. non-safe D.


Binary Size: function-sections, data-sections, etc.

2011-12-20 Thread dsimcha
I started poking around and examining the details of how the GNU linker works, to solve some annoying issues with LDC.  In the process I found the following things that may be useful low-hanging fruit for reducing binary size:


1.  If you have an ar library of object files, by default no dead 
code elimination is apparently done within an object file, or at 
least not nearly as much as one would expect.  Each object file 
in the ar library either gets pulled in or doesn't.


2.  When something is compiled with -lib, DMD writes libraries 
with one object file **per function**, to get around this.  GDC 
and LDC don't.  However, if you compile the object files and then 
manually make an archive with the ar command (which is common in 
a lot of build processes, such as gtkD's), this doesn't apply.


3.  The defaults can be overridden if you compile your code with 
-ffunction-sections and -fdata-sections (DMD doesn't support 
this, GDC and LDC do) and link with --gc-sections.  
-ffunction-sections and -fdata-sections cause each function or 
piece of static data to be written as its own section in the 
object file, instead of having one giant section that's either 
pulled in or not.  --gc-sections garbage collects unused 
sections, resulting in much smaller binaries especially when the 
sections are fine-grained.


On one project I'm working on, I compiled all the libs I use with 
GDC using -ffunction-sections -fdata-sections.  The stripped 
binary is 5.6 MB when I link the app without --gc-sections, or 
3.5 MB with --gc-sections.  Quite a difference.  The difference 
would be even larger if Phobos were compiled w/ 
-ffunction-sections and -fdata-sections.  (See 
https://bitbucket.org/goshawk/gdc/issue/293/ffunction-sections-fdata-sections-for 
).


DMD can't compile libraries with -ffunction-sections or 
-fdata-sections and due to other details of my build process that 
are too complicated to explain here, the results from DMD aren't 
directly comparable to those from GDC.  However, --gc-sections 
reduces the DMD binaries from 11 MB to 9 MB.


Bottom line:  If we want to reduce D's binary size there are two 
pieces of low-hanging fruit:


1.  Make -L--gc-sections the default in dmd.conf on Linux and 
probably other Posix OS's.


2.  Add -ffunction-sections and -fdata-sections or equivalents to 
DMD and compile Phobos with these enabled.  I have no idea how 
hard this would be, but I imagine it would be easy for someone 
who's already familiar with object file formats.


Re: auto + Top-level Const/Immutable

2011-12-20 Thread dsimcha
On Tuesday, 20 December 2011 at 17:46:40 UTC, Jonathan M Davis 
wrote:
Assuming that the assignment can still take place, then making 
auto infer non-
const and non-immutable would be an improvement IMHO. However, 
there _are_ cases where you'd have to retain const - a prime 
example being classes. But value types could have 
const/immutable stripped from them, as could arrays using their 
tail-constness.


- Jonathan M Davis


Right.  The objects would only be head de-constified if Michel Fortin's patch to allow such things got in.  A simple way of explaining this would be "auto removes top-level const from the type T if T implicitly converts to the type that would result."


Re: Program size, linking matter, and static this()

2011-12-20 Thread dsimcha

On Tuesday, 20 December 2011 at 20:51:38 UTC, Marco Leise wrote:

Am 19.12.2011, 20:43 Uhr, schrieb Jacob Carlborg d...@me.com:
On Windows I see few applications that install libraries separately, unless they started on Linux or the libraries are established like DirectX.  In the past, DLLs from newly installed programs used to overwrite existing DLLs.  IIRC the DLLs were then checked for their versions by installers, so they are never downgraded, but that still broke some applications with library updates that changed the API.  Starting with Vista, there is the winsxs directory that - as I understand it - keeps a copy of every version of every DLL associated to the programs that installed/use them.


Minor nitpick:  winsxs has been around since XP.


Re: Looking for SciLib

2011-12-20 Thread dsimcha

On 12/20/2011 5:58 PM, filgood wrote:

or this?...seems contain further development

https://github.com/cristicbz/scid




I mentored that GSoC project, so since people appear to be interested in 
it I'll give a status report.


The GSoC project was left in a somewhat rough/half-finished state at the 
end of GSoC because of several unanticipated problems and the ambitious 
(possibly overly so) nature of the project.  Cristi (the GSoC student I 
mentored) and I have been slowly improving things since the end of GSoC 
but both of us have limited time to work on it.


From GSoC we got a solid set of matrix/vector containers and an 
expression template system.  AFAIK there are no major issues with these. 
 The expression template system supports addition, subtraction, 
multiplication and division/inversion with matrices, vectors and scalars.


The expression template evaluator works well for general matrix storage, 
but is badly broken for packed matrices (e.g. triangular, symmetric, 
diagonal).  Fixing this is time consuming but is a simple matter of 
programming.


I'm starting to implement Lapack wrappers for common matrix 
factorizations in scid.linalg.  These are tedious to write because of 
the obtuseness of the Lapack API and the need to support both a 
high-level interface and one that allows very explicit memory 
management.  Again, though, it's a simple matter of programming.


There are a few performance problems that need to be ironed out (though 
perhaps I should do some benchmarking before I claim so boldly that 
these problems are serious).  See 
https://github.com/cristicbz/scid/issues/77 .


After the above issues are resolved, I think the next thing on the 
roadmap would be to start building on this foundation to add support for 
higher level scientific computing stuff.  For example, I have a bunch of 
statistics/machine learning code (https://github.com/dsimcha/dstats) 
that was written before SciD existed.  I'm slowly integrating with 
SciD's current foundation and will probably merge it with SciD once SciD 
is more stable and bug-free.


Re: Reducing Linker Bugs

2011-12-19 Thread dsimcha

On 12/19/2011 12:54 AM, Walter Bright wrote:

On 12/18/2011 8:38 PM, dsimcha wrote:

Two questions:

1. What's the best way to file a bug report against Optlink when I get
one of
those Optlink terminated unexpectedly windows and I'm linking in
libraries
that I don't have the source code to and thus can't reduce?


In that case, the best thing is to zip it all up and file a bugzilla
report on it.


Do you need the sources or just the object/library binaries?


Re: Reducing Linker Bugs

2011-12-19 Thread dsimcha
The OMF library that I don't have the source to is a BLAS/LAPACK 
stub library that calls into a DLL.  It was uploaded ~5 years ago 
to DSource by Bill Baxter.  I know absolutely no details about 
how he compiled it.


On Monday, 19 December 2011 at 18:04:26 UTC, Walter Bright wrote:

On 12/19/2011 5:51 AM, dsimcha wrote:

On 12/19/2011 12:54 AM, Walter Bright wrote:

On 12/18/2011 8:38 PM, dsimcha wrote:

Two questions:

1. What's the best way to file a bug report against Optlink 
when I get

one of
those Optlink terminated unexpectedly windows and I'm 
linking in

libraries
that I don't have the source code to and thus can't reduce?


In that case, the best thing is to zip it all up and file a 
bugzilla

report on it.


Do you need the sources or just the object/library binaries?


For linker problems, doan need no steenkin' sources.

BTW, optlink is known to have problems with weak extern records.

Where did your omf libraries come from?





Re: Java Scala

2011-12-19 Thread dsimcha

On Monday, 19 December 2011 at 19:52:41 UTC, ddverne wrote:
On Sunday, 18 December 2011 at 07:09:21 UTC, Walter Bright 
wrote:
A programmer who doesn't know assembler is never going to 
write better than second rate programs.


Please, I don't want to flame this thread or anything like that, but isn't this a lack of modesty, or a little odd?


The phrase "Who never wrote anything in ASM will not make a first-rate program" is a bit odd, because for me it's like saying: "A programmer who never programs on punched cards is never going to write a first-rate program."


Finally, what I mean is: will saying that bring something good for the community?  Or should a new programmer stop his D programming studies and start with Assembly?


That misses the point.  Assembly language teaches the 
fundamentals of how a computer works at a low level.  It's 
similar to learning Lisp in that it makes you better able to 
reason about programming even if you never actually program in 
it.  The only difference is that Lisp stretches your reasoning 
ability towards the highest abstraction levels, assembly language 
does it for the lowest levels.


Programming on punchcards is equivalent to typing:  It is/was 
sometimes a necessary practical skill, but there's nothing 
conceptually deep about it that makes it worth learning even if 
it's not immediately practical.


Re: Java Scala

2011-12-18 Thread dsimcha

On 12/18/2011 2:14 AM, Russel Winder wrote:


Python is also used in industry and commerce, so it is not just a
teaching language.  Almost all post-production software uses C++ and
Python.  Most HPC is now Fortran, C++ and Python.

This latter would be a great area for D to try and break into, but sadly
I don't hink it would now be possible.


Please elaborate.  I think D for HPC is a terrific idea.  It's the only 
language I know of with all of the following four attributes:


1.  Allows you to program straight down to the bare metal with zero or 
close to zero overhead, like C and C++.


2.  Interfaces with C and Fortran legacy code with minimal or no overhead.

3.  Has modern convenience/productivity features like GC, a real module 
system and structural typing via templates.


4.  Has support for parallelism in the standard library.  (I'm aware of 
OpenMP, but in my admittedly biased opinion std.parallelism is orders of 
magnitude more flexible and easier to use.)
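
To give a taste of point 4, a trivial std.parallelism example (not a benchmark, just the flavor of the API):

import std.math, std.parallelism;

void main() {
    auto results = new double[10_000_000];
    // The loop body is distributed across the default task pool's
    // worker threads, one chunk of the array per work unit.
    foreach(i, ref x; taskPool.parallel(results)) {
        x = sqrt(cast(double) i);
    }
}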


Re: Java Scala

2011-12-18 Thread dsimcha

On 12/18/2011 2:09 AM, Walter Bright wrote:

A programmer who doesn't know assembler is never going to write better
than second rate programs.


I don't even know assembler that well and I agree 100%.  I can read bits 
of assembler and recognize compiler optimizations and could probably 
mechanically translate C code to x86 assembler, but I'd be lost if asked 
to write anything more complicated than a small function from scratch or 
do anything without some reference material.


Even this basic level of knowledge has given me insights into language 
design.  For example:  I'd love to be asked in an interview whether 
default arguments to virtual functions are determined by the compile 
time or runtime type of the object.  To someone who knows nothing about 
assembler this seems like the most off-the-wall language-lawyer minutiae 
imaginable.  To someone who knows assembler, the answer is obviously the 
compile time type.  Otherwise, you'd have to store the function's 
default arguments in the virtual function table somehow, then look each 
one up and push it onto the stack at the call site.  This would get very 
hairy and inefficient very fast.
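
To make that concrete, a toy example (the comment states the expected behavior given how default arguments are bound):

import std.stdio;

class Base {
    void greet(string name = "base") { writeln("Base.greet: ", name); }
}

class Derived : Base {
    override void greet(string name = "derived") { writeln("Derived.greet: ", name); }
}

void main() {
    Base b = new Derived;
    // Dispatch is dynamic but the default argument comes from the static
    // type, so this prints "Derived.greet: base".
    b.greet();
}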


Reducing Linker Bugs

2011-12-18 Thread dsimcha

Two questions:

1.  What's the best way to file a bug report against Optlink when I get 
one of those Optlink terminated unexpectedly windows and I'm linking 
in libraries that I don't have the source code to and thus can't reduce?


2.  I'm getting on the Optlink hating bandwagon.  How hard would it be 
to use Objconv (http://www.agner.org/optimize/#objconv) to convert the 
OMF object files DMD outputs to COFF and then use the MinGW linker to 
link the COFF files, and automate this process in DMD?


[Issue 7130] New: NRVO Bug: Wrong Code With D'tor + Conditional Return

2011-12-18 Thread dsimcha
The NG server was down when I submitted this to Bugzilla and it's a 
pretty important issue, so I'm posting it to the NG manually now:


http://d.puremagic.com/issues/show_bug.cgi?id=7130

import core.stdc.stdio;

struct S {
    this(this) {
        printf("Postblit\n");
    }

    ~this() {
        printf("D'tor\n");
    }
}

S doIt(int i) {
    S s1;
    S s2;
    printf("s1 lives at %p.\n", &s1);
    printf("s2 lives at %p.\n", &s2);
    return (i == 42) ? s1 : s2;
}

void main() {
    auto s = doIt(3);
    printf("s lives at %p.\n", &s);
}

Output:

s1 lives at 0xffc54368.
s2 lives at 0xffc54369.
D'tor
D'tor
s lives at 0xffc5437c.
D'tor

Both d'tors are called and the returned result lives at a different address after being returned than before, as expected if not using NRVO.  On the other hand, no postblit is called for whichever struct is returned, as expected if using NRVO.


CURL Wrapper: Vote Thread

2011-12-17 Thread dsimcha
The time has come to vote on the inclusion of Jonas Drewsen's CURL 
wrapper in Phobos.



Code: https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d
Docs: http://freeze.steamwinter.com/D/web/phobos/etc_curl.html


For those of you on Windows, a libcurl binary built by DMC is available 
at http://gool.googlecode.com/files/libcurl_7.21.7.zip.



Voting lasts one week and ends on 12/24.


Re: Second Round CURL Wrapper Review

2011-12-12 Thread dsimcha
On Tuesday, 13 December 2011 at 00:47:26 UTC, David Nadlinger 
wrote:
I don't know if you already have a solution in the works, but 
maybe the future interface I did for Thrift is similar to what 
you are looking for: 
http://klickverbot.at/code/gsoc/thrift/docs/thrift.util.future.html


David


Doesn't std.parallelism's task parallelism API work for this?  
(Roughly speaking a task in std.parallelism == a future in your 
Thrift API.)  If not, what can I do to fix it so that it can?
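
For reference, using a std.parallelism task as a future looks roughly like this:

import std.parallelism;

int answer() { return 42; }

void main() {
    auto fut = task!answer();     // create the task (the "future")
    taskPool.put(fut);            // start it executing asynchronously
    // ... do other work ...
    int result = fut.yieldForce;  // block until the result is available
}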


Looking briefly at your API, one thing I notice is the ability to 
cancel a future.  This would be trivial to implement in 
std.parallelism for tasks that haven't yet started executing, but 
difficult if not impossible for tasks that are already executing. 
 Does your Thrift API allow cancelling futures that are already 
executing?  If so, how is that accomplished?


The TFutureAggregatorRange could be handled by a parallel foreach 
loop if I understand correctly, though it would look a little 
different.


Re: Fixing const arrays

2011-12-11 Thread dsimcha

On 12/10/2011 4:47 PM, Andrei Alexandrescu wrote:

We decided to fix this issue by automatically shedding the top-level
const when passing an array or a pointer by value into a function.


Really silly question:  Why not do the same for primitives (int, float, 
char, etc.) or even structs without indirection?  I've seen plenty of 
code that blows up when passed an immutable double because it tries to 
mutate its arguments.  About 1.5 years ago I fixed a bug like this in 
std.math.pow().


Re: A benchmark, mostly GC

2011-12-11 Thread dsimcha

On 12/11/2011 4:26 PM, Timon Gehr wrote:

We are talking about supporting precise GC, not about custom runtime
reflection. There is no way to get precise GC right without compiler
support.


FWIW my original precise heap scanning patch generated pointer offset 
information using CTFE and templates.  The code to do this is still in 
Bugzilla and only took a couple hours to write.


Re: Second Round CURL Wrapper Review

2011-12-11 Thread dsimcha

Here's my review.  Remember, review ends on December 16.

Overall, this library has massively improved due to the rounds of 
review it's been put through.  I only found a few minor nitpicks. 
 However, a recurring pattern is minor grammar mistakes in the 
documentation.  Please proofread all documentation again.


Docs:

"The high level API is build" -> "The high level API is built"

"LibCurl is licensed under a MIT/X derivate license" -> "LibCurl is licensed under an MIT/X derivative license"

AutoConnect:  "Connection type used when the url should be used to auto detect protocol." -> "auto detect THE protocol"

Why is there a link to curl_easy_set_opt in the byLineAsync and byChunkAsync docs?

In onSend:  "The length of the void[] specifies the maximum number of bytes that can be send." -> "can be SENT"


What is the use case for exposing struct Curl?  I'd prefer it if this were unexposed because we'll obviously be unable to provide a replacement if/when the backend to this library is rewritten in pure D.


Actually, that leads to another question:  Should this module 
really be named etc.curl/std.curl/std.net.curl, or should the 
fact that it currently uses Curl as a backend be relegated to an 
implementation detail?


Code:

pragma(lib) basically doesn't work on Linux because the object 
format doesn't support it.  Don't rely on it.


Should the protocol detection be case-insensitive, i.e. "ftp://" == "FTP://"?


Re: Second Round CURL Wrapper Review

2011-12-11 Thread dsimcha

On 12/11/2011 7:53 PM, dsimcha wrote:

Should the protocol detection be case-insensitive, i.e. "ftp://" == "FTP://"?


Oh, one more thing:  Factor the protocol detection out into a function. 
 You have the same expression cut and pasted everywhere:


if(url.startsWith("ftp://") || url.startsWith("ftps://") ...


Re: A benchmark, mostly GC

2011-12-11 Thread dsimcha

On 12/11/2011 9:41 PM, Timon Gehr wrote:

On 12/11/2011 11:37 PM, dsimcha wrote:

On 12/11/2011 4:26 PM, Timon Gehr wrote:

We are talking about supporting precise GC, not about custom runtime
reflection. There is no way to get precise GC right without compiler
support.


FWIW my original precise heap scanning patch generated pointer offset
information using CTFE and templates. The code to do this is still in
Bugzilla and only took a couple hours to write.


But it is not precise for the stack, right? How much work is left to the
programmer to generate the information?


It wasn't precise on the stack, but for unrelated reasons.  As far as work left to the programmer, I had created templates for new (which I thought at the time might get integrated into the compiler).  To use the precise heap scanning, all you had to do was:


class C {
    void* ptr;
    size_t integer;
}

void main() {
    auto instance = newTemplate!C();
}


DustMite: Unwrap? Imports?

2011-12-08 Thread dsimcha
I've recently started using DustMite to reduce compiler errors in 
SciD, which instantiates an insane number of templates and is 
nightmarish to reduce by hand.


Two questions:

1.  What exactly does unwrap (as opposed to remove) do?

2.  When there are multiple imports in a single statement, i.e. 
import foo, bar;, does DustMite try to get rid of individual 
ones without deleting the whole statement?  Is this what unwrap 
does?


Re: rt_finalize WTFs?

2011-12-05 Thread dsimcha
== Quote from Martin Nowak (d...@dawgfoto.de)'s article
 I appreciate the recursion during mark, wanted to do this myself
 sometime ago but expected a little more gain.

The reason the gain wasn't huge is because on the benchmark I have that 
involves a
deep heap graph, sweeping time dominates marking time.  The performance gain for
the mark phase only (which is important b/c this is when the world needs to be
stopped) is ~20-30%.

 Some more ideas:
   - Do a major refactoring of the GC code, making it less reluctant
 to changes. Adding sanity checks or unit tests would be great.
 This probably reveals some obfuscated performance issues.

Not just obfuscated ones.  I've wanted to fix an obvious perf bug for two years
and haven't done it because the necessary refactoring would be unbelievably 
messy
and I'm too afraid I'll break something.  Basically, malloc() sets the bytes
between the size you requested and the size of the block actually allocated to
zero to prevent false pointers.  This is reasonable.  The problem is that it 
does
so **while holding the GC's lock**.  Fixing it for just the case when malloc() 
is
called by the user is also easy.  The problem is fixing it when malloc() gets
called from realloc(), calloc(), etc.

   - Add more realistic GC benchmarks, just requires adding to
 druntime/test/gcbench using the new runbench. The tree1 mainly
 uses homogeneous classes, so this is very synthesized.

I'll crowdsource this.  I can't think of any good benchmarks that are < a few
hundred lines w/ no dependencies but aren't pretty synthetic.

   - There is one binary search pool lookup for every scanned address in
 range.
 Should be a lot to gain here, but it's difficult. It needs a multilevel
 mixture of bitset/hashtab.

I understand the problem, but please elaborate on the proposed solution.  You've
basically got a bunch of pools, each of which represents a range of memory
addresses, not a single address (so a basic hashtable is out).  You need to know
which range some pointer fits in.  How would you beat binary search/O(log N) 
for this?
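
For context, the lookup under discussion is essentially this - an illustrative sketch, not the actual druntime code:

struct Pool { void* base; size_t size; }

// pools is sorted by base address; one binary search per scanned address.
Pool* findPool(Pool[] pools, void* addr) {
    size_t lo = 0, hi = pools.length;
    auto a = cast(ubyte*) addr;
    while (lo < hi) {
        immutable mid = lo + (hi - lo) / 2;
        auto p = &pools[mid];
        auto base = cast(ubyte*) p.base;
        if (a < base) hi = mid;
        else if (a >= base + p.size) lo = mid + 1;
        else return p;
    }
    return null;
}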

   - Reduce the GC roots range. I will have to work on this for
 shared library support anyhow.

Please clarify what you mean by reduce the roots range.

Thanks for the feedback/suggestions.


Re: rt_finalize WTFs?

2011-12-05 Thread dsimcha
== Quote from Martin Nowak (d...@dawgfoto.de)'s article
  More promising is to put pool addresses ranges in a trie.
 
  addr[7]  [...  . ...]
/ |\
  addr[6] [...   .   ...][...   .   ...]
  /   |\ /   |   \
  addr[5] pool:8 [...   .  ...]
 /   |   \
  addr[4]  pool:8 [] pool:5
 
 Actually 64-bit should use a hashtable for the upper 32-bit and then
 the the 32-bit trie for lower.

Why do you expect this to be faster than a binary search?  I'm not saying it 
won't
be, just that it's not a home run that deserves a high priority as an
optimization.  You still have a whole bunch of indirections, probably more than
you would ever have for binary search.


Re: rt_finalize WTFs?

2011-12-05 Thread dsimcha

On 12/5/2011 6:39 PM, Trass3r wrote:

On 05/12/2011 01:46, dsimcha wrote:

I'm at my traditional passtime of trying to speed up D's garbage
collector again


Have you thought about pushing for the inclusion of CDGC at
all/working on the tweaks needed to make it the main GC?


So true, it's been rotting in that branch.


IIRC CDGC includes two major enhancements:

1.  The snapshot GC for Linux.  (Does this work on OSX/FreeBSD/anything 
Posix, or just Linux?  I'm a bit skeptical about whether a snapshot GC 
is really that great an idea given its propensity to waste memory on 
long collect cycles with a lot of mutation.)


2.  I think there was some precise heap scanning-related stuff in it.  I 
originally tried to implement precise heap scanning a couple years ago, 
but it went nowhere for reasons too complicated to explain here.  Given 
this experience, I'm not inclined to try again until the compiler has 
extensions for generating pointer offset information.


rt_finalize WTFs?

2011-12-04 Thread dsimcha
I'm at my traditional pastime of trying to speed up D's garbage
collector again, and I've stumbled on the fact that rt_finalize is 
taking up a ridiculous share of the time (~30% of total runtime) on a 
benchmark where huge numbers of classes **that don't have destructors** 
are being created and collected.  Here's the code to this function, from 
lifetime.d:


extern (C) void rt_finalize(void* p, bool det = true)
{
    debug(PRINTF) printf("rt_finalize(p = %p)\n", p);

    if (p) // not necessary if called from gc
    {
        ClassInfo** pc = cast(ClassInfo**)p;

        if (*pc)
        {
            ClassInfo c = **pc;
            byte[] w = c.init;

            try
            {
                if (det || collectHandler is null || collectHandler(cast(Object)p))
                {
                    do
                    {
                        if (c.destructor)
                        {
                            fp_t fp = cast(fp_t)c.destructor;
                            (*fp)(cast(Object)p); // call destructor
                        }
                        c = c.base;
                    } while (c);
                }
                if ((cast(void**)p)[1]) // if monitor is not null
                    _d_monitordelete(cast(Object)p, det);
                (cast(byte*) p)[0 .. w.length] = w[];  // WTF?
            }
            catch (Throwable e)
            {
                onFinalizeError(**pc, e);
            }
            finally  // WTF?
            {
                *pc = null; // zero vptr
            }
        }
    }
}

Getting rid of the stuff I've marked with //WTF? comments (namely the 
finally block and the re-initializing of the memory occupied by the 
finalized object) speeds things up by ~15% on the benchmark in question. 
 Why do we care what state the blob of memory is left in after we 
finalize it?  I can kind of see that we want to clear things if 
delete/clear was called manually and we want to leave the object in a 
state that doesn't look valid.  However, this has significant 
performance costs and IIRC is already done in clear() and delete is 
supposed to be deprecated.  Furthermore, I'd like to get rid of the 
finally block entirely, since I assume its presence and the effect on 
the generated code is causing the slowdown, not the body, which just 
assigns a pointer.


Is there any good reason to keep this code around?


Re: rt_finalize WTFs?

2011-12-04 Thread dsimcha
Thanks for the benchmark.  I ended up deciding to just create a second 
function, rt_finalize_gc, that gets rid of a whole bunch of cruft that 
isn't necessary in the GC case.  I think it's worth the small amount of 
code duplication it creates.  Here are the results of my efforts so far: 
 https://github.com/dsimcha/druntime/wiki/GC-Optimizations-Round-2 . 
I've got one other good idea that I think will shave a few seconds off 
the Tree1 benchmark if I don't run into any unforeseen obstacles in 
implementing it.


On 12/4/2011 10:07 PM, Martin Nowak wrote:

On Mon, 05 Dec 2011 02:46:27 +0100, dsimcha dsim...@yahoo.com wrote:


I'm at my traditional pastime of trying to speed up D's garbage
collector again, and I've stumbled on the fact that rt_finalize is
taking up a ridiculous share of the time (~30% of total runtime) on a
benchmark where huge numbers of classes **that don't have
destructors** are being created and collected. Here's the code to this
function, from lifetime.d:

extern (C) void rt_finalize(void* p, bool det = true)
{
debug(PRINTF) printf("rt_finalize(p = %p)\n", p);

if (p) // not necessary if called from gc
{
ClassInfo** pc = cast(ClassInfo**)p;

if (*pc)
{
ClassInfo c = **pc;
byte[] w = c.init;

try
{
if (det || collectHandler is null || collectHandler(cast(Object)p))
{
do
{
if (c.destructor)
{
fp_t fp = cast(fp_t)c.destructor;
(*fp)(cast(Object)p); // call destructor
}
c = c.base;
} while (c);
}
if ((cast(void**)p)[1]) // if monitor is not null
_d_monitordelete(cast(Object)p, det);
(cast(byte*) p)[0 .. w.length] = w[]; // WTF?
}
catch (Throwable e)
{
onFinalizeError(**pc, e);
}
finally // WTF?
{
*pc = null; // zero vptr
}
}
}
}

Getting rid of the stuff I've marked with //WTF? comments (namely the
finally block and the re-initializing of the memory occupied by the
finalized object) speeds things up by ~15% on the benchmark in
question. Why do we care what state the blob of memory is left in
after we finalize it? I can kind of see that we want to clear things
if delete/clear was called manually and we want to leave the object in
a state that doesn't look valid. However, this has significant
performance costs and IIRC is already done in clear() and delete is
supposed to be deprecated. Furthermore, I'd like to get rid of the
finally block entirely, since I assume its presence and the effect on
the generated code is causing the slowdown, not the body, which just
assigns a pointer.

Is there any good reason to keep this code around?


Not for the try block. With errors being not recoverable you don't need
to care
about zeroing the vtbl or you could just copy the code into the catch
handler.
This seems to cause less spilled variables.

Most expensive is the call to a memcpy@PLT, replace it with something
inlineable.
Zeroing is not much faster than copying init[] for small classes.

At least zeroing should be worth it, unless the GC would not scan the
memory otherwise.

gcbench/tree1 = 41.8s -> https://gist.github.com/1432117 -> gcbench/tree1 = 33.4s

Please add useful benchmarks to druntime.

martin




Re: gl3n - linear algebra and more for D

2011-12-03 Thread dsimcha
I don't know much about computer graphics but I take it that a sane 
design for a matrix/vector library geared towards graphics is completely 
different from one geared towards general numerics/scientific computing? 
 I'm trying to understand whether SciD (which uses BLAS/LAPACK and 
expression templates) overlaps with this at all.


On 12/2/2011 5:36 PM, David wrote:

Hello,

I am currently working on gl3n - https://bitbucket.org/dav1d/gl3n - gl3n
provides all the math you need to work with OpenGL, DirectX or just
vectors and matrices (it's mainly targeted at graphics - gl3n will never
be more then a pure math library). What it supports:

  * vectors
  * matrices
  * quaternions
  * interpolation (lerp, slerp, hermite, catmull rom, nearest)
  * nearly all glsl functions (according to spec 4.1)
  * some more cool features, like templated types (vectors, matrices,
quats), cool ctors, dynamic swizzling

And the best is, it's MIT licensed ;). Unfortunatly there's no
documentation yet, but it shouldn't be hard to understand how to use it,
if you run anytime into troubles just take a look into the source, I did
add to every part of the lib unittests, so you can see how it works when
looking at the unittests, furthermore I am very often at #D on freenode.
But gl3n isn't finished! My current plans are to add more interpolation
functions and the rest of the glsl defined functions, but I am new to
graphics programming (about 4 months I am now into OpenGL), so tell me
what you're missing, the chances are good that I'll implement and add
it. So let me know what you think about it.

Before I forget it, a bit of code to show you how to use gl3n:


vec4 v4 = vec4(1.0f, vec3(2.0f, 3.0f, 4.0f));
vec4 v4 = vec4(1.0f, vec4(1.0f, 2.0f, 3.0f, 4.0f).xyz)); // dynamic
swizzling with opDispatch
vec3 v3 = my_3dvec.rgb;
float[] foo = v4.xyzzzwzyyxw; // not useful but possible!
glUniformMatrix4fv(location, 1, GL_TRUE, mat4.translation(-0.5f, -0.54f,
0.42f).rotatex(PI).rotatez(PI/2).value_ptr); // yes they are row major!
mat3 inv_view = view.rotation;
mat3 inv_view = mat3(view);
mat4 m4 = mat4(vec4(1.0f, 2.0f, 3.0f, 4.0f), 5.0f, 6.0f, 7.0f, 8.0f,
vec4(…) …);

struct Camera {
 vec3 position = vec3(0.0f, 0.0f, 0.0f);
 quat orientation = quat.identity;

 Camera rotatex(real alpha) { orientation.rotatex(alpha); return this; }
 Camera rotatey(real alpha) { orientation.rotatey(alpha); return this; }
 Camera rotatez(real alpha) { orientation.rotatez(alpha); return this; }

 Camera move(float x, float y, float z) {
 position += vec3(x, y, z);
 return this;
 }
 Camera move(vec3 s) {
 position += s;
 return this;
 }

 @property camera() {
 //writefln("yaw: %s, pitch: %s, roll: %s",
 //    degrees(orientation.yaw), degrees(orientation.pitch),
 //    degrees(orientation.roll));
 return mat4.translation(position.x, position.y, position.z) *
orientation.to_matrix!(4,4);
 }
}

 glUniformMatrix4fv(programs.main.view, 1, GL_TRUE,
cam.camera.value_ptr);
 glUniformMatrix3fv(programs.main.inv_rot, 1, GL_TRUE,
cam.orientation.to_matrix!(3,3).inverse.value_ptr);


I hope this gave you a little introduction to gl3n.

- dav1d




Re: Java Scala

2011-12-03 Thread dsimcha

On 12/3/2011 10:39 AM, Andrei Alexandrescu wrote:

On 12/3/11 3:02 AM, Russel Winder wrote:

The PyPy JIT is clearly a big win. I am sure Armin will come up with
more stuff :-)


Do they do anything about the GIL?

Andrei



Unfortunately, no.  I checked into this at one point because I basically
use parallelism for everything in D and have an 8-core computer at work.
Therefore, if PyPy is a factor of 5 (just making up numbers) slower than
D for equivalently written code, it's effectively 40x slower (the 5x
serial gap times 8 cores) once you consider that parallelism is easy in
D and, except at the coarsest-grained levels, really hard in PyPy.
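
A minimal illustration (made up for this post, not a benchmark) of what
easy parallelism in D looks like with std.parallelism:

import std.math : sqrt;
import std.parallelism : parallel;

void main()
{
    auto nums = new double[1_000_000];

    // The loop body is distributed across all cores by the default
    // task pool; the syntax stays a plain foreach.
    foreach (i, ref x; parallel(nums))
        x = sqrt(cast(double) i);
}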


Second Round Review of CURL Wrapper

2011-12-02 Thread dsimcha
I volunteered ages ago to manage the review for the second round of 
Jonas Drewsen's CURL wrapper.  After the first round it was decided 
that, after a large number of minor issues were fixed, a second round 
would be necessary.


Significant open issues:

1.  Should libcurl be bundled with DMD on Windows?

2.  etc.curl, std.curl, or std.net.curl?  (We had a vote a while back 
but it was buried deep in a thread and a lot of people may have missed 
it:  http://www.easypolls.net/poll.html?p=4ebd3219011eb0e4518d35ab )


Code: https://github.com/jcd/phobos/blob/curl-wrapper/etc/curl.d
Docs: http://freeze.steamwinter.com/D/web/phobos/etc_curl.html

For those of you on Windows, a libcurl binary built by DMC is available 
at http://gool.googlecode.com/files/libcurl_7.21.7.zip.


Review starts now and ends on December 16, followed by one week of 
voting.  __Please post all reviews to digitalmars.D, not to the 
announcement forum.__


Re: Java Scala

2011-12-02 Thread dsimcha

On 12/2/2011 3:08 AM, Walter Bright wrote:

On 12/1/2011 11:59 PM, Russel Winder wrote:

(*) RPython is a subset of Python which allows for the creation of
native code executables of interpreters, compilers, etc. that are
provably faster than hand written C. http://pypy.org/


Provably faster?

I can't find support for that on http://pypy.org


http://speed.pypy.org/

Not exactly rigorous mathematical proof, but pretty strong evidence. 
Also, I use PyPy once in a while for projects where speed matters a 
little but I want to share my code with Python people or want to use 
Python's huge standard library.  Anecdotally, it's definitely faster. 
The reason has nothing to do with the language it's written in.  It's 
because PyPy JIT compiles a lot of the Python code instead of 
interpreting it.




Re: Is D more cryptic than C++?

2011-12-01 Thread dsimcha

On 11/30/2011 11:32 PM, Abrahm wrote:

Jesse Phillips <jessekphillip...@gmail.com> wrote in message
news:jb6qfv$1kut$1...@digitalmars.com...

What bearophile was referring to was the use of templates is common.


Are you sure about that? What say you Bear?


D's
templates have the advantage of being easier on the eyes and more
powerful (with the inclusion of 'static if' in the language).


Having come from C++ land, and knowing what some people do with it,
making it EASIER to apply templates does not necessarily seem like a
good thing to me (ref: template metaprogramming). That said, does your
statement above about D's template machinery being powerful etc. mean
it's easier to do template metaprogramming in D? If so, I, personally,
do not find that any asset at all (though I know some surely will, for
there have been books written on that abhorrence).



A lot of people from C++ backgrounds say this.  What they miss is that 
template metaprogramming in C++ is so ugly because the language wasn't 
designed for it.  In D you can do readable template metaprogramming.
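
A small stock example (not from this thread) of what that looks like:
with static if, compile-time recursion reads like a plain function
rather than like C++'s pattern-matched specializations:

// Computes n! at compile time; the branch is resolved during
// compilation, so no runtime code is generated for it.
template Factorial(ulong n)
{
    static if (n <= 1)
        enum Factorial = 1UL;
    else
        enum Factorial = n * Factorial!(n - 1);
}

static assert(Factorial!5 == 120);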


Re: Phobos Wish List/Next in Review Queue?

2011-11-23 Thread dsimcha

On 11/23/2011 9:26 PM, Walter Bright wrote:

On 11/19/2011 7:02 PM, dsimcha wrote:

* Streams. (Another item where the bottleneck is mostly at the design
level and
people not really knowing what they want.)


I'm not sure what the purpose of streams would be, now that we have ranges.


Right.  As I mentioned in a previous post buried deep in this thread, I 
think streams should just be a flavor of ranges that have most or all of 
the following characteristics:


1.  Live in std.stream.

2.  Oriented toward I/O.

3.  Heavy use of higher order ranges/stacking for things like 
compression/decompression and encryption/decryption.


4.  Mostly focused on input ranges as opposed to random 
access/forward/bidirectional, since this is the best model for data from 
a network or stdin.
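
Concretely, the vanilla input range mentioned in point 4 is just this
protocol (the ubyte element type and in-memory source are assumptions
for illustration):

// The entire interface a stream-as-input-range needs to expose.
struct ByteStream
{
    private const(ubyte)[] data;

    @property bool empty() const { return data.length == 0; }
    @property ubyte front() const { return data[0]; }
    void popFront() { data = data[1 .. $]; }
}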


std.csv accepted into Phobos

2011-11-20 Thread dsimcha
I'm pleased to announce that, by a vote of 5-1, std.csv has been 
accepted into Phobos.  Also, by a vote of 3-2 with one abstention, the 
community has decided on Version 2 of the library (the one where the 
Record struct, etc. is hidden in a style similar to std.algorithm rather 
than explicitly documented).  Congratulations, Jesse.


Re: Phobos Wish List/Next in Review Queue?

2011-11-20 Thread dsimcha

On 11/20/2011 12:30 PM, Jonas Drewsen wrote:

* Containers. (AFAIK no one is working on this. It's tough to get started
because, despite lots of discussion at various times on this forum,
no one seems to really know what they want. Since the containers in
question are well-known, it's much more a design problem than an
implementation problem.)

 

* Allocators. (I think Phobos desperately needs a segmented stack/region
based allocator and I've written one. I've also tried to define a
generic allocator API, mostly following Andrei's suggestions, but I'll
admit that I didn't really know what I was doing for the general API.
Andrei has suggested that allocators should have real-world testing on
containers before being included in Phobos. Therefore, containers block
allocators and if the same person doesn't write both, there will be a
lot of communication overhead to make sure the designs are in sync.)


I've thought about doing some containers myself but have hesitated, since
the general opinion seems to be that allocators need to be in place first.


Yeah, this is problematic.  In voting against my allocator proposal, 
Andrei mentioned that he wanted the allocators to be well-tested in the 
container API first.  This means either we have a circular dependency or 
allocators and containers need to be co-developed.  Co-developing them 
is problematic.  If one person does containers and another allocators, 
the project might be overwhelmed by communication overhead.  If the same 
person does both, then this is asking rather a lot for a hobby project.


Of course, I hope to graduate in 1 year and will be looking for a job 
when I do.  Any company out there have a strategic interest in D and 
want to hire me to work full-time on allocators and containers?



* Streams. (Another item where the bottleneck is mostly at the design
level and people not really knowing what they want.)


What do streams provide that could not be provided by ranges?


If I understand correctly, streams _would_ be a flavor of ranges.  They 
would just be ranges that are geared towards being stacked on top of one 
another specifically for the purpose of I/O.  They would typically be 
designed around the vanilla input range (not forward, random access, etc.)
or output ranges.


Traditionally, streams would also be class-based instead of
template-based.  However, IMHO a good case can be made for template-based
stream ranges in D because we have std.range.inputRangeObject and
std.range.outputRangeObject.  This means that you can stack streams 
using templates, with no virtual function call overhead, and then if you 
need to respect some binary interface you could just stack an 
inputRangeObject() or outputRangeObject() on top of all your other crap 
and only have one virtual call.  Example:


auto lines = lineReader(
    gzipUncompresser(
        rawFile("foo.gz")
    )
);

// LineReader!(GzipUncompresser!RawFile)
pragma(msg, typeof(lines));

auto objectOriented = inputRangeObject(lines);

// InputRangeObject!(char[])
pragma(msg, typeof(objectOriented));
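
The names above (lineReader, gzipUncompresser, rawFile) are hypothetical,
but the inputRangeObject part exists today.  A minimal runnable sketch of
the same stacking idea using current Phobos pieces:

import std.algorithm : map;
import std.range : InputRange, inputRangeObject;

void main()
{
    // Templated stacking: no virtual calls inside the pipeline.
    auto pipeline = [1, 2, 3].map!(a => a * 2);

    // One class wrapper on top when a binary interface is needed.
    InputRange!int obj = inputRangeObject(pipeline);
}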


Re: Phobos Wish List/Next in Review Queue?

2011-11-20 Thread dsimcha

On 11/20/2011 12:30 PM, Jonas Drewsen wrote:

* Some higher level networking support, such as HTTP, FTP, etc. (Jonas
Drewsen's CURL wrapper handles a lot of this and may be ready for a
second round of review.)


As I've mentioned in another thread it is ready for a second round of
review. We just need someone to step up and run the review since it
wouldn't be appropriate for me to do it myself. Anyone?


If no one else wants to volunteer, I will again.  Is there something I'm 
missing?  I find that being review manager takes very little effort: 
post an initial review announcement, post a reminder, maybe post a 
summary/moderation message here and there, post a vote message, post a 
vote reminder, tally votes.  It really doesn't take much time.


Phobos Wish List/Next in Review Queue?

2011-11-19 Thread dsimcha
Now that we've got a lot of contributors to Phobos and many projects in 
the works, I decided to start a thread to help us make a rough plan for 
Phobos's short-to-medium term development.  There are three goals here:


1.  Determine what's next in the review queue after std.csv (voting on 
std.csv ends tonight, so **please vote**).


2.  Come up with a wish list of high-priority modules that Phobos is 
missing that would make D a substantially more attractive language than 
it is now.


3.  Figure out who's already working on what from the wish list and what 
bottlenecks, if any, are getting in the way and what can be done about them.


The following is the wish list as I see it.  Please suggest additions 
and correct any errors, as this is mostly off the top of my head.  Also, 
status updates if you're working on any of these and anything 
substantial has changed would be appreciated.


*  Some higher level networking support, such as HTTP, FTP, etc.  (Jonas 
Drewsen's CURL wrapper handles a lot of this and may be ready for a 
second round of review.)


*  Serialization.  (Jacob Carlberg's Orange library might be a good 
candidate.  IIRC he said it's close to ready for review.)


*  Encryption and hashing.  (This is more an implementation problem than 
a design problem and AFAIK no one is working on it.)


*  Containers.  (AFAIK no one is working on this.  It's tough to get 
started because, despite lots of discussion at various times on this 
forum, no one seems to really know what they want.  Since the containers 
in question are well-known, it's much more a design problem than an 
implementation problem.)


*  Allocators.  (I think Phobos desperately needs a segmented 
stack/region based allocator and I've written one.  I've also tried to 
define a generic allocator API, mostly following Andrei's suggestions, 
but I'll admit that I didn't really know what I was doing for the 
general API.  Andrei has suggested that allocators should have 
real-world testing on containers before being included in Phobos. 
Therefore, containers block allocators and if the same person doesn't 
write both, there will be a lot of communication overhead to make sure 
the designs are in sync.)


*  Streams.  (Another item where the bottleneck is mostly at the design 
level and people not really knowing what they want.)


*  Compression/archiving.  (Opening standard compressed/archived file 
formats needs to just work.  This includes at least zip, gzip, tar and 
bzip2.  Of course, zip is already available and gzip is supported by the 
zlib module, but with a crufty C API.  At least gzip and bzip2, which 
are stream-based as opposed to file-based, should be handled via 
streams, which means that streams block compression/archiving; a sketch 
of this idea appears after this list.  Also, since tar and zip are both 
file-based, they should probably be handled by the same API, which might 
mean deprecating std.zip and rewriting it.)


*  An improved std.xml.  (I think Tomek Sowiński is working on a 
replacement, but I haven't seen any updates in a long time.)


*  Matrices and linear algebra.  (Cristi Cobzarenco's GSoC project is a 
good starting point but it needs polish.  I've been in contact with him 
occasionally since GSoC ended and he indicated that he wants to get back 
to working on it but doesn't have time.  I've contributed to it 
sparingly, but find it difficult because I haven't gotten around to 
familiarizing myself with the implementation details yet, and it's hard 
to get into a project that complex with a few hours a week as opposed to 
focusing full time on it.)


*  std.database.  (Apparently Steve Teale is working on this.  This is a 
large, complicated project because we're trying to define a common API 
for a variety of RDBMSs.  Again, it's more a design problem than an 
implementation problem.)


*  Better support for creating processes/new std.process.  (Lars 
Kyllingstad wrote a replacement candidate for Posix and Steve 
Schveighoffer ported it to Windows, but issues with the DMC runtime 
prevent it from working on Windows.)


*  Parallel algorithms.  (I've implemented a decent number of these in 
my std.parallel_algorithm GitHub project, but I've become somewhat 
frustrated and unmotivated to finish it because so many of the relevant 
algorithms seem memory-bandwidth bound and aren't substantially faster 
when parallelized than when run serially.)
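
Returning to the compression/archiving item above, here is a hedged
sketch (a hypothetical adapter, purely illustrative, not a proposed
Phobos API) of gzip decompression exposed as an input range over the
existing std.zlib wrapper:

import std.zlib : HeaderFormat, UnCompress;

// Wraps std.zlib.UnCompress as an input range of decompressed chunks.
// Source is any input range of ubyte[] chunks.  flush() handling is
// elided for brevity; a real version must call it at end of input to
// avoid dropping buffered trailing data.
struct GzipByChunk(Source)
{
    private Source src;
    private UnCompress un;
    private const(void)[] buf;

    this(Source src)
    {
        this.src = src;
        un = new UnCompress(HeaderFormat.gzip);
        advance();
    }

    private void advance()
    {
        buf = null;
        while (buf.length == 0 && !src.empty)
        {
            buf = un.uncompress(src.front);
            src.popFront();
        }
    }

    @property bool empty() const { return buf.length == 0; }
    @property const(ubyte)[] front() const { return cast(const(ubyte)[]) buf; }
    void popFront() { advance(); }
}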


After writing this, the general pattern I notice is that lots of stuff 
is blocked by design, not implementation.  In a lot of cases people 
don't really know what they want and analysis paralysis results.

