Re: AA rehash threshold

2014-11-20 Thread Jerry Quinn via Digitalmars-d
Steven Schveighoffer schvei...@yahoo.com writes:

 On 11/18/14 9:46 PM, deadalnix wrote:

 After all, unless your hash table is very small, the first hit is
 most likely a cache miss, meaning ~300 cycles. at this point the
 cache line is in L1, with a 3 cycle access time. So accessing
 several element in the same cache line is often preferable.

 In the case of AAs, all bucket elements are pointers, so this point is moot
 for our purposes -- one may have to jump outside the cache to find the first
 element. A good improvement (this is likely to come with a full library type)
 is to instead store an inline element for each bucket entry instead of just a
 pointer to an element. I recall when writing dcollections, this added a
 significant speedup.

This works nicely for small types, but has gotchas.  For example, if
you've got an AA of ints, what value indicates that this is a value
folded into the bucket entry vs actually being a pointer?  You'll need
extra space to make sure it's safe.  Alignment is another concern.
Let's say you have bool for the entry/element flag.  With ints you've
doubled the size of the bucket array.

If you have large objects as the elements, you'll waste space.  Probably
the implementation needs to be switchable depending on the size of the object.
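
For illustration, here is a minimal sketch of the two bucket layouts under discussion; Key, Value, and the field names are made up:

alias Key   = int;
alias Value = int;

struct Node { Key key; Value value; Node* next; }

// Current built-in AA style: a bucket is just a pointer, so even the first
// lookup in a bucket chases a pointer out of the bucket array.
struct PtrBucket { Node* head; }

// Inline-first-element style: the first entry lives in the bucket itself.
// The occupancy flag (plus alignment padding) is the overhead discussed
// above; for int keys and values it roughly doubles the bucket size.
struct InlineBucket {
    bool  occupied;
    Key   key;
    Value value;
    Node* next;   // overflow chain for collisions
}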

As an aside, I find it invaluable to have statistics about hash table
buckets, ala C++11.  When building custom hash functions and hash table
implementations, it's almost necessary.

I'd like to see bucket stats available in some manner for built-in AAs.
If they're going away for a library type, we should have such stat info
as part of the API.
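
Purely as an illustration of the kind of information meant here, a hypothetical shape for such stats (not an existing API):

// Hypothetical bucket-statistics record, in the spirit of C++11's
// bucket_count()/bucket_size() interface.
struct AAStats {
    size_t bucketCount;        // total number of buckets
    size_t usedBuckets;        // buckets with at least one entry
    size_t longestChain;       // worst-case chain length
    double averageChainLength; // average over non-empty buckets
}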

Jerry


Re: AA rehash threshold

2014-11-20 Thread Jerry Quinn via Digitalmars-d
Steven Schveighoffer schvei...@yahoo.com writes:

 On 11/20/14 5:30 PM, Jerry Quinn wrote:
 This works nicely for small types, but has gotchas.  For example, if
 you've got an AA of ints, what value indicates that this is a value
 folded into the bucket entry vs actually being a pointer?  You'll need
 extra space to make sure it's safe.  Alignment is another concern.
 Let's say you have bool for the entry/element flag.  With ints you've
 doubled the size of the bucket array.

 Hm.. the bucket entry will simply be what a node is now. You are actually
 saving space, because you don't need to store that extra pointer to the first
 node.

Almost. You do need a way to indicate that the bucket is empty, so you need 
another flag, though it can be smaller than a pointer (on 64-bit). But the 
overhead is low except for things like ints.

 Where it gets dicey is when you rehash. Some of the nodes will be moved into
 the bucket space, some will move out of it. But I don't think it's a big deal,
 as a rehash doesn't happen very often.

True.  Umm, that does raise a question - are addresses of AA contents
required to be stable?  C++11 requires that of its hashtables.  But it
ties your hands when trying to write customized hash tables.



Re: std.benchmark is in reviewable state

2011-09-26 Thread Jerry Quinn
Walter Bright Wrote:

 We already have std.perf:
 
   http://www.d-programming-language.org/phobos/std_perf.html
 
 A review should compare the two.

std.perf doesn't show up in the nav bar to the left.  I didn't know it existed 
until your post :-)

Jerry




Re: Multithreaded file IO?

2011-09-25 Thread Jerry Quinn
Jonathan M Davis Wrote:

 On Saturday, September 24, 2011 01:05:52 Jerry Quinn wrote:
  Jonathan M Davis Wrote:
   On Friday, September 23, 2011 23:01:17 Jerry Quinn wrote:
   
   A direct rewrite would involve using shared and synchronized (either on
   the class or a synchronized block around the code that you want to
   lock). However, the more idiomatic way to do it would be to use
   std.concurrency and have the threads pass messages to each other using
   send and receive.
  
  I'm trying the direct rewrite but having problems with shared and
  synchronized.
  
  class queue {
File file;
this(string infile) {
  file.open(infile);
}
synchronized void put(string s) {
  file.writeln(s);
}
  }
  
  queue.d(10): Error: template std.stdio.File.writeln(S...) does not match any
  function template declaration
  queue.d(10): Error: template std.stdio.File.writeln(S...) cannot deduce
  template function from argument types !()(string)
  
  Remove the synchronized and it compiles fine with 2.055.
 
 Technically, sychronized should go on the _class_ and not the function, but 
 I'm not sure if dmd is currently correctly implemented in that respect (since 
 if it is, it should actually be an error to stick synchronized on a 
 function). 
 Regardless of that, however, unless you use shared (I don't know if you are), 
 each instance of queue is going to be on its own thread. So, no mutexes will 
 be necessary, but you won't be able to have multiple threads writing to the 
 same File. It could get messy.

I get similar errors if I put synchronized on the class, or shared.  My best 
guess is that a File struct cannot currently (2.055) be shared or accessed in a 
synchronized context.

If I make the class synchronized and use __gshared on the File then it 
compiles.  I don't know if it works, though.
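
For reference, a minimal sketch of that workaround as I understand it (Queue and put are illustrative names, and I make no claim it is actually correct or thread-safe):

import std.stdio;

// The class is synchronized, so every method locks the instance.  Note that
// __gshared puts the File in plain global storage shared by all instances,
// bypassing the type system's shared checking entirely.
synchronized class Queue {
    private __gshared File file;

    this(string outfile) {
        file = File(outfile, "w");
    }

    void put(string s) {
        file.writeln(s);
    }
}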


 You could use a synchronized block instead of synchronizing the class
 
 void put(string s)
 {
     synchronized(this)
     {
         file.writeln(s);
     }
 }
 
 and see if that works. But you still need the variable to be shared.

I'll try that.


   So, what you'd probably do is spawn 3 threads from the main thread. One
   would read the file and send the data to another thread. That second
   thread would process the data, then it would send it to the third
   thread, which would write it to disk.
  
  I think that would become messy when you have multiple processing threads. 
  The reader and writer would have to handshake with all the processors.
 
 The reader thread would be free to read at whatever rate it can. And the 
 processing thread would be free to process at whatever rate it can. The 
 writer thread would be free to write at whatever rate it can. I/O issues 
 would be reduced (especially if the files being read and written to are on 
 separate drives), since the reading and writing threads would be separate. I 
 don't see why it would be particularly messy. It's essentially what TDPL 
 suggests a file copy function should do, except that there's a thread in the 
 middle which does some processing of the data before sending it to the thread 
 doing the writing. Really, this is a classic example of the sort of situation which 
 std.concurrency's message passing is intended for.

If you only have one processing thread, this is true.  However if you have, 
say, 16 processing threads,  the code becomes messy.  The reader has to 
handshake with each processor and select which processor to send the next chunk 
of input data to.  The same dance has to happen when the processors are ready 
to write, since I want all the output ordered correctly.  Broadcast (which 
isn't present in std.concurrency) wouldn't work because only one processor 
should get an input chunk.

The problem is really a single-writer, multiple-reader problem.

  std.parallelism actually looks the closest to what I want.  Not sure if I
  can make it work easily though.
 
 For std.parallelism to work, each iteration of a parallel foreach _must_ be 
 _completely_ separate from the others. They can't access the same data. So, 
 for instance, they could access separate elements in an array, but they can't 
 ever access the same element (unless none of them write to it), and something 
 like a file is not going to work at all, since then you'd be trying to write to 
 the file from multiple threads at once with no synchronization whatsoever. 
 std.parallelism is for when you do the same thing many times _in parallel_ 
 with each other, and your use case does not sound like that at all. I really 
 think that std.concurrency is what you should be using if you don't want to 
 do 
 it the C/C++ way and use shared.

Well I would have used the C++ shared way if I could get it to compile :-)

I actually am doing the same thing many times in parallel.  I have a single 
input file, but each line is processed independently by a worker thread.  The 
results are output to a single file in order.  Is map() what I want, perhaps?
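
For what it's worth, here is a minimal sketch of that ordered, parallel per-line pattern using std.parallelism; the file names and process() are placeholders:

import std.algorithm : map;
import std.parallelism;
import std.stdio;

string process(string line) {
    // stand-in for the real per-line work
    return line ~ " (processed)";
}

void main() {
    auto input  = File("in.txt");        // hypothetical input file
    auto output = File("out.txt", "w");  // hypothetical output file

    // byLine recycles its buffer, so copy each line before it crosses threads.
    auto lines = input.byLine.map!(l => l.idup);

    // taskPool.map runs process() on worker threads but yields results in
    // input order, so a single writer keeps the output correctly ordered.
    foreach (result; taskPool.map!process(lines))
        output.writeln(result);
}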

Re: Multithreaded file IO?

2011-09-25 Thread Jerry Quinn
Lutger Blijdestijn Wrote:

 If you didn't know, the concurrency chapter of tdpl is a free chapter: 
 http://www.informit.com/articles/article.aspx?p=1609144
 
 It has an example of file copying with message passing: 
 http://www.informit.com/articles/article.aspx?p=1609144&seqNum=7

What I really want is a shared fifo where the input is lines from a file, and 
many workers grab something from the fifo.  They then push their results into a 
shared reordering output queue.
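
As a single-threaded illustration of just the reordering part (the names are made up; in a real program this would be owned by one writer thread or guarded by a lock):

import std.stdio;

struct ReorderingSink {
    File output;
    size_t next;                 // index of the next result to write
    string[size_t] pending;      // results that arrived out of order

    // Workers call this with the input index of the line they processed.
    void put(size_t index, string result) {
        pending[index] = result;
        // Flush as soon as a contiguous prefix is available.
        while (next in pending) {
            output.writeln(pending[next]);
            pending.remove(next);
            ++next;
        }
    }
}

// e.g. auto sink = ReorderingSink(File("out.txt", "w"));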


Re: Why do we have transitive const, again?

2011-09-23 Thread Jerry Quinn
Adam D. Ruppe Wrote:

 IMO immutable has taken a step *backward* a few releases ago,
 and that's about all I've thought about it.
 
 immutable string a = "lol";
 
 auto b = replace(a, "lol", "rofl"); // used to work, now doesn't.

This works for me in 2.055.

Jerry



Multithreaded file IO?

2011-09-23 Thread Jerry Quinn
Hi folks,

I wasn't sure whether this should go here or in the D devel list...

I'm trying to port a program where threads read from a file, process the data, 
then write the output data.  The program is cpu-bound.  In C++ I can do 
something like this:

class QueueIn {
  ifstream in;
  mutex m;

  string get() {
    string s;
    m.lock();
    getline(in, s);
    m.unlock();
    return s;
  }
};

class QueueOut {
  ofstream out;
  mutex m;
  void put(string s) {
    m.lock();
    out << s;
    m.unlock();
  }
};


In D, I'm so far having trouble figuring out the right idiom to do what I want. 
 I looked at std.parallelism, but it seems a bit tricky to make my stuff work in 
this setting.  A std.stdio File cannot be part of a shared class.

How would you do this with the latest D2?

Thanks
Jerry



Re: Multithreaded file IO?

2011-09-23 Thread Jerry Quinn
Jonathan M Davis Wrote:

 On Friday, September 23, 2011 23:01:17 Jerry Quinn wrote:
 
 A direct rewrite would involve using shared and synchronized (either on the 
 class or a synchronized block around the code that you want to lock). 
 However, 
 the more idiomatic way to do it would be to use std.concurrency and have the 
 threads pass messages to each other using send and receive.

I'm trying the direct rewrite but having problems with shared and synchronized.

class queue {
  File file;
  this(string infile) {
file.open(infile);
  }
  synchronized void put(string s) {
file.writeln(s);
  }
}

queue.d(10): Error: template std.stdio.File.writeln(S...) does not match any 
function template declaration
queue.d(10): Error: template std.stdio.File.writeln(S...) cannot deduce 
template function from argument types !()(string)

Remove the synchronized and it compiles fine with 2.055.


 So, what you'd probably do is spawn 3 threads from the main thread. One would 
 read the file and send the data to another thread. That second thread would 
 process the data, then it would send it to the third thread, which would 
 write 
 it to disk.

I think that would become messy when you have multiple processing threads.  The 
reader and writer would have to handshake with all the processors.


 Unfortunately, I'm not aware of any good code examples of this sort of thing 
 online. TDPL has some good examples, but obviously you'd have to have the 
 book 
 to read it. Given some time, I could probably cook up an example, but I don't 
 have anything on hand.

std.parallelism actually looks the closest to what I want.  Not sure if I can 
make it work easily though.


Support for feeding data to engine threads?

2011-09-12 Thread Jerry Quinn
I'm looking at porting an app that maintains a work queue to be processed by 
one of N engines and written out in order.  At first glance, std.parallelism 
already provides the queue, but the Task concept appears to assume that there's 
no startup cost per thread.

Am I missing something or do I need to roll a shared queue object?


Re: Support for feeding data to engine threads?

2011-09-12 Thread Jerry Quinn
Timon Gehr Wrote:

 On 09/12/2011 07:23 PM, Jerry Quinn wrote:
  I'm looking at porting an app that maintains a work queue to be processed 
  by one of N engines and written out in order.  At first glance, 
  std.parallelism already provides the queue, but the Task concept appears to 
  assume that there's no startup cost per thread.
 
  Am I missing something or do I need to roll a shared queue object?
 
 I don't know if I get you right, but std.parallelism uses a task pool. 
 Usually no threads are started or stopped during processing.

OK I guess what I'm looking for is WorkerLocalStorage.  I can create an engine 
per thread.  However, I probably need to have each thread do the initialization 
work.  If I create the engine on the main thread, it won't be properly accessed 
by the worker thread, right?  I.e. I need each thread to run engine.init() 
which will do a whole pile of loading and setup first before I can start 
feeding data to the pool.

The impression I get is that

for (int i=0; i < nthreads; i++)
  taskPool.workerLocalStorage(new engine(datafile));

will not get me what I want.








Re: Support for feeding data to engine threads?

2011-09-12 Thread Jerry Quinn
Timon Gehr Wrote:

 On 09/12/2011 08:01 PM, Jerry Quinn wrote:
  Timon Gehr Wrote:
 
  On 09/12/2011 07:23 PM, Jerry Quinn wrote:
  I'm looking at porting an app that maintains a work queue to be processed 
  by one of N engines and written out in order.  At first glance, 
  std.parallelism already provides the queue, but the Task concept appears 
  to assume that there's no startup cost per thread.
 
  Am I missing something or do I need to roll a shared queue object?
 
  I don't know if I get you right, but std.parallelism uses a task pool.
  Usually no threads are started or stopped during processing.
 
  OK I guess what I'm looking for is WorkerLocalStorage.  I can create an 
  engine per thread.  However, I probably need to have each thread do the 
  initialization work.  If I create the engine on the main thread, it won't 
  be properly accessed by the worker thread, right?  I.e. I need each thread 
  to run engine.init() which will do a whole pile of loading and setup first 
  before I can start feeding data to the pool.
 
  The impression I get is that
 
  for (int i=0; i < nthreads; i++)
    taskPool.workerLocalStorage(new engine(datafile))
 
  will not get me what I want.
 
 
 
 Calling workerLocalStorage once suffices.
 
 auto engine=taskPool.workerLocalStorage(new Engine(datafile));
 
 This will create one engine per working thread and the same datafile.
 
 You can access the engine from each thread with engine.get. What is the 
 exact role of datafile? Does it have to be distinct for each engine?

datafile is just a set of parameters for configuring the engine, i.e location 
of data, parameter values, etc.  In this setting it would be the same for each 
engine.
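
For reference, a minimal sketch of the pattern Timon describes; Engine, its constructor argument, and process() are stand-ins:

import std.parallelism;
import std.stdio;

class Engine {
    this(string datafile) { /* heavy per-thread setup would go here */ }
    string process(string line) { return line; }
}

void main() {
    // The argument is lazy, so one Engine is constructed per worker thread,
    // each from the same configuration file.
    auto engines = taskPool.workerLocalStorage(new Engine("params.cfg"));

    foreach (line; parallel(["one", "two", "three"])) {
        // engines.get returns the Engine belonging to the current worker.
        writeln(engines.get.process(line));
    }
}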






Re: Support for feeding data to engine threads?

2011-09-12 Thread Jerry Quinn
dsimcha Wrote:

 == Quote from Jerry Quinn (jlqu...@optonline.net)'s article
  I'm looking at porting an app that maintains a work queue to be processed 
  by one
 of N engines and written out in order.  At first glance, std.parallelism 
 already
 provides the queue, but the Task concept appears to assume that there's no 
 startup
 cost per thread.
  Am I missing something or do I need to roll a shared queue object?
 
 Not sure if I'm misunderstanding what you're asking, but I'll answer the 
 question
 I think you're asking.  std.parallelism's Task can be executed in two ways.
 executeInNewThread() does start a new thread for every Task.  However, you can
 also submit a Task to a TaskPool and have it executed by an existing worker
 thread.  The whole point of using TaskPool to execute a Task is to avoid 
 paying
 the thread start-up cost for every Task.

The question I was asking was how to execute a huge amount of per-thread 
initialization once per thread in the TaskPool framework.



Re: Support for feeding data to engine threads?

2011-09-12 Thread Jerry Quinn
dsimcha Wrote:

 == Quote from Jerry Quinn (jlqu...@optonline.net)'s article
  Timon Gehr Wrote:
   On 09/12/2011 07:23 PM, Jerry Quinn wrote:
I'm looking at porting an app that maintains a work queue to be 
processed by
 one of N engines and written out in order.  At first glance, std.parallelism
 already provides the queue, but the Task concept appears to assume that 
 there's no
 startup cost per thread.
   
Am I missing something or do I need to roll a shared queue object?
  
   I don't know if I get you right, but std.parallelism uses a task pool.
   Usually no threads are started or stopped during processing.
  OK I guess what I'm looking for is WorkerLocalStorage.  I can create an 
  engine
 per thread.  However, I probably need to have each thread do the 
 initialization
 work.  If I create the engine on the main thread, it won't be properly 
 accessed by
 the worker thread, right?  I.e. I need each thread to run engine.init() which 
 will
 do a whole pile of loading and setup first before I can start feeding data to 
 the
 pool.
  The impression I get is that
  for (int i=0; i < nthreads; i++)
    taskPool.workerLocalStorage(new engine(datafile))
  will not get me what I want.
 
 The parameter for taskPool.workerLocalStorage is lazy, so it's even easier:
 
 taskPool.workerLocalStorage(new engine(datafile));
 
 will create one new engine **per thread** because the lazy parameter is 
 passed to
 it and evaluated **once per thread**.  I'm not sure what an engine is in this
 context, though.

An engine is simply a complex class that takes a lot of memory and setup work 
to prepare.  I'm working on language processing, so we use large multiGB models.

Great, I'm getting closer.  So the evaluation happens in the worker thread, not 
the main thread, right?

The other thing I think I have to do is sequence the initializations.  
Otherwise the disk/network will thrash trying to load N copies simultaneously.  
I imagine this can be done with a shared variable inside the engine.

 
 As far as initialization, you could do something like:
 
 auto isInitialized = taskPool.workerLocalStorage(false);
 
 void doStuff() {
     if(!isInitialized.get) {
         engines.get.initialize();
         isInitialized.get = true;
     }
 
     // Do some real processing.
 }

I prefer to do all the initializations before work starts so that timing can be 
more accurate.  These things take a lot to get fully loaded and ready to run.


Thanks for all the help!  It might be useful to indicate in the workerLocalStorage 
documentation that the lazy evaluation happens in the worker's thread.

Jerry






Re: Support for feeding data to engine threads?

2011-09-12 Thread Jerry Quinn
dsimcha Wrote:

 == Quote from Jerry Quinn (jlqu...@optonline.net)'s article
  dsimcha Wrote:
   == Quote from Jerry Quinn (jlqu...@optonline.net)'s article
I'm looking at porting an app that maintains a work queue to be 
processed by one
   of N engines and written out in order.  At first glance, std.parallelism 
   already
   provides the queue, but the Task concept appears to assume that there's 
   no startup
   cost per thread.
Am I missing something or do I need to roll a shared queue object?
  
   Not sure if I'm misunderstanding what you're asking, but I'll answer the 
   question
   I think you're asking.  std.parallelism's Task can be executed in two 
   ways.
   executeInNewThread() does start a new thread for every Task.  However, 
   you can
   also submit a Task to a TaskPool and have it executed by an existing 
   worker
   thread.  The whole point of using TaskPool to execute a Task is to avoid 
   paying
   the thread start-up cost for every Task.
  The question I was asking was how to execute a huge amount of per-thread
 initialization once per thread in the TaskPool framework.
 
 Hmm, currently there isn't an easy/obvious way, though I'm thinking maybe one
 should be added.  I've kind of needed it, too, and I don't anticipate it being
 very hard to implement.  If you can suggest a good API for this, I'll work on 
 it
 for next release.

I'd probably naturally want to do something like:

auto engines = taskPool.init(new Engine(params));
auto lines = File("foo.txt").byLine();
auto results = taskPool.map!(engines.process)(lines);
foreach (r; results)
  writeln(r);

This would create a set of new Engine objects, one per thread, each initialized 
within its own thread as thread-local data.  Each Engine object has a process() 
call taking a line as input and returning a string in this case.  Any free 
engine can process more work as long as there is still work available.

For the large-scale use case, it could take an extra argument and allow only N 
simultaneous init's under the covers.




Google +1 button for dpl.org?

2011-07-13 Thread Jerry Quinn
Hi folks,

This seemed like the right place to toss this out...

What do folks think of adding a google +1 button to d-programming-language.org? 
 I just created a google+ account and discovered it.

Jerry



Re: Question about D, garbage collection and fork()

2011-03-10 Thread Jerry Quinn
Steven Schveighoffer Wrote:
 
 Do you know what causes the OS to regard that memory as read-only?  Since  
 fork() is a C system call, and D gets its heap memory the same as any  
 other unix process (brk()), I can't see why it wouldn't work.  As long as  
 you do the same thing you do in C, I think it will work.

It's not that the OS considers the memory actually read-only.  It uses 
copy-on-write so the pages will be shared between the processes until one or 
the other attempts to write to the page.

So if the garbage collector moves things around, it will cause the pages to be 
copied and unshared.  My question is really whether the garbage collector will 
tend to dirty shared pages or not.

Jerry





Question about D, garbage collection and fork()

2011-03-09 Thread Jerry Quinn
Where I work, we find it very useful to start a process, load data, then fork() 
to parallelize.  Our data is large, such that we'd run out of memory  trying to 
run a complete copy on each core.  Once the process is loaded, we don't need 
that much writable memory, so fork is appealing to share the loaded pages.  
It's possible to use mmap for some of the data, but inconvenient for other 
data, even though it's read-only at runtime.

So here's my question:  In D, if I create a lot of data in the 
garbage-collected heap that will be read-only, then fork the process, will I 
get the benefit of the operating system's copy-on-write and only use a small 
amount of additional memory per process?

In case you're wondering why I wouldn't use threading, one argument is that if 
you have a bug and the process crashes, you only lose one process instead of N 
threads.  That's actually useful for robustness.

Thoughts?




Re: Proposal for std.path replacement

2011-03-03 Thread Jerry Quinn
Lars T. Kyllingstad Wrote:

 As mentioned in the std.path.getName(): Screwy by design? thread, I 
 started working on a rewrite of std.path a long time ago, but I got 
 sidetracked by other things.  The recent discussion got me working on it 
 again, and it turned out there wasn't that much left to be done.
 
 So here it is, please comment:
 
 http://kyllingen.net/code/ltk/doc/path.html
 https://github.com/kyllingstad/ltk/blob/master/ltk/path.d

Rather than:

drive() & stripDrive()
extension() & stripExtension()

would it make sense to combine them?

string[2] splitDrive(path)
string[2] splitExtension(path)

Just a thought.

Jerry




Re: xxxInPlace or xxxCopy?

2011-01-19 Thread Jerry Quinn
Andrei Alexandrescu Wrote:

 On 1/19/11 6:53 PM, Jonathan M Davis wrote:
  On Wednesday, January 19, 2011 15:33:16 Andrei Alexandrescu wrote:
  I'm consolidating some routines from std.string into std.array. They are
  specialized for operating on arrays, and include the likes of insert,
  remove, replace.
 
  One question is whether operations should be performed in place or on a
  copy. For example:
 
 So I guess vote stays unchanged :o).
 
  Thoughts?
 
  Haven't we been using the approach that string operations generally make 
  copies
  (in many cases slices) and marking functions that do it in place with 
  InPlace?
 
 Problem is, even though the example uses strings, the functions apply to 
 all arrays.

The big difference is operating on immutable arrays vs mutable ones.  For 
immutable arrays, you  have to do copies.  But mutable ones allow in-place 
editing.  If I'm working with mutable arrays of ints, I don't want to have to 
type InPlace after every function and I *really* don't want the array to be 
copied or efficiency will go down the tubes.

Nor do I want to add Copy to every string operation.  This might be an argument 
to leave the string functions where they are.  To a certain extent, strings are 
special, even though they really aren't.

Is it too ugly to contemplate algorithms doing in-place operations on mutable 
arrays and return a copy instead for immutable ones?
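
A minimal sketch of the dual behaviour being contemplated, using a made-up doubled() operation on integer arrays; static if picks one branch per instantiation:

import std.traits : isMutable, Unqual;

// In place for mutable element types, copy-and-return for immutable/const.
auto doubled(E)(E[] arr)
{
    static if (isMutable!E)
    {
        foreach (ref e; arr) e *= 2;
        return arr;                        // same array, modified in place
    }
    else
    {
        auto result = new Unqual!E[](arr.length);
        foreach (i, e; arr) result[i] = e * 2;
        return result;                     // caller gets a fresh mutable copy
    }
}

unittest
{
    int[] m = [1, 2, 3];
    assert(doubled(m) is m);               // mutated in place

    immutable(int)[] im = [1, 2, 3];
    assert(doubled(im) == [2, 4, 6]);      // original left untouched
}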








Re: eliminate junk from std.string?

2011-01-11 Thread Jerry Quinn
Andrei Alexandrescu Wrote:

 On 1/9/11 4:51 PM, Andrei Alexandrescu wrote:
  There's a lot of junk in std.string that should be gone. I'm trying to
  motivate myself to port some functions to different string widths and...
  it's not worth it.
 
  What functions do you think we should remove from std.string? Let's make
  a string and then send them the way of the dino.
 
 
  Thanks,
 
  Andrei
 
 I have uploaded a preview of the changed APIs here:
 
 http://d-programming-language.org/cutting-edge/phobos/std_string.html

Unclear if iswhite() refers to ASCII whitespace or Unicode.  If Unicode, which 
version of the standard?
Same comment for icmp().  Also, in the Unicode standard, case folding can 
depend on the specific language.

There is room for ascii-only functions, but unless a D version of ICU is going 
to be done separately, it would be nice to have full unicode-aware functions 
available.

You've got chop() marked as deprecated.  Is popBack() going to make sense as 
something that removes a variable number of chars from a string in the CR-LF 
case?  That might be a bit too magical.

Rather than zfill, what about modifying ljustify, rjustify, and center to take 
an optional fill character?

One set of functions I'd like to see are startsWith() and endsWith().  I find 
them frequently useful in Java and an irritating lack in the C++ standard 
library.

Jerry











Re: eliminate junk from std.string?

2011-01-11 Thread Jerry Quinn
Jerry Quinn Wrote:

 One set of functions I'd like to see are startsWith() and endsWith().  I find 
 them frequently useful in Java and an irritating lack in the C++ standard 
 library.

Just adding that these functions are useful because they're more efficient than 
doing a find and checking that the match is in the first position.

Jerry



Re: eliminate junk from std.string?

2011-01-11 Thread Jerry Quinn
Andrei Alexandrescu Wrote:

 On 1/11/11 1:45 PM, Jerry Quinn wrote:
  Unclear if iswhite() refers to ASCII whitespace or Unicode.  If Unicode, 
  which version of the standard?
 
 Not sure.
 
 enum dchar LS = '\u2028';   /// UTF line separator
 enum dchar PS = '\u2029';   /// UTF paragraph separator
 
 bool iswhite(dchar c)
 {
     return c <= 0x7F
         ? indexOf(whitespace, c) != -1
         : (c == PS || c == LS);
 }
 
 Which version?

This looks pretty incomplete if the goal is to return true for any unicode 
whitespace character.   My comment was really that if we're going to offer 
things like this, they need to be more completely defined.


  Same comment for icmp().  Also, in the Unicode standard, case folding can 
  depend on the specific language.
 
 That uses toUniLower. Not sure how that works.

And doesn't mention details about the Unicode standard version it implements.


  You've got chop() marked as deprecated.  Is popBack() going to make
  sense as something that removes a variable number of chars from a
  string in the CR-LF case?  That might be a bit too magical.
 
 Well I found little use for chop in e.g. Perl. People either use chomp 
 or want to remove the last character. I think chop is useless.

Agreed, chomp is more useful.  My question is whether popBack() should 
automatically act like perl chomp() for strings or not?

  One set of functions I'd like to see are startsWith() and endsWith().  I 
  find them frequently useful in Java and an irritating lack in the C++ 
  standard library.
 
 Yah, those are in std.algorithm. Ideally we'd move everything that's 
 applicable beyond strings to std.algorithm.

Ah, missed those.
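
For reference, the std.algorithm versions read naturally on strings:

import std.algorithm : startsWith, endsWith;

unittest
{
    assert("hello.d".startsWith("hello"));
    assert("hello.d".endsWith(".d"));
}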

Jerry



Re: eliminate junk from std.string?

2011-01-11 Thread Jerry Quinn
Jerry Quinn Wrote:


   Same comment for icmp().  Also, in the Unicode standard, case folding can 
   depend on the specific language.
  
  That uses toUniLower. Not sure how that works.
 
 And doesn't mention details about the Unicode standard version it implements.

Actually it does.  *munch* *munch* my words are delicious.

It would be good to have better docs on what icmp() does.  Also, it might make 
sense to do icmp() using unicode case folding and normalization rather than 
simple lowercase.  Thinking out loud here.



Re: [review] new string type

2010-12-03 Thread Jerry Quinn
Steven Schveighoffer Wrote:

 On Fri, 03 Dec 2010 11:15:36 -0500, spir denis.s...@gmail.com wrote:
 
  On Thu, 02 Dec 2010 16:24:03 -0500
  Steven Schveighoffer schvei...@yahoo.com wrote:
 
  Yes, it does seem odd, but then again, how often do you need the
  individual characters of a string?  I wrote php code for about 6 months  
  as
  a full time job before I found I needed to access individual characters,
  and then I had to look up how to do it :)  It's just not a common thing.
 
  Well, I wonder whether I have ever written any piece of code without  
  string indexing ;-)
  Surely depends on your app domain(s)...
 
 Indexing a string is rare, unless you are parsing something (yes it does  
 truly depend on the domain), but even rarer is indexing a string based on  
 a hard-coded value.  Really, this last case is where the type would behave  
 in a confusing way.

I tend to do a lot of transforming strings, but I need to track offsets back to 
the original text to maintain alignment between the results and the input.  For 
that, indexes are necessary and we use them a lot.

Probably the right thing to do in this case is just pay for the cost of using 
dchar everywhere, but if you're working with large enough quantities of data, 
storage efficiency matters.

Jerry



Re: [review] new string type

2010-12-03 Thread Jerry Quinn
Steven Schveighoffer Wrote:

 On Fri, 03 Dec 2010 14:40:30 -0500, Jerry Quinn jlqu...@optonline.net  
 wrote:
 
  I tend to do a lot of transforming strings, but I need to track offsets  
  back to the original text to maintain alignment between the results and  
  the input.  For that, indexes are necessary and we use them a lot.
 
 In my daily usage of strings, I generally use a string as a whole, not  
 individual characters.  But I do occasionally use it.
 
 Let's also understand that indexing is still present, what is deactivated  
 is the ability to index to arbitrary code-units.  It sounds to me like  
 this new type would not affect your ability to store offsets (you can  
 store an index, use it later when referring to the string, etc. just like  
 you can now).
 
 My string type does not allow for writeable strings.  My plan was to allow  
 you access to the underlying char[] and let you edit that way.  Letting  
 someone write a dchar into the middle of a UTF-8 string could cause lots of  
 problems, so I just disabled it by default.

That's reasonable.  I'm not trying to create invalid strings.

I'm actually working in C++ but keeping an eye on things going on
in D-land.  The kind of stuff we do is to normalize text in preparation
for natural language processing. 

As a simple example, let's say you want to use a set of regexes to identify 
patterns
in text.  You want to return the offsets of each regex that matches.  However, 
before the regexes
run, you replace all html tags with a placeholder, so they can easily span
tags without worrying about the contents.  Before you return the results to the
user, though, you need to translate those offsets back to ones for the original
string.

Everything is unicode of course and we care about processing unicode code 
points, but want to maintain UTF-8 storage underneath to keep size down.

In reality, we're often doing things like single character normalizations as 
well as larger spans, but still need to maintain alignment to the original data.

As long as this is reasonable to do, I'm fine.  I just wasn't sure from the 
descriptions I was seeing.

Jerry




Re: FloatLiteral 1f

2010-07-28 Thread Jerry Quinn
 On 27/07/2010 21:46, Philip Trettner wrote:
  Hello.
 
  When I came across the lexical definition of a FloatLiteral I wondered
  how '1f' gets recognized.
 
  The definition says:
  FloatLiteral:
  Float
  Float Suffix
  Integer ImaginarySuffix
  Integer FloatSuffix ImaginarySuffix
  Integer RealSuffix ImaginarySuffix
 
  Float contains a dot or an exponent, so this cannot be.
  The suffixes for integers are only imaginary, so '1f' won't be
  recognized either.
 
  But the D Compiler compiles fine (as it should be).
 
  Am I missing something or is this a flaw in the FloatLiteral
  definition?
 
  kind regards, Philip

This was reported in bug 949:

http://d.puremagic.com/issues/show_bug.cgi?id=949



Re: container stuff

2010-05-25 Thread Jerry Quinn
Andrei Alexandrescu Wrote:

 On 05/25/2010 06:04 PM, Steven Schveighoffer wrote:
  On Tue, 25 May 2010 18:27:32 -0400, Andrei Alexandrescu
  seewebsiteforem...@erdani.org wrote:
 
  I've uploaded a work in progress on the container design here:
 
  http://erdani.com/d/phobos/std_container.html
 
  It's deceptively simple - the entire design is a nomenclature, really.
  Any given container may choose to implement whichever primitives it
  can, if (and only if) it can satisfy the requirements. Beyond that,
  each container can define its own primitives.
 
  There are a bunch of soft primitives. Those are meant to put a stop
  to the iterator invalidation problems experienced in the STL. The
  container implementor may alias softXyz to xyz if she knows the
  operation won't mess the ranges currently iterating the container
  (which is the case for most node-based containers). I haven't yet
  discussed subtler cases in which a range starts with a removed element
  etc., but I plan to.

softXXX might be better named safeXXX or stableXXX.  The names might be more 
suggestive of preserving iteration.

The doc isn't quite clear whether make is a member function or
standalone.  I assume it's standalone, but that's worth firming up.

  I don't like insertInstead, can we make it replace?
 
 replace was the original name. insertInstead is consistent with the 
 other two, so we have (softI|i)nsert[Before|After|Instead].

I second the request for the name replace.  Despite the consistency, I think 
replace is clear to programmers as well as being more familiar and comfortable. 
 Also, the operation really isn't insert instead, it's delete then insert 
instead.  So there is a lack of symmetry, because it removes elements while the 
others do not.

Another issue I see is that opIndex and its brethren take KeyType, but for 
something like vector, this should be size_t.  I don't think you want 
ElementType to be tuple!(size_t, ElementType) in this case.

Related to this, do you intend removal of a single element to use remove(range) 
or removeKey?

Finally, opSlice(begin, end) is not there.  Was this intentional or overlooked?

Jerry



Is this a bug with goto?

2010-04-17 Thread Jerry Quinn
In the spec it says that it's illegal to skip an initialization using goto.  
Unless I'm mistaken, the code below does that for b, s, and c.  However, it 
compiles without complaint.

So, should the compiler be complaining, or is the text about goto really saying 
that behavior is undefined.  Obviously the latter makes it easier for the 
compiler writer :-)

void foo() {
  goto L0;
int b;
struct S { int a; this(int c) { a=c; } }
S s = S(5);
class C { int a; }
C c = new C;
 L0: ;
s.a++;
c.a++;
}

thanks,
Jerry


Re: Is this a bug with goto?

2010-04-17 Thread Jerry Quinn
Andrei Alexandrescu Wrote:
 (Hi Jerry! Glad to see you're ramping up participation lately.) Yes, 
 that's a bug. There are many other bugs related to jumps, including in 
 switch statements.

http://d.puremagic.com/issues/show_bug.cgi?id=4101

cheers,
Jerry



Re: Does D allow paradoxical code?

2010-03-27 Thread Jerry Quinn
Daniel Keep Wrote:
 
  This generally raises the question of what should be the order of 
  evaluation for constructs like mixins, conditional compilation, and CTFE.   
  Each has the potential to modify the other.
  
  What should be the rule to break the ambiguity?  Or do we leave it 
  implementation-defined?
  
  Thoughts?
  Jerry
 
 The D spec, as far as I know, says nothing on this.  Then again, the D
 spec says nothing about a lot of things it should.  Personally, I just
 ignore the spec; DMD is the only reliable reference for the language.

That's why I'm bringing this issue up.  The spec needs to become the reliable 
reference rather than the DMD implementation.  Otherwise, D will be difficult 
to use seriously because you'll never know exactly what the language really 
allows.

 Now, being as I'm supposed to go off and do my walk around the block
 now, I'll be lazy and start making educated guesses.  I don't think the
 DMD frontend is smart enough to handle this; nor do I think it should.
 
 If you were to allow this sort of thing, you'd never be able to bloody
 well compile it.  The simplest solution is to simply make canonical what
 I think DMD already does: static code [1] can be arranged in any order
 [2], but meta code [3] is processed strictly in lexical order.
 
 Indeed, version actually already does this: you're not allowed to set a
 version after you've already used it.

I agree with you that this is a bit silly and I'm not looking to have the 
compiler properly run forever.  I'm raising the issue to get the spec to be 
firmer about how these meta constructs get evaluated at compile time.

Lexical order makes sense to me.  Another variant of the problem I described is 
if you were to mixin a protection attribute where the attribute makes the 
string that builds it no longer visible.

Jerry



Does D allow paradoxical code?

2010-03-25 Thread Jerry Quinn
What should happen when the following code is compiled?

mixin(vs);
version(v1) {
  const string vs = "version = v2;";
  pragma(msg, "yes");
}
else {
  const string vs = "version = v1;";
  pragma(msg, "no");
}

Currently, there's an error saying that vs must be a string.  However, as far 
as I can tell, the spec says nothing about order affecting module-scope 
declarations.  So vs should be valid, and once it substitutes in the mixin, you 
have a cycle where no matter what vs is, vs should be the other value.

If you don't have the conditional compilation and put the declaration of vs 
after the mixin, it works as expected.

This generally raises the question of what should be the order of evaluation 
for constructs like mixins, conditional compilation, and CTFE.   Each has the 
potential to modify the other.

What should be the rule to break the ambiguity?  Or do we leave it 
implementation-defined?

Thoughts?
Jerry


Re: An example of Clang error messages

2010-03-04 Thread Jerry Quinn
Walter Bright Wrote:

 grauzone wrote:
  How many man years would you assume writing a full D compiler takes? 
  (Frontend starting from scratch, using an existing backend.)
 
 But there's little reason to implement a D compiler from scratch. For 
 open source uses, you can just go ahead and use dmd. For closed, 
 proprietary uses, we can work something out.

Hi, me again.  Following up on this comment, can I bring up the proposal of 
contributing a snapshot of the dmd frontend to the GCC project?  I think for D 
to be really viable in GCC, it must be a part of the source tree.

I hope I'm not being a pest.

Thanks,
Jerry



Re: Container hierarchy vs. container types

2010-03-04 Thread Jerry Quinn
Andrei Alexandrescu Wrote:

 I decided to let no assumption unchallenged and got back to the 
 question: do we really need a container _hierarchy_? How much joy and 
 usefulness do people derive from dynamically changing an array with a 
 list under the same interface? Or a rb-tree vs. a binary tree vs. a 
 left-leaning binary tree?

 As far as I can tell such changes of containers are a design-time 
 decision, and extremely rarely (never as far as I remember) a runtime 
 decision to hide behind a binary interface. I'm starting to think that 
 it's quite the other way around, and a Sapir-Whorf kind of thing: The 
 fact that Java lacks symbol aliasing capability conditions the thinking 
 that a hierarchy is the right thing to do, whereas the true solution is 
 to design a flat federation of container types with name- and 
 semantics-compatible primitives, to then use them directly or via 
 design-time aliases.

I've rarely found it helpful that an ArrayList is a List, or that HashMap is a
Map.  If an API I'm passing to takes the base class, it could easily have
been done as a template. And dynamic switching hasn't ever been useful
to me. As long as the API is the same, recompiling will work and the
containers will continue to behave well.

If dynamic switching is really necessary for an application, it can be done by 
wrapping
the container with a class hierarchy as needed.

 Obviously I might be wrong in any number of ways. Switching containers 
 dynamically might be very useful. The precision, flexibility, and beauty 
 I'm aiming for may be useless. Container hierarchies may have some other 
 awesomeness I'm failing to see. But for the time being I'm very 
 seriously considering a simple design:
 
 * Each specific container is an unconnected class (or struct, haven't 
 decided yet - probably structs to accommodate reference-counted containers);

Also, structs would reduce the indirections required to access the data, when
placing them inside other aggregates as well as reduce memory allocation
traffic.  I'd rather not pay for 2 allocations to create an instance of:

class C { vector!(int) v; }
 
 * Naming matches dictate semantic matches. There should be no false 
 friends - if a container defines removeFront, that should mean the same 
 thing as for other containers (i.e. O(1) removal of the first element).

This I would hesitate a bit on.  I recently swapped STL maps for custom 
hashmaps, and not having to change operator[] across the code was useful.  I'm 
changing the computational complexity of an operation, but conceptually it's 
the same kind of access.  I'm doing so because I know that for my application, 
the tradeoffs elsewhere are acceptable.

 This design follows and enriches STL's design even after we have, as a 
 community, seen many hierarchy-based designs, and even though it looks 
 like hierarchies have won. The main improvements I want to bring to 
 STL's design are (a) a better categorization of containers, (b) of 
 course replace iterators with ranges, (c) either eliminate allocators or 
 make them work.
 
 Needless to say, I'd be very curious to hear other opinions in this 
 matter. Please speak up - the future of D depends, in small part, on 
 this too.

The concept of allocators is nice in principle, though I've never had to put them 
into action.  A couple of places I could see having used STL allocators: 
wrapping
JNI memory or GC memory in an STL container.  Neither is particularly necessary
in D.  Another place I could envision an allocator is if I load a large static 
data
structure with mmap and want to wrap an STL structure around it.   I've wanted 
to
do this in particular with strings.  This is where C++ allocators fall down at 
least.
Trying to mix custom allocated data structures with standard ones is going to 
be painful. 

Jerry



Re: Making all strings UTF ranges has some risk of WTF

2010-02-04 Thread Jerry Quinn
Don Wrote:
 We seem to be approaching the point where char[], wchar[] and dchar[] 
 are all arrays of dchar, but with different levels of compression.
 It makes me wonder if the char, wchar types actually make any sense.
 If char[] is actually a UTF string, then char[] ~ char should be 
 permitted ONLY if char can be implicitly converted to dchar. Otherwise, 
 you're performing cast(char[])(cast(ubyte[])s ~ cast(ubyte)c) which will 
 not necessarily result in a valid unicode string.

Well, if you're working with a LOT of text, you may be mmapping GB's of UTF-8 
text.  Yes, this does happen.  You better be able to handle it in a sane 
manner, i.e. not reallocating the memory to read the data in.  So, there is a 
definite need for casting to array of char, and dealing with the inevitable 
stray non-unicode char in that mess.  

Real-world text processing can be a messy affair.  It probably requires walking 
such an array and returning slices cast to char after they've been validated. 



Re: D compiler as part of GCC

2010-01-24 Thread Jerry Quinn
Brad Roberts Wrote:

 On 1/23/2010 4:15 PM, Walter Bright wrote:
  Leandro Lucarella wrote:
  Walter Bright, el 23 de enero a las 12:54 me escribiste:
  Jerry Quinn wrote:
  Walter Bright Wrote:
  Will they take a fork of the dmd source, such that they own the
  copyright to the fork and Digital Mars still has copyright to
  the original?
  Hi, Walter,
 
  The answer appears to be yes:
 
  http://gcc.gnu.org/ml/gcc/2010-01/msg00430.html
 
  Jerry
 
  That's great news. I suppose I should look over the forms they talk
  about!
 
  Great news indeed! Since DMD FE is GPL I think it won't be any trouble to
  fold in the new changes back to GDC as they did (and LDC too), so it
  won't
  be really a *fork*, right?
  
  Well, still I won't be supporting gdc directly. It would mean a team
  that would be willing to take new DMD FE updates and fold them into GDC,
  and then follow whatever gcc's build and release conventions are.
 
 I don't think you got the answer you were looking for.  You got an answer to a
 different question.  If you assign the copyright over to the FSF, they then 
 own
 the code.  You'd have a license to use it as you like in return, but you would
 no longer be the owner.

 Additionally, as pointed out in the gcc@ thread, contributions coming into the
 gcc tree wouldn't have anything other than the gpl license attached to them 
 and
 that would likely make them problematic to re-distribute from your tree with 
 the
 dual gpl/artistic license.
 
 In simpler words, this is still far from straightforward.

I think you're slightly incorrect, Brad.  DigitalMars still owns the copyright 
to the original source (call it copy A).  A fork (called copy B) is donated to 
the FSF.   DigitalMars still gets to make changes to copy A and license them as 
it sees fit.  Copy B is part of the GCC codebase and would evolve separately.

Moving changes between them would require the same kind of donation process as 
the original transfer.  Folks making changes to the DMD FE would have to 
contribute those changes to FSF as well to get them into copy B and vice versa.

My apologies if that's the same as what you said.  I read your comment a couple 
of times and was a bit confused.
 
 I'd still love for there to be fewer split efforts on the compiler front, so I
 do encourage trying to find a workable solution.. but tread carefully.

The GCC java front end currently uses the Eclipse compiler to produce bytecode 
and then compiles the bytecode to native. But that's the only front end I'm 
aware of that isn't fully integrated into the GCC tree.

The D front end doesn't produce a portable intermediate representation like 
that so I think it would be harder to always use the latest DMD front end.

I see this possibility as Walter giving GCC a running start so that the D 
ecosystem has another viable compiler option available with relatively low 
effort.  

In the end, the language spec should be the thing that unifies the D community 
rather than the ad hoc definition provided by a particular front end 
implementation.  It's just a matter of how to get there.

Later,
Jerry



Re: D compiler as part of GCC

2010-01-23 Thread Jerry Quinn
Walter Bright Wrote:
 Will they take a fork of the dmd source, such that they own the 
 copyright to the fork and Digital Mars still has copyright to the original?

Hi, Walter,

The answer appears to be yes:

http://gcc.gnu.org/ml/gcc/2010-01/msg00430.html

Jerry



Re: D compiler as part of GCC

2010-01-18 Thread Jerry Quinn
Eldar Insafutdinov Wrote:

 bearophile Wrote:
 
  Jerry Quinn:
   I'm interested in creating a D front end for GCC that would be part of 
   the GCC codebase.
  
  What about helping LDC devs create a good D2 implementation instead? It's 
  probably 1/5 or 1/10 of the work you think about, because lot of work is 
  already done, and surely some people will help you (me too).

One reason is that I'm already positioned to contribute code to GCC, and it is 
more difficult for me to become an LDC dev.  Another is that GCC has very broad 
backend support.  I know LLVM backend support is expanding, but it still has 
some distance to go.  GCC is also the default compiler for many Linux 
distributions, and D being part of that may help it propagate.

I also do think that the construction of another front end would provide 
positive benefit to the D community by improving the language specification and 
separating it from implementation.  Of course that's only true if this succeeds 
:-)

There's also a benefit to the GCC project in terms of improving the docs on the 
frontend interface.  That's already happened as I've tried to figure out how it 
works :-)  But that's not so relevant to the folks here.

  There's Dil, DMD, GDC, LDC, D#, etc, but one good, debugged and well 
  optimizing fully open source D2 compiler is much better than ten broken 
  and/or badly optimizing D compilers.
  
  Bye,
  bearophile
 
 I agree that, having such good intent, the author of the post should better 
 concentrate his effort on helping GDC/LDC. LDC took a couple of years to become 
 usable, and you have to consider that they took an existing front-end.
 
 Also what I think even when you complete this project, it is not only the 
 licensing issues that are preventing GDC from being included into GCC. They 
 will do that only if they are interested in this project, as it requires 
 maintenance. They will not update GCC-D frontend with every release of GCC 
 just because it is a part of it.

This is very true.  It would require people to be interested in continuing its 
existence.

 Having a solid GDC implementation you can be sure that it will be included in 
 distributions (Debian had GDC for quite a long time).

In my mind, the endgame for GDC would be to have the work integrated into the 
official GCC sources.  That would provide similar benefits to the ones I'm 
chasing.  Everyone who touched GDC would have to assign their code to the FSF 
and the DMD sources would also have to be assigned.  Perhaps that's the right 
answer in the end but I don't know.  It does seem to be a substantial effort to 
make that happen.

Jerry




Re: Tidy auto [Was: Re: @disable]

2010-01-18 Thread Jerry Quinn
bearophile Wrote:

 myself). If this is true then a syntax like:
 auto immutable x = y * 2;
 
 can be seen as too long and boring to write all the time, so the 
 immutable keyword may need to be changed again :-) For example into val 
 (also as retard has said), so it becomes shorter (here auto is present if 
 and only if the programmer wants type inference, as I have written in other 
 posts, to clean up its semantics):
 auto val x = y * 2;

How about "fixed"?  It's a little longer than "val", but still carries "set in 
stone" semantics.

auto fixed x = y * 2;



Re: one suggestion for improving the performance of gc and memroy management

2009-12-22 Thread Jerry Quinn
Michel Fortin Wrote:

 My opinion is that immutable and shared immutable should be two 
 different type modifiers. Contrary to other shared values, shared 
 immutable values can be cast to non-shared immutable (and non-shared 
 const) since they do not require memory barriers, but the reverse 
 should be forbidden (sending a non-shared immutable value to another 
 thread) because it would allow thread-local objects to escape.

You don't need the language to prevent the non-shared to shared cast as long
as the object can be moved to shared GC once it crosses the barrier.  The 
compiler
could do this when it sees the cast happening.

Jerry



Re: yank '>>>'?

2009-12-07 Thread Jerry Quinn
Simen kjaeraas Wrote:

 On Mon, 07 Dec 2009 02:11:16 +0100, Jerry Quinn jlqu...@optonline.net  
 wrote:
  Well, I could see the value of providing a rotate operator.
 
  Since >>> is tainted, what about @ and @ for integral rotation?
 
 I was thinking  and . They represent the fact that some bits end up
 on the wrong side. Still, I don't think there're enough use cases for an
 operator.

I'd argue against  and  since they'd be very easy to misread and mistype. 
 I can believe that an operator isn't necessary, but there should definitely be 
a standard way for folks to end up with single-asm instructions for rotation 
without resorting to ugliness.  Consider PowerPC that has a whole collection of 
powerful rotation instructions.
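
For reference, the usual way to get a single rotate instruction today without a dedicated operator is the shift-or idiom, which optimizing backends typically recognize (a minimal sketch):

// Rotate x left by n bits; the (32 - n) & 31 keeps the shift count legal
// when n is 0, and optimizers usually lower the whole thing to one rol/rotlw.
uint rotateLeft(uint x, uint n)
{
    n &= 31;
    return (x << n) | (x >> ((32 - n) & 31));
}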




Re: yank '>>>'?

2009-12-06 Thread Jerry Quinn
Walter Bright Wrote:

 dsimcha wrote:
  == Quote from KennyTM~ (kenn...@gmail.com)'s article
  No, it will _silently_ break code that uses >>> as unsigned right shift.
  
  Well, we could get around this by making >>> an error for a few releases, 
  and then only after everyone's removed their >>>s that mean unsigned shift, 
  we could drop in the rotate semantics.
 
 It'll still silently break code moving from D1 to D2.

Well, I could see the value of providing a rotate operator.

Since >>> is tainted, what about @ and @ for integral rotation?

Jerry



Re: Array, AA Implementations

2009-10-19 Thread Jerry Quinn
dsimcha Wrote:

 If anyone can think of any more, please let me know.  Also, just thought of 
 this
 now:  I wonder if it would make sense to use some polymorphism tricks (AA
 operations are slow enough that an extra pointer dereference or virtual 
 function
 call isn't going to make or break their performance) to allow implementations 
 to
 be automatically and dynamically changed depending on some of the attributes
 listed above.

This is not necessarily true.  If you are hammering a map of int to int, 
function call overhead will matter a great deal and you would like the compiler 
to be able to inline it.  Size will also matter for huge AA tables.

On the other hand, perhaps you're willing to say that built-in AA's will not be 
suitable for every case.

 For example, maybe binary trees are fastest for small N.  (Hypothetical, 
 haven't
 tested it.)  Obviously for sufficiently large N hash tables will be fastest.  
 We
 could create an abstract AA type that automatically switches from binary 
 trees to
 hash tables when it makes sense, make this the default builtin AA, and expose 
 both
 of the underlying implementations in std.containers for when the user wants 
 more
 control over the details rather than DWIM.


One thing I think is important is to codify the interface to the AA 
implementation, so that files built with different compilers could be linked 
together.

I'd also like to see a way  to allow the user to implement a custom AA for some 
uses and still be able take advantage of the nice syntactic sugar the language 
provides.



Follow-on question about delegates

2009-06-09 Thread Jerry Quinn
OK, having thought a bit more about delegates, I now have another question.

The ABI shows a delegate as consisting of a context ptr and a function ptr.  
The context ptr can be a class reference, ptr to struct, ptr to closure or ptr 
to stack frame.

If you're passing one of these delegates into another function, how does the 
receiving function figure out what kind of context it's looking at?  Each of 
these things will look different in memory.  Also, structs at least don't have 
a header that can be used to disambiguate the situation.

Thanks,
Jerry



question about foreach, opApply, and delegates

2009-06-08 Thread Jerry Quinn
Hi, all.  I find myself a little confused about how foreach, opApply, and 
delegates interact according to the docs.

Foreach on an aggregate will use the opApply call (assuming ranges aren't being 
used).  So if we use a simple example below, what exactly is the delegate that 
is passed to opApply?  The docs say a delegate is a pairing of an object 
reference and a function, where the object is passed as the 'this' parameter to 
the function.  But that doesn't seem to be the case here.

Is a virtual object with a function encompassing the body of the foreach being 
silently created and passed in?

Thanks,
Jerry


import std.stdio;

class C {
  uint[] a;
  int opApply(int delegate(ref uint) dg) {
    int result = 0;
    for (size_t i = 0; i < a.length; i++) {
      result = dg(a[i]);
      if (result) break;
    }
    return result;
  }
}
void foo() {
  C c = new C;
  foreach (uint v; c) { writeln(v); }
}
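
Roughly, the compiler rewrites that foreach into a call like the following (a sketch 
reusing the class C above; the real lowering differs only in bookkeeping details):

void fooLowered()
{
    C c = new C;
    c.opApply((ref uint v) {
        writeln(v);      // the foreach body
        return 0;        // 0 = keep iterating; a non-zero value signals break and is propagated
    });
}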



Re: question about foreach, opApply, and delegates

2009-06-08 Thread Jerry Quinn
downs Wrote:

 Jerry Quinn wrote:
  Hi, all.  I find myself a little confused about how foreach, opApply, and 
  delegates interact according to the docs.
  
  Foreach on an aggregate will use the opApply call (assuming ranges aren't 
  being used).  So if we use a simple example below, what exactly is the 
  delegate that is passed to opApply? 
 
  The docs say a delegate is a pairing of an object reference and a function, 
  where the object is passed as the 'this' parameter to the function.  But 
  that doesn't seem to be the case here.
 
 If the docs say that, they're wrong.
 
 Generally speaking, a delegate is a pairing of a function pointer and a 
 context, where the context can be a struct pointer, a class reference *or a 
 stackframe*, as is the case with opApply.
 
 Hope that clears things up.

Yes, that helps.  All 3 replies are basically the same, and the docs are 
clearly insufficient to describe what's actually happening in a delegate.

I'll file a bug.

Thanks!




Re: const?? When and why? This is ugly!

2009-03-07 Thread jerry quinn
Walter Bright Wrote:

 When we first got into what to do with strings and 
 const/immutable/mutable, I was definitely in the camp that strings 
 should be mutable char[], or at worst const(char)[]. The thing is, 
 Andrei pointed out to me, languages that are considered very good at 
 dealing with strings (like Perl) use immutable strings. The fascinating 
 thing about strings in such languages is:
 
 Nobody notices they are immutable, they just work.
 
 So what is it about immutability that makes strings just work in a 
 natural and intuitive manner? The insight is that it enables strings, 
 which are reference types, to behave exactly as if they were value types.
 
 After all, it never occurs to anyone to think that the integer 123 could 
 be a mutable integer and perhaps be 133 the next time you look at it. 
 If you put 123 into a variable, it stays 123. It's immutable. People 
 intuitively expect strings to behave the same way. Only C programmers 
 expect that once they assign a string to a variable, that string may 
 change in place.
 
 C has it backwards by making strings mutable, and it's one of the main 
 reasons why dealing with strings in C is such a gigantic pain. But as a 
 longtime C programmer, I was so used to that I didn't notice what a pain 
 it was until I started using other languages where string manipulation 
 was a breeze.
 
 The way to do strings in D is to have them be immutable. If you are 
 building a string by manipulating its parts, start with mutable, when 
 finished then convert it to immutable and 'publish' it to the rest of 
 the program. Mutable char[] arrays should only exist as temporaries. 
 This is exactly the opposite of the way one does it in C, but if you do 
 it this way, you'll find you never need to defensively dup the string 
 just in case and things just seem to naturally work out.

So your suggestion is to do something like:

string manipulate() {
  char[] buf = read_20M_string_data();
  initialization_mangling(buf);
  return cast(string) buf;
}
string string20M = manipulate();

As long as we don't want to mangle the storage later in the program.
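
For reference, Phobos also wraps that final publish step: std.exception.assumeUnique 
performs the cast to immutable in one documented place. A sketch reusing the 
hypothetical helpers above:

import std.exception : assumeUnique;

string manipulate()
{
    char[] buf = read_20M_string_data();   // hypothetical helper from the snippet above
    initialization_mangling(buf);          // hypothetical helper from the snippet above
    return assumeUnique(buf);              // clears buf and publishes the data as immutable, no copy
}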



Re: const?? When and why? This is ugly!

2009-03-06 Thread jerry quinn
Andrei Alexandrescu Wrote:

 Steven Schveighoffer wrote:
  I think what Burton is saying is by annointing immutable(char)[] as the 
  type string, you are essentially sending a message to developers that 
  all strings should be immutable, and all *string parameters* should be 
  declared immutable.  What this does is force developers who want to deal 
  in const or mutable chars have to do lots of duplication or casting, 
  which either makes your code dog slow, or makes your code break const.
  
  Evidence is how (at least in previous releases) anything in Phobos that 
  took an argument that was a utf8 string of characters used the parameter 
  type string, making it very difficult to use when you don't have string 
  types.  If you want to find a substring in a string, it makes no sense 
  that you first have to make the argument invariant.  a substring function 
  isn't saving a pointer to that data.
  
  I think the complaint is simply that string is defined as immutable(char)
  [] and therefore is promoted as *the only* string type to use.  Using 
  other forms (such as in char[] or const(char)[] or char[]) doesn't look 
  like the argument is a string, when the word string is already taken to 
  mean something else.
  
  -Steve
 
 I see. Phobos is being changed to accept in char[] instead of string
 wherever applicable. As far as what the default string ought to be,
 immutable(char)[] is the safest of the three so I think it should be that.

So far this discussion has no examples.  Let me toss one out as a dart board to 
see what folks think:

char[] s = get_some_data();  // accesses a large buffer that I don't want to copy
size_t pos = my_find(s, "blah");
s[pos] = 'c';

string s2 = "another string";
size_t pos2 = my_find(s2, "blah");
another_func(s2[pos2 .. pos2+4]);

size_t my_find(string haystack, string needle)
{
  // Perform some kind of search that is read-only
}

void another_func(string s) {}

---

It seems like string is the obvious parameter type to use for my_find, but it 
doesn't work because we're passing mutable data in.  Instead we have to use 
const(char)[], which is less intuitive and uglier.

Or did I miss the thrust of the argument?
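
As the earlier reply notes, Phobos moved toward in char[] parameters for this case; a 
sketch of a signature that binds to mutable, const, and immutable arguments alike 
(std.string.indexOf stands in for the read-only search):

size_t my_find(const(char)[] haystack, const(char)[] needle)
{
    import std.string : indexOf;
    auto i = indexOf(haystack, needle);
    return i < 0 ? haystack.length : cast(size_t) i;   // length as a simple not-found sentinel
}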




Re: problem with declaration grammar?

2009-02-20 Thread Jerry Quinn
Ellery Newcomer Wrote:

 jerry quinn wrote:
  Ellery Newcomer Wrote:
  Maybe I'm missing something.  The grammar shown in 
  http://www.digitalmars.com/d/2.0/declaration.html has the following rules:
 
  BasicType2:
  *
  [ ]
  [ Expression ]
  [ Expression .. Expression ]
  [ Type ]
  delegate Parameters FunctionAttributesopt
  function Parameters FunctionAttributesopt
 
  Declarator:
  BasicType2 Declarator DeclaratorSuffixesopt
  BasicType2 Identifier DeclaratorSuffixesopt
 
  With this definition, I don't see how you can get Declarator -> Identifier.
 
  Jerry
 
  You are correct. BasicType2 can match nothing. It should also be able to 
  match what it does above multiple times.
  
  As I'm looking at this further, there seems to be more issues.  In 
  particular, I don't think the grammar can parse:
  
  int (*x)(char);
  
  as specified.  Doing so gives (my best attempt)
  
  Decl -> BasicType Declarators ;
  BasicType -> int
  Declarators -> DeclaratorInitializer
  DeclaratorInitializer -> Declarator
  Declarator -> BasicType2 Identifier DeclaratorSuffixes
  BasicType2 -> NULL (assuming that the grammar should be revised like this)
  Identifier -> BAD PARSE
  
  
 
 yeah, if you haven't figured out by now, the grammar is a bunch of hooey.
 
 I spent like a month building an ANTLR grammar based on the above, and 
 then realized much of it was garbage.
 
 Then I spent two months going through the source code and rewriting most 
 of the rules. Just got done with it a week or two ago :) That was all 
 version 1, but it looks the same, so if memory serves the above rules 
 should look something like this:
 
 BasicType2_x:
   *
   [ ]
   [ Expression ]
   [ Expression .. Expression ]
   [ Type ]
   delegate Parameters FunctionAttributesopt
   function Parameters FunctionAttributesopt
 BasicType2:
   BasicType2_x
   BasicType2 BasicType2_x
   epsilon
 Declarator:
   BasicType2 Identifier DeclaratorSuffixesopt
   BasicType2 ( Declarator ) DeclaratorSuffixesopt
 
 Apologies for any BNF misuse

Cool.  Do you feel like posting the whole thing somewhere?  

As an aside comment, it might be better from a grammar perspective to make 
usage of BasicType2 optional, rather than have the epsilon in the BasicType2 
rule itself.  Then every rule would consume at least 1 token, and _opt is the 
only expansion shortcut needed.

In the form shown, you can simplify the BasicType2 rule to

BasicType2:
  BasicType2 BasicType2_x
  epsilon
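
Concretely, with BasicType2 made optional at its use sites, the declarator rules from 
earlier in the thread would read something like this (a sketch of the suggested 
revision, using the spec's _opt convention):

BasicType2:
  BasicType2_x
  BasicType2 BasicType2_x

Declarator:
  BasicType2opt Identifier DeclaratorSuffixesopt
  BasicType2opt ( Declarator ) DeclaratorSuffixesopt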



Re: problem with declaration grammar?

2009-02-19 Thread jerry quinn
Sergey Gromov Wrote:

 Thu, 19 Feb 2009 01:30:36 -0500, jerry quinn wrote:
 
  Christopher Wright Wrote:
  
  jerry quinn wrote:
  Hi there,
  
  I'm not sure if I'm missing something, but I'm having trouble seeing that 
  a simple declaration will parse correctly with the D grammar.
  
  If we take a declaration statment like:
  
  int x = 3;
  
  we have (my best guess):
  
   DeclarationStatement -> Declaration
   Declaration -> Decl
   Decl -> BasicType Declarators ;
   BasicType -> int
   Declarators -> DeclaratorInitializer
   DeclaratorInitializer -> Declarator = Initializer
   Declarator -> BasicType2 Identifier
   BasicType2 ->
  
  I'm thinking that BasicType2 is optional here, rather than required as 
  the grammar shows.  Is that correct?
  
  Thanks
  Jerry
  
   . Declaration -> Decl
   . Decl -> BasicType Declarators
   . BasicType -> int
   . Declarators -> DeclaratorInitializer
   . DeclaratorInitializer -> Declarator = Initializer
   We agree up to here.
   
   . Declarator -> Identifier
   Here, you don't need BasicType2, and if you use it, you recurse, so 
   using the rule Declarator -> BasicType2 Declarator here is useless.
  
  What you describe sounds like what I'd expect.
  
  Maybe I'm missing something.  The grammar shown in 
  http://www.digitalmars.com/d/2.0/declaration.html has the following rules:
  
  BasicType2:
  *
  [ ]
  [ Expression ]
  [ Expression .. Expression ]
  [ Type ]
  delegate Parameters FunctionAttributesopt
  function Parameters FunctionAttributesopt
  
  Declarator:
  BasicType2 Declarator DeclaratorSuffixesopt
  BasicType2 Identifier DeclaratorSuffixesopt
  
  With this definition, I don't see how you can get Declarator -> Identifier.
  
  Jerry
 
 The grammar works the other way around:
 
 int x = 3 ;
 
 int -> BasicType(int)
 // this is either Decl or Type, need more tokens, expect Declarators,
 // Declarator, or Declarator2
 ->
 x -> Identifier(x)
 // either DeclaratorInitializer (Declarators), Declarator,
 // IdentifierList (not expecting), StructMemberInitializer (not
 // expecting), or PrimaryExpression (not expecting)
 // therefore expecting '=' or DeclaratorSuffixes
 ->
 = -> = // token
 // Identifier(x) = - definitely DeclaratorInitializer, expecting
 // Initializer, that is , either void, AssignExpression,
 // ArrayInitializer, or StructInitializer

This is incorrect.  We have

 BasicType(int) Identifier(x) '= '

You're suggesting the Identifier begins DeclaratorInitializer, but it must 
start with a Declarator.  We don't have one, because Declarator must start with 
BasicType2.  This is where I think the bug in the grammar is.  If BasicType2 
were optional, then the parse would complete as you showed.

 Finita la comedia.

Unfortunately, not yet fini




Re: problem with declaration grammar?

2009-02-19 Thread jerry quinn
Ellery Newcomer Wrote:
  Maybe I'm missing something.  The grammar shown in 
  http://www.digitalmars.com/d/2.0/declaration.html has the following rules:
  
  BasicType2:
  *
  [ ]
  [ Expression ]
  [ Expression .. Expression ]
  [ Type ]
  delegate Parameters FunctionAttributesopt
  function Parameters FunctionAttributesopt
  
  Declarator:
  BasicType2 Declarator DeclaratorSuffixesopt
  BasicType2 Identifier DeclaratorSuffixesopt
  
  With this definition, I don't see how you can get Declarator -> Identifier.
  
  Jerry
  
 
 You are correct. BasicType2 can match nothing. It should also be able to 
 match what it does above multiple times.

As I'm looking at this further, there seems to be more issues.  In particular, 
I don't think the grammar can parse:

int (*x)(char);

as specified.  Doing so gives (my best attempt)

Decl -> BasicType Declarators ;
BasicType -> int
Declarators -> DeclaratorInitializer
DeclaratorInitializer -> Declarator
Declarator -> BasicType2 Identifier DeclaratorSuffixes
BasicType2 -> NULL (assuming that the grammar should be revised like this)
Identifier -> BAD PARSE




problem with declaration grammar?

2009-02-18 Thread jerry quinn
Hi there,

I'm not sure if I'm missing something, but I'm having trouble seeing that a 
simple declaration will parse correctly with the D grammar.

If we take a declaration statment like:

int x = 3;

we have (my best guess):

DeclarationStatement -> Declaration
Declaration -> Decl
Decl -> BasicType Declarators ;
BasicType -> int
Declarators -> DeclaratorInitializer
DeclaratorInitializer -> Declarator = Initializer
Declarator -> BasicType2 Identifier
BasicType2 ->

I'm thinking that BasicType2 is optional here, rather than required as the 
grammar shows.  Is that correct?

Thanks
Jerry




Re: problem with declaration grammar?

2009-02-18 Thread jerry quinn
Christopher Wright Wrote:

 jerry quinn wrote:
  Hi there,
  
  I'm not sure if I'm missing something, but I'm having trouble seeing that a 
  simple declaration will parse correctly with the D grammar.
  
  If we take a declaration statment like:
  
  int x = 3;
  
  we have (my best guess):
  
  DeclarationStatement -> Declaration
  Declaration -> Decl
  Decl -> BasicType Declarators ;
  BasicType -> int
  Declarators -> DeclaratorInitializer
  DeclaratorInitializer -> Declarator = Initializer
  Declarator -> BasicType2 Identifier
  BasicType2 ->
  
  I'm thinking that BasicType2 is optional here, rather than required as the 
  grammar shows.  Is that correct?
  
  Thanks
  Jerry
 
 . Declaration -> Decl
 . Decl -> BasicType Declarators
 . BasicType -> int
 . Declarators -> DeclaratorInitializer
 . DeclaratorInitializer -> Declarator = Initializer
 We agree up to here.
 
 . Declarator -> Identifier
 Here, you don't need BasicType2, and if you use it, you recurse, so 
 using the rule Declarator -> BasicType2 Declarator here is useless.

What you describe sounds like what I'd expect.

Maybe I'm missing something.  The grammar shown in 
http://www.digitalmars.com/d/2.0/declaration.html has the following rules:

BasicType2:
*
[ ]
[ Expression ]
[ Expression .. Expression ]
[ Type ]
delegate Parameters FunctionAttributesopt
function Parameters FunctionAttributesopt

Declarator:
BasicType2 Declarator DeclaratorSuffixesopt
BasicType2 Identifier DeclaratorSuffixesopt

With this definition, I don't see how you can get Declarator -> Identifier.

Jerry



Re: ref?

2009-02-15 Thread Jerry Quinn
dsimcha Wrote:

 == Quote from Andrei Alexandrescu (seewebsiteforem...@erdani.org)'s article
  Walter and I have been discussing what the regime of statically-sized
  arrays should be. In short, the behavior that's most consistent with
  everything else is to treat them as values.
  This also brings the problems of e.g. containers - should they have
  consistent value semantics (like in STL)
 
 Oh God no!  I always thought the value semantics of STL containers were 
 largely a
 kludge to work around the fact that C++ doesn't have garbage collection.  They
 make it easier to use RAII for memory management, since every object has a 
 clear
 owner.  This leads to tons and tons of copying if code is written the obvious 
 way,
 and lots of kludges to prevent it.

On the other hand, for relatively small objects, the last thing I want is to 
have everything be a reference.  This is one of the major flaws of Java, I 
think, from a memory efficiency standpoint.  I'd hate to have to have 
references to 16 byte objects if I don't need the sharing behavior.

Having classes with reference semantics in D largely solves this issue, doesn't 
it?  Use a class when you have things that make sense to share, and use structs 
for when it makes more sense to have distinct copies.
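
A minimal illustration of that split (types made up):

struct Point { int x, y; }             // value type: assignment copies
class Node { Point p; Node next; }     // reference type: assignment shares

void demo()
{
    auto a = Point(1, 2);
    auto b = a;          // independent copy
    b.x = 99;
    assert(a.x == 1);

    auto n1 = new Node;
    auto n2 = n1;        // both refer to the same object
    n2.p.x = 99;
    assert(n1.p.x == 99);
}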

Jerry



Re: ref?

2009-02-15 Thread Jerry Quinn
Nick Sabalausky Wrote:

 Jerry Quinn jlqu...@optonline.net wrote in message 
 news:gn9i7n$2uj...@digitalmars.com...
 
  Use a class when you have things that make sense to share, and use structs 
  for when it makes more sense to have distinct copies.
 
 
 Couldn't that rule conflict with cases where you'd want distinct copies but 
 need features of classes that aren't available for structs (like 
 inheritance)? 

However, with reference semantics, you have no way to achieve objects laid out 
in a contiguous array, unless I'm missing something.

If you want inheritance, copy semantics are an issue.  For example, if you have 
struct A : B, and an array of B, you can't put an A in it, since there's not 
enough space for an A (unless A adds 0 storage).  If you have an array of A, 
inheritance isn't really buying you anything over having a B as the first 
member of A.  Any place where you'd want to use A as a B, you can get to the 
member B struct directly.
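
When B's interface is still wanted on A without class references, composition plus 
alias this covers most of it (a sketch with made-up members):

struct B { int common; void frob() { } }

struct A
{
    B base;           // B laid out inline as the first member
    alias base this;  // forwards B's members and allows implicit conversion to B
    int extra;
}

void demo()
{
    A[] arr = new A[4];   // contiguous storage, no per-element indirection
    arr[0].frob();        // forwarded to arr[0].base.frob()
    B b = arr[0];         // "use A as a B": copies out the embedded B part
}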





Re: random cover of a range

2009-02-14 Thread Jerry Quinn
Andrei Alexandrescu Wrote:

 Bill Baxter wrote:
  On Sat, Feb 14, 2009 at 1:03 PM, Andrei Alexandrescu 
  seewebsiteforem...@erdani.org wrote:
  
  bearophile wrote:
  
  Andrei Alexandrescu:
  
  Say at some point there are k available (not taken) slots out of
  n. There is a k/n chance that a random selection finds an
  unoccupied slot. The average number of random trials needed
  to find
  an unoccupied slot is proportional to 1/(k/n) = n/k. So the
  total
  number of random trials to span the entire array is quadratic.
  Multiplying that by 0.9 leaves it quadratic.
  
  
  It's like in hashing: if you want to fill 99% of the available space
  in a hash, then you take ages to find empty slots. But if you
  fill it
  only at 75-90%, then on average you need only one or two tries to
  find an empty slot. So your time is linear, with a small
  multiplicative constant. When the slots start to get mostly
  full, you
  change algorithm, copying the empty slots elsewhere.
  
  
  Well I don't buy it. If you make a point, you need to be more
  precise than such hand-waving. It's not like in hashing. It's like
  in the algorithm we discuss. If you make a clear point that your
  performance is better than O(n*n) by stopping at 90% then make it. I
  didn't go through much formalism, but my napkin says you're firmly
  in quadratic territory.
  
   
  Well he has a point that the number of trials required to find an empty 
  depends not on the absolute number of empty items, but only the ratio of 
  empties to fulls.   Even your own claim about average number of trials 
  was n/k -- not sure how you got that though.
 
 If you toss a N-side dice hoping for a specific face to show up (and 
 stopping afterwards), how many times do you have to toss it on average? 
 I recall (without being sure) that you need to toss it a number of times 
 proportional to N. Could anyone confirm or deny?

avg_trials(N) = sum(t = 1 -> inf) { t * 1/N * ((N-1)/N)^(t-1) }

where t is the number of trials.  

Experimentally, it converges to N
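
A quick empirical check (a sketch; the face count and run count are arbitrary):

import std.random, std.stdio;

void main()
{
    enum N = 6;
    enum runs = 1_000_000;
    ulong total;
    foreach (run; 0 .. runs)
    {
        uint tosses;
        do { ++tosses; } while (uniform(0, N) != 0);   // toss until the chosen face (0) shows up
        total += tosses;
    }
    writefln("average trials: %.3f (expected ~%s)", cast(double) total / runs, N);
}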





Re: Optimizing Immutable and Purity

2008-12-22 Thread Jerry Quinn
Walter Bright Wrote:

 I've been working on improving the optimizer to take advantage of 
 immutability and purity.
 
 http://www.reddit.com/r/reddit.com/comments/7l5x4/optimizing_immutable_and_purity/
 
 http://dobbscodetalk.com/index.php?option=com_myblogshow=Optimizing-Immutable-and-Purity.htmlItemid=29

This was an interesting read.  It would be nice to see a discussion of how 
const is going to fit in in terms of optimization potential, since you can 
always cast it away.

Jerry



Re: Initializing const member post-construction?

2008-10-28 Thread Jerry Quinn
Steven Schveighoffer Wrote:

 Jerry Quinn wrote
  Hi.  I'm trying to port a C++ program to D as an exercise in exploring D. 
  As I'm doing this, I've run into a bit of confusion with the const system.
 
  I have something like
 
  class A {}
  class B {
   const A a;
   void init(A aa) { a = aa; }
  }
 
  This doesn't work, because dmd (2.020) complains that you can't initialize 
  a const member after the constructor.  The catch is that the value of aa 
  is not available at construction time, but only later on.  However, I'd 
  still like to declare that once set, the object referred to by a is const.
 
  The C++ code used a pointer, but it seemed to me like D's references were 
  more capable than C++'s, so I'm trying to use them.
 
  To me it seems like this should still be allowed.  Even though the object 
  referred to by a is const, the reference itself shouldn't need to be. 
  This seems morally equivalent to:
 
  const(A)* a;
 
  which is allowed by dmd.  In both cases I'm trying to tell the compiler 
  that the object referred to by a is const.
 
  Is there a way to do what I'm trying to do?
 
 You can hide a behind a property:
 
 class B {
   private A aPriv;
   void init(A aa)
   {
     aPriv = aa;
   }
 
   const(A) a() const { return aPriv; }
 }
 
 Now, do not use aPriv anywhere else in your code, and you should be all set. 
 Use -inline when compiling and you should see no performance penalty.

Yes, that seems to be a reasonable workaround.  Thanks.

  What's the reason for not allowing this?
 
 I was unaware you could even set a in the constructor.  I don't think 
 there's any general 'set once' type modifier.

No, I didn't see one either.  In fact what I was asking for is not a set once.  
It's to allow a reference to a const object to be reassigned, since it wasn't 
obvious to me that the reference itself should be kept const.
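
For what it's worth, std.typecons.Rebindable in today's Phobos targets exactly this 
case: a reassignable reference to a class object whose contents stay const through it. 
A sketch:

import std.typecons : Rebindable;

class A { int x; }

class B
{
    Rebindable!(const A) a;          // the reference itself can be reseated
    void init(A aa) { a = aa; }      // ...while the A it refers to is const when accessed via a
}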

Jerry



Initializing const member post-construction?

2008-10-27 Thread Jerry Quinn
Hi.  I'm trying to port a C++ program to D as an exercise in exploring D.  As 
I'm doing this, I've run into a bit of confusion with the const system.

I have something like

class A {}
class B {
  const A a;
  void init(A aa) { a = aa; }
}

This doesn't work, because dmd (2.020) complains that you can't initialize a 
const member after the constructor.  The catch is that the value of aa is not 
available at construction time, but only later on.  However, I'd still like to 
declare that once set, the object referred to by a is const.

The C++ code used a pointer, but it seemed to me like D's references were more 
capable than C++'s, so I'm trying to use them.

To me it seems like this should still be allowed.  Even though the object 
referred to by a is const, the reference itself shouldn't need to be.  This 
seems morally equivalent to:

const(A)* a;

which is allowed by dmd.  In both cases I'm trying to tell the compiler that 
the object referred to by a is const.

Is there a way to do what I'm trying to do?  What's the reason for not allowing 
this?

Thanks,
Jerry