Location of popFront (was: randomSample)
Andrei, I noticed in random.d's uniform template that popFront is called in different places for integral vs. floating-point types: for integral types you read .front first and call .popFront afterwards, but for floating-point types you call .popFront first and then read .front. This has a peculiar effect: if you call uniform(0.0, 100.0) followed by uniform(0, 100), there's a big chance that the integral part of the first random number equals the second random number.

import std.stdio, std.random;

void main() {
    writeln(uniform(0.0, 100.0));
    writeln(uniform(0, 100));
}

I don't think this warrants a bug report, but I do think the location of .popFront should be standardized, either before or after any .front. Just sayin'. L.
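The overlap can be reproduced with a plain generator, independent of std.random's internals. This is a hypothetical sketch of the two conventions (simplified; not the actual uniform implementation):

```d
import std.random;

// Two hypothetical helpers mimicking the conventions described above:
// the "integral" style reads front then advances; the "floating" style
// advances then reads front.
uint intStyle(ref Mt19937 g)   { auto v = g.front; g.popFront(); return v; }
uint floatStyle(ref Mt19937 g) { g.popFront(); return g.front; }

void main() {
    auto g = Mt19937(42);
    // floatStyle leaves `front` at the value it just consumed, so a
    // following intStyle call re-reads that very same raw value:
    auto a = floatStyle(g);
    auto b = intStyle(g);
    assert(a == b); // the two "independent" draws share one engine value
}
```

This is why mixing the two conventions correlates consecutive calls; picking a single order of .popFront relative to .front for all types removes the overlap.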
Re: Can you find out where the code goes wrong?
"davidl" wrote in message news:op.uugvg5ahj5j...@my-tomato...
[snip]
> The culprit is the on-stack array. Should the compiler warn on slicing a fixed-length array? Or even give an error? I find this use case can easily go wrong! You may even think this code is correct at the very first glance.

Definitely a bug. You should file it in bugzilla. When returning the original stack-allocated array, the compiler correctly complains:

test.d(25): Error: escaping reference to local v

but as soon as you slice it, even as "v[]", it is no longer detected. Good catch! L.
Can you find out where the code goes wrong?
import std.stdio;

string func() {
    string s = "abc";
    return s;
}

void func1() {
    writefln("func1");
    string v = func();
    writefln("call func");
    writefln(func2());
}

byte[] func2() {
    writefln("hello!");
    byte[16] v = [65,65,65,65, 65,65,65,65, 65,65,65,65, 65,65,65,65];
    writefln(v[0..16]);
    return v[0..16];
}

void main(string[] args) {
    func1();
}

The culprit is the on-stack array. Should the compiler warn on slicing a fixed-length array? Or even give an error? I find this use case can easily go wrong! You may even think this code is correct at the very first glance.
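For reference, one safe variant of func2 (a sketch): copying the fixed-length array to the GC heap with .dup means no reference to a local ever escapes:

```d
import std.stdio;

// Safe variant: .dup copies the stack array's contents to the GC heap,
// so the returned slice doesn't reference the local fixed-length array.
byte[] func2() {
    byte[16] v = 65; // initializes all 16 elements to 65 ('A')
    return v[].dup;  // heap copy; safe to return
}

void main() {
    auto r = func2();
    writeln(r.length);
    assert(r[0] == 65);
}
```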
Re: how to use GC as a leak detector? i.e. get some help info from GC?
> As other asked, are you using D1 Tango/Phobos? D2?

DMD v2.030 on Linux.

> In Tango/D2 you can enable logging in the GC (using the LOGGING version identifier).

How to do it in D2?

> Is your program source available? I'm gathering programs to make a D GC benchmark suite.

Sorry, no.
Re: how to use GC as a leak detector? i.e. get some help info from GC?
== Quote from Brad Roberts (bra...@puremagic.com)'s article
> After enabling the gc, did you force a collection? Just enabling it won't cause one to occur.

Yes, I called:

core.memory.GC.enable();
core.memory.GC.collect();
core.memory.GC.disable();
Re: how to use GC as a leak detector? i.e. get some help info from GC?
> > I suspected the GC is buggy when mixed with manual deletes.
> I personally have not experienced this. Please be more specific:
> D1 or D2?

D2.

> If D1, Phobos or Tango?
> DMD, LDC, or GDC?
> Compiler version?

DMD v2.030.

> Also, please file a bug report, especially if you can create a concise, reproducible test case.

It's hard to isolate the code, and since the program is non-trivial I'm not 100% sure; it could be my own bug.
Re: how to use GC as a leak detector? i.e. get some help info from GC?
nobody, on May 24 at 20:03, you wrote:
> == Quote from Jason House (jason.james.ho...@gmail.com)'s article
> > Why not use valgrind? With the GC disabled, it should give accurate results.
>
> Strange enough, indeed I have tried valgrind with the GC disabled version. It didn't report anything useful. That's why I'm puzzled, does D's GC do something special?
>
> The GC disabled version ran out of 3G of memory; but the GC enabled version stays at ~800M throughout the run.

I guess that with such an amount of memory in use, your program could greatly benefit from NO_SCAN if your 800M of data are plain old data. Did you try it? And if you never have interior pointers into that data, your program can possibly avoid a lot of false positives due to the conservatism if you use NO_INTERIOR (this is only available if you patch the GC with David Simcha's patch[1]).

[1] http://d.puremagic.com/issues/show_bug.cgi?id=2927

--
Leandro Lucarella (luca) | Blog colectivo: http://www.mazziblog.com.ar/blog/
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05)
This is what you get, when you mess with us.
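For reference, marking a plain-old-data block NO_SCAN in D2 looks roughly like this (a sketch; GC.BlkAttr.NO_SCAN tells the collector not to scan the block for pointers):

```d
import core.memory;

void main() {
    // Allocate a raw 1 MiB buffer that the collector will never scan for
    // pointers. Assumption: the buffer holds plain old data only, with no
    // references into GC-managed memory.
    enum size = 1024 * 1024;
    void* p = GC.malloc(size, GC.BlkAttr.NO_SCAN);
    // ... fill and use the buffer ...
    GC.free(p);
}
```

Note that typed allocations of pointer-free element types (e.g. new ubyte[](n)) normally get NO_SCAN automatically; the explicit attribute matters for untyped or mixed allocations.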
Re: how to use GC as a leak detector? i.e. get some help info from GC?
nobody, on May 24 at 19:05, you wrote:
> Hi,
>
> I'm writing a data processing program in D, which deals with large amounts of small objects. One of the things I found is that D's GC is horribly slow in such a situation. I tried my program with the GC enabled & disabled (with some manual deletes). The GC disabled version (2 min) is ~100 times faster than the GC enabled version (4 hours)!
>
> But of course the GC disabled version still leaks memory; it soon exceeds the machine's memory limit when I try to process more data, while the GC enabled version doesn't have such a problem.
>
> So my plan is to use the GC disabled version with manual deletes. But it was very hard to find all the memory leaks. I'm wondering: is there any way to use the GC as a leak detector? Can the GC enabled version give me some help information on which objects get collected, so I can manually delete them in my GC disabled version? Thanks!

As others asked, are you using D1 Tango/Phobos? D2? In Tango/D2 you can enable logging in the GC (using the LOGGING version identifier).

Is your program source available? I'm gathering programs to make a D GC benchmark suite, and your program seems like a good candidate for measuring GC performance. Thank you.

--
Leandro Lucarella (luca) | Blog colectivo: http://www.mazziblog.com.ar/blog/
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05)
How important, then, in these days of globalization, to scrub our souls, to mop our hearts, to reach a true state of peperianal bliss. -- Peperino Pómoro
Re: how to use GC as a leak detector? i.e. get some help info from GC?
nobody wrote: >> One thing you could try is disabling the GC (this really just disables >> automatic >> running of the collector) and run it manually at points that you know make >> sense. >> For example, you could just insert a GC.collect() statement at the end of >> every >> run of your main loop. >> Another thing to try is avoiding appending to arrays. If you know the >> length in >> advance, you can get pretty good speedups by pre-allocating the array >> instead of >> appending using the ~= operator. >> You can safely delete specific objects manually even when the GC is enabled. >> For >> very large objects with trivial lifetimes, this is probably worth doing. >> First of >> all, the GC will run less frequently. Secondly, D's GC is partially >> conservative, >> meaning that occasionally memory will not be freed when it should be. The >> probability of this happening is proportional to the size of the memory >> block. > > I have tried all these: with GC enabled only periodically runs in the main > loop, > however the memory still grows faster than I expected when I feed more data > into > the program. Then I manually delete some specific objects. However the program > start to fail randomly. > > Has anyone experienced similar issues: i.e. with GC on, you defined you own > dtor > for certain class, and called delete manually on certain objects. > > The program fails at random stages, with some stack trace showing some GC > calls like: > > 0x0821977a in _D2gc3gcx3Gcx16fullcollectshellMFZk () > > I suspected the GC is buggy when mixed with manual deletes. After enabling the gc, did you force a collection? Just enabling it won't cause one to occur. Later, Brad
Re: how to use GC as a leak detector? i.e. get some help info from GC?
== Quote from nobody (n...@where.com)'s article > > One thing you could try is disabling the GC (this really just disables > > automatic > > running of the collector) and run it manually at points that you know make > > sense. > > For example, you could just insert a GC.collect() statement at the end of > > every > > run of your main loop. > > Another thing to try is avoiding appending to arrays. If you know the > > length in > > advance, you can get pretty good speedups by pre-allocating the array > > instead of > > appending using the ~= operator. > > You can safely delete specific objects manually even when the GC is > > enabled. For > > very large objects with trivial lifetimes, this is probably worth doing. > > First of > > all, the GC will run less frequently. Secondly, D's GC is partially > > conservative, > > meaning that occasionally memory will not be freed when it should be. The > > probability of this happening is proportional to the size of the memory > > block. > I have tried all these: with GC enabled only periodically runs in the main > loop, > however the memory still grows faster than I expected when I feed more data > into > the program. Then I manually delete some specific objects. However the program > start to fail randomly. > Has anyone experienced similar issues: i.e. with GC on, you defined you own > dtor > for certain class, and called delete manually on certain objects. > The program fails at random stages, with some stack trace showing some GC > calls like: > 0x0821977a in _D2gc3gcx3Gcx16fullcollectshellMFZk () > I suspected the GC is buggy when mixed with manual deletes. I personally have not experienced this. Please be more specific: D1 or D2? If D1, Phobos or Tango? DMD, LDC, or GDC? Compiler version? Also, please file a bug report, especially if you can create a concise, reproducible test case.
Re: how to use GC as a leak detector? i.e. get some help info from GC?
> One thing you could try is disabling the GC (this really just disables automatic running of the collector) and run it manually at points that you know make sense. For example, you could just insert a GC.collect() statement at the end of every run of your main loop.
> Another thing to try is avoiding appending to arrays. If you know the length in advance, you can get pretty good speedups by pre-allocating the array instead of appending using the ~= operator.
> You can safely delete specific objects manually even when the GC is enabled. For very large objects with trivial lifetimes, this is probably worth doing. First of all, the GC will run less frequently. Secondly, D's GC is partially conservative, meaning that occasionally memory will not be freed when it should be. The probability of this happening is proportional to the size of the memory block.

I have tried all of these: with the GC enabled and only run periodically in the main loop, the memory still grows faster than I expected when I feed more data into the program. Then I manually deleted some specific objects, but the program started to fail randomly.

Has anyone experienced similar issues, i.e. with the GC on, you defined your own dtor for a certain class and called delete manually on certain objects?

The program fails at random stages, with a stack trace showing GC calls like:

0x0821977a in _D2gc3gcx3Gcx16fullcollectshellMFZk ()

I suspect the GC is buggy when mixed with manual deletes.
Re: XML API
Michel Fortin wrote:
> On 2009-05-24 12:51:43 -0400, Daniel Keep said:
(Cutting us mostly going back and forth on what a callback API would look like.)
>> ...
>>
>> Like I said, this seems like a lot of work to bolt a callback interface onto something a pull API is designed for.
>>
>> ...
>>
>> Except of course that you now can't easily control the loop, nor can you do fall-through on the cases.
>
> Again, my definition of a callback API doesn't include an implicit loop, just a callback. And I intend the callback to be a template argument so it can be dispatched using function overloading and/or function templates. So you'll have this instead:
>
> bool more = true;
> do
>     more = pp.readNext!(callback)();
> while (more);
>
> void callback(OpenElementToken t) { blah(t.name); }
> void callback(CloseElementToken t) { ... }
> void callback(CharacterDataToken t) { ... }
> ...
>
> No switch statement and no inversion of control.

Except that you can't define overloads of a function inside a function. Which means you have to stuff all of your code into a set of increasingly obtusely named globals or private members. Like elemAStart, elemAData, elemAAttr, elemAClose, elemBStart, elemBData, elemBAttr, ...

One problem I see here is that you're going to spaghettify the code and state. For example, let's say I'm writing code to handle a particular element. I can't put the code and state for this into a single function; I have to break it out over several, one function for each event. This means I need to have all state variables visible from each function, so I have to start shoving the state into the owning object instead of onto the stack. Whoops, I can't recurse now, can I? Sucks if I'm using any sort of hierarchical structure. I can't use the call stack, so I have to invent my own. I don't want to make every state variable a stack, so I put each component of the parser into a separate object which I can instantiate and kick off.
And at that point, I've just reinvented SAX. Well, almost: I have control over the loop. I still can't simply break out of it; I've got to mess around with flags to get that done. Meanwhile, if I write that code with a PullParser, it's just a collection of normal functions, one per element type, with all the related code together in one place. Or, if I don't want them all bundled together, I can dispatch to smaller functions. I have a feeling you're going to head down this path regardless, so I'll just hope you can figure out a way to make the API not suck.
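The point about keeping per-element state on the call stack can be sketched with a toy pull parser (all names here are hypothetical, not from any real XML library):

```d
// Toy pull parser over a pre-made token list, purely for illustration.
enum Kind { start, end, text }
struct Tok { Kind kind; string name; }

struct PullParser {
    Tok[] toks;
    Tok cur;
    bool next() {
        if (toks.length == 0) return false;
        cur = toks[0];
        toks = toks[1 .. $];
        return true;
    }
}

// Per-element state (`children`) is an ordinary local; nesting is handled
// by ordinary recursion, and `return` exits the loop directly. Neither is
// possible when each event arrives in a separate callback.
int countChildren(ref PullParser pp) {
    int children = 0;
    while (pp.next()) {
        if (pp.cur.kind == Kind.start) { children++; countChildren(pp); }
        else if (pp.cur.kind == Kind.end) return children;
    }
    return children;
}

void main() {
    auto pp = PullParser([Tok(Kind.start, "a"), Tok(Kind.start, "b"),
                          Tok(Kind.end, "b"), Tok(Kind.end, "a")]);
    pp.next(); // consume <a>
    assert(countChildren(pp) == 1); // <a> has exactly one child, <b>
}
```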
Re: how to use GC as a leak detector? i.e. get some help info from GC?
== Quote from nobody (n...@where.com)'s article > Hi, > I'm writing a data processing program in D, which deals with large amounts of > small objects. One of the thing I found is that D's GC is horribly slow in > such situation. I tried my program with gc enable & disabled (with some manual > deletes). The GC disabled version (2 min) is ~100 times faster than the GC > enabled version (4 hours)! > But of course the GC disabled version still leak memory, it soon exceeds the > machine memory limit when I try to process more data; while the GC enabled > version don't have such problem. > So my plan is to use the GC disabled version with manual deletes. But it was > very hard to find all the memory leaks. I'm wondering: is there anyway to use > GC as a leak detector? can the GC enabled version give me some help > information on which objects get collected, so I can manually delete them in > my GC disabled version? Thanks! I've dealt with a bunch of somewhat similar situations in code I've written, here are some tips that others have not already mentioned, and that might be less drastic than going with fully manual memory management: One thing you could try is disabling the GC (this really just disables automatic running of the collector) and run it manually at points that you know make sense. For example, you could just insert a GC.collect() statement at the end of every run of your main loop. Another thing to try is avoiding appending to arrays. If you know the length in advance, you can get pretty good speedups by pre-allocating the array instead of appending using the ~= operator. You can safely delete specific objects manually even when the GC is enabled. For very large objects with trivial lifetimes, this is probably worth doing. First of all, the GC will run less frequently. Secondly, D's GC is partially conservative, meaning that occasionally memory will not be freed when it should be. 
The probability of this happening is proportional to the size of the memory block. Lastly, I've been working on a generic second stack/mark-release allocator for D2, called TempAlloc. It's useful for when you need to temporarily allocate memory in a last in, first out order, but you can't use the call stack for whatever reason. I've also implemented a few basic data structures (hash tables and hash sets) that are specifically designed for this allocator. Right now, it's coevolving with my dstats statistics lib, but if you want to try it or at least look at it and give me some feedback, I'd like to eventually get it to the point where it can be added to Phobos and/or Tango. See http://svn.dsource.org/projects/dstats/docs/alloc.html .
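Putting the first two tips together (manual collection points plus pre-allocation), a minimal D2 sketch, with illustrative names:

```d
import core.memory;

void processChunk(int[] scratch) {
    foreach (i, ref x; scratch)
        x = cast(int) i; // stand-in for real per-iteration work
}

void main() {
    GC.disable(); // the collector no longer runs on its own
    auto scratch = new int[](1000); // pre-allocated once; no ~= reallocations
    foreach (iter; 0 .. 100) {
        processChunk(scratch);
        GC.collect(); // collect at a point we choose, e.g. end of main loop
    }
    GC.enable();
}
```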
Re: !in operator?
Jason House wrote:
> Method 1:
>
> if (x !in y)
>     foo();
> else {
>     auto z = x in y;
>     bar(z);
> }
>
> Method 2:
>
> auto z = x in y;
> if (z is null) foo();
> else bar(z);
>
> Method 1 essentially calls in twice while method 2 calls in once.

But there's no requirement to look it up after finding out whether it's there or not. And how's it any different from

if (x in y) {
    auto z = x in y;
    bar(z);
} else {
    foo();
}

or even

if (x in y) {
    bar(y[x]);
} else {
    foo();
}

? Besides, why would any decent compiler not optimise it to a single lookup?

Stewart.
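For completeness, the single-lookup pattern with the pointer returned by in, as a small self-contained example (this assumes a D2 compiler; at the time of the thread !in itself was the feature under discussion, so absence is tested via is null):

```d
import std.stdio;

void main() {
    int[string] counts = ["a": 1];

    // `in` on an associative array yields a pointer to the value,
    // or null when the key is absent: presence test and lookup in one.
    if (auto p = "a" in counts)
        writeln(*p); // prints 1

    // Absence test without a second lookup:
    if (("b" in counts) is null)
        writeln("missing");
}
```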
Re: Problem with .deb packages
Bruno Deligny Wrote:
> Jesse Phillips wrote:
> > On Sat, 02 May 2009 14:57:43 +0200, Bruno Deligny wrote:
> >
> >> When I try to install dmd1 or dmd2 on my Ubuntu i386 with the deb packages on http://www.digitalmars.com/d/download.html, it says "Error: incorrect Architecture « amd64 »"
> >>
> >> The packages were built for the amd64 architecture.
> >
> > I don't know how the packages were built for amd64; there are only i386 packages. You have to provide dpkg the --force-architecture switch:
> > dpkg --force-architecture -i ...deb
>
> The packages are still broken. I don't know who did it, but we can't leave that on the website.
>
> It's hard to persuade people to use D if the packages are broken and there isn't a Windows installer. I think a lot of people don't even try after seeing that.

Would this be worthy of a bugzilla report? I encountered this too when I tried to install DMD using the .deb packages.
Re: how to use GC as a leak detector? i.e. get some help info from GC?
nobody wrote:
> == Quote from Jason House (jason.james.ho...@gmail.com)'s article
> > Why not use valgrind? With the GC disabled, it should give accurate results.
>
> Strange enough, indeed I have tried valgrind with the GC disabled version. It didn't report anything useful. That's why I'm puzzled, does D's GC do something special?

The GC allocates memory directly from the OS; it doesn't use malloc/free and friends. It does this even when the GC is "disabled", which just means that collections won't happen. (Disabling the GC doesn't change the method of allocation.) Valgrind probably doesn't detect those OS calls (and almost certainly doesn't know about the GC calls).

If you're using Tango, you can link in the 'stub' GC instead of the normal ('basic') one. The stub GC doesn't actually collect; it passes calls on to malloc/calloc/realloc/free instead. That should make Valgrind work. (Something similar probably applies if you're using D2 with druntime.)
Re: how to use GC as a leak detector? i.e. get some help info from GC?
== Quote from Unknown W. Brackets (unkn...@simplemachines.org)'s article
> Theoretically, you could recompile the GC to write to a log file any time it frees anything.

Is it possible to recompile Phobos to let the GC write to a log whenever it frees? I guess I also need the type info of the object being freed.
Re: how to use GC as a leak detector? i.e. get some help info from GC?
"nobody" wrote in message news:gvc5q7$2bc...@digitalmars.com...
> Hi,
>
> I'm writing a data processing program in D, which deals with large amounts of small objects. One of the thing I found is that D's GC is horribly slow in such situation. I tried my program with gc enable & disabled (with some manual deletes). The GC disabled version (2 min) is ~100 times faster than the GC enabled version (4 hours)!
>
> But of course the GC disabled version still leak memory, it soon exceeds the machine memory limit when I try to process more data; while the GC enabled version don't have such problem.
>
> So my plan is to use the GC disabled version with manual deletes. But it was very hard to find all the memory leaks. I'm wondering: is there anyway to use GC as a leak detector? can the GC enabled version give me some help information on which objects get collected, so I can manually delete them in my GC disabled version? Thanks!

Depending on how exactly your program works, another common thing that might help is to manually manage free pools. I.e., allocate a bunch up front, and instead of letting one get GCed when you're done with it, hold on to it, note that it's available for reuse, and then reuse it instead of allocating a new one. Or allocate one big chunk of memory and stick your small objects in there. They typically do this sort of thing for particle systems.
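A free pool of the kind described can be sketched like this (purely illustrative, not from any library):

```d
// Recycle objects through an intrusive free list instead of letting the
// GC reclaim them; allocation only happens when the pool is empty.
class Particle {
    float x, y;
    Particle next; // free-list link, reused while the object is pooled
}

struct ParticlePool {
    Particle freeList;

    Particle acquire() {
        if (freeList is null)
            return new Particle; // pool empty: allocate for real
        auto p = freeList;
        freeList = p.next;
        return p;
    }

    void release(Particle p) {
        p.next = freeList; // push back for reuse; no garbage created
        freeList = p;
    }
}

void main() {
    ParticlePool pool;
    auto a = pool.acquire();
    pool.release(a);
    auto b = pool.acquire();
    assert(a is b); // the released object was reused, not reallocated
}
```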
Re: how to use GC as a leak detector? i.e. get some help info from GC?
== Quote from Jason House (jason.james.ho...@gmail.com)'s article
> Why not use valgrind? With the GC disabled, it should give accurate results.

Strangely enough, I have indeed tried valgrind with the GC disabled version. It didn't report anything useful. That's why I'm puzzled: does D's GC do something special?

The GC disabled version runs out of 3G of memory, but the GC enabled version stays at ~800M throughout the run.
Re: !in operator?
Jason House:
> Method 1 essentially calls in twice while method 2 calls in once.

Sometimes I just want to know if something isn't present. Having !in doesn't prevent me from writing and using x = y in aa; when I want it.

> PS: Please don't assume that I'm advocating not having a !in operator. I'm just pointing out possible reasons it may have been avoided.

I think that's not a possible reason :-)

Bye,
bearophile
Re: how to use GC as a leak detector? i.e. get some help info from GC?
Theoretically, you could recompile the GC to write to a log file any time it frees anything.

For data processing, though, you really want to try to have a fixed memory buffer. You've got to be hurting from the allocations and frees, which you should get rid of if at all possible.

Also, if you're allocating buffers of memory (e.g. for the data), you can tell the GC not to scan them. This will probably solve the problem of the GC being so slow.

-[Unknown]

nobody wrote:
> Hi,
>
> I'm writing a data processing program in D, which deals with large amounts of small objects. One of the thing I found is that D's GC is horribly slow in such situation. I tried my program with gc enable & disabled (with some manual deletes). The GC disabled version (2 min) is ~100 times faster than the GC enabled version (4 hours)!
>
> But of course the GC disabled version still leak memory, it soon exceeds the machine memory limit when I try to process more data; while the GC enabled version don't have such problem.
>
> So my plan is to use the GC disabled version with manual deletes. But it was very hard to find all the memory leaks. I'm wondering: is there anyway to use GC as a leak detector? can the GC enabled version give me some help information on which objects get collected, so I can manually delete them in my GC disabled version? Thanks!
Re: OT: on IDEs and code writing on steroids
Hello Yigal,

> C# assemblies are analogous to C/C++/D libs. You can't create a standalone executable in D just by parsing the D source files (for all the imports) if you need to link in external libs. You need to at least specify the lib name if it's on the linker's search path, or provide the full path otherwise.

pragma(lib, ...); //?

> Same thing with assemblies. You have to provide that meta-data (lib names) anyway, both in C# and D. The only difference is that C# (correctly) recognizes that this is the better default.

IMHO the C# way is the /worse/ default. Based on that being my opinion, I think we have found where we will have to disagree. Part of my reasoning is that in the normal case, for practical reasons, that file will have to be maintained by an IDE, thus /requiring/ development to be in an IDE of some kind. In D, that data can normally be part of the source code, and only in unusual cases does it need to be formally codified.
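The pragma(lib, ...) idea above looks roughly like this in practice (a sketch; the library name is a placeholder and must exist at link time, so this fragment is illustrative rather than buildable as-is):

```d
// The module records its own link-time dependency in the source itself,
// so no external project/metadata file is needed.
// "mylib" is a placeholder name, not a real library.
pragma(lib, "mylib");

import std.stdio;

void main() {
    writeln("symbols from mylib could be used here");
}
```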
Re: Problem with .deb packages
Jason House wrote:
> grauzone Wrote:
> > Daniel Keep wrote:
> > > grauzone wrote:
> > > > ...
> > > >
> > > > Now the irony is, that Wlater wouldn't even allow Debian to redistribute a properly packaged dmd... (if Debian wanted to)
> > >
> > > Speak ye of the evil Wizard Wlater, previous servant of the dark empire of Sym'n'tek? :3
> >
> > Oops.
> >
> > > As for the distribution problem, I think it's because Walter *can't* allow it to be freely redistributed.
> >
> > Why not? It can't be for license reasons?
>
> Sadly, that's exactly why. The backend is under restrictions Walter can't control. For a sillier example, there's a disclaimer that the code is not intended to work after 1999.

Is that really so? I would have guessed that this restriction only applies to redistributing the backend source. I mean, when dmd still came without the backend source, it was shipped without the backend license.
Re: how to use GC as a leak detector? i.e. get some help info from GC?
nobody Wrote: > Hi, > > I'm writing a data processing program in D, which deals with large amounts of > small objects. One of the thing I found is that D's GC is horribly slow in > such situation. I tried my program with gc enable & disabled (with some manual > deletes). The GC disabled version (2 min) is ~100 times faster than the GC > enabled version (4 hours)! > > But of course the GC disabled version still leak memory, it soon exceeds the > machine memory limit when I try to process more data; while the GC enabled > version don't have such problem. > > So my plan is to use the GC disabled version with manual deletes. But it was > very hard to find all the memory leaks. I'm wondering: is there anyway to use > GC as a leak detector? can the GC enabled version give me some help > information on which objects get collected, so I can manually delete them in > my GC disabled version? Thanks! > > Why not use valgrind? With the GC disabled, it should give accurate results.
Re: Problem with .deb packages
grauzone Wrote: > Daniel Keep wrote: > > > > grauzone wrote: > >> ... > >> > >> Now the irony is, that Wlater wouldn't even allow Debian to redistribute > >> a properly packaged dmd... (if Debian wanted to) > > > > Speak ye of the evil Wizard Wlater, previous servant of the dark empire > > of Sym'n'tek? :3 > > Oops. > > > As for the distribution problem, I think it's because Walter *can't* > > allow it to be freely redistributed. > > Why not? It can't be for license reasons? Sadly, that's exactly why. The backend is under restrictions Walter can't control. For a sillier example, there's a disclaimer that the code is not intended to work after 1999.
Re: OT: on IDEs and code writing on steroids
BCS wrote:
> Hello Christopher,
>
> > BCS wrote:
> > > But that's not the point. Neither make nor VS's equivalent is what this thread was about. At least not where I was involved. My point is that the design of C# *requires* the maintenance (almost certainly by a C#-specific IDE) of some kind of external metadata file that contains information that can't be derived from the source code itself, whereas with D, no such metadata is *needed*. If you wanted, you could build a tool to take D source code and generate a makefile or a bash build script from it.
> >
> > If you wanted, you could create a tool to do the same with C# source code, assuming there exists a directory containing all and only those source files that should end up in the resulting assembly.
>
> I'm /not/ willing to assume that (because all too often it's not true), and you also need the list of other assemblies that should be included.

C# assemblies are analogous to C/C++/D libs. You can't create a standalone executable in D just by parsing the D source files (for all the imports) if you need to link in external libs. You need to at least specify the lib name if it's on the linker's search path, or provide the full path otherwise. Same thing with assemblies. You have to provide that meta-data (lib names) anyway, both in C# and D. The only difference is that C# (correctly) recognizes that this is the better default.
Re: !in operator?
Stewart Gordon Wrote:
> Jason House wrote:
>
> > That is unfortunately a rather sticky point. The in operator does not return bool. I think the lack of !in is to encourage writing of efficient code. I'm not really sure though.
>
> How, exactly, does not having !in make code efficient?
>
> Stewart.

Consider the following code snippets:

Method 1:

if (x !in y)
    foo();
else {
    auto z = x in y;
    bar(z);
}

Method 2:

auto z = x in y;
if (z is null) foo();
else bar(z);

Method 1 essentially calls in twice while method 2 calls in once.

PS: Please don't assume that I'm advocating not having a !in operator. I'm just pointing out possible reasons it may have been avoided.
how to use GC as a leak detector? i.e. get some help info from GC?
Hi,

I'm writing a data processing program in D which deals with large amounts of small objects. One of the things I found is that D's GC is horribly slow in such a situation. I tried my program with the GC enabled & disabled (with some manual deletes). The GC disabled version (2 min) is ~100 times faster than the GC enabled version (4 hours)!

But of course the GC disabled version still leaks memory; it soon exceeds the machine's memory limit when I try to process more data, while the GC enabled version doesn't have such a problem.

So my plan is to use the GC disabled version with manual deletes. But it was very hard to find all the memory leaks. I'm wondering: is there any way to use the GC as a leak detector? Can the GC enabled version give me some help information on which objects get collected, so I can manually delete them in my GC disabled version? Thanks!
Re: XML API
On 2009-05-24 14:13:31 -0400, Michel Fortin said:

> > The reason is that if your callback api only does a single callback, all you've really done is move the switch statement inside the function call at the cost of having to define a crapload of functions outside of it.
>
> The thing is that inside the parser code there is already a separate code path for dealing with each type of token. Various callbacks can be called from these separate code paths. When you return after parsing one token, the code path isn't differentiated anymore, so you need to add an extra switch statement that wouldn't be there with a callback called from the right code path.

I suddenly noticed that I misunderstood what you meant in the paragraph above, so I don't expect my answer above to fit your question. Nevertheless, I suppose the examples at the end of my previous post will clarify things: basically, the callback isn't a function pointer, it's an alias template argument which can dispatch to overloaded functions or template functions, so you don't need a switch statement. Sorry for any confusion.

--
Michel Fortin
michel.for...@michelf.com
http://michelf.com/
XML API
On 2009-05-24 12:51:43 -0400, Daniel Keep said:

> Michel Fortin wrote:
>> ...
>>
>> A callback API isn't necessarily SAX. A callback API doesn't
>> necessarily have to parse everything until completion, it could parse
>> only the next token and call the appropriate callback.
>
> When I talk "callback api," I mean something fundamentally like SAX.

SAX is definitely a popular callback API for XML, but to me a callback API just implies that some callback gets called.

> The reason is that if your callback api only does a single callback,
> all you've really done is move the switch statement inside the
> function call at the cost of having to define a crapload of functions
> outside of it.

The thing is that inside the parser code there is already a separate code path for dealing with each type of token. Various callbacks can be called from these separate code paths. When you return after parsing one token, the code path isn't different anymore, so you need to add an extra switch statement that wouldn't be there with a callback called from the right code path.

>> If I can construct a range class/struct over my callback API I'll be
>> happy. And if I can recursively call the parser API inside a callback
>> handler so I can reuse the call stack while parsing then I'll be very
>> happy.
>
> I don't see how constructing a range over a callback api will work.
> Callback apis are inversion of control, ranges aren't.

Your definition of a callback API is about inversion of control. My definition is just that it parses one token and calls a function for that token. If you read what I wrote using your definition, it obviously can't work indeed.

> ...
>
> Like I said, this seems like a lot of work to bolt a callback
> interface onto something a pull api is designed for. At best, you'll
> end up rewriting this:
>
> foreach( tt ; pp )
> {
>     switch( tt )
>     {
>         case XmlTokenType.StartElement: blah(pp.name); break;
>         ...
>     }
> }
>
> to this:
>
> pp.parse
> (
>     XmlToken(Type.StartElement, {blah(pp.name);}),
>     ...
> );
>
> Except of course that you now can't easily control the loop, nor can
> you do fall-through on the cases.

Again, my definition of a callback API doesn't include an implicit loop, just a callback. And I intend the callback to be a template argument so it can be dispatched using function overloading and/or function templates. So you'll have this instead:

bool more = true;
do more = pp.readNext!(callback)();
while (more);

void callback(OpenElementToken t) { blah(t.name); }
void callback(CloseElementToken t) { ... }
void callback(CharacterDataToken t) { ... }
...

No switch statement and no inversion of control. And here's my current prototype for a range:

alias Algebraic!(CharDataToken, CommentToken, PIToken,
    CDataSectionToken, AttrToken, XMLDeclToken, OpenElementToken,
    CloseElementToken, EmptyElementToken) XMLToken;

struct XMLForwardRange(Parser)
{
    bool empty;
    XMLToken front;
    Parser parser;

    this(Parser parser)
    {
        this.parser = parser;
        popFront(); // parse first token
    }

    void popFront()
    {
        empty = !parser.readNext!(callback)();
    }

    private void callback(T)(T token)
    {
        front = token;
    }
}

Constructing a pull parser using the same pattern should be pretty easy if you wanted to.

-- 
Michel Fortin
michel.for...@michelf.com
http://michelf.com/
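[Editor's sketch] For readers unfamiliar with the overload-dispatch trick being discussed, here is a minimal, self-contained illustration. The token and handler names are hypothetical (they mirror the names used in the post, not any actual parser), and the "parser" is faked with two hard-coded tokens; the point is only that an alias template argument lets overload resolution replace the switch statement:

```d
import std.stdio;

// Hypothetical token types standing in for a real parser's tokens.
struct OpenElementToken  { string name; }
struct CloseElementToken { string name; }

// A stand-in "parser": it hands each token to the aliased callback.
// Overload resolution picks the matching handler at compile time,
// so no runtime switch on a token-type flag is needed.
void feed(alias callback)()
{
    callback(OpenElementToken("root"));
    callback(CloseElementToken("root"));
}

void handler(OpenElementToken t)  { writeln("open: ",  t.name); }
void handler(CloseElementToken t) { writeln("close: ", t.name); }

void main()
{
    // The overloaded symbol is passed as an alias template argument.
    feed!handler();
}
```

The same shape scales to one handler overload per token type, which is exactly what the proposed XMLForwardRange exploits with its private templated callback.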
Re: Problem with .deb packages
grauzone wrote:
> Daniel Keep wrote:
>>
>> grauzone wrote:
>>> ...
>>>
>>> Now the irony is, that Wlater wouldn't even allow Debian to
>>> redistribute a properly packaged dmd... (if Debian wanted to)
>>
>> Speak ye of the evil Wizard Wlater, previous servant of the dark
>> empire of Sym'n'tek? :3
>
> Oops.
>
>> As for the distribution problem, I think it's because Walter *can't*
>> allow it to be freely redistributed.
>
> Why not? It can't be for license reasons?

I don't think Walter has complete ownership over all of the code.
Re: Problem with .deb packages
Daniel Keep wrote:
> grauzone wrote:
>> ...
>>
>> Now the irony is, that Wlater wouldn't even allow Debian to
>> redistribute a properly packaged dmd... (if Debian wanted to)
>
> Speak ye of the evil Wizard Wlater, previous servant of the dark
> empire of Sym'n'tek? :3

Oops.

> As for the distribution problem, I think it's because Walter *can't*
> allow it to be freely redistributed.

Why not? It can't be for license reasons?
Re: !in operator?
Jason House wrote:
> That is unfortunately a rather sticky point. The in operator does not
> return bool. I think the lack of !in is to encourage writing of
> efficient code. I'm not really sure though.

How, exactly, does not having !in make code efficient?

Stewart.
Re: !in operator?
Jeremie Pelletier wrote:
> Why is there no !in operator just like there is a !is operator? Is it
> because this expression evaluates to a pointer to the found element?

Of course not. This compiles:

void main()
{
    char* abc;
    assert (!abc);
}

so why shouldn't !in?

Stewart.
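[Editor's sketch] As context for the exchange above: `in` on an associative array yields a pointer to the value (null when the key is absent), so absence is already testable today by negating the pointer — which is exactly the pattern a built-in `!in` would sugar over. A minimal example:

```d
void main()
{
    int[string] counts = ["apple": 3];

    // `in` yields a pointer to the stored value, or null if absent,
    // so a lookup and a use of the value need only one hash probe.
    if (auto p = "apple" in counts)
        assert(*p == 3);

    // Lacking a !in operator, absence is spelled by negating the
    // pointer result, just as `!abc` negates a raw pointer.
    assert(!("banana" in counts));
}
```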
Re: Problem with .deb packages
grauzone wrote:
> ...
>
> Now the irony is, that Wlater wouldn't even allow Debian to
> redistribute a properly packaged dmd... (if Debian wanted to)

Speak ye of the evil Wizard Wlater, previous servant of the dark empire of Sym'n'tek? :3

As for the distribution problem, I think it's because Walter *can't* allow it to be freely redistributed.
Re: Finalizing D2
Michel Fortin wrote:
> ...
>
> A callback API isn't necessarily SAX. A callback API doesn't
> necessarily have to parse everything until completion, it could parse
> only the next token and call the appropriate callback.

When I talk "callback api," I mean something fundamentally like SAX. The reason is that if your callback api only does a single callback, all you've really done is move the switch statement inside the function call at the cost of having to define a crapload of functions outside of it.

> If I can construct a range class/struct over my callback API I'll be
> happy. And if I can recursively call the parser API inside a callback
> handler so I can reuse the call stack while parsing then I'll be very
> happy.

I don't see how constructing a range over a callback api will work. Callback apis are inversion of control, ranges aren't.

As for using a callback api recursively, that just seems like a lot of work to replicate the way a pull api works in the first place.

>> Something like Tango's PullParser is the superior API because
>> although it's more verbose up-front, that's as bad as it gets. Plus,
>> you can actually do stuff like call subroutines.
>
> All that is needed really is a callback system that parses only one
> token. Then the callback can update the PullParser state, or the
> token-range state, run in a loop to produce a SAX-like API, or
> directly do what you want to do, which may include parsing more tokens
> using different callbacks until you reach a closing tag.

Like I said, this seems like a lot of work to bolt a callback interface onto something a pull api is designed for. At best, you'll end up rewriting this:

foreach( tt ; pp )
{
    switch( tt )
    {
        case XmlTokenType.StartElement: blah(pp.name); break;
        ...
    }
}

to this:

pp.parse
(
    XmlToken(Type.StartElement, {blah(pp.name);}),
    ...
);

Except of course that you now can't easily control the loop, nor can you do fall-through on the cases.
Re: OT: on IDEs and code writing on steroids
Yigal Chripun wrote:
>> ...
>
> this I completely disagree with. those are the same faulty reasons I
> already answered.
> an IDE does _not_ create bad programmers, and does _not_ encourage bad
> code. it does encourage descriptive names which is a _good_ thing.
>
> writing "strcpy" ala C style is cryptic and *wrong*. code is read
> hundred times more than it's written and a better name would be for
> instance - "stringCopy".
> it's common nowadays to have tera-byte sized HDD so why people try to
> save a few bytes from their source while sacrificing readability?
>
> ...

This is not what I was saying. I'm not talking about strcpy vs stringCopy. stringCopy is short. I'm talking about things like SetCompatibleTextRenderingDefault. And this example isn't even so bad. Fact is, it is easier to come up with long identifiers and there is no penalty in the form of typing cost for doing so. It's not about bad programmers (or saving bytes, that's just ridiculous), but an IDE does encourage some kinds of constructs because they are easier in that environment. Good programmers come up with good, descriptive names, whether they program in an IDE or not.

At work I must program in VB.NET. This language is pretty verbose in describing even the most common things. It's easier to parse when you're new to the language, but after a while I find all the verbosity gets in the way of readability.
Re: Problem with .deb packages
> The packages are still broken. I don't know who did it, but we can't
> let that stay on the website. It's hard to persuade people to use D if
> packages are broken and there isn't a Windows installer. I think a lot
> of people don't even try after seeing that.

You can bet on that. It makes you wonder how whoever assembled the package tested it. Did he just go with --force-all because he couldn't figure out various things about the package system? What the heck did he do? And why the hell is it not fixed yet? Providing broken packages is as nice to the user as providing virus-infected .exe files.

Now the irony is, that Wlater wouldn't even allow Debian to redistribute a properly packaged dmd... (if Debian wanted to)
Re: OT: on IDEs and code writing on steroids
Hello Christopher,

> BCS wrote:
>> But that's not the point. Neither make nor VS's equivalent is what
>> this thread was about. At least not where I was involved. My point is
>> that the design of c# *requires* the maintenance (almost certainly by
>> a c#-specific IDE) of some kind of external metadata file that
>> contains information that can't be derived from the source code
>> itself, whereas with D, no such metadata is *needed*. If you wanted,
>> you could build a tool to take D source code and generate a makefile
>> or a bash build script from it
>
> If you wanted, you could create a tool to do the same with C# source
> code, assuming there exists a directory containing all and only those
> source files that should end up in the resulting assembly.

I'm /not/ willing to assume that (because all too often it's not true) and you also need the list of other assemblies that should be included.
Re: Problem with .deb packages
Jesse Phillips wrote:
> On Sat, 02 May 2009 14:57:43 +0200, Bruno Deligny wrote:
>> When i try to install dmd1 or dmd2 on my ubuntu i386 with the deb
>> packages on http://www.digitalmars.com/d/download.html, it says
>> "Error : incorrect Architecture « amd64 »"
>
> The packages were built for the amd64 architecture.

I don't know how the packages were built for amd64; there are only i386 packages.

> You have to provide dpkg the --force-architecture switch.
>
> dpkg --force-architecture -i ...deb

The packages are still broken. I don't know who did it, but we can't let that stay on the website. It's hard to persuade people to use D if packages are broken and there isn't a Windows installer. I think a lot of people don't even try after seeing that.
Re: Finalizing D2
On 2009-05-24 03:22:47 -0400, Daniel Keep said:

> Callbacks are "easier" to set up, but are incredibly complicated for
> any sort of structured parsing. The problem is that you can't easily
> change the behaviour of the parser once it's started.
>
> I had to write a SAX parser for a structured data format a few years
> ago. I swear that 90% of the code (and it's a monstrously huge module)
> was just boilerplate to work around the bloody callback system. I've
> come to the conclusion that the SAX api is about the worst POSSIBLE
> way of parsing anything more complex than a flat file that shouldn't
> have been XML in the first place.

A callback API isn't necessarily SAX. A callback API doesn't necessarily have to parse everything until completion, it could parse only the next token and call the appropriate callback.

If I can construct a range class/struct over my callback API I'll be happy. And if I can recursively call the parser API inside a callback handler so I can reuse the call stack while parsing then I'll be very happy.

> Something like Tango's PullParser is the superior API because although
> it's more verbose up-front, that's as bad as it gets. Plus, you can
> actually do stuff like call subroutines.

All that is needed really is a callback system that parses only one token. Then the callback can update the PullParser state, or the token-range state, run in a loop to produce a SAX-like API, or directly do what you want to do, which may include parsing more tokens using different callbacks until you reach a closing tag.

-- 
Michel Fortin
michel.for...@michelf.com
http://michelf.com/
Re: OT: on IDEs and code writing on steroids
BCS wrote:
> Hello Yigal,
>
>> Georg Wrede wrote:
>>> Yigal Chripun wrote:
>>>> Make _is_ a build tool
>>>
>>> Yes. But since it's on every Unix since almost 40 years back, it
>>> doesn't count here. :-) Besides, it has tons of other uses, too. One
>>> might as well say that a text editor is a build tool. You construct
>>> (or erect) software with it. ;-)
>>
>> Nope. it does count as an external build tool
>
> OK and so can bash because it can run scripts.

No, the main purpose of make is to build software. You probably wouldn't think to use a makefile to automate converting flac files to ogg files, for instance. Or look at bashburn -- it has a user interface (albeit using text menus rather than graphics). You might be able to do that with a makefile, but it would be seriously awkward, and you'd mainly be using shell scripting. And bash does not have any special features to assist in building software.

> But that's not the point. Neither make nor VS's equivalent is what
> this thread was about. At least not where I was involved. My point is
> that the design of c# *requires* the maintenance (almost certainly by
> a c#-specific IDE) of some kind of external metadata file that
> contains information that can't be derived from the source code
> itself, whereas with D, no such metadata is *needed*. If you wanted,
> you could build a tool to take D source code and generate a makefile
> or a bash build script from it

If you wanted, you could create a tool to do the same with C# source code, assuming there exists a directory containing all and only those source files that should end up in the resulting assembly. If you follow C# best practices, this is what you will do -- and your directory structure will match your namespaces besides. But this is not enforced.
Re: Finalizing D2
Michel Fortin wrote:
> On 2009-05-23 01:25:49 -0400, Andrei Alexandrescu said:
>
>> * std.xml: replace with something that moves faster than molasses.
>
> I started to write an XML parser using D1 and a pseudo-range
> implementation a little while ago, but never finished it. (I was
> undecided about the API, and that somewhat killed my interest.)
>
> Perhaps I should finish it and contribute to Phobos.
>
> The irking thing about the API was that if I expose a range for
> parsing and returning tokens, I then need a switch statement to do the
> right thing about each kind of these tokens (like instantiating the
> proper node type) whereas with a callback API you don't need to bother
> saving and then switching on a flag value telling you which kind of
> node you've read (and callbacks can be aliases in templates). They are
> two different compromises between speed and flexibility and I guess
> both should be supported.

Callbacks are "easier" to set up, but are incredibly complicated for any sort of structured parsing. The problem is that you can't easily change the behaviour of the parser once it's started.

I had to write a SAX parser for a structured data format a few years ago. I swear that 90% of the code (and it's a monstrously huge module) was just boilerplate to work around the bloody callback system. I've come to the conclusion that the SAX api is about the worst POSSIBLE way of parsing anything more complex than a flat file that shouldn't have been XML in the first place.

Something like Tango's PullParser is the superior API because although it's more verbose up-front, that's as bad as it gets. Plus, you can actually do stuff like call subroutines.