Re: byChunk odd behavior?
Thanks for your help, everyone. I agree that the issue is due to misuse of an InputRange, but what are the semantics of 'take' when applied to an InputRange? It seems that calling it invalidates the range; in that case, what is the recommended way to get a few bytes and keep on advancing?

For instance, to read a ushort, I use range.read!(ushort)(). Unfortunately, it reads a single value. For now, I use a loop:

    foreach (i; 0..N) {
        buffer[i] = range.front;
        range.popFront();
    }

Is there a more idiomatic way to do the same thing? In Scala, 'take' consumes bytes from the iterator, so the same code would be

    buffer = range.take(N).toArray
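One possible shape for "take N and keep advancing", as a minimal sketch rather than an established Phobos idiom: a small helper that receives the range by ref, so the caller's range really moves forward. The name readN is hypothetical and does not exist in the standard library.

```d
import std.range.primitives : ElementType, empty, front, isInputRange, popFront;

/// Hypothetical helper (not part of Phobos): removes up to `n` elements
/// from `range` and returns them in a new array. `range` is taken by ref,
/// so the caller observes the consumed position afterwards.
ElementType!R[] readN(R)(ref R range, size_t n)
if (isInputRange!R)
{
    ElementType!R[] buffer;
    buffer.reserve(n);
    for (; n > 0 && !range.empty; --n)
    {
        buffer ~= range.front;   // copy the element out
        range.popFront();        // advance the caller's range
    }
    return buffer;
}
```

With the range from the original post this would be called as readN(input, 3), after which iterating over input should continue at the fourth element, because the range is never copied.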
Re: Using ffmpeg in command line with D
On Monday, 21 March 2016 at 17:26:09 UTC, Karabuta wrote:
> Will this work

Yes.

> and is it the right approach used by video convertor front-ends?

Well, yes, provisionally. When you invoke "ffmpeg" via spawnProcess, that isolates ffmpeg as its own process, obviously. From a security and maintenance standpoint, that is very, very good. None of the code in ffmpeg has to be considered when writing your own code, other than how it acts when you call it. If ffmpeg scrambles its own memory, your program won't get messed up. If your program scrambles its own memory, ffmpeg won't get corrupted, and neither will your video file.

There are a few downsides though. It's expensive to set up that very restricted, isolated interface (executing a process), but considering the amount of number crunching involved in processing videos it's a pretty negligible cost. If you're doing some sort of web server that serves up a million generated pages a minute though, all that executing can bog it down. But you wouldn't use ffmpeg for that.

The extreme isolation of a separate process means that you're restricted in what you can do with the video. You can do anything that ffmpeg devs write in their interface, but that's it. If they change the format of their command, all your stuff will break until you fix that, but considering how old ffmpeg is, that's probably not going to happen any time soon.

In some cases, there are resources that cannot be reused between two processes, that are very expensive to set up and tear down. You wouldn't use mpv like ffmpeg for instance, because it would have to recreate the video display window every execution. Instead, mpv has a "socket" interface that you can connect to after launching one process, and use that to control the player.

So, for video conversion, yes it's the right approach. Your mileage may vary if you want to display that video, or generate videos on-demand from a high performance webserver (in which case the video processing will still be 99.999% of what slows you down, not process execution).
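For illustration, a minimal sketch of the spawnProcess approach discussed above. It assumes an ffmpeg executable is on the PATH; the helper names ffmpegArgs and convert are hypothetical, and only ffmpeg's -i (input file) option is used.

```d
import std.process : spawnProcess, wait;

/// Builds the argument vector for a plain conversion.
/// "-i" is ffmpeg's input-file option; the output format is
/// inferred by ffmpeg from the output file's extension.
string[] ffmpegArgs(string input, string output)
{
    return ["ffmpeg", "-i", input, output];
}

/// Runs ffmpeg as a separate process and returns its exit status
/// (0 conventionally means success). spawnProcess throws a
/// ProcessException if the executable cannot be started.
int convert(string input, string output)
{
    auto pid = spawnProcess(ffmpegArgs(input, output));
    return wait(pid);   // block until ffmpeg finishes
}
```

Passing the arguments as an array, rather than one command string, avoids shell quoting problems with file names that contain spaces.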
Re: byChunk odd behavior?
On Tuesday, 22 March 2016 at 07:17:41 UTC, Hanh wrote:
> input.take(3).array;
> foreach (char c; input) {

Never use an input range twice. So, here's how to use it twice:

If it's a "forward range" you can use save() to get a copy to use later (but all the std.stdio.* ranges don't implement that). You can also use "std.range.tee" to send the results to an "output range" (something implementing put(K)(K)) while iterating over them.

tee can't produce two input ranges, because without caching all iterated items in memory, only one range can request items on-demand; the other must take them passively. You could write a thing that takes an InputRange and produces a ForwardRange, by caching those items in memory, but at that point you might as well use .array and get the whole thing.

ByChunk is an input range (not a forward range), so there's undefined behavior when you use it twice. No bugs there, since it wasn't meant to be reused anyway. What it does is cache the last seen chunk, first iterate over that, then read more chunks from the file. So every time you iterate, you'll get that same last chunk.

It's also tricky to use input ranges after mutating their underlying data structure. If you seek in the file, for instance, then a previously created ByChunk will produce the chunk it has cached, and only then start reading chunks from that exact position in the file. A range over some sort of list, if you delete the current item in the list, should the range produce the previous item? The next item? null?

So, as a general rule, never use input ranges twice, and never use them after mutating the underlying data structure. Just recreate them if you want to do something twice, or use tee as mentioned above.
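The input-range versus forward-range distinction can be checked at compile time. The sketch below is only an illustration: peek is a hypothetical helper (not a Phobos function) that uses save() to look ahead in a forward range without consuming it, and the static assert restates the claim above that byChunk's range offers no save().

```d
import std.array : array;
import std.range : iota, take;
import std.range.primitives : isForwardRange, isInputRange, save;
import std.stdio : File;

// The range returned by File.byChunk is an input range only.
alias Chunks = typeof(File.init.byChunk(1));
static assert(isInputRange!Chunks && !isForwardRange!Chunks);

/// Hypothetical helper: returns the first `n` elements of a forward
/// range without consuming it, by iterating over a saved copy.
auto peek(R)(ref R r, size_t n)
if (isForwardRange!R)
{
    return r.save.take(n).array;
}
```

Calling peek on a byChunk-based range is rejected by the template constraint, which is the compile-time version of "never use an input range twice".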
Re: byChunk odd behavior?
On 03/22/2016 12:17 AM, Hanh wrote:
> Hi all,
>
> I'm trying to process a rather large file as an InputRange and run into
> something strange with byChunk / take.
>
> void test() {
>     auto file = new File("test.txt");
>     auto input = file.byChunk(2).joiner;
>     input.take(3).array;
>     foreach (char c; input) {
>         writeln(c);
>     }
> }
>
> Let's say test.txt contains "123456".
>
> The output will be
> 3
> 4
> 5
> 6
>
> The "take" consumed one chunk from the file, but if I increase the chunk
> size to 4, then it won't.

I don't understand the issue fully, but byChunk() will treat every character in the file. So, even the newline character(s) are considered.

> Actually, what is the easiest way to read a large file as a stream? My
> file contains a bunch of serialized messages of variable length.

If it's a text file I think I would start with File.byLine (or byLineCopy). Then it depends on how the messages are laid out. One per line? Do you know the size at the start? etc.

Alternatively, use (or examine) one of the great D serialization modules out there. :) (We already need something like this in the standard library, which I think some people are already working on.)

Ali
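If the size is known at the start of each message, one option is to skip ranges entirely and read with File.rawRead. The sketch below assumes a purely hypothetical layout, since the thread never states the real one: each message is a 2-byte little-endian length followed by that many payload bytes. The function name readMessages is likewise made up for illustration.

```d
import std.bitmanip : littleEndianToNative;
import std.stdio : File;

/// Reads every message from `f`, ASSUMING a hypothetical layout of
/// [2-byte little-endian length][payload bytes] repeated until EOF.
/// Stops quietly at end of file or at a truncated message.
ubyte[][] readMessages(File f)
{
    ubyte[][] messages;
    ubyte[2] header;
    while (f.rawRead(header[]).length == header.length)
    {
        immutable len = littleEndianToNative!ushort(header);
        auto payload = new ubyte[len];
        // rawRead rejects empty buffers, so only read when there is a payload.
        if (len > 0 && f.rawRead(payload).length != len)
            break;               // truncated message: stop reading
        messages ~= payload;
    }
    return messages;
}
```

For a really large file, the same loop could hand each payload to a callback instead of collecting all of them in memory.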
Re: Something wrong with GC
On 20.03.2016 08:49, stunaep wrote:
> The gc throws invalid memory errors if I use Arrays from std.container.
> For example, this throws an InvalidMemoryOperationError:
>
> import std.stdio;
> import std.container;
>
> void main() {
>     new Test();
> }
>
> class Test {
>     private Array!string test = Array!string();
>
>     this() {
>         test.insert("test");
>         writeln(test[0]);
>     }
> }

I can reproduce the InvalidMemoryOperationError with git head dmd, but there doesn't seem to be a problem with 2.070. So I'd say this is a regression in the development version.

I've filed an issue: https://issues.dlang.org/show_bug.cgi?id=15821

You're probably building dmd/phobos from git, or you're using a nightly, right? Maybe you can go back to 2.070.2 until this is sorted out.
Re: Something wrong with GC
On Tuesday, 22 March 2016 at 13:46:41 UTC, stunaep wrote:
> public class Example2 {
>     private int one;
>     private int two;
>
>     public this(int one, int two) {
>         this.one = one;
>         this.two = two;
>     }
> }
>
> in a tree map and list of some sort. Neither of the above work whether
> they are classes or structs and it's starting to become quite bothersome...

Is there a particular reason why you don't want to use the standard ranges?

    public class Example2 {
        private int one;
        private int two;

        public this(int one, int two) {
            this.one = one;
            this.two = two;
        }
    }

    void main() {
        auto myExamplesList = [ new Example2(6, 3), new Example2(7, 5) ];

        // Note that if you do a lot of appending then using Appender
        // is more performant than ~=
        myExamplesList ~= new Example2(9, 1);
    }

For trees there is also redBlackTree.
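A short sketch of the two suggestions above, Appender for the list and redBlackTree for the tree. It uses a hypothetical struct Pair with public fields in place of Example2, so that the string predicate "a.one < b.one" can reach the field; the function names are made up for the example.

```d
import std.algorithm : map;
import std.array : appender, array;
import std.container.rbtree : redBlackTree;

/// Hypothetical stand-in for Example2, with public fields.
struct Pair
{
    int one;
    int two;
}

/// Builds a list with Appender instead of repeated ~=.
Pair[] buildList()
{
    auto app = appender!(Pair[])();
    app.put(Pair(6, 3));
    app.put(Pair(7, 5));
    app.put(Pair(9, 1));
    return app.data;
}

/// Builds a tree ordered by the `one` field.
auto buildTree()
{
    auto tree = redBlackTree!"a.one < b.one"(Pair(7, 5), Pair(6, 3));
    tree.insert(Pair(9, 1));
    return tree;
}

/// Iterating tree[] visits the elements in sorted order.
int[] keysInOrder(T)(T tree)
{
    return tree[].map!(p => p.one).array;
}
```

Both containers here live entirely in GC-managed memory, so they sidestep the Array-inside-a-class problem discussed in this thread.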
Re: byChunk odd behavior?
On Tuesday, 22 March 2016 at 07:17:41 UTC, Hanh wrote:
> Hi all,
>
> I'm trying to process a rather large file as an InputRange and run into
> something strange with byChunk / take.
>
> void test() {
>     auto file = new File("test.txt");
>     auto input = file.byChunk(2).joiner;
>     input.take(3).array;
>     foreach (char c; input) {
>         writeln(c);
>     }
> }
>
> Let's say test.txt contains "123456".
>
> The output will be
> 3
> 4
> 5
> 6
>
> The "take" consumed one chunk from the file, but if I increase the chunk
> size to 4, then it won't.
>
> It looks like if "take" spans two chunks, it affects the input range
> otherwise it doesn't.
>
> Actually, what is the easiest way to read a large file as a stream? My
> file contains a bunch of serialized messages of variable length.
>
> Thanks,
> --h

I don't know if this helps, but it looks like, since take(3) doesn't consume the whole chunk, that chunk is not removed from the range.

    import std.stdio;
    import std.algorithm;
    import std.range;

    void main() {
        auto file = stdin;
        auto input = file.byChunk(2).joiner;
        foreach (char c; input.take(3).array) {
            writeln(c);
        }
        foreach (char c; input) {
            writeln(c);
        }
    }

Produces:

    1
    2
    3    < Got data but didn't eat the chunk.
    3
    4
    5
    6
Re: Something wrong with GC
On Monday, 21 March 2016 at 07:55:39 UTC, thedeemon wrote:
> On Sunday, 20 March 2016 at 07:49:17 UTC, stunaep wrote:
>> The gc throws invalid memory errors if I use Arrays from std.container.
>
> Those arrays are for RAII-style deterministic memory release, they
> shouldn't be freely mixed with GC-allocated things. What happens here is
> while initializing Array sees it got some GC-ed value type (strings), so
> it tells GC to look after those strings. When your program ends runtime
> does a GC cycle, finds your Test object, calls its destructor that calls
> Array destructor that tries to tell GC not to look at its data anymore.
> But during a GC cycle it's currently illegal to call such GC methods, so
> it throws an error.
>
> Moral of this story: try not to store "managed" (collected by GC) types
> in Array and/or try not to have Arrays inside "managed" objects. If Test
> was a struct instead of a class, it would work fine.

So what am I to do? Any other language can do such a thing so trivially... I also run into the same problem with emsi_containers TreeMap. It is imperative that I can store data such as

    public class Example1 {
        private File file;

        public this(File f) {
            this.file = f;
        }
    }

or

    public class Example2 {
        private int one;
        private int two;

        public this(int one, int two) {
            this.one = one;
            this.two = two;
        }
    }

in a tree map and list of some sort. Neither of the above work whether they are classes or structs and it's starting to become quite bothersome...
Re: pass a struct by value/ref and size of the struct
On Tuesday, 22 March 2016 at 07:35:49 UTC, ZombineDev wrote: If the object is larger than the size of a register on the target machine, it is implicitly passed by ref (i.e. struct fields are accessed by offset from the stack pointer). (Oops, sorry ZombineDev, should've read your reply first)
Re: pass a struct by value/ref and size of the struct
On Monday, 21 March 2016 at 23:31:06 UTC, ref2401 wrote:
> I have got plenty of structs in my project. Their size varies from 12
> bytes to 128 bytes. Is there a rule of thumb that states which structs I
> pass by value and which I should pass by reference due to their size?

Note that the compiler may do things differently from what you may have expected. For example, for C code, the platform ABI may already dictate passing your structs by pointer, even though your code says "by value". See: https://msdn.microsoft.com/en-us/library/zthk2dkh.aspx

MSVC will pass structs that are larger than 64 bits (8 bytes) by reference in C++ code. Your D compiler may decide to do the same.
Re: Trying to use Dustmite on windows
On Tuesday, 22 March 2016 at 09:19:27 UTC, Vladimir Panteleev wrote:
> On Tuesday, 22 March 2016 at 09:11:52 UTC, Jerry wrote:
>> So I want to pass my DUB project to Dustmite and use findstr
>
> For reducing dub projects, try the "dub dustmite" command, e.g.
> "--compiler-regex=Assertion failure".

Thanks, that works nicely. But now my initial run fails, using

    dub dustmite ../testReduction --compiler-regex="Assertion failure"

However, when I navigate to the testReduction directory and run dub, I get the error message:

    Assertion failure: '0' on line 1942 in file 'glue.c'
Re: Trying to use Dustmite on windows
On Tuesday, 22 March 2016 at 09:11:52 UTC, Jerry wrote: So I want to pass my DUB project to Dustmite and use findstr For reducing dub projects, try the "dub dustmite" command, e.g. "--compiler-regex=Assertion failure".
Trying to use Dustmite on windows
I am really not used to bash scripts. I am trying to use Dustmite on my project since I have started getting an "Assertion failure: '0' in glue.c on line 1492" and really cannot find any issue about it in the issue tracker.

So I want to pass my DUB project to Dustmite and use the findstr command to check the result. What I came up with was this:

    dustmite source "dub run | findstr /b /C:\"Assertion failure\""

But findstr is failing with the error message: "Can not open failure"

/Jerry
Re: pass a struct by value/ref and size of the struct
On Monday, 21 March 2016 at 23:31:06 UTC, ref2401 wrote:
> I have got plenty of structs in my project. Their size varies from 12
> bytes to 128 bytes. Is there a rule of thumb that states which structs I
> pass by value and which I should pass by reference due to their size?
>
> Thanks.

If the object is larger than the size of a register on the target machine, it is implicitly passed by ref (i.e. struct fields are accessed by offset from the stack pointer). So the question is: does the compiler need to create temporaries, and is this an expensive operation?

In C++ the problem is that there are lots of non-POD types which have expensive copy constructors (like std::vector), and that's why taking objects by const& is a good guideline. In D, structs are implicitly movable (they can be memcpy-ed around without their postblit this(this) function being called), and that's why I think that passing by value shouldn't be as large a problem as in C++, especially if you are using a good optimizing compiler such as LDC or GDC.

Anyway, modern hardware in combination with compiler optimizations can often surprise you, so I recommend profiling your code and doing microbenchmarks to figure out where you may have performance problems. In my experience, a large number of small memory allocations is an orders-of-magnitude larger problem than the copying of large value types. The next thing to look for is inefficient memory layout with lots of indirections.
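To make the two options concrete, here is a minimal sketch with hypothetical struct and function names, sized to match the 12-byte and 128-byte ends of the range in the question. It only shows the two signatures side by side; which one is faster on a given compiler and target is exactly what profiling would have to tell.

```d
/// Hypothetical 12-byte struct.
struct Small
{
    float x, y, z;
}

/// Hypothetical 128-byte struct.
struct Large
{
    float[32] values;
}

static assert(Small.sizeof == 12);
static assert(Large.sizeof == 128);

/// Takes the struct by value: the callee works on its own copy.
float sumByValue(Large l)
{
    float total = 0;
    foreach (v; l.values)
        total += v;
    return total;
}

/// Takes the struct by reference: no copy, but the argument must be an lvalue.
float sumByRef(ref const Large l)
{
    float total = 0;
    foreach (v; l.values)
        total += v;
    return total;
}
```

Both functions compute the same result; the by-ref version cannot be called with a temporary such as sumByRef(Large()), which is one practical reason to default to by-value until a profile says otherwise.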
Re: byChunk odd behavior?
On Tuesday, 22 March 2016 at 07:17:41 UTC, Hanh wrote:
> Hi all,
>
> I'm trying to process a rather large file as an InputRange and run into
> something strange with byChunk / take.
>
> void test() {
>     auto file = new File("test.txt");
>     auto input = file.byChunk(2).joiner;
>     input.take(3).array;
>     foreach (char c; input) {
>         writeln(c);
>     }
> }
>
> Let's say test.txt contains "123456".
>
> The output will be
> 3
> 4
> 5
> 6
>
> The "take" consumed one chunk from the file, but if I increase the chunk
> size to 4, then it won't.
>
> It looks like if "take" spans two chunks, it affects the input range
> otherwise it doesn't.
>
> Actually, what is the easiest way to read a large file as a stream? My
> file contains a bunch of serialized messages of variable length.
>
> Thanks,
> --h

I have the feeling that it's related to the forward-only nature of an InputRange. All would be ok with a take(N)+popFrontN method. I'm going to keep looking.
byChunk odd behavior?
Hi all,

I'm trying to process a rather large file as an InputRange and run into something strange with byChunk / take.

    void test() {
        auto file = new File("test.txt");
        auto input = file.byChunk(2).joiner;
        input.take(3).array;
        foreach (char c; input) {
            writeln(c);
        }
    }

Let's say test.txt contains "123456".

The output will be

    3
    4
    5
    6

The "take" consumed one chunk from the file, but if I increase the chunk size to 4, then it won't.

It looks like if "take" spans two chunks, it affects the input range otherwise it doesn't.

Actually, what is the easiest way to read a large file as a stream? My file contains a bunch of serialized messages of variable length.

Thanks,
--h