Re: byChunk odd behavior?

2016-03-22 Thread Hanh via Digitalmars-d-learn

Thanks for your help everyone.

I agree that the issue comes from misusing an InputRange, but 
what are the semantics of 'take' when applied to an InputRange? 
It seems that calling it invalidates the range; if so, what is 
the recommended way to get a few bytes and keep advancing?


For instance, to read a ushort, I use
range.read!(ushort)()
Unfortunately, it reads only a single value.

For now, I use a loop

foreach (i; 0 .. N) {
    buffer[i] = range.front;
    range.popFront();
}

Is there a more idiomatic way to do the same thing?

In Scala, 'take' consumes bytes from the iterator. So the same 
code would be

buffer = range.take(N).toArray
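
One idiomatic option (a sketch of my own, not from this thread) is 
std.range.refRange: wrapping the range in refRange(&range) lets 
take consume the original range through a pointer instead of a 
copy. Assuming the "123456" test.txt from this thread:

import std.stdio, std.algorithm, std.range, std.array;

void main() {
    auto file = File("test.txt");
    auto input = file.byChunk(2).joiner;
    // take() normally pops a copy; refRange makes it pop `input` itself
    auto head = refRange(&input).take(3).array;
    writeln(cast(char[]) head);   // 123
    foreach (char c; input)
        write(c);                 // 456
    writeln();
}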



Re: Using ffmpeg in command line with D

2016-03-22 Thread cy via Digitalmars-d-learn

On Monday, 21 March 2016 at 17:26:09 UTC, Karabuta wrote:


Will this work


Yes.


and is it the right approach used by video converter front-ends?


Well, yes, provisionally. When you invoke "ffmpeg" via 
spawnProcess, that isolates ffmpeg as its own process, obviously. 
From a security and maintenance standpoint, that is very, very 
good. None of the code in ffmpeg has to be considered when 
writing your own code, other than how it acts when you call it. 
If ffmpeg scrambles its own memory, your program won't get messed 
up. If your program scrambles its own memory, ffmpeg won't get 
corrupted, and neither will your video file.
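
For example, a minimal sketch of that approach (hypothetical file 
names and ffmpeg flags, error handling kept to a bare minimum):

import std.process : spawnProcess, wait;

void main() {
    // hypothetical conversion; the flags depend on what you want ffmpeg to do
    auto pid = spawnProcess(["ffmpeg", "-i", "input.mp4", "output.webm"]);
    auto status = wait(pid);   // block until the isolated ffmpeg process exits
    if (status != 0)
        throw new Exception("ffmpeg failed");
}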


There are a few downsides, though. It's expensive to set up that 
very restricted, isolated interface (executing a process), but 
considering the amount of number crunching involved in processing 
videos, it's a pretty negligible cost. If you're running some 
sort of web server that serves up a million generated pages a 
minute, though, all that process spawning can bog it down. But 
you wouldn't use ffmpeg for that.


The extreme isolation of a separate process means that you're 
restricted in what you can do with the video. You can do anything 
the ffmpeg devs expose in their interface, but that's it. If they 
change their command-line format, all your stuff will break until 
you fix it, but considering how old ffmpeg is, that's probably 
not going to happen any time soon.


In some cases there are resources that cannot be reused across 
two process invocations and that are very expensive to set up and 
tear down. You wouldn't use mpv the way you use ffmpeg, for 
instance, because it would have to recreate the video display 
window on every execution. Instead, mpv has a "socket" interface 
that you can connect to after launching a single process, and use 
that to control the player.


So, for video conversion, yes, it's the right approach. Your 
mileage may vary if you want to display that video or generate 
videos on demand from a high-performance webserver (in which case 
the video processing, not process execution, will still be 
99.999% of what slows you down).


Re: byChunk odd behavior?

2016-03-22 Thread cy via Digitalmars-d-learn

On Tuesday, 22 March 2016 at 07:17:41 UTC, Hanh wrote:

input.take(3).array;
foreach (char c; input) {


Never use an input range twice. So, here's how to use it twice:

If it's a "forward range" you can use save() to get a copy to use 
later (though none of the std.stdio ranges implement that). You 
can also use "std.range.tee" to send the results to an "output 
range" (something implementing put(K)(K)) while iterating over 
them.


tee can't produce two input ranges, because without caching all 
iterated items in memory, only one range can request items 
on-demand; the other must take them passively.
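
Here is a small sketch of the tee approach (my own example, using 
an appender as the output range):

import std.algorithm : map;
import std.array : appender, array;
import std.range : tee;
import std.stdio : writeln;

void main() {
    auto sink = appender!(int[])();   // an output range: it implements put()
    auto doubled = [1, 2, 3, 4]
        .tee(sink)                    // copies each element into sink as it is popped
        .map!(x => x * 2)
        .array;
    writeln(doubled);    // [2, 4, 6, 8]
    writeln(sink.data);  // [1, 2, 3, 4]
}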


You could write a thing that takes an InputRange and produces a 
ForwardRange, by caching those items in memory, but at that point 
you might as well use .array and get the whole thing.


ByChunk is an input range (not a forward range), so using it 
twice is undefined behavior. That's not a bug, since it was never 
meant to be reused. What it does is cache the last chunk it read; 
a new iteration first yields that cached chunk and then reads 
further chunks from the file. So every time you start iterating 
again, you'll see that same last chunk first.


It's also tricky to use input ranges after mutating their 
underlying data structure. If you seek in the file, for instance, 
a previously created ByChunk will first produce the chunk it has 
cached and only then start reading chunks from the new position 
in the file. Or take a range over some sort of list: if you 
delete the current item from the list, should the range produce 
the previous item? The next item? null?


So, as a general rule, never use input ranges twice, and never 
use them after mutating the underlying data structure. Just 
recreate them if you want to do something twice, or use tee as 
mentioned above.
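
For instance, recreating the range for a second pass could look 
like this (a sketch assuming the test.txt file from this thread):

import std.algorithm : joiner;
import std.stdio : File, write, writeln;

void main() {
    auto file = File("test.txt");

    foreach (ubyte b; file.byChunk(2).joiner)   // first pass
        write(cast(char) b);
    writeln();

    file.seek(0);                               // rewind the file...
    foreach (ubyte b; file.byChunk(2).joiner)   // ...and build a fresh range
        write(cast(char) b);
    writeln();
}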


Re: byChunk odd behavior?

2016-03-22 Thread Ali Çehreli via Digitalmars-d-learn

On 03/22/2016 12:17 AM, Hanh wrote:
> Hi all,
>
> I'm trying to process a rather large file as an InputRange and run into
> something strange with byChunk / take.
>
> void test() {
>  auto file = new File("test.txt");
>  auto input = file.byChunk(2).joiner;
>  input.take(3).array;
>  foreach (char c; input) {
>  writeln(c);
>  }
> }
>
> Let's say test.txt contains "123456".
>
> The output will be
> 3
> 4
> 5
> 6
>
> The "take" consumed one chunk from the file, but if I increase the chunk
> size to 4, then it won't.

I don't understand the issue fully, but byChunk() treats every 
byte in the file as data, so even the newline character(s) are 
included.


> Actually, what is the easiest way to read a large file as a stream? My
> file contains a bunch of serialized messages of variable length.

If it's a text file, I think I would start with File.byLine (or 
byLineCopy). Then it depends on how the messages are laid out. One 
per line? Do you know the size at the start? etc.
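
For example, a minimal byLine sketch, assuming one message per 
line and a hypothetical file name:

import std.stdio;

void main() {
    auto file = File("messages.txt");   // hypothetical file, one message per line
    foreach (line; file.byLine)         // byLine lazily reuses one buffer per line
        writeln("message: ", line);
}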


Alternatively, use (or examine) one of the great D serialization modules 
out there. :)


(We really need something like this in the standard library; I 
think some people are already working on it.)


Ali



Re: Something wrong with GC

2016-03-22 Thread ag0aep6g via Digitalmars-d-learn

On 20.03.2016 08:49, stunaep wrote:

The gc throws invalid memory errors if I use Arrays from std.container.
For example, this throws an InvalidMemoryOperationError:

import std.stdio;
import std.container;

void main() {
    new Test();
}

class Test {
    private Array!string test = Array!string();

    this() {
        test.insert("test");
        writeln(test[0]);
    }
}


I can reproduce the InvalidMemoryOperationError with git head dmd, but 
there doesn't seem to be a problem with 2.070. So I'd say this is a 
regression in the development version.


I've filed an issue: https://issues.dlang.org/show_bug.cgi?id=15821

You're probably building dmd/phobos from git, or you're using a nightly, 
right? Maybe you can go back to 2.070.2 until this is sorted out.


Re: Something wrong with GC

2016-03-22 Thread Edwin van Leeuwen via Digitalmars-d-learn

On Tuesday, 22 March 2016 at 13:46:41 UTC, stunaep wrote:

public class Example2 {
    private int one;
    private int two;

    public this(int one, int two) {
        this.one = one;
        this.two = two;
    }
}


in a tree map and a list of some sort. Neither of the above works, 
whether they are classes or structs, and it's starting to become 
quite bothersome...


Is there a particular reason why you don't want to use the 
standard ranges?


public class Example2 {
    private int one;
    private int two;

    public this(int one, int two) {
        this.one = one;
        this.two = two;
    }
}

void main()
{
    auto myExamplesList = [new Example2(6, 3), new Example2(7, 5)];

    // Note that if you do a lot of appending, using Appender is
    // more performant than ~=
    myExamplesList ~= new Example2(9, 1);
}
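
As a small illustration of that comment (my own sketch), Appender 
grows a single buffer instead of repeatedly reallocating with ~=:

import std.array : appender;

class Example2 {
    private int one;
    private int two;
    public this(int one, int two) { this.one = one; this.two = two; }
}

void main() {
    auto app = appender!(Example2[])();
    foreach (i; 0 .. 1000)
        app.put(new Example2(i, i + 1));   // cheap amortized appends
    Example2[] examples = app.data;        // the accumulated slice
    assert(examples.length == 1000);
}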

For trees there is also redBlackTree (in std.container.rbtree).
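
A minimal redBlackTree sketch (my own example; class elements need 
an explicit ordering predicate):

import std.container.rbtree : redBlackTree;
import std.stdio : writeln;

class Example2 {
    int one, two;
    this(int one, int two) { this.one = one; this.two = two; }
}

void main() {
    // order the elements by the 'one' field
    auto tree = redBlackTree!((a, b) => a.one < b.one)(
        new Example2(6, 3), new Example2(7, 5));
    tree.insert(new Example2(9, 1));
    foreach (e; tree[])                  // tree[] gives a sorted range
        writeln(e.one, " ", e.two);      // 6 3, 7 5, 9 1
}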




Re: byChunk odd behavior?

2016-03-22 Thread Taylor Hillegeist via Digitalmars-d-learn

On Tuesday, 22 March 2016 at 07:17:41 UTC, Hanh wrote:

Hi all,

I'm trying to process a rather large file as an InputRange and 
run into something strange with byChunk / take.


void test() {
    auto file = new File("test.txt");
    auto input = file.byChunk(2).joiner;
    input.take(3).array;
    foreach (char c; input) {
        writeln(c);
    }
}

Let's say test.txt contains "123456".

The output will be
3
4
5
6

The "take" consumed one chunk from the file, but if I increase 
the chunk size to 4, then it won't.


It looks like if "take" spans two chunks, it affects the input 
range otherwise it doesn't.


Actually, what is the easiest way to read a large file as a 
stream? My file contains a bunch of serialized messages of 
variable length.


Thanks,
--h


I don't know if this helps, but it looks like take(3) gets its 
data without fully consuming the chunk, so the chunk is not 
removed from the range.


import std.stdio;
import std.algorithm;
import std.range;

void main() {
    auto file = stdin;
    auto input = file.byChunk(2).joiner;

    foreach (char c; input.take(3).array) {
        writeln(c);
    }

    foreach (char c; input) {
        writeln(c);
    }
}

Produces:
1
2
3 < Got data but didn't eat the chunk.
3
4
5
6


Re: Something wrong with GC

2016-03-22 Thread stunaep via Digitalmars-d-learn

On Monday, 21 March 2016 at 07:55:39 UTC, thedeemon wrote:

On Sunday, 20 March 2016 at 07:49:17 UTC, stunaep wrote:
The gc throws invalid memory errors if I use Arrays from 
std.container.


Those arrays are meant for RAII-style deterministic memory 
release; they shouldn't be freely mixed with GC-allocated things. 
What happens here is that, during initialization, Array sees it 
holds a GC-managed value type (strings), so it tells the GC to 
look after those strings. When your program ends, the runtime 
does a GC cycle, finds your Test object, and calls its 
destructor, which calls the Array destructor, which tries to tell 
the GC not to look at its data anymore. But during a GC cycle 
it's currently illegal to call such GC methods, so it throws an 
error.
Moral of this story: try not to store "managed" (GC-collected) 
types in Array, and/or try not to have Arrays inside "managed" 
objects. If Test were a struct instead of a class, it would work 
fine.


So what am I to do? Any other language can do such a thing 
trivially... I also run into the same problem with 
emsi_containers' TreeMap. It is imperative that I can store data 
such as



public class Example1 {
    private File file;

    public this(File f) {
        this.file = f;
    }
}


or


public class Example2 {
    private int one;
    private int two;

    public this(int one, int two) {
        this.one = one;
        this.two = two;
    }
}


in a tree map and a list of some sort. Neither of the above works, 
whether they are classes or structs, and it's starting to become 
quite bothersome...




Re: pass a struct by value/ref and size of the struct

2016-03-22 Thread Johan Engelen via Digitalmars-d-learn

On Tuesday, 22 March 2016 at 07:35:49 UTC, ZombineDev wrote:


If the object is larger than the size of a register on the 
target machine, it is implicitly passed by ref (i.e. struct 
fields are accessed by offset from the stack pointer).


(Oops, sorry ZombineDev, should've read your reply first)


Re: pass a struct by value/ref and size of the struct

2016-03-22 Thread Johan Engelen via Digitalmars-d-learn

On Monday, 21 March 2016 at 23:31:06 UTC, ref2401 wrote:
I have got a plenty of structs in my project. Their size varies 
from 12 bytes to 128 bytes.
Is there a rule of thumb that states which structs I pass by 
value and which I should pass by reference due to their size?


Note that the compiler may do things differently from what you 
expect. For example, for C code the platform ABI may already 
dictate passing your structs by reference (via a hidden pointer), 
even though your code says "by value". See:

https://msdn.microsoft.com/en-us/library/zthk2dkh.aspx

MSVC will pass structs larger than 64 bits (8 bytes) by reference 
in C++ code. Your D compiler may decide to do the same.


Re: Trying to use Dustmite on windows

2016-03-22 Thread Jerry via Digitalmars-d-learn
On Tuesday, 22 March 2016 at 09:19:27 UTC, Vladimir Panteleev 
wrote:

On Tuesday, 22 March 2016 at 09:11:52 UTC, Jerry wrote:

So I want to pass my DUB project to Dustmite and use findstr


For reducing dub projects, try the "dub dustmite" command, e.g.
 "--compiler-regex=Assertion failure".


Thanks, that works nicely. But now my initial run fails.
Using
dub dustmite ../testReduction --compiler-regex="Assertion failure"

However, when I navigate to the testReduction directory and run 
dub, I get the error message:

Assertion failure: '0' on line 1942 in file 'glue.c'


Re: Trying to use Dustmite on windows

2016-03-22 Thread Vladimir Panteleev via Digitalmars-d-learn

On Tuesday, 22 March 2016 at 09:11:52 UTC, Jerry wrote:

So I want to pass my DUB project to Dustmite and use findstr


For reducing dub projects, try the "dub dustmite" command, e.g.  
"--compiler-regex=Assertion failure".




Trying to use Dustmite on windows

2016-03-22 Thread Jerry via Digitalmars-d-learn

I am really not used to bash scripts.
I am trying to use Dustmite on my project since I have started 
getting an
"Assertion failure: '0' in glue.c on line 1492" and really cannot 
find any issue about it in the issue tracker.


So I want to pass my DUB project to Dustmite and use the findstr 
command to check the result. What I came up with was this:


dustmite source "dub run | findstr /b /C:\"Assertion failure\""


But findstr fails with the error message:

"Can not open failure"

/Jerry


Re: pass a struct by value/ref and size of the struct

2016-03-22 Thread ZombineDev via Digitalmars-d-learn

On Monday, 21 March 2016 at 23:31:06 UTC, ref2401 wrote:
I have got a plenty of structs in my project. Their size varies 
from 12 bytes to 128 bytes.
Is there a rule of thumb that states which structs I pass by 
value and which I should pass by reference due to their size?


Thanks.


If the object is larger than the size of a register on the target 
machine, it is implicitly passed by ref (i.e. struct fields are 
accessed by offset from the stack pointer). So the question is: 
does the compiler need to create temporaries, and is that an 
expensive operation? In C++ the problem is that there are lots of 
non-POD types with expensive copy constructors (like std::vector), 
which is why taking objects by const& is a good guideline. In D, 
structs are implicitly movable (they can be memcpy-ed around 
without their postblit this(this) being called), which is why I 
think passing by value shouldn't be as large a problem as in C++, 
especially if you are using a good optimizing compiler such as LDC 
or GDC.
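
As a minimal illustration (my own sketch, with a hypothetical 
128-byte struct at the large end of the question's range): both 
signatures below are valid D, and profiling decides which one 
actually wins.

struct Big { ubyte[128] data; }

// By value: the language copies (or moves) it, and the ABI may still
// pass a hidden pointer under the hood.
int sumByValue(Big b) { return b.data[0] + b.data[127]; }

// By ref const: no copy in the language semantics, at the cost of an
// extra indirection.
int sumByRef(ref const Big b) { return b.data[0] + b.data[127]; }

void main() {
    Big b;
    b.data[0] = 1;
    b.data[127] = 2;
    assert(sumByValue(b) == 3 && sumByRef(b) == 3);
}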


Anyway, modern hardware in combination with compiler 
optimizations can often surprise you, so I recommend profiling 
your code and doing microbenchmarks to figure out where you may 
have performance problems. In my experience, a large number of 
small memory allocations is an orders-of-magnitude bigger problem 
than copying large value types. The next thing to look for is an 
inefficient memory layout with lots of indirections.


Re: byChunk odd behavior?

2016-03-22 Thread Hanh via Digitalmars-d-learn

On Tuesday, 22 March 2016 at 07:17:41 UTC, Hanh wrote:

Hi all,

I'm trying to process a rather large file as an InputRange and 
run into something strange with byChunk / take.


void test() {
    auto file = new File("test.txt");
    auto input = file.byChunk(2).joiner;
    input.take(3).array;
    foreach (char c; input) {
        writeln(c);
    }
}

Let's say test.txt contains "123456".

The output will be
3
4
5
6

The "take" consumed one chunk from the file, but if I increase 
the chunk size to 4, then it won't.


It looks like if "take" spans two chunks, it affects the input 
range otherwise it doesn't.


Actually, what is the easiest way to read a large file as a 
stream? My file contains a bunch of serialized messages of 
variable length.


Thanks,
--h


I have the feeling that it's related to the forward-only nature 
of an InputRange. Everything would be OK with a take(N) plus 
popFrontN approach. I'm going to keep looking.


byChunk odd behavior?

2016-03-22 Thread Hanh via Digitalmars-d-learn

Hi all,

I'm trying to process a rather large file as an InputRange and 
run into something strange with byChunk / take.


void test() {
    auto file = new File("test.txt");
    auto input = file.byChunk(2).joiner;
    input.take(3).array;
    foreach (char c; input) {
        writeln(c);
    }
}

Let's say test.txt contains "123456".

The output will be
3
4
5
6

The "take" consumed one chunk from the file, but if I increase 
the chunk size to 4, then it won't.


It looks like if "take" spans two chunks, it affects the input 
range; otherwise it doesn't.


Actually, what is the easiest way to read a large file as a 
stream? My file contains a bunch of serialized messages of 
variable length.


Thanks,
--h