Re: The Case Against Autodecode
On Thursday, 12 May 2016 at 20:15:45 UTC, Walter Bright wrote: On 5/12/2016 9:29 AM, Andrei Alexandrescu wrote: > I am as unclear about the problems of autodecoding as I am about the necessity > to remove curl. Whenever I ask I hear some arguments that work well emotionally > but are scant on reason and engineering. Maybe it's time to rehash them? I just > did so about curl, no solid argument seemed to come together. I'd be curious of > a crisp list of grievances about autodecoding. -- Andrei

Given the importance of performance in the auto-decoding topic, it seems reasonable to quantify it. I took a stab at this. It would of course be prudent to have others conduct similar analysis rather than rely on my numbers alone.

Measurements were done using an artificial scenario, counting lower-case ascii letters. This had the effect of calling front/popFront many times on a long block of text. Runs were done both treating the text as char[] and as ubyte[], and comparing the run times. (char[] performs auto-decoding, ubyte[] does not.) Timings were done with DMD and LDC, and on two different data sets. One data set was a mix of Latin languages (e.g. German, English, Finnish, etc.), the other non-Latin languages (e.g. Japanese, Chinese, Greek, etc.). The goal was to distinguish between scenarios with high and low Ascii character content.

The result: For DMD, auto-decoding showed a 1.6x to 2.6x cost. For LDC, a 12.2x to 12.9x cost.

Details:
- Test program: https://dpaste.dzfl.pl/67c7be11301f
- DMD 2.071.0. Options: -release -O -boundscheck=off -inline
- LDC 1.0.0-beta1 (based on DMD v2.070.2). Options: -release -O -boundscheck=off
- Machine: Macbook Pro (2.8 GHz Intel I7, 16GB ram)

Runs for each combination were done five times and the median times used.
The median times and the char[] to ubyte[] ratio are below:

| Compiler | Text type | char[] time (ms) | ubyte[] time (ms) | ratio |
|----------|-----------|------------------|-------------------|-------|
| DMD      | Latin     | 7261             | 4513              | 1.6   |
| DMD      | Non-latin | 10240            | 3928              | 2.6   |
| LDC      | Latin     | 11773            | 913               | 12.9  |
| LDC      | Non-latin | 10756            | 883               | 12.2  |

Note: The numbers above don't provide enough info to derive a front/popFront rate. The program artificially makes multiple loops to increase the run-times. (For these runs, the program's repeat-count was set to 20.)

Characteristics of the two data sets:

| Text type | Bytes   | DChars  | Ascii Chars | Bytes per DChar | Pct Ascii |
|-----------|---------|---------|-------------|-----------------|-----------|
| Latin     | 4156697 | 4059016 | 3965585     | 1.024           | 97.7%     |
| Non-latin | 4061554 | 1949290 | 348164      | 2.084           | 17.9%     |

Run-to-run variability: The run times recorded were quite stable. The largest delta between minimum and median time for any group was 17 milliseconds.
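The full test program is at the dpaste link above. As a rough sketch of the kind of loop being timed (an illustration only, not the actual benchmark; the function name is made up):

```d
import std.range : empty, front, popFront;
import std.stdio : writeln;

/* Count lower-case ASCII letters via range primitives. Instantiated with
 * char[], front/popFront auto-decode each UTF-8 sequence to a dchar; with
 * ubyte[], they simply step through bytes. UTF-8 continuation bytes are all
 * >= 0x80, so the byte-wise version produces the same count. */
size_t countLowerAscii(Range)(Range r)
{
    size_t n = 0;
    for (; !r.empty; r.popFront)
        if (r.front >= 'a' && r.front <= 'z')
            n++;
    return n;
}

void main()
{
    char[] text = "Grüße, Wörld".dup;
    writeln(countLowerAscii(text));                // auto-decoding path
    writeln(countLowerAscii(cast(ubyte[]) text));  // raw byte path
}
```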
Re: Command line parsing
On Saturday, 14 May 2016 at 13:17:05 UTC, Andrei Alexandrescu wrote: I showed a fellow programmer std.getopt. We were both on laptops. He wanted to show me how good Python's argparse is and how D should copy it. By the end of the chat it was obvious argparse was much more verbose and less pleasant to use than getopt. Like you have to create an object (?!?!) to parse the command line and many other lines of nonsense. I've found D's getopt package to be pretty good. There are a number of small things that could make it quite a bit better. To me these generally appear more the result of limited usage than anything fundamentally wrong with the design. For example, the error text produced when a run-time argument doesn't match the option spec is often not helpful to the user who entered the command, and I've found I need to take steps to address this. A package like Perl's Getopt::Long tends to be a bit more mature in some of these details. --Jon
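As a small illustration of the style being compared (the option names here are made up for the example):

```d
import std.getopt;

void main(string[] args)
{
    size_t count = 1;
    bool verbose;

    // getopt parses args in place and returns help info; no parser
    // object needs to be constructed up front.
    auto helpInfo = getopt(args,
        "count|c",   "Number of times to run.", &count,
        "verbose|v", "Print extra output.",     &verbose);

    if (helpInfo.helpWanted)
        defaultGetoptPrinter("A hypothetical tool.", helpInfo.options);
}
```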
Re: Intermediate level D and open source projects to study
On Wednesday, 11 May 2016 at 18:41:47 UTC, xtreak wrote: Hi, I am a D newbie. I worked through the D Programming Language and Programming in D books. I primarily use Python daily. I will be happy to know how I can go to intermediate level in D. It will be helpful to have projects in D of high quality and also beginner friendly code that I can study to improve my D. [snip] Might not be exactly what you are looking for, but I recently open-sourced some command line utilities you could take a look at. They are real apps in that they take command line arguments, have help, error handling, etc. But, they are doing relatively straightforward tasks, things you might do in Python also. A caution: I'm relatively new to D as well, and there are likely places where the code could be more idiomatic D. Utilities are at: https://github.com/eBay/tsv-utils-dlang. The readme has a section labeled "The code" that describes the code structure.
Re: Compiler benchmarks for an alternative to std.uni.asLowerCase.
On Monday, 9 May 2016 at 00:15:03 UTC, Peter Häggman wrote: On Sunday, 8 May 2016 at 23:38:31 UTC, Jon D wrote: I did a performance study on speeding up case conversion in std.uni.asLowerCase. Specifics for asLowerCase have been added to issue https://issues.dlang.org/show_bug.cgi?id=11229. Publishing here as some of the more general observations may be of wider interest. [...] Nice, it seems that you would have enough material to advocate a pull request in phobos then ;) Thanks! I haven't yet taken the time to go through the 'becoming a contributor' steps; when I have the time I'll do that. In this case, I'd want to start by validating with the library designers that the approach makes sense. It by-passes what appears to be a basic primitive, std.uni.toCaser. There may be reasons this is not desirable.
Compiler benchmarks for an alternative to std.uni.asLowerCase.
I did a performance study on speeding up case conversion in std.uni.asLowerCase. Specifics for asLowerCase have been added to issue https://issues.dlang.org/show_bug.cgi?id=11229. Publishing here as some of the more general observations may be of wider interest.

Background: Case conversion can generally be sped up by checking if a character is ascii before invoking a full unicode case conversion. The single character std.uni.toLower does this optimization, but std.uni.asLowerCase does not. asLowerCase does a lazy conversion of a range. For the test, I created a replacement for asLowerCase which uses map and toLower. In essence, `map!(x => x.toLower)` or `map!(x => x.byDchar.toLower)`.

Testing was with DMD (2.071) and LDC 1.0.0-beta1 (Phobos 2.070) on OSX. Compiler settings were `-release -O -boundscheck=off`. DMD was tested with and without `-inline`. LDC turns on inlining (-enable-inlining=1) by default with -O, but DMD does not. Texts tried were in Japanese, Chinese, Finnish, English, German, and Spanish. Timing was done both including and excluding decoding from utf-8 to dchar.

Performance delta including decoding to dchar:

| Language group  | Pct Ascii | LDC gain   | DMD gain  | DMD no inline |
|-----------------|-----------|------------|-----------|---------------|
| Latin           | 95-99%    | 64% (2.7x) | 93% (14x) | 48% (1.9x)    |
| Asian (Jpn/Chn) | 2.4-3.7%  | 36% (1.6x) | 80% (5x)  | -1%           |

Performance delta excluding decoding to dchar:

| Language group  | Pct Ascii | LDC gain   | DMD gain  | DMD no inline |
|-----------------|-----------|------------|-----------|---------------|
| Latin           | 95-99%    | 60% (2.5x) | 95% (20x) | 60% (2.5x)    |
| Asian (Jpn/Chn) | 2.4-3.7%  | 50% (2x)   | 95% (20x) | -2%           |

Observations:

* mapAsLowerCase was faster than asLowerCase across the board. That it was better for Asian texts suggests the improvement involved more than just the ascii check optimization.
* Performance varied widely between compilers, and for DMD, whether the -inline flag was included. The performance delta between asLowerCase and the mapAsLowerCase replacement was very dependent on these choices.
Similarly, the delta between inclusion and exclusion of auto-decoding was highly dependent on these selections.

* DMD improvement by using -inline: 30% for asLowerCase (1.5x), 90% for mapAsLowerCase (10x).
* DMD (-inline) vs LDC: For asLowerCase, LDC was 65-85% faster. For mapAsLowerCase, DMD was 10-40% faster. There were changes to the map implementation in 2.071, so these were not equivalent, but still, it's interesting that DMD beat LDC in this case.

Thoughts:

* The large variances between compiler settings call for extra diligence when performance tuning at the source code level, especially for code intended for multiple compilers.
* Perhaps DMD -O should also turn on -inline. This would present a better performance picture to new users. It's also helpful when the different compilers agree on the rough meaning of compiler switches.
* Auto-decoding is an oft-discussed concern. It doesn't show up in the tables above, but the data I looked at suggests the cost/penalty may vary quite a bit depending on usage context and compiler/settings. I wasn't studying this aspect explicitly. It may be worth its own analysis.

Other details:

* Code for mapAsLowerCase and the timing program is at: https://dpaste.dzfl.pl/a0e2fa1c71fd
* Texts used for timing were books in several languages from the Project Gutenberg site (http://www.gutenberg.org/), with boilerplate text removed.

--Jon
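The code for the replacement and the timing program is at the dpaste link above. As a simplified sketch of the approach (not the benchmarked code), the composition is essentially:

```d
import std.algorithm : map;
import std.conv : to;
import std.stdio : writeln;
import std.uni : toLower;

// Lazy lower-casing built from map plus the single-character std.uni.toLower,
// which already contains the ASCII fast-path check that asLowerCase lacks.
auto mapAsLowerCase(Range)(Range r)
{
    return r.map!(c => c.toLower);
}

void main()
{
    // Iterating a string auto-decodes to dchar, so the dchar toLower applies.
    writeln("Grüße VON D".mapAsLowerCase.to!string);
}
```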
Re: Can't use std.algorithm.remove on a char[]?
On Saturday, 30 April 2016 at 19:21:30 UTC, ag0aep6g wrote: On 30.04.2016 21:08, Jon D wrote: If an initial step is to fix the documentation, it would be helpful to include specifically that it doesn't work with characters. It's not obvious that characters don't meet the requirement. Characters are not the problem. remove works fine on a range of chars, when the elements are assignable lvalues. char[] as a range has neither assignable elements, nor lvalue elements. That is, lines 3 and 4 here don't compile: import std.range: front; char[] a = ['f', 'o', 'o']; a.front = 'g'; auto ptr = &a.front; I didn't mean to suggest making the documentation technically incorrect. Just that it should be helpful in important cases that won't necessarily be obvious. To me, char[] is an important case, one that's not made obvious by listing the hasLvalueElements constraint by itself. --Jon
Re: Can't use std.algorithm.remove on a char[]?
On Saturday, 30 April 2016 at 18:32:32 UTC, ag0aep6g wrote: On 30.04.2016 18:44, TheGag96 wrote: I was just writing some code trying to remove a value from a character array, but the compiler complained "No overload matches for remove", and if I specifically say use std.algorithm.remove() the compiler doesn't think it fits any definition. For reference, this would be all I'm doing: char[] thing = ['a', 'b', 'c']; thing = thing.remove(1); Is this a bug? std.algorithm claims remove() works on any forward range... The documentation is wrong. 1) remove requires a bidirectional range. The constraints and parameter documentation correctly say so. char[] is a bidirectional range, though. 2) remove requires lvalue elements. char[] fails this, as the range primitives decode the chars on-the-fly to dchars. Pull request to fix the documentation: https://github.com/dlang/phobos/pull/4271 By the way, I think requiring lvalues is too restrictive. It should work with assignable elements. Also, it has apparently been missed that const/immutable can make non-assignable lvalues. There's a ticket open related to the lvalue element requirement: https://issues.dlang.org/show_bug.cgi?id=8930 Personally, I think this example is more compelling than the one in the ticket. It seems very reasonable to expect that std.algorithm.remove will work regardless of whether the elements are characters, integers, ubytes, etc. If an initial step is to fix the documentation, it would be helpful to include specifically that it doesn't work with characters. It's not obvious that characters don't meet the requirement. --Jon
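To illustrate the distinction discussed in this thread (a small sketch; element types other than char[] satisfy the lvalue-element requirement):

```d
import std.algorithm : remove;
import std.stdio : writeln;

void main()
{
    // Fine: dchar[] is a bidirectional range with assignable lvalue elements.
    dchar[] d = "abc"d.dup;
    writeln(d.remove(1));   // drops the element at index 1

    // Fine for the same reason: ubyte[] elements are plain lvalues.
    ubyte[] b = [1, 2, 3];
    writeln(b.remove(1));

    // Does not compile: char[]'s range primitives decode to rvalue dchars,
    // so the hasLvalueElements constraint fails.
    // char[] c = "abc".dup;
    // c = c.remove(1);
}
```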
Re: So, to print or not to print?
On Tuesday, 26 April 2016 at 16:30:22 UTC, Jonathan M Davis wrote: On Tuesday, April 26, 2016 12:18:11 cym13 via Digitalmars-d wrote: Finally it doesn't bring much. One learns writeln, laments a bit that it doesn't put spaces itself then just accepts it. I confess that I was very surprised to find out that writeln worked with multiple arguments. In my initial look at D I would have appreciated print. However, at least part of the reason is that it was a while before I knew writefln existed. After finding it (and discovering that writeln takes multiple arguments), having the functionality of print was less of an issue. It's not easy to reconstruct why it took me a while to discover writefln, but perhaps finding places to show it off in introductory material would help others find it more quickly. --Jon
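For reference, the two variants being discussed, side by side (a trivial sketch):

```d
import std.stdio : writeln, writefln;

void main()
{
    // writeln accepts any number of arguments; no separator is inserted.
    writeln("x = ", 42, ", y = ", 3.5);

    // writefln takes a format string, closer to a classic formatted print.
    writefln("x = %d, y = %.1f", 42, 3.5);
}
```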
Re: Is there a way to disable 'dub test' for applications?
On Monday, 18 April 2016 at 11:47:42 UTC, Dicebot wrote: On Monday, 18 April 2016 at 04:25:25 UTC, Jon D wrote: I have a dub config file specifying a targetType of 'executable'. There is only one file, the file containing main(), and no unit tests. When I run 'dub test', dub builds and runs the executable. This is not really desirable. Is there a way to set up the dub configuration file to disable running the test? configuration "unittest" { excludedSourceFiles "path/to/main.d" } Very nice, thank you. What also seems to work is: configuration "unittest" { targetType "none" } Then 'dub test' produces the message: Configuration 'unittest' has target type "none". Skipping test.
Re: Is there a way to disable 'dub test' for applications?
On Monday, 18 April 2016 at 05:30:21 UTC, Jonathan M Davis wrote: On Monday, April 18, 2016 04:25:25 Jon D via Digitalmars-d-learn wrote: I have a dub config file specifying a targetType of 'executable'. There is only one file, the file containing main(), and no unit tests. When I run 'dub test', dub builds and runs the executable. This is not really desirable. Is there a way to set up the dub configuration file to disable running the test? Note: What I'd really like to do is run a custom shell command when 'dub test' is done, I haven't seen anything suggesting that's an option. However, disabling would still be useful. What's the point of even running dub test if you have no unit tests? Just do dub build, and then use the resulting executable, or if you want to build and run in one command, then use dub run. - Jonathan M Davis I should have supplied more context. A few days ago I announced open-sourcing a D package consisting of several executables. Multiple comments recommended making it available via the Dub repository. I wasn't using Dub to build, and there are a number of loose ends when working with Dub and multiple executables. I've been trying to limit the number of issues others might encounter if they pulled the package and ran typical commands, like 'dub test'. It's not a big deal, but if there's an easy way to provide a handler, I will. Also, the reason for a custom shell command is that there are tests, it's just that they are run against the built executable rather than via the unittest framework. --Jon
Is there a way to disable 'dub test' for applications?
I have a dub config file specifying a targetType of 'executable'. There is only one file, the file containing main(), and no unit tests. When I run 'dub test', dub builds and runs the executable. This is not really desirable. Is there a way to set up the dub configuration file to disable running the test? Note: What I'd really like to do is run a custom shell command when 'dub test' is done, but I haven't seen anything suggesting that's an option. However, disabling would still be useful. --Jon
Specifying a minimum Phobos version in dub?
Is there a way to specify a minimum Phobos version in a dub package specification? --Jon
Re: Command line utilities for tab-separated value files
On Wednesday, 13 April 2016 at 19:52:30 UTC, Walter Bright wrote: On 4/11/2016 5:50 PM, Jon D wrote: I'd welcome any feedback, either on the apps or the code. Intention is that the code be reasonable example programs. And, I may write a blog post about my D explorations at some point, they'd be referenced in such an article. You've got questions on: https://www.reddit.com/r/programming/comments/4ems6a/commandline_utilities_for_large_tabseparated/ !! As the author, it'd be nice to do an AMA there. Thanks for posting there and letting me know. I responded and will watch the thread. What do you mean by an "AMA"?
Re: Command line utilities for tab-separated value files
On Wednesday, 13 April 2016 at 18:22:21 UTC, Dicebot wrote: On Wednesday, 13 April 2016 at 17:21:58 UTC, Jon D wrote: You don't need to put anything on path to run utils from dub packages. `dub run` will take care of setting necessary environment (without messing with the system): dub fetch package_with_apps dub run package_with_apps:app1 --flags args These are command line utilities, along the lines of unix 'cut', 'grep', etc, intended to be used as part of a unix pipeline. It'd be less convenient to be invoking them via dub. They really should be on the path themselves. Sure, that would be beyond dub scope though. Making binary packages is independent of build system or source layout (and is highly platform-specific). The `dub run` feature is mostly helpful when you need to use one such tool as part of a build process for another dub package. Right. So, partly what I'm wondering is if during the normal dub fetch/run cycle there might be an opportunity to print a message to the user with some info to help them add the tools to their path. I haven't used dub much, so I'll have to look into it more. But there should be some way to make it reasonably easy and clear. It'll probably be a few days before I can get to this, but I would like to get them in the package registry. --Jon
Re: Command line utilities for tab-separated value files
On Wednesday, 13 April 2016 at 12:36:56 UTC, Dejan Lekic wrote: On Tuesday, 12 April 2016 at 00:50:24 UTC, Jon D wrote: I've open sourced a set of command line utilities for manipulating tab-separated value files. I rarely need TSV files, but I deal with CSV files every day. - It would be nice to test your implementation against std.csv (it can use TAB as separator). Did you try to compare the two? No, I didn't try using the std.csv library utilities. The utilities all take a delimiter, so comma can be specified, but that won't handle CSV escaping. For myself, I'd be more inclined to add TSV-CSV converters rather than adding native CSV support to each tool, but if you're working with CSV all the time that'd be a nuisance. If you want, you can try rewriting the inner loop of one of the tools to use csvNextToken rather than algorithm.splitter. tsv-select would be the easiest of the tools to try. It'd also be necessary to replace the writeln for the output to properly add CSV escapes. --Jon
Re: Command line utilities for tab-separated value files
On Wednesday, 13 April 2016 at 07:34:11 UTC, Rory McGuire wrote: On Wed, Apr 13, 2016 at 3:41 AM, Puming via Digitalmars-d-announce < digitalmars-d-announce@puremagic.com> wrote: On Tuesday, 12 April 2016 at 06:22:55 UTC, Puming wrote: Here is what I know of it, using subPackages: Just tried your suggestion and it works. I just added the below to the parent project to get the apps built: void main() { import std.process : executeShell; executeShell(`dub build :app1`); executeShell(`dub build :app2`); executeShell(`dub build :app3`); } Thanks Rory, Puming. I'll look into this and see how best to make it fit. I'm realizing also there's one additional capability it'd be nice to have in dub for tools like this, which is an option to install the executables somewhere that can easily be put on the path. Still, even without this there'd be benefit to having them fetched via dub. --Jon
Re: Command line utilities for tab-separated value files
On Tuesday, 12 April 2016 at 06:22:55 UTC, Puming wrote: On Tuesday, 12 April 2016 at 00:50:24 UTC, Jon D wrote: Hi all, I've open sourced a set of command line utilities for manipulating tab-separated value files. They are complementary to traditional unix tools like cut, grep, etc. They're useful for manipulating large data files. I use them when prepping files for R and similar tools. These tools were part of my 'explore D' programming exercises. [...] Interesting, I have large csv files, and this lib will be useful. Can you put it onto code.dlang.org so that we could use it with dub? I'd certainly like to make it available via dub, but I wasn't sure how to set it up. There are two issues. One is that the package builds multiple executables, which dub doesn't seem to support easily. More problematic is that quite a bit of the test suite is run against the executables, which I could automate using make, but didn't see how to do it with dub. If there are suggestions for setting this up in dub that'd be great. An example project doing something similar would be really helpful. --Jon
Command line utilities for tab-separated value files
Hi all, I've open sourced a set of command line utilities for manipulating tab-separated value files. They are complementary to traditional unix tools like cut, grep, etc. They're useful for manipulating large data files. I use them when prepping files for R and similar tools. These tools were part of my 'explore D' programming exercises. The tools are here: https://github.com/eBay/tsv-utils-dlang They are likely of interest primarily to people regularly working with large files, though others might find the performance benchmarks of interest as well (included in the README). I'd welcome any feedback, either on the apps or the code. Intention is that the code be reasonable example programs. And, I may write a blog post about my D explorations at some point, they'd be referenced in such an article. --Jon
Re: Weak Purity Blog Post
On Monday, 28 March 2016 at 01:44:02 UTC, sarn wrote: D's implementation of functional purity supports "weak" purity - functions that can mutate arguments but are otherwise traditionally pure. I wrote a post about some of the practical benefits of this kind of purity: https://theartofmachinery.com/2016/03/28/dirtying_pure_functions_can_be_useful.html Nice article. A suggestion: The point about improved testability when designing for purity is well made. In D, this is further supported by the ability to write and place unit tests alongside the functions themselves. That's my personal opinion at least - because unit tests are so easy to write in D, it encourages design for testability. My suggestion is to add a note about this to the post. --Jon
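As a small illustration of the combination being praised (a made-up example, not from the article): a weakly pure function may mutate its arguments, and a D unittest block can sit right next to it.

```d
// Weakly pure: mutates data reachable through its parameter, but reads and
// writes no global state, so callers keep the reasoning benefits of purity.
pure void scaleInPlace(double[] xs, double k)
{
    foreach (ref x; xs)
        x *= k;
}

// Compiled and run with -unittest; lives beside the function it tests.
unittest
{
    auto v = [1.0, 2.0, 3.0];
    scaleInPlace(v, 2.0);
    assert(v == [2.0, 4.0, 6.0]);
}
```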
Re: Pitching D to a gang of Gophers
On Saturday, 12 March 2016 at 08:09:41 UTC, Dmitry Olshansky wrote: On 05-Mar-2016 14:05, Dmitry Olshansky wrote: Obligatory slides: http://slides.com/dmitryolshansky/deck/fullscreen/ Very nice slide deck. Thanks for publishing. --Jon
Re: Speed kills
On Wednesday, 9 March 2016 at 20:30:10 UTC, Jon D wrote: I've seen a few cases while exploring D. Turns out there are issues filed for each of the performance issues I mentioned:

* Lower casing strings: https://issues.dlang.org/show_bug.cgi?id=11229
* Large associative arrays: https://issues.dlang.org/show_bug.cgi?id=2504
* Associative arrays - Checking membership with mutable values (char arrays) rather than strings (immutable): https://issues.dlang.org/show_bug.cgi?id=15038
Re: Speed kills
On Tuesday, 8 March 2016 at 14:14:25 UTC, ixid wrote: Since I posted this thread I've learned std.algorithm.sum is 4 times slower than a naive loop sum. Even if this is for reasons of accuracy this is exactly what I am talking about - this is a hidden iceberg of terrible performance that will reflect poorly on D. That's so slow the function needs a health warning. I've seen a few cases while exploring D. Not all fully researched (apologies for that), but since there appears to be interest in identification I'll list them.

* Lower-casing strings (likely upper-casing too), and some character type checks. Routines like toLower and asLowerCase call functions that work for all unicode characters. I was able to create much faster versions by checking if the character was ascii, then calling either the ascii version or the more general version. The same is true for a few routines like isNumber. Some have the ascii check optimization built in, but not all. If this optimization is added, it might also be useful to add a few common combinations (or a generic solution, if that's feasible). For example, to check if a character is alpha-numeric, one currently ORs two tests from the standard library, isAlpha and isNumber. Putting in an ascii optimization check requires putting it before doing the OR, rather than inside the tests being ORed.

* Large associative arrays. When associative arrays get beyond about 10 million entries, performance starts to decline. I believe this is due to resizing the arrays. It's worse with strings as keys than integers as keys. Having a way to reserve capacity may help under some circumstances.

* Associative arrays - Converting keys to immutable versions for lookup. Associative arrays want immutable values as keys. As far as I can tell, immutable values are also required when performing a lookup, even if a new entry won't be stored.
A couple apps I've written walk through large lists of text values, naturally available as char[] because they are read from input streams. To test presence in an associative array, it's necessary to copy them to immutable strings first. I haven't fully researched this one, but my experience is that copying from char[] to string becomes a meaningful cost. On the surface, this appears to be an optimization opportunity, to create the immutable strings only when actually storing a new value. --Jon
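The optimization described, copying to an immutable string only when actually storing a new key, can be hand-rolled at the call site. A sketch (a hypothetical snippet, not code from the utilities; the cast is defensible only because the AA does not retain the key when a lookup misses):

```d
import std.stdio : writeln;

void main()
{
    size_t[string] counts;
    char[] buf = "apple".dup;   // stands in for a reusable input-line buffer

    // Hash lookup without allocating: cast to immutable for the lookup only,
    // and idup (copy to an immutable string) only when storing a new key.
    if (auto p = cast(immutable(char)[]) buf in counts)
        (*p)++;
    else
        counts[buf.idup] = 1;

    writeln(counts.length);
}
```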
Re: Get memory usage report from GC
On Saturday, 20 February 2016 at 05:34:01 UTC, tcak wrote: On Saturday, 20 February 2016 at 05:33:00 UTC, tcak wrote: Is there any way (I checked core.memory already) to collect a report about memory usage from the garbage collector? So, I can see a list of pointer and length information. Since getting this information would require another memory area in heap, it could be like logging when the report is asked. My long running but idle program starts using 41.7% of memory (that's close to 3GB), and it is not obvious whether the memory is allocated by a single variable, or many variables. My mistake, it is close to 512MB. Doesn't sound like precisely what you want, but there are summary reports of GC activity available via the "--DRT-gcopt=profile:1" command line option. More info at: http://dlang.org/spec/garbage.html --Jon
Re: Scala Spark-like RDD for D?
On Wednesday, 17 February 2016 at 02:32:01 UTC, bachmeier wrote: You can discuss here, but there is also a gitter room https://gitter.im/DlangScience/public Also, I've got a project that embeds R inside D http://lancebachmeier.com/rdlang/ It's not quite as good a user experience as others because I have limited time for things not related to work. I've got an older project to embed D inside R, but it hasn't been updated in a while and it's Linux only. https://bitbucket.org/bachmeil/dmdinline2 Excellent, thanks, I'll check these out. --Jon
Re: Scala Spark-like RDD for D?
On Tuesday, 16 February 2016 at 16:27:27 UTC, bachmeier wrote: On Monday, 15 February 2016 at 11:09:10 UTC, data pulverizer wrote: As an alternative are there plans for parallel/cluster computing frameworks for D? You can use MPI: https://github.com/DlangScience/OpenMPI FWIW, I'm interested in the wider topic of incorporating D into data science environments also. Sounds as if there are several interesting projects in the area, but so far my understanding of them is limited. Perhaps the forum isn't the best place to discuss, but if there happen to be any blog posts or other descriptions, it'd be great to get links. --Jon
Re: Reserving capacity in associative arrays
On Tuesday, 16 February 2016 at 19:49:55 UTC, H. S. Teoh wrote: On Tue, Feb 16, 2016 at 07:34:07PM +0000, Jon D via Digitalmars-d-learn wrote: On Tuesday, 16 February 2016 at 16:37:07 UTC, Steven Schveighoffer wrote: >On 2/14/16 10:22 PM, Jon D wrote: >>Is there a way to reserve capacity in associative arrays? >>[snip] >>The underlying implementation of associative arrays appears to take >>an initial number of buckets, and there's a private resize() method, >>but it's not clear if there's a public way to use these. Rehashing (aa.rehash) would resize the number of buckets, but if you don't already have the requisite number of keys, it wouldn't help. Thanks for the reply and the detailed example for manually controlling GC. I haven't experimented with taking control over GC that way. Regarding reserving capacity, the relevant method is aa.resize(), not aa.rehash(). See: https://github.com/D-Programming-Language/druntime/blob/master/src/rt/aaA.d#L141. This allocates space for the buckets; it doesn't matter whether the keys are known. Note that every time the buckets array is resized, the old bucket array is walked and elements reinserted. Preallocating a large bucket array would avoid this. See also the private constructor in the same file (line 51). It takes an initial size. --Jon
Re: Reserving capacity in associative arrays
On Tuesday, 16 February 2016 at 16:37:07 UTC, Steven Schveighoffer wrote: On 2/14/16 10:22 PM, Jon D wrote: Is there a way to reserve capacity in associative arrays? [snip] The underlying implementation of associative arrays appears to take an initial number of buckets, and there's a private resize() method, but it's not clear if there's a public way to use these. There is not a public way to access these methods unfortunately. It would be a good addition to druntime I believe. Recently, I added a clear method to the AA, which does not reduce capacity. So if you frequently build large AAs, and then throw them away, you could instead reuse the memory. My programs build AAs lasting the lifetime of the program. I would caution to be sure of this cause, however, before thinking it would solve the problem. The AA not only uses an array for buckets, but allocates a memory location for each element as well. I'm often wrong when I assume what the problem is when it comes to GC issues... Completely agree. After posting I decided to take a more methodical look. Not finished yet, but I can share part of it. The key thing so far is a noticeable step function in GC costs as AA size grows (likely not a surprise). My programs work with large data sets. Size is open-ended; what I'm trying to do is get an idea of the data set sizes they will handle reasonably. For purposes of illustration, word-count is a reasonable proxy for what I'm doing. It was in this context that I saw significant performance drop-off after 'size_t[string]' AAs reached about 10 million entries. I've started measuring with a simple program. Basically: StopWatch sw; sw.start; size_t[size_t] counts; foreach (i; 0..iterations) counts[uniform(0, uniqMax)]++; sw.stop; Same thing with string as key ('size_t[string]') AAs. 'iterations' and 'uniqMax' are varied between runs. GC stats are printed (via "--DRT-gcopt=profile:1"), plus timing and AA size. (Runs use LDC 17, release mode compiles, a fast 16GB MacBook.)
For the integer as key case (size_t[size_t]), there are notable jumps in GC total time and GC max pause time as AA size crosses specific size thresholds. This makes sense, as the AA needs to grow. Approximate steps:

| entries | gc_total (ms) | gc_max_pause (ms) |
|---------|---------------|-------------------|
| 2M      | 30            | 60                |
| 4M      | 200           | 100               |
| 12M     | 650           | 330               |
| 22M     | 1650          | 750               |
| 44M     | 5300          | 3200              |

Iterations didn't matter, and gc total time and gc max time were largely flat between these jumps. This suggests AA resize is the likely driver, and that preallocating a large size might address it. To the point about being sure about cause - my programs use strings as keys, not integers. The performance drop-off with strings was quite a bit more significant than with integers. That analysis seems a bit trickier; I'm not done with that yet. Different memory allocation, perhaps effects from creating short-lived, temporary strings to test AA membership. Could easily be that string use or the combo of AAs with strings as keys is a larger effect. The other thing that jumps out from the table is that the GC max pause time gets to be multiple seconds. Not an issue for my tools, which aren't interactive at those points, but it would be a significant issue for many interactive apps. --Jon
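A self-contained version of the measurement loop above might look like the following (module paths per current releases; in 2016 StopWatch lived in std.datetime. The iteration counts here are made-up small values, and the original runs also passed "--DRT-gcopt=profile:1" to collect the GC stats):

```d
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.random : uniform;
import std.stdio : writefln;

void main()
{
    enum iterations = 1_000_000;
    enum uniqMax = 500_000;   // bounds the number of distinct keys

    auto sw = StopWatch(AutoStart.yes);
    size_t[size_t] counts;
    foreach (i; 0 .. iterations)
        counts[uniform(0, uniqMax)]++;   // AA grows as new keys appear
    sw.stop();

    writefln("%s entries built in %s ms", counts.length, sw.peek.total!"msecs");
}
```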
Re: Reserving capacity in associative arrays
On Tuesday, 16 February 2016 at 17:05:11 UTC, Basile B. wrote: On Tuesday, 16 February 2016 at 16:37:07 UTC, Steven Schveighoffer wrote: There is not a public way to access these methods unfortunately. It would be a good addition to druntime I believe. -Steve After reading the topic I've added this enhancement proposal, though I'm not quite sure if it's possible: https://issues.dlang.org/show_bug.cgi?id=15682 The idea is to concatenate smaller AAs into the destination. There is also this: https://issues.dlang.org/show_bug.cgi?id=2504
Re: Reserving capacity in associative arrays
On Monday, 15 February 2016 at 05:29:23 UTC, sigod wrote: On Monday, 15 February 2016 at 03:22:44 UTC, Jon D wrote: Is there a way to reserve capacity in associative arrays? [snip] Maybe try using this: http://code.dlang.org/packages/aammm Thanks, I wasn't aware of this package. I'll give it a try. --Jon
Reserving capacity in associative arrays
Is there a way to reserve capacity in associative arrays? In some programs I've been writing I've been getting reasonable performance up to about 10 million entries, but beyond that performance is impacted considerably (say, 30 million or 50 million entries). GC stats (via the "--DRT-gcopt=profile:1" option) indicate dramatic increases in gc time, which I'm assuming comes from resizing the underlying hash table. I'm guessing that by preallocating a large size the performance degradation would not be quite so dramatic. The underlying implementation of associative arrays appears to take an initial number of buckets, and there's a private resize() method, but it's not clear if there's a public way to use these. --Jon
Re: Vision for the first semester of 2016
On Monday, 25 January 2016 at 02:37:40 UTC, Andrei Alexandrescu wrote: Hot off the press! http://wiki.dlang.org/Vision/2016H1 -- Andrei A couple comments: a) No mention of targeting increased organizational participation (academic, corporate, etc). Not trying to suggest it should or shouldn't be a goal. Just that if it is a goal that meaningful effort will be directed toward in H1, it'd be worth including in the writeup. b) More specificity in the roadmap and priorities, to the extent they are known - As a potential D adopter, it'd be useful to have better insight into where the language might be a year or two out. For example, what forms of C++ integration might be available, or whether the major components of the standard library are likely to be available as @nogc. However, it's hard to discern this from the writeup. Perhaps in many cases it would be premature to establish such goals, but to the extent there has been concrete thought it'd be useful to write it up. This comment is similar to a number of others suggesting a preference for more concrete goals. --Jon
Difference between toLower() and asLowerCase() for strings?
I'm trying to identify the preferred ways to lower-case a string. In std.uni there are two functions that return the lower-case form of a string: toLower() and asLowerCase(). There is also toLowerInPlace(). I'm having trouble figuring out what the relationship is between these, and when to prefer one over the other. Both take strings; asLowerCase also takes a range. Otherwise, I couldn't find the differences in the documentation. The implementations are apparently different, but it's not clear what the real difference is. Are there reasons to prefer one over the other? --Jon
Re: Difference between toLower() and asLowerCase() for strings?
On Sunday, 24 January 2016 at 21:04:46 UTC, Adam D. Ruppe wrote: On Sunday, 24 January 2016 at 20:56:20 UTC, Jon D wrote: I'm trying to identify the preferred ways to lower case a string. In std.uni there are two functions that return the lower case form of a string: toLower() and asLowerCase(). There is also toLowerInPlace(). toLower will allocate a new string, leaving the original untouched. toLowerInPlace will modify the existing string. asLowerCase will return the modified data as you iterate over it, but will not actually allocate the new string. [snip...] As a general rule, the asLowerCase (etc.) version should be your first go since it is the most efficient. But the others are around for convenience in cases where you need a new string built anyway. Great explanation, thank you!
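A small sketch contrasting the three behaviors described above (the inputs are just illustrative):

```d
import std.algorithm : equal;
import std.uni : asLowerCase, toLower, toLowerInPlace;

void main()
{
    string s = "Hello World";

    // toLower: allocates and returns a new string; original untouched.
    assert(s.toLower == "hello world");
    assert(s == "Hello World");

    // asLowerCase: a lazy range; no new string is allocated unless you
    // ask for one (e.g. via std.conv.to!string).
    assert(s.asLowerCase.equal("hello world"));

    // toLowerInPlace: mutates its argument, so it needs a mutable buffer.
    char[] buf = "Hello World".dup;
    toLowerInPlace(buf);
    assert(buf == "hello world");
}
```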
Re: Speed of csvReader
On Thursday, 21 January 2016 at 22:20:28 UTC, H. S. Teoh wrote: On Thu, Jan 21, 2016 at 10:09:24PM +, Jon D via Digitalmars-d-learn wrote: [...] FWIW - I've been implementing a few programs manipulating delimited files, e.g. tab-delimited. Simpler than CSV files because there is no escaping inside the data. I've been trying to do this in relatively straightforward ways, e.g. using byLine rather than byChunk. (Goal is to explore the power of D standard libraries). I've gotten significant speed-ups in a couple different ways: * DMD libraries 2.068+ - byLine is dramatically faster * LDC 0.17 (alpha) - Based on DMD 2.068, and faster than the DMD compiler While byLine has improved a lot, it's still not the fastest thing in the world, because it still performs (at least) one OS roundtrip per line, not to mention it will auto-reencode to UTF-8. If your data is already in a known encoding, reading in the entire file and casting to (|w|d)string then splitting it by line will be a lot faster, since you can eliminate a lot of I/O roundtrips that way. No disagreement, but I had other goals. At a high level, I'm trying to learn and evaluate D, which partly involves understanding the strengths and weaknesses of the standard library. From this perspective, byLine was a logical starting point. More specifically, the tools I'm writing are often used in unix pipelines, so input can be a mixture of standard input and files. And, the files can be arbitrarily large. In these cases, reading the entire file is not always appropriate. Buffering usually is, and my code knows when it is dealing with files vs standard input and could handle these differently. However, standard library code could handle these distinctions as well, which was part of the reason for trying the straightforward approach. Aside - Despite the 'learning D' motivation, the tools are real tools, and writing them in D has been a clear win, especially with the byLine performance improvements in 2.068.
Re: Speed of csvReader
On Thursday, 21 January 2016 at 09:39:30 UTC, data pulverizer wrote: I have been reading large text files with D's csv file reader and have found it slow compared to R's read.table function which is not known to be particularly fast. FWIW - I've been implementing a few programs manipulating delimited files, e.g. tab-delimited. Simpler than CSV files because there is no escaping inside the data. I've been trying to do this in relatively straightforward ways, e.g. using byLine rather than byChunk. (Goal is to explore the power of D standard libraries). I've gotten significant speed-ups in a couple different ways: * DMD libraries 2.068+ - byLine is dramatically faster * LDC 0.17 (alpha) - Based on DMD 2.068, and faster than the DMD compiler * Avoid utf-8 to dchar conversion - This conversion often occurs silently when working with ranges, but is generally not needed when manipulating data. * Avoid unnecessary string copies. e.g. Don't gratuitously convert char[] to string. At this point performance of the utilities I've been writing is quite good. They don't have direct equivalents with other tools (such as gnu core utils), so a head-to-head is not appropriate, but generally it seems the tools are quite competitive without needing to do my own buffer or memory management. And, they are dramatically faster than the same tools written in perl (which I was happy with). --Jon
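As a sketch of the byLine-based approach described above (a minimal illustration, not one of the actual tools; the field handling is left as a comment). Note that byLine yields a reused char[] buffer, which is where the "avoid unnecessary string copies" point comes in:

```d
import std.algorithm : splitter;
import std.stdio : stdin, writeln;

void main()
{
    size_t lineCount;
    foreach (line; stdin.byLine)            // char[] buffer, reused per line
    {
        foreach (field; line.splitter('\t')) // lazy split, no allocation
        {
            // Work on each char[] slice here; call .idup only if a field
            // must outlive the current line.
        }
        ++lineCount;
    }
    writeln(lineCount, " lines");
}
```

Working directly on the char[] slices (rather than converting to string or iterating by dchar) is what avoids both the copies and the silent utf-8 to dchar decoding mentioned above.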
function argument accepting function or delegate?
My underlying question is how to compose functions taking functions as arguments, while allowing the caller the flexibility to pass either a function or delegate. Simply declaring an argument as either a function or delegate seems to prohibit the other. Overloading works. Are there better ways? An example:

auto callIntFn(int function(int) f, int x) { return f(x); }
auto callIntDel(int delegate(int) f, int x) { return f(x); }
auto callIntFnOrDel(int delegate(int) f, int x) { return f(x); }
auto callIntFnOrDel(int function(int) f, int x) { return f(x); }

void main(string[] args)
{
    alias AddN = int delegate(int);
    AddN makeAddN(int n) { return x => x + n; }

    auto addTwo = makeAddN(2);               // Delegate
    int function(int) addThree = x => x + 3; // Function

    // assert(callIntFn(addTwo, 4) == 6);    // Compile error
    // assert(callIntDel(addThree, 4) == 7); // Compile error
    assert(callIntDel(addTwo, 4) == 6);
    assert(callIntFn(addThree, 4) == 7);
    assert(callIntFnOrDel(addTwo, 4) == 6);
    assert(callIntFnOrDel(addThree, 4) == 7);
}

--Jon
Re: function argument accepting function or delegate?
On Sunday, 17 January 2016 at 06:49:23 UTC, rsw0x wrote: On Sunday, 17 January 2016 at 06:27:41 UTC, Jon D wrote: My underlying question is how to compose functions taking functions as arguments, while allowing the caller the flexibility to pass either a function or delegate. [...] Templates are an easy way.
---
auto call(F, Args...)(F fun, auto ref Args args)
{
    return fun(args);
}
---
Would probably look nicer with some constraints from std.traits. Thanks much, that works!
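Building on the template suggestion above, a version with a std.traits constraint might look like this (a sketch; 'call' is just the name from the reply, and the constraint accepts function pointers, delegates, and anything else callable):

```d
import std.traits : isCallable;

auto call(F, Args...)(F fun, Args args)
if (isCallable!F)
{
    return fun(args);
}

void main()
{
    int function(int) f = x => x + 3;  // function pointer
    int delegate(int) d = x => x + 2;  // delegate

    // One template handles both; F is deduced per call site.
    assert(call(f, 4) == 7);
    assert(call(d, 4) == 6);
}
```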
Re: Silicon Valley D Meetup December 17, 2015
On Friday, 18 December 2015 at 16:01:48 UTC, Andrei Alexandrescu wrote: On 12/17/2015 10:07 PM, Ali Cehreli wrote: On Thursday, 17 December 2015 at 17:41:30 UTC, Ali Çehreli wrote: On 12/12/2015 05:03 PM, Ali Çehreli wrote: Our guest speaker is Steven Schveighoffer. He will present "Mutability wildcards in D": How was it? -- Andrei From a newcomer's perspective (my 2nd meet-up) - Excellent. Steve's presentation improved my understanding of the language, and the opportunity for discussions with core members of the D community is fantastic. Thanks to Steve, Ali, and Truedat for putting this together. --Jon
Re: We need better documentation for functions with ranges and templates
On Monday, 14 December 2015 at 19:04:46 UTC, bachmeier wrote: Something has to be done with the documentation for Phobos functions that involve ranges and templates. Many useful ideas in this thread. One I don't recall seeing - a standard way to denote whether a routine is lazy or eager. I'm finding this to be a key piece of information. Many standard library routines document this in the description, but presence and presentation is not very consistent. It'd be nice to have this presented in a standard way for routines operating on ranges. --Jon
Re: Why should file names intended for executables be valid identifiers?
On Tuesday, 15 December 2015 at 03:31:18 UTC, Shriramana Sharma wrote: For instance, hyphens are often used as part of executable names on Linux, but if I do this: $ dmd usage-printer.d I get the following error: usage-printer.d: Error: module usage-printer has non-identifier characters in filename, use module declaration instead Try adding the line: module usage_printer; at the top of the file. This overrides the default module name (same as file name). --Jon
Re: Reason for 'static struct'
On Wednesday, 9 December 2015 at 21:23:03 UTC, Daniel Kozák wrote: V Wed, 09 Dec 2015 21:10:43 + Jon D via Digitalmars-d-learn <digitalmars-d-learn@puremagic.com> wrote: There is a fair bit of range related code in the standard library structured like: auto MyRange(Range)(Range r) if (isInputRange!Range) { static struct Result { private Range source; // define empty, front, popFront, etc } return Result(r); } I'm curious about what declaring the Result struct as 'static' does, and if there are use cases where it would be better to exclude the static qualifier. --Jon It makes it a non-nested struct: https://dlang.org/spec/struct.html#nested Thanks. So, in the example above, would the advantage be that 'static' avoids saving the enclosing state, which is not needed?
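To illustrate the difference discussed above: a nested (non-static) struct carries a hidden pointer to the enclosing function's stack frame so it can reference locals directly, while a static struct does not and must store any state it needs as members. A minimal sketch (the names here are illustrative):

```d
import std.algorithm : equal;
import std.range : isInputRange;

auto countUpTo(int limit)
{
    // With 'static', Result has no hidden context pointer, so the state
    // it needs ('limit') must be a member, passed in at construction.
    static struct Result
    {
        int i;
        int limit;
        @property bool empty() { return i >= limit; }
        @property int front() { return i; }
        void popFront() { ++i; }
    }
    // Without 'static', Result would instead carry a hidden pointer to
    // countUpTo's frame (making the struct larger) just so its methods
    // could read the local 'limit' directly.
    return Result(0, limit);
}

void main()
{
    static assert(isInputRange!(typeof(countUpTo(1))));
    assert(countUpTo(3).equal([0, 1, 2]));
}
```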
Reason for 'static struct'
There is a fair bit of range related code in the standard library structured like:

auto MyRange(Range)(Range r)
if (isInputRange!Range)
{
    static struct Result
    {
        private Range source;
        // define empty, front, popFront, etc.
    }
    return Result(r);
}

I'm curious about what declaring the Result struct as 'static' does, and if there are use cases where it would be better to exclude the static qualifier. --Jon
Re: block file reads and lazy utf-8 decoding
On Thursday, 10 December 2015 at 00:36:27 UTC, Jon D wrote: Question I have is if there is a better way to do this. For example, a different way to construct the lazy 'decodeUTF8Range' rather than writing it out in this fashion. A further thought - The decodeUTF8Range function is basically constructing a lazy wrapper range around decodeFront, which is effectively combining a 'front' and 'popFront' operation. So perhaps a generic way to compose a wrapper for such functions.

auto decodeUTF8Range(Range)(Range charSource)
if (isInputRange!Range && is(Unqual!(ElementType!Range) == char))
{
    static struct Result
    {
        private Range source;
        private dchar next;
        bool empty = false;

        dchar front() @property { return next; }

        void popFront()
        {
            if (source.empty)
            {
                empty = true;
                next = dchar.init;
            }
            else
            {
                next = source.decodeFront;
            }
        }
    }
    auto r = Result(charSource);
    r.popFront;
    return r;
}
block file reads and lazy utf-8 decoding
I want to combine block reads with lazy conversion of utf-8 characters to dchars. Solution I came with is in the program below. This works fine. Has good performance, etc. Question I have is if there is a better way to do this. For example, a different way to construct the lazy 'decodeUTF8Range' rather than writing it out in this fashion. There is quite a bit of power in the library and I'm still learning it. I'm wondering if I overlooked a useful alternative. --Jon Program: --- import std.algorithm: each, joiner, map; import std.conv; import std.range; import std.stdio; import std.traits; import std.utf: decodeFront; auto decodeUTF8Range(Range)(Range charSource) if (isInputRange!Range && is(Unqual!(ElementType!Range) == char)) { static struct Result { private Range source; private dchar next; bool empty = false; dchar front() @property { return next; } void popFront() { if (source.empty) { empty = true; next = dchar.init; } else { next = source.decodeFront; } } } auto r = Result(charSource); r.popFront; return r; } void main(string[] args) { if (args.length != 2) { writeln("Provide one file name."); return; } ubyte[1024*1024] rawbuf; auto inputStream = args[1].File(); inputStream .byChunk(rawbuf)// Read in blocks .joiner // Join the blocks into a single input char range .map!(a => to!char(a)) // Cast ubyte to char for decodeFront. Any better ways? .decodeUTF8Range// utf8 to dchar conversion. .each; // Real work goes here. writeln("done"); }
Re: Wiki article: Starting as a Contributor
On Tuesday, 1 December 2015 at 18:58:37 UTC, Jack Stouffer wrote: On Monday, 3 August 2015 at 21:25:35 UTC, Andrei Alexandrescu wrote: I had to set up dmd and friends on a fresh Ubuntu box, so I thought I'd document the step-by-step process: http://wiki.dlang.org/Starting_as_a_Contributor Due to a realization that there were three places where contributing info was held on the wiki, I have merged the pages into this one as best as I could. This page now holds everything someone should need to get started. I suggest also having the description of the legal aspects of contributing identified in an easier to find location. There is a brief summary of copyright assignment in the Starting as a Contributor page (http://wiki.dlang.org/Starting_as_a_Contributor#Copyright_assignment), but it's not particularly easy to find. Similarly regarding licensing. I was able to find two statements in the FAQ page ("Is D open source", "Why does the standard library use the boost license? Why not public domain"), but it wasn't especially easy to find these. Could be I'm just looking in the wrong places for this info, but a clear link from the home page might be worthwhile. --Jon
Re: copy and array length vs capacity. (Doc suggestion?)
On Tuesday, 24 November 2015 at 01:00:40 UTC, Steven Schveighoffer wrote: On 11/23/15 7:29 PM, Ali Çehreli wrote: On 11/23/2015 04:03 PM, Steven Schveighoffer wrote: > On 11/23/15 4:29 PM, Jon D wrote: >> In the example I gave, what I was really wondering was if there is a >> difference between allocating with 'new' or with 'reserve', or with >> 'length', for that matter. That is, is there a material difference >> between: >> >> auto x = new int[](n); >> int[] y; y.length = n; > > There is no difference at all, other than the function that is called > (the former will call an allocation function, the latter will call a > length setting function, which then will determine if more data is > needed, and finding it is, call the allocation function). Although Jon's example above does not compare reserve, I have to ask: How about non-trivial types? Both cases above would set all elements to ..init, right? So, I think reserve would be faster if copy() knew how to take advantage of capacity. It could emplace elements instead of copying, no? I think the cost of looking up the array metadata is more than the initialization of elements to .init. However, using an Appender would likely fix all these problems. You could also use https://dlang.org/phobos/std_array.html#uninitializedArray to create the array before copying. There are quite a few options, actually :) A delegate is also surprisingly considered an output range! Because why not? So you can do this too as a crude substitute for appender (or for testing performance):

import std.range; // for iota
import std.algorithm;

void main()
{
    int[] arr;
    arr.reserve(100);
    iota(100).copy((int a) { arr ~= a; });
}

-Steve Thanks. I was also wondering if that initial allocation could be avoided. Code I was writing involved repeatedly using a buffer in a loop. I was trying out taskPool.amap, which needs a random access range. This meant copying from the input range being read.
Something like:

auto input = anInfiniteRange();
auto bufsize = workPerThread * taskPool.size();
auto workbuf = new int[](bufsize);
auto results = new int[](bufsize);
while (true)
{
    input.take(bufsize).copy(workbuf);
    input.popFrontN(bufsize);
    taskPool.amap!expensiveCalc(workbuf, workPerThread, results);
    results.doSomething();
}

I'm just writing a toy example, but it is where these questions came from. For this example, the next step would be to allow the buffer size to change while iterating. --Jon
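An alternative for the filling side is std.array.Appender, mentioned earlier in the thread: reserve once, fill through the output-range interface, then clear() to keep the memory for the next pass. A minimal sketch (the iota source stands in for the real input range):

```d
import std.algorithm : copy;
import std.array : appender;
import std.range : iota, take;

void main()
{
    auto buf = appender!(int[])();
    buf.reserve(100);                  // one up-front allocation

    iota(1000).take(100).copy(buf);    // Appender is an output range
    assert(buf.data.length == 100);

    buf.clear();                       // keeps capacity, resets length
    iota(5).copy(buf);                 // reuse the same memory next pass
    assert(buf.data == [0, 1, 2, 3, 4]);
}
```

Note that reserve must come before the copy here: Appender allocates its shared store on first use, and copies of the struct made inside copy() then append into that same store.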
Re: copy and array length vs capacity. (Doc suggestion?)
On Monday, 23 November 2015 at 15:19:08 UTC, Steven Schveighoffer wrote: On 11/21/15 10:19 PM, Jon D wrote: On Sunday, 22 November 2015 at 00:31:53 UTC, Jonathan M Davis wrote: Honestly, arrays suck as output ranges. They don't get appended to; they get filled, and for better or worse, the documentation for copy is probably assuming that you know that. If you want your array to be appended to when using it as an output range, then you need to use std.array.Appender. Hi Jonathan, thanks for the reply and the info about std.array.Appender. I was actually using copy to fill an array, not append. However, I also wanted to preallocate the space. And, since I'm mainly trying to understand the language, I was also trying to figure out the difference between these two forms of creating a dynamic array with an initial size: auto x = new int[](n); int[] y; y.reserve(n); If you want to change the size of the array, use length: y.length = n; This will extend y to the correct length, automatically reserving a block of data that can hold it, and allow you to write to the array. All reserve does is to make sure there is enough space so you can append that much data to it. It is not relevant to your use case. The obvious difference is that the first initializes n values, the second form does not. I'm still unclear if there are other material differences, or when one might be preferred over the other :) It was in this context that the behavior of copy surprised me, that it wouldn't operate on the second form without first filling in the elements. If this seems unclear, I can provide a slightly longer sample showing what I was doing. Setting length affects the given array, extending it if necessary. reserve is ONLY relevant if you are using appending (arr ~= x). It doesn't actually affect the "slice" or the variable you are using, at all (except to possibly point it at newly allocated space). copy uses an "output range" as its destination.
The output range supports taking elements and putting them somewhere. In the case of a simple array, putting them somewhere means assigning to the first element, and then moving to the next one. -Steve Thanks for the reply. And for your article (which Jonathan recommended). It clarified a number of things. In the example I gave, what I was really wondering was if there is a difference between allocating with 'new' or with 'reserve', or with 'length', for that matter. That is, is there a material difference between: auto x = new int[](n); int[] y; y.length = n; I can imagine that the first might be faster, but otherwise there appears no difference. As the article stresses, the question is the ownership model. If I'm understanding, both cause an allocation into the runtime managed heap. --Jon
Re: copy and array length vs capacity. (Doc suggestion?)
On Sunday, 22 November 2015 at 00:10:07 UTC, Ali Çehreli wrote: May I suggest that you improve that page. ;) If you don't already have a clone of the repo, you can do it easily by clicking the "Improve this page" button on that page. Hi Ali, thanks for the quick response. And point taken :) I hadn't noticed those buttons on the doc pages, looks very convenient. There are a couple formalities I need to look into before making contributions, even small ones, but I'll check into these. Regarding why copy() cannot use the capacity of the slice, it is because slices don't know about each other, so, copy could not let other slices know that the capacity has just been used by this particular slice. Thanks for the explanation, very helpful for understanding what's going on. --Jon
copy and array length vs capacity. (Doc suggestion?)
Something I found confusing was the relationship between array capacity and copy(). A short example:

void main()
{
    import std.algorithm: copy;

    auto a = new int[](3);
    assert(a.length == 3);
    [1, 2, 3].copy(a);   // Okay

    int[] b;
    b.reserve(3);
    assert(b.capacity >= 3);
    assert(b.length == 0);
    [1, 2, 3].copy(b);   // Error
}

I had expected that copy() would work if the target had sufficient capacity, but that's not the case. The target has to have sufficient length. If I've understood this correctly, a small change to the documentation for copy() might make this clearer. In particular, the "precondition" section: Preconditions: target shall have enough room to accomodate the entirety of source. Clarifying that "enough room" means 'length' rather than 'capacity' might be beneficial.
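For reference, the working variants suggested later in this thread: copy into an array that already has length, or append through an output range such as std.array.Appender when only capacity has been reserved. A minimal sketch:

```d
import std.algorithm : copy;
import std.array : appender;

void main()
{
    // copy needs length, not capacity: elements are assigned in place.
    auto a = new int[](3);
    [1, 2, 3].copy(a);
    assert(a == [1, 2, 3]);

    // With only reserved capacity, append through an output range instead.
    auto app = appender!(int[])();
    app.reserve(3);
    [1, 2, 3].copy(app);   // Appender appends; no prior length needed
    assert(app.data == [1, 2, 3]);
}
```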
Re: copy and array length vs capacity. (Doc suggestion?)
On Sunday, 22 November 2015 at 00:31:53 UTC, Jonathan M Davis wrote: Honestly, arrays suck as output ranges. They don't get appended to; they get filled, and for better or worse, the documentation for copy is probably assuming that you know that. If you want your array to be appended to when using it as an output range, then you need to use std.array.Appender. Hi Jonathan, thanks for the reply and the info about std.array.Appender. I was actually using copy to fill an array, not append. However, I also wanted to preallocate the space. And, since I'm mainly trying to understand the language, I was also trying to figure out the difference between these two forms of creating a dynamic array with an initial size: auto x = new int[](n); int[] y; y.reserve(n); The obvious difference is that the first initializes n values, the second form does not. I'm still unclear if there are other material differences, or when one might be preferred over the other :) It was in this context that the behavior of copy surprised me, that it wouldn't operate on the second form without first filling in the elements. If this seems unclear, I can provide a slightly longer sample showing what I was doing. --Jon
compatible types for chains of different lengths
I'd like to chain several ranges and operate on them. However, if the chains are different lengths, the data type is different. This makes it hard to use in a general way. There is likely an alternate way to do this that I'm missing. A short example:

$ cat chain.d
import std.stdio;
import std.range;
import std.algorithm;

void main(string[] args)
{
    auto x1 = ["abc", "def", "ghi"];
    auto x2 = ["jkl", "mno", "pqr"];
    auto x3 = ["stu", "vwx", "yz"];
    auto chain1 = (args.length > 1) ? chain(x1, x2) : chain(x1);
    auto chain2 = (args.length > 1) ? chain(x1, x2, x3) : chain(x1, x2);
    chain1.joiner(", ").writeln;
    chain2.joiner(", ").writeln;
}

$ dmd chain.d
chain.d(10): Error: incompatible types for ((chain(x1, x2)) : (chain(x1))): 'Result' and 'string[]'
chain.d(11): Error: incompatible types for ((chain(x1, x2, x3)) : (chain(x1, x2))): 'Result' and 'Result'

Is there a different way to do this? --Jon
Re: compatible types for chains of different lengths
On Tuesday, 17 November 2015 at 23:22:58 UTC, Brad Anderson wrote: One solution: [snip] Thanks for the quick response. Extending your example, here's another style that works and may be nicer in some cases.

import std.stdio;
import std.range;
import std.algorithm;

void main(string[] args)
{
    auto x1 = ["abc", "def", "ghi"];
    auto x2 = ["jkl", "mno", "pqr"];
    auto x3 = ["stu", "vwx", "yz"];
    auto y1 = (args.length > 1) ? x1 : [];
    auto y2 = (args.length > 2) ? x2 : [];
    auto y3 = (args.length > 3) ? x3 : [];
    chain(y1, y2, y3).joiner(", ").writeln;
}
Preferred behavior of take() with ranges (value vs reference range)
Just started looking at D, very promising! One of the first programs I constructed involved infinite sequences. A design question that showed up is whether to construct the range as a struct/value, or class/reference. It appears that structs/values are more the norm, but there are exceptions, notably refRange. I'm wondering if there are any community best practices or guidelines in this area. One key difference is the behavior of take(). If the range is a value/struct, take() does not consume elements. If it's a ref/class, it does consume elements. From a consistency perspective, it'd seem useful if the behavior was consistent as much as possible. Here's an example of the behavior differences below. It uses refRange, but the same behavior occurs if the range is created as a class rather than a struct.

import std.range;
import std.algorithm;

void main()
{
    auto fib1 = recurrence!((a,n) => a[n-1] + a[n-2])(1, 1);
    auto fib2 = recurrence!((a,n) => a[n-1] + a[n-2])(1, 1);
    auto fib3 = refRange(&fib2);

    // Struct/value based range - take() does not consume elements
    assert(fib1.take(7).equal([1, 1, 2, 3, 5, 8, 13]));
    assert(fib1.take(7).equal([1, 1, 2, 3, 5, 8, 13]));
    fib1.popFrontN(7);
    assert(fib1.take(7).equal([21, 34, 55, 89, 144, 233, 377]));

    // Reference range (fib3) - take() consumes elements
    assert(fib2.take(7).equal([1, 1, 2, 3, 5, 8, 13]));
    assert(fib3.take(7).equal([1, 1, 2, 3, 5, 8, 13]));
    assert(fib3.take(7).equal([21, 34, 55, 89, 144, 233, 377]));
    assert(fib2.take(7).equal([610, 987, 1597, 2584, 4181, 6765, 10946]));
    assert(fib2.take(7).equal([610, 987, 1597, 2584, 4181, 6765, 10946]));
}

--Jon
Re: Preferred behavior of take() with ranges (value vs reference range)
On Monday, 9 November 2015 at 02:44:48 UTC, TheFlyingFiddle wrote: On Monday, 9 November 2015 at 02:14:58 UTC, Jon D wrote: Here's an example of the behavior differences below. It uses refRange, but same behavior occurs if the range is created as a class rather than a struct. --Jon This is an artifact of struct based ranges being value types. When you use take, the range gets copied into another structure that is also a range but limits the number of elements you take from that range. ... If you want a more indepth explanation there were two talks at Dconf this year that (in part) discussed this topic. (https://www.youtube.com/watch?v=A8Btr8TPJ8c, https://www.youtube.com/watch?v=QdMdH7WX2ew&list=PLEDeq48KhndP-mlE-0Bfb_qPIMA4RrrKo&index=14) Thanks for the quick reply. The two videos were very helpful. I understood what was happening underneath (mostly), but the videos made it clear there are a number of open questions regarding reference and value ranges and how best to use them.