Re: sort, .array and folding on immutable data (finding most common character in column of matrix)

2016-12-19 Thread Ali via Digitalmars-d-learn

On Monday, 19 December 2016 at 14:09:47 UTC, pineapple wrote:
This is a shortcoming of Phobos - here is a package of sorting 
algorithms including some that do not require their inputs to 
be mutable, random access, and/or finite:


https://github.com/pineapplemachine/mach.d/tree/master/mach/range/sort


Oh nice! Will take a looksies

It's worth noting that giving up eagerness, random access, etc. 
often comes with a speed penalty. It may be more efficient just 
to copy the lazy things into memory first and then sort 
in-place, as you have been doing.


True dat!




Re: sort, .array and folding on immutable data (finding most common character in column of matrix)

2016-12-19 Thread Ali via Digitalmars-d-learn
On Monday, 19 December 2016 at 12:45:48 UTC, Nicholas Wilson 
wrote:


Ah, oh well. It was nice in theory.



Indeed. Thank you for trying Nicholas :)



auto word = data.map!(reduce!max).array.map!"a[1]".array;

you want

auto word = data.map!"a[1]".map!(reduce!max).array;



Problem max has to performed on the frequency part of the tuple 
and "word" has to result in a concatenation of all the character 
parts of the highest frequencied letters. So that up there will 
result in "word" = 364782 // or something.




Re: sort, .array and folding on immutable data (finding most common character in column of matrix)

2016-12-19 Thread pineapple via Digitalmars-d-learn

On Monday, 19 December 2016 at 09:24:38 UTC, Ali wrote:
Ok so laziness stops as soon as sort is required on a range 
then? Ahh, because in place algorithms? Are there any plans in 
D to make is to that you can output copies to collections so 
that you could do something like filter.transpose.sort and just 
have it output a modified copy?


This is a shortcoming of Phobos - here is a package of sorting 
algorithms including some that do not require their inputs to be 
mutable, random access, and/or finite:


https://github.com/pineapplemachine/mach.d/tree/master/mach/range/sort

The library is permissively licensed; feel free to take out 
whatever you need.


It's worth noting that giving up eagerness, random access, etc. 
often comes with a speed penalty. It may be more efficient just 
to copy the lazy things into memory first and then sort in-place, 
as you have been doing.




Re: sort, .array and folding on immutable data (finding most common character in column of matrix)

2016-12-19 Thread Nicholas Wilson via Digitalmars-d-learn

On Monday, 19 December 2016 at 11:42:55 UTC, Ali wrote:
On Monday, 19 December 2016 at 10:03:34 UTC, Nicholas Wilson 
wrote:

[...]


The seedless version without the typeof(a)(b[0], b[1]) hack 
(with default inited T) seems to crap out with:


[...]


Ah, oh well. It was nice in theory.



[...]


auto word = data.map!(reduce!max).array.map!"a[1]".array;

you want

auto word = data.map!"a[1]".map!(reduce!max).array;

which will try to take the max of a single variable, not a tuple, 
which should succeed.


Re: sort, .array and folding on immutable data (finding most common character in column of matrix)

2016-12-19 Thread Ali via Digitalmars-d-learn
On Monday, 19 December 2016 at 10:03:34 UTC, Nicholas Wilson 
wrote:

On Monday, 19 December 2016 at 09:24:38 UTC, Ali wrote:
On Monday, 19 December 2016 at 00:11:49 UTC, Nicholas Wilson 
wrote:

[...]


Ok so laziness stops as soon as sort is required on a range 
then?


No. Because a lazy range is not random access, and therefore 
does not meet sorts requirement.



Ahh, because in place algorithms?


Yes


Are there any plans in
D to make is to that you can output copies to collections so 
that you could do something like filter.transpose.sort and 
just have it output a modified copy?


None that I know.



[...]


Hmm, for the other problem you could do

chain(only(T(dchar.max, uint.max)),range)

and still use the seedless version.


Thanks for input so far!


The seedless version without the typeof(a)(b[0], b[1]) hack (with 
default inited T) seems to crap out with:


"Unable to deduce an acceptable seed type for __lambda2 with 
element type immutable(Tuple!(dchar, uint))."



BTW: Your chain fix does indeed make the T(dchar.init, uint.init) 
seeded fold work on a seedless version of fold :D. Only problem 
is the need for the typeof hack again though.



Some more info: The following alternate version of this program 
works great.


// Swapping the tuple elements that are produced by "group" 
around is necessary for reduce!max to work.
auto data = 
import("data_06.txt").split("\n").transposed.map!`a.array.sort().group.map!"tuple(a[1], a[0])".array`.array;

void main() {
auto word = data.map!(reduce!max).array.map!"a[1]".array;
word.writeln;
}

But this stops working as soon as you take the "word" variable 
outside the scope of main and in to global scope. If you make 
everything static immutable then you get:


"Error: static assert  "Unable to deduce an acceptable seed type 
for std.algorithm.comparison.max with element type 
immutable(Tuple!(uint, dchar))."


Re: sort, .array and folding on immutable data (finding most common character in column of matrix)

2016-12-19 Thread Nicholas Wilson via Digitalmars-d-learn

On Monday, 19 December 2016 at 09:24:38 UTC, Ali wrote:
On Monday, 19 December 2016 at 00:11:49 UTC, Nicholas Wilson 
wrote:

[...]


Ok so laziness stops as soon as sort is required on a range 
then?


No. Because a lazy range is not random access, and therefore does 
not meet sorts requirement.



Ahh, because in place algorithms?


Yes


Are there any plans in
D to make is to that you can output copies to collections so 
that you could do something like filter.transpose.sort and just 
have it output a modified copy?


None that I know.



[...]


Hmm, for the other problem you could do

chain(only(T(dchar.max, uint.max)),range)

and still use the seedless version.


Thanks for input so far!


Re: sort, .array and folding on immutable data (finding most common character in column of matrix)

2016-12-19 Thread Ali via Digitalmars-d-learn
On Monday, 19 December 2016 at 00:11:49 UTC, Nicholas Wilson 
wrote:

On Sunday, 18 December 2016 at 22:26:50 UTC, Ali wrote:


1. The first line with the splitting, I need to use .array 
three times. The last one I understand is because on "line 2" 
I alias T as the type of the data, and if I don't call .array 
then it's a MapResult type which has no opIndex. Yes?


2. On "line 1" I have to call .array.sort(). Why can't I just 
call .sort() on the transposed rows?




Because sort requires a random access range.


Ok so laziness stops as soon as sort is required on a range then? 
Ahh, because in place algorithms? Are there any plans in D to 
make is to that you can output copies to collections so that you 
could do something like filter.transpose.sort and just have it 
output a modified copy?


Unfortunately arrays have a builtin property sort that for some 
reason takes precedence of std.algorithm.sort when called 
without (). The compiler error is probably because .sort is a 
mutating operation and you're working with immutable data.


Ah ok. I think it's just that array.sort doesn't have CTFE 
capabilities then: "Error: _adSort cannot be interpreted at 
compile time, because it has no available source code"




Try
 })(ImmutableOf!T.init);
or use the seedless version of fold.
may need to import std.traits(?)


Same error message :(



does `data.map!fold!"a[1] > b[1] ? a : b".map!"a[0]".array;` 
work?


Actually I guess what's stopping this is my problem above maybe? 
Since I can't jut return b because of that error. I can't use the 
seedless version of fold either because Another version of the 
problem requires the seed be T.init(dchar.max, uint.max).


Thanks for input so far!




Re: sort, .array and folding on immutable data (finding most common character in column of matrix)

2016-12-18 Thread Nicholas Wilson via Digitalmars-d-learn

On Sunday, 18 December 2016 at 22:26:50 UTC, Ali wrote:
Hey, so I have this data file that has a list of a string of 
characters separated by new lines. The task is to find the most 
common letter in each column. Ie if file is:


abc
axy
cxc

Then the letters are a (column 1), x and c.

I've written the code to do this at compile time. But I have a 
few questions about sorting, immutablity casting and the need 
to use .array. The code follows:


==
import std.stdio, std.range, std.algorithm;

// Line 1
static immutable data = 
import("data_06.txt").split("\n").transposed.map!"a.array.sort().group.array".array;


// Line 2
alias T = typeof(data[0][0]);

auto redux(T[] charData) {
return charData.fold!((a, b) {
return a[1] > b[1] ? a : typeof(a)(b[0], b[1]);
})(T.init);
}

// Line 3
static immutable word = data.map!redux.array.map!"a[0]".array;

void main() {
word.writeln;
}
==

Well mostly I'm looking for how to simplify this even further. 
So any hints there would be awesome.


And, there're a few questions I have about the code (I guess 
mainly because of my of my lack of understanding of the type 
system)


1. The first line with the splitting, I need to use .array 
three times. The last one I understand is because on "line 2" I 
alias T as the type of the data, and if I don't call .array 
then it's a MapResult type which has no opIndex. Yes?


2. On "line 1" I have to call .array.sort(). Why can't I just 
call .sort() on the transposed rows?




Because sort requires a random access range.

3. If I call .sort (without parenthesis) I get a compiler 
error. What up with that?


Unfortunately arrays have a builtin property sort that for some 
reason takes precedence of std.algorithm.sort when called without 
(). The compiler error is probably because .sort is a mutating 
operation and you're working with immutable data.




4. On "line 3" I call map with the free function "redux". In 
this function, if I return just "a", it's all good. If I return 
"b", then I get "Incompatible function/seed/element: 
__lambda2/Tuple!(dchar, uint)/immutable(Tuple!(dchar, uint))". 
So my seed is not immutable, but the elements of "data" are. I 
can understand that. But how do I work around it without having 
to do "typeof(a)(b[0], b[1])" ?


Try
 })(ImmutableOf!T.init);
or use the seedless version of fold.
may need to import std.traits(?)


5. Is there anyway to get rid of the the "alias" I have in the 
code and just use an inline lambda in my map call on "line 3"?


does `data.map!fold!"a[1] > b[1] ? a : b".map!"a[0]".array;` work?



sort, .array and folding on immutable data (finding most common character in column of matrix)

2016-12-18 Thread Ali via Digitalmars-d-learn
Hey, so I have this data file that has a list of a string of 
characters separated by new lines. The task is to find the most 
common letter in each column. Ie if file is:


abc
axy
cxc

Then the letters are a (column 1), x and c.

I've written the code to do this at compile time. But I have a 
few questions about sorting, immutablity casting and the need to 
use .array. The code follows:


==
import std.stdio, std.range, std.algorithm;

// Line 1
static immutable data = 
import("data_06.txt").split("\n").transposed.map!"a.array.sort().group.array".array;


// Line 2
alias T = typeof(data[0][0]);

auto redux(T[] charData) {
return charData.fold!((a, b) {
return a[1] > b[1] ? a : typeof(a)(b[0], b[1]);
})(T.init);
}

// Line 3
static immutable word = data.map!redux.array.map!"a[0]".array;

void main() {
word.writeln;
}
==

Well mostly I'm looking for how to simplify this even further. So 
any hints there would be awesome.


And, there're a few questions I have about the code (I guess 
mainly because of my of my lack of understanding of the type 
system)


1. The first line with the splitting, I need to use .array three 
times. The last one I understand is because on "line 2" I alias T 
as the type of the data, and if I don't call .array then it's a 
MapResult type which has no opIndex. Yes?


2. On "line 1" I have to call .array.sort(). Why can't I just 
call .sort() on the transposed rows?


3. If I call .sort (without parenthesis) I get a compiler error. 
What up with that?


4. On "line 3" I call map with the free function "redux". In this 
function, if I return just "a", it's all good. If I return "b", 
then I get "Incompatible function/seed/element: 
__lambda2/Tuple!(dchar, uint)/immutable(Tuple!(dchar, uint))". So 
my seed is not immutable, but the elements of "data" are. I can 
understand that. But how do I work around it without having to do 
"typeof(a)(b[0], b[1])" ?


5. Is there anyway to get rid of the the "alias" I have in the 
code and just use an inline lambda in my map call on "line 3"?