Re: varying results: local VS cluster

2016-04-11 Thread Stephan Ewen
Just to make sure: Most numeric programs produce varying results across
different execution. If the algorithm is correct, they should converge
towards the same solution, but it is very common that the exact solution
differs.




On Mon, Apr 11, 2016 at 10:16 AM, Aljoscha Krettek 
wrote:

> Hi,
> could you please provide a minimal example input and maybe also the output
> for parallelism=5 and parallelism=1 so that we can check.
>
> --
> aljoscha
>
> On Mon, 4 Apr 2016 at 09:52 Lydia Ickler  wrote:
>
>> Hi all,
>>
>> I have an issue regarding execution on 1 machine VS 5 machines.
>> If I execute the following code the results are not the same though I
>> would expect them to be since the input file is the same.
>> Do you have any suggestions?
>>
>> Thanks in advance!
>> Lydia
>>
>> ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
>> env.getConfig().setGlobalJobParameters(parameters);
>>
>> //read input file
>> DataSet> matrixA = readMatrix(env, 
>> parameters.get("input"));
>> //Approximate EigenVector by PowerIteration
>> //get initial vector - which equals matrixA * [1, ... , 1]
>> DataSet> initial0 = 
>> (matrixA.groupBy(0)).sum(2);
>> DataSet> maximum = initial0.maxBy(2);
>> //normalize by maximum value
>> DataSet> initial= 
>> (initial0.cross(maximum)).map(new normalizeByMax());
>>
>> //BulkIteration to find dominant eigenvector
>> IterativeDataSet> iteration = 
>> initial.iterate(1);
>>
>> DataSet> intermediate = 
>> ((matrixA.join(iteration).where(1).equalTo(0))
>> .map(new ProjectJoinResultMapper())).groupBy(0, 
>> 1)).sum(2)).groupBy(0)).sum(2)).
>> cross(matrixA.join(iteration).where(1).equalTo(0))
>> .map(new ProjectJoinResultMapper())).groupBy(0, 
>> 1)).sum(2))).groupBy(0)).sum(2)).sum(2)))
>> .map(new normalizeByMax());
>>
>> DataSet> diffs = 
>> (iteration.join(intermediate).where(0).equalTo(0)).with(new deltaFilter());
>> DataSet> eigenVector  = 
>> iteration.closeWith(intermediate,diffs);
>>
>> eigenVector.writeAsCsv(parameters.get("output"));
>> env.execute("Power Iteration");
>>
>>
>>


Re: varying results: local VS cluster

2016-04-11 Thread Aljoscha Krettek
Hi,
could you please provide a minimal example input and maybe also the output
for parallelism=5 and parallelism=1 so that we can check.

--
aljoscha

On Mon, 4 Apr 2016 at 09:52 Lydia Ickler  wrote:

> Hi all,
>
> I have an issue regarding execution on 1 machine VS 5 machines.
> If I execute the following code the results are not the same though I
> would expect them to be since the input file is the same.
> Do you have any suggestions?
>
> Thanks in advance!
> Lydia
>
> ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
> env.getConfig().setGlobalJobParameters(parameters);
>
> //read input file
> DataSet> matrixA = readMatrix(env, 
> parameters.get("input"));
> //Approximate EigenVector by PowerIteration
> //get initial vector - which equals matrixA * [1, ... , 1]
> DataSet> initial0 = 
> (matrixA.groupBy(0)).sum(2);
> DataSet> maximum = initial0.maxBy(2);
> //normalize by maximum value
> DataSet> initial= 
> (initial0.cross(maximum)).map(new normalizeByMax());
>
> //BulkIteration to find dominant eigenvector
> IterativeDataSet> iteration = 
> initial.iterate(1);
>
> DataSet> intermediate = 
> ((matrixA.join(iteration).where(1).equalTo(0))
> .map(new ProjectJoinResultMapper())).groupBy(0, 
> 1)).sum(2)).groupBy(0)).sum(2)).
> cross(matrixA.join(iteration).where(1).equalTo(0))
> .map(new ProjectJoinResultMapper())).groupBy(0, 
> 1)).sum(2))).groupBy(0)).sum(2)).sum(2)))
> .map(new normalizeByMax());
>
> DataSet> diffs = 
> (iteration.join(intermediate).where(0).equalTo(0)).with(new deltaFilter());
> DataSet> eigenVector  = 
> iteration.closeWith(intermediate,diffs);
>
> eigenVector.writeAsCsv(parameters.get("output"));
> env.execute("Power Iteration");
>
>
>


varying results: local VS cluster

2016-04-04 Thread Lydia Ickler
Hi all,

I have an issue regarding execution on 1 machine VS 5 machines.
If I execute the following code the results are not the same though I would 
expect them to be since the input file is the same.
Do you have any suggestions?

Thanks in advance!
Lydia
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
env.getConfig().setGlobalJobParameters(parameters);

//read input file
DataSet> matrixA = readMatrix(env, 
parameters.get("input"));
//Approximate EigenVector by PowerIteration
//get initial vector - which equals matrixA * [1, ... , 1]
DataSet> initial0 = 
(matrixA.groupBy(0)).sum(2);
DataSet> maximum = initial0.maxBy(2);
//normalize by maximum value
DataSet> initial= 
(initial0.cross(maximum)).map(new normalizeByMax());

//BulkIteration to find dominant eigenvector
IterativeDataSet> iteration = 
initial.iterate(1);

DataSet> intermediate = 
((matrixA.join(iteration).where(1).equalTo(0))
.map(new ProjectJoinResultMapper())).groupBy(0, 
1)).sum(2)).groupBy(0)).sum(2)).
cross(matrixA.join(iteration).where(1).equalTo(0))
.map(new ProjectJoinResultMapper())).groupBy(0, 
1)).sum(2))).groupBy(0)).sum(2)).sum(2)))
.map(new normalizeByMax());

DataSet> diffs = 
(iteration.join(intermediate).where(0).equalTo(0)).with(new deltaFilter());
DataSet> eigenVector  = 
iteration.closeWith(intermediate,diffs);

eigenVector.writeAsCsv(parameters.get("output"));
env.execute("Power Iteration");