Imran Younus created SYSTEMML-1156:
--------------------------------------

             Summary: problem with MLContext and QR
                 Key: SYSTEMML-1156
                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1156
             Project: SystemML
          Issue Type: Bug
          Components: Runtime
         Environment: Spark 1.6.2, CentOS 7

            Reporter: Imran Younus


I'm trying to run this simple DML script to compute a QR decomposition:
{code}
X = rand(rows=4, cols=2)

[H, R] = qr(X)

print(toString(H))
print ("X is of size : " + nrow(X) + "," + ncol(X))
print ("H is of size : " + nrow(H) + "," + ncol(H))
print ("R is of size : " + nrow(R) + "," + ncol(R))

n = ncol(H)

for( j in n:1 ) {
    print(j);
    V = H[,j];
    print ("V is of size : " + nrow(V) + "," + ncol(V))
    VTV = t(V) %*% V
    print(toString(VTV))
}
{code}

I ran this in CP mode and in hybrid spark mode.

In the CP mode this works perfectly fine.

But when I run it with Spark, the behavior is strange.
The problem is that inside the for loop, when I assign {{H\[,j\]}} to {{V}}, 
{{V}} ends up holding all of {{H}} instead of just one column of {{H}}. As a 
result, {{VTV}} becomes a matrix instead of the single number I want. This only 
happens inside the for loop; if I do the same thing without a for loop, there 
is no problem. It also occurs only for matrix {{H}}: if I replace {{H}} with 
{{X}}, there is no problem. Here is the output of the code when I run it with Spark:

{code}
16/12/16 11:53:27 INFO api.DMLScript: BEGIN DML run 12/16/2016 11:53:27
16/12/16 11:53:27 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
X is of size : 4,2
H is of size : 4,2
R is of size : 4,2
1.526 0.000
0.459 1.905
0.280 -0.202
0.659 0.373

2
V is of size : 4,1
3.051 1.064
1.064 3.811

1
V is of size : 4,1
3.051 1.064
1.064 3.811

16/12/16 11:53:27 INFO api.DMLScript: SystemML Statistics:
Total execution time:           0.624 sec.
Number of executed Spark inst:  0.

16/12/16 11:53:27 INFO api.DMLScript: END DML run 12/16/2016 11:53:27
{code}

As you can see from the output, the reported size of {{V}} is correct: it's 
supposed to be a column vector. But {{VTV}} is a 2x2 matrix instead of a number 
because {{V}} is just {{H}}. Printing {{V}} shows that.
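
The claim that {{V}} holds all of {{H}} can be checked against the numbers in the log. The following is a minimal Python sketch, not part of the original run. The helper names ({{transpose}}, {{matmul}}, {{column}}) are illustrative, and the input values are the 3-decimal rounded entries printed above, so agreement is only expected up to that rounding. It computes {{t(H) %*% H}} and compares it with the 2x2 matrix printed for {{VTV}}.

```python
# Entries of H as printed (rounded to 3 decimals) in the hybrid Spark log.
H = [
    [1.526, 0.000],
    [0.459, 1.905],
    [0.280, -0.202],
    [0.659, 0.373],
]

# The 2x2 matrix the Spark run printed for VTV in both loop iterations.
VTV_PRINTED = [
    [3.051, 1.064],
    [1.064, 3.811],
]


def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    """Plain matrix product of two list-of-rows matrices."""
    b_cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


def column(m, j):
    """Return column j (0-based) of m as an n x 1 matrix."""
    return [[row[j]] for row in m]


# What VTV would be if V were the whole matrix H.
HTH = matmul(transpose(H), H)

# Largest entrywise difference between t(H) %*% H and the printed VTV.
max_gap = max(
    abs(HTH[i][k] - VTV_PRINTED[i][k]) for i in range(2) for k in range(2)
)

# What VTV should be for a true column slice: a 1x1 result.
V = column(H, 1)
vtv = matmul(transpose(V), V)

print("t(H) %*% H =", HTH)
print("max gap to printed VTV =", max_gap)
print("t(V) %*% V for the second column =", vtv)
```

With these rounded inputs, every entry of {{t(H) %*% H}} is within 0.01 of the printed {{VTV}}, which is consistent with {{V}} being {{H}} itself. A real column slice gives a 1x1 result.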

Here is the correct output from CP mode:

{code}
================================================================================
================================================================================
16/12/16 11:54:56 INFO api.DMLScript: BEGIN DML run 12/16/2016 11:54:56
16/12/16 11:54:57 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
X is of size : 4,2
H is of size : 4,2
R is of size : 4,2
1.575 0.000
0.476 1.591
0.296 -0.772
0.596 0.233

2
V is of size : 4,1
3.182

1
V is of size : 4,1
3.151

16/12/16 11:54:57 INFO api.DMLScript: SystemML Statistics:
Total execution time:           0.199 sec.
Number of executed MR Jobs:     0.

16/12/16 11:54:57 INFO api.DMLScript: END DML run 12/16/2016 11:54:57
{code}
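
For comparison, the CP output can be checked the same way. This is again only a sketch: the function name {{squared_norm_of_column}} is hypothetical, and the inputs are the rounded values from the CP log. It confirms that each printed scalar matches the squared norm of the corresponding column of {{H}}, which is what {{t(V) %*% V}} gives for a single column.

```python
# Entries of H as printed (rounded to 3 decimals) in the CP log.
H_CP = [
    [1.575, 0.000],
    [0.476, 1.591],
    [0.296, -0.772],
    [0.596, 0.233],
]

# Scalars the CP run printed for VTV, keyed by the 1-based loop index j.
VTV_PRINTED_CP = {2: 3.182, 1: 3.151}


def squared_norm_of_column(m, j):
    """Compute t(V) %*% V for V = m[, j], with j 1-based as in DML."""
    return sum(row[j - 1] ** 2 for row in m)


# Difference between the recomputed and the printed value for each j.
gaps = {
    j: abs(squared_norm_of_column(H_CP, j) - printed)
    for j, printed in VTV_PRINTED_CP.items()
}

for j, gap in gaps.items():
    print("j =", j, "gap to printed VTV =", gap)
```

Both gaps stay below 0.01, as expected when the only error is the 3-decimal rounding of the printed matrix.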

[~mboehm7] [~niketanpansare]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
