Sorry, "singular" was the wrong word. I meant that the matrix does not have 
full rank: the columns are not linearly independent.

# Linearly independent columns
X1 = [1 2 3;
      1 2 5;
      1 2 -8;
      1 4 23]
# Linearly dependent columns (4th row changed, so column 2 = 2 * column 1)
X2 = [1 2 3;
      1 2 5;
      1 2 -8;
      2 4 23]
y = [2.3, 5.6, 9.2, 10.5]

X1 \ y
>3-element Array{Float64,1}:
> -8.18265
>  6.94133
> -0.394898

X2 \ y
>LAPACKException(2)


This is because the linear regression problem has no unique solution when the 
matrix does not have full rank. The usual remedy is regularization (e.g. ridge 
regression), but selecting a linearly independent subset of the columns seems 
like a valid strategy too, if only prediction accuracy matters.
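
For what it's worth, here is a rough sketch of that column-selection idea using column-pivoted QR instead of rref (the "rank revealing QR" approach mentioned below); the variable names are just illustrative:

```julia
using LinearAlgebra

# Rank-deficient design matrix from above: column 2 = 2 * column 1.
X2 = [1 2 3;
      1 2 5;
      1 2 -8;
      2 4 23]
y = [2.3, 5.6, 9.2, 10.5]

# Column-pivoted QR orders the columns so that the first `r` pivots
# index a linearly independent subset, where r is the numerical rank.
F = qr(X2, ColumnNorm())
r = rank(X2)              # 2 here, since the columns are dependent
keep = sort(F.p[1:r])     # indices of an independent column subset

# Least squares on the reduced, full-rank matrix succeeds.
β = X2[:, keep] \ y
```

(On Julia versions before 1.7 the pivoted factorization is spelled `qr(X2, Val(true))`; on the 0.x versions this thread dates from, `qrfact(X2, Val{true})`.)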

Hence: isn't that a valid non-pedagogical use of rref?

Cédric

On Wednesday, September 16, 2015 at 12:13:55 AM UTC-4, Steven G. Johnson 
wrote:
>
> On Tuesday, September 15, 2015 at 10:22:47 AM UTC-4, Cedric St-Jean wrote:
>>
>> In the discussion of issue 9804 
>> <https://github.com/JuliaLang/julia/pull/9804#issuecomment-140404574>, 
>> the general consensus seemed to be:
>>
>> > Right. "rref" has no use in real numerical coding, but every use in 
>> pedagogy. 
>>
>>
>>  I had a linear regression yield a PosDefError (the X matrix contains 
>> only integers and was singular). I solved it by selecting a linearly 
>> independent subset of columns 
>> <http://math.stackexchange.com/questions/199125/how-to-remove-linearly-dependent-rows-cols>.
>>  
>> This uses the Row-Echelon-Form, rref. Is there a simpler way of doing 
>> this that I'm missing?
>>
>
> A \ x, where A is a non-square matrix, finds least-square solution via QR. 
>  This is normally the "right" way to do regression.
>
> (You can also compute the QR factorization of A explicitly to get a rank 
> estimate; google "rank revealing QR" for more info on this sort of thing.)
>
