dadanielniel opened a new pull request #523:
URL: https://github.com/apache/madlib/pull/523


   ### Module name: Linear-Regression
   
   ### JIRA: MADlib-1460
   
   ### Description:
   
   Linear regression training produces two output tables (**neither is 
optional**): 
   
   - The **primary** output table, which contains the computed coefficients.
   - A **summary** output table, which contains a single row.
   
   #### Scenario
   
   Running linear regression training in PostgreSQL on an input table that 
has **more than 2^31 records** (even if a grouping column is 
specified) fails with an "**integer out of range**" exception.
   
   #### Source
   
   **The summary table** has a column that stores **the total number of 
records** involved in the computation. The column's data type is a **signed 
integer**. However, the total number of records is computed as a **BIGINT**. 
Therefore, when the total number of records in the input table exceeds the 
maximum value of a signed 32-bit integer (i.e., 2^31 - 1), an "integer out of 
range" exception is thrown.
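
The range mismatch can be illustrated outside the database. The following is a minimal sketch, not code from this PR: it uses Python's `struct` module to emulate 4-byte and 8-byte signed storage, and the helper names `fits_in_int4` and `fits_in_int8` are hypothetical.

```python
import struct

# Largest value a signed 4-byte integer can hold.
INT4_MAX = 2**31 - 1


def fits_in_int4(n):
    """Return True if n can be stored as a signed 4-byte integer."""
    try:
        struct.pack("<i", n)  # "i" is a signed 4-byte integer
        return True
    except struct.error:
        return False


def fits_in_int8(n):
    """Return True if n can be stored as a signed 8-byte integer."""
    try:
        struct.pack("<q", n)  # "q" is a signed 8-byte integer
        return True
    except struct.error:
        return False
```

Under these assumptions, a row count of `2**31` no longer fits the 4-byte type but fits the 8-byte type comfortably, which mirrors the failure described above.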
   
   ### Solution
   
   A simple solution is to change the column's data type from a **signed 
integer** to a **BIGINT**. 
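
As a rough illustration of the kind of change involved, the sketch below builds the DDL for a one-row summary table with the count column's type as a parameter. It is an assumption-laden example: the function `summary_table_ddl` and the table and column names are illustrative and are not taken from the PR's diff.

```python
def summary_table_ddl(table_name, count_type="BIGINT"):
    """Build CREATE TABLE text for a hypothetical one-row summary table.

    count_type is the SQL type of the row-count column; BIGINT is the
    default so that counts above 2^31 - 1 can be stored.
    """
    return (
        f"CREATE TABLE {table_name} (\n"
        f"    source_table TEXT,\n"
        f"    num_rows_processed {count_type}\n"
        f");"
    )


if __name__ == "__main__":
    # Print the DDL with the wider count column.
    print(summary_table_ddl("linregr_out_summary"))
```

Passing `count_type="INTEGER"` would reproduce the narrower column that triggers the error for large inputs.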
   
   ### Test
   
   We ran the linear regression training function with and without 
the suggested modification on an input table containing between 2^31 and 2^32 
records. Without the modification, an "integer out of range" exception was 
thrown. With the modification, training completed successfully. 
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

