[ 
https://issues.apache.org/jira/browse/MADLIB-1136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16094110#comment-16094110
 ] 

David Chen commented on MADLIB-1136:
------------------------------------

Hi Frank,

The sample data is being uploaded. The table schemas for the test data and 
the regression result are as follows:

CREATE TABLE xinos_plus_case_dlinterference_v2.temp_neighbor_pair_cqi_prb_nonull
(
  "time" character varying(18),
  main_lnbts_id character varying(20),
  main_lncel_id character varying(20),
  lnbts_id character varying(20),
  lncel_id character varying(20),
  average_cqi double precision,
  prb_utilization double precision
)
WITH (
  OIDS=TRUE
)
DISTRIBUTED BY ("time");
ALTER TABLE xinos_plus_case_dlinterference_v2.temp_neighbor_pair_cqi_prb_nonull
  OWNER TO gpadmin;



CREATE TABLE xinos_plus_case_dlinterference_v2.taipei_lm_result_temp
(
  main_lnbts_id character varying(20),
  main_lncel_id character varying(20),
  lnbts_id character varying(20),
  lncel_id character varying(20),
  coef double precision[],
  r2 double precision,
  std_err double precision[],
  t_stats double precision[],
  p_values double precision[],
  condition_no double precision,
  num_rows_processed bigint,
  num_missing_rows_skipped bigint,
  variance_covariance double precision[]
)
WITH (
  OIDS=FALSE
)
DISTRIBUTED BY (main_lnbts_id, main_lncel_id, lnbts_id, lncel_id);
ALTER TABLE xinos_plus_case_dlinterference_v2.taipei_lm_result_temp
  OWNER TO gpadmin;


> Getting "ERROR: plpy.SPIError: Function" when calling linregr_train function 
> with big data 
> -------------------------------------------------------------------------------------------
>
>                 Key: MADLIB-1136
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1136
>             Project: Apache MADlib
>          Issue Type: Bug
>          Components: Module: Linear Regression
>            Reporter: David Chen
>
> Hi MADlib developers,
> We have been trying to use MADlib on Greenplum to perform in-database linear 
> regression on a large amount of data (789,626,243 rows, in ~475,000 groups). 
> However, after the following SQL statement has run for a little more than 
> ten minutes, the error message below occurs:
> SQL statement: 
> SELECT madlib.linregr_train(
>     'xinos_plus_case_dlinterference_v2.temp_neighbor_pair_cqi_prb_nonull',
>     'xinos_plus_case_dlinterference_v2.taipei_lm_result_temp', 
>     'average_cqi', 'array[1, prb_utilization]',
>     'main_lnbts_id,main_lncel_id,lnbts_id,lncel_id');
> Error message:
> ERROR: plpy.SPIError: Function 
> "madlib.linregr_merge_states(madlib.bytea8,madlib.bytea8)": ByteString 
> improperly aligned for alignment request in seek(). (UDF_impl.hpp:210)  (seg2 
> 59-120-199-107.HINET-IP.hinet.net:50002 pid=9137) (plpython.c:4648)
> If we reduce the input data to 269,837,688 rows, the same SQL statement 
> runs successfully.
> We are not sure whether this is a bug or a problem with how we are using the 
> MADlib linear regression function, and we would appreciate any pointers you 
> could give us.
> We are willing to provide more information about the input data (e.g. the 
> data schema) for further investigation if needed.
> Thank you very much for looking into this issue.
> David



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
