[ https://issues.apache.org/jira/browse/MADLIB-1136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16091845#comment-16091845 ]
Frank McQuillan commented on MADLIB-1136:
-----------------------------------------

David pls send me an email directly to fmcquil...@pivotal.io and I will send you credentials for an S3 bucket you can upload the file to.

Frank

> Getting "ERROR: plpy.SPIError: Function" when calling linregr_train function
> with big data
> -------------------------------------------------------------------------------------------
>
> Key: MADLIB-1136
> URL: https://issues.apache.org/jira/browse/MADLIB-1136
> Project: Apache MADlib
> Issue Type: Bug
> Components: Module: Linear Regression
> Reporter: David Chen
>
> hi MADLib developers,
> we have been trying to use MADlib on Greenplum to in-database perform linear
> regression calculation on a large amount of data (789,626,243 rows of data,
> segmented in ~475,000 groups). However, after running the following SQL
> statement for a little bit more than ten minutes, the following error message
> occurs:
> SQL statement:
> SELECT madlib.linregr_train(
> 'xinos_plus_case_dlinterference_v2.temp_neighbor_pair_cqi_prb_nonull',
> 'xinos_plus_case_dlinterference_v2.taipei_lm_result_temp',
> 'average_cqi', 'array[1, prb_utilization]',
> 'main_lnbts_id,main_lncel_id,lnbts_id,lncel_id');
> Error message:
> ERROR: plpy.SPIError: Function
> "madlib.linregr_merge_states(madlib.bytea8,madlib.bytea8)": ByteString
> improperly aligned for alignment request in seek(). (UDF_impl.hpp:210) (seg2
> 59-120-199-107.HINET-IP.hinet.net:50002 pid=9137) (plpython.c:4648)
> If we downsize the input data to 269837688 rows, then the same SQL statement
> can run with successful result.
> We are not sure if what we encountered here is a bug or an issue with how we
> use this MADLib linear regression function and we will appreciate it a lot if
> you could give us some pointers.
> We are willing to provide more information about input data (e.g. data
> schema) for further investigation if needed.
> thank you very much for taking care of this issue.
> David

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)