[jira] [Commented] (MADLIB-965) RF and DT should accept array input for feature vector
[ https://issues.apache.org/jira/browse/MADLIB-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15997602#comment-15997602 ] Frank McQuillan commented on MADLIB-965:
Perhaps the features parameter should accept a blob (a free-form list of columns) rather than being limited to a single array. This is because one cannot mix types in PG arrays, but you often want a mix of categorical and continuous variables in a DT or RF model, e.g., 'col1, col2, ... col n' where col1 is a categorical TEXT variable, col2 is an array of FLOAT8, col3 is a categorical INT variable, etc. This could also affect the list_of_features_to_exclude parameter.
> RF and DT should accept array input for feature vector
>
> Key: MADLIB-965
> URL: https://issues.apache.org/jira/browse/MADLIB-965
> Project: Apache MADlib
> Issue Type: New Feature
> Components: Module: Decision Tree, Module: Random Forest
> Reporter: Rashmi Raghu
> Priority: Minor
> Fix For: v1.12
> Attachments: DT and RF work1.ipynb
>
> We were trying to test whether the RF module could handle a column containing an array of features as input (instead of each feature in a separate column). The result was an error message, but that message is unclear as to the source of the error (i.e., is it because of the array feature input column or something else?). Example table, query and error can be found below:
> {quote}
> -- Executing query:
> DROP TABLE IF EXISTS dt_golf;
> CREATE TABLE dt_golf (
>     id integer NOT NULL,
>     "OUTLOOK" text,
>     temperature double precision,
>     humidity double precision,
>     windy text,
>     class text
> ) ;
> -- Executing query:
> INSERT INTO dt_golf (id,"OUTLOOK",temperature,humidity,windy,class) VALUES
> (1, 'sunny', 85, 85, 'false', 'Don''t Play'),
> (2, 'sunny', 80, 90, 'true', 'Don''t Play'),
> (3, 'overcast', 83, 78, 'false', 'Play'),
> (4, 'rain', 70, 96, 'false', 'Play'),
> (5, 'rain', 68, 80, 'false', 'Play'),
> (6, 'rain', 65, 70, 'true', 'Don''t Play'),
> (7, 'overcast', 64, 65, 'true', 'Play'),
> (8, 'sunny', 72, 95, 'false', 'Don''t Play'),
> (9, 'sunny', 69, 70, 'false', 'Play'),
> (10, 'rain', 75, 80, 'false', 'Play'),
> (11, 'sunny', 75, 70, 'true', 'Play'),
> (12, 'overcast', 72, 90, 'true', 'Play'),
> (13, 'overcast', 81, 75, 'false', 'Play'),
> (14, 'rain', 71, 80, 'true', 'Don''t Play');
> DROP TABLE IF EXISTS dt_golf_array;
> CREATE TABLE dt_golf_array as
> select id, array[temperature, humidity] as input_array, class
> from dt_golf
> distributed by (id);
> DROP TABLE IF EXISTS train_output, train_output_group, train_output_summary;
> SELECT madlib.forest_train('dt_golf_array',  -- source table
>                            'train_output',   -- output model table
>                            'id',             -- id column
>                            'class',          -- response
>                            'input_array',    -- features
>                            NULL,             -- exclude columns
>                            NULL,             -- grouping columns
>                            20::integer,      -- number of trees
>                            1::integer,       -- number of random features
>                            TRUE::boolean,    -- variable importance
>                            1::integer,       -- num_permutations
>                            8::integer,       -- max depth
>                            3::integer,       -- min split
>                            1::integer,       -- min bucket
>                            10::integer       -- number of splits per continuous variable
>                            );
> NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'id' as the Greenplum Database data distribution key for this table.
> HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
> query result with 1 row discarded.
> ERROR: plpy.SPIError: invalid array length (plpython.c:4648)
> DETAIL: array_of_bigint: Size should be in [1, 1e7], 0 given
> CONTEXT: Traceback (most recent call last):
>   PL/Python function "forest_train", line 42, in
>     sample_ratio
>   PL/Python function "forest_train", line 589, in forest_train
>   PL/Python function "forest_train", line 1037, in _calculate_oob_prediction
> PL/Python function "forest_train"
> ** Error **
> ERROR: plpy.SPIError: invalid array length (plpython.c:4648)
> SQL state: XX000
> Detail: array_of_bigint: Size should be in [1, 1e7], 0 given
> Context: Traceback (most recent call last):
>   PL/Python function "forest_train", line 42, in
>     sample_ratio
>   PL/Python function "forest_train", line 589, in forest_train
>   PL/Python
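The report's main complaint is that the failure only surfaces deep inside _calculate_oob_prediction, as "array_of_bigint: Size should be in [1, 1e7], 0 given", without naming the features argument. A minimal sketch of an up-front check that fails fast with a message naming the offending column follows. It assumes the caller has already looked up each source-table column's type name (for example with PostgreSQL's format_type()); validate_feature_columns and its messages are hypothetical and not part of MADlib.

```python
# Hypothetical fail-fast check for a forest_train-style "features" argument.
# Assumption: column_types maps each source-table column to its PostgreSQL
# type name (e.g. as returned by format_type()); array types end in "[]".


def validate_feature_columns(features, column_types):
    """Return the feature column names, or raise ValueError naming the problem."""
    # Naive comma split: feature expressions that contain commas are not handled.
    names = [f.strip() for f in features.split(",") if f.strip()]
    if not names:
        raise ValueError("features: no feature columns were specified")
    for name in names:
        if name not in column_types:
            raise ValueError(f"features: column '{name}' not found in source table")
        col_type = column_types[name]
        if col_type.endswith("[]"):
            raise ValueError(
                f"features: column '{name}' has array type {col_type}; "
                "list each feature as a separate column instead"
            )
    return names


types = {"id": "integer", "input_array": "double precision[]", "class": "text"}
try:
    validate_feature_columns("input_array", types)
except ValueError as err:
    print(err)
```

For the dt_golf_array example this raises an error that names input_array and its type, instead of an array-length error from the out-of-bag step.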
[jira] [Updated] (MADLIB-965) RF and DT should accept array input for feature vector
[ https://issues.apache.org/jira/browse/MADLIB-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Frank McQuillan updated MADLIB-965: --- Issue Type: New Feature (was: Bug)
-- This message was sent by Atlassian JIRA (v6.3.15#6346)
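Frank's suggestion is that the features argument accept a mix of scalar and array columns. One way to picture that is to flatten every array column into per-element features before training. The sketch below assumes array lengths are known in advance and takes the categorical/continuous split from his example (TEXT and INT categorical, FLOAT8 continuous); expand_features and the type sets are hypothetical and do not describe how MADlib actually classifies columns.

```python
# Hypothetical flattening of a mixed feature list into per-element features.
# Assumptions: array lengths are known up front, and the type-to-kind mapping
# follows the example in the comment (TEXT/INT categorical, FLOAT8 continuous).

CATEGORICAL_TYPES = {"text", "integer"}
CONTINUOUS_TYPES = {"double precision"}


def expand_features(columns):
    """columns: list of (name, base_type, array_length_or_None).

    Returns (sql_expression, kind) pairs, where kind is "categorical" or
    "continuous". Array columns become one expression per element.
    """
    expanded = []
    for name, base_type, array_length in columns:
        if base_type in CATEGORICAL_TYPES:
            kind = "categorical"
        elif base_type in CONTINUOUS_TYPES:
            kind = "continuous"
        else:
            raise ValueError(f"unsupported type {base_type!r} for column {name!r}")
        if array_length is None:
            expanded.append((name, kind))
        elif array_length < 1:
            raise ValueError(f"array column {name!r} must have at least one element")
        else:
            # ARRAY[...] values are 1-indexed in PostgreSQL by default.
            expanded.extend((f"{name}[{i}]", kind) for i in range(1, array_length + 1))
    return expanded


print(expand_features([
    ("col1", "text", None),
    ("col2", "double precision", 2),
    ("col3", "integer", None),
]))
# [('col1', 'categorical'), ('col2[1]', 'continuous'),
#  ('col2[2]', 'continuous'), ('col3', 'categorical')]
```

Applied to dt_golf_array, [("input_array", "double precision", 2)] expands to input_array[1] and input_array[2], both continuous, mirroring the original temperature and humidity columns. A fixed length per array column is assumed because a length that varies by row would not give a stable feature set.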
[jira] [Commented] (MADLIB-1098) Corrections for MADlib naming consistency
[ https://issues.apache.org/jira/browse/MADLIB-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15997235#comment-15997235 ] ASF GitHub Bot commented on MADLIB-1098: Github user rvs closed the pull request at: https://github.com/apache/incubator-madlib/pull/130
> Corrections for MADlib naming consistency
>
> Key: MADLIB-1098
> URL: https://issues.apache.org/jira/browse/MADLIB-1098
> Project: Apache MADlib
> Issue Type: Improvement
> Reporter: Rashmi Raghu
> Assignee: Rashmi Raghu
> Priority: Minor
> Fix For: v1.11
>
> Several locations (e.g. Read Me screen / Intro screen and others) which contain the name MADlib should be changed to 'Apache MADlib (Incubating)', based on observations from the community on the dev and user mailing lists (see below for excerpts from those discussions).
>
> Copying relevant excerpts from Ed's email:
> http://mail-archives.apache.org/mod_mbox/incubator-madlib-user/201705.mbox/%3CCAHAuQDyn-drvZ64%2B4MYL2A%2BbDdea%3DtTew2boSz8ChdUPH2Aj_Q%40mail.gmail.com%3E
>
> > Source miscellaneous: HAWQ_Install.txt
> >
> > Observation:
> > - The file references the product name as "MADlib" and not "Apache MADlib (Incubating)". Is this file still valid?
> >
> > CONVENIENCE BINARIES
> >
> > Mac Installer DMG file: apache-madlib-1.11-incubating-bin-Darwin.dmg
> >
> > Observation:
> > - The DMG (apache-madlib-1.11-incubating-bin-Darwin.dmg) contains a pkg file named "madlib-1.11-Darwin.pkg". Shouldn't it be called "apache-madlib-1.11-incubating-Darwin.pkg"?
> >   Similarly, the DMG base folder name is madlib-1.11.Darwin.
> >
> > Mac Installer Package
> >
> > o Introduction screen
> >   Observation:
> >   - The introduction screen identifies the product name as "MADlib". Shouldn't there be a mention of the project name being "Apache MADlib (Incubating)"?
> >
> > o Read Me screen
> >   Observation:
> >   - Similar to the initial screen, there is no mention of the Apache project except for the link to the project's wiki.
> >
> > o Remaining screens look reasonable (with the exception of no Apache references).
> >
> > o The default application window name is "Install MADlib"
> >   Observation:
> >   - Similar to the Introduction screen, should the name be "Install Apache MADlib (Incubating)"?
> >   - Look for other opportunities to reference the product name as "Apache MADlib (Incubating)".
> >
> > Linux RPM: apache-madlib-1.11-incubating-bin-Linux.rpm
> >
> > Observation:
> > - It appears the SPEC file used (possibly generated) references the product name as "madlib". Again, shouldn't there be references to the product name as "Apache MADlib" scattered about? Unfortunately, I am not sure if this should change or not. It might help for someone on the team to review other Apache projects' convenience binary RPMs to see if something should be addressed. The podling's mentor might be able to provide additional direction as well.
> >
> > This can be seen in the following "rpm -qi madlib" output:
> >
> > [root@e0f4d3349d2d MADlib]# rpm -qi madlib
> > Name        : madlib
> > Version     : 1.11
> > Release     : 1
> > Architecture: x86_64
> > Install Date: Wed May 3 04:00:10 2017
> > Group       : Development/Libraries
> > Size        : 83575356
> > License     : ASL 2.0
> > Signature   : (none)
> > Source RPM  : madlib-1.11-1.src.rpm
> > Build Date  : Tue May 2 19:03:21 2017
> > Build Host  : gpdb1.eng.pivotal.io
> > Relocations : /usr/local
> > Vendor      : MADlib
> > Summary     : Open-Source Library for Scalable in-Database Analytics
> > Description :
> > MADlib is an open-source library for scalable in-database analytics. It provides data-parallel implementations of mathematical, statistical and machine learning methods for structured and unstructured data.
> >
> > The MADlib mission: to foster widespread development of scalable analytic skills, by harnessing efforts from commercial practice, academic research, and open-source development.
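Ed's naming observations can be turned into a mechanical check on release artifact names. The sketch below encodes the pattern his email suggests, apache-madlib-<version>-incubating[-bin]-<platform>.<ext>, as a regular expression; the pattern, the accepted extensions and the helper names are assumptions drawn from the file names quoted above, not an ASF policy.

```python
# Hypothetical release-artifact naming check, assuming the convention
# apache-madlib-<version>-incubating[-bin]-<platform>.<ext> from Ed's email.
import re

NAME_PATTERN = re.compile(
    r"^apache-madlib-(?P<version>\d+(?:\.\d+)*)-incubating"
    r"(?:-bin)?-(?P<platform>[A-Za-z0-9_]+)\.(?P<ext>dmg|pkg|rpm)$"
)


def expected_pkg_name(version, platform):
    """Name proposed for the .pkg file inside the Mac DMG."""
    return f"apache-madlib-{version}-incubating-{platform}.pkg"


def check_artifact_name(filename):
    """Return a list of problems; empty if the name fits the pattern."""
    if NAME_PATTERN.match(filename):
        return []
    return [f"{filename}: expected apache-madlib-<version>-incubating[-bin]-<platform>.<ext>"]


for name in ("apache-madlib-1.11-incubating-bin-Darwin.dmg",
             "madlib-1.11-Darwin.pkg",
             expected_pkg_name("1.11", "Darwin")):
    print(name, check_artifact_name(name) or "OK")
```

Here check_artifact_name("madlib-1.11-Darwin.pkg") reports one problem, while expected_pkg_name("1.11", "Darwin") returns the name Ed proposes, "apache-madlib-1.11-incubating-Darwin.pkg".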
[jira] [Commented] (MADLIB-1098) Corrections for MADlib naming consistency
[ https://issues.apache.org/jira/browse/MADLIB-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15997225#comment-15997225 ] ASF GitHub Bot commented on MADLIB-1098: Github user rvs commented on the issue: https://github.com/apache/incubator-madlib/pull/130 Good point. Let me update the PR.
[jira] [Commented] (MADLIB-1098) Corrections for MADlib naming consistency
[ https://issues.apache.org/jira/browse/MADLIB-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15997170#comment-15997170 ] Roman Shaposhnik commented on MADLIB-1098: -- Please take a look at the PR I've just submitted. I agree with Ed's feedback that the metadata in binary packages needs to reflect the Incubating status of MADlib at ASF.
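Roman's point about package metadata could be checked against the same rpm -qi output that Ed quoted. A minimal sketch, assuming the single-line "Key : value" layout shown in that output and treating Name, Vendor and Summary as the branding fields to inspect (that field choice is an assumption, not an ASF requirement); parse_rpm_qi and missing_apache_branding are hypothetical helpers.

```python
# Hypothetical check of branding in `rpm -qi` output.
# Assumptions: single-line "Key : value" fields as in the quoted 1.11 output,
# and Name/Vendor/Summary as the fields that should mention Apache.

BRANDING_FIELDS = ("Name", "Vendor", "Summary")


def parse_rpm_qi(text):
    """Parse single-line fields up to (not including) the Description block."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # e.g. the shell prompt line or wrapped text without a colon
        key = key.strip()
        if key == "Description":
            break  # free-form text follows; stop parsing here
        fields[key] = value.strip()
    return fields


def missing_apache_branding(fields):
    """Return the branding fields whose value does not mention Apache."""
    return [k for k in BRANDING_FIELDS if "apache" not in fields.get(k, "").lower()]


# Excerpt of the 1.11 output quoted in the issue.
SAMPLE = """[root@e0f4d3349d2d MADlib]# rpm -qi madlib
Name: madlib
Version : 1.11
Install Date: Wed May 3 04:00:10 2017
Vendor : MADlib
Summary : Open-Source Library for Scalable in-Database Analytics
Description :
MADlib is an open-source library for scalable in-database analytics."""

print(missing_apache_branding(parse_rpm_qi(SAMPLE)))  # ['Name', 'Vendor', 'Summary']
```

With that excerpt, all three fields are reported because none of them mentions Apache. Splitting only on the first colon keeps values such as the install time intact.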
[jira] [Commented] (MADLIB-1098) Corrections for MADlib naming consistency
[ https://issues.apache.org/jira/browse/MADLIB-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15997165#comment-15997165 ] ASF GitHub Bot commented on MADLIB-1098: GitHub user rvs opened a pull request: https://github.com/apache/incubator-madlib/pull/130
MADLIB-1098. Corrections for MADlib naming consistency
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/rvs/incubator-madlib master
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-madlib/pull/130.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #130
commit eeed91b570120fe4d47cc2f2f07ed1aa304acc14 Author: Roman Shaposhnik Date: 2017-05-04T18:16:42Z MADLIB-1098. Corrections for MADlib naming consistency
[jira] [Comment Edited] (MADLIB-1098) Corrections for MADlib naming consistency
[ https://issues.apache.org/jira/browse/MADLIB-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15997139#comment-15997139 ] Frank McQuillan edited comment on MADLIB-1098 at 5/4/17 6:15 PM:
From Ed:
Apache Release Audit Tool (RAT):
Observation:
- I happened to open the file "CMakeLists.txt" in the root directory and noticed it does not have the standard ASF header. I know there were IP issues resolved globally for the project recently. I noticed many of them are excluded in the pom.xml file. Regardless of the IP issues, shouldn't these files contain the ASF header?

Comment from Frank:
CMakeLists.txt does not need the standard ASF header since it existed before the move to ASF, as per ASF legal guidance: https://issues.apache.org/jira/browse/LEGAL-293

From Ed:
Source miscellaneous: HAWQ_Install.txt
Observation:
- The file references the product name as "MADlib" and not "Apache MADlib (Incubating)". Is this file still valid?

Comment from Frank:
The file HAWQ_Install.txt is still valid but could be updated: https://github.com/apache/incubator-madlib/blob/master/HAWQ_Install.txt
Remove this whole paragraph since it is too old:
“Upgrading HAWQ from 1.1 to 1.2 -- In HAWQ 1.1 a portion of MADlib v0.5 came preinstalled. These functions in their original form are incompatible with HAWQ 1.2 and will be removed as part of the HAWQ 1.2 upgrade. Dependencies on MADlib 0.5 should be removed from the installation before performing the HAWQ 1.2 upgrade. When the HAWQ upgrade is complete, install MADlib 1.5 or higher and then reinstall the MADlib database objects using the madpack utility.”
Change the first occurrence of MADlib to “Apache MADlib (incubating)”, with verbiage on what “incubating” means.

was (Author: fmcquillan):
From Ed:
Apache Release Audit Tool (RAT):
Observation:
- I happened to open the file "CMakeLists.txt" in the root directory and noticed it does not have the standard ASF header. I know there were IP issues resolved globally for the project recently. I noticed many of them are excluded in the pom.xml file. Regardless of the IP issues, shouldn't these files contain the ASF header?

Comment:
CMakeLists.txt does not need the standard ASF header since it existed before the move to ASF, as per ASF legal guidance: https://issues.apache.org/jira/browse/LEGAL-293

From Ed:
Source miscellaneous: HAWQ_Install.txt
Observation:
- The file references the product name as "MADlib" and not "Apache MADlib (Incubating)". Is this file still valid?

Comment:
The file HAWQ_Install.txt is still valid but could be updated: https://github.com/apache/incubator-madlib/blob/master/HAWQ_Install.txt
Remove this whole paragraph since it is too old:
“Upgrading HAWQ from 1.1 to 1.2 -- In HAWQ 1.1 a portion of MADlib v0.5 came preinstalled. These functions in their original form are incompatible with HAWQ 1.2 and will be removed as part of the HAWQ 1.2 upgrade. Dependencies on MADlib 0.5 should be removed from the installation before performing the HAWQ 1.2 upgrade. When the HAWQ upgrade is complete, install MADlib 1.5 or higher and then reinstall the MADlib database objects using the madpack utility.”
Change the first occurrence of MADlib to “Apache MADlib (incubating)”, with verbiage on what “incubating” means.
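The RAT discussion hinges on which files must carry the standard ASF header and which are exempt. The sketch below approximates that with a plain substring check for the header's opening sentence ("Licensed to the Apache Software Foundation (ASF)") in the first lines of each file. It is not Apache RAT, and the EXEMPT set, which here holds only the root CMakeLists.txt per the LEGAL-293 guidance cited in the comment, is a hypothetical placeholder.

```python
# Hypothetical header check in the spirit of Apache RAT (this is not RAT).
from pathlib import Path

ASF_HEADER_MARKER = "Licensed to the Apache Software Foundation (ASF)"
# Hypothetical exemptions, e.g. files that predate the move to ASF (LEGAL-293).
EXEMPT = {"CMakeLists.txt"}  # paths relative to the repository root


def has_asf_header(text, max_lines=20):
    """True if the header's opening sentence appears in the first max_lines lines."""
    head = "\n".join(text.splitlines()[:max_lines])
    return ASF_HEADER_MARKER in head


def files_missing_header(files):
    """files: mapping of relative path -> contents. Returns sorted offending paths."""
    return [path for path, text in sorted(files.items())
            if path not in EXEMPT and not has_asf_header(text)]


def load_tree(root, suffixes=(".py", ".cpp", ".txt")):
    """Read matching files under root into a {relative_posix_path: text} mapping."""
    root = Path(root)
    return {p.relative_to(root).as_posix(): p.read_text(errors="replace")
            for p in root.rglob("*") if p.is_file() and p.suffix in suffixes}
```

A typical call would be files_missing_header(load_tree("path/to/checkout")). Because the exemption matches exact relative paths, a CMakeLists.txt in a subdirectory is still reported unless it is listed explicitly.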
[jira] [Assigned] (MADLIB-1098) Corrections for MADlib naming consistency
[ https://issues.apache.org/jira/browse/MADLIB-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Frank McQuillan reassigned MADLIB-1098: --- Assignee: Rashmi Raghu
[jira] [Commented] (MADLIB-1098) Corrections for MADlib naming consistency
[ https://issues.apache.org/jira/browse/MADLIB-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15997139#comment-15997139 ] Frank McQuillan commented on MADLIB-1098: - From Ed: Apache Release Audit Tool (RAT): Observation: - I happened to open the file "CMakeLists.txt" in the root directory and noticed it does not have the standard ASF header. I know there were IP issues resolved globally for the project recently. I noticed many of them are excluded in the pom.xml file. Regardless of the IP issues, shouldn't these files contain the ASF header? Comment: CMakeLists.txt does not need the standard ASF header, since it existed before the move to the ASF, as per ASF legal guidance: https://issues.apache.org/jira/browse/LEGAL-293 From Ed: == Source miscellaneous: HAWQ_Install.txt Observation: - The file references the product name as "MADlib" and not "Apache MADlib (Incubating)". Is this file still valid? Comment: The file HAWQ_Install.txt is still valid but could be updated: https://github.com/apache/incubator-madlib/blob/master/HAWQ_Install.txt Remove this whole paragraph, since it is too old: “Upgrading HAWQ from 1.1 to 1.2 -- In HAWQ 1.1 a portion of MADlib v0.5 came preinstalled. These functions in their original form are incompatible with HAWQ 1.2 and will be removed as part of the HAWQ 1.2 upgrade. Dependencies on MADlib 0.5 should be removed from the installation before performing the HAWQ 1.2 upgrade. When the HAWQ upgrade is complete, install MADlib 1.5 or higher and then reinstall the MADlib database objects using the madpack utility.” Change the first occurrence of MADlib to “Apache MADlib (incubating)”, with verbiage explaining what “incubating” means.
[jira] [Updated] (MADLIB-1098) Corrections for MADlib naming consistency
[ https://issues.apache.org/jira/browse/MADLIB-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rashmi Raghu updated MADLIB-1098: - Priority: Minor (was: Major)
[jira] [Updated] (MADLIB-1098) Corrections for MADlib naming consistency
[ https://issues.apache.org/jira/browse/MADLIB-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rashmi Raghu updated MADLIB-1098: - Priority: Major (was: Minor) Description: Several locations (e.g. Read Me screen / Intro screen and others) which contain the name MADlib should be changed to 'Apache MADlib (Incubating)'. Based on observations from the community on dev and user mailing lists (see below for excerpts from those discussions). Copying relevant excerpts from Ed's email: http://mail-archives.apache.org/mod_mbox/incubator-madlib-user/201705.mbox/%3CCAHAuQDyn-drvZ64%2B4MYL2A%2BbDdea%3DtTew2boSz8ChdUPH2Aj_Q%40mail.gmail.com%3E > == > Source miscellaneous: HAWQ_Install.txt > > Observation: > > - The file references the product name as "MADlib" and not "Apache > MADlib (Incubating)". Is this file still valid? > > == > CONVENIENCE BINARIES > -- > > -- > Mac Installer DMG file: apache-madlib-1.11-incubating-bin-Darwin.dmg > -- > > Observation: > > - The DMG (apache-madlib-1.11-incubating-bin-Darwin.dmg) contains a > pkg file named "madlib-1.11-Darwin.pkg". Shouldn't it be called > "apache-madlib-1.11-incubating-Darwin.pkg"? > > Similarly, the DMG base folder name is madlib-1.11.Darwin. > > Mac Installer Package > > o Introduction screen > > Observation: > > - The introduction screen identifies the product name as > "MADlib". Shouldn't there be a mention of the project name being > "Apache MADlib (Incubating)"? > > o Read Me screen > > Observation: > > - Similar to the initial screen, there is no mention of the Apache > project except for the link to the project's wiki. > > o Remaining screens look reasonable (with the exception of no Apache > references). > > o The default application window name is "Install MADlib" > > Observation: > > - Similar to the Introduction screen, should the name be "Install Apache > MADlib (Incubating)"? > > - Look for other opportunities to reference the product name as > "Apache MADlib (Incubating)".
> > -- > Linux RPM: apache-madlib-1.11-incubating-bin-Linux.rpm > -- > > Observation: > > - It appears the SPEC file used (possibly generated) references the > product name as "madlib". Again, shouldn't there be references to > the product name as "Apache MADlib" scattered about? > Unfortunately, I am not sure if this should change or not. It > might help for someone on the team to review other Apache projects' > convenience binary RPMs to see if something should be > addressed. The podling's mentor might be able to provide > additional direction as well. > > This can be seen in the following "rpm -qi madlib" output: > > [root@e0f4d3349d2d MADlib]# rpm -qi madlib > Name: madlib > Version : 1.11 > Release : 1 > Architecture: x86_64 > Install Date: Wed May 3 04:00:10 2017 > Group : Development/Libraries > Size: 83575356 > License : ASL 2.0 > Signature : (none) > Source RPM : madlib-1.11-1.src.rpm > Build Date : Tue May 2 19:03:21 2017 > Build Host : gpdb1.eng.pivotal.io > Relocations : /usr/local > Vendor : MADlib > Summary : Open-Source Library for Scalable in-Database > Analytics > Description : > MADlib is an open-source library for scalable in-database > analytics. It > provides data-parallel implementations of mathematical, > statistical and > machine learning methods for structured and unstructured data. > > The MADlib mission: to foster widespread development of scalable > analytic skills, by harnessing efforts from commercial practice, > academic research, and open-source development. > > To more information, please see the MADlib wiki at > https://cwiki.apache.org/confluence/display/MADLIB > > -- was: Several locations (e.g. Read Me screen / Intro screen and others) which contain the name MADlib should be changed to 'Apache MADlib (Incubating)'.
[jira] [Updated] (MADLIB-1098) Corrections for MADlib naming consistency
[ https://issues.apache.org/jira/browse/MADLIB-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Frank McQuillan updated MADLIB-1098: Priority: Minor (was: Major) > Corrections for MADlib naming consistency > - > > Key: MADLIB-1098 > URL: https://issues.apache.org/jira/browse/MADLIB-1098 > Project: Apache MADlib > Issue Type: Improvement >Reporter: Rashmi Raghu >Priority: Minor > Fix For: v1.11 > > > Several locations (e.g. Read Me screen / Intro screen and others) which > contain the name MADlib should be changed to 'Apache MADlib (Incubating)'. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (MADLIB-1098) Corrections for MADlib naming consistency
Rashmi Raghu created MADLIB-1098: Summary: Corrections for MADlib naming consistency Key: MADLIB-1098 URL: https://issues.apache.org/jira/browse/MADLIB-1098 Project: Apache MADlib Issue Type: Improvement Reporter: Rashmi Raghu Fix For: v1.11 Several locations (e.g. Read Me screen / Intro screen and others) which contain the name MADlib should be changed to 'Apache MADlib (Incubating)'.