[jira] [Commented] (MADLIB-1086) Unnest 2-D array by one level (i.e. into rows of 1-D arrays)

2017-04-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MADLIB-1086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15971914#comment-15971914
 ] 

ASF GitHub Bot commented on MADLIB-1086:


GitHub user rashmi815 opened a pull request:

https://github.com/apache/incubator-madlib/pull/116

Unnest 2d array

Array Operations: Add function to unnest 2-D arrays into rows of 1-D arrays

JIRA:  MADLIB-1086

Function to unnest a 2-D array by one level (i.e., into rows of 1-D arrays).
This is needed, for instance, in k-means, so that we can get one centroid 
per row for follow-on operations.
- Added function to array operations
- Added an example in k-means to demonstrate usage
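
As a rough illustration of the behaviour described above (not the code in this pull request), the following Python sketch unnests a 2-D array by one level. The function name `unnest_2d_by_one_level`, the 1-based row id, and the shortened sample centroids are assumptions made for the example only.

```python
from typing import Iterator, List, Sequence, Tuple


def unnest_2d_by_one_level(
    matrix: Sequence[Sequence[float]],
) -> Iterator[Tuple[int, List[float]]]:
    """Yield one (row_id, 1-D array) pair per inner array of a 2-D array.

    Hypothetical helper: the 1-based row_id is an assumption, chosen so each
    output row can be referred to individually in follow-on operations.
    """
    for row_id, row in enumerate(matrix, start=1):
        yield row_id, list(row)


# Two shortened, made-up centroids standing in for a k-means result.
centroids = [
    [13.75, 1.905, 2.425],
    [14.255, 1.9325, 2.5025],
]

rows = list(unnest_2d_by_one_level(centroids))
for row_id, centroid in rows:
    print(row_id, centroid)
```

Each output row carries exactly one centroid, which is the shape that per-centroid follow-on operations need.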

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rashmi815/incubator-madlib unnest_2d_array

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-madlib/pull/116.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #116


commit 18e562813702d12d620594598f471161a990fbbd
Author: Rashmi Raghu 
Date:   2017-04-15T00:08:17Z

Unnest function, install-check tests completed. Initial docs included

commit 2a4baffa29c8f976d3260931c1790cfc125e91f4
Author: Rashmi Raghu 
Date:   2017-04-15T06:20:01Z

Refactored names of function output columns

commit a3eae964adc84382fa674e4d95c486f472b14099
Author: Rashmi Raghu 
Date:   2017-04-17T23:45:32Z

Updated docs (array_ops and k-means) and minor update to install-check tests




> Unnest 2-D array by one level (i.e. into rows of 1-D arrays)
> 
>
> Key: MADLIB-1086
> URL: https://issues.apache.org/jira/browse/MADLIB-1086
> Project: Apache MADlib
> Issue Type: New Feature
> Components: Module: Utilities
> Reporter: Frank McQuillan
> Assignee: Rashmi Raghu
> Priority: Minor
> Fix For: v1.11
>
>
> Context
> Currently k-means returns the following
> {code}
> centroids        | {{13.75333,1.905,2.425,16.06667,90.3,2.805,2.98,0.29,2.005,5.406633,1.041667,3.318333,1020.833},
>                     {14.255,1.9325,2.5025,16.05,110.5,3.055,2.9775,0.2975,1.845,6.2125,0.9975,3.365,1378.75}}
> cluster_variance | {122999.110416013,30561.74805}
> objective_fn | 153560.858466013
> frac_reassigned  | 0
> num_iterations   | 3
> {code}
> Story
> As a data scientist, I want to unnest a 2-D array by one level (i.e., into rows 
> of 1-D arrays) in k-means, so that I can get one centroid per row for 
> follow-on operations.
> Acceptance
> 1) Add function to array operations
>    http://madlib.incubator.apache.org/docs/latest/group__grp__array.html
> 2) Add an example in k-means to demonstrate usage
>    http://madlib.incubator.apache.org/docs/latest/group__grp__kmeans.html



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (MADLIB-1089) Install check errors on HAWQ 2.2 when install MADlib on non-default schema

2017-04-17 Thread Frank McQuillan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MADLIB-1089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Frank McQuillan updated MADLIB-1089:

Fix Version/s: (was: v1.11)
   v1.12

> Install check errors on HAWQ 2.2 when install MADlib on non-default schema
> --
>
> Key: MADLIB-1089
> URL: https://issues.apache.org/jira/browse/MADLIB-1089
> Project: Apache MADlib
> Issue Type: Bug
> Components: All Modules
> Reporter: Frank McQuillan
> Priority: Minor
> Fix For: v1.12
>
> Attachments: k-means-IC-fail-on-hawq-2dot2, 
> linalg-IC-fail-on-hawq-2dot2
>
>
> Running install-check on a non-default schema in HAWQ 2.2 results in errors 
> for linalg and k-means.
> {code}
> MADlib version: 1.10.0, git revision: rel/v1.9.1-58-ga3863b6, cmake configuration time: Wed Mar  8 19:49:45 UTC 2017, build type: Release, build system: Linux-2.6.18-238.27.1.el5.hotfix.bz516490, C compiler: gcc 4.4.0, C++ compiler: g++ 4.4.0
>  PostgreSQL 8.2.15 (Greenplum Database 4.2.0 build 1) (HAWQ 2.2.0.0 build 4141) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-11) compiled on Mar 30 2017 21:45:26
> {code}
> See attached log files and summaries below:
> linalg.sql_in
> {code}
> psql:/tmp/madlib.sGu72l/linalg/test/linalg.sql_in.tmp:165: ERROR:  Function "closest_column(double precision[],double precision[],text)": Invalid distance metric provided: madlib1.squared_dist_norm2. Currently only madlib provided distance functions are supported.
> {code}
> kmeans.sql_in
> {code}
> psql:/tmp/madlib.sGu72l/kmeans/test/kmeans.sql_in.tmp:117: ERROR:  plpy.SPIError: Function "closest_column(double precision[],double precision[],text)": Invalid distance metric provided: madlib1.squared_dist_norm2. Currently only madlib provided distance functions are supported.  (seg1 ip-10-32-127-188.ore6.vpc.pivotal.io:4 pid=483012) (plpython.c:4663)
> CONTEXT:  Traceback (most recent call last):
>   PL/Python function "internal_compute_kmeanspp_seeding", line 22, in 
>     return kmeans.compute_kmeanspp_seeding(**globals())
>   PL/Python function "internal_compute_kmeanspp_seeding", line 154, in compute_kmeanspp_seeding
>   PL/Python function "internal_compute_kmeanspp_seeding", line 415, in update
> PL/Python function "internal_compute_kmeanspp_seeding"
> SQL statement "SELECT  ( SELECT madlib1.internal_compute_kmeanspp_seeding( '_madlib_kmeanspp_args', '_madlib_kmeanspp_state', textin(regclassout( $1 )),  $2 ) )"
> PL/pgSQL function "kmeanspp_seeding" line 83 at assignment
> SQL statement "SELECT  madlib1.kmeans(  $1 ,  $2 , madlib1.kmeanspp_seeding( $1 ,  $2 ,  $3 ,  $4 , NULL,  $5 ),  $4 ,  $6 ,  $7 ,  $8 )"
> PL/pgSQL function "kmeanspp" line 4 at assignment
> SQL statement "SELECT  madlib1.kmeanspp( $1 ,  $2 ,  $3 , 'madlib1.squared_dist_norm2'::VARCHAR, 'madlib1.avg'::VARCHAR, 20::INTEGER, 0.001::DOUBLE PRECISION, 1.0::DOUBLE PRECISION)"
> PL/pgSQL function "kmeanspp" line 4 at assignment
> {code}





[jira] [Assigned] (MADLIB-1077) Double check binary distribution

2017-04-17 Thread Frank McQuillan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MADLIB-1077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Frank McQuillan reassigned MADLIB-1077:
---

Assignee: Roman Shaposhnik  (was: Frank McQuillan)

> Double check binary distribution
> 
>
> Key: MADLIB-1077
> URL: https://issues.apache.org/jira/browse/MADLIB-1077
> Project: Apache MADlib
> Issue Type: Task
> Components: All Modules
> Reporter: Frank McQuillan
> Assignee: Roman Shaposhnik
> Priority: Minor
> Fix For: v1.11
>
>
> Double check that binary distribution licensing issues are all OK.
> For example, see comments from Ed Espino on 1.10 RC-2 review on thread
> https://mail-archives.apache.org/mod_mbox/incubator-madlib-user/201703.mbox/%3CCAHAuQDzarS7K4u-rOsLLhbwSHCyFn5cKSyjLinE%2BZ%3DjSpU59qw%40mail.gmail.com%3E
> {code}
>   I was performing the build from a simple perspective. Download
>   source, configure, make and glance at docs (in this order).
>   As we have dealt with auto-downloaded files in the HAWQ project, I
>   was surprised that the following packages were automatically
>   downloaded for me. On the HAWQ project we were instructed to require
>   these as pre-requisites and/or make them optional, included via
>   command line options (configure).  I'm guessing other packages would
>   have been automatically downloaded if they were not found on system
>   (eg: boost).
>   Automatically downloaded packages:
>   https://github.com/madlib/eigen/archive/branches/3.2.tar.gz
>   http://sourceforge.net/projects/pyxb/files/PyXB-1.2.4.tar.gz
>   
> Issue: As "make" was running, the following message was a bit alarming:
>PyXB: Removing GPL component from code base
>   
> This comes from the script src/patch/PyXB.sh run after PyXB source
> is downloaded.
> 
>   ...
>   echo "PyXB: Removing GPL component from code base"
>   rm -f doc/extapi.py
>   rm -f doc/extapi.pyc
> {code}





[jira] [Commented] (MADLIB-1081) Graph - add grouping to shortest path

2017-04-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MADLIB-1081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15971444#comment-15971444
 ] 

ASF GitHub Bot commented on MADLIB-1081:


Github user asfgit closed the pull request at:

https://github.com/apache/incubator-madlib/pull/113


> Graph - add grouping to shortest path
> -
>
> Key: MADLIB-1081
> URL: https://issues.apache.org/jira/browse/MADLIB-1081
> Project: Apache MADlib
> Issue Type: Improvement
> Components: Module: Graph
> Reporter: Frank McQuillan
> Assignee: Orhan Kislal
> Priority: Minor
> Fix For: v1.11
>
>
> * Add a GROUP BY column to the edge table
> * Motivation: run SSSP on the different server graphs defined for users, 
> i.e., grouped by userID
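
The grouping idea in the description above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the MADlib implementation: the edge tuple layout `(group_key, src, dest, weight)`, the function name `grouped_sssp`, and the sample data are hypothetical, and Dijkstra's algorithm (which requires non-negative weights) is swapped in for brevity because the issue does not say which SSSP algorithm is used.

```python
import heapq
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple

# Hypothetical edge-table row: (group_key, src, dest, weight).
Edge = Tuple[Hashable, int, int, float]


def grouped_sssp(
    edges: Iterable[Edge], source: int
) -> Dict[Hashable, Dict[int, float]]:
    """Run single-source shortest path once per group of edges.

    Returns {group_key: {vertex: distance from source}}. Vertices that are
    unreachable within a group are simply absent from that group's result.
    """
    # Split the single edge table into one adjacency list per group.
    graphs: Dict[Hashable, Dict[int, List[Tuple[int, float]]]] = defaultdict(
        lambda: defaultdict(list)
    )
    for group, src, dest, weight in edges:
        graphs[group][src].append((dest, weight))

    result: Dict[Hashable, Dict[int, float]] = {}
    for group, adjacency in graphs.items():
        dist = {source: 0.0}
        heap = [(0.0, source)]
        while heap:
            d, node = heapq.heappop(heap)
            if d > dist.get(node, float("inf")):
                continue  # stale heap entry, a shorter path was already found
            for nxt, w in adjacency.get(node, ()):
                nd = d + w
                if nd < dist.get(nxt, float("inf")):
                    dist[nxt] = nd
                    heapq.heappush(heap, (nd, nxt))
        result[group] = dist
    return result


# Made-up edge table with two user graphs sharing the same vertex ids.
edges = [
    ("alice", 0, 1, 1.0), ("alice", 1, 2, 2.0), ("alice", 0, 2, 5.0),
    ("bob", 0, 1, 4.0), ("bob", 0, 2, 1.0), ("bob", 2, 1, 1.0),
]
paths = grouped_sssp(edges, source=0)
print(paths["alice"])
print(paths["bob"])
```

Because each group is processed independently, the same vertex ids can mean different things in different users' graphs without interfering with one another.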





[jira] [Closed] (MADLIB-1081) Graph - add grouping to shortest path

2017-04-17 Thread Frank McQuillan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MADLIB-1081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Frank McQuillan closed MADLIB-1081.
---

> Graph - add grouping to shortest path
> -
>
> Key: MADLIB-1081
> URL: https://issues.apache.org/jira/browse/MADLIB-1081
> Project: Apache MADlib
>  Issue Type: Improvement
>  Components: Module: Graph
>Reporter: Frank McQuillan
>Assignee: Orhan Kislal
>Priority: Minor
> Fix For: v1.11
>
>
> * Add a GROUP BY column to the edge table
> * Motivation: run SSSP on the different server graphs defined for users, 
> i.e., grouped by userID





[jira] [Resolved] (MADLIB-1081) Graph - add grouping to shortest path

2017-04-17 Thread Frank McQuillan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MADLIB-1081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Frank McQuillan resolved MADLIB-1081.
-
Resolution: Fixed

> Graph - add grouping to shortest path
> -
>
> Key: MADLIB-1081
> URL: https://issues.apache.org/jira/browse/MADLIB-1081
> Project: Apache MADlib
>  Issue Type: Improvement
>  Components: Module: Graph
>Reporter: Frank McQuillan
>Assignee: Orhan Kislal
>Priority: Minor
> Fix For: v1.11
>
>
> * Add a GROUP BY column to the edge table
> * Motivation: run SSSP on the different server graphs defined for users, 
> i.e., grouped by userID


