IMO

1) Option to define resulting column names. Please see pdltools implementation - the ability to pass in a function is especially useful (http://pivotalsoftware.github.io/PDLTools/group__grp__pivot01.html)
2) Option to dummy code only the top n most frequently occurring values in any column
3) Option to create numeric column names (e.g. pivotcol_val1, pivotcol_val2 ...) instead of values in column names + secondary mapping table
4) Option to exclude original column from results table

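To make the four options concrete, here is a rough sketch in plain Python. It is not MADlib or PDLTools code; the function name `encode_categorical` and the parameters `top_n`, `name_fn`, `numeric_names` and `drop_original` are hypothetical names made up for illustration only. The default naming rule (lowercase, anything outside [a-z0-9_] replaced by an underscore) is likewise an assumption about what "no double quotes needed" could look like.

```python
from collections import Counter
import re


def encode_categorical(rows, col, top_n=None, name_fn=None,
                       numeric_names=False, drop_original=False):
    """Dummy-code rows[col]; return (encoded_rows, mapping).

    Hypothetical helper for illustration only. `mapping` maps each
    generated column name to the category value it stands for, which
    doubles as the secondary mapping table from option 3.
    """
    counts = Counter(row[col] for row in rows)
    # Option 2: keep only the top_n most frequent values (None keeps all).
    values = [v for v, _ in counts.most_common(top_n)]

    if name_fn is None:
        # Assumed default: names that never need double quotes in SQL.
        # Distinct values that sanitize to the same name are not handled.
        def name_fn(c, v):
            return re.sub(r"[^a-z0-9_]", "_", f"{c}_{v}".lower())

    mapping = {}
    for i, v in enumerate(values, start=1):
        # Option 3: numeric suffixes; otherwise option 1: caller-supplied naming.
        name = f"{col}_val{i}" if numeric_names else name_fn(col, v)
        mapping[name] = v

    encoded = []
    for row in rows:
        # Option 4: optionally leave the original column out of the result.
        out = {k: val for k, val in row.items()
               if not (drop_original and k == col)}
        for name, v in mapping.items():
            out[name] = 1 if row[col] == v else 0
        encoded.append(out)
    return encoded, mapping


rows = [
    {"id": 1, "color": "Dark Red"},
    {"id": 2, "color": "blue"},
    {"id": 3, "color": "Dark Red"},
    {"id": 4, "color": "green"},
]

encoded, mapping = encode_categorical(rows, "color", top_n=2, drop_original=True)
numbered, numbered_map = encode_categorical(rows, "color", numeric_names=True)
```

With `top_n=2`, rows whose value falls outside the two most frequent categories get zeros in every indicator column; whether such rows should instead go into a separate "other" column is a design choice left open here.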
(1) & (2) are much higher priority than (3) & (4). Agreed that these could also be applied to Pivoting (especially 1).

Jarrod Vawdrey
Sr. Data Scientist
Data Science & Engineering | Pivotal
(650) 315-8905
https://pivotal.io/

On Wed, Oct 19, 2016 at 4:47 PM, Frank McQuillan <fmcquil...@pivotal.io> wrote:

> Thanks for those suggestions, Jarrod. They all sound pretty useful -
> would you mind taking a crack at numbering them 1,2,3... etc, in the order
> of priority as you see it?
>
> Also it seems like some of these could be applied to the Pivot function as
> well, e.g., UDF for column naming.
>
> Frank
>
>
> On Fri, Oct 14, 2016 at 1:02 PM, Jarrod Vawdrey <jvawd...@pivotal.io>
> wrote:
>
>> Hey Frank,
>>
>> How are special character values handled today? It is often not ideal to
>> end up with column names that require double quotes to call due to
>> downstream scripts.
>>
>> A couple of features that would be useful
>>
>> * Option to define resulting column names. Please see pdltools
>> implementation - the ability to pass in a function is especially useful
>> (http://pivotalsoftware.github.io/PDLTools/group__grp__pivot01.html)
>> * Option to dummy code only the top n most frequently occurring values in
>> any column
>> * Option to exclude original column from results table
>> * Option to create numeric column names (E.g. pivotcol_val1,
>> pivotcol_val2 ...) instead of values in column names + secondary mapping
>> table
>>
>> Thank you
>>
>> Jarrod Vawdrey
>> Sr. Data Scientist
>> Data Science & Engineering | Pivotal
>> (650) 315-8905
>> https://pivotal.io/
>>
>> On Fri, Oct 14, 2016 at 3:35 PM, Frank McQuillan <fmcquil...@pivotal.io>
>> wrote:
>>
>>> For the module encoding categorical variables
>>> http://madlib.incubator.apache.org/docs/latest/group__grp__data__prep.html
>>> does anyone have any suggestions on improvements that we could make?
>>>
>>> Here is a video on how encoding categorical variables works for those not
>>> familiar with it
>>> https://www.youtube.com/watch?v=zxGgGMGJZRo&index=7&list=PL62pIycqXx-Qf6EXu5FDxUgXW23BHOtcQ
>>>
>>
>