Hi,

I'm working on updating the online docs for the DML transform() function, since a couple of things didn't copy over in the conversion to Markdown. However, I've run into an issue when I execute the transform() example. In summary: is the "scale" transformation no longer allowed, while "bin" is?
I did the following.

I created data.csv:

    zipcode,district,sqft,numbedrooms,numbathrooms,floors,view,saleprice,askingprice
    95141,south,3002,6,3,2,FALSE,929,934
    NA,west,1373,,1,3,FALSE,695,698
    91312,south,NA,6,2,2,FALSE,902,
    94555,NA,1835,3,,3,,888,892
    95141,west,2770,5,2.5,,TRUE,812,816
    95141,east,2833,6,2.5,2,TRUE,927,
    96334,NA,1339,6,3,1,FALSE,672,675
    96334,south,2742,6,2.5,2,FALSE,872,876
    96334,north,2195,5,2.5,2,FALSE,799,803

I created data.csv.mtd:

    {
        "data_type": "frame",
        "format": "csv",
        "sep": ",",
        "header": true,
        "na.strings": [ "NA", "" ]
    }

I created data.spec.json:

    {
      "omit": [ "zipcode" ]
     ,"impute":
        [ { "name": "district"    , "method": "constant", "value": "south" }
         ,{ "name": "numbedrooms" , "method": "constant", "value": 2 }
         ,{ "name": "numbathrooms", "method": "constant", "value": 1 }
         ,{ "name": "floors"      , "method": "constant", "value": 1 }
         ,{ "name": "view"        , "method": "global_mode" }
         ,{ "name": "askingprice" , "method": "global_mean" }
        ]
     ,"recode":
        [ "zipcode", "district", "numbedrooms", "numbathrooms", "floors", "view" ]
     ,"bin":
        [ { "name": "saleprice", "method": "equi-width", "numbins": 3 }
         ,{ "name": "sqft"     , "method": "equi-width", "numbins": 4 }
        ]
     ,"dummycode":
        [ "district", "numbathrooms", "floors", "view", "saleprice", "sqft" ]
     ,"scale":
        [ { "name": "sqft"       , "method": "mean-subtraction" }
         ,{ "name": "saleprice"  , "method": "z-score" }
         ,{ "name": "askingprice", "method": "z-score" }
        ]
    }

I executed the following DML:

    D = read("data.csv");
    tfD = transform(target=D, transformSpec="data.spec.json", transformPath="example-transform");
    s = sum(tfD);
    print("Sum = " + s);

This generated the following error:

    java.lang.IllegalArgumentException: Invalid transformations on column ID 3. A column can not be binned and scaled.
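To see which columns that first error refers to, here is a small Python sketch (plain Python, not SystemML code) that lists every column named in both the "bin" and "scale" sections of the spec above. The helper name binned_and_scaled is hypothetical, and mapping names to IDs assumes SystemML numbers columns from 1 in header order (including the omitted zipcode), which would make column ID 3 sqft.

```python
import json

# Trimmed copy of data.spec.json: only the "omit", "bin" and "scale" sections.
SPEC = """
{ "omit": [ "zipcode" ]
 ,"bin": [ { "name": "saleprice", "method": "equi-width", "numbins": 3 }
          ,{ "name": "sqft",      "method": "equi-width", "numbins": 4 } ]
 ,"scale": [ { "name": "sqft",        "method": "mean-subtraction" }
            ,{ "name": "saleprice",   "method": "z-score" }
            ,{ "name": "askingprice", "method": "z-score" } ]
}
"""

# Header of data.csv, in file order.
HEADER = ["zipcode", "district", "sqft", "numbedrooms", "numbathrooms",
          "floors", "view", "saleprice", "askingprice"]


def binned_and_scaled(spec):
    """Hypothetical check: names listed under both "bin" and "scale"."""
    binned = {entry["name"] for entry in spec.get("bin", [])}
    scaled = {entry["name"] for entry in spec.get("scale", [])}
    return sorted(binned & scaled)


for name in binned_and_scaled(json.loads(SPEC)):
    # Assumption: column IDs are 1-based positions in the CSV header.
    print(name, "-> column ID", HEADER.index(name) + 1)
```

Under that assumption, the sketch prints sqft as column ID 3 and saleprice as column ID 8; both appear in both sections.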
So I removed the "scale" section from data.spec.json:

    {
      "omit": [ "zipcode" ]
     ,"impute":
        [ { "name": "district"    , "method": "constant", "value": "south" }
         ,{ "name": "numbedrooms" , "method": "constant", "value": 2 }
         ,{ "name": "numbathrooms", "method": "constant", "value": 1 }
         ,{ "name": "floors"      , "method": "constant", "value": 1 }
         ,{ "name": "view"        , "method": "global_mode" }
         ,{ "name": "askingprice" , "method": "global_mean" }
        ]
     ,"recode":
        [ "zipcode", "district", "numbedrooms", "numbathrooms", "floors", "view" ]
     ,"bin":
        [ { "name": "saleprice", "method": "equi-width", "numbins": 3 }
         ,{ "name": "sqft"     , "method": "equi-width", "numbins": 4 }
        ]
     ,"dummycode":
        [ "district", "numbathrooms", "floors", "view", "saleprice", "sqft" ]
    }

This generated:

    java.lang.RuntimeException: Encountered "NA" in column ID "3", when expecting a numeric value. Consider adding "NA" to na.strings, along with an appropriate imputation method.

So I added a "global_mean" imputation for "sqft" to the "impute" section of the spec:

    {
      "omit": [ "zipcode" ]
     ,"impute":
        [ { "name": "district"    , "method": "constant", "value": "south" }
         ,{ "name": "numbedrooms" , "method": "constant", "value": 2 }
         ,{ "name": "numbathrooms", "method": "constant", "value": 1 }
         ,{ "name": "floors"      , "method": "constant", "value": 1 }
         ,{ "name": "view"        , "method": "global_mode" }
         ,{ "name": "askingprice" , "method": "global_mean" }
         ,{ "name": "sqft"        , "method": "global_mean" }
        ]
     ,"recode":
        [ "zipcode", "district", "numbedrooms", "numbathrooms", "floors", "view" ]
     ,"bin":
        [ { "name": "saleprice", "method": "equi-width", "numbins": 3 }
         ,{ "name": "sqft"     , "method": "equi-width", "numbins": 4 }
        ]
     ,"dummycode":
        [ "district", "numbathrooms", "floors", "view", "saleprice", "sqft" ]
    }

With this spec, the DML executed successfully.

So, is "scale" not allowed anymore? And is "bin" allowed (despite the message saying it isn't)?

Thank you,
Deron
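For reference on the second error, here is a rough Python sketch of what the final spec asks for on sqft: parse the column using the na.strings from data.csv.mtd, replace missing entries with the global mean, then assign 1-based equi-width bin IDs. The helper functions are hypothetical, and SystemML's own bin-boundary handling may differ; this only illustrates the idea.

```python
import csv
import io

# data.csv from the post; "NA" and "" mark missing values,
# matching the "na.strings" entry in data.csv.mtd.
DATA = """zipcode,district,sqft,numbedrooms,numbathrooms,floors,view,saleprice,askingprice
95141,south,3002,6,3,2,FALSE,929,934
NA,west,1373,,1,3,FALSE,695,698
91312,south,NA,6,2,2,FALSE,902,
94555,NA,1835,3,,3,,888,892
95141,west,2770,5,2.5,,TRUE,812,816
95141,east,2833,6,2.5,2,TRUE,927,
96334,NA,1339,6,3,1,FALSE,672,675
96334,south,2742,6,2.5,2,FALSE,872,876
96334,north,2195,5,2.5,2,FALSE,799,803
"""
NA_STRINGS = {"NA", ""}


def read_numeric(column):
    """Parse one column, mapping any na.strings token to None."""
    rows = csv.DictReader(io.StringIO(DATA))
    return [None if r[column] in NA_STRINGS else float(r[column]) for r in rows]


def impute_global_mean(values):
    """Replace missing entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def equi_width_bins(values, numbins):
    """Assign 1-based bin IDs using numbins equal-width intervals over [min, max].

    Assumes max > min; otherwise the bin width would be zero.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / numbins
    # The maximum lands exactly on the upper edge, so clamp it into the last bin.
    return [min(int((v - lo) / width) + 1, numbins) for v in values]


sqft = read_numeric("sqft")       # the third data row holds "NA" -> None
sqft = impute_global_mean(sqft)   # None -> 2261.125, the mean of the other 8 rows
print(equi_width_bins(sqft, numbins=4))  # -> [4, 1, 3, 2, 4, 4, 1, 4, 3]
```

Without the imputation step, the None left by "NA" would reach min() and fail, which is loosely analogous to the "Encountered "NA"" error above.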