Re: Feature generation / aggregate functions / timeseries

2017-12-14 Thread Georg Heiler
Also, the RDD stat counter will already compute most of your desired metrics, as will df.describe: https://databricks.com/blog/2015/06/02/statistical-and-mathematical-functions-with-dataframes-in-spark.html Georg Heiler wrote on Thu, 14 Dec 2017 at 19:40: > Look at
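
For readers unfamiliar with the stat counter idea mentioned above: it keeps count, mean, min, max and variance in one pass over the data, and partial results from different partitions can be merged. The following is only a plain-Python sketch of that idea, not Spark's code; the names `RunningStats` and `stats_of` are hypothetical and chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Hypothetical mergeable stat counter (illustration only, not Spark's class)."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0            # running sum of squared deviations from the mean
    lo: float = math.inf
    hi: float = -math.inf

    def add(self, x: float) -> "RunningStats":
        # Welford's one-pass update of mean and squared deviations.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        self.lo = min(self.lo, x)
        self.hi = max(self.hi, x)
        return self

    def merge(self, other: "RunningStats") -> "RunningStats":
        # Combine two partial results, e.g. from two partitions.
        if other.n == 0:
            return self
        if self.n == 0:
            self.n, self.mean, self.m2 = other.n, other.mean, other.m2
            self.lo, self.hi = other.lo, other.hi
            return self
        n = self.n + other.n
        delta = other.mean - self.mean
        self.m2 += other.m2 + delta * delta * self.n * other.n / n
        self.mean += delta * other.n / n
        self.n = n
        self.lo = min(self.lo, other.lo)
        self.hi = max(self.hi, other.hi)
        return self

    @property
    def variance(self) -> float:
        # Population variance; NaN when no values have been seen.
        return self.m2 / self.n if self.n else math.nan


def stats_of(values) -> RunningStats:
    """Fold an iterable of numbers into one RunningStats."""
    s = RunningStats()
    for v in values:
        s.add(v)
    return s
```

Merging `stats_of([1.0, 2.0, 3.0])` with `stats_of([4.0, 5.0])` gives the same statistics as a single pass over all five values, which is what makes this kind of counter suitable for distributed aggregation.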

Re: Feature generation / aggregate functions / timeseries

2017-12-14 Thread Georg Heiler
Look at custom UDAFs (user-defined aggregate functions). wrote on Thu, 14 Dec 2017 at 09:31: > Hi dear Spark community! > > I want to create a lib which generates features for potentially very > large datasets, so I believe Spark could be a nice tool for that. > Let me explain what I need to do

Feature generation / aggregate functions / timeseries

2017-12-14 Thread julio . cesare
Hi dear Spark community! I want to create a lib which generates features for potentially very large datasets, so I believe Spark could be a nice tool for that. Let me explain what I need to do: Each file 'F' of my dataset is composed of at least: - an id (string or int) - a timestamp (