Hi dear Spark community!

I want to create a library that generates features for potentially very large datasets, and I believe Spark could be a nice tool for that. Let me explain what I need to do:
Each file 'F' of my dataset is composed of at least:
- an id (string or int)
- a timestamp (or a long value)
- a value (int or string)
I want my tool to:
- compute aggregate functions for many