Re: How to preserve/preset partition information when load time series data?

2015-03-16 Thread Imran Rashid

Hi Shuai, It should certainly be possible to do it that way, but I would recommend against it. If you look at HadoopRDD, it's doing all sorts of little bookkeeping that you would most likely want to mimic, e.g., tracking the number of bytes and records that are read, setting up all the Hadoop

Re: How to preserve/preset partition information when load time series data?

2015-03-11 Thread Imran Rashid

It should be *possible* to do what you want ... but if I understand you right, there isn't really an easy way to do it. I think you would need to write your own subclass of RDD, with its own logic for how the input files get divided among partitions. You can probably subclass
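
To make the "own logic on how the input files get divided among partitions" idea concrete, here is a minimal sketch of just the assignment step, written in plain Python rather than as Spark code. It is not taken from the thread. The directory layout (one folder per day), the function names assign_files_to_partitions and day_of, and the sample paths are all assumptions made for illustration. In a real custom RDD, logic like this would sit behind the method that reports the RDD's partitions (getPartitions in Spark's Scala API).

```python
from collections import defaultdict
from typing import Callable, Dict, List


def assign_files_to_partitions(
    paths: List[str], key_of: Callable[[str], str]
) -> List[List[str]]:
    """Group input files so that all files sharing a key land in one partition.

    Hypothetical helper: it only decides which files belong together and
    does not read any data.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for path in paths:
        groups[key_of(path)].append(path)
    # Sort the keys so that partition index i is deterministic across runs.
    return [sorted(groups[key]) for key in sorted(groups)]


def day_of(path: str) -> str:
    # Assumed naming scheme: ".../<YYYY-MM-DD>/<file>"; the parent
    # directory name is used as the time-series key.
    return path.split("/")[-2]


# Made-up sample paths, two days of data.
files = [
    "/data/2015-03-11/a.csv",
    "/data/2015-03-10/b.csv",
    "/data/2015-03-11/c.csv",
]
partitions = assign_files_to_partitions(files, day_of)
for index, group in enumerate(partitions):
    print(index, group)
```

With this grouping, every file for a given day ends up in the same partition, so downstream per-day processing would not need a shuffle to bring those records together. That only holds if the custom RDD actually reads each group in a single task.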