Hi Akash,

In this design document you haven't mentioned how to handle data loading for
the timeseries datamap for older segments [existing table]. If the customer's
main table data is also stored based on time [increasing time] in different
segments, he can use this feature as well.
We can discuss and finalize the solution.

Regards,
Kumar Vishal

On Mon, Sep 30, 2019 at 2:42 PM Akash Nilugal <akashnilu...@gmail.com> wrote:
> Hi Ajantha,
>
> Thanks for the queries and suggestions.
>
> 1. Yes, this is a good suggestion; I'll include this change. Both date and
> timestamp columns are supported; the document will be updated.
> 2. Yes, you are right.
> 3. You are right: if the day level is not available, then we will try to
> get the whole day's data from the hour level; if that is not available, as
> explained in the design document, we will get the data from the datamap
> UNION data from the main table, based on the user query.
>
> Regards,
> Akash R Nilugal
>
> On 2019/09/30 06:56:45, Ajantha Bhat <ajanthab...@gmail.com> wrote:
> > +1,
> >
> > I have some suggestions and questions.
> >
> > 1. In DMPROPERTIES, instead of 'timestamp_column', I suggest using
> > 'timeseries_column', so that it won't give the impression that only the
> > timestamp datatype is supported; also, update the document with all the
> > supported datatypes.
> >
> > 2. Querying on this datamap table is also supported, right? Is
> > supporting a changed plan for the main table to refer to the datamap
> > table meant to save the user from changing his query, or is there
> > another reason?
> >
> > 3. If the user has not created a day-granularity datamap, but only an
> > hour-granularity datamap, and a query has day granularity, will data be
> > fetched from the hour-granularity datamap and aggregated, or will it be
> > fetched from the main table?
> >
> > Thanks,
> > Ajantha
> >
> > On Mon, Sep 30, 2019 at 11:46 AM Akash Nilugal <akashnilu...@gmail.com>
> > wrote:
> >
> > > Hi xuchuanyin,
> > >
> > > Thanks for the comments/suggestions.
> > >
> > > 1. Preaggregate is productized, but not timeseries with preaggregate;
> > > I think you got confused with that, if I'm right.
> > > 2. Limitations such as auto sampling or rollup, which we will now be
> > > supporting, retention policies, etc.
> > > 3. segmentTimestampMin: this I will consider in the design.
> > > 4. RP is added as a separate task. I thought that instead of
> > > maintaining two variables it is better to maintain one and parse it,
> > > but I will consider your point based on feasibility during
> > > implementation.
> > > 5. We use an accumulator which takes a list, so before writing index
> > > files we take the min and max of the timestamp column, fill the
> > > accumulator, and then we can access accumulator.value in the driver
> > > after the load is finished.
> > >
> > > Regards,
> > > Akash R Nilugal
> > >
> > > On 2019/09/28 10:46:31, xuchuanyin <xuchuan...@apache.org> wrote:
> > > > Hi Akash, glad to see the feature proposed, and I have some
> > > > questions about this. Please note that some of the following
> > > > descriptions are comments, marked by '===', on the design document
> > > > attached in the corresponding JIRA.
> > > >
> > > > 1.
> > > > "Currently carbondata supports timeseries on preaggregate datamap,
> > > > but its an alpha feature"
> > > > ===
> > > > It has been some time since the preaggregate datamap was introduced,
> > > > and it is still **alpha**; why is it still not product-ready? Will
> > > > the new feature also end up in a similar situation?
> > > >
> > > > 2.
> > > > "there are so many limitations when we compare and analyze the
> > > > existing timeseries database or projects which supports time series
> > > > like apache druid or influxdb"
> > > > ===
> > > > What are the actual limitations? Besides, please give an example of
> > > > this.
> > > >
> > > > 3.
> > > > "Segment_Timestamp_Min"
> > > > ===
> > > > Suggest using camel-case style like 'segmentTimestampMin'.
> > > >
> > > > 4.
> > > > "RP is way of telling the system, for how long the data should be
> > > > kept"
> > > > ===
> > > > Since the function is simple, I'd suggest using 'retentionTime'='15'
> > > > and 'timeUnit'='day' instead of 'RP'='15_days'.
> > > >
> > > > 5.
> > > > "When the data load is called for main table, use an spark
> > > > accumulator to get the maximum value of timestamp in that load and
> > > > return to the load."
> > > > ===
> > > > How can you get the Spark accumulator? The load is launched using
> > > > loading-by-dataframe, not using global-sort-by-spark.
> > > >
> > > > 6.
> > > > For the rest of the content, still reading.
> > > >
> > > > --
> > > > Sent from:
> > > > http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
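[Editor's note] The day-from-hour fallback discussed in question 3 above (answer the query from an exact-granularity datamap if one exists, roll up from a finer one otherwise, and fall back to the main table as a last resort) can be sketched as follows. This is a hypothetical illustration only: `GRANULARITY_ORDER` and `pick_source` are invented names, not CarbonData APIs.

```python
# Illustrative sketch of granularity-fallback source selection, under the
# assumption that any finer-grained datamap can be rolled up to answer a
# coarser-grained query. Not CarbonData code.

GRANULARITY_ORDER = ["second", "minute", "hour", "day", "month", "year"]

def pick_source(query_granularity, available_datamaps):
    """Return (source_table, needs_rollup) for a timeseries query.

    available_datamaps maps a granularity name to a datamap table name.
    """
    # Exact-granularity datamap: use it directly, no rollup needed.
    if query_granularity in available_datamaps:
        return available_datamaps[query_granularity], False
    # Otherwise look for the coarsest datamap still finer than the query;
    # its rows can be aggregated (rolled up) to the query granularity.
    idx = GRANULARITY_ORDER.index(query_granularity)
    for finer in reversed(GRANULARITY_ORDER[:idx]):
        if finer in available_datamaps:
            return available_datamaps[finer], True
    # No usable datamap: answer from the main table.
    return "main_table", False
```

For example, `pick_source("day", {"hour": "sales_hour_dm"})` returns `("sales_hour_dm", True)`, i.e. a day-level query served by rolling up the hour-level datamap, matching Akash's answer above.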
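[Editor's note] On point 4 (a single combined property 'RP'='15_days' versus two separate keys), the "maintain one and parse it" approach Akash mentions amounts to splitting the string on its separator. A minimal sketch, assuming an underscore separator and a fixed set of units (`parse_rp` and `VALID_UNITS` are illustrative names, not part of the design document):

```python
# Hypothetical parser for a combined retention-policy string such as
# '15_days', splitting it into the (value, unit) pair that the two-key
# alternative ('retentionTime'='15', 'timeUnit'='day') would store directly.

VALID_UNITS = {"minutes", "hours", "days", "months", "years"}

def parse_rp(rp):
    """Parse a '<number>_<unit>' retention string into (value, unit)."""
    value_str, sep, unit = rp.partition("_")
    if not sep or not value_str.isdigit() or unit not in VALID_UNITS:
        raise ValueError(f"invalid RP value: {rp!r}")
    return int(value_str), unit
```

The trade-off being debated: one key keeps the DMPROPERTIES surface small, while two keys avoid the parsing and its validation errors entirely.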
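[Editor's note] The mechanism Akash describes in point 5 (tasks record the min/max of the timestamp column before writing index files; the driver reads the merged value after the load) follows the usual Spark accumulator pattern. A minimal, Spark-free sketch of that idea; in a real Spark job this would be an `AccumulatorV2` subclass registered with the SparkContext, and `MinMaxAccumulator` here is purely illustrative:

```python
# Sketch of a min/max accumulator for the load's timestamp range. Tasks call
# add() per row (or per blocklet's min/max); the driver reads .value once the
# load finishes. Assumption: merging per-task results reduces to repeated
# min/max comparisons, which is what makes the accumulator pattern apply.

class MinMaxAccumulator:
    def __init__(self):
        self._min = None
        self._max = None

    def add(self, timestamp):
        """Task side: record one timestamp value."""
        if self._min is None or timestamp < self._min:
            self._min = timestamp
        if self._max is None or timestamp > self._max:
            self._max = timestamp

    @property
    def value(self):
        """Driver side: (min, max) of all recorded timestamps."""
        return (self._min, self._max)
```

xuchuanyin's question still stands independently of this sketch: the accumulator must be registered where the load is actually launched, which differs between the dataframe-loading path and the global-sort-by-spark path.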