Re: [carbondata-presto enhancements] support reading carbon SDK writer output in presto
Hi Jacky,

In the spark integration we have two approaches: one with very deep integration, and one with shallow integration using Spark's FileFormat. For the deep integration we use the datasource name "carbondata"; this name is also registered through Java services, so anything that comes with this datasource name takes the deep integration path. For the shallow integration we use the datasource name "carbon", which we extracted into the spark-datasource module, so any table with the "carbon" datasource name goes through the FileFormat flow.

These datasource names have nothing to do with transactional vs. non-transactional; they are about the Spark datasource implementations. Basically, what I am trying to say is that the "carbondata" datasource can read both transactional and non-transactional data. We introduced the "carbon" datasource name only so that Spark can identify which implementation flow it should choose.

Regards,
Ravindra.

--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
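[Editorial sketch] The mechanism described above — a short datasource name selecting one of two implementation paths — can be illustrated with a minimal Python sketch, analogous to how Spark resolves a short name to a registered provider (via DataSourceRegister and Java service loading). All class and function names below are hypothetical stand-ins, not the actual CarbonData code:

```python
# Hypothetical sketch: short datasource names mapped to implementation
# paths, analogous to Spark's service-loader lookup of registered
# DataSourceRegister providers. These are NOT the real CarbonData classes.

class DeepIntegrationSource:
    """Stand-in for the deeply integrated "carbondata" path."""
    short_name = "carbondata"

class FileFormatSource:
    """Stand-in for the shallow, FileFormat-based "carbon" path."""
    short_name = "carbon"

# Registry built once, like a service loader scanning registered providers.
_REGISTRY = {cls.short_name: cls for cls in (DeepIntegrationSource, FileFormatSource)}

def resolve_datasource(short_name: str):
    """Return the implementation class registered under `short_name`."""
    try:
        return _REGISTRY[short_name]
    except KeyError:
        raise ValueError(f"unknown datasource: {short_name}")
```

Note that either resolved path can read both transactional and non-transactional data; the name only chooses the implementation flow, which is the point made above.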
Re: [carbondata-presto enhancements] support reading carbon SDK writer output in presto
+1. It would be better if we can unify "carbon" and "carbondata": SparkCarbonFileFormat uses "carbon" and SparkCarbonTableFormat uses "carbondata". The SDK should support both transactional and non-transactional tables, and DataFrame should also support the different types of carbon data.

--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
Re: [carbondata-presto enhancements] support reading carbon SDK writer output in presto
Thanks. Can we do the same for the spark integration also? I see there are two datasources now, “carbon” and “carbondata”, and it is not easy for a user to differentiate when to use which one. Since we are discussing “support transactional table in SDK”, I think we can unify “carbon” and “carbondata”; for example, we can make “carbondata” an alias for “carbon”. I prefer this way since “carbon” is shorter :)

What do you think?

Regards,
Jacky

> On Dec 10, 2018, at 11:18 PM, ravipesala wrote:
>
> +1
>
> Yes Jacky, he is not going to add any new plugin. Depending on the folder
> structure and table status he decides whether it is transactional or
> non-transactional inside the same plugin. PR
> https://github.com/apache/carbondata/pull/2982/ has already been raised for it.
>
> Regards,
> Ravindra.
>
> --
> Sent from:
> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
Re: [carbondata-presto enhancements] support reading carbon SDK writer output in presto
+1

Yes Jacky, he is not going to add any new plugin. Depending on the folder structure and table status he decides whether it is transactional or non-transactional inside the same plugin. PR https://github.com/apache/carbondata/pull/2982/ has already been raised for it.

Regards,
Ravindra.

--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
Re: [carbondata-presto enhancements] support reading carbon SDK writer output in presto
Hi Ajantha,

Currently for the carbon-presto integration there is a plugin called “carbondata”. I wonder, will you introduce a new plugin into the project? I suggest we re-use the same plugin and decide the read path within the plugin. What do you think?

Regards,
Jacky

> On Dec 10, 2018, at 2:31 PM, Ajantha Bhat wrote:
>
> Currently, carbon SDK output files (files without the metadata folder and
> its contents) are read by Spark using an external table with a carbon
> session. But the presto-carbon integration doesn't support that: it can
> currently read only transactional table output files.
>
> Hence we can enhance presto to read SDK output files. This will increase
> the use cases for the presto-carbon integration.
>
> The above scenario can be achieved by inferring the schema if the metadata
> folder does not exist, and setting the read committed scope to
> LatestFilesReadCommittedScope if non-transactional table output files are
> present.
>
> Thanks,
> Ajantha
[carbondata-presto enhancements] support reading carbon SDK writer output in presto
Currently, carbon SDK output files (files without the metadata folder and its contents) are read by Spark using an external table with a carbon session. But the presto-carbon integration doesn't support that: it can currently read only transactional table output files.

Hence we can enhance presto to read SDK output files. This will increase the use cases for the presto-carbon integration.

The above scenario can be achieved by inferring the schema if the metadata folder does not exist, and setting the read committed scope to LatestFilesReadCommittedScope if non-transactional table output files are present.

Thanks,
Ajantha
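[Editorial sketch] The proposed decision above can be summarized in a short Python sketch (the real implementation is Java/Scala inside the presto integration; of the names below, only LatestFilesReadCommittedScope and TableStatusReadCommittedScope are real CarbonData names, and the function, folder-name constant, and return shape are hypothetical):

```python
import os

# Hypothetical sketch of the proposed read-path decision for the presto
# "carbondata" plugin. A transactional table has a "Metadata" folder under
# the table path; SDK writer output (non-transactional) does not.

def choose_read_path(table_path: str) -> dict:
    """Decide how presto should read the carbon files at `table_path`.

    If the Metadata folder exists, treat the table as transactional and
    read the schema from the stored metadata. Otherwise the files were
    written by the SDK (non-transactional): infer the schema from the
    data files and use LatestFilesReadCommittedScope.
    """
    has_metadata = os.path.isdir(os.path.join(table_path, "Metadata"))
    if has_metadata:
        return {"transactional": True,
                "schema": "read from Metadata folder",
                "read_committed_scope": "TableStatusReadCommittedScope"}
    return {"transactional": False,
            "schema": "inferred from data files",
            "read_committed_scope": "LatestFilesReadCommittedScope"}
```

This matches the thread's consensus of keeping a single plugin and branching on folder structure, rather than introducing a second plugin.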