Re: [carbondata-presto enhancements] support reading carbon SDK writer output in presto

2018-12-10 Thread ravipesala
Hi Jacky,

In the spark integration we have two approaches: one with very deep
integration and one with shallow integration that uses spark's fileformat.
For the deep integration we use the datasource name carbondata. This name
is also registered with Java services, so anything that comes with this
datasource name uses the deep integration path.
For the shallow integration we use the datasource name carbon, and we
extracted it into the spark-datasource module. So any table with the carbon
datasource name goes through the fileformat flow.

These datasource names have nothing to do with transactional and
non-transactional tables. They are about the spark datasource
implementations. What I am trying to say is that the carbondata datasource
can read both transactional and non-transactional data. We introduced the
carbon datasource name only so that spark can identify which
implementation flow it should choose.
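
To make the mapping concrete, here is a minimal sketch of how a datasource
name selects an implementation flow. It is only an illustration of the idea
described above: the class, enum, and method names are hypothetical and are
not CarbonData or spark classes. Only the two datasource names and the flows
they map to come from this thread.

```java
import java.util.Map;

// Hypothetical illustration only; none of these types exist in CarbonData.
final class DatasourceFlowSketch {

    // The two spark integration paths described in the message above.
    enum Flow { DEEP_INTEGRATION, FILE_FORMAT }

    // "carbondata" selects the deep integration path,
    // "carbon" selects the shallow fileformat flow.
    private static final Map<String, Flow> FLOWS = Map.of(
            "carbondata", Flow.DEEP_INTEGRATION,
            "carbon", Flow.FILE_FORMAT);

    // Returns the flow for a datasource name, or fails for an unknown name.
    static Flow resolve(String datasourceName) {
        Flow flow = FLOWS.get(datasourceName);
        if (flow == null) {
            throw new IllegalArgumentException(
                    "Unknown datasource name: " + datasourceName);
        }
        return flow;
    }

    public static void main(String[] args) {
        System.out.println(resolve("carbondata")); // prints DEEP_INTEGRATION
        System.out.println(resolve("carbon"));     // prints FILE_FORMAT
    }
}
```

Note that the name selects only the implementation flow. It says nothing
about whether the data being read is transactional or non-transactional.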


Regards,
Ravindra.



--
Sent from: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/


Re: [carbondata-presto enhancements] support reading carbon SDK writer output in presto

2018-12-10 Thread xubo245
+1. It would be better if we could unify "carbon" and "carbondata":
SparkCarbonFileFormat uses carbon and SparkCarbonTableFormat uses carbondata.
The SDK should support both transactional and non-transactional tables.
DataFrame should also support the different types of carbon data.





Re: [carbondata-presto enhancements] support reading carbon SDK writer output in presto

2018-12-10 Thread Jacky Li
Thanks. 
Can we do the same for the spark integration as well? I see there are two
datasources now: “carbon” and “carbondata”.
It is not easy for users to tell when to use which one.

Since we are discussing “support transactional table in SDK”, I think we can
unify “carbon” and “carbondata”. For example, we can make “carbondata”
an alias for “carbon”. I prefer this way since “carbon” is shorter :)

What do you think?

Regards,
Jacky

> On Dec 10, 2018, at 11:18 PM, ravipesala wrote:
> 
> +1 
> 
> Yes Jacky, he is not going to add any new plugin. Depending on the folder
> structure and table status, he determines inside the same plugin whether
> the table is transactional or non-transactional. PR
> https://github.com/apache/carbondata/pull/2982/ has already been raised for it.
> 
> Regards,
> Ravindra.





Re: [carbondata-presto enhancements] support reading carbon SDK writer output in presto

2018-12-10 Thread ravipesala
+1 

Yes Jacky, he is not going to add any new plugin. Depending on the folder
structure and table status, he determines inside the same plugin whether
the table is transactional or non-transactional. PR
https://github.com/apache/carbondata/pull/2982/ has already been raised for it.

Regards,
Ravindra.





Re: [carbondata-presto enhancements] support reading carbon SDK writer output in presto

2018-12-10 Thread Jacky Li
Hi Ajantha,

Currently, for the carbon-presto integration, there is a plugin called “carbondata”.
I wonder whether you will introduce a new plugin into the project.
I suggest we reuse the same plugin and decide the read path within the plugin.
What do you think?

Regards,
Jacky


> On Dec 10, 2018, at 2:31 PM, Ajantha Bhat wrote:
> 
> Currently, carbon SDK output files (files without the metadata folder and
> its contents) are read by spark using an external table with a carbon
> session. But the presto carbon integration doesn't support that. It can
> currently read only transactional table output files.
> 
> Hence we can enhance presto to read SDK output files. This will increase
> the use cases for the presto-carbon integration.
> 
> This can be achieved by inferring the schema if the metadata folder does
> not exist, and by setting the read committed scope to
> LatestFilesReadCommittedScope if non-transactional table output files are
> present.
> 
> 
> Thanks,
> Ajantha
> 



[carbondata-presto enhancements] support reading carbon SDK writer output in presto

2018-12-09 Thread Ajantha Bhat
Currently, carbon SDK output files (files without the metadata folder and
its contents) are read by spark using an external table with a carbon
session. But the presto carbon integration doesn't support that. It can
currently read only transactional table output files.

Hence we can enhance presto to read SDK output files. This will increase
the use cases for the presto-carbon integration.

This can be achieved by inferring the schema if the metadata folder does
not exist, and by setting the read committed scope to
LatestFilesReadCommittedScope if non-transactional table output files are
present.
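
As a rough illustration of the check described above, the sketch below picks
a read path by looking for a metadata folder under the table path. It is not
the code from any PR and it does not call CarbonData APIs. The class and enum
names are hypothetical, and the folder name "Metadata" is an assumption about
the on-disk layout. The sketch looks only at the folder structure and ignores
table status.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration only; not part of the presto-carbon plugin.
final class ReadPathSketch {

    enum ReadPath {
        // Metadata folder present: schema is available from the table metadata.
        TRANSACTIONAL(false),
        // SDK output without metadata folder: schema must be inferred and the
        // read committed scope would be LatestFilesReadCommittedScope.
        NON_TRANSACTIONAL(true);

        final boolean inferSchema;

        ReadPath(boolean inferSchema) {
            this.inferSchema = inferSchema;
        }
    }

    // ASSUMPTION: the metadata folder is a direct child named "Metadata".
    static final String METADATA_DIR = "Metadata";

    // Chooses the read path purely from the folder structure.
    static ReadPath choose(Path tablePath) {
        return Files.isDirectory(tablePath.resolve(METADATA_DIR))
                ? ReadPath.TRANSACTIONAL
                : ReadPath.NON_TRANSACTIONAL;
    }

    public static void main(String[] args) throws IOException {
        // Simulate SDK writer output: a folder with no metadata folder.
        Path sdkOutput = Files.createTempDirectory("sdk_output");
        // Simulate a transactional table: a folder containing "Metadata".
        Path table = Files.createTempDirectory("transactional_table");
        Files.createDirectory(table.resolve(METADATA_DIR));

        System.out.println(choose(sdkOutput)); // prints NON_TRANSACTIONAL
        System.out.println(choose(table));     // prints TRANSACTIONAL
    }
}
```

Keeping this decision inside one function means the same plugin can serve
both kinds of output without the user choosing a read path explicitly.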


Thanks,
Ajantha