Thank you for the clarification. On 6 Oct 2017 05:08, "Jacky Li" <jacky.li...@qq.com> wrote:
> I mean, whether or not spark.sql.warehouse.dir is set by the user,
> the carbon core should not be aware of databases, and the related
> path construction logic should be kept in the spark or
> spark-integration module only. We should achieve that: inside
> carbon, it should only know that the upper layer has specified a
> table location to write the table data.
>
> All database concepts and commands should be managed by the upper
> layer. This does not conflict with your requirement.
>
> Regards,
> Jacky
>
> Sent from Smartisan Pro
>
> Mohammad Shahid Khan <mohdshahidkhan1...@gmail.com> wrote on
> 5 Oct 2017 at 11:56 PM:
> >
> > In case of Spark, where to write the table content is decided in
> > two ways.
> >
> > 1. If a table is created under a database for which the location
> > attribute is not configured by the end user, then the content of
> > the table will be written under "spark.sql.warehouse.dir".
> > For example: if spark.sql.warehouse.dir = /opt/hive/warehouse/
> > then the table content will be written at /opt/hive/warehouse/
> >
> > 2. But if the location attribute is set by the user while creating
> > the database, then the content of any table created under that
> > database will be written to the configured location.
> > For example, if for database x the user set location
> > '/user/custom/warehouse', then tables created under database x
> > will be written at '/user/custom/warehouse'.
> >
> > Currently, for carbon we have customized the store writing and
> > always write to a fixed path, i.e. 'spark.sql.warehouse.dir'.
> >
> > This is to address the same issue.
> >
> > Q 1. The compute engine can manage the carbon table location the
> > same way as ORC and Parquet tables, and the user uses the same API
> > or SQL syntax to create a CarbonData table, like
> > `df.format("carbondata").save("path")` using the Spark DataFrame
> > API. There should be no carbon storePath involved.
> >
> > A. I think this is a different requirement; it does not even
> > consider the database and table. This is about writing the table
> > content at the desired location in the specified format.
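[Editor's note: the location rule described above, where a database's configured location takes precedence over spark.sql.warehouse.dir, can be sketched as follows. This is a minimal illustration only; the function name and paths are made up for the example and are not actual Spark or Carbon APIs.]

```python
# Illustrative sketch of the table-location resolution rule described above.
# resolve_table_location is a hypothetical helper, not a Spark/Carbon API.
from pathlib import PurePosixPath

def resolve_table_location(table, database_location=None,
                           warehouse_dir="/opt/hive/warehouse"):
    """A table lives under its database's configured location if one was
    given at CREATE DATABASE time; otherwise under spark.sql.warehouse.dir."""
    base = database_location if database_location else warehouse_dir
    return str(PurePosixPath(base) / table)

# Case 1: no database location configured -> warehouse dir
print(resolve_table_location("t1"))                            # /opt/hive/warehouse/t1
# Case 2: database created with location '/user/custom/warehouse'
print(resolve_table_location("t1", "/user/custom/warehouse"))  # /user/custom/warehouse/t1
```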
> > Q 2. User should be able to save a table in an HDFS location or
> > an S3 location in the same context. Since there are several carbon
> > properties involved when determining the FS type, such as the LOCK
> > file, etc., it is not possible to create tables on HDFS and on S3
> > in the same context, which also breaks the table-level abstraction.
> >
> > A. This requirement is for the viewfs file system, where different
> > databases can lie in different nameservices.
> >
> > On Wed, Oct 4, 2017 at 7:09 PM, Jacky Li <jacky.li...@qq.com> wrote:
> > >
> > > Hi,
> > >
> > > What carbon provides are two levels of concepts:
> > >
> > > 1. File format, which can be used by a compute engine to write
> > > and read data. CarbonData is a self-describing, type-aware
> > > columnar file format for the Hadoop environment, just as ORC and
> > > Parquet provide.
> > >
> > > 2. Table-level storage, which includes not just the file format
> > > but also aggregated index files (datamap), the global dictionary,
> > > and segment metadata. It provides more functionality regarding
> > > segment management and SQL optimization (like lazy decode)
> > > through deep integration with the compute engine (currently only
> > > Spark deep integration is supported).
> > >
> > > In my opinion, these two levels of abstraction are the core of
> > > the CarbonData project. But the database concept should belong to
> > > the compute engine managing the store-level metadata, since
> > > Spark, Hive, and Presto all have this part in their layer.
> > >
> > > I think what CarbonData is currently missing is that, for
> > > table-level storage, the user should be able to specify the table
> > > location to save the table data. This is to achieve:
> > >
> > > 1. The compute engine can manage the carbon table location the
> > > same way as ORC and Parquet tables. And the user uses the same
> > > API or SQL syntax to create a CarbonData table, like
> > > `df.format("carbondata").save("path")` using the Spark DataFrame
> > > API. There should be no carbon storePath involved.
> > >
> > > 2.
> > > User should be able to save a table in an HDFS location or an
> > > S3 location in the same context. Since there are several carbon
> > > properties involved when determining the FS type, such as the
> > > LOCK file, etc., it is not possible to create tables on HDFS and
> > > on S3 in the same context, which also breaks the table-level
> > > abstraction.
> > >
> > > Regards,
> > > Jacky
> > >
> > > > On 3 Oct 2017, at 10:36 PM, Mohammad Shahid Khan
> > > > <mohdshahidkhan1...@gmail.com> wrote:
> > > >
> > > > Hi Dev,
> > > > Please find the design document for Support Database Location
> > > > Configuration while Creating Database.
> > > >
> > > > Regards,
> > > > Shahid
> > > > <Support Database Location.docx>
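[Editor's note: the Q 2 point above, that a single global carbon store path forces one filesystem per context, implies deriving the filesystem type from each table's own path instead. A minimal sketch of that idea, assuming path schemes like hdfs://, s3a://, and viewfs://; this is illustrative only and is not CarbonData's actual FS-resolution code.]

```python
# Illustrative: determine the filesystem type per table path, so that HDFS,
# S3, and viewfs tables can coexist in one session instead of inheriting a
# single FS type from a global store path.
from urllib.parse import urlparse

def fs_type(table_path):
    """Map a table path's URI scheme to a filesystem kind; paths without a
    scheme are treated as local."""
    scheme = urlparse(table_path).scheme or "file"
    return {"hdfs": "HDFS", "s3": "S3", "s3a": "S3",
            "viewfs": "VIEWFS", "file": "LOCAL"}.get(scheme, scheme.upper())

print(fs_type("hdfs://ns1/user/warehouse/t1"))  # HDFS
print(fs_type("s3a://bucket/warehouse/t2"))     # S3
print(fs_type("viewfs://cluster/db1/t3"))       # VIEWFS
print(fs_type("/local/path/t4"))                # LOCAL
```

With the FS kind resolved per table, table-level concerns such as lock-file handling could follow the table's own location rather than a session-wide setting, which is the abstraction the thread argues for.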