Thank you for the clarification. On 6 Oct 2017 05:08, "Jacky Li" <jacky.li...@qq.com> wrote:
> I mean, whether or not spark.sql.warehouse.dir is set by the user,
> the carbon core should not be aware of databases, and the related
> path construction logic should be kept in the spark or
> spark-integration module only. We should achieve that: inside
> carbon, it should only know that the upper layer has specified a
> table location to write the table data.
>
> All database concepts and commands should be managed by the upper
> layer. This does not conflict with your requirement.
>
> Regards,
> Jacky
>
> Sent from Smartisan Pro
>
> Mohammad Shahid Khan <mohdshahidkhan1...@gmail.com> wrote on
> 5 Oct 2017 at 11:56 PM:
> >
> > In case of Spark, where to write the table content is decided in
> > two ways.
> >
> > 1. If a table is created under a database for which the location
> > attribute is not configured by the end user, then the content of
> > the table will be written under "spark.sql.warehouse.dir".
> > For example: if spark.sql.warehouse.dir = /opt/hive/warehouse/
> > then the table content will be written at /opt/hive/warehouse/
> >
> > 2. But if the location attribute is set by the user while creating
> > the database, then the content of any table created under that
> > database will be written to the configured location.
> > For example, if for database x the user set location
> > '/user/custom/warehouse', then tables created under database x
> > will be written at '/user/custom/warehouse'.
> >
> > Currently, for carbon we have customized the store writing and
> > always write to a fixed path, i.e. 'spark.sql.warehouse.dir'.
> >
> > This is to address the same issue.
> >
> > Q 1. The compute engine can manage the carbon table location the
> > same way as ORC and Parquet tables, and the user uses the same API
> > or SQL syntax to create a CarbonData table, like
> > `df.format("carbondata").save("path")` using the Spark DataFrame
> > API. There should be no carbon storePath involved.
> >
> > A. I think this is a different requirement; it does not even
> > consider the database and table. This is about writing the table
> > content at the desired location in the specified format.
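[Editor's note: the location rule described above, where a database's configured location takes precedence over spark.sql.warehouse.dir, can be sketched as follows. This is a minimal illustration only; the function name and paths are made up for the example and are not actual Spark or Carbon APIs.]

```python
# Illustrative sketch of the table-location resolution rule described above.
# resolve_table_location is a hypothetical helper, not a Spark/Carbon API.
from pathlib import PurePosixPath

def resolve_table_location(table, database_location=None,
                           warehouse_dir="/opt/hive/warehouse"):
    """A table lives under its database's configured location if one was
    given at CREATE DATABASE time; otherwise under spark.sql.warehouse.dir."""
    base = database_location if database_location else warehouse_dir
    return str(PurePosixPath(base) / table)

# Case 1: no database location configured -> warehouse dir
print(resolve_table_location("t1"))                            # /opt/hive/warehouse/t1
# Case 2: database created with location '/user/custom/warehouse'
print(resolve_table_location("t1", "/user/custom/warehouse"))  # /user/custom/warehouse/t1
```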
> > Q 2. User should be able to save a table in an HDFS location or
> > an S3 location in the same context. Since there are several carbon
> > properties involved when determining the FS type, such as the LOCK
> > file, etc., it is not possible to create tables on HDFS and on S3
> > in the same context, which also breaks the table-level abstraction.
> >
> > A. This requirement is for the viewfs file system, where different
> > databases can lie in different nameservices.
> >
> > On Wed, Oct 4, 2017 at 7:09 PM, Jacky Li <jacky.li...@qq.com> wrote:
> > >
> > > Hi,
> > >
> > > What carbon provides are two levels of concepts:
> > >
> > > 1. File format, which can be used by a compute engine to write
> > > and read data. CarbonData is a self-describing, type-aware
> > > columnar file format for the Hadoop environment, just as ORC and
> > > Parquet provide.
> > >
> > > 2. Table-level storage, which includes not just the file format
> > > but also aggregated index files (datamap), the global dictionary,
> > > and segment metadata. It provides more functionality regarding
> > > segment management and SQL optimization (like lazy decode)
> > > through deep integration with the compute engine (currently only
> > > Spark deep integration is supported).
> > >
> > > In my opinion, these two levels of abstraction are the core of
> > > the CarbonData project. But the database concept should belong to
> > > the compute engine managing the store-level metadata, since
> > > Spark, Hive, and Presto all have this part in their layer.
> > >
> > > I think what CarbonData is currently missing is that, for
> > > table-level storage, the user should be able to specify the table
> > > location to save the table data. This is to achieve:
> > >
> > > 1. The compute engine can manage the carbon table location the
> > > same way as ORC and Parquet tables. And the user uses the same
> > > API or SQL syntax to create a CarbonData table, like
> > > `df.format("carbondata").save("path")` using the Spark DataFrame
> > > API. There should be no carbon storePath involved.
> > >
> > > 2.
> > > User should be able to save a table in an HDFS location or an
> > > S3 location in the same context. Since there are several carbon
> > > properties involved when determining the FS type, such as the
> > > LOCK file, etc., it is not possible to create tables on HDFS and
> > > on S3 in the same context, which also breaks the table-level
> > > abstraction.
> > >
> > > Regards,
> > > Jacky
> > >
> > > > On 3 Oct 2017, at 10:36 PM, Mohammad Shahid Khan
> > > > <mohdshahidkhan1...@gmail.com> wrote:
> > > >
> > > > Hi Dev,
> > > > Please find the design document for Support Database Location
> > > > Configuration while Creating Database.
> > > >
> > > > Regards,
> > > > Shahid
> > > > <Support Database Location.docx>
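[Editor's note: the Q 2 point above, that a single global carbon store path forces one filesystem per context, implies deriving the filesystem type from each table's own path instead. A minimal sketch of that idea, assuming path schemes like hdfs://, s3a://, and viewfs://; this is illustrative only and is not CarbonData's actual FS-resolution code.]

```python
# Illustrative: determine the filesystem type per table path, so that HDFS,
# S3, and viewfs tables can coexist in one session instead of inheriting a
# single FS type from a global store path.
from urllib.parse import urlparse

def fs_type(table_path):
    """Map a table path's URI scheme to a filesystem kind; paths without a
    scheme are treated as local."""
    scheme = urlparse(table_path).scheme or "file"
    return {"hdfs": "HDFS", "s3": "S3", "s3a": "S3",
            "viewfs": "VIEWFS", "file": "LOCAL"}.get(scheme, scheme.upper())

print(fs_type("hdfs://ns1/user/warehouse/t1"))  # HDFS
print(fs_type("s3a://bucket/warehouse/t2"))     # S3
print(fs_type("viewfs://cluster/db1/t3"))       # VIEWFS
print(fs_type("/local/path/t4"))                # LOCAL
```

With the FS kind resolved per table, table-level concerns such as lock-file handling could follow the table's own location rather than a session-wide setting, which is the abstraction the thread argues for.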