In case of Spark, where to write the table content is decided in two ways.

1. If a table is created under a database for which the location attribute is
not configured by the end user,
   then the content of the table will be written under
"spark.sql.warehouse.dir".
   For example: if spark.sql.warehouse.dir = /opt/hive/warehouse/
     then the table content will be written at /opt/hive/warehouse/

2. But if the location attribute is set by the user while creating the database,
   then the content of any table created under such a database will be
   written to the configured location.
   For example, if for database x the user sets location '/user/custom/warehouse',
   then a table created under database x will be written at
   '/user/custom/warehouse' (see the sketch below).
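
   For illustration, a minimal sketch of the two cases in Spark SQL, run from
   a spark-shell where `spark` is the SparkSession (the database names,
   warehouse dir and location path here are just example values):

      // Case 1: database created without LOCATION -> table data goes under
      // spark.sql.warehouse.dir (e.g. /opt/hive/warehouse/db_default.db/t1)
      spark.sql("CREATE DATABASE db_default")
      spark.sql("CREATE TABLE db_default.t1 (id INT) USING parquet")

      // Case 2: database created with LOCATION -> table data goes under that
      // location (e.g. /user/custom/warehouse/t2)
      spark.sql("CREATE DATABASE x LOCATION '/user/custom/warehouse'")
      spark.sql("CREATE TABLE x.t2 (id INT) USING parquet")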


   Currently for carbon we have customized the store writing and always
   write to a fixed path, i.e. 'spark.sql.warehouse.dir'.

   This is to address the same issue.
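
   To make the intent concrete, a hedged sketch of the behaviour this change
   aims for with a carbon table (illustrative DDL describing the proposed
   behaviour, not the current one):

      // database with a user-specified location
      spark.sql("CREATE DATABASE x LOCATION '/user/custom/warehouse'")
      // carbon table created under that database
      spark.sql("CREATE TABLE x.c1 (id INT, name STRING) STORED BY 'carbondata'")
      // proposed: table content is written under /user/custom/warehouse
      // instead of the fixed 'spark.sql.warehouse.dir' / carbon store path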



  Q 1. Compute engine can manage the carbon table location the same way as
ORC and parquet table.
   And user uses same API or SQL syntax to create carbondata table, like
`df.format(“carbondata”).save(“path”) `
   using spark dataframe API. There should be no carbon storePath involved.

   A. I think this is a different requirement; it does not even consider the
      database and table. It is about writing the table content at a desired
      location in the specified format.
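
   For reference, the kind of dataframe call the question refers to (the path
   and the dataframe `df` are placeholders; whether the 'carbondata' format can
   be resolved without a carbon storePath is exactly the open point):

      // write a dataframe directly to a path in a given format, as ORC/Parquet allow
      df.write.format("carbondata").save("/some/output/path")
      // read it back the same way
      val readBack = spark.read.format("carbondata").load("/some/output/path")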

  Q 2. User should be able to save table in HDFS location or S3 location in
the same context.
   Since there are several carbon property involved when determining the FS
type, such as LOCK file,
   etc, it is not possible to create tables on HDFS and on S3 in same
context, which also break the table level abstraction.

   A. This requirement is for the viewfs file system, where different databases
      can lie in different nameservices.
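
   A minimal sketch of that scenario (the nameservice names and paths are made
   up for illustration): two databases whose locations resolve, through the
   viewfs client-side mount table, to different nameservices.

      // illustrative only: database locations mapping to different nameservices
      spark.sql("CREATE DATABASE db_a LOCATION 'viewfs://cluster/ns1/warehouse/db_a'")
      spark.sql("CREATE DATABASE db_b LOCATION 'viewfs://cluster/ns2/warehouse/db_b'")
      // tables created under db_a and db_b should then be written to the
      // respective nameservices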


On Wed, Oct 4, 2017 at 7:09 PM, Jacky Li <jacky.li...@qq.com> wrote:

> Hi,
>
> What carbon provides are two level concepts:
> 1. File format, which can be used by compute engine to write and read
> data. CarbonData is a self-describing and type-aware columnar file format
> for Hadoop environment which is just as what orc, parquet provides.
>
> 2. Table level storage, which include not just file format but also
> aggregated index file (datamap), global dictionary, and segment metadata.
> It provides more functionality regarding segment management and SQL
> optimization (like lazy decode) through deep integration with compute
> engine (currently only spark deep integration is supported).
>
> In my opinion, these two levels of abstraction are the core of the carbondata
> project. But the database concept should belong to the compute engine, which
> manages the store level metadata, since spark, hive, presto all have
> this part in their layer.
>
> I think what currently carbondata missing is that for table level storage,
> user should be able to specify the table location to save the table data.
> This is to achieve:
> 1. Compute engine can manage the carbon table location the same way as ORC
> and parquet table. And user uses same API or SQL syntax to create
> carbondata table, like `df.format(“carbondata”).save(“path”) ` using
> spark dataframe API. There should be no carbon storePath involved.
>
> 2. User should be able to save table in HDFS location or S3 location in
> the same context. Since there are several carbon property involved when
> determining the FS type, such as LOCK file, etc, it is not possible to
> create tables on HDFS and on S3 in same context, which also break the table
> level abstraction.
>
> Regards,
> Jacky
>
> > On 3 Oct 2017, at 10:36 PM, Mohammad Shahid Khan <mohdshahidkhan1...@gmail.com>
> > wrote:
> >
> > Hi Dev,
> > Please find the design document for Support Database Location
> Configuration while Creating Database.
> >
> > Regards,
> > Shahid
> > <Support Database Location.docx>
>
>
>
>
