[Discussion] Segment management enhance

David CaiQiang Thu, 03 Sep 2020 20:09:16 -0700

[Background]
1. In some scenarios, two loading/compaction jobs may write data to the same
segment. This mixes up the data and breaks some features, which then no
longer work correctly.
2. Loading/compaction/update/delete operations need to clean stale data
before execution. Cleaning stale data is a high-risk operation: if an
exception occurs, it may delete valid data. If the system doesn't clean
stale data, in some scenarios the stale data will be added into a new merged
index file and can be queried.
3. Loading/compaction takes a long time, and in some scenarios the lock is
also held for a long time.


[Motivation & Goal]
We should avoid data confusion and the risk of cleaning stale data. Maybe we
can use a UUID as the segment id to avoid these troubles. We could then even
do loading/compaction without the segment/compaction lock.

[Modification]
1. segment id
  Use a UUID as the segment id instead of the unique numeric value.
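
  A minimal sketch of the idea, in Python for brevity. The function name
  new_segment_id is hypothetical, and a random (version 4) UUID is an
  assumption; the proposal does not say which UUID variant would be used.

```python
import uuid


def new_segment_id() -> str:
    """Return a random (version 4) UUID string to use as a segment id.

    Hypothetical helper: unlike a numeric id, it needs no shared counter,
    so no lock is required just to allocate the id.
    """
    return str(uuid.uuid4())


# Two concurrent jobs each obtain their own id independently,
# so they do not end up writing into the same segment folder.
job_a = new_segment_id()
job_b = new_segment_id()
```

  Because each job generates its id locally, a failed or abandoned job
  leaves behind a folder that no other job will reuse.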

2. segment layout
 a) move the segment data folder into the table folder
 b) move the carbonindexmerge file into the Metadata/segments folder

 tableFolder
    UUID1
     |_xxx.carbondata
     |_xxx.carbonindex
    UUID2
    Metadata
     |_segments
        |_UUID1_timestamp1.segment (segment index summary)
        |_UUID1_timestamp1.carbonindexmerge (segment index detail)
     |_schema
     |_tablestatus
    LockFiles

  partitionTableFolder
    partkey=value1
     |_xxx.carbondata
     |_xxx.carbonindex
    partkey=value2
    Metadata
     |_segments
        |_UUID1_timestamp1.segment (segment index summary)
        |_partkey=value1
          |_UUID1_timestamp1.carbonindexmerge (segment index detail)
        |_partkey=value2
     |_schema
     |_tablestatus
    LockFiles
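
The two layouts above can be sketched as path-building helpers. This is only
an illustration of the proposed folder structure: the function names and
parameters are hypothetical, and the example timestamp is made up.

```python
from pathlib import PurePosixPath
from typing import Optional

SEGMENTS_DIR = PurePosixPath("Metadata") / "segments"


def data_dir(table: str, segment_id: str,
             partition: Optional[str] = None) -> PurePosixPath:
    """Folder holding .carbondata/.carbonindex files.

    Non-partitioned table: tableFolder/<UUID>
    Partitioned table:     tableFolder/<partkey=value>
    """
    leaf = partition if partition is not None else segment_id
    return PurePosixPath(table) / leaf


def segment_file(table: str, segment_id: str, timestamp: int) -> PurePosixPath:
    """Segment index summary: Metadata/segments/<UUID>_<timestamp>.segment"""
    name = f"{segment_id}_{timestamp}.segment"
    return PurePosixPath(table) / SEGMENTS_DIR / name


def index_merge_file(table: str, segment_id: str, timestamp: int,
                     partition: Optional[str] = None) -> PurePosixPath:
    """Segment index detail (.carbonindexmerge).

    For a partitioned table the file sits under a per-partition subfolder
    of Metadata/segments.
    """
    base = PurePosixPath(table) / SEGMENTS_DIR
    if partition is not None:
        base = base / partition
    return base / f"{segment_id}_{timestamp}.carbonindexmerge"


# Example with placeholder values.
print(index_merge_file("partitionTableFolder", "UUID1", 1599188956,
                       partition="partkey=value1"))
```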

3. segment management
Extract a segment interface that supports open/close, read/write, and a
segment-level index pruning API.
The segment should support multiple data source types: file formats (carbon,
parquet, orc...), HBase...
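
To make the discussion concrete, here is one possible shape for such an
interface, sketched in Python. Everything in it is an assumption for
illustration: the class and method names, the dict-based rows, and the use
of per-column min/max values as the segment-level index are not taken from
CarbonData.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, Any]


class Segment(ABC):
    """Hypothetical segment interface; a backend (carbon, parquet, orc,
    HBase...) would provide its own implementation."""

    @abstractmethod
    def open(self) -> None: ...

    @abstractmethod
    def close(self) -> None: ...

    @abstractmethod
    def write(self, rows: Iterable[Row]) -> None: ...

    @abstractmethod
    def read(self) -> Iterator[Row]: ...

    @abstractmethod
    def may_contain(self, column: str, value: Any) -> bool:
        """Segment-level pruning hook: False means the segment can be skipped."""


class InMemorySegment(Segment):
    """Toy implementation that keeps rows in a list and a min/max per column."""

    def __init__(self, segment_id: str) -> None:
        self.segment_id = segment_id
        self._rows: List[Row] = []
        self._min_max: Dict[str, Tuple[Any, Any]] = {}
        self._is_open = False

    def open(self) -> None:
        self._is_open = True

    def close(self) -> None:
        self._is_open = False

    def write(self, rows: Iterable[Row]) -> None:
        if not self._is_open:
            raise RuntimeError("segment is closed")
        for row in rows:
            self._rows.append(row)
            for col, val in row.items():
                lo, hi = self._min_max.get(col, (val, val))
                self._min_max[col] = (min(lo, val), max(hi, val))

    def read(self) -> Iterator[Row]:
        if not self._is_open:
            raise RuntimeError("segment is closed")
        return iter(list(self._rows))

    def may_contain(self, column: str, value: Any) -> bool:
        if column not in self._min_max:
            return False
        lo, hi = self._min_max[column]
        return lo <= value <= hi


def prune(segments: Iterable[Segment], column: str, value: Any) -> List[Segment]:
    """Keep only the segments whose index says the value might be present."""
    return [s for s in segments if s.may_contain(column, value)]
```

A query layer would call prune() first and then read() only the segments
that survive, regardless of which data source backs each segment.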

4. clean stale data
Cleaning stale data will become an optional operation.



-----
Best Regards
David Cai
--
Sent from: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
