[jira] [Updated] (CARBONDATA-1173) Streaming Ingest: Write path framework implementation
[ https://issues.apache.org/jira/browse/CARBONDATA-1173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aniket Adnaik updated CARBONDATA-1173:
--------------------------------------
    Affects Version/s:     (was: NONE)
                       1.2.0

> Streaming Ingest: Write path framework implementation
> -----------------------------------------------------
>
>                 Key: CARBONDATA-1173
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-1173
>             Project: CarbonData
>          Issue Type: Sub-task
>          Components: core, data-load, hadoop-integration, spark-integration
>    Affects Versions: 1.2.0
>            Reporter: Aniket Adnaik
>            Assignee: Aniket Adnaik
>             Fix For: 1.3.0
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> CarbonData with Spark Structured Streaming write path framework
> - CarbonData StreamingOutputWriter, StreamingRecordWriter, metadata writer
>   classes, etc.
> - initial framework for streaming ingest feature

--
This message was sent by Atlassian JIRA (v6.4.14#64029)

[jira] [Updated] (CARBONDATA-1173) Streaming Ingest: Write path framework implementation
[ https://issues.apache.org/jira/browse/CARBONDATA-1173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aniket Adnaik updated CARBONDATA-1173:
--------------------------------------
    Affects Version/s:     (was: 1.3.0)
                       NONE

> Streaming Ingest: Write path framework implementation
> -----------------------------------------------------
>
>                 Key: CARBONDATA-1173
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-1173
>             Project: CarbonData
>          Issue Type: Sub-task
>          Components: core, data-load, hadoop-integration, spark-integration
>    Affects Versions: NONE
>            Reporter: Aniket Adnaik
>            Assignee: Aniket Adnaik
>             Fix For: 1.3.0
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> CarbonData with Spark Structured Streaming write path framework
> - CarbonData StreamingOutputWriter, StreamingRecordWriter, metadata writer
>   classes, etc.
> - initial framework for streaming ingest feature

[jira] [Updated] (CARBONDATA-1173) Streaming Ingest: Write path framework implementation
[ https://issues.apache.org/jira/browse/CARBONDATA-1173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aniket Adnaik updated CARBONDATA-1173:
--------------------------------------
    Fix Version/s:     (was: NONE)
                   1.3.0

> Streaming Ingest: Write path framework implementation
> -----------------------------------------------------
>
>                 Key: CARBONDATA-1173
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-1173
>             Project: CarbonData
>          Issue Type: Sub-task
>          Components: core, data-load, hadoop-integration, spark-integration
>    Affects Versions: NONE
>            Reporter: Aniket Adnaik
>            Assignee: Aniket Adnaik
>             Fix For: 1.3.0
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> CarbonData with Spark Structured Streaming write path framework
> - CarbonData StreamingOutputWriter, StreamingRecordWriter, metadata writer
>   classes, etc.
> - initial framework for streaming ingest feature

[jira] [Updated] (CARBONDATA-1173) Streaming Ingest: Write path framework implementation
[ https://issues.apache.org/jira/browse/CARBONDATA-1173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aniket Adnaik updated CARBONDATA-1173:
--------------------------------------
    Affects Version/s: 1.3.0

> Streaming Ingest: Write path framework implementation
> -----------------------------------------------------
>
>                 Key: CARBONDATA-1173
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-1173
>             Project: CarbonData
>          Issue Type: Sub-task
>          Components: core, data-load, hadoop-integration, spark-integration
>    Affects Versions: NONE
>            Reporter: Aniket Adnaik
>            Assignee: Aniket Adnaik
>             Fix For: 1.3.0
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> CarbonData with Spark Structured Streaming write path framework
> - CarbonData StreamingOutputWriter, StreamingRecordWriter, metadata writer
>   classes, etc.
> - initial framework for streaming ingest feature

[jira] [Assigned] (CARBONDATA-1174) Streaming Ingest: Write path schema validation/inference
[ https://issues.apache.org/jira/browse/CARBONDATA-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aniket Adnaik reassigned CARBONDATA-1174:
-----------------------------------------

    Assignee: Aniket Adnaik

> Streaming Ingest: Write path schema validation/inference
> --------------------------------------------------------
>
>                 Key: CARBONDATA-1174
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-1174
>             Project: CarbonData
>          Issue Type: Sub-task
>          Components: core, spark-integration
>    Affects Versions: 1.2.0
>            Reporter: Aniket Adnaik
>            Assignee: Aniket Adnaik
>              Labels: features
>             Fix For: 1.2.0
>
>
> Streaming Ingest: Write path
> - schema validation / schema inference from an existing CarbonData table
> - streaming ingest is allowed into existing tables only

[jira] [Updated] (CARBONDATA-1072) Streaming Ingestion Feature
[ https://issues.apache.org/jira/browse/CARBONDATA-1072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aniket Adnaik updated CARBONDATA-1072:
--------------------------------------
    Attachment: StreamingIngestionSupportInCarbonData.pdf

> Streaming Ingestion Feature
> ---------------------------
>
>                 Key: CARBONDATA-1072
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-1072
>             Project: CarbonData
>          Issue Type: New Feature
>          Components: core, data-load, data-query, examples, file-format, spark-integration, sql
>    Affects Versions: NONE
>            Reporter: Aniket Adnaik
>             Fix For: NONE
>
>         Attachments: StreamingIngestionSupportInCarbonData.pdf
>
>
> High-level breakdown of work items/implementation phases:
> Design document will be attached soon.
>
> Phase 1: Spark Structured Streaming with regular CarbonData format
>
> This phase will mainly focus on supporting streaming ingestion using
> Spark Structured Streaming.
> 1. Write Path Implementation
>    - Integration with Spark's Structured Streaming framework
>      (FileStreamSink etc.)
>    - StreamingOutputWriter (StreamingOutputWriterFactory)
>    - Prepare Write (schema validation, segment creation,
>      streaming file creation etc.)
>    - StreamingRecordWriter (data conversion from Catalyst InternalRow
>      to a CarbonData-compatible format, making use of the new load path)
> 2. Read Path Implementation (some overlap with Phase 2)
>    - Modify getSplits() to read from the streaming segment
>    - Read committed info from metadata to get correct offsets
>    - Make use of the min-max index if available
>    - Use sequential scan: data is unsorted, so the B-tree index cannot be used
> 3. Compaction
>    - Minor compaction
>    - Major compaction
> 4. Metadata Management
>    - Streaming metadata store (e.g. offsets, timestamps etc.)
> 5. Failure Recovery
>    - Rollback on failure
>    - Handle asynchronous writes to CarbonData (using hflush)
> ----
> Phase 2: Spark Structured Streaming with appendable CarbonData format
> 1. Streaming File Format
>    - Writers use the V3 file format to append columnar, unsorted
>      data blocklets
>    - Modify readers to read from the appendable streaming file format
> ----
> Phase 3:
> 1. Interoperability Support
>    - Functionality with other features/components
>    - Concurrent queries with streaming ingestion
>    - Concurrent operations with streaming ingestion (e.g. compaction,
>      alter table, secondary index etc.)
> 2. Kafka Connect Ingestion / CarbonData connector
>    - Direct ingestion from Kafka Connect without Spark Structured
>      Streaming
>    - Separate Kafka connector to receive data through a network port
>    - Data commit and offset management
> ----
> Phase 4: Support for other streaming engines
>    - Analysis of the streaming APIs/interfaces of other streaming engines
>    - Implementation of connectors for different streaming engines (Storm,
>      Flink, Flume, etc.)
>
> Phase 5: In-memory streaming table (probable feature)
> 1. In-memory cache for streaming data
>    - Fault-tolerant in-memory buffering / checkpoint with WAL
>    - Readers read from in-memory tables if available
>    - Background threads for writing streaming data, etc.

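The Phase 1 read-path items (sequential scan over unsorted streaming data, using a min-max index where available) can be illustrated with a small Java sketch. It is hypothetical, not CarbonData code: `BlockletStats` and `MinMaxPruner` are illustrative names, and it assumes each streaming blocklet records only the min and max of one integer column. Since the rows are unsorted, no B-tree lookup is possible; the reader can only skip blocklets whose [min, max] range cannot overlap the filter and then scan the survivors sequentially.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical per-blocklet statistics for one integer column (not a CarbonData class).
final class BlockletStats {
    final String id;
    final int min;
    final int max;

    BlockletStats(String id, int min, int max) {
        this.id = id;
        this.min = min;
        this.max = max;
    }
}

final class MinMaxPruner {
    // Keeps blocklets whose [min, max] range overlaps the filter range [lo, hi].
    // Surviving blocklets must still be scanned row by row, because rows inside
    // a streaming blocklet are not sorted.
    static List<BlockletStats> prune(List<BlockletStats> blocklets, int lo, int hi) {
        List<BlockletStats> survivors = new ArrayList<>();
        for (BlockletStats b : blocklets) {
            boolean overlaps = b.max >= lo && b.min <= hi;
            if (overlaps) {
                survivors.add(b);
            }
        }
        return survivors;
    }
}
```

For example, with blocklets b0 [1, 10], b1 [20, 30] and b2 [5, 25], a filter on [12, 18] keeps only b2; b0 and b1 are skipped without being read.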
[jira] [Created] (CARBONDATA-1176) Streaming Ingest: Write path streaming segment/file creation
Aniket Adnaik created CARBONDATA-1176:
--------------------------------------

             Summary: Streaming Ingest: Write path streaming segment/file creation
                 Key: CARBONDATA-1176
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-1176
             Project: CarbonData
          Issue Type: Sub-task
          Components: core, data-load, hadoop-integration
    Affects Versions: 1.2.0
            Reporter: Aniket Adnaik
             Fix For: 1.2.0


Streaming Ingest: Write path: segment/streaming file
- Streaming segment creation and streaming file creation
- Resolve conflicts with Spark Structured Streaming file names. Spark Structured Streaming names streaming files with a unique batch id to avoid overwriting.
- Maintain Spark Structured Streaming recoverability, since streaming file names generated by Spark Structured Streaming are unique and recorded in Spark Structured Streaming metadata.

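The naming concern in CARBONDATA-1176 can be sketched in Java under explicit assumptions. `StreamingFileNamer`, the `part-<batchId>-<taskId>.carbondata` pattern and the in-memory committed-batch set are all hypothetical; a real sink would persist committed batch ids in durable metadata. The sketch shows the idea behind recoverability: a file name derived from the batch id is deterministic, so a replayed batch maps to the same names, and a batch already recorded as committed is skipped instead of being written twice.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: derives deterministic streaming file names from the
// batch id and tracks which batches have already been committed.
final class StreamingFileNamer {
    // In-memory for brevity; a real sink must persist this across restarts.
    private final Set<Long> committedBatches = new HashSet<>();

    // Assumed naming pattern: part-<batchId>-<taskId>.carbondata
    // The same (batchId, taskId) always yields the same name.
    static String fileName(long batchId, int taskId) {
        return String.format("part-%05d-%05d.carbondata", batchId, taskId);
    }

    // A batch is written only if it has not been committed before, so
    // replaying it after a failure does not duplicate its data.
    boolean shouldWrite(long batchId) {
        return !committedBatches.contains(batchId);
    }

    void markCommitted(long batchId) {
        committedBatches.add(batchId);
    }
}
```

For instance, `fileName(3L, 7)` gives `part-00003-00007.carbondata` every time it is called, and after `markCommitted(3L)` a replayed batch 3 is reported as not needing a write.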
[jira] [Created] (CARBONDATA-1175) Streaming Ingest: Write path data conversion/transformation
Aniket Adnaik created CARBONDATA-1175:
--------------------------------------

             Summary: Streaming Ingest: Write path data conversion/transformation
                 Key: CARBONDATA-1175
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-1175
             Project: CarbonData
          Issue Type: Sub-task
    Affects Versions: 1.2.0
            Reporter: Aniket Adnaik
             Fix For: 1.2.0


Streaming Ingest: Write path data conversion/transformation
- Input data is a byte stream in Catalyst InternalRow format (row-major), which needs to be converted to columnar format.
- Column converters and corresponding iterators need to be created before invoking the carbon layer load path.
- Various carbon properties need to be set, for example SORT_SCOPE (to skip sorting), blocklet size, skipping global dictionary creation, etc.

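The row-major to column-major step described in CARBONDATA-1175 can be pictured with a minimal Java sketch. It is an assumption-laden illustration, not the CarbonData load path: rows are modeled as plain `Object[]` arrays instead of Catalyst `InternalRow` objects, `RowToColumnConverter` is a hypothetical name, and type handling, dictionary settings and the carbon properties listed above are left out. It simply transposes a batch of rows into one array per column, the shape a columnar writer consumes.

```java
import java.util.List;

// Hypothetical converter: transposes a batch of row-major records into
// column-major arrays. Rows are plain Object[] here; a real implementation
// would read typed fields from Spark's InternalRow instead.
final class RowToColumnConverter {
    static Object[][] toColumns(List<Object[]> rows, int numColumns) {
        // columns[c][r] holds field c of row r.
        Object[][] columns = new Object[numColumns][rows.size()];
        for (int r = 0; r < rows.size(); r++) {
            Object[] row = rows.get(r);
            if (row.length != numColumns) {
                throw new IllegalArgumentException(
                    "row " + r + " has " + row.length + " fields, expected " + numColumns);
            }
            for (int c = 0; c < numColumns; c++) {
                columns[c][r] = row[c];
            }
        }
        return columns;
    }
}
```

Three rows `(1, "a")`, `(2, "b")`, `(3, "c")` become two columns, `[1, 2, 3]` and `["a", "b", "c"]`. Batching rows first and transposing once per batch keeps per-column data contiguous, which is what allows column-wise encoding and min-max statistics to be computed afterwards.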
[jira] [Created] (CARBONDATA-1174) Streaming Ingest: Write path schema validation/inference
Aniket Adnaik created CARBONDATA-1174:
--------------------------------------

             Summary: Streaming Ingest: Write path schema validation/inference
                 Key: CARBONDATA-1174
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-1174
             Project: CarbonData
          Issue Type: Sub-task
          Components: core, spark-integration
    Affects Versions: 1.2.0
            Reporter: Aniket Adnaik
             Fix For: 1.2.0


Streaming Ingest: Write path
- schema validation / schema inference from an existing CarbonData table
- streaming ingest is allowed into existing tables only
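The two rules in CARBONDATA-1174 can be sketched as one Java method, under stated assumptions: schemas are modeled as ordered maps of column name to type name, a `null` table schema stands for a table that does not exist, column names and type names are compared case-insensitively, and types must match exactly (no widening). `SchemaValidator.resolve` is a hypothetical name; CarbonData's actual checks may differ.

```java
import java.util.Map;

// Hypothetical validator for streaming ingest into an existing table.
// Schemas are modeled as ordered maps of column name -> type name.
final class SchemaValidator {

    // Returns the schema to write with. When the stream supplies no schema,
    // it is inferred from the table; otherwise the stream schema is validated.
    static Map<String, String> resolve(Map<String, String> tableSchema,
                                       Map<String, String> streamSchema) {
        // Streaming ingest is allowed into existing tables only.
        if (tableSchema == null) {
            throw new IllegalStateException("target table does not exist");
        }
        if (streamSchema == null) {
            return tableSchema; // schema inference from the existing table
        }
        if (tableSchema.size() != streamSchema.size()) {
            throw new IllegalArgumentException("column count mismatch: table has "
                + tableSchema.size() + ", stream has " + streamSchema.size());
        }
        for (Map.Entry<String, String> col : streamSchema.entrySet()) {
            String tableType = findType(tableSchema, col.getKey());
            if (tableType == null) {
                throw new IllegalArgumentException("unknown column: " + col.getKey());
            }
            if (!tableType.equalsIgnoreCase(col.getValue())) {
                throw new IllegalArgumentException("type mismatch for " + col.getKey()
                    + ": table " + tableType + ", stream " + col.getValue());
            }
        }
        return streamSchema;
    }

    // Case-insensitive column lookup (an assumption); returns null if absent.
    private static String findType(Map<String, String> schema, String name) {
        for (Map.Entry<String, String> e : schema.entrySet()) {
            if (e.getKey().equalsIgnoreCase(name)) {
                return e.getValue();
            }
        }
        return null;
    }
}
```

Rejecting a missing table up front, rather than creating one from the stream's schema, matches the "existing tables only" rule and keeps table creation an explicit DDL step.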