[jira] [Updated] (CARBONDATA-1173) Streaming Ingest: Write path framework implementation

2017-09-18 Thread Aniket Adnaik (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-1173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Adnaik updated CARBONDATA-1173:
--
Affects Version/s: (was: NONE)
   1.2.0

> Streaming Ingest: Write path framework implementation
> -
>
> Key: CARBONDATA-1173
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1173
> Project: CarbonData
> Issue Type: Sub-task
> Components: core, data-load, hadoop-integration, spark-integration
> Affects Versions: 1.2.0
> Reporter: Aniket Adnaik
> Assignee: Aniket Adnaik
> Fix For: 1.3.0
>
> Time Spent: 1h
> Remaining Estimate: 0h
>
> CarbonData with Spark Structured Streaming write path framework:
>  - CarbonData StreamingOutputWriter, StreamingRecordWriter, metadata writer
>    classes, etc.
>  - initial framework for the streaming ingest feature



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (CARBONDATA-1173) Streaming Ingest: Write path framework implementation

2017-09-18 Thread Aniket Adnaik (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-1173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Adnaik updated CARBONDATA-1173:
--
Affects Version/s: (was: 1.3.0)
   NONE

> Streaming Ingest: Write path framework implementation
> -
>
> Key: CARBONDATA-1173
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1173
> Project: CarbonData
> Issue Type: Sub-task
> Components: core, data-load, hadoop-integration, spark-integration
> Affects Versions: NONE
> Reporter: Aniket Adnaik
> Assignee: Aniket Adnaik
> Fix For: 1.3.0
>
> Time Spent: 1h
> Remaining Estimate: 0h
>
> CarbonData with Spark Structured Streaming write path framework:
>  - CarbonData StreamingOutputWriter, StreamingRecordWriter, metadata writer
>    classes, etc.
>  - initial framework for the streaming ingest feature





[jira] [Updated] (CARBONDATA-1173) Streaming Ingest: Write path framework implementation

2017-09-18 Thread Aniket Adnaik (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-1173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Adnaik updated CARBONDATA-1173:
--
Fix Version/s: (was: NONE)
   1.3.0






[jira] [Updated] (CARBONDATA-1173) Streaming Ingest: Write path framework implementation

2017-09-18 Thread Aniket Adnaik (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-1173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Adnaik updated CARBONDATA-1173:
--
Affects Version/s: 1.3.0






[jira] [Assigned] (CARBONDATA-1174) Streaming Ingest: Write path schema validation/inference

2017-09-12 Thread Aniket Adnaik (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Adnaik reassigned CARBONDATA-1174:
-

Assignee: Aniket Adnaik

> Streaming Ingest: Write path schema validation/inference
> 
>
> Key: CARBONDATA-1174
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1174
> Project: CarbonData
> Issue Type: Sub-task
> Components: core, spark-integration
> Affects Versions: 1.2.0
> Reporter: Aniket Adnaik
> Assignee: Aniket Adnaik
> Labels: features
> Fix For: 1.2.0
>
>
> Streaming Ingest: Write path
> - schema validation / schema inference from the existing CarbonData table
> - streaming ingest is allowed only into existing tables





[jira] [Updated] (CARBONDATA-1072) Streaming Ingestion Feature

2017-08-18 Thread Aniket Adnaik (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-1072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Adnaik updated CARBONDATA-1072:
--
Attachment: StreamingIngestionSupportInCarbonData.pdf

> Streaming Ingestion Feature 
> 
>
> Key: CARBONDATA-1072
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1072
> Project: CarbonData
> Issue Type: New Feature
> Components: core, data-load, data-query, examples, file-format,
>   spark-integration, sql
> Affects Versions: NONE
> Reporter: Aniket Adnaik
> Fix For: NONE
>
> Attachments: StreamingIngestionSupportInCarbonData.pdf
>
>
> High-level breakdown of work items/implementation phases:
> Design document will be attached soon.
>
> Phase 1: Spark Structured Streaming with regular CarbonData format
>
> This phase will mainly focus on supporting streaming ingestion using
> Spark Structured Streaming.
> 1. Write Path Implementation
>    - Integration with Spark's Structured Streaming framework
>      (FileStreamSink etc.)
>    - StreamingOutputWriter (StreamingOuputWriterFactory)
>    - Prepare Write (schema validation, segment creation,
>      streaming file creation etc.)
>    - StreamingRecordWriter (data conversion from Catalyst InternalRow
>      to a CarbonData-compatible format, making use of the new load path)
> 2. Read Path Implementation (some overlap with Phase 2)
>    - Modify getsplits() to read from the streaming segment
>    - Read committed info from metadata to get correct offsets
>    - Make use of the min-max index if available
>    - Use sequential scan: data is unsorted, so the B-tree index cannot be used
> 3. Compaction
>    - Minor compaction
>    - Major compaction
> 4. Metadata Management
>    - Streaming metadata store (e.g. offsets, timestamps etc.)
> 5. Failure Recovery
>    - Rollback on failure
>    - Handle asynchronous writes to CarbonData (using hflush)
> -
> Phase 2: Spark Structured Streaming with appendable CarbonData format
> 1. Streaming File Format
>    - Writers use the V3 file format for appending columnar unsorted
>      data blocklets
>    - Modify readers to read from the appendable streaming file format
> -
> Phase 3:
> 1. Interoperability Support
>    - Functionality with other features/components
>    - Concurrent queries with streaming ingestion
>    - Concurrent operations with streaming ingestion (e.g. compaction,
>      alter table, secondary index etc.)
> 2. Kafka Connect Ingestion / CarbonData connector
>    - Direct ingestion from Kafka Connect without Spark Structured
>      Streaming
>    - Separate Kafka connector to receive data through a network port
>    - Data commit and offset management
> -
> Phase 4: Support for other streaming engines
>    - Analysis of streaming APIs/interfaces with other streaming engines
>    - Implementation of connectors for different streaming engines: Storm,
>      Flink, Flume, etc.
>
> Phase 5: In-memory streaming table (probable feature)
> 1. In-memory cache for streaming data
>    - Fault-tolerant in-memory buffering / checkpoint with WAL
>    - Readers read from in-memory tables if available
>    - Background threads for writing streaming data, etc.
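
The Phase 1 read path above (min-max index if available, otherwise a sequential scan over unsorted data) could look roughly like the following. This is only an illustrative sketch in Python, not CarbonData code: the `Block` class and `scan_equal` function are hypothetical names, and a block here is just a list of integers with optional min/max statistics.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Block:
    """Hypothetical stand-in for one unsorted block of streamed values."""
    values: List[int]
    min_value: Optional[int] = None  # None means no min-max index was written
    max_value: Optional[int] = None


def scan_equal(blocks: List[Block], target: int) -> List[int]:
    """Return all values equal to target, pruning blocks by min-max when possible."""
    hits: List[int] = []
    for block in blocks:
        has_index = block.min_value is not None and block.max_value is not None
        # With an index, skip blocks whose value range cannot contain the target.
        if has_index and not (block.min_value <= target <= block.max_value):
            continue
        # Data is unsorted, so every surviving block is scanned sequentially.
        hits.extend(v for v in block.values if v == target)
    return hits
```

Blocks without statistics are always scanned, which matches the "if available" wording: the index only ever removes work, it is never required for correctness.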





[jira] [Created] (CARBONDATA-1176) Streaming Ingest: Write path streaming segment/file creation

2017-06-14 Thread Aniket Adnaik (JIRA)
Aniket Adnaik created CARBONDATA-1176:
-

 Summary: Streaming Ingest: Write path streaming segment/file creation
 Key: CARBONDATA-1176
 URL: https://issues.apache.org/jira/browse/CARBONDATA-1176
 Project: CarbonData
  Issue Type: Sub-task
  Components: core, data-load, hadoop-integration
Affects Versions: 1.2.0
Reporter: Aniket Adnaik
 Fix For: 1.2.0


Streaming Ingest: Write path: streaming segment/file creation
- Streaming segment creation and streaming file creation
- Resolve conflicts with Spark Structured Streaming file names. Spark
  Structured Streaming names streaming files with a unique batch id to avoid
  overwriting.
- Maintain Spark Structured Streaming recoverability: the streaming file names
  generated by Spark Structured Streaming are unique and recorded in the Spark
  Structured Streaming metadata.
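
As a rough illustration of the two points above (unique per-batch file names, and a metadata record that makes replay after a failure safe), here is a minimal Python sketch. The file name layout, `streaming_file_name`, and `BatchLog` are all hypothetical and are not the naming scheme or metadata log that Spark or CarbonData actually use.

```python
from typing import Dict, List


def streaming_file_name(batch_id: int, task_id: int, run_id: str) -> str:
    """Build a name that is unique per (batch, task, run); the layout is made up."""
    return f"part-{batch_id:05d}-{task_id:05d}-{run_id}.carbondata"


class BatchLog:
    """In-memory stand-in for a streaming metadata log of committed batches."""

    def __init__(self) -> None:
        self._committed: Dict[int, List[str]] = {}

    def commit(self, batch_id: int, file_names: List[str]) -> bool:
        """Record the files of a batch once; a replayed batch is ignored."""
        if batch_id in self._committed:
            return False  # already committed before the failure, nothing to do
        self._committed[batch_id] = list(file_names)
        return True

    def committed_files(self) -> List[str]:
        """Files visible to readers: only those of committed batches, in batch order."""
        return [name for b in sorted(self._committed) for name in self._committed[b]]
```

Because the batch id is part of the name, a retried batch never overwrites files from another batch, and readers that consult only the log never see files from an uncommitted attempt.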





[jira] [Created] (CARBONDATA-1175) Streaming Ingest: Write path data conversion/transformation

2017-06-14 Thread Aniket Adnaik (JIRA)
Aniket Adnaik created CARBONDATA-1175:
-

 Summary: Streaming Ingest: Write path data conversion/transformation
 Key: CARBONDATA-1175
 URL: https://issues.apache.org/jira/browse/CARBONDATA-1175
 Project: CarbonData
  Issue Type: Sub-task
Affects Versions: 1.2.0
Reporter: Aniket Adnaik
 Fix For: 1.2.0


Streaming Ingest: Write path data conversion/transformation
- Input data is a byte stream in Catalyst InternalRow format (row-major), which
  needs to be converted to columnar format.
- A column converter and the corresponding iterators need to be created before
  invoking the carbon layer load path.
- Various carbon properties need to be set, for example SORT_SCOPE (to skip
  sorting), blocklet size, skipping global dictionary creation, etc.
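
The row-major to columnar conversion described above can be sketched as follows. This is a simplified Python illustration under stated assumptions, not the CarbonData converter: rows are plain tuples rather than Catalyst InternalRow objects, and `rows_to_columns`, `to_blocklets`, and the `blocklet_rows` parameter are hypothetical names. Rows are kept in arrival order, mirroring the "skip sorting" setting.

```python
from typing import Any, Dict, Iterable, Iterator, List, Sequence


def rows_to_columns(rows: Iterable[Sequence[Any]],
                    column_names: Sequence[str]) -> Dict[str, List[Any]]:
    """Pivot row-major records into one value list per column."""
    columns: Dict[str, List[Any]] = {name: [] for name in column_names}
    for row in rows:
        if len(row) != len(column_names):
            raise ValueError(f"row has {len(row)} fields, expected {len(column_names)}")
        for name, value in zip(column_names, row):
            columns[name].append(value)
    return columns


def to_blocklets(rows: Iterable[Sequence[Any]],
                 column_names: Sequence[str],
                 blocklet_rows: int) -> Iterator[Dict[str, List[Any]]]:
    """Yield columnar batches of at most blocklet_rows rows, without sorting."""
    if blocklet_rows <= 0:
        raise ValueError("blocklet_rows must be positive")
    batch: List[Sequence[Any]] = []
    for row in rows:
        batch.append(row)
        if len(batch) == blocklet_rows:
            yield rows_to_columns(batch, column_names)
            batch = []
    if batch:  # flush the final, possibly smaller, batch
        yield rows_to_columns(batch, column_names)
```

Writing the conversion as a generator keeps only one batch of rows in memory at a time, which suits a stream whose total length is unknown.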





[jira] [Created] (CARBONDATA-1174) Streaming Ingest: Write path schema validation/inference

2017-06-14 Thread Aniket Adnaik (JIRA)
Aniket Adnaik created CARBONDATA-1174:
-

 Summary: Streaming Ingest: Write path schema validation/inference
 Key: CARBONDATA-1174
 URL: https://issues.apache.org/jira/browse/CARBONDATA-1174
 Project: CarbonData
  Issue Type: Sub-task
  Components: core, spark-integration
Affects Versions: 1.2.0
Reporter: Aniket Adnaik
 Fix For: 1.2.0


Streaming Ingest: Write path
- schema validation / schema inference from the existing CarbonData table
- streaming ingest is allowed only into existing tables
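
A minimal sketch of what validation and inference against an existing table might involve is shown below, in Python. It assumes a schema is simply a mapping from column name to a type name string, with `None` standing for a table that does not exist; `validate_stream_schema` and `infer_stream_schema` are hypothetical helpers, not CarbonData or Spark APIs.

```python
from typing import Dict, List, Optional

Schema = Dict[str, str]  # column name -> type name (assumed representation)


def infer_stream_schema(table_schema: Optional[Schema]) -> Schema:
    """Derive the stream schema from the existing table; fail if there is no table."""
    if table_schema is None:
        raise LookupError("streaming ingest requires an existing table")
    return dict(table_schema)


def validate_stream_schema(table_schema: Optional[Schema],
                           stream_schema: Schema) -> List[str]:
    """Return mismatch messages; an empty list means the schemas are compatible."""
    if table_schema is None:
        raise LookupError("streaming ingest requires an existing table")
    errors: List[str] = []
    for name, dtype in stream_schema.items():
        if name not in table_schema:
            errors.append(f"unknown column: {name}")
        elif table_schema[name] != dtype:
            errors.append(f"type mismatch for {name}: {dtype} != {table_schema[name]}")
    for name in table_schema:
        if name not in stream_schema:
            errors.append(f"missing column: {name}")
    return errors
```

Returning a list of messages rather than failing on the first mismatch lets the caller report every problem with the incoming stream in one pass.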


