[jira] [Created] (CARBONDATA-4174) Handle exception for desc column

2021-04-22 Thread SHREELEKHYA GAMPA (Jira)
SHREELEKHYA GAMPA created CARBONDATA-4174:
-

 Summary: Handle exception for desc column
 Key: CARBONDATA-4174
 URL: https://issues.apache.org/jira/browse/CARBONDATA-4174
 Project: CarbonData
  Issue Type: Bug
Reporter: SHREELEKHYA GAMPA


DESCRIBE COLUMN does not validate child column paths. It accepts a child 
column on a primitive datatype, and a non-existent child at a deeper nesting 
level of a complex datatype.



drop table if exists complexcarbontable;

create table complexcarbontable 
(deviceInformationId int,channelsId string,ROMSize string,purchasedate 
string,mobile struct,MAC array,gamePointId 
map,contractNumber double) STORED AS carbondata;

describe column deviceInformationId.x on complexcarbontable;
describe column channelsId.x on complexcarbontable;

describe column mobile.imei.x on complexcarbontable;
describe column MAC.item.x on complexcarbontable;
describe column gamePointId.key.x on complexcarbontable;

[Expected Result]: DESCRIBE COLUMN should validate the child column path, both 
for a primitive datatype and for a non-existent deeper-level child of a complex 
datatype. Command execution should fail.

[Actual Issue]: No such validation is performed, so the command executes 
successfully.
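
The missing check can be illustrated with a minimal sketch. This is not CarbonData code: the schema model, the function names, and the struct field and element types (which are not visible in the report above) are all assumptions made for illustration, as is the case-insensitive name matching. The idea is to walk the dotted path one level at a time and fail as soon as a name is not a child of the current type.

```python
class InvalidColumnPath(ValueError):
    """Raised when a dotted column path does not resolve in the schema."""


def children_of(col_type):
    """Return the named children of a type; a primitive has none."""
    if isinstance(col_type, str):  # primitive such as "int" or "string"
        return {}
    kind, body = col_type
    if kind == "struct":
        return dict(body)
    if kind == "array":
        return {"item": body}
    if kind == "map":
        key_type, value_type = body
        return {"key": key_type, "value": value_type}
    raise ValueError(f"unknown type kind: {kind}")


def resolve_column(schema, dotted_path):
    """Walk the dotted path and return the type it names, or raise."""
    current = ("struct", schema)  # treat the table as a struct of columns
    walked = []
    for part in dotted_path.split("."):
        kids = {name.lower(): t for name, t in children_of(current).items()}
        if part.lower() not in kids:
            parent = ".".join(walked) or "table"
            raise InvalidColumnPath(f"{part} is not a child of {parent}")
        current = kids[part.lower()]
        walked.append(part)
    return current


# Hypothetical schema: the struct field and the element types are assumed.
SCHEMA = {
    "deviceInformationId": "int",
    "channelsId": "string",
    "mobile": ("struct", {"imei": "string"}),
    "MAC": ("array", "string"),
    "gamePointId": ("map", ("string", "double")),
}

for path in ["deviceInformationId.x", "mobile.imei.x",
             "MAC.item.x", "gamePointId.key.x"]:
    try:
        resolve_column(SCHEMA, path)
    except InvalidColumnPath as err:
        print(f"{path}: rejected ({err})")
```

With a check of this shape, every describe statement listed in the report would be rejected instead of succeeding.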

[!https://clouddevops.huawei.com/vision-file-storage/api/file/download/upload-v2/2021/3/21/c71035/7a3b04d78ceb4a489e6c038f4bb257db/image.png!|https://clouddevops.huawei.com/vision-file-storage/api/file/download/upload-v2/2021/3/21/c71035/7a3b04d78ceb4a489e6c038f4bb257db/image.png]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (CARBONDATA-4173) Fix inverted index query issue

2021-04-22 Thread SHREELEKHYA GAMPA (Jira)
SHREELEKHYA GAMPA created CARBONDATA-4173:
-

 Summary: Fix inverted index query issue
 Key: CARBONDATA-4173
 URL: https://issues.apache.org/jira/browse/CARBONDATA-4173
 Project: CarbonData
  Issue Type: Bug
Reporter: SHREELEKHYA GAMPA


A select query that filters on a column listed in the inverted_index property 
does not return any rows.

From Spark beeline/SQL/Shell, execute the following queries:

drop table if exists uniqdata6;

CREATE TABLE uniqdata6(cust_id int,cust_name string,ACTIVE_EMUI_VERSION string, 
DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 
bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 
decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double, INTEGER_COLUMN1 
int)stored as carbondata TBLPROPERTIES ('sort_columns'='CUST_ID,CUST_NAME', 
'inverted_index'='CUST_ID,CUST_NAME','sort_scope'='global_sort');

LOAD DATA INPATH 'hdfs://hacluster/chetan/2000_UniqData.csv' into table 
uniqdata6 OPTIONS ('FILEHEADER'='CUST_ID,CUST_NAME 
,ACTIVE_EMUI_VERSION,DOB,DOJ, 
BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1, 
Double_COLUMN2,INTEGER_COLUMN1','BAD_RECORDS_ACTION'='FORCE');

select cust_name from uniqdata6 limit 5;

select * from uniqdata6 where CUST_NAME='CUST_NAME_2';

select * from uniqdata6 where CUST_NAME='CUST_NAME_3';

 

[Expected Result]: A select query that filters on a column listed in 
inverted_index should return the matching rows.

[Actual Issue]: A select query that filters on a column listed in 
inverted_index does not return any rows.

[!https://clouddevops.huawei.com/vision-file-storage/api/file/download/upload-v2/2021/3/15/c71035/05443c9a9c11457e947645f1cf0ad347/image.png!|https://clouddevops.huawei.com/vision-file-storage/api/file/download/upload-v2/2021/3/15/c71035/05443c9a9c11457e947645f1cf0ad347/image.png]





[jira] [Created] (CARBONDATA-4172) Select query having parent and child struct column in projection returns incorrect results

2021-04-22 Thread Indhumathi Muthumurugesh (Jira)
Indhumathi Muthumurugesh created CARBONDATA-4172:


 Summary: Select query having  parent and child struct column in 
projection returns incorrect results
 Key: CARBONDATA-4172
 URL: https://issues.apache.org/jira/browse/CARBONDATA-4172
 Project: CarbonData
  Issue Type: Bug
Reporter: Indhumathi Muthumurugesh


struct column: col1 struct

insert: named_struct('a',1,'b',2,'c','a')

Query : select col1,col1.a from table;

Result:

col1 col1.a

{a:1,b:null,c:null}  1





[jira] [Resolved] (CARBONDATA-4037) Improve the table status and segment file writing

2021-04-22 Thread Akash R Nilugal (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akash R Nilugal resolved CARBONDATA-4037.
-
Fix Version/s: 2.2.0
   Resolution: Fixed

> Improve the table status and segment file writing
> -
>
> Key: CARBONDATA-4037
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4037
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: SHREELEKHYA GAMPA
>Priority: Minor
> Fix For: 2.2.0
>
> Attachments: Improve table status and segment file writing_1.docx
>
>  Time Spent: 27.5h
>  Remaining Estimate: 0h
>
> Currently, we update the table status and segment files multiple times for a 
> single IUD/merge/compaction operation, and delete the index files immediately 
> after merge. When concurrent queries run, a user query may try to access 
> segment index files that are no longer present, which is an availability 
> issue.
>  * To solve this, we can make merge index file generation mandatory and fail 
> the load/compaction if merge index generation fails. If it succeeds, we 
> update the table status file and can then delete the index files 
> immediately. However, in legacy stores, when alter segment merge is called, 
> the index files are not deleted immediately after the merge index succeeds, 
> as that may cause issues for parallel queries.
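
The ordering described in the quoted text can be sketched as follows. This is only an illustration, not the actual CarbonData implementation: the function name, the callables passed in, and the `legacy_alter_merge` flag are hypothetical, and the sketch assumes the table status is still updated in the legacy case.

```python
def finish_load(merge_index, update_table_status, delete_index_files,
                legacy_alter_merge=False):
    """Run the post-load steps in the order the issue describes."""
    # Merge index generation is mandatory: a failure fails the whole operation
    # before the table status is touched.
    if not merge_index():
        raise RuntimeError("merge index failed; failing the load/compaction")
    # Only after a successful merge index is the table status updated.
    update_table_status()
    # Legacy stores keep the index files so parallel queries can still read them.
    if legacy_alter_merge:
        return "index files kept"
    delete_index_files()
    return "index files deleted"


steps = []
result = finish_load(
    merge_index=lambda: steps.append("merge") or True,
    update_table_status=lambda: steps.append("status"),
    delete_index_files=lambda: steps.append("delete"),
)
print(steps, result)
```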





[jira] [Updated] (CARBONDATA-4162) Leverage Secondary Index till segment level with SI as datamap and SI with plan rewrite

2021-04-22 Thread Nihal kumar ojha (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nihal kumar ojha updated CARBONDATA-4162:
-
Summary: Leverage Secondary Index till segment level with SI as datamap and 
SI with plan rewrite  (was: Leverage Secondary Index till segment level with 
Spark plan rewrite)

> Leverage Secondary Index till segment level with SI as datamap and SI with 
> plan rewrite
> ---
>
> Key: CARBONDATA-4162
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4162
> Project: CarbonData
>  Issue Type: New Feature
>Reporter: Nihal kumar ojha
>Priority: Major
> Attachments: Support SI at segment level.pdf
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> *Background:*
> Secondary index (SI) tables are created as indexes and managed internally by 
> CarbonData as child tables. In the existing architecture, if the parent (main) 
> table and the SI table don't have the same valid segments, we disable the SI 
> table. From the next query onwards, we scan and prune only the parent table 
> until the next load or REINDEX command is triggered (these commands bring the 
> parent and SI table segments back in sync). Because of this, queries take 
> longer to return results while the SI is disabled.
> *Proposed Solution:*
> We plan to leverage the SI down to the segment level. Instead of disabling 
> the SI table when the parent and child table segments are not in sync, we 
> will prune on the SI table for all its valid segments (segments with status 
> success, marked for update, or load partial success), and the remaining 
> segments will be pruned by the parent table.
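
A minimal sketch of the proposed segment split, under stated assumptions: the status labels, the function name, and the dictionary representation of segment metadata are hypothetical and are not taken from the CarbonData code base or the attached design document.

```python
# Assumed labels for the three statuses the proposal calls valid.
VALID_STATUSES = {"SUCCESS", "MARKED_FOR_UPDATE", "LOAD_PARTIAL_SUCCESS"}


def split_segments_for_pruning(parent_segments, si_segments):
    """Decide, per parent segment, whether the SI or the parent table prunes it.

    Both arguments map a segment id to its status. Returns two lists:
    segments pruned through the SI table and segments pruned by the parent.
    """
    via_si, via_parent = [], []
    for segment_id, status in parent_segments.items():
        if status not in VALID_STATUSES:
            continue  # invalid parent segments are not queried at all
        if si_segments.get(segment_id) in VALID_STATUSES:
            via_si.append(segment_id)
        else:
            via_parent.append(segment_id)
    return via_si, via_parent


parent = {"0": "SUCCESS", "1": "SUCCESS", "2": "SUCCESS", "3": "MARKED_FOR_DELETE"}
si = {"0": "SUCCESS", "1": "MARKED_FOR_UPDATE"}  # segment 2 not yet loaded into SI
print(split_segments_for_pruning(parent, si))
```

The point of the split is that an out-of-sync SI table still serves the segments it does cover, and only the uncovered segments fall back to parent-table pruning.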





[jira] [Updated] (CARBONDATA-4170) Support dropping of parent complex columns(array/struct/map)

2021-04-22 Thread Akshay (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akshay updated CARBONDATA-4170:
---
Description: 
Drop complex columns(array/struct/map) from carbon table. For example - 

arr1 array, struct1  struct, map1 map

Command - 

ALTER TABLE  DROP COLUMNS(arr1, struct1, map1)

Design document - 
[https://docs.google.com/document/d/1DhhkVXM8rMvOuKDZeccJpFEfO3VkA9C0c7JHCV88NXI/edit]

  was:
Drop complex columns(array/struct/map) from carbon table. For example - 

arr1 array, struct1  struct, map1 map

Command - 

ALTER TABLE  DROP COLUMNS(arr1, struct1,map1)

Design document - 
[https://docs.google.com/document/d/1DhhkVXM8rMvOuKDZeccJpFEfO3VkA9C0c7JHCV88NXI/edit]


> Support dropping of parent complex columns(array/struct/map)
> 
>
> Key: CARBONDATA-4170
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4170
> Project: CarbonData
>  Issue Type: Sub-task
>  Components: spark-integration
>Reporter: Akshay
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Drop complex columns(array/struct/map) from carbon table. For example - 
> arr1 array, struct1  struct, map1 map
> Command - 
> ALTER TABLE  DROP COLUMNS(arr1, struct1, map1)
> Design document - 
> [https://docs.google.com/document/d/1DhhkVXM8rMvOuKDZeccJpFEfO3VkA9C0c7JHCV88NXI/edit]





[jira] [Updated] (CARBONDATA-4170) Support dropping of parent complex columns(array/struct/map)

2021-04-22 Thread Akshay (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akshay updated CARBONDATA-4170:
---
Description: 
Drop complex columns(array/struct/map) from carbon table. For example - 

arr1 array, struct1  struct, map1 map

Command - 

ALTER TABLE  DROP COLUMNS(arr1, struct1,map1)

Design document - 
[https://docs.google.com/document/d/1DhhkVXM8rMvOuKDZeccJpFEfO3VkA9C0c7JHCV88NXI/edit]

  was:
Drop complex columns(only array and struct) from carbon table. For example - 

arr1 array, struct1  struct

Command - 

ALTER TABLE  DROP COLUMNS(arr1, struct1)

Design document - 
[https://docs.google.com/document/d/1DhhkVXM8rMvOuKDZeccJpFEfO3VkA9C0c7JHCV88NXI/edit]


> Support dropping of parent complex columns(array/struct/map)
> 
>
> Key: CARBONDATA-4170
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4170
> Project: CarbonData
>  Issue Type: Sub-task
>  Components: spark-integration
>Reporter: Akshay
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Drop complex columns(array/struct/map) from carbon table. For example - 
> arr1 array, struct1  struct, map1 map
> Command - 
> ALTER TABLE  DROP COLUMNS(arr1, struct1,map1)
> Design document - 
> [https://docs.google.com/document/d/1DhhkVXM8rMvOuKDZeccJpFEfO3VkA9C0c7JHCV88NXI/edit]





[jira] [Updated] (CARBONDATA-4170) Support dropping of parent complex columns(array/struct/map)

2021-04-22 Thread Akshay (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akshay updated CARBONDATA-4170:
---
Summary: Support dropping of parent complex columns(array/struct/map)  
(was: Support dropping of single & multi-level complex columns(array/struct))

> Support dropping of parent complex columns(array/struct/map)
> 
>
> Key: CARBONDATA-4170
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4170
> Project: CarbonData
>  Issue Type: Sub-task
>  Components: spark-integration
>Reporter: Akshay
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Drop complex columns(only array and struct) from carbon table. For example - 
> arr1 array, struct1  struct
> Command - 
> ALTER TABLE  DROP COLUMNS(arr1, struct1)
> Design document - 
> [https://docs.google.com/document/d/1DhhkVXM8rMvOuKDZeccJpFEfO3VkA9C0c7JHCV88NXI/edit]





[jira] [Updated] (CARBONDATA-4160) Alter carbon schema related to complex columns

2021-04-22 Thread Akshay (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akshay updated CARBONDATA-4160:
---
Summary: Alter carbon schema related to complex columns(was: Alter 
carbon schema related to complex columns(array/struct)  )

> Alter carbon schema related to complex columns  
> 
>
> Key: CARBONDATA-4160
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4160
> Project: CarbonData
>  Issue Type: New Feature
>  Components: spark-integration
>Reporter: Akshay
>Priority: Major
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Add complex columns(only array and struct) to carbon table. For example - 
> array, struct
> Command - 
> ALTER TABLE  ADD COLUMNS(arr array)
> Design document - 
> [https://docs.google.com/document/d/1DhhkVXM8rMvOuKDZeccJpFEfO3VkA9C0c7JHCV88NXI/edit]





[jira] [Resolved] (CARBONDATA-4158) Make Secondary Index as a coarse grain datamap and use secondary indexes for Presto queries

2021-04-22 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor resolved CARBONDATA-4158.
--
Fix Version/s: 2.2.0
   Resolution: Fixed

> Make Secondary Index as a coarse grain datamap and use secondary indexes for 
> Presto queries
> ---
>
> Key: CARBONDATA-4158
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4158
> Project: CarbonData
>  Issue Type: New Feature
>Reporter: Venugopal Reddy K
>Priority: Minor
> Fix For: 2.2.0
>
>  Time Spent: 13h 10m
>  Remaining Estimate: 0h
>
> *Background:*
> Secondary Indexes are created as carbon tables and are managed as child 
> tables of the main table. These indexes are leveraged for query pruning 
> via Spark plan modification during the optimizer/execution phases of query 
> execution. For queries from engines other than Spark, such as Presto, it is 
> not feasible to modify the engine-specific query execution plans as the 
> current approach requires. This makes Secondary Indexes unusable for Presto 
> query pruning, so an engine-agnostic approach is needed.
> *Description:*
> Current Secondary Index pruning is tightly coupled with Spark because the 
> query plan modification is specific to the Spark engine, and it is hard to 
> reuse for Presto queries. A new solution is needed to use secondary indexes 
> with Presto queries, and it should not affect existing users of secondary 
> indexes with Spark.





[jira] [Updated] (CARBONDATA-4171) Transaction Manager, time travel and segment interface refactoring

2021-04-22 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4171:
-
Attachment: Transaction manager, time travel, segment interface_v1.pdf

> Transaction Manager, time travel and segment interface refactoring
> --
>
> Key: CARBONDATA-4171
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4171
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Ajantha Bhat
>Priority: Major
> Attachments: Transaction manager, time travel, segment 
> interface_v1.pdf
>
>
> *Goals:*
> *1) Implement a “Transaction Manager” with optimistic concurrency to provide 
> transactions / versioning within a table.* (interfaces should also be flexible 
> enough to support transactions across tables)
> *2) Support time travel in CarbonData.*
> *3) Decouple and clean up segment interfaces.* (which should also help in 
> supporting segment concepts for other open formats under the carbondata 
> metadata service)
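
To make goals 1 and 2 concrete, here is a small sketch of optimistic concurrency with versioned snapshots. It is an illustration only and does not reflect the design in the attached PDF: the class, its methods, and the representation of a table version as a tuple of segment ids are all assumptions.

```python
import threading


class ConflictError(RuntimeError):
    """Raised when a commit was prepared against a stale table version."""


class VersionedTable:
    """Keeps every committed version so older snapshots stay readable."""

    def __init__(self):
        self._lock = threading.Lock()
        self._versions = [()]  # version 0 is the empty table

    @property
    def latest_version(self):
        return len(self._versions) - 1

    def snapshot(self, version=None):
        """Return the segment ids visible at a version (time travel read)."""
        with self._lock:
            if version is None:
                version = len(self._versions) - 1
            if not 0 <= version < len(self._versions):
                raise IndexError(f"no such version: {version}")
            return self._versions[version]

    def commit(self, base_version, new_segments):
        """Optimistic commit: succeeds only if nobody committed since base."""
        with self._lock:
            if base_version != len(self._versions) - 1:
                raise ConflictError(
                    f"based on version {base_version}, "
                    f"latest is {len(self._versions) - 1}")
            self._versions.append(self._versions[-1] + tuple(new_segments))
            return len(self._versions) - 1


table = VersionedTable()
base = table.latest_version            # both writers read version 0
table.commit(base, ["segment_0"])      # first writer wins
try:
    table.commit(base, ["segment_1"])  # second writer is stale and must retry
except ConflictError as err:
    print("conflict:", err)
print(table.snapshot(0), table.snapshot())
```

No lock is held while a writer prepares its data; the version check happens only at commit time, which is what makes the scheme optimistic. A rejected writer would re-read the latest version and retry.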





[jira] [Closed] (CARBONDATA-2827) Refactor Segment Status Manager Interface

2021-04-22 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat closed CARBONDATA-2827.

Resolution: Duplicate

> Refactor Segment Status Manager Interface
> -
>
> Key: CARBONDATA-2827
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2827
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Ravindra Pesala
>Priority: Major
> Attachments: Segment Management interface design_V3.pdf, Segment 
> Status Management interface design_V1.docx, Segment Status Management 
> interface design_V1_Ramana_reviewed.docx, Segment Status Management interface 
> design_V2.pdf
>
>
> Carbon uses the tablestatus file to record the status and details of each 
> segment during each load. This tablestatus enables carbon to support 
> concurrent loads and reads without data inconsistency or corruption.
> It is therefore a very important feature of carbondata, and we should have 
> clean interfaces to maintain it. Currently, tablestatus updates are scattered 
> across multiple places and there is no clean interface, so I am proposing to 
> refactor the current SegmentStatusManager interface and bring all tablestatus 
> operations into a single interface.
> This new interface allows the table status to be stored in any other storage, 
> such as a DB. This is needed for S3-type object stores, as these are 
> eventually consistent.
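
The idea of a single interface with pluggable storage can be sketched like this. None of these names come from CarbonData: `TableStatusStore`, `InMemoryStatusStore`, and `TableStatusService` are hypothetical, and the status strings are placeholders.

```python
from abc import ABC, abstractmethod


class TableStatusStore(ABC):
    """Storage backend for table status (a file, a DB, and so on)."""

    @abstractmethod
    def load(self, table_name):
        """Return a dict mapping segment id to status."""

    @abstractmethod
    def save(self, table_name, statuses):
        """Persist the full segment-status mapping for a table."""


class InMemoryStatusStore(TableStatusStore):
    """Stand-in backend used here only to make the sketch runnable."""

    def __init__(self):
        self._data = {}

    def load(self, table_name):
        return dict(self._data.get(table_name, {}))

    def save(self, table_name, statuses):
        self._data[table_name] = dict(statuses)


class TableStatusService:
    """The one place through which every table status operation goes."""

    def __init__(self, store):
        self._store = store

    def set_status(self, table_name, segment_id, status):
        statuses = self._store.load(table_name)
        statuses[segment_id] = status
        self._store.save(table_name, statuses)

    def segments_with(self, table_name, status):
        statuses = self._store.load(table_name)
        return sorted(seg for seg, s in statuses.items() if s == status)


service = TableStatusService(InMemoryStatusStore())
service.set_status("t1", "0", "SUCCESS")
service.set_status("t1", "1", "IN_PROGRESS")
service.set_status("t1", "1", "SUCCESS")
print(service.segments_with("t1", "SUCCESS"))
```

Swapping the backend, for example for a database-backed store on an eventually consistent object store, would then only require another `TableStatusStore` implementation.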





[jira] [Commented] (CARBONDATA-2827) Refactor Segment Status Manager Interface

2021-04-22 Thread Ajantha Bhat (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-2827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17327125#comment-17327125
 ] 

Ajantha Bhat commented on CARBONDATA-2827:
--

will be handled as part of *CARBONDATA-4171*

 

*https://issues.apache.org/jira/browse/CARBONDATA-4171*

> Refactor Segment Status Manager Interface
> -
>
> Key: CARBONDATA-2827
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2827
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Ravindra Pesala
>Priority: Major
> Attachments: Segment Management interface design_V3.pdf, Segment 
> Status Management interface design_V1.docx, Segment Status Management 
> interface design_V1_Ramana_reviewed.docx, Segment Status Management interface 
> design_V2.pdf
>
>
> Carbon uses the tablestatus file to record the status and details of each 
> segment during each load. This tablestatus enables carbon to support 
> concurrent loads and reads without data inconsistency or corruption.
> It is therefore a very important feature of carbondata, and we should have 
> clean interfaces to maintain it. Currently, tablestatus updates are scattered 
> across multiple places and there is no clean interface, so I am proposing to 
> refactor the current SegmentStatusManager interface and bring all tablestatus 
> operations into a single interface.
> This new interface allows the table status to be stored in any other storage, 
> such as a DB. This is needed for S3-type object stores, as these are 
> eventually consistent.





[jira] [Updated] (CARBONDATA-4171) Transaction Manager, time travel and segment interface refactoring

2021-04-22 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4171:
-
Description: 
*Goals:*

*1) Implement a “Transaction Manager” with optimistic concurrency to provide 
transactions / versioning within a table.* (interfaces should also be flexible 
enough to support transactions across tables)

*2) Support time travel in CarbonData.*

*3) Decouple and clean up segment interfaces.* (which should also help in 
supporting segment concepts for other open formats under the carbondata 
metadata service)

  was:
*Goals:*

*1) Implement a “Transaction Manager” with optimistic concurrency to provide 
within a table transaction / versioning.* (interfaces should also be flexible 
enough to support across table transactions)

*2) Support time travel in carbonData.*

***3) Decouple and clean up segment interfaces.* (which should also help in 
supporting segment concepts to other open format under carbondata metadata 
service)


> Transaction Manager, time travel and segment interface refactoring
> --
>
> Key: CARBONDATA-4171
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4171
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Ajantha Bhat
>Priority: Major
>
> *Goals:*
> *1) Implement a “Transaction Manager” with optimistic concurrency to provide 
> transactions / versioning within a table.* (interfaces should also be flexible 
> enough to support transactions across tables)
> *2) Support time travel in CarbonData.*
> *3) Decouple and clean up segment interfaces.* (which should also help in 
> supporting segment concepts for other open formats under the carbondata 
> metadata service)





[jira] [Created] (CARBONDATA-4171) Transaction Manager, time travel and segment interface refactoring

2021-04-22 Thread Ajantha Bhat (Jira)
Ajantha Bhat created CARBONDATA-4171:


 Summary: Transaction Manager, time travel and segment interface 
refactoring
 Key: CARBONDATA-4171
 URL: https://issues.apache.org/jira/browse/CARBONDATA-4171
 Project: CarbonData
  Issue Type: Improvement
Reporter: Ajantha Bhat


*Goals:*

*1) Implement a “Transaction Manager” with optimistic concurrency to provide 
transactions / versioning within a table.* (interfaces should also be flexible 
enough to support transactions across tables)

*2) Support time travel in CarbonData.*

*3) Decouple and clean up segment interfaces.* (which should also help in 
supporting segment concepts for other open formats under the carbondata 
metadata service)


