[jira] [Updated] (CARBONDATA-681) CSVReader related code improvement

2017-01-25 Thread Jihong MA (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jihong MA updated CARBONDATA-681:
-
Description: Refactoring CSV reader support during data loading, and moving the 
relevant classes out of the Carbon Hadoop component into the data loading 
component (processing)  (was: refactoring csv reader support during data 
loading as well as misc changes.)

> CSVReader related code improvement
> --
>
> Key: CARBONDATA-681
> URL: https://issues.apache.org/jira/browse/CARBONDATA-681
> Project: CarbonData
>  Issue Type: Sub-task
>  Components: hadoop-integration
>Reporter: Jihong MA
>Assignee: Jihong MA
>
> Refactoring CSV reader support during data loading, and moving the relevant 
> classes out of the Carbon Hadoop component into the data loading component 
> (processing)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CARBONDATA-681) CSVReader related code improvement

2017-01-25 Thread Jihong MA (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jihong MA updated CARBONDATA-681:
-
Issue Type: Sub-task  (was: Improvement)
Parent: CARBONDATA-548

> CSVReader related code improvement
> --
>
> Key: CARBONDATA-681
> URL: https://issues.apache.org/jira/browse/CARBONDATA-681
> Project: CarbonData
>  Issue Type: Sub-task
>  Components: hadoop-integration
>Reporter: Jihong MA
>Assignee: Jihong MA
>
> Refactoring CSV reader support during data loading, as well as miscellaneous changes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CARBONDATA-681) CSVReader related code improvement

2017-01-25 Thread Jihong MA (JIRA)
Jihong MA created CARBONDATA-681:


 Summary: CSVReader related code improvement
 Key: CARBONDATA-681
 URL: https://issues.apache.org/jira/browse/CARBONDATA-681
 Project: CarbonData
  Issue Type: Improvement
  Components: hadoop-integration
Reporter: Jihong MA
Assignee: Jihong MA


Refactoring CSV reader support during data loading, as well as miscellaneous changes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CARBONDATA-661) misc cleanup in carbon core

2017-01-18 Thread Jihong MA (JIRA)
Jihong MA created CARBONDATA-661:


 Summary: misc cleanup in carbon core
 Key: CARBONDATA-661
 URL: https://issues.apache.org/jira/browse/CARBONDATA-661
 Project: CarbonData
  Issue Type: Improvement
  Components: core
Reporter: Jihong MA
Priority: Minor


Clean up unexercised code, fields, and functions, along with minor code improvements.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CARBONDATA-607) Cleanup ValueCompressionHolder class and all sub-classes

2017-01-07 Thread Jihong MA (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jihong MA updated CARBONDATA-607:
-
Description: 
Rewrite the ValueCompressionHolder class as a base class for compressing and 
uncompressing numeric data in measure column chunks. 

Refactor all sub-classes under
org.apache.carbondata.core.datastorage.store.compression.decimal.*
org.apache.carbondata.core.datastorage.store.compression.nonDecimal.*
org.apache.carbondata.core.datastorage.store.compression.none.*
org.apache.carbondata.core.datastorage.store.compression.type.*

As part of this work, also fix a performance bug by avoiding the creation of 
unnecessary compression/uncompression value holders during compression or 
decompression. 

  was:
Rewrite ValueCompressionHolder class as a base class for compressing or 
uncompressing numeric data for measurement column chunk. 

refactored all sub-classes under
org.apache.carbondata.core.datastorage.store.compression.decimal.*
org.apache.carbondata.core.datastorage.store.compression.nonDecimal.*
org.apache.carbondata.core.datastorage.store.compression.none.*
org.apache.carbondata.core.datastorage.store.compression.type.*

as part of the work, also fixed a performance bug to avoid creating unnecessary 
compression/uncompression value holder during compression or decompression. 


> Cleanup ValueCompressionHolder class and all sub-classes
> 
>
> Key: CARBONDATA-607
> URL: https://issues.apache.org/jira/browse/CARBONDATA-607
> Project: CarbonData
>  Issue Type: Sub-task
>  Components: core
>Reporter: Jihong MA
>Assignee: Jihong MA
>
> Rewrite the ValueCompressionHolder class as a base class for compressing and 
> uncompressing numeric data in measure column chunks. 
> Refactor all sub-classes under
> org.apache.carbondata.core.datastorage.store.compression.decimal.*
> org.apache.carbondata.core.datastorage.store.compression.nonDecimal.*
> org.apache.carbondata.core.datastorage.store.compression.none.*
> org.apache.carbondata.core.datastorage.store.compression.type.*
> As part of this work, also fix a performance bug by avoiding the creation of 
> unnecessary compression/uncompression value holders during compression or 
> decompression. 
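
As an illustration of the intended shape, a minimal hedged sketch follows. Only 
the ValueCompressionHolder name and the package list come from this issue; the 
Compressor interface and method names below are hypothetical stand-ins, not 
Carbon's actual API:

    // Hypothetical sketch: the base class owns the compress/uncompress
    // flow, so subclasses (decimal, nonDecimal, none, type) only supply
    // the typed value handling.
    public abstract class ValueCompressionHolderSketch<T> {

        /** Minimal compressor abstraction; Carbon's real interface differs. */
        public interface Compressor {
            byte[] compress(byte[] input);
            byte[] uncompress(byte[] input);
        }

        // Held by the base class and reused, so subclasses do not
        // allocate extra holder objects per column chunk.
        protected byte[] compressedValue;

        public void compressValue(Compressor c, byte[] rawBytes) {
            this.compressedValue = c.compress(rawBytes);
        }

        public byte[] uncompressValue(Compressor c) {
            return c.uncompress(compressedValue);
        }

        /** Subclasses expose the typed view of the measure data. */
        public abstract T getValue();
    }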



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CARBONDATA-607) Cleanup ValueCompressionHolder class and all sub-classes

2017-01-07 Thread Jihong MA (JIRA)
Jihong MA created CARBONDATA-607:


 Summary: Cleanup ValueCompressionHolder class and all sub-classes
 Key: CARBONDATA-607
 URL: https://issues.apache.org/jira/browse/CARBONDATA-607
 Project: CarbonData
  Issue Type: Improvement
  Components: core
Reporter: Jihong MA
Assignee: Jihong MA


Rewrite the ValueCompressionHolder class as a base class for compressing and 
uncompressing numeric data in measure column chunks. 

Refactored all sub-classes under
org.apache.carbondata.core.datastorage.store.compression.decimal.*
org.apache.carbondata.core.datastorage.store.compression.nonDecimal.*
org.apache.carbondata.core.datastorage.store.compression.none.*
org.apache.carbondata.core.datastorage.store.compression.type.*

As part of this work, also fixed a performance bug by avoiding the creation of 
unnecessary compression/uncompression value holders during compression or 
decompression. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CARBONDATA-607) Cleanup ValueCompressionHolder class and all sub-classes

2017-01-07 Thread Jihong MA (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jihong MA updated CARBONDATA-607:
-
Issue Type: Sub-task  (was: Improvement)
Parent: CARBONDATA-548

> Cleanup ValueCompressionHolder class and all sub-classes
> 
>
> Key: CARBONDATA-607
> URL: https://issues.apache.org/jira/browse/CARBONDATA-607
> Project: CarbonData
>  Issue Type: Sub-task
>  Components: core
>Reporter: Jihong MA
>Assignee: Jihong MA
>
> Rewrite the ValueCompressionHolder class as a base class for compressing and 
> uncompressing numeric data in measure column chunks. 
> Refactored all sub-classes under
> org.apache.carbondata.core.datastorage.store.compression.decimal.*
> org.apache.carbondata.core.datastorage.store.compression.nonDecimal.*
> org.apache.carbondata.core.datastorage.store.compression.none.*
> org.apache.carbondata.core.datastorage.store.compression.type.*
> As part of this work, also fixed a performance bug by avoiding the creation of 
> unnecessary compression/uncompression value holders during compression or 
> decompression. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CARBONDATA-588) cleanup WriterCompressModel

2017-01-03 Thread Jihong MA (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jihong MA updated CARBONDATA-588:
-
Issue Type: Sub-task  (was: Improvement)
Parent: CARBONDATA-548

> cleanup WriterCompressModel
> ---
>
> Key: CARBONDATA-588
> URL: https://issues.apache.org/jira/browse/CARBONDATA-588
> Project: CarbonData
>  Issue Type: Sub-task
>  Components: core
>Reporter: Jihong MA
>Assignee: Jihong MA
>Priority: Minor
> Fix For: 1.0.0-incubating
>
>
> A separate compression type field is unnecessary and error-prone, as the 
> information is already captured in the compressionFinder abstraction. 
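
To make the point concrete, here is a hedged sketch of a model that derives 
the compression type from the finder instead of storing a duplicate field that 
can drift out of sync. Apart from the compressionFinder role named in this 
issue, all names are illustrative:

    public final class WriterCompressModelSketch {

        public enum CompressionType { NONE, DELTA, BIGINT, MAX_MIN }

        /** Single source of truth for how a measure column is compressed. */
        public interface CompressionFinder {
            CompressionType getCompressionType();
        }

        private final CompressionFinder finder;

        public WriterCompressModelSketch(CompressionFinder finder) {
            this.finder = finder;
        }

        // Derived on demand: no separate, possibly stale, type field.
        public CompressionType getCompressionType() {
            return finder.getCompressionType();
        }
    }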



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CARBONDATA-588) cleanup WriterCompressModel

2017-01-03 Thread Jihong MA (JIRA)
Jihong MA created CARBONDATA-588:


 Summary: cleanup WriterCompressModel
 Key: CARBONDATA-588
 URL: https://issues.apache.org/jira/browse/CARBONDATA-588
 Project: CarbonData
  Issue Type: Improvement
  Components: core
Reporter: Jihong MA
Assignee: Jihong MA
Priority: Minor
 Fix For: 1.0.0-incubating


A separate compression type field is unnecessary and error-prone, as the 
information is already captured in the compressionFinder abstraction. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CARBONDATA-550) Add unit test cases for Bigint, Big decimal value compression

2016-12-21 Thread Jihong MA (JIRA)
Jihong MA created CARBONDATA-550:


 Summary: Add unit test cases for Bigint, Big decimal value 
compression
 Key: CARBONDATA-550
 URL: https://issues.apache.org/jira/browse/CARBONDATA-550
 Project: CarbonData
  Issue Type: Improvement
  Components: core
Reporter: Jihong MA
Assignee: Ashok Kumar


Please add unit test cases for BigInt and BigDecimal value compression after 
merging the PR. The old code path is no longer exercised after the code change, 
and neither are the corresponding test cases in ValueCompressionUtilTest.java.  
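
For illustration, a hedged sketch of the shape such a roundtrip test could 
take. The Codec below is a hypothetical stand-in so the sketch is 
self-contained; the real tests would exercise ValueCompressionUtil's actual API 
instead:

    import static org.junit.Assert.assertArrayEquals;

    import java.nio.ByteBuffer;
    import org.junit.Test;

    public class BigIntRoundTripTest {

        interface Codec {
            byte[] compress(long[] values);
            long[] uncompress(byte[] data, int count);
        }

        /** Trivial stand-in codec: plain 8-byte encoding, no real compression. */
        static final class PlainCodec implements Codec {
            public byte[] compress(long[] values) {
                ByteBuffer buf = ByteBuffer.allocate(values.length * 8);
                for (long v : values) buf.putLong(v);
                return buf.array();
            }
            public long[] uncompress(byte[] data, int count) {
                ByteBuffer buf = ByteBuffer.wrap(data);
                long[] out = new long[count];
                for (int i = 0; i < count; i++) out[i] = buf.getLong();
                return out;
            }
        }

        @Test
        public void bigintRoundTripPreservesValues() {
            long[] input = {Long.MIN_VALUE, -1L, 0L, 42L, Long.MAX_VALUE};
            Codec codec = new PlainCodec();
            assertArrayEquals(input,
                codec.uncompress(codec.compress(input), input.length));
        }
    }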



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CARBONDATA-549) code improvement for bigint compression

2016-12-21 Thread Jihong MA (JIRA)
Jihong MA created CARBONDATA-549:


 Summary: code improvement for bigint compression
 Key: CARBONDATA-549
 URL: https://issues.apache.org/jira/browse/CARBONDATA-549
 Project: CarbonData
  Issue Type: Sub-task
  Components: core
Reporter: Jihong MA
Assignee: Jihong MA


Code cleanup for BigInt/decimal compression.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CARBONDATA-548) Miscellaneous code improvements

2016-12-21 Thread Jihong MA (JIRA)
Jihong MA created CARBONDATA-548:


 Summary: Miscellaneous code improvements
 Key: CARBONDATA-548
 URL: https://issues.apache.org/jira/browse/CARBONDATA-548
 Project: CarbonData
  Issue Type: Improvement
  Components: core
Reporter: Jihong MA
Assignee: Jihong MA


miscellaneous code improvements.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CARBONDATA-516) [SPARK2]update union class in CarbonLateDecoderRule for Spark 2.x integration

2016-12-15 Thread Jihong MA (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jihong MA updated CARBONDATA-516:
-
Summary: [SPARK2]update union class in CarbonLateDecoderRule for Spark 2.x 
integration  (was: [SPARK2]update union issue in CarbonLateDecoderRule for 
Spark 2.x integration)

> [SPARK2]update union class in CarbonLateDecoderRule for Spark 2.x integration
> -
>
> Key: CARBONDATA-516
> URL: https://issues.apache.org/jira/browse/CARBONDATA-516
> Project: CarbonData
>  Issue Type: New Feature
>Reporter: QiangCai
>Assignee: QiangCai
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> In Spark 2.x, the Union class is no longer a subclass of BinaryNode. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CARBONDATA-464) Frequent GC incurs when Carbon's blocklet size is enlarged from the default

2016-12-15 Thread Jihong MA (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jihong MA updated CARBONDATA-464:
-
Description: 
Other columnar file formats fetch 1 million rows (a row group) at a time; the 
data is divided into column chunks in columnar format, and each column chunk 
consists of many pages. A page (default size 1 MB) can be independently 
uncompressed and processed.
In current Carbon, since we use a larger blocklet, more processing memory is 
required because all projected column chunks within a blocklet are decompressed 
at once, which consumes a large amount of memory in total. We should consider 
an alternative approach to balance I/O and processing, in order to reduce GC 
pressure.

  was:
parquet might fetch from i/o 1 million(a row group) at one time, its data is 
divided into column chunks in columnar format, and each column trunk consists 
of many pages, the page(default size 1 MB) can be independently uncompressed 
and processed.
In case of current carbon since we use larger blocklet, it requires larger 
processing memory as well, as it decompresses all projected column chunks 
within a blocklet, which consumes big amount of memory. Maybe we should 
consider to come up with similar approach to balance I/O and processing, but 
such a change requires carbon format level changes.

Summary: Frequent GC incurs when Carbon's blocklet size is enlarged 
from the default  (was: Big GC occurs frequently when Carbon's blocklet size is 
enlarged from the default)

> Frequent GC incurs when Carbon's blocklet size is enlarged from the default
> ---
>
> Key: CARBONDATA-464
> URL: https://issues.apache.org/jira/browse/CARBONDATA-464
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: suo tong
>
> Other columnar file formats fetch 1 million rows (a row group) at a time; the 
> data is divided into column chunks in columnar format, and each column chunk 
> consists of many pages. A page (default size 1 MB) can be independently 
> uncompressed and processed.
> In current Carbon, since we use a larger blocklet, more processing memory is 
> required because all projected column chunks within a blocklet are 
> decompressed at once, which consumes a large amount of memory in total. We 
> should consider an alternative approach to balance I/O and processing, in 
> order to reduce GC pressure.
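
A hedged contrast of the two memory patterns described above; the page handling 
here is purely illustrative, not Carbon's actual reader code. Decoding one page 
at a time bounds live memory to roughly one page per projected column instead 
of the whole blocklet:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.function.Consumer;

    public final class BlockletScanSketch {

        // Stand-in decompressor: identity copy in place of e.g. Snappy.
        private static byte[] decompress(byte[] page) {
            return page.clone();
        }

        /** Current pattern: all projected column pages resident at once. */
        public static void scanAllAtOnce(List<byte[]> pages, Consumer<byte[]> process) {
            List<byte[]> resident = new ArrayList<>();
            for (byte[] p : pages) resident.add(decompress(p));
            for (byte[] p : resident) process.accept(p);
        }

        /** Alternative pattern: one ~1 MB page live at a time, easier on GC. */
        public static void scanPageWise(List<byte[]> pages, Consumer<byte[]> process) {
            for (byte[] p : pages) process.accept(decompress(p));
        }
    }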



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CARBONDATA-436) Make the blocklet size configuration respect the actual size (in bytes) of the blocklet

2016-12-15 Thread Jihong MA (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jihong MA updated CARBONDATA-436:
-
Description: Currently, the blocklet size is based on the row count within 
the blocklet. The default value (12) is small for HDFS I/O. Increasing the 
value may cause too many young-generation GCs when scanning many columns; 
instead, we can extend the configuration to respect the actual size (in bytes) 
of the blocklet.  (was: Currently, the blocklet size is the row counts in the 
blocklet. The default value(12) is small for hdfs io. If we increase the 
value, which may cause too many Young-GC when we scan many columns. Like 
parquet, its row group size can be configed, and using hdfs block size as its 
default value.)
Summary: Make the blocklet size configuration respect the actual size 
(in bytes) of the blocklet  (was: change blocklet size related to the 
raw size of data )

> Make the blocklet size configuration respect the actual size (in bytes) of 
> the blocklet
> --
>
> Key: CARBONDATA-436
> URL: https://issues.apache.org/jira/browse/CARBONDATA-436
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: suo tong
>Assignee: QiangCai
>
> Currently, the blocklet size is based on the row count within the blocklet. 
> The default value (12) is small for HDFS I/O. Increasing the value may cause 
> too many young-generation GCs when scanning many columns; instead, we can 
> extend the configuration to respect the actual size (in bytes) of the blocklet.
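
A hedged sketch of the proposed size-based cut: close a blocklet when the 
accumulated bytes reach a configured limit instead of counting rows. The names 
and the 64 MB figure are illustrative assumptions, not values from this issue:

    public final class BlockletCutSketch {

        // Assumed example limit; the real default would be configurable.
        private static final long BLOCKLET_SIZE_BYTES = 64L * 1024 * 1024;

        private long bytesInCurrentBlocklet;

        /** Returns true when the current blocklet should be flushed. */
        public boolean addRow(byte[] encodedRow) {
            bytesInCurrentBlocklet += encodedRow.length;
            if (bytesInCurrentBlocklet >= BLOCKLET_SIZE_BYTES) {
                bytesInCurrentBlocklet = 0;  // start a new blocklet
                return true;
            }
            return false;
        }
    }

Cutting on bytes keeps blocklets at an I/O-friendly size regardless of how wide 
the rows are, which is the mismatch the row-count configuration suffers from.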



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CARBONDATA-432) Feed Carbon task's input size to Spark

2016-12-15 Thread Jihong MA (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jihong MA updated CARBONDATA-432:
-
Description: Currently, the input size of a task/stage is not displayed 
properly in the Spark web UI  (was: Currently, the input size of task/stage 
does not display in the spark job UI)
Summary: Feed Carbon task's input size to Spark  (was: Adaptation  
task‘s input size to spark job UI)

> Feed Carbon task's input size to Spark
> --
>
> Key: CARBONDATA-432
> URL: https://issues.apache.org/jira/browse/CARBONDATA-432
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: suo tong
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently, the input size of a task/stage is not displayed properly in the 
> Spark web UI
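
For illustration only, one way a task can measure the bytes it reads so the 
number can be fed to Spark's UI is to diff Hadoop's FileSystem read statistics 
around the split read. This sketches the general idea, not the mechanism the 
actual fix uses:

    import org.apache.hadoop.fs.FileSystem;

    public final class InputBytesSketch {

        private static long totalBytesRead() {
            long total = 0;
            for (FileSystem.Statistics s : FileSystem.getAllStatistics()) {
                total += s.getBytesRead();
            }
            return total;
        }

        /** Runs the split read and returns the bytes it consumed. */
        public static long measure(Runnable readSplit) {
            long before = totalBytesRead();
            readSplit.run();                    // read the carbon split
            return totalBytesRead() - before;   // value to report to the UI
        }
    }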



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CARBONDATA-442) Query result mismatching with Hive

2016-12-15 Thread Jihong MA (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jihong MA updated CARBONDATA-442:
-
Summary: Query result mismatching with Hive  (was: SELECT querry result 
mismatched with hive result)

> Query result mismatching with Hive
> --
>
> Key: CARBONDATA-442
> URL: https://issues.apache.org/jira/browse/CARBONDATA-442
> Project: CarbonData
>  Issue Type: Bug
>Reporter: SWATI RAO
>
> => I created the table using the following command: 
> create table Carbon_automation_test5 (imei string,deviceInformationId int,MAC 
> string,deviceColor string,device_backColor string,modelId string,marketName 
> string,AMSize string,ROMSize string,CUPAudit string,CPIClocked string,series 
> string,productionDate string,bomCode string,internalModels string, 
> deliveryTime string, channelsId string,channelsName string , deliveryAreaId 
> string, deliveryCountry string, deliveryProvince string, deliveryCity 
> string,deliveryDistrict string, deliveryStreet string,oxSingleNumber string, 
> ActiveCheckTime string, ActiveAreaId string, ActiveCountry string, 
> ActiveProvince string, Activecity string, ActiveDistrict string, ActiveStreet 
> string, ActiveOperatorId string, Active_releaseId string, Active_EMUIVersion 
> string,Active_operaSysVersion string, Active_BacVerNumber string, 
> Active_BacFlashVer string,Active_webUIVersion string, Active_webUITypeCarrVer 
> string,Active_webTypeDataVerNumber string, Active_operatorsVersion string, 
> Active_phonePADPartitionedVersions string,Latest_YEAR int, Latest_MONTH int, 
> Latest_DAY int, Latest_HOUR string, Latest_areaId string, Latest_country 
> string, Latest_province string, Latest_city string,Latest_district string, 
> Latest_street string, Latest_releaseId string,Latest_EMUIVersion string, 
> Latest_operaSysVersion string, Latest_BacVerNumber string,Latest_BacFlashVer 
> string, Latest_webUIVersion string, Latest_webUITypeCarrVer 
> string,Latest_webTypeDataVerNumber string, Latest_operatorsVersion 
> string,Latest_phonePADPartitionedVersions string, Latest_operatorId 
> string,gamePointDescription string, gamePointId int,contractNumber int) 
> stored by 'org.apache.carbondata.format' 
> => Load the CSV into the table: 
> LOAD DATA INPATH 'hdfs://localhost:54310/user/hduser/100_olap.csv' INTO table 
> Carbon_automation_test5 OPTIONS('DELIMITER'= ',' ,'QUOTECHAR'= '"', 
> 'FILEHEADER'= 
> 'imei,deviceInformationId,MAC,deviceColor,device_backColor,modelId,marketName,AMSize,ROMSize,CUPAudit,CPIClocked,series,productionDate,bomCode,internalModels,deliveryTime,channelsId,channelsName,deliveryAreaId,deliveryCountry,deliveryProvince,deliveryCity,deliveryDistrict,deliveryStreet,oxSingleNumber,contractNumber,ActiveCheckTime,ActiveAreaId,ActiveCountry,ActiveProvince,Activecity,ActiveDistrict,ActiveStreet,ActiveOperatorId,Active_releaseId,Active_EMUIVersion,Active_operaSysVersion,Active_BacVerNumber,Active_BacFlashVer,Active_webUIVersion,Active_webUITypeCarrVer,Active_webTypeDataVerNumber,Active_operatorsVersion,Active_phonePADPartitionedVersions,Latest_YEAR,Latest_MONTH,Latest_DAY,Latest_HOUR,Latest_areaId,Latest_country,Latest_province,Latest_city,Latest_district,Latest_street,Latest_releaseId,Latest_EMUIVersion,Latest_operaSysVersion,Latest_BacVerNumber,Latest_BacFlashVer,Latest_webUIVersion,Latest_webUITypeCarrVer,Latest_webTypeDataVerNumber,Latest_operatorsVersion,Latest_phonePADPartitionedVersions,Latest_operatorId,gamePointId,gamePointDescription')
> => Now execute the SELECT query: 
> SELECT Carbon_automation_test5.AMSize AS AMSize, 
> Carbon_automation_test5.ActiveCountry AS ActiveCountry, 
> Carbon_automation_test5.Activecity AS Activecity , 
> SUM(Carbon_automation_test5.gamePointId) AS Sum_gamePointId FROM ( SELECT 
> AMSize,ActiveCountry,gamePointId, Activecity FROM (select * from 
> Carbon_automation_test5) SUB_QRY ) Carbon_automation_test5 INNER JOIN ( 
> SELECT ActiveCountry, Activecity, AMSize FROM (select * from 
> Carbon_automation_test5) SUB_QRY ) Carbon_automation_vmall_test1 ON 
> Carbon_automation_test5.AMSize = Carbon_automation_vmall_test1.AMSize WHERE 
> NOT(Carbon_automation_test5.AMSize <= '3RAM size') GROUP BY 
> Carbon_automation_test5.AMSize, Carbon_automation_test5.ActiveCountry, 
> Carbon_automation_test5.Activecity ORDER BY Carbon_automation_test5.AMSize 
> ASC, Carbon_automation_test5.ActiveCountry ASC, 
> Carbon_automation_test5.Activecity ASC;
> +------------+----------------+-------------+------------------+
> |   AMSize   | ActiveCountry  | Activecity  | Sum_gamePointId  |
> +------------+----------------+-------------+------------------+
> | 4RAM size  | Chinese| changsha| 200860   |
> | 4RAM size  | Chinese| guangzhou   | 38016|
> | 4RAM size  | Chinese| shenzhen| 49610|
> | 4RAM size  | 

[jira] [Updated] (CARBONDATA-478) Separate SparkRowReadSupportImpl implementation for integrating with Spark1.x vs. Spark 2.x

2016-12-15 Thread Jihong MA (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jihong MA updated CARBONDATA-478:
-
Issue Type: New Feature  (was: Bug)
   Summary: Separate SparkRowReadSupportImpl implementation for integrating 
with Spark1.x vs. Spark 2.x  (was: Spark2 module should have different 
SparkRowReadSupportImpl with spark1)

> Separate SparkRowReadSupportImpl implementation for integrating with Spark1.x 
> vs. Spark 2.x
> ---
>
> Key: CARBONDATA-478
> URL: https://issues.apache.org/jira/browse/CARBONDATA-478
> Project: CarbonData
>  Issue Type: New Feature
>  Components: data-query
>Affects Versions: 0.2.0-incubating
>Reporter: QiangCai
>Assignee: QiangCai
> Fix For: 1.0.0-incubating
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CARBONDATA-484) Implement LRU cache for B-Tree

2016-12-15 Thread Jihong MA (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jihong MA updated CARBONDATA-484:
-
Description: 
An LRU cache for the B-Tree is proposed to avoid running out of memory when a 
large number of tables exist and not all are frequently used.

Problem:

CarbonData maintains two levels of B-Tree cache, one at the driver level and 
another at the executor level. Currently CarbonData has a mechanism to 
invalidate the segment and block caches for invalid table segments, but there 
is no eviction policy for unused cached objects. So at the instant complete 
memory is utilized, the system will not be able to process any new requests.

Solution:

Among the caches maintained at the driver and executor levels there must be 
objects currently not in use. Therefore the system should have the following 
mechanism:

1.   Set the maximum memory limit up to which objects can be held in memory.

2.   When the configured memory limit is reached, identify cached objects 
currently not in use so that the required memory can be freed without 
impacting existing processing.

3.   Eviction should be done only until the required memory has been freed.

For details, please refer to the attachments.

  was:
*LRU Cache for B-Tree*
Problem:

CarbonData is maintaining two level of B-Tree cache, one at the driver level 
and another at executor level.  Currently CarbonData has the mechanism to 
invalidate the segments and blocks cache for the invalid table segments, but 
there is no eviction policy for the unused cached object. So the instance at 
which complete memory is utilized then the system will not be able to process 
any new requests.

Solution:

In the cache maintained at the driver level and at the executor there must be 
objects in cache currently not in use. Therefore system should have the 
mechanism to below mechanism.

1.   Set the max memory limit till which objects could be hold in the 
memory.

2.   When configured memory limit reached then identify the cached objects 
currently not in use so that the required memory could be freed without 
impacting the existing process.

3.   Eviction should be done only till the required memory is not meet.

For details please refer to attachments.

Summary: Implement LRU cache for B-Tree   (was: Implement LRU cache for 
B-Tree to ensure to avoid out memory, when too many number of tables exits and 
all are not frequently used.)

> Implement LRU cache for B-Tree 
> ---
>
> Key: CARBONDATA-484
> URL: https://issues.apache.org/jira/browse/CARBONDATA-484
> Project: CarbonData
>  Issue Type: New Feature
>Reporter: Mohammad Shahid Khan
>Assignee: Mohammad Shahid Khan
> Attachments: B-Tree LRU Cache.pdf
>
>
> An LRU cache for the B-Tree is proposed to avoid running out of memory when a 
> large number of tables exist and not all are frequently used.
> Problem:
> CarbonData maintains two levels of B-Tree cache, one at the driver level and 
> another at the executor level. Currently CarbonData has a mechanism to 
> invalidate the segment and block caches for invalid table segments, but there 
> is no eviction policy for unused cached objects. So at the instant complete 
> memory is utilized, the system will not be able to process any new requests.
> Solution:
> Among the caches maintained at the driver and executor levels there must be 
> objects currently not in use. Therefore the system should have the following 
> mechanism:
> 1.   Set the maximum memory limit up to which objects can be held in memory.
> 2.   When the configured memory limit is reached, identify cached objects 
> currently not in use so that the required memory can be freed without 
> impacting existing processing.
> 3.   Eviction should be done only until the required memory has been freed.
> For details, please refer to the attachments.
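
A minimal sketch of the eviction policy described above, assuming each cached 
B-Tree segment can report its in-memory size. An access-ordered LinkedHashMap 
gives LRU ordering; the real design must also skip entries currently in use, 
which this sketch omits. All names are illustrative:

    import java.util.Iterator;
    import java.util.LinkedHashMap;
    import java.util.Map;

    public final class BTreeLruCacheSketch<K> {

        /** Assumption: each cached object can report its memory footprint. */
        public interface Sized {
            long sizeInBytes();
        }

        private final long maxBytes;                 // configured memory limit
        private long currentBytes;
        private final LinkedHashMap<K, Sized> cache =
            new LinkedHashMap<>(16, 0.75f, true);    // access order = LRU

        public BTreeLruCacheSketch(long maxBytes) {
            this.maxBytes = maxBytes;
        }

        public synchronized Sized get(K key) {
            return cache.get(key);
        }

        public synchronized void put(K key, Sized value) {
            Sized old = cache.put(key, value);
            if (old != null) {
                currentBytes -= old.sizeInBytes();
            }
            currentBytes += value.sizeInBytes();
            evictUntilWithinLimit();
        }

        // Evict least-recently-used entries, but only until the limit is met.
        private void evictUntilWithinLimit() {
            Iterator<Map.Entry<K, Sized>> it = cache.entrySet().iterator();
            while (currentBytes > maxBytes && it.hasNext()) {
                Map.Entry<K, Sized> eldest = it.next();
                currentBytes -= eldest.getValue().sizeInBytes();
                it.remove();
            }
        }
    }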



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CARBONDATA-495) Unify compressor interface

2016-12-15 Thread Jihong MA (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jihong MA updated CARBONDATA-495:
-
Description: Use a compressor factory to unify the interface and eliminate 
small object creation  (was: Use factory for compressor to unify the interface and 
reduce small objects)
 Issue Type: Improvement  (was: Bug)
 Issue Type: Improvement  (was: Bug)

> Unify compressor interface
> --
>
> Key: CARBONDATA-495
> URL: https://issues.apache.org/jira/browse/CARBONDATA-495
> Project: CarbonData
>  Issue Type: Improvement
>Affects Versions: 0.2.0-incubating
>Reporter: Jacky Li
>Assignee: Jacky Li
> Fix For: 1.0.0-incubating
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Use a compressor factory to unify the interface and eliminate small object creation
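
A hedged sketch of the factory pattern this issue names: one compressor 
interface and one shared stateless instance, so callers stop allocating 
short-lived compressor objects. Method names are illustrative, and the 
stand-in implementation replaces a real codec:

    public final class CompressorFactory {

        /** Unified compressor interface; method names are illustrative. */
        public interface Compressor {
            byte[] compressBytes(byte[] input);
            byte[] unCompressBytes(byte[] input);
        }

        /** Stand-in implementation; real code would wrap e.g. snappy-java. */
        private static final class PassThroughCompressor implements Compressor {
            public byte[] compressBytes(byte[] input) { return input; }
            public byte[] unCompressBytes(byte[] input) { return input; }
        }

        // One shared stateless instance, so callers no longer allocate a
        // short-lived compressor object per column chunk.
        private static final Compressor INSTANCE = new PassThroughCompressor();

        private CompressorFactory() { }

        public static Compressor getInstance() {
            return INSTANCE;
        }
    }

Because the compressor holds no per-call state, a single instance can be shared 
safely across threads, which is what eliminates the small-object churn.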



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CARBONDATA-516) [SPARK2]update union issue in CarbonLateDecoderRule for Spark 2.x integration

2016-12-15 Thread Jihong MA (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jihong MA updated CARBONDATA-516:
-
Description: In Spark 2.x, the Union class is no longer a subclass of BinaryNode.   
(was: In spark2, Union class is no longer the sub-class of BinaryNode. We need 
fix union issue in CarbonLateDecoderRule for spark2.)
 Issue Type: New Feature  (was: Bug)
Summary: [SPARK2]update union issue in CarbonLateDecoderRule for Spark 
2.x integration  (was: [SPARK2]fix union issue in CarbonLateDecoderRule)

> [SPARK2]update union issue in CarbonLateDecoderRule for Spark 2.x integration
> -
>
> Key: CARBONDATA-516
> URL: https://issues.apache.org/jira/browse/CARBONDATA-516
> Project: CarbonData
>  Issue Type: New Feature
>Reporter: QiangCai
>Assignee: QiangCai
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> In Spark 2.x, the Union class is no longer a subclass of BinaryNode. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CARBONDATA-519) Enable vector reader in Carbon-Spark 2.0 integration and Carbon layer

2016-12-15 Thread Jihong MA (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jihong MA updated CARBONDATA-519:
-
Description: Spark 2.0 supports a vectorized reader and uses whole-stage 
codegen to improve performance; Carbon will enable a vectorized reader when 
integrating with Spark to take advantage of the new features of Spark 2.x  
(was: Spark 2.0 supports 
batch reader and uses whole codegen to improve performance, so carbon also can 
implement vector reader and leverage the features of Spark2.0)
 Issue Type: New Feature  (was: Improvement)

> Enable vector reader in Carbon-Spark 2.0 integration and Carbon layer
> -
>
> Key: CARBONDATA-519
> URL: https://issues.apache.org/jira/browse/CARBONDATA-519
> Project: CarbonData
>  Issue Type: New Feature
>Reporter: Ravindra Pesala
>Assignee: Ravindra Pesala
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Spark 2.0 supports a vectorized reader and uses whole-stage codegen to improve 
> performance; Carbon will enable a vectorized reader when integrating with 
> Spark to take advantage of the new features of Spark 2.x



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CARBONDATA-522) New data loading flow causes testcase failures like big decimal etc

2016-12-15 Thread Jihong MA (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jihong MA updated CARBONDATA-522:
-
Description: 
Please check http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/105/.

The new data loading flow causes test regressions.

  was:
Pls check http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/105/.

I suggest that we should test new data loading flow when adding new feature.

Summary: New data loading flow causes testcase failures like big decimal 
etc  (was: New data loading flow can not pass some testcase like big decimal 
etc)

> New data loading flow causes testcase failures like big decimal etc
> --
>
> Key: CARBONDATA-522
> URL: https://issues.apache.org/jira/browse/CARBONDATA-522
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Lionx
>Assignee: Ravindra Pesala
>
> Please check http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/105/.
> The new data loading flow causes test regressions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CARBONDATA-527) Greater than/less-than/Like filters optimization for dictionary encoded columns

2016-12-15 Thread Jihong MA (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jihong MA updated CARBONDATA-527:
-
Issue Type: New Feature  (was: Improvement)
   Summary: Greater than/less-than/Like filters optimization for dictionary 
encoded columns  (was: Greater than/less-than/Like filters optimization for 
dictionary columns)

> Greater than/less-than/Like filters optimization for dictionary encoded 
> columns
> ---
>
> Key: CARBONDATA-527
> URL: https://issues.apache.org/jira/browse/CARBONDATA-527
> Project: CarbonData
>  Issue Type: New Feature
>Reporter: Sujith
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Current design: 
> For greater-than/less-than/LIKE filters, the system first iterates over each 
> row present in the dictionary cache to identify the valid actual filter 
> members by applying the filter expression. Once evaluation is done, the system 
> holds the list of identified valid actual filter member values (Strings); in 
> the next step, the system looks up the dictionary cache again to identify the 
> dictionary surrogate values of the identified members. This lookup is an 
> additional cost to the system, even though the lookup is a binary search in 
> the dictionary cache.
>  
> Proposed design/solution:
> Identify the dictionary surrogate values in the filter expression evaluation 
> step itself, when the actual dictionary values are scanned to identify valid 
> filter members.
> Keep a dictionary counter variable that is incremented as the system iterates 
> through the dictionary cache to retrieve each actual member stored there. As 
> the system evaluates each row against the filter expression to decide whether 
> it is a valid filter member, the counter value can be taken as the selected 
> dictionary surrogate value, since the actual member values and their 
> dictionary values are kept in the same order in the dictionary cache as the 
> iteration order.
> This eliminates the further dictionary lookup step otherwise required to 
> retrieve the dictionary surrogate value for each identified valid filter 
> member, and can significantly improve the performance of filter queries that 
> require expression evaluation against the dictionary cache to identify filter 
> members, such as greater-than/less-than/LIKE filters.
> Note: this optimization is applicable to dictionary-encoded columns.
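
A compact sketch of the proposed single pass: because dictionary values are 
iterated in surrogate order, a running counter is itself the surrogate key, so 
no second binary-search lookup is needed. The dictionary is modeled as a plain 
List here; Carbon's actual cache structures differ:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.function.Predicate;

    public final class DictionaryFilterSketch {

        /** Returns surrogate keys of members matching the filter. */
        public static List<Integer> selectSurrogates(List<String> dictionaryValues,
                                                     Predicate<String> filter) {
            List<Integer> selected = new ArrayList<>();
            int surrogate = 1;                    // surrogate keys start at 1
            for (String member : dictionaryValues) {
                if (filter.test(member)) {
                    selected.add(surrogate);      // counter doubles as the key
                }
                surrogate++;
            }
            return selected;
        }
    }

A greater-than filter, for example, becomes 
selectSurrogates(dict, v -> v.compareTo("3RAM size") > 0), with no second pass 
to map matched members back to their surrogate keys.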



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CARBONDATA-531) Eliminate spark dependency in carbon core

2016-12-15 Thread Jihong MA (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jihong MA updated CARBONDATA-531:
-
Description: Clean up the interface and remove the Spark dependency from the 
Carbon-core module.  (was: Carbon-core module should not depend on spark )
Summary: Eliminate spark dependency in carbon core  (was: Remove spark 
dependency in carbon core)

> Eliminate spark dependency in carbon core
> -
>
> Key: CARBONDATA-531
> URL: https://issues.apache.org/jira/browse/CARBONDATA-531
> Project: CarbonData
>  Issue Type: Improvement
>Affects Versions: 0.2.0-incubating
>Reporter: Jacky Li
> Fix For: 1.0.0-incubating
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Clean up the interface and remove the Spark dependency from the Carbon-core module.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CARBONDATA-536) Initialize GlobalDictionaryUtil.updateTableMetadataFunc for Spark 2.x

2016-12-15 Thread Jihong MA (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jihong MA updated CARBONDATA-536:
-
Description: GlobalDictionaryUtil.updateTableMetadataFunc needs to be 
initialized.  (was: For spark2, GlobalDictionaryUtil.updateTableMetadataFunc 
should been initialized)
Summary: Initialize GlobalDictionaryUtil.updateTableMetadataFunc for 
Spark 2.x  (was: For spark2, GlobalDictionaryUtil.updateTableMetadataFunc 
should been initialized)

> Initialize GlobalDictionaryUtil.updateTableMetadataFunc for Spark 2.x
> -
>
> Key: CARBONDATA-536
> URL: https://issues.apache.org/jira/browse/CARBONDATA-536
> Project: CarbonData
>  Issue Type: Bug
>  Components: data-load
>Affects Versions: 1.0.0-incubating
>Reporter: QiangCai
>Assignee: QiangCai
> Fix For: 1.0.0-incubating
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> GlobalDictionaryUtil.updateTableMetadataFunc needs to be initialized.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CARBONDATA-322) Integration with spark 2.x

2016-12-15 Thread Jihong MA (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jihong MA updated CARBONDATA-322:
-
Issue Type: New Feature  (was: Improvement)

> Integration with Spark 2.x 
> 
>
> Key: CARBONDATA-322
> URL: https://issues.apache.org/jira/browse/CARBONDATA-322
> Project: CarbonData
>  Issue Type: New Feature
>  Components: spark-integration
>Affects Versions: 0.2.0-incubating
>Reporter: Fei Wang
>Assignee: Fei Wang
> Fix For: 1.0.0-incubating
>
>
> Since Spark 2.0 was released, there are many nice features such as a more 
> efficient parser, vectorized execution, and adaptive execution. 
> It is good to integrate with Spark 2.x.
> The current integration, up to Spark v1.6, is tightly coupled with Spark; we 
> would like to clean up the interface with the following design points in mind: 
> 1. Decouple from Spark, integrating through Spark's v2 datasource API
> 2. Enable a vectorized carbon reader
> 3. Support saving a DataFrame to a CarbonData file through CarbonData's output 
> format (see the sketch below).
> ...
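
As an illustration of design point 3 through Spark's generic datasource write 
path: the short format name "carbondata" and the "tableName" option below are 
assumptions for this sketch, and the integration's real source name and options 
may differ:

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SaveMode;
    import org.apache.spark.sql.SparkSession;

    public final class SaveDataFrameSketch {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                .appName("carbon-save-sketch").getOrCreate();

            Dataset<Row> df = spark.range(100).toDF("id");

            df.write()
              .format("carbondata")           // assumed datasource name
              .option("tableName", "sample")  // assumed option
              .mode(SaveMode.Overwrite)
              .save();
        }
    }

Going through the generic write path keeps the integration decoupled from 
Spark internals, in line with design point 1.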



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CARBONDATA-322) Integration with spark 2.x

2016-12-15 Thread Jihong MA (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jihong MA updated CARBONDATA-322:
-
Description: 
Since Spark 2.0 was released, there are many nice features such as a more 
efficient parser, vectorized execution, and adaptive execution. 
It is good to integrate with Spark 2.x.

The current integration, up to Spark v1.6, is tightly coupled with Spark; we 
would like to clean up the interface with the following design points in mind: 

1. Decouple from Spark, integrating through Spark's v2 datasource API
2. Enable a vectorized carbon reader
3. Support saving a DataFrame to a CarbonData file through CarbonData's output 
format.
...


  was:
As spark 2.0 released. there are many nice features such as more efficient 
parser, vectorized execution, adaptive execution. 
It is good to integrate with spark 2.x

Another side now in carbondata, spark integration is heavy coupling with spark 
code and the code need clean, we should redesign the spark integration, it 
should satisfy flowing requirement:

1. decoupled with spark, integrate according to spark datasource API(V2)
2. This integration should support vectorized carbon reader
3. Supoort write to carbondata from dadatrame
...


 Issue Type: Improvement  (was: Bug)
Summary: Integration with  spark 2.x   (was: integrate spark 2.x )

> Integration with Spark 2.x 
> 
>
> Key: CARBONDATA-322
> URL: https://issues.apache.org/jira/browse/CARBONDATA-322
> Project: CarbonData
>  Issue Type: Improvement
>  Components: spark-integration
>Affects Versions: 0.2.0-incubating
>Reporter: Fei Wang
>Assignee: Fei Wang
> Fix For: 1.0.0-incubating
>
>
> Since Spark 2.0 was released, there are many nice features such as a more 
> efficient parser, vectorized execution, and adaptive execution. 
> It is good to integrate with Spark 2.x.
> The current integration, up to Spark v1.6, is tightly coupled with Spark; we 
> would like to clean up the interface with the following design points in mind: 
> 1. Decouple from Spark, integrating through Spark's v2 datasource API
> 2. Enable a vectorized carbon reader
> 3. Support saving a DataFrame to a CarbonData file through CarbonData's output 
> format.
> ...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CARBONDATA-430) Carbon data tpch benchmark

2016-11-21 Thread Jihong MA (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jihong MA updated CARBONDATA-430:
-
Issue Type: Improvement  (was: Task)

> Carbon data tpch benchmark
> --
>
> Key: CARBONDATA-430
> URL: https://issues.apache.org/jira/browse/CARBONDATA-430
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: suo tong
>Assignee: suo tong
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)