[GitHub] carbondata issue #2571: [CARBONDATA-2792][schema restructure] Create externa...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2571 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7721/ ---
[GitHub] carbondata issue #2597: [CARBONDATA-2802][BloomDataMap] Remove clearing cach...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2597 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6111/ ---
[GitHub] carbondata issue #2598: [CARBONDATA-2811][BloomDataMap] Add query test case ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2598 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6446/ ---
[GitHub] carbondata issue #2598: [CARBONDATA-2811][BloomDataMap] Add query test case ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2598 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7720/ ---
[GitHub] carbondata pull request #2583: [CARBONDATA-2803]fix wrong datasize calculati...
Github user akashrn5 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2583#discussion_r207103282 --- Diff: core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java --- @@ -2651,8 +2651,17 @@ public static int isFilterPresent(byte[][] filterValues, carbonIndexSize = getCarbonIndexSize(fileStore, locationMap); for (Map.Entry> entry : indexFilesMap.entrySet()) { // get the size of carbondata files +String tempBlockFilePath = null; for (String blockFile : entry.getValue()) { - carbonDataSize += FileFactory.getCarbonFile(blockFile).getSize(); + // the indexFileMap contains all the blocklets and index file mapping. For example, if one + // block contains 3 blocklets, then entry.getValue() will list all the blocklets of all + // the block present in it. Since all the three blocklets will have the same block path, + // so just get the size of one block path for exact data size and avoid wrong datasize + // calculation. + if (!blockFile.equals(tempBlockFilePath)) { --- End diff -- ok, i will check and remove this change ---
[GitHub] carbondata issue #2598: [CARBONDATA-2811][BloomDataMap] Add query test case ...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2598 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6110/ ---
[GitHub] carbondata issue #2595: [Documentation] [Unsafe Configuration] Added carbon....
Github user manishgupta88 commented on the issue: https://github.com/apache/carbondata/pull/2595 @xuchuanyin Usually in production scenarios the driver memory will be less than the executor memory. Now we are using unsafe memory for caching block/blocklet dataMap in the driver. The unsafe memory currently configured for the executor is also getting used for the driver, which is not a good idea. Therefore it is required to separate out the driver and executor unsafe memory. You can observe the same in the Spark configuration: Spark provides different parameters for configuring the driver and executor memory overhead to control unsafe memory usage, spark.yarn.driver.memoryOverhead and spark.yarn.executor.memoryOverhead ---
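As a point of comparison, the separation the comment refers to looks like this in `spark-defaults.conf`. Only the two property names come from the comment above; the values are illustrative (in MiB), not a recommendation:

```
# spark-defaults.conf -- Spark configures driver and executor
# memory overhead (off-heap headroom on YARN) separately.
spark.yarn.driver.memoryOverhead    1024
spark.yarn.executor.memoryOverhead  2048
```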
[GitHub] carbondata issue #2599: [CARBONDATA-2812] Implement freeMemory for complex p...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2599 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6444/ ---
[GitHub] carbondata pull request #2593: [CARBONDATA-2753][Compatibility] Merge Index ...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/2593 ---
[GitHub] carbondata issue #2598: [CARBONDATA-2811][BloomDataMap] Add query test case ...
Github user kevinjmh commented on the issue: https://github.com/apache/carbondata/pull/2598 retest this please ---
[GitHub] carbondata issue #2593: [CARBONDATA-2753][Compatibility] Merge Index file no...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2593 LGTM ---
[GitHub] carbondata pull request #2583: [CARBONDATA-2803]fix wrong datasize calculati...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2583#discussion_r207097923 --- Diff: core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java --- @@ -2651,8 +2651,17 @@ public static int isFilterPresent(byte[][] filterValues, carbonIndexSize = getCarbonIndexSize(fileStore, locationMap); for (Map.Entry> entry : indexFilesMap.entrySet()) { // get the size of carbondata files +String tempBlockFilePath = null; for (String blockFile : entry.getValue()) { - carbonDataSize += FileFactory.getCarbonFile(blockFile).getSize(); + // the indexFileMap contains all the blocklets and index file mapping. For example, if one + // block contains 3 blocklets, then entry.getValue() will list all the blocklets of all + // the block present in it. Since all the three blocklets will have the same block path, + // so just get the size of one block path for exact data size and avoid wrong datasize + // calculation. + if (!blockFile.equals(tempBlockFilePath)) { --- End diff -- I feel this fix is not required, Please check PR https://github.com/apache/carbondata/pull/2596 to avoid duplicates from indexfileMap ---
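Whichever layer the deduplication finally lands in, the problem under discussion can be isolated into a small stand-alone sketch: several blocklets of one block map to the same carbondata file path, so each distinct path must contribute its size only once. The class, method, and file names below are made up for illustration; this is not CarbonUtil's actual code:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.ToLongFunction;

public class DataSizeCalc {

  // Sum the sizes of the data files referenced by a list of blocklet
  // entries. Blocklets of one block share the same block file path,
  // so each distinct path is counted exactly once.
  static long totalDataSize(List<String> blockFiles, ToLongFunction<String> sizeOf) {
    Set<String> seen = new HashSet<>();
    long total = 0;
    for (String blockFile : blockFiles) {
      if (seen.add(blockFile)) { // add() returns true only on first occurrence
        total += sizeOf.applyAsLong(blockFile);
      }
    }
    return total;
  }

  public static void main(String[] args) {
    // three blocklets of block 0 plus one blocklet of block 1
    List<String> files = Arrays.asList(
        "part-0.carbondata", "part-0.carbondata",
        "part-0.carbondata", "part-1.carbondata");
    long total = totalDataSize(files, f -> f.equals("part-0.carbondata") ? 100L : 50L);
    System.out.println(total); // 150, not 350
  }
}
```

A set handles duplicates wherever they appear; the `tempBlockFilePath` comparison in the diff relies on duplicates being adjacent in `entry.getValue()`.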
[GitHub] carbondata issue #2599: [CARBONDATA-2812] Implement freeMemory for complex p...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2599 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7718/ ---
[GitHub] carbondata issue #2598: [CARBONDATA-2811][BloomDataMap] Add query test case ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2598 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7719/ ---
[GitHub] carbondata issue #2597: [CARBONDATA-2802][BloomDataMap] Remove clearing cach...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2597 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6443/ ---
[GitHub] carbondata issue #2597: [CARBONDATA-2802][BloomDataMap] Remove clearing cach...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2597 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7717/ ---
[GitHub] carbondata issue #2599: [CARBONDATA-2812] Implement freeMemory for complex p...
Github user dhatchayani commented on the issue: https://github.com/apache/carbondata/pull/2599 Retest sdv please ---
[GitHub] carbondata issue #2599: [CARBONDATA-2812] Implement freeMemory for complex p...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2599 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6109/ ---
[GitHub] carbondata issue #2597: [CARBONDATA-2802][BloomDataMap] Remove clearing cach...
Github user xuchuanyin commented on the issue: https://github.com/apache/carbondata/pull/2597 This modification is not a final complete fix for ISSUE2802 ---
[GitHub] carbondata issue #2598: [CARBONDATA-2811][BloomDataMap] Add query test case ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2598 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6445/ ---
[jira] [Resolved] (CARBONDATA-2478) Add datamap-developer-guide.md file in readme
[ https://issues.apache.org/jira/browse/CARBONDATA-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liang Chen resolved CARBONDATA-2478.
------------------------------------
       Resolution: Fixed
    Fix Version/s: 1.4.1
                   1.5.0

> Add datamap-developer-guide.md file in readme
> ---------------------------------------------
>
>                 Key: CARBONDATA-2478
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-2478
>             Project: CarbonData
>          Issue Type: Bug
>          Components: docs
>    Affects Versions: 1.4.0
>            Reporter: Vandana Yadav
>            Assignee: Vandana Yadav
>            Priority: Trivial
>             Fix For: 1.5.0, 1.4.1
>
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Add datamap-developer-guide.md file in readme

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Assigned] (CARBONDATA-2478) Add datamap-developer-guide.md file in readme
[ https://issues.apache.org/jira/browse/CARBONDATA-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liang Chen reassigned CARBONDATA-2478:
--------------------------------------
    Assignee: Vandana Yadav

> Add datamap-developer-guide.md file in readme
> ---------------------------------------------
>
>                 Key: CARBONDATA-2478
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-2478
>             Project: CarbonData
>          Issue Type: Bug
>          Components: docs
>    Affects Versions: 1.4.0
>            Reporter: Vandana Yadav
>            Assignee: Vandana Yadav
>            Priority: Trivial
>             Fix For: 1.5.0, 1.4.1
>
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Add datamap-developer-guide.md file in readme
[GitHub] carbondata issue #2579: [HOTFIX][PR 2575] Fixed modular plan creation only i...
Github user jackylk commented on the issue: https://github.com/apache/carbondata/pull/2579 LGTM ---
[GitHub] carbondata pull request #2305: [CARBONDATA-2478] Added datamap-developer-gui...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/2305 ---
[GitHub] carbondata pull request #2599: [CARBONDATA-2812] Implement freeMemory for co...
GitHub user dhatchayani opened a pull request:

    https://github.com/apache/carbondata/pull/2599

    [CARBONDATA-2812] Implement freeMemory for complex pages

    **Problem:**
    The memory used by the ColumnPageWrapper (for complex data types) is not cleared and so it requires more memory to Load and Query.

    **Solution:**
    Clear the used memory in the freeMemory method.

    - [ ] Any interfaces changed?
    - [ ] Any backward compatibility impacted?
    - [ ] Document update required?
    - [x] Testing done
          Manual Testing
    - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dhatchayani/carbondata CARBONDATA-2812

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2599.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2599

commit 28a9876274e97e71f7d65dcfbfce23fc173b6727
Author: dhatchayani
Date: 2018-08-02T03:00:32Z

    [CARBONDATA-2812] Implement freeMemory for complex pages

---
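The idea in this PR, releasing a page wrapper's buffers exactly once when freeMemory is called, can be sketched as follows. The class and field names are hypothetical, not CarbonData's actual ColumnPageWrapper:

```java
// Illustrative sketch of an idempotent freeMemory; the class and field
// names are hypothetical, not CarbonData's actual ColumnPageWrapper.
public class ComplexPageHolder {
  private byte[][] pageBuffers; // stands in for per-page memory
  private boolean freed;

  public ComplexPageHolder(byte[][] pageBuffers) {
    this.pageBuffers = pageBuffers;
  }

  // Drop references so the GC (or an unsafe allocator in the real code)
  // can reclaim the memory; safe to call more than once.
  public void freeMemory() {
    if (!freed) {
      pageBuffers = null;
      freed = true;
    }
  }

  public boolean isFreed() {
    return freed;
  }

  public static void main(String[] args) {
    ComplexPageHolder holder = new ComplexPageHolder(new byte[4][1024]);
    holder.freeMemory();
    holder.freeMemory(); // second call is a no-op
    System.out.println(holder.isFreed()); // prints "true"
  }
}
```

Making the release idempotent matters because free paths are often reached from both query completion and error handling.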
[jira] [Resolved] (CARBONDATA-2800) Add useful tips for bloomfilter datamap
[ https://issues.apache.org/jira/browse/CARBONDATA-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

xuchuanyin resolved CARBONDATA-2800.
------------------------------------
       Resolution: Fixed
    Fix Version/s: 1.4.1

> Add useful tips for bloomfilter datamap
> ---------------------------------------
>
>                 Key: CARBONDATA-2800
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-2800
>             Project: CarbonData
>          Issue Type: Sub-task
>            Reporter: xuchuanyin
>            Assignee: xuchuanyin
>            Priority: Major
>             Fix For: 1.4.1
>
>          Time Spent: 3h
>  Remaining Estimate: 0h
[jira] [Updated] (CARBONDATA-2812) Implement freeMemory for complex pages
[ https://issues.apache.org/jira/browse/CARBONDATA-2812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhatchayani updated CARBONDATA-2812:
------------------------------------
    Summary: Implement freeMemory for complex pages  (was: Implement free memory for complex pages)

> Implement freeMemory for complex pages
> --------------------------------------
>
>                 Key: CARBONDATA-2812
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-2812
>             Project: CarbonData
>          Issue Type: Bug
>            Reporter: dhatchayani
>            Assignee: dhatchayani
>            Priority: Major
[jira] [Resolved] (CARBONDATA-2793) Add document for 32k feature
[ https://issues.apache.org/jira/browse/CARBONDATA-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

xuchuanyin resolved CARBONDATA-2793.
------------------------------------
       Resolution: Fixed
    Fix Version/s: 1.4.1

> Add document for 32k feature
> ----------------------------
>
>                 Key: CARBONDATA-2793
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-2793
>             Project: CarbonData
>          Issue Type: Sub-task
>            Reporter: xuchuanyin
>            Assignee: xuchuanyin
>            Priority: Major
>             Fix For: 1.4.1
>
>          Time Spent: 4h
>  Remaining Estimate: 0h
[jira] [Assigned] (CARBONDATA-2800) Add useful tips for bloomfilter datamap
[ https://issues.apache.org/jira/browse/CARBONDATA-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

xuchuanyin reassigned CARBONDATA-2800:
--------------------------------------
    Assignee: xuchuanyin

> Add useful tips for bloomfilter datamap
> ---------------------------------------
>
>                 Key: CARBONDATA-2800
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-2800
>             Project: CarbonData
>          Issue Type: Sub-task
>            Reporter: xuchuanyin
>            Assignee: xuchuanyin
>            Priority: Major
>
>          Time Spent: 3h
>  Remaining Estimate: 0h
[GitHub] carbondata pull request #2598: [CARBONDATA-2811][BloomDataMap] Add query tes...
GitHub user kevinjmh opened a pull request:

    https://github.com/apache/carbondata/pull/2598

    [CARBONDATA-2811][BloomDataMap] Add query test case using search mode on table with bloom filter

    Add query test case using search mode on table with bloom filter

    Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:

    - [ ] Any interfaces changed?
    - [ ] Any backward compatibility impacted?
    - [ ] Document update required?
    - [ ] Testing done
          Please provide details on
          - Whether new unit test cases have been added or why no new tests are required?
          - How it is tested? Please attach test report.
          - Is it a performance related change? Please attach the performance test report.
          - Any additional information to help reviewers in testing this change.
    - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kevinjmh/carbondata bloom_searchmode

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2598.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2598

commit 2d6c9b9aeb306607305fbaf66a885e6cecc5676b
Author: Manhua
Date: 2018-08-02T02:21:16Z

    add search mode test case with bloom

---
[GitHub] carbondata pull request #2597: [CARBONDATA-2802][BloomDataMap] Remove cleari...
GitHub user xuchuanyin opened a pull request:

    https://github.com/apache/carbondata/pull/2597

    [CARBONDATA-2802][BloomDataMap] Remove clearing cache after rebuilding index datamap

    There is no need to clear the cache after rebuilding an index datamap, for the following reasons:
    1. currently it will clear all the caches for all index datamaps, not only for the one being rebuilt
    2. the life cycle of table data and index datamap data is the same, so there is no need to clear it. (Once the index datamap is created, or once the main table is loaded, the data of the datamap is generated too -- in both scenarios, the data of the datamap is up to date with the main table.)

    Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:

    - [x] Any interfaces changed? `NO`
    - [x] Any backward compatibility impacted? `NO`
    - [x] Document update required? `NO`
    - [x] Testing done
          Please provide details on
          - Whether new unit test cases have been added or why no new tests are required? `NO`
          - How it is tested? Please attach test report. `Tested in local machine`
          - Is it a performance related change? Please attach the performance test report. `NO`
          - Any additional information to help reviewers in testing this change. `NA`
    - [x] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. `NA`

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/xuchuanyin/carbondata 0802_remove_clear_dm_4_index_dm

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2597.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2597

commit f3c816745c8f7a829384d3241934196b1bb391ee
Author: xuchuanyin
Date: 2018-08-02T02:45:17Z

    Remove clearing cache after rebuilding index datamap

    There is no need to clear the cache after rebuilding an index datamap, for the following reasons:
    1. currently it will clear all the caches for all index datamaps, not only for the one being rebuilt
    2. the life cycle of table data and index datamap data is the same, so there is no need to clear it. (Once the index datamap is created, or once the main table is loaded, the data of the datamap is generated too -- in both scenarios, the data of the datamap is up to date with the main table.)

---
[jira] [Created] (CARBONDATA-2812) Implement free memory for complex pages
dhatchayani created CARBONDATA-2812:
---------------------------------------

             Summary: Implement free memory for complex pages
                 Key: CARBONDATA-2812
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-2812
             Project: CarbonData
          Issue Type: Bug
            Reporter: dhatchayani
            Assignee: dhatchayani
[jira] [Created] (CARBONDATA-2811) Add query test case using search mode on table with bloom filter
jiangmanhua created CARBONDATA-2811:
---------------------------------------

             Summary: Add query test case using search mode on table with bloom filter
                 Key: CARBONDATA-2811
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-2811
             Project: CarbonData
          Issue Type: Sub-task
            Reporter: jiangmanhua
            Assignee: jiangmanhua
[GitHub] carbondata issue #2594: [CARBONDATA-2809][DataMap] Skip rebuilding for non-l...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2594 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6441/ ---
[GitHub] carbondata issue #2594: [CARBONDATA-2809][DataMap] Skip rebuilding for non-l...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2594 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7715/ ---
[GitHub] carbondata issue #2594: [CARBONDATA-2809][DataMap] Skip rebuilding for non-l...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2594 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6108/ ---
[jira] [Resolved] (CARBONDATA-2783) Update document of bloom filter datamap
[ https://issues.apache.org/jira/browse/CARBONDATA-2783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

jiangmanhua resolved CARBONDATA-2783.
-------------------------------------
       Resolution: Fixed
    Fix Version/s: 1.4.1

> Update document of bloom filter datamap
> ---------------------------------------
>
>                 Key: CARBONDATA-2783
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-2783
>             Project: CarbonData
>          Issue Type: Sub-task
>            Reporter: jiangmanhua
>            Assignee: jiangmanhua
>            Priority: Major
>             Fix For: 1.4.1
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
[jira] [Assigned] (CARBONDATA-2783) Update document of bloom filter datamap
[ https://issues.apache.org/jira/browse/CARBONDATA-2783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

jiangmanhua reassigned CARBONDATA-2783:
---------------------------------------
    Assignee: jiangmanhua

> Update document of bloom filter datamap
> ---------------------------------------
>
>                 Key: CARBONDATA-2783
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-2783
>             Project: CarbonData
>          Issue Type: Sub-task
>            Reporter: jiangmanhua
>            Assignee: jiangmanhua
>            Priority: Major
>             Fix For: 1.4.1
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
[GitHub] carbondata pull request #2590: [CARBONDATA-2750] Updated documentation on Lo...
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2590#discussion_r207084686 --- Diff: docs/data-management-on-carbondata.md --- @@ -126,20 +126,33 @@ This tutorial is going to introduce all commands and data operations on CarbonDa - **Local Dictionary Configuration** - Local Dictionary is generated only for no-dictionary string/varchar datatype columns. It helps in: + Local Dictionary is generated only for string/varchar datatype columns which are not included in dictionary include. It helps in: 1. Getting more compression on dimension columns with less cardinality. 2. Filter queries and full scan queries on No-dictionary columns with local dictionary will be faster as filter will be done on encoded data. 3. Reducing the store size and memory footprint as only unique values will be stored as part of local dictionary and corresponding data will be stored as encoded data. - - By default, Local Dictionary will be enabled and generated for all no-dictionary string/varchar datatype columns. + + **The cost for Local Dictionary:** The memory size will increase when local dictionary is enabled. + + **NOTE:** Following Data Types are not Supported for Local Dictionary: + * SMALLINT + * INTEGER + * BIGINT + * DOUBLE + * DECIMAL + * TIMESTAMP + * DATE + * CHAR --- End diff -- Why is `CHAR` not supported? As I know, SparkSQL treat both varchar and char as string, so in carbon data we actually see string. ---
[GitHub] carbondata pull request #2590: [CARBONDATA-2750] Updated documentation on Lo...
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2590#discussion_r207085356 --- Diff: docs/data-management-on-carbondata.md --- @@ -126,20 +126,33 @@ This tutorial is going to introduce all commands and data operations on CarbonDa - **Local Dictionary Configuration** - Local Dictionary is generated only for no-dictionary string/varchar datatype columns. It helps in: + Local Dictionary is generated only for string/varchar datatype columns which are not included in dictionary include. It helps in: 1. Getting more compression on dimension columns with less cardinality. 2. Filter queries and full scan queries on No-dictionary columns with local dictionary will be faster as filter will be done on encoded data. 3. Reducing the store size and memory footprint as only unique values will be stored as part of local dictionary and corresponding data will be stored as encoded data. - - By default, Local Dictionary will be enabled and generated for all no-dictionary string/varchar datatype columns. + + **The cost for Local Dictionary:** The memory size will increase when local dictionary is enabled. + + **NOTE:** Following Data Types are not Supported for Local Dictionary: + * SMALLINT + * INTEGER + * BIGINT + * DOUBLE + * DECIMAL + * TIMESTAMP + * DATE + * CHAR + * BOOLEAN + + By default, Local Dictionary will be disabled. Users will be able to pass following properties in create table command: | Properties | Default value | Description | | -- | - | --- | - | LOCAL_DICTIONARY_ENABLE | false | By default, local dictionary will not be enabled for the table | - | LOCAL_DICTIONARY_THRESHOLD | 1 | The maximum cardinality for local dictionary generation (range- 1000 to 10) | - | LOCAL_DICTIONARY_INCLUDE | all no-dictionary string/varchar columns | Columns for which Local Dictionary is generated. 
| + | LOCAL_DICTIONARY_ENABLE | false | By default, local dictionary will be disabled for the table | + | LOCAL_DICTIONARY_THRESHOLD | 1 | The maximum cardinality for local dictionary generation (maximum - 10) | + | LOCAL_DICTIONARY_INCLUDE | all string/varchar columns not specified in dictionary include| Columns for which Local Dictionary is generated. | | LOCAL_DICTIONARY_EXCLUDE | none | Columns for which Local Dictionary is not generated | **NOTE:** If the cardinality exceeds the threshold, this column will not use local dictionary encoding. And in this case, the data loading performance will decrease since there is a rollback procedure for local dictionary encoding. --- End diff -- For line 149 (162): ``` Encoded data and Actual data are both stored when Local Dictionary is enabled. ``` please change it to: ``` Encoded data with & without Local dictionary are both stored when Local Dictionary is enabled during data loading, so it requires more memory than before. ``` ---
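The properties in the table under review are supplied through TBLPROPERTIES when the table is created. A sketch of how they fit together, using the DDL style quoted elsewhere in this thread; the table name, column names, and property values are invented for illustration, only the property names come from the documentation being reviewed:

```sql
CREATE TABLE sales (
  id INT,
  city STRING,
  remarks STRING
)
STORED BY 'org.apache.carbondata.format'
TBLPROPERTIES (
  'LOCAL_DICTIONARY_ENABLE'='true',
  'LOCAL_DICTIONARY_THRESHOLD'='10000',
  'LOCAL_DICTIONARY_INCLUDE'='city',
  'LOCAL_DICTIONARY_EXCLUDE'='remarks'
)
```

Here `city` (low cardinality) opts into local dictionary encoding while `remarks` (likely high cardinality) is excluded, matching the threshold-rollback behavior described in the note above.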
[GitHub] carbondata pull request #2590: [CARBONDATA-2750] Updated documentation on Lo...
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2590#discussion_r207084952 --- Diff: docs/data-management-on-carbondata.md --- @@ -126,20 +126,33 @@ This tutorial is going to introduce all commands and data operations on CarbonDa - **Local Dictionary Configuration** - Local Dictionary is generated only for no-dictionary string/varchar datatype columns. It helps in: + Local Dictionary is generated only for string/varchar datatype columns which are not included in dictionary include. It helps in: 1. Getting more compression on dimension columns with less cardinality. 2. Filter queries and full scan queries on No-dictionary columns with local dictionary will be faster as filter will be done on encoded data. 3. Reducing the store size and memory footprint as only unique values will be stored as part of local dictionary and corresponding data will be stored as encoded data. - - By default, Local Dictionary will be enabled and generated for all no-dictionary string/varchar datatype columns. + + **The cost for Local Dictionary:** The memory size will increase when local dictionary is enabled. + + **NOTE:** Following Data Types are not Supported for Local Dictionary: + * SMALLINT + * INTEGER + * BIGINT + * DOUBLE + * DECIMAL + * TIMESTAMP + * DATE + * CHAR --- End diff -- what about complex? you didn't mention it ---
[GitHub] carbondata pull request #2592: [WIP]Updated & enhanced Documentation of Carb...
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2592#discussion_r207084144 --- Diff: docs/useful-tips-on-carbondata.md --- @@ -30,16 +30,16 @@ - **Table Column Description** - | Column Name | Data Type | Cardinality | Attribution | - |-|---|-|-| - | msisdn | String| 30 million | Dimension | - | BEGIN_TIME | BigInt| 10 Thousand | Dimension | - | HOST| String| 1 million | Dimension | - | Dime_1 | String| 1 Thousand | Dimension | - | counter_1 | Decimal | NA | Measure | - | counter_2 | Numeric(20,0) | NA | Measure | - | ... | ... | NA | Measure | - | counter_100 | Decimal | NA | Measure | +| Column Name | Data Type | Cardinality | Attribution | +|-|---|-|-| +| msisdn | String| 30 million | Dimension | +| BEGIN_TIME | BigInt| 10 Thousand | Dimension | +| HOST| String| 1 million | Dimension | +| Dime_1 | String| 1 Thousand | Dimension | +| counter_1 | Decimal | NA | Measure | +| counter_2 | Numeric(20,0) | NA | Measure | +| ... | ... | NA | Measure | +| counter_100 | Decimal | NA | Measure | - **Put the frequently-used column filter in the beginning** --- End diff -- For the following section, it is the similar with it. ---
[GitHub] carbondata pull request #2592: [WIP]Updated & enhanced Documentation of Carb...
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2592#discussion_r207083747 --- Diff: docs/useful-tips-on-carbondata.md --- @@ -158,18 +156,18 @@ Recently we did some performance POC on CarbonData for Finance and telecommunication Field. It involved detailed queries and aggregation scenarios. After the completion of POC, some of the configurations impacting the performance have been identified and tabulated below : - | Parameter | Location | Used For | Description | Tuning | - |--|---|---|--|| - | carbon.sort.intermediate.files.limit | spark/carbonlib/carbon.properties | Data loading | During the loading of data, local temp is used to sort the data. This number specifies the minimum number of intermediate files after which the merge sort has to be initiated. | Increasing the parameter to a higher value will improve the load performance. For example, when we increase the value from 20 to 100, it increases the data load performance from 35MB/S to more than 50MB/S. Higher values of this parameter consumes more memory during the load. | - | carbon.number.of.cores.while.loading | spark/carbonlib/carbon.properties | Data loading | Specifies the number of cores used for data processing during data loading in CarbonData. | If you have more number of CPUs, then you can increase the number of CPUs, which will increase the performance. For example if we increase the value from 2 to 4 then the CSV reading performance can increase about 1 times | - | carbon.compaction.level.threshold | spark/carbonlib/carbon.properties | Data loading and Querying | For minor compaction, specifies the number of segments to be merged in stage 1 and number of compacted segments to be merged in stage 2. | Each CarbonData load will create one segment, if every load is small in size it will generate many small file over a period of time impacting the query performance. 
Configuring this parameter will merge the small segment to one big segment which will sort the data and improve the performance. For Example in one telecommunication scenario, the performance improves about 2 times after minor compaction. | - | spark.sql.shuffle.partitions | spark/conf/spark-defaults.conf | Querying | The number of task started when spark shuffle. | The value can be 1 to 2 times as much as the executor cores. In an aggregation scenario, reducing the number from 200 to 32 reduced the query time from 17 to 9 seconds. | - | spark.executor.instances/spark.executor.cores/spark.executor.memory | spark/conf/spark-defaults.conf | Querying | The number of executors, CPU cores, and memory used for CarbonData query. | In the bank scenario, we provide the 4 CPUs cores and 15 GB for each executor which can get good performance. This 2 value does not mean more the better. It needs to be configured properly in case of limited resources. For example, In the bank scenario, it has enough CPU 32 cores each node but less memory 64 GB each node. So we cannot give more CPU but less memory. For example, when 4 cores and 12GB for each executor. It sometimes happens GC during the query which impact the query performance very much from the 3 second to more than 15 seconds. In this scenario need to increase the memory or decrease the CPU cores. | - | carbon.detail.batch.size | spark/carbonlib/carbon.properties | Data loading | The buffer size to store records, returned from the block scan. | In limit scenario this parameter is very important. For example your query limit is 1000. But if we set this value to 3000 that means we get 3000 records from scan but spark will only take 1000 rows. So the 2000 remaining are useless. In one Finance test case after we set it to 100, in the limit 1000 scenario the performance increase about 2 times in comparison to if we set this value to 12000. 
| - | carbon.use.local.dir | spark/carbonlib/carbon.properties | Data loading | Whether use YARN local directories for multi-table load disk
[GitHub] carbondata pull request #2592: [WIP]Updated & enhanced Documentation of Carb...
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2592#discussion_r207084014 --- Diff: docs/useful-tips-on-carbondata.md --- @@ -30,16 +30,16 @@ - **Table Column Description** - | Column Name | Data Type | Cardinality | Attribution | - |-|---|-|-| - | msisdn | String| 30 million | Dimension | - | BEGIN_TIME | BigInt| 10 Thousand | Dimension | - | HOST| String| 1 million | Dimension | - | Dime_1 | String| 1 Thousand | Dimension | - | counter_1 | Decimal | NA | Measure | - | counter_2 | Numeric(20,0) | NA | Measure | - | ... | ... | NA | Measure | - | counter_100 | Decimal | NA | Measure | +| Column Name | Data Type | Cardinality | Attribution | +|-|---|-|-| +| msisdn | String| 30 million | Dimension | +| BEGIN_TIME | BigInt| 10 Thousand | Dimension | +| HOST| String| 1 million | Dimension | +| Dime_1 | String| 1 Thousand | Dimension | +| counter_1 | Decimal | NA | Measure | +| counter_2 | Numeric(20,0) | NA | Measure | +| ... | ... | NA | Measure | +| counter_100 | Decimal | NA | Measure | - **Put the frequently-used column filter in the beginning** --- End diff -- Since we have changed the default behavior of sort_columns, I think this section can be removed. OR we can change it to `Put the fequently-used column filter in the beginning of sort_columns` ---
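Following the suggestion above, the tip would be expressed through the table's sort order rather than as general advice. A hedged sketch using the column names from the quoted example table; SORT_COLUMNS is CarbonData's table property for choosing sort order, and the ordering shown (most frequently filtered column first) is illustrative:

```sql
CREATE TABLE carbon_perf (
  msisdn STRING,
  BEGIN_TIME BIGINT,
  HOST STRING,
  Dime_1 STRING
)
STORED BY 'org.apache.carbondata.format'
TBLPROPERTIES (
  'SORT_COLUMNS'='msisdn, HOST, BEGIN_TIME, Dime_1'
)
```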
[GitHub] carbondata issue #2595: [Documentation] [Unsafe Configuration] Added carbon....
Github user xuchuanyin commented on the issue: https://github.com/apache/carbondata/pull/2595 Why does the driver need this unsafe memory? ---
[jira] [Comment Edited] (CARBONDATA-2802) Creation of Bloomfilter Datamap is failing after UID,compaction,pre-aggregate datamap creation
[ https://issues.apache.org/jira/browse/CARBONDATA-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16566207#comment-16566207 ] xuchuanyin edited comment on CARBONDATA-2802 at 8/2/18 1:32 AM: The error is not due to BloomFilter datamap. We can reproduce it this way: 1. create base table; 2. load data; 3. create index datamap 4. create preagg 5. query on preagg 6. clear datamaps for table (this will cause the problem) . In test code ,we can call ``` val carbonTable = CarbonEnv.getCarbonTable("default", "table")(sparkSession) val tableIdentifier = carbonTable.getAbsoluteTableIdentifier DatamapStoreManager.getInstance().clearDataMaps(tableIdentifier) ``` If we skip step3 or step5, the result is OK was (Author: xuchuanyin): The error is not due to BloomFilter datamap. We can reproduce it this way: 1. create base table; 2. load data; 3. create index datamap 4. create preagg 5. query on preagg 6. clear datamaps for table (this will cause the problem) . In test code ,we can call ``` val carbonTable = CarbonEnv.getCarbonTable("default", "table")(sparkSession) val tableIdentifier = carbonTable.getAbsoluteTableIdentifier DatamapStoreManager.getInstance().clearDataMaps(tableIdentifier) ``` If we skip step4 or step5, the result is OK > Creation of Bloomfilter Datamap is failing after UID,compaction,pre-aggregate > datamap creation > -- > > Key: CARBONDATA-2802 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2802 > Project: CarbonData > Issue Type: Bug > Components: other >Affects Versions: 1.4.1 > Environment: Spark 2.2 >Reporter: Rahul Singha >Priority: Minor > Labels: bloom-filter > > *Steps :* > 1.CREATE TABLE uniqdata(CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION > string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 > bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 > decimal(36,36),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 > int) STORED BY 'org.apache.carbondata.format'; > 2.LOAD DATA 
INPATH 'hdfs://hacluster/user/rahul/2000_UniqData.csv' into table > uniqdata OPTIONS('DELIMITER'=',' , > 'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1'); > 3.update uniqdata set (active_emui_version) = ('ACTIVE_EMUI_VERSION_1') > where cust_id = 9000; > 4.delete from uniqdata where cust_id = 9000; > 5.insert into uniqdata select > 9000,'CUST_NAME_0','ACTIVE_EMUI_VERSION_0','1970-01-01 > 01:00:03.0','1970-01-01 > 02:00:03.0',123372036854,-223372036854,12345678901.123400,22345678901.123400,1.12345674897976E10, > -1.12345674897976E10,1; > 6.alter table uniqdata compact 'major'; > 7.create datamap uniqdata_agg on table uniqdata using 'preaggregate' as > select cust_name, avg(cust_id) from uniqdata group by cust_id, cust_name; > 8.CREATE DATAMAP bloom_dob ON TABLE uniqdata USING 'bloomfilter' DMPROPERTIES > ('INDEX_COLUMNS' = 'dob', 'BLOOM_SIZE'='64', 'BLOOM_FPP'='0.1'); > *Actual output :* > 0: jdbc:hive2://ha-cluster/default> CREATE DATAMAP bloom_dob ON TABLE > uniqdata USING 'bloomfilter' DMPROPERTIES ('INDEX_COLUMNS' = 'dob', > 'BLOOM_SIZE'='64', 'BLOOM_FPP'='0.1'); > Error: org.apache.spark.SparkException: Job aborted due to stage failure: > Task 1 in stage 199.0 failed 4 times, most recent failure: Lost task 1.3 in > stage 199.0 (TID 484, BLR125336, executor 182): > java.io.InvalidClassException: > scala.collection.convert.Wrappers$MutableSetWrapper; no valid constructor > at > java.io.ObjectStreamClass$ExceptionInfo.newInvalidClassException(ObjectStreamClass.java:157) > at java.io.ObjectStreamClass.checkDeserialize(ObjectStreamClass.java:862) > at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2041) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571) > at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2285) > at 
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2209) > at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571) > at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431) > at java.util.ArrayList.readObject(ArrayList.java:797) > at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1158) > at
[GitHub] carbondata pull request #2576: [CARBONDATA-2795] Add documentation for S3
Github user sraghunandan commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2576#discussion_r207072493 --- Diff: docs/s3-guide.md --- @@ -0,0 +1,64 @@ + + +#S3 Guide (Alpha Feature 1.4.1) +S3 is an Object Storage API on cloud, it is recommended for storing large data files. You can use +this feature if you want to store data on Amazon cloud or Huawei cloud(OBS). +Since the data is stored on to cloud there are no restrictions on the size of data and the data can be accessed from anywhere at any time. +Carbondata can support any Object Storage that conforms to Amazon S3 API. --- End diff -- This sentence can be merged with the above sentence "You can use this feature if you want to store data " ---
[GitHub] carbondata pull request #2576: [CARBONDATA-2795] Add documentation for S3
Github user sraghunandan commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2576#discussion_r207071826 --- Diff: docs/configuration-parameters.md --- @@ -106,7 +106,10 @@ This section provides the details of all the configurations required for CarbonD |-|--|-| | carbon.sort.file.write.buffer.size | 16384 | File write buffer size used during sorting. Minimum allowed buffer size is 10240 byte and Maximum allowed buffer size is 10485760 byte. | | carbon.lock.type | LOCALLOCK | This configuration specifies the type of lock to be acquired during concurrent operations on table. There are following types of lock implementation: - LOCALLOCK: Lock is created on local file system as file. This lock is useful when only one spark driver (thrift server) runs on a machine and no other CarbonData spark application is launched concurrently. - HDFSLOCK: Lock is created on HDFS file system as file. This lock is useful when multiple CarbonData spark applications are launched and no ZooKeeper is running on cluster and HDFS supports file based locking. | -| carbon.lock.path | TABLEPATH | This configuration specifies the path where lock files have to be created. Recommended to configure zookeeper lock type or configure HDFS lock path(to this property) in case of S3 file system as locking is not feasible on S3. +| carbon.lock.path | TABLEPATH | This configuration specifies the path where lock files have to --- End diff -- Add a brief description as to why locks are used in CarbonData. What is TABLEPATH? ---
[GitHub] carbondata pull request #2576: [CARBONDATA-2795] Add documentation for S3
Github user sraghunandan commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2576#discussion_r207073807 --- Diff: docs/s3-guide.md --- @@ -0,0 +1,64 @@ + + +#S3 Guide (Alpha Feature 1.4.1) +S3 is an Object Storage API on cloud, it is recommended for storing large data files. You can use +this feature if you want to store data on Amazon cloud or Huawei cloud(OBS). +Since the data is stored on to cloud there are no restrictions on the size of data and the data can be accessed from anywhere at any time. +Carbondata can support any Object Storage that conforms to Amazon S3 API. + +#Writing to Object Storage +To store carbondata files on to Object Store location, you need to set `carbon +.storelocation` property to Object Store path in CarbonProperties file. For example, carbon +.storelocation=s3a://mybucket/carbonstore. By setting this property, all the tables will be created on the specified Object Store path. + +If your existing store is HDFS, and you want to store specific tables on S3 location, then `location` parameter has to be set during create --- End diff -- If you don't wish to change the existing store location and would like to store only specific tables on S3, it can be done by setting the 'location' parameter in the CREATE TABLE DDL command ---
[GitHub] carbondata pull request #2576: [CARBONDATA-2795] Add documentation for S3
Github user sraghunandan commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2576#discussion_r207072154 --- Diff: docs/data-management-on-carbondata.md --- @@ -730,6 +736,8 @@ Users can specify which columns to include and exclude for local dictionary gene * If the IGNORE option is used, then bad records are neither loaded nor written to the separate CSV file. * In loaded data, if all records are bad records, the BAD_RECORDS_ACTION is invalid and the load operation fails. * The maximum number of characters per column is 32000. If there are more than 32000 characters in a column, data loading will fail. + * Since Bad Records Path can be specified in both create, load and carbon properties. --- End diff -- entire sentence to be reformed. not a grammatically correct statement ---
[GitHub] carbondata pull request #2576: [CARBONDATA-2795] Add documentation for S3
Github user sraghunandan commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2576#discussion_r207071907 --- Diff: docs/configuration-parameters.md --- @@ -106,7 +106,10 @@ This section provides the details of all the configurations required for CarbonD |-|--|-| | carbon.sort.file.write.buffer.size | 16384 | File write buffer size used during sorting. Minimum allowed buffer size is 10240 byte and Maximum allowed buffer size is 10485760 byte. | | carbon.lock.type | LOCALLOCK | This configuration specifies the type of lock to be acquired during concurrent operations on table. There are following types of lock implementation: - LOCALLOCK: Lock is created on local file system as file. This lock is useful when only one spark driver (thrift server) runs on a machine and no other CarbonData spark application is launched concurrently. - HDFSLOCK: Lock is created on HDFS file system as file. This lock is useful when multiple CarbonData spark applications are launched and no ZooKeeper is running on cluster and HDFS supports file based locking. | -| carbon.lock.path | TABLEPATH | This configuration specifies the path where lock files have to be created. Recommended to configure zookeeper lock type or configure HDFS lock path(to this property) in case of S3 file system as locking is not feasible on S3. +| carbon.lock.path | TABLEPATH | This configuration specifies the path where lock files have to +be created. Recommended to configure HDFS lock path(to this property) in case of S3 file system +as locking is not feasible on S3. +**Note:** If this property is not set to HDFS location for S3 store, then there is a possibility of data corruption. --- End diff -- can add a brief sentence as to why corruption might happen ---
[GitHub] carbondata pull request #2576: [CARBONDATA-2795] Add documentation for S3
Github user sraghunandan commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2576#discussion_r207074600 --- Diff: docs/s3-guide.md --- @@ -0,0 +1,64 @@ + + +#S3 Guide (Alpha Feature 1.4.1) +S3 is an Object Storage API on cloud, it is recommended for storing large data files. You can use +this feature if you want to store data on Amazon cloud or Huawei cloud(OBS). +Since the data is stored on to cloud there are no restrictions on the size of data and the data can be accessed from anywhere at any time. +Carbondata can support any Object Storage that conforms to Amazon S3 API. + +#Writing to Object Storage +To store carbondata files on to Object Store location, you need to set `carbon +.storelocation` property to Object Store path in CarbonProperties file. For example, carbon +.storelocation=s3a://mybucket/carbonstore. By setting this property, all the tables will be created on the specified Object Store path. + +If your existing store is HDFS, and you want to store specific tables on S3 location, then `location` parameter has to be set during create +table. +For example: + +``` +CREATE TABLE IF NOT EXISTS db1.table1(col1 string, col2 int) STORED AS carbondata LOCATION 's3a://mybucket/carbonstore' +``` + +For more details on create table, Refer [data-management-on-carbondata](https://github.com/apache/carbondata/blob/master/docs/data-management-on-carbondata.md#create-table) + +#Authentication +You need to set authentication properties to store the carbondata files on to S3 location. 
For +more details on authentication properties, refer +[hadoop authentication document](https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#Authentication_properties) + +Another way of setting the authentication parameters is as follows: + +``` + SparkSession + .builder() + .master(masterURL) + .appName("S3Example") + .config("spark.driver.host", "localhost") + .config("spark.hadoop.fs.s3a.access.key", "") + .config("spark.hadoop.fs.s3a.secret.key", "") + .config("spark.hadoop.fs.s3a.endpoint", "1.1.1.1") + .getOrCreateCarbonSession() +``` + +#Recommendations +1. Object Storage like S3 does not support file leasing mechanism(supported by HDFS) that is +required to take locks which ensure consistency between concurrent operations therefore, it is +recommended to set the configurable lock path property([carbon.lock.path](https://github.com/apache/carbondata/blob/master/docs/configuration-parameters.md#miscellaneous-configuration)) + to a HDFS directory. +2. As Object Storage are eventual consistent meaning that any put request can take some time to --- End diff -- Concurrent data manipulation operations are not supported. Object stores follow eventual-consistency semantics, i.e., any put request might take some time to be reflected in a subsequent listing. As a result, reads are not guaranteed to be consistent or to return the latest data. ---
[GitHub] carbondata pull request #2590: [CARBONDATA-2750] Updated documentation on Lo...
Github user sraghunandan commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2590#discussion_r207070998 --- Diff: docs/data-management-on-carbondata.md --- @@ -126,20 +126,33 @@ This tutorial is going to introduce all commands and data operations on CarbonDa - **Local Dictionary Configuration** - Local Dictionary is generated only for no-dictionary string/varchar datatype columns. It helps in: + Local Dictionary is generated only for string/varchar datatype columns which are not included in dictionary include. It helps in: 1. Getting more compression on dimension columns with less cardinality. 2. Filter queries and full scan queries on No-dictionary columns with local dictionary will be faster as filter will be done on encoded data. 3. Reducing the store size and memory footprint as only unique values will be stored as part of local dictionary and corresponding data will be stored as encoded data. - - By default, Local Dictionary will be enabled and generated for all no-dictionary string/varchar datatype columns. + + **The cost for Local Dictionary:** The memory size will increase when local dictionary is enabled. + + **NOTE:** Following Data Types are not Supported for Local Dictionary: + * SMALLINT + * INTEGER + * BIGINT + * DOUBLE + * DECIMAL + * TIMESTAMP + * DATE + * CHAR + * BOOLEAN + + By default, Local Dictionary will be disabled. Users will be able to pass following properties in create table command: | Properties | Default value | Description | | -- | - | --- | - | LOCAL_DICTIONARY_ENABLE | false | By default, local dictionary will not be enabled for the table | - | LOCAL_DICTIONARY_THRESHOLD | 1 | The maximum cardinality for local dictionary generation (range- 1000 to 10) | - | LOCAL_DICTIONARY_INCLUDE | all no-dictionary string/varchar columns | Columns for which Local Dictionary is generated. 
| + | LOCAL_DICTIONARY_ENABLE | false | By default, local dictionary will be disabled for the table | + | LOCAL_DICTIONARY_THRESHOLD | 1 | The maximum cardinality for local dictionary generation (maximum - 10) | + | LOCAL_DICTIONARY_INCLUDE | all string/varchar columns not specified in dictionary include| Columns for which Local Dictionary is generated. | | LOCAL_DICTIONARY_EXCLUDE | none | Columns for which Local Dictionary is not generated | **NOTE:** If the cardinality exceeds the threshold, this column will not use local dictionary encoding. And in this case, the data loading performance will decrease since there is a rollback procedure for local dictionary encoding. --- End diff -- fallback? ---
[GitHub] carbondata pull request #2590: [CARBONDATA-2750] Updated documentation on Lo...
Github user sraghunandan commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2590#discussion_r207069841 --- Diff: docs/data-management-on-carbondata.md --- @@ -126,20 +126,20 @@ This tutorial is going to introduce all commands and data operations on CarbonDa - **Local Dictionary Configuration** - Local Dictionary is generated only for no-dictionary string/varchar datatype columns. It helps in: + Local Dictionary is generated only for string/varchar datatype columns which are not included in dictionary include. It helps in: --- End diff -- Add a small sentence on what local dictionary means ---
[GitHub] carbondata pull request #2590: [CARBONDATA-2750] Updated documentation on Lo...
Github user sraghunandan commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2590#discussion_r207069983 --- Diff: docs/data-management-on-carbondata.md --- @@ -126,20 +126,33 @@ This tutorial is going to introduce all commands and data operations on CarbonDa - **Local Dictionary Configuration** - Local Dictionary is generated only for no-dictionary string/varchar datatype columns. It helps in: + Local Dictionary is generated only for string/varchar datatype columns which are not included in dictionary include. It helps in: 1. Getting more compression on dimension columns with less cardinality. 2. Filter queries and full scan queries on No-dictionary columns with local dictionary will be faster as filter will be done on encoded data. --- End diff -- remove No-Dictionary ---
[GitHub] carbondata pull request #2590: [CARBONDATA-2750] Updated documentation on Lo...
Github user sraghunandan commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2590#discussion_r207071127 --- Diff: docs/data-management-on-carbondata.md --- @@ -170,6 +183,9 @@ This tutorial is going to introduce all commands and data operations on CarbonDa TBLPROPERTIES('LOCAL_DICTIONARY_ENABLE'='true','LOCAL_DICTIONARY_THRESHOLD'='1000', 'LOCAL_DICTIONARY_INCLUDE'='column1','LOCAL_DICTIONARY_EXCLUDE'='column2') ``` + --- End diff -- sentence not easy to understand. need simpler language to explain the reason ---
[GitHub] carbondata pull request #2590: [CARBONDATA-2750] Updated documentation on Lo...
Github user sraghunandan commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2590#discussion_r207070762 --- Diff: docs/data-management-on-carbondata.md --- @@ -126,20 +126,33 @@ This tutorial is going to introduce all commands and data operations on CarbonDa - **Local Dictionary Configuration** - Local Dictionary is generated only for no-dictionary string/varchar datatype columns. It helps in: + Local Dictionary is generated only for string/varchar datatype columns which are not included in dictionary include. It helps in: 1. Getting more compression on dimension columns with less cardinality. 2. Filter queries and full scan queries on No-dictionary columns with local dictionary will be faster as filter will be done on encoded data. 3. Reducing the store size and memory footprint as only unique values will be stored as part of local dictionary and corresponding data will be stored as encoded data. - - By default, Local Dictionary will be enabled and generated for all no-dictionary string/varchar datatype columns. + + **The cost for Local Dictionary:** The memory size will increase when local dictionary is enabled. + + **NOTE:** Following Data Types are not Supported for Local Dictionary: + * SMALLINT + * INTEGER + * BIGINT + * DOUBLE + * DECIMAL + * TIMESTAMP + * DATE + * CHAR + * BOOLEAN + + By default, Local Dictionary will be disabled. Users will be able to pass following properties in create table command: | Properties | Default value | Description | | -- | - | --- | - | LOCAL_DICTIONARY_ENABLE | false | By default, local dictionary will not be enabled for the table | - | LOCAL_DICTIONARY_THRESHOLD | 1 | The maximum cardinality for local dictionary generation (range- 1000 to 10) | - | LOCAL_DICTIONARY_INCLUDE | all no-dictionary string/varchar columns | Columns for which Local Dictionary is generated. 
| + | LOCAL_DICTIONARY_ENABLE | false | By default, local dictionary will be disabled for the table | + | LOCAL_DICTIONARY_THRESHOLD | 1 | The maximum cardinality for local dictionary generation (maximum - 10) | --- End diff -- description not correct. need to explain what threshold means and what happens beyond threshold ---
[GitHub] carbondata pull request #2590: [CARBONDATA-2750] Updated documentation on Lo...
Github user sraghunandan commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2590#discussion_r207071343 --- Diff: docs/data-management-on-carbondata.md --- @@ -524,6 +540,9 @@ Users can specify which columns to include and exclude for local dictionary gene ``` ALTER TABLE tablename UNSET TBLPROPERTIES('LOCAL_DICTIONARY_ENABLE','LOCAL_DICTIONARY_THRESHOLD','LOCAL_DICTIONARY_INCLUDE','LOCAL_DICTIONARY_EXCLUDE') ``` + + **NOTE:** For old tables, by default, local dictionary is disabled. If user wants local dictionary, user can enable/disable local dictionary for new data on those tables at their discretion. --- End diff -- Local dictionary is disabled for new tables also. Need to mention that it can be enabled for old tables also ---
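The reviewer's point — local dictionary is disabled by default for both new and old tables, but can be toggled per table — could be illustrated with the SET counterpart of the UNSET command quoted in the diff. This is a hedged sketch: the table and column names are hypothetical, and the threshold value is only an example:

```sql
-- Enable local dictionary on an existing table (hypothetical names);
-- the properties affect only data loaded after this ALTER.
ALTER TABLE old_table SET TBLPROPERTIES (
  'LOCAL_DICTIONARY_ENABLE'='true',
  'LOCAL_DICTIONARY_THRESHOLD'='10000',
  'LOCAL_DICTIONARY_INCLUDE'='name_column'
)

-- Disable it again; subsequent loads use plain encoding.
ALTER TABLE old_table UNSET TBLPROPERTIES ('LOCAL_DICTIONARY_ENABLE')
```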
[GitHub] carbondata pull request #2590: [CARBONDATA-2750] Updated documentation on Lo...
Github user sraghunandan commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2590#discussion_r207070525 --- Diff: docs/data-management-on-carbondata.md --- @@ -126,20 +126,33 @@ This tutorial is going to introduce all commands and data operations on CarbonDa - **Local Dictionary Configuration** - Local Dictionary is generated only for no-dictionary string/varchar datatype columns. It helps in: + Local Dictionary is generated only for string/varchar datatype columns which are not included in dictionary include. It helps in: 1. Getting more compression on dimension columns with less cardinality. 2. Filter queries and full scan queries on No-dictionary columns with local dictionary will be faster as filter will be done on encoded data. 3. Reducing the store size and memory footprint as only unique values will be stored as part of local dictionary and corresponding data will be stored as encoded data. - - By default, Local Dictionary will be enabled and generated for all no-dictionary string/varchar datatype columns. + + **The cost for Local Dictionary:** The memory size will increase when local dictionary is enabled. --- End diff -- can add a sentence as to why it will increase ---
[GitHub] carbondata pull request #2590: [CARBONDATA-2750] Updated documentation on Lo...
Github user sraghunandan commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2590#discussion_r207070939 --- Diff: docs/data-management-on-carbondata.md --- @@ -126,20 +126,33 @@ This tutorial is going to introduce all commands and data operations on CarbonDa - **Local Dictionary Configuration** - Local Dictionary is generated only for no-dictionary string/varchar datatype columns. It helps in: + Local Dictionary is generated only for string/varchar datatype columns which are not included in dictionary include. It helps in: 1. Getting more compression on dimension columns with less cardinality. 2. Filter queries and full scan queries on No-dictionary columns with local dictionary will be faster as filter will be done on encoded data. 3. Reducing the store size and memory footprint as only unique values will be stored as part of local dictionary and corresponding data will be stored as encoded data. - - By default, Local Dictionary will be enabled and generated for all no-dictionary string/varchar datatype columns. + + **The cost for Local Dictionary:** The memory size will increase when local dictionary is enabled. + + **NOTE:** Following Data Types are not Supported for Local Dictionary: + * SMALLINT + * INTEGER + * BIGINT + * DOUBLE + * DECIMAL + * TIMESTAMP + * DATE + * CHAR + * BOOLEAN + + By default, Local Dictionary will be disabled. Users will be able to pass following properties in create table command: | Properties | Default value | Description | | -- | - | --- | - | LOCAL_DICTIONARY_ENABLE | false | By default, local dictionary will not be enabled for the table | - | LOCAL_DICTIONARY_THRESHOLD | 1 | The maximum cardinality for local dictionary generation (range- 1000 to 10) | - | LOCAL_DICTIONARY_INCLUDE | all no-dictionary string/varchar columns | Columns for which Local Dictionary is generated. 
| + | LOCAL_DICTIONARY_ENABLE | false | By default, local dictionary will be disabled for the table | + | LOCAL_DICTIONARY_THRESHOLD | 1 | The maximum cardinality for local dictionary generation (maximum - 10) | + | LOCAL_DICTIONARY_INCLUDE | all string/varchar columns not specified in dictionary include| Columns for which Local Dictionary is generated. | --- End diff -- If I don't specify this property, what is the behaviour? ---
[GitHub] carbondata issue #2524: [CARBONDATA-2532][Integration] Carbon to support spa...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2524 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6107/ ---
[GitHub] carbondata issue #2589: [WIP][CARBONSTORE] Refactor CarbonStore API
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2589 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6106/ ---
[GitHub] carbondata issue #2596: [CARBONDATA-2806] Fix clean carbondata files when ta...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2596 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6105/ ---
[GitHub] carbondata issue #2579: [HOTFIX][PR 2575] Fixed modular plan creation only i...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2579 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6104/ ---
[GitHub] carbondata issue #2595: [Documentation] [Unsafe Configuration] Added carbon....
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2595 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6103/ ---
[GitHub] carbondata issue #2524: [CARBONDATA-2532][Integration] Carbon to support spa...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2524 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6440/ ---
[GitHub] carbondata issue #2577: [CARBONDATA-2796][32K]Fix data loading problem when ...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2577 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6101/ ---
[GitHub] carbondata issue #2524: [CARBONDATA-2532][Integration] Carbon to support spa...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2524 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7714/ ---
[GitHub] carbondata issue #2596: [CARBONDATA-2806] Fix clean carbondata files when ta...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2596 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6438/ ---
[GitHub] carbondata issue #2596: [CARBONDATA-2806] Fix clean carbondata files when ta...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2596 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7712/ ---
[GitHub] carbondata issue #2589: [WIP][CARBONSTORE] Refactor CarbonStore API
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2589 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6439/ ---
[GitHub] carbondata issue #2589: [WIP][CARBONSTORE] Refactor CarbonStore API
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2589 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7713/ ---
[GitHub] carbondata issue #2590: [CARBONDATA-2750] Updated documentation on Local Dic...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2590 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6100/ ---
[jira] [Commented] (CARBONDATA-2539) MV Dataset - Subqueries is not accessing the data from the MV datamap.
[ https://issues.apache.org/jira/browse/CARBONDATA-2539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16565596#comment-16565596 ] Ravindra Pesala commented on CARBONDATA-2539: - After datamap creation you should rebuild datamap before accessing it.otherwise it will be disabled. > MV Dataset - Subqueries is not accessing the data from the MV datamap. > -- > > Key: CARBONDATA-2539 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2539 > Project: CarbonData > Issue Type: Bug > Components: data-query > Environment: 3 node opensource ANT cluster. >Reporter: Prasanna Ravichandran >Assignee: Ravindra Pesala >Priority: Minor > Fix For: 1.5.0, 1.4.1 > > Attachments: data.csv > > Time Spent: 4h 20m > Remaining Estimate: 0h > > Inner subquery is not accessing the data from the MV datamap. It is accessing > the data from the main table. > Test queries - Spark shell: > scala> carbon.sql("drop table if exists origintable").show() > ++ > || > ++ > ++ > scala> carbon.sql("CREATE TABLE originTable (empno int, empname String, > designation String, doj Timestamp, workgroupcategory int, > workgroupcategoryname String, deptno int, deptname String, projectcode int, > projectjoindate Timestamp, projectenddate Timestamp,attendance int, > utilization int,salary int) STORED BY > 'org.apache.carbondata.format'").show(200,false) > ++ > || > ++ > ++ > scala> carbon.sql("LOAD DATA local inpath > 'hdfs://hacluster/user/prasanna/data.csv' INTO TABLE originTable > OPTIONS('DELIMITER'= ',', 'QUOTECHAR'= > '\"','timestampformat'='dd-MM-')").show(200,false) > ++ > || > ++ > ++ > > scala> carbon.sql("drop datamap datamap_subqry").show(200,false) > ++ > || > ++ > ++ > scala> carbon.sql("create datamap datamap_subqry using 'mv' as select > min(salary) from originTable group by empno").show(200,false) > ++ > || > ++ > ++ > scala> carbon.sql("explain SELECT max(empno) FROM originTable WHERE salary IN > (select min(salary) from originTable group by empno ) group by > 
empname").show(200,false) > ++ > |plan | >
[jira] [Commented] (CARBONDATA-2534) MV Dataset - MV creation is not working with the substring()
[ https://issues.apache.org/jira/browse/CARBONDATA-2534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16565578#comment-16565578 ] Ravindra Pesala commented on CARBONDATA-2534: - After datamap creation you should rebuild datamap before accessing it.otherwise it will be disabled. > MV Dataset - MV creation is not working with the substring() > - > > Key: CARBONDATA-2534 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2534 > Project: CarbonData > Issue Type: Bug > Components: data-query > Environment: 3 node opensource ANT cluster >Reporter: Prasanna Ravichandran >Priority: Minor > Labels: CarbonData, MV, Materialistic_Views > Fix For: 1.5.0, 1.4.1 > > Attachments: MV_substring.docx, data.csv > > Time Spent: 3h 10m > Remaining Estimate: 0h > > MV creation is not working with the sub string function. We are getting the > spark.sql.AnalysisException while trying to create a MV with the substring > and aggregate function. > *Spark -shell test queries:* > scala> carbon.sql("create datamap mv_substr using 'mv' as select > sum(salary),substring(empname,2,5),designation from originTable group by > substring(empname,2,5),designation").show(200,false) > *org.apache.spark.sql.AnalysisException: Cannot create a table having a > column whose name contains commas in Hive metastore. 
Table: > `default`.`mv_substr_table`; Column: substring_empname,_2,_5;* > *at* > org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$verifyDataSchema$2.apply(HiveExternalCatalog.scala:150) > at > org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$verifyDataSchema$2.apply(HiveExternalCatalog.scala:148) > at scala.collection.immutable.List.foreach(List.scala:381) > at > org.apache.spark.sql.hive.HiveExternalCatalog.org$apache$spark$sql$hive$HiveExternalCatalog$$verifyDataSchema(HiveExternalCatalog.scala:148) > at > org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$doCreateTable$1.apply$mcV$sp(HiveExternalCatalog.scala:222) > at > org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$doCreateTable$1.apply(HiveExternalCatalog.scala:216) > at > org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$doCreateTable$1.apply(HiveExternalCatalog.scala:216) > at > org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97) > at > org.apache.spark.sql.hive.HiveExternalCatalog.doCreateTable(HiveExternalCatalog.scala:216) > at > org.apache.spark.sql.catalyst.catalog.ExternalCatalog.createTable(ExternalCatalog.scala:110) > at > org.apache.spark.sql.catalyst.catalog.SessionCatalog.createTable(SessionCatalog.scala:316) > at > org.apache.spark.sql.execution.command.CreateDataSourceTableCommand.run(createDataSourceTables.scala:119) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:67) > at org.apache.spark.sql.Dataset.(Dataset.scala:183) > at > org.apache.spark.sql.CarbonSession$$anonfun$sql$1.apply(CarbonSession.scala:108) > at > 
org.apache.spark.sql.CarbonSession$$anonfun$sql$1.apply(CarbonSession.scala:97) > at org.apache.spark.sql.CarbonSession.withProfiler(CarbonSession.scala:155) > at org.apache.spark.sql.CarbonSession.sql(CarbonSession.scala:95) > at > org.apache.spark.sql.execution.command.table.CarbonCreateTableCommand.processMetadata(CarbonCreateTableCommand.scala:126) > at > org.apache.spark.sql.execution.command.MetadataCommand.run(package.scala:68) > at > org.apache.carbondata.mv.datamap.MVHelper$.createMVDataMap(MVHelper.scala:103) > at > org.apache.carbondata.mv.datamap.MVDataMapProvider.initMeta(MVDataMapProvider.scala:53) > at > org.apache.spark.sql.execution.command.datamap.CarbonCreateDataMapCommand.processMetadata(CarbonCreateDataMapCommand.scala:118) > at > org.apache.spark.sql.execution.command.AtomicRunnableCommand.run(package.scala:90) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:67) > at org.apache.spark.sql.Dataset.(Dataset.scala:183) > at > org.apache.spark.sql.CarbonSession$$anonfun$sql$1.apply(CarbonSession.scala:108) > at > org.apache.spark.sql.CarbonSession$$anonfun$sql$1.apply(CarbonSession.scala:97) > at
[GitHub] carbondata pull request #2596: [CARBONDATA-2806] Fix clean carbondata files ...
GitHub user ravipesala opened a pull request: https://github.com/apache/carbondata/pull/2596 [CARBONDATA-2806] Fix clean carbondata files when task has multiple carbondata files Problem: When a task has multiple blocks and blocklets, the clean files operation does not clean them properly. Solution: The SegmentFile read contains duplicate block paths, so cleaning aborts partway through. Remove the duplicate block paths. Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? - [ ] Any backward compatibility impacted? - [ ] Document update required? - [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ravipesala/incubator-carbondata flat-folder-delete-new Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2596.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2596 commit 6edadc82d488fc0e251a4c63b62e056aac3968ee Author: ravipesala Date: 2018-08-01T16:05:59Z Fix clean carbondata files when task has multiple carbondata files ---
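The fix described above boils down to visiting each physical carbondata file once, even though the segment file lists one entry per blocklet and several blocklets can share a block path. A small Python sketch of that deduplication (illustrative, not the actual CarbonUtil code; the function name and the injectable `size_of` hook are assumptions for testability):

```python
import os

def total_data_size(block_paths, size_of=os.path.getsize):
    """Sum file sizes, counting each physical block file only once.

    Blocklet entries read from the segment file can repeat the same
    block path, so deduplicate (order-preserving) before sizing.
    """
    seen = set()
    total = 0
    for path in block_paths:
        if path not in seen:
            seen.add(path)
            total += size_of(path)
    return total

# Three blocklets of one block plus one other block -> two files counted.
sizes = {"/t/part-0.carbondata": 100, "/t/part-1.carbondata": 40}
paths = ["/t/part-0.carbondata"] * 3 + ["/t/part-1.carbondata"]
total = total_data_size(paths, size_of=sizes.get)
# total == 140, not 340
```

The same dedup applies to the delete path: cleaning each block file once avoids aborting on an already-deleted duplicate path.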
[GitHub] carbondata issue #2579: [HOTFIX][PR 2575] Fixed modular plan creation only i...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2579 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6437/ ---
[GitHub] carbondata issue #2579: [HOTFIX][PR 2575] Fixed modular plan creation only i...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2579 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7711/ ---
[GitHub] carbondata issue #2552: [CARBONDATA-2781] Added fix for Null Pointer Excpeti...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2552 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6099/ ---
[GitHub] carbondata issue #2595: [Documentation] [Unsafe Configuration] Added carbon....
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2595 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6436/ ---
[GitHub] carbondata issue #2595: [Documentation] [Unsafe Configuration] Added carbon....
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2595 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7710/ ---
[GitHub] carbondata issue #2594: [CARBONDATA-2809][DataMap] Skip rebuilding for non-l...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2594 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6098/ ---
[GitHub] carbondata pull request #2579: [HOTFIX][PR 2575] Fixed modular plan creation...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2579#discussion_r206914734 --- Diff: datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVAnalyzerRule.scala --- @@ -80,26 +83,54 @@ class MVAnalyzerRule(sparkSession: SparkSession) extends Rule[LogicalPlan] { } def isValidPlan(plan: LogicalPlan, catalog: SummaryDatasetCatalog): Boolean = { -!plan.isInstanceOf[Command] && !isDataMapExists(plan, catalog.listAllSchema()) && -!plan.isInstanceOf[DeserializeToObject] +if (!plan.isInstanceOf[Command] && !plan.isInstanceOf[DeserializeToObject]) { + val catalogs = extractCatalogs(plan) + !isDataMapReplaced(catalog.listAllValidSchema(), catalogs) && + isDataMapExists(catalog.listAllValidSchema(), catalogs) +} else { + false +} + } /** * Check whether datamap table already updated in the query. * - * @param plan * @param mvs * @return */ - def isDataMapExists(plan: LogicalPlan, mvs: Array[SummaryDataset]): Boolean = { -val catalogs = plan collect { - case l: LogicalRelation => l.catalogTable -} -catalogs.isEmpty || catalogs.exists { c => + def isDataMapReplaced( + mvs: Array[SummaryDataset], + catalogs: Seq[Option[CatalogTable]]): Boolean = { +catalogs.exists { c => mvs.exists { mv => val identifier = mv.dataMapSchema.getRelationIdentifier identifier.getTableName.equals(c.get.identifier.table) && identifier.getDatabaseName.equals(c.get.database) } } } + + /** + * Check whether any suitable datamaps exists for this plan. + * + * @param mvs + * @return + */ + def isDataMapExists(mvs: Array[SummaryDataset], catalogs: Seq[Option[CatalogTable]]): Boolean = { +catalogs.exists { c => + mvs.exists { mv => +mv.dataMapSchema.getParentTables.asScala.exists { identifier => + identifier.getTableName.equals(c.get.identifier.table) && + identifier.getDatabaseName.equals(c.get.database) +} + } +} + } + + private def extractCatalogs(plan: LogicalPlan) = { --- End diff -- ok ---
[GitHub] carbondata pull request #2579: [HOTFIX][PR 2575] Fixed modular plan creation...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2579#discussion_r206914703 --- Diff: datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVAnalyzerRule.scala --- @@ -80,26 +83,54 @@ class MVAnalyzerRule(sparkSession: SparkSession) extends Rule[LogicalPlan] { } def isValidPlan(plan: LogicalPlan, catalog: SummaryDatasetCatalog): Boolean = { --- End diff -- ok ---
[GitHub] carbondata pull request #2579: [HOTFIX][PR 2575] Fixed modular plan creation...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2579#discussion_r206913481 --- Diff: datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVAnalyzerRule.scala --- @@ -80,26 +83,54 @@ class MVAnalyzerRule(sparkSession: SparkSession) extends Rule[LogicalPlan] { } def isValidPlan(plan: LogicalPlan, catalog: SummaryDatasetCatalog): Boolean = { -!plan.isInstanceOf[Command] && !isDataMapExists(plan, catalog.listAllSchema()) && -!plan.isInstanceOf[DeserializeToObject] +if (!plan.isInstanceOf[Command] && !plan.isInstanceOf[DeserializeToObject]) { + val catalogs = extractCatalogs(plan) + !isDataMapReplaced(catalog.listAllValidSchema(), catalogs) && + isDataMapExists(catalog.listAllValidSchema(), catalogs) +} else { + false +} + } /** * Check whether datamap table already updated in the query. * - * @param plan * @param mvs * @return */ - def isDataMapExists(plan: LogicalPlan, mvs: Array[SummaryDataset]): Boolean = { -val catalogs = plan collect { - case l: LogicalRelation => l.catalogTable -} -catalogs.isEmpty || catalogs.exists { c => + def isDataMapReplaced( + mvs: Array[SummaryDataset], + catalogs: Seq[Option[CatalogTable]]): Boolean = { +catalogs.exists { c => mvs.exists { mv => val identifier = mv.dataMapSchema.getRelationIdentifier identifier.getTableName.equals(c.get.identifier.table) && identifier.getDatabaseName.equals(c.get.database) } } } + + /** + * Check whether any suitable datamaps exists for this plan. --- End diff -- yes, initial match of parent table. Updated the comment ---
[GitHub] carbondata pull request #2579: [HOTFIX][PR 2575] Fixed modular plan creation...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2579#discussion_r206912038 --- Diff: datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVAnalyzerRule.scala --- @@ -80,26 +83,54 @@ class MVAnalyzerRule(sparkSession: SparkSession) extends Rule[LogicalPlan] { } def isValidPlan(plan: LogicalPlan, catalog: SummaryDatasetCatalog): Boolean = { -!plan.isInstanceOf[Command] && !isDataMapExists(plan, catalog.listAllSchema()) && -!plan.isInstanceOf[DeserializeToObject] +if (!plan.isInstanceOf[Command] && !plan.isInstanceOf[DeserializeToObject]) { + val catalogs = extractCatalogs(plan) + !isDataMapReplaced(catalog.listAllValidSchema(), catalogs) && + isDataMapExists(catalog.listAllValidSchema(), catalogs) +} else { + false +} + } /** * Check whether datamap table already updated in the query. * - * @param plan * @param mvs --- End diff -- ok ---
[GitHub] carbondata pull request #2577: [CARBONDATA-2796][32K]Fix data loading proble...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/2577 ---
[GitHub] carbondata pull request #2579: [HOTFIX][PR 2575] Fixed modular plan creation...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2579#discussion_r206905056 --- Diff: datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVAnalyzerRule.scala --- @@ -80,26 +83,54 @@ class MVAnalyzerRule(sparkSession: SparkSession) extends Rule[LogicalPlan] { } def isValidPlan(plan: LogicalPlan, catalog: SummaryDatasetCatalog): Boolean = { --- End diff -- can you add comment to this func ---
[GitHub] carbondata pull request #2579: [HOTFIX][PR 2575] Fixed modular plan creation...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2579#discussion_r206904688 --- Diff: datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVAnalyzerRule.scala --- @@ -80,26 +83,54 @@ class MVAnalyzerRule(sparkSession: SparkSession) extends Rule[LogicalPlan] { } def isValidPlan(plan: LogicalPlan, catalog: SummaryDatasetCatalog): Boolean = { -!plan.isInstanceOf[Command] && !isDataMapExists(plan, catalog.listAllSchema()) && -!plan.isInstanceOf[DeserializeToObject] +if (!plan.isInstanceOf[Command] && !plan.isInstanceOf[DeserializeToObject]) { + val catalogs = extractCatalogs(plan) + !isDataMapReplaced(catalog.listAllValidSchema(), catalogs) && + isDataMapExists(catalog.listAllValidSchema(), catalogs) +} else { + false +} + } /** * Check whether datamap table already updated in the query. * - * @param plan * @param mvs * @return */ - def isDataMapExists(plan: LogicalPlan, mvs: Array[SummaryDataset]): Boolean = { -val catalogs = plan collect { - case l: LogicalRelation => l.catalogTable -} -catalogs.isEmpty || catalogs.exists { c => + def isDataMapReplaced( + mvs: Array[SummaryDataset], + catalogs: Seq[Option[CatalogTable]]): Boolean = { +catalogs.exists { c => mvs.exists { mv => val identifier = mv.dataMapSchema.getRelationIdentifier identifier.getTableName.equals(c.get.identifier.table) && identifier.getDatabaseName.equals(c.get.database) } } } + + /** + * Check whether any suitable datamaps exists for this plan. + * + * @param mvs + * @return + */ + def isDataMapExists(mvs: Array[SummaryDataset], catalogs: Seq[Option[CatalogTable]]): Boolean = { +catalogs.exists { c => + mvs.exists { mv => +mv.dataMapSchema.getParentTables.asScala.exists { identifier => + identifier.getTableName.equals(c.get.identifier.table) && + identifier.getDatabaseName.equals(c.get.database) +} + } +} + } + + private def extractCatalogs(plan: LogicalPlan) = { --- End diff -- please add return value type ---
[GitHub] carbondata pull request #2579: [HOTFIX][PR 2575] Fixed modular plan creation...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2579#discussion_r206903988 --- Diff: datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVAnalyzerRule.scala --- @@ -80,26 +83,54 @@ class MVAnalyzerRule(sparkSession: SparkSession) extends Rule[LogicalPlan] { } def isValidPlan(plan: LogicalPlan, catalog: SummaryDatasetCatalog): Boolean = { -!plan.isInstanceOf[Command] && !isDataMapExists(plan, catalog.listAllSchema()) && -!plan.isInstanceOf[DeserializeToObject] +if (!plan.isInstanceOf[Command] && !plan.isInstanceOf[DeserializeToObject]) { + val catalogs = extractCatalogs(plan) + !isDataMapReplaced(catalog.listAllValidSchema(), catalogs) && + isDataMapExists(catalog.listAllValidSchema(), catalogs) +} else { + false +} + } /** * Check whether datamap table already updated in the query. * - * @param plan * @param mvs * @return */ - def isDataMapExists(plan: LogicalPlan, mvs: Array[SummaryDataset]): Boolean = { -val catalogs = plan collect { - case l: LogicalRelation => l.catalogTable -} -catalogs.isEmpty || catalogs.exists { c => + def isDataMapReplaced( + mvs: Array[SummaryDataset], + catalogs: Seq[Option[CatalogTable]]): Boolean = { +catalogs.exists { c => mvs.exists { mv => val identifier = mv.dataMapSchema.getRelationIdentifier identifier.getTableName.equals(c.get.identifier.table) && identifier.getDatabaseName.equals(c.get.database) } } } + + /** + * Check whether any suitable datamaps exists for this plan. --- End diff -- suitable means matched plan? ---
[GitHub] carbondata pull request #2579: [HOTFIX][PR 2575] Fixed modular plan creation...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2579#discussion_r206903577 --- Diff: datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVAnalyzerRule.scala --- @@ -80,26 +83,54 @@ class MVAnalyzerRule(sparkSession: SparkSession) extends Rule[LogicalPlan] { } def isValidPlan(plan: LogicalPlan, catalog: SummaryDatasetCatalog): Boolean = { -!plan.isInstanceOf[Command] && !isDataMapExists(plan, catalog.listAllSchema()) && -!plan.isInstanceOf[DeserializeToObject] +if (!plan.isInstanceOf[Command] && !plan.isInstanceOf[DeserializeToObject]) { + val catalogs = extractCatalogs(plan) + !isDataMapReplaced(catalog.listAllValidSchema(), catalogs) && + isDataMapExists(catalog.listAllValidSchema(), catalogs) +} else { + false +} + } /** * Check whether datamap table already updated in the query. * - * @param plan * @param mvs --- End diff -- can you provide the comment for parameter and return value ---
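The `isValidPlan` rework reviewed in this thread splits one check into two: the plan must not already read an MV's datamap table (`isDataMapReplaced`), and some MV's parent table must appear in the plan (`isDataMapExists`). A Python paraphrase of that predicate logic (the data shapes here, string plan kinds and dict-based MV schemas, are hypothetical stand-ins for Spark's `LogicalPlan` and CarbonData's `SummaryDataset`):

```python
def is_valid_plan(plan_kinds, plan_tables, mvs):
    """Sketch of the reworked MVAnalyzerRule.isValidPlan predicate.

    plan_kinds:  set of node kinds present in the logical plan
    plan_tables: set of (db, table) pairs the plan references
    mvs:         list of dicts with 'datamap_table' and 'parent_tables'
    """
    # Commands and object-deserialization plans are never rewritten.
    if "Command" in plan_kinds or "DeserializeToObject" in plan_kinds:
        return False
    # Already rewritten: the plan reads an MV's own datamap table.
    replaced = any(t == mv["datamap_table"] for t in plan_tables for mv in mvs)
    # Candidate match: some MV is built on a table this plan touches.
    exists = any(t in mv["parent_tables"] for t in plan_tables for mv in mvs)
    return not replaced and exists

mvs = [{"datamap_table": ("default", "mv1_table"),
        "parent_tables": {("default", "origintable")}}]
is_valid_plan(set(), {("default", "origintable")}, mvs)        # MV applies
is_valid_plan(set(), {("default", "mv1_table")}, mvs)          # already replaced
is_valid_plan({"Command"}, {("default", "origintable")}, mvs)  # command, skip
```

The `exists` check is the "initial match of parent table" mentioned in the replies above: a cheap filter before the expensive modular-plan matching runs.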
[GitHub] carbondata issue #2577: [CARBONDATA-2796][32K]Fix data loading problem when ...
Github user jackylk commented on the issue: https://github.com/apache/carbondata/pull/2577 LGTM ---
[GitHub] carbondata pull request #2587: [CARBONDATA-2806] Delete delete delta files u...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/2587 ---
[GitHub] carbondata pull request #2595: [Documentation] [Unsafe Configuration] Added ...
GitHub user manishgupta88 opened a pull request: https://github.com/apache/carbondata/pull/2595 [Documentation] [Unsafe Configuration] Added carbon.unsafe.driver.working.memory.in.mb parameter to differentiate between driver and executor unsafe memory Added carbon.unsafe.driver.working.memory.in.mb parameter to differentiate between driver and executor unsafe memory - [ ] Any interfaces changed? No - [ ] Any backward compatibility impacted? No - [ ] Document update required? Yes. Updated - [ ] Testing done Verified manually - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. NA You can merge this pull request into a Git repository by running: $ git pull https://github.com/manishgupta88/carbondata unsafe_driver_property Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2595.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2595 commit 8196be0a50d9d0f38452cfdf5fbe0b0485973e87 Author: manishgupta88 Date: 2018-08-01T14:08:30Z Added carbon.unsafe.driver.working.memory.in.mb parameter to differentiate between driver and executor unsafe memory ---
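The PR above splits unsafe working memory into separate driver-side and executor-side settings. A hedged carbon.properties sketch of how the two might sit together: the driver property name is taken from the PR title, the executor-side `carbon.unsafe.working.memory.in.mb` is the pre-existing property it is being separated from, and the values shown are purely illustrative, not defaults.

```properties
# Unsafe working memory (MB) used on the executor side (pre-existing property)
carbon.unsafe.working.memory.in.mb=512

# Unsafe working memory (MB) used on the driver side (introduced by this PR)
carbon.unsafe.driver.working.memory.in.mb=512
```

Keeping the two independent lets a small driver coexist with large executors instead of one shared setting sizing both.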
[GitHub] carbondata issue #2587: [CARBONDATA-2806] Delete delete delta files upon cle...
Github user jackylk commented on the issue: https://github.com/apache/carbondata/pull/2587 LGTM ---
[GitHub] carbondata pull request #2581: [CARBONDATA-2800][Doc] Add useful tips about ...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/2581 ---
[GitHub] carbondata issue #2581: [CARBONDATA-2800][Doc] Add useful tips about bloomfi...
Github user jackylk commented on the issue: https://github.com/apache/carbondata/pull/2581 LGTM ---
[GitHub] carbondata pull request #2572: [CARBONDATA-2793][32k][Doc] Add 32k support i...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/2572 ---