[GitHub] carbondata issue #2571: [CARBONDATA-2792][schema restructure] Create externa...

2018-08-01 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2571
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7721/



---


[GitHub] carbondata issue #2597: [CARBONDATA-2802][BloomDataMap] Remove clearing cach...

2018-08-01 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/2597
  
SDV Build Fail , Please check CI 
http://144.76.159.231:8080/job/ApacheSDVTests/6111/



---


[GitHub] carbondata issue #2598: [CARBONDATA-2811][BloomDataMap] Add query test case ...

2018-08-01 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2598
  
Build Success with Spark 2.2.1, Please check CI 
http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6446/



---


[GitHub] carbondata issue #2598: [CARBONDATA-2811][BloomDataMap] Add query test case ...

2018-08-01 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2598
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7720/



---


[GitHub] carbondata pull request #2583: [CARBONDATA-2803]fix wrong datasize calculati...

2018-08-01 Thread akashrn5
Github user akashrn5 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2583#discussion_r207103282
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java ---
@@ -2651,8 +2651,17 @@ public static int isFilterPresent(byte[][] 
filterValues,
   carbonIndexSize = getCarbonIndexSize(fileStore, locationMap);
   for (Map.Entry<String, List<String>> entry : indexFilesMap.entrySet()) {
 // get the size of carbondata files
+String tempBlockFilePath = null;
 for (String blockFile : entry.getValue()) {
-  carbonDataSize += FileFactory.getCarbonFile(blockFile).getSize();
+  // the indexFileMap contains all the blocklets and index file 
mapping. For example, if one
+  // block contains 3 blocklets, then entry.getValue() will list 
all the blocklets of all
+  // the block present in it. Since all the three blocklets will 
have the same block path,
+  // so just get the size of one block path for exact data size 
and avoid wrong datasize
+  // calculation.
+  if (!blockFile.equals(tempBlockFilePath)) {
--- End diff --

OK, I will check and remove this change.


---


[GitHub] carbondata issue #2598: [CARBONDATA-2811][BloomDataMap] Add query test case ...

2018-08-01 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/2598
  
SDV Build Fail , Please check CI 
http://144.76.159.231:8080/job/ApacheSDVTests/6110/



---


[GitHub] carbondata issue #2595: [Documentation] [Unsafe Configuration] Added carbon....

2018-08-01 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/2595
  
@xuchuanyin Usually in production scenarios the driver memory will be less 
than the executor memory. Now we are using unsafe memory for caching the 
block/blocklet dataMap in the driver. Currently the unsafe memory configured 
for the executor is also getting used for the driver, which is not a good idea.
Therefore it is required to separate out driver and executor unsafe memory.
You can observe the same in the Spark configuration: Spark provides 
different parameters for configuring driver and executor memory overhead to 
control the unsafe memory usage, namely 
spark.yarn.driver.memoryOverhead and spark.yarn.executor.memoryOverhead
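
For illustration, a minimal sketch of setting the two Spark overhead parameters 
mentioned above separately for driver and executor (the values are placeholders, 
and the CarbonData-specific property added by this PR is not shown since its 
exact name is not quoted here):

```scala
import org.apache.spark.sql.SparkSession

// Driver and executor off-heap overhead configured independently on YARN;
// the values are placeholder sizes in MB.
val spark = SparkSession.builder()
  .appName("unsafe-memory-overhead-example")
  .config("spark.yarn.driver.memoryOverhead", "1024")
  .config("spark.yarn.executor.memoryOverhead", "2048")
  .getOrCreate()
```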


---


[GitHub] carbondata issue #2599: [CARBONDATA-2812] Implement freeMemory for complex p...

2018-08-01 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2599
  
Build Success with Spark 2.2.1, Please check CI 
http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6444/



---


[GitHub] carbondata pull request #2593: [CARBONDATA-2753][Compatibility] Merge Index ...

2018-08-01 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/carbondata/pull/2593


---


[GitHub] carbondata issue #2598: [CARBONDATA-2811][BloomDataMap] Add query test case ...

2018-08-01 Thread kevinjmh
Github user kevinjmh commented on the issue:

https://github.com/apache/carbondata/pull/2598
  
retest this please


---


[GitHub] carbondata issue #2593: [CARBONDATA-2753][Compatibility] Merge Index file no...

2018-08-01 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/2593
  
LGTM


---


[GitHub] carbondata pull request #2583: [CARBONDATA-2803]fix wrong datasize calculati...

2018-08-01 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2583#discussion_r207097923
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java ---
@@ -2651,8 +2651,17 @@ public static int isFilterPresent(byte[][] 
filterValues,
   carbonIndexSize = getCarbonIndexSize(fileStore, locationMap);
   for (Map.Entry<String, List<String>> entry : indexFilesMap.entrySet()) {
 // get the size of carbondata files
+String tempBlockFilePath = null;
 for (String blockFile : entry.getValue()) {
-  carbonDataSize += FileFactory.getCarbonFile(blockFile).getSize();
+  // the indexFileMap contains all the blocklets and index file 
mapping. For example, if one
+  // block contains 3 blocklets, then entry.getValue() will list 
all the blocklets of all
+  // the block present in it. Since all the three blocklets will 
have the same block path,
+  // so just get the size of one block path for exact data size 
and avoid wrong datasize
+  // calculation.
+  if (!blockFile.equals(tempBlockFilePath)) {
--- End diff --

I feel this fix is not required. Please check PR 
https://github.com/apache/carbondata/pull/2596, which avoids the duplicates in 
indexFilesMap.
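
For illustration, a minimal standalone sketch of the idea under discussion, 
i.e. counting each distinct block file only once when summing the data size 
(the paths and the helper are invented and this does not reproduce the actual 
CarbonUtil code; PR #2596 instead removes the duplicates from the index files 
map itself):

```scala
// Several blocklet entries can point to the same .carbondata block file,
// so each distinct file size should be added exactly once.
val blockFiles: Seq[String] = Seq(
  "/store/t1/part-0-0.carbondata",
  "/store/t1/part-0-0.carbondata", // same block listed again for another blocklet
  "/store/t1/part-1-0.carbondata")

// Stand-in for FileFactory.getCarbonFile(path).getSize() in the real code.
def fileSize(path: String): Long = new java.io.File(path).length()

val carbonDataSize: Long = blockFiles.distinct.map(fileSize).sum
```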


---


[GitHub] carbondata issue #2599: [CARBONDATA-2812] Implement freeMemory for complex p...

2018-08-01 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2599
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7718/



---


[GitHub] carbondata issue #2598: [CARBONDATA-2811][BloomDataMap] Add query test case ...

2018-08-01 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2598
  
Build Failed  with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7719/



---


[GitHub] carbondata issue #2597: [CARBONDATA-2802][BloomDataMap] Remove clearing cach...

2018-08-01 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2597
  
Build Failed with Spark 2.2.1, Please check CI 
http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6443/



---


[GitHub] carbondata issue #2597: [CARBONDATA-2802][BloomDataMap] Remove clearing cach...

2018-08-01 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2597
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7717/



---


[GitHub] carbondata issue #2599: [CARBONDATA-2812] Implement freeMemory for complex p...

2018-08-01 Thread dhatchayani
Github user dhatchayani commented on the issue:

https://github.com/apache/carbondata/pull/2599
  
Retest sdv please


---


[GitHub] carbondata issue #2599: [CARBONDATA-2812] Implement freeMemory for complex p...

2018-08-01 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/2599
  
SDV Build Fail , Please check CI 
http://144.76.159.231:8080/job/ApacheSDVTests/6109/



---


[GitHub] carbondata issue #2597: [CARBONDATA-2802][BloomDataMap] Remove clearing cach...

2018-08-01 Thread xuchuanyin
Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/2597
  
This modification is not a final, complete fix for CARBONDATA-2802.


---


[GitHub] carbondata issue #2598: [CARBONDATA-2811][BloomDataMap] Add query test case ...

2018-08-01 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2598
  
Build Failed with Spark 2.2.1, Please check CI 
http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6445/



---


[jira] [Resolved] (CARBONDATA-2478) Add datamap-developer-guide.md file in readme

2018-08-01 Thread Liang Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liang Chen resolved CARBONDATA-2478.

   Resolution: Fixed
Fix Version/s: 1.4.1
   1.5.0

> Add datamap-developer-guide.md file in readme
> -
>
> Key: CARBONDATA-2478
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2478
> Project: CarbonData
>  Issue Type: Bug
>  Components: docs
>Affects Versions: 1.4.0
>Reporter: Vandana Yadav
>Assignee: Vandana Yadav
>Priority: Trivial
> Fix For: 1.5.0, 1.4.1
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Add datamap-developer-guide.md file in readme
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (CARBONDATA-2478) Add datamap-developer-guide.md file in readme

2018-08-01 Thread Liang Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liang Chen reassigned CARBONDATA-2478:
--

Assignee: Vandana Yadav

> Add datamap-developer-guide.md file in readme
> -
>
> Key: CARBONDATA-2478
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2478
> Project: CarbonData
>  Issue Type: Bug
>  Components: docs
>Affects Versions: 1.4.0
>Reporter: Vandana Yadav
>Assignee: Vandana Yadav
>Priority: Trivial
> Fix For: 1.5.0, 1.4.1
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Add datamap-developer-guide.md file in readme
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] carbondata issue #2579: [HOTFIX][PR 2575] Fixed modular plan creation only i...

2018-08-01 Thread jackylk
Github user jackylk commented on the issue:

https://github.com/apache/carbondata/pull/2579
  
LGTM


---


[GitHub] carbondata pull request #2305: [CARBONDATA-2478] Added datamap-developer-gui...

2018-08-01 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/carbondata/pull/2305


---


[GitHub] carbondata pull request #2599: [CARBONDATA-2812] Implement freeMemory for co...

2018-08-01 Thread dhatchayani
GitHub user dhatchayani opened a pull request:

https://github.com/apache/carbondata/pull/2599

[CARBONDATA-2812] Implement freeMemory for complex pages

**Problem:**
The memory used by the ColumnPageWrapper (for complex data types) is not 
cleared, so loading and querying require more memory.

**Solution:**
Clear the used memory in the freeMemory method.

 - [ ] Any interfaces changed?
 
 - [ ] Any backward compatibility impacted?
 
 - [ ] Document update required?

 - [x] Testing done
Manual Testing
   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dhatchayani/carbondata CARBONDATA-2812

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2599.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2599


commit 28a9876274e97e71f7d65dcfbfce23fc173b6727
Author: dhatchayani 
Date:   2018-08-02T03:00:32Z

[CARBONDATA-2812] Implement freeMemory for complex pages
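
For illustration, a rough sketch of the shape of such a fix, with invented 
names (the real ColumnPageWrapper lives in the CarbonData Java codebase and may 
differ): freeing a complex page wrapper means releasing every child page it holds.

```scala
// Invented names for illustration only.
trait PageHolder {
  def freeMemory(): Unit
}

final class ComplexPageWrapper(childPages: Seq[PageHolder]) extends PageHolder {
  // Release every child page of the complex column so that the memory is
  // returned promptly instead of accumulating across loads and queries.
  override def freeMemory(): Unit = childPages.foreach(_.freeMemory())
}
```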




---


[jira] [Resolved] (CARBONDATA-2800) Add useful tips for bloomfilter datamap

2018-08-01 Thread xuchuanyin (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xuchuanyin resolved CARBONDATA-2800.

   Resolution: Fixed
Fix Version/s: 1.4.1

> Add useful tips for bloomfilter datamap
> ---
>
> Key: CARBONDATA-2800
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2800
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: xuchuanyin
>Assignee: xuchuanyin
>Priority: Major
> Fix For: 1.4.1
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (CARBONDATA-2812) Implement freeMemory for complex pages

2018-08-01 Thread dhatchayani (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dhatchayani updated CARBONDATA-2812:

Summary: Implement freeMemory for complex pages   (was: Implement free 
memory for complex pages )

> Implement freeMemory for complex pages 
> ---
>
> Key: CARBONDATA-2812
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2812
> Project: CarbonData
>  Issue Type: Bug
>Reporter: dhatchayani
>Assignee: dhatchayani
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (CARBONDATA-2793) Add document for 32k feature

2018-08-01 Thread xuchuanyin (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xuchuanyin resolved CARBONDATA-2793.

   Resolution: Fixed
Fix Version/s: 1.4.1

> Add document for 32k feature
> 
>
> Key: CARBONDATA-2793
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2793
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: xuchuanyin
>Assignee: xuchuanyin
>Priority: Major
> Fix For: 1.4.1
>
>  Time Spent: 4h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (CARBONDATA-2800) Add useful tips for bloomfilter datamap

2018-08-01 Thread xuchuanyin (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xuchuanyin reassigned CARBONDATA-2800:
--

Assignee: xuchuanyin

> Add useful tips for bloomfilter datamap
> ---
>
> Key: CARBONDATA-2800
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2800
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: xuchuanyin
>Assignee: xuchuanyin
>Priority: Major
>  Time Spent: 3h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] carbondata pull request #2598: [CARBONDATA-2811][BloomDataMap] Add query tes...

2018-08-01 Thread kevinjmh
GitHub user kevinjmh opened a pull request:

https://github.com/apache/carbondata/pull/2598

[CARBONDATA-2811][BloomDataMap] Add query test case using search mode on 
table with bloom filter

Add query test case using search mode on table with bloom filter

Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [ ] Any interfaces changed?
 
 - [ ] Any backward compatibility impacted?
 
 - [ ] Document update required?

 - [ ] Testing done
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance 
test report.
- Any additional information to help reviewers in testing this 
change.
   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kevinjmh/carbondata bloom_searchmode

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2598.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2598


commit 2d6c9b9aeb306607305fbaf66a885e6cecc5676b
Author: Manhua 
Date:   2018-08-02T02:21:16Z

add search mode test case with bloom




---


[GitHub] carbondata pull request #2597: [CARBONDATA-2802][BloomDataMap] Remove cleari...

2018-08-01 Thread xuchuanyin
GitHub user xuchuanyin opened a pull request:

https://github.com/apache/carbondata/pull/2597

[CARBONDATA-2802][BloomDataMap] Remove clearing cache after rebuilding index 
datamap

There is no need to clear the cache after rebuilding an index datamap, for the
following reasons:
1. currently it will clear all the caches for all index datamaps, not
only the one that is being rebuilt
2. the life cycle of table data and index datamap data is the same,
so there is no need to clear it. (Once the index datamap is created or
once the main table is loaded, data of the datamap is generated too
-- in both scenarios, data of the datamap is up to date with the main
table.)

Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [x] Any interfaces changed?
 `NO`
 - [x] Any backward compatibility impacted?
 `NO`
 - [x] Document update required?
`NO`
 - [x] Testing done
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
`NO`
- How it is tested? Please attach test report.
`Tested in local machine`
- Is it a performance related change? Please attach the performance 
test report.
`NO`
- Any additional information to help reviewers in testing this 
change.
`NA`
   
 - [x] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 
`NA`



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/xuchuanyin/carbondata 
0802_remove_clear_dm_4_index_dm

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2597.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2597


commit f3c816745c8f7a829384d3241934196b1bb391ee
Author: xuchuanyin 
Date:   2018-08-02T02:45:17Z

Remove clearing cache after rebuilding index datamap

There is no need to clear the cache after rebuilding an index datamap, for the
following reasons:
1. currently it will clear all the caches for all index datamaps, not
only the one that is being rebuilt
2. the life cycle of table data and index datamap data is the same,
so there is no need to clear it. (Once the index datamap is created or
once the main table is loaded, data of the datamap is generated too
-- in both scenarios, data of the datamap is up to date with the main
table.)




---


[jira] [Created] (CARBONDATA-2812) Implement free memory for complex pages

2018-08-01 Thread dhatchayani (JIRA)
dhatchayani created CARBONDATA-2812:
---

 Summary: Implement free memory for complex pages 
 Key: CARBONDATA-2812
 URL: https://issues.apache.org/jira/browse/CARBONDATA-2812
 Project: CarbonData
  Issue Type: Bug
Reporter: dhatchayani
Assignee: dhatchayani






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (CARBONDATA-2811) Add query test case using search mode on table with bloom filter

2018-08-01 Thread jiangmanhua (JIRA)
jiangmanhua created CARBONDATA-2811:
---

 Summary: Add query test case using search mode on table with bloom 
filter
 Key: CARBONDATA-2811
 URL: https://issues.apache.org/jira/browse/CARBONDATA-2811
 Project: CarbonData
  Issue Type: Sub-task
Reporter: jiangmanhua
Assignee: jiangmanhua






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] carbondata issue #2594: [CARBONDATA-2809][DataMap] Skip rebuilding for non-l...

2018-08-01 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2594
  
Build Success with Spark 2.2.1, Please check CI 
http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6441/



---


[GitHub] carbondata issue #2594: [CARBONDATA-2809][DataMap] Skip rebuilding for non-l...

2018-08-01 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2594
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7715/



---


[GitHub] carbondata issue #2594: [CARBONDATA-2809][DataMap] Skip rebuilding for non-l...

2018-08-01 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/2594
  
SDV Build Fail , Please check CI 
http://144.76.159.231:8080/job/ApacheSDVTests/6108/



---


[jira] [Resolved] (CARBONDATA-2783) Update document of bloom filter datamap

2018-08-01 Thread jiangmanhua (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiangmanhua resolved CARBONDATA-2783.
-
   Resolution: Fixed
Fix Version/s: 1.4.1

> Update document of bloom filter datamap
> ---
>
> Key: CARBONDATA-2783
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2783
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: jiangmanhua
>Assignee: jiangmanhua
>Priority: Major
> Fix For: 1.4.1
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (CARBONDATA-2783) Update document of bloom filter datamap

2018-08-01 Thread jiangmanhua (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiangmanhua reassigned CARBONDATA-2783:
---

Assignee: jiangmanhua

> Update document of bloom filter datamap
> ---
>
> Key: CARBONDATA-2783
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2783
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: jiangmanhua
>Assignee: jiangmanhua
>Priority: Major
> Fix For: 1.4.1
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] carbondata pull request #2590: [CARBONDATA-2750] Updated documentation on Lo...

2018-08-01 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2590#discussion_r207084686
  
--- Diff: docs/data-management-on-carbondata.md ---
@@ -126,20 +126,33 @@ This tutorial is going to introduce all commands and 
data operations on CarbonDa
 
- **Local Dictionary Configuration**
  
- Local Dictionary is generated only for no-dictionary string/varchar 
datatype columns. It helps in:
+ Local Dictionary is generated only for string/varchar datatype 
columns which are not included in dictionary include. It helps in:
  1. Getting more compression on dimension columns with less 
cardinality.
  2. Filter queries and full scan queries on No-dictionary columns with 
local dictionary will be faster as filter will be done on encoded data.
  3. Reducing the store size and memory footprint as only unique values 
will be stored as part of local dictionary and corresponding data will be 
stored as encoded data.
-   
- By default, Local Dictionary will be enabled and generated for all 
no-dictionary string/varchar datatype columns.
+ 
+ **The cost for Local Dictionary:** The memory size will increase when 
local dictionary is enabled.
+ 
+ **NOTE:** Following Data Types are not Supported for Local Dictionary: 
+  * SMALLINT
+  * INTEGER
+  * BIGINT
+  * DOUBLE
+  * DECIMAL
+  * TIMESTAMP
+  * DATE
+  * CHAR
--- End diff --

Why is `CHAR` not supported? As far as I know, SparkSQL treats both varchar and 
char as string, so in CarbonData we actually see string.


---


[GitHub] carbondata pull request #2590: [CARBONDATA-2750] Updated documentation on Lo...

2018-08-01 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2590#discussion_r207085356
  
--- Diff: docs/data-management-on-carbondata.md ---
@@ -126,20 +126,33 @@ This tutorial is going to introduce all commands and 
data operations on CarbonDa
 
- **Local Dictionary Configuration**
  
- Local Dictionary is generated only for no-dictionary string/varchar 
datatype columns. It helps in:
+ Local Dictionary is generated only for string/varchar datatype 
columns which are not included in dictionary include. It helps in:
  1. Getting more compression on dimension columns with less 
cardinality.
  2. Filter queries and full scan queries on No-dictionary columns with 
local dictionary will be faster as filter will be done on encoded data.
  3. Reducing the store size and memory footprint as only unique values 
will be stored as part of local dictionary and corresponding data will be 
stored as encoded data.
-   
- By default, Local Dictionary will be enabled and generated for all 
no-dictionary string/varchar datatype columns.
+ 
+ **The cost for Local Dictionary:** The memory size will increase when 
local dictionary is enabled.
+ 
+ **NOTE:** Following Data Types are not Supported for Local Dictionary: 
+  * SMALLINT
+  * INTEGER
+  * BIGINT
+  * DOUBLE
+  * DECIMAL
+  * TIMESTAMP
+  * DATE
+  * CHAR
+  * BOOLEAN
+  
+ By default, Local Dictionary will be disabled.
   
  Users will be able to pass following properties in create table 
command: 
   
  | Properties | Default value | Description |
  | -- | - | --- |
- | LOCAL_DICTIONARY_ENABLE | false | By default, local dictionary will 
not be enabled for the table | 
- | LOCAL_DICTIONARY_THRESHOLD | 1 | The maximum cardinality for 
local dictionary generation (range- 1000 to 10) |
- | LOCAL_DICTIONARY_INCLUDE | all no-dictionary string/varchar columns 
| Columns for which Local Dictionary is generated. |
+ | LOCAL_DICTIONARY_ENABLE | false | By default, local dictionary will 
be disabled for the table |
+ | LOCAL_DICTIONARY_THRESHOLD | 1 | The maximum cardinality for 
local dictionary generation (maximum - 10) |
+ | LOCAL_DICTIONARY_INCLUDE | all string/varchar columns not specified 
in dictionary include| Columns for which Local Dictionary is generated. |
  | LOCAL_DICTIONARY_EXCLUDE | none | Columns for which Local 
Dictionary is not generated |
 
   **NOTE:**  If the cardinality exceeds the threshold, this column 
will not use local dictionary encoding. And in this case, the data loading 
performance will decrease since there is a rollback procedure for local 
dictionary encoding.
--- End diff --

For line 149 (162):
```
Encoded data and Actual data are both stored when Local Dictionary is 
enabled.
```
please change it to:
```
Encoded data with & without Local dictionary are both stored when Local 
Dictionary is enabled during data loading, so it requires more memory than 
before.
```
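
For illustration, a possible CREATE TABLE using the local-dictionary properties 
documented in the diff above (the column names and the threshold value are 
invented, and the SparkSession below is only a placeholder for a real 
CarbonSession):

```scala
import org.apache.spark.sql.SparkSession

// Placeholder session; in practice a CarbonSession would be used.
val spark = SparkSession.builder().appName("local-dictionary-example").getOrCreate()

spark.sql(
  """CREATE TABLE sales_local_dict (
    |  id INT,
    |  city STRING,
    |  product_name STRING
    |) STORED BY 'org.apache.carbondata.format'
    |TBLPROPERTIES (
    |  'LOCAL_DICTIONARY_ENABLE'='true',
    |  'LOCAL_DICTIONARY_THRESHOLD'='10000',
    |  'LOCAL_DICTIONARY_INCLUDE'='city',
    |  'LOCAL_DICTIONARY_EXCLUDE'='product_name')""".stripMargin)
```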


---


[GitHub] carbondata pull request #2590: [CARBONDATA-2750] Updated documentation on Lo...

2018-08-01 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2590#discussion_r207084952
  
--- Diff: docs/data-management-on-carbondata.md ---
@@ -126,20 +126,33 @@ This tutorial is going to introduce all commands and 
data operations on CarbonDa
 
- **Local Dictionary Configuration**
  
- Local Dictionary is generated only for no-dictionary string/varchar 
datatype columns. It helps in:
+ Local Dictionary is generated only for string/varchar datatype 
columns which are not included in dictionary include. It helps in:
  1. Getting more compression on dimension columns with less 
cardinality.
  2. Filter queries and full scan queries on No-dictionary columns with 
local dictionary will be faster as filter will be done on encoded data.
  3. Reducing the store size and memory footprint as only unique values 
will be stored as part of local dictionary and corresponding data will be 
stored as encoded data.
-   
- By default, Local Dictionary will be enabled and generated for all 
no-dictionary string/varchar datatype columns.
+ 
+ **The cost for Local Dictionary:** The memory size will increase when 
local dictionary is enabled.
+ 
+ **NOTE:** Following Data Types are not Supported for Local Dictionary: 
+  * SMALLINT
+  * INTEGER
+  * BIGINT
+  * DOUBLE
+  * DECIMAL
+  * TIMESTAMP
+  * DATE
+  * CHAR
--- End diff --

What about complex types? You didn't mention them.


---


[GitHub] carbondata pull request #2592: [WIP]Updated & enhanced Documentation of Carb...

2018-08-01 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2592#discussion_r207084144
  
--- Diff: docs/useful-tips-on-carbondata.md ---
@@ -30,16 +30,16 @@
 
   - **Table Column Description**
 
-  | Column Name | Data Type | Cardinality | Attribution |
-  |-|---|-|-|
-  | msisdn  | String| 30 million  | Dimension   |
-  | BEGIN_TIME  | BigInt| 10 Thousand | Dimension   |
-  | HOST| String| 1 million   | Dimension   |
-  | Dime_1  | String| 1 Thousand  | Dimension   |
-  | counter_1   | Decimal   | NA  | Measure |
-  | counter_2   | Numeric(20,0) | NA  | Measure |
-  | ... | ...   | NA  | Measure |
-  | counter_100 | Decimal   | NA  | Measure |
+| Column Name | Data Type | Cardinality | Attribution |
+|-|---|-|-|
+| msisdn  | String| 30 million  | Dimension   |
+| BEGIN_TIME  | BigInt| 10 Thousand | Dimension   |
+| HOST| String| 1 million   | Dimension   |
+| Dime_1  | String| 1 Thousand  | Dimension   |
+| counter_1   | Decimal   | NA  | Measure |
+| counter_2   | Numeric(20,0) | NA  | Measure |
+| ... | ...   | NA  | Measure |
+| counter_100 | Decimal   | NA  | Measure |
 
 
   - **Put the frequently-used column filter in the beginning**
--- End diff --

The following section is similar, so the same change applies to it.


---


[GitHub] carbondata pull request #2592: [WIP]Updated & enhanced Documentation of Carb...

2018-08-01 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2592#discussion_r207083747
  
--- Diff: docs/useful-tips-on-carbondata.md ---
@@ -158,18 +156,18 @@
   Recently we did some performance POC on CarbonData for Finance and 
telecommunication Field. It involved detailed queries and aggregation
   scenarios. After the completion of POC, some of the configurations 
impacting the performance have been identified and tabulated below :
 
-  | Parameter | Location | Used For  | Description | Tuning |
-  
|--|---|---|--||
-  | carbon.sort.intermediate.files.limit | 
spark/carbonlib/carbon.properties | Data loading | During the loading of data, 
local temp is used to sort the data. This number specifies the minimum number 
of intermediate files after which the  merge sort has to be initiated. | 
Increasing the parameter to a higher value will improve the load performance. 
For example, when we increase the value from 20 to 100, it increases the data 
load performance from 35MB/S to more than 50MB/S. Higher values of this 
parameter consumes  more memory during the load. |
-  | carbon.number.of.cores.while.loading | 
spark/carbonlib/carbon.properties | Data loading | Specifies the number of 
cores used for data processing during data loading in CarbonData. | If you have 
more number of CPUs, then you can increase the number of CPUs, which will 
increase the performance. For example if we increase the value from 2 to 4 then 
the CSV reading performance can increase about 1 times |
-  | carbon.compaction.level.threshold | spark/carbonlib/carbon.properties 
| Data loading and Querying | For minor compaction, specifies the number of 
segments to be merged in stage 1 and number of compacted segments to be merged 
in stage 2. | Each CarbonData load will create one segment, if every load is 
small in size it will generate many small file over a period of time impacting 
the query performance. Configuring this parameter will merge the small segment 
to one big segment which will sort the data and improve the performance. For 
Example in one telecommunication scenario, the performance improves about 2 
times after minor compaction. |
-  | spark.sql.shuffle.partitions | spark/conf/spark-defaults.conf | 
Querying | The number of task started when spark shuffle. | The value can be 1 
to 2 times as much as the executor cores. In an aggregation scenario, reducing 
the number from 200 to 32 reduced the query time from 17 to 9 seconds. |
-  | spark.executor.instances/spark.executor.cores/spark.executor.memory | 
spark/conf/spark-defaults.conf | Querying | The number of executors, CPU cores, 
and memory used for CarbonData query. | In the bank scenario, we provide the 4 
CPUs cores and 15 GB for each executor which can get good performance. This 2 
value does not mean more the better. It needs to be configured properly in case 
of limited resources. For example, In the bank scenario, it has enough CPU 32 
cores each node but less memory 64 GB each node. So we cannot give more CPU but 
less memory. For example, when 4 cores and 12GB for each executor. It sometimes 
happens GC during the query which impact the query performance very much from 
the 3 second to more than 15 seconds. In this scenario need to increase the 
memory or decrease the CPU cores. |
-  | carbon.detail.batch.size | spark/carbonlib/carbon.properties | Data 
loading | The buffer size to store records, returned from the block scan. | In 
limit scenario this parameter is very important. For example your query limit 
is 1000. But if we set this value to 3000 that means we get 3000 records from 
scan but spark will only take 1000 rows. So the 2000 remaining are useless. In 
one Finance test case after we set it to 100, in the limit 1000 scenario the 
performance increase about 2 times in comparison to if we set this value to 
12000. |
-  | carbon.use.local.dir | spark/carbonlib/carbon.properties | Data 
loading | Whether use YARN local directories for multi-table load disk 

[GitHub] carbondata pull request #2592: [WIP]Updated & enhanced Documentation of Carb...

2018-08-01 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2592#discussion_r207084014
  
--- Diff: docs/useful-tips-on-carbondata.md ---
@@ -30,16 +30,16 @@
 
   - **Table Column Description**
 
-  | Column Name | Data Type | Cardinality | Attribution |
-  |-|---|-|-|
-  | msisdn  | String| 30 million  | Dimension   |
-  | BEGIN_TIME  | BigInt| 10 Thousand | Dimension   |
-  | HOST| String| 1 million   | Dimension   |
-  | Dime_1  | String| 1 Thousand  | Dimension   |
-  | counter_1   | Decimal   | NA  | Measure |
-  | counter_2   | Numeric(20,0) | NA  | Measure |
-  | ... | ...   | NA  | Measure |
-  | counter_100 | Decimal   | NA  | Measure |
+| Column Name | Data Type | Cardinality | Attribution |
+|-|---|-|-|
+| msisdn  | String| 30 million  | Dimension   |
+| BEGIN_TIME  | BigInt| 10 Thousand | Dimension   |
+| HOST| String| 1 million   | Dimension   |
+| Dime_1  | String| 1 Thousand  | Dimension   |
+| counter_1   | Decimal   | NA  | Measure |
+| counter_2   | Numeric(20,0) | NA  | Measure |
+| ... | ...   | NA  | Measure |
+| counter_100 | Decimal   | NA  | Measure |
 
 
   - **Put the frequently-used column filter in the beginning**
--- End diff --

Since we have changed the default behavior of sort_columns, I think this 
section can be removed. Or we can change it to `Put the frequently-used column 
filter in the beginning of sort_columns`.
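
For illustration, a possible way to apply that suggestion, putting the most 
frequently filtered column first in SORT_COLUMNS (the column names follow the 
example table in the diff above, and the SparkSession is only a placeholder for 
a real CarbonSession):

```scala
import org.apache.spark.sql.SparkSession

// Placeholder session; in practice a CarbonSession would be used.
val spark = SparkSession.builder().appName("sort-columns-example").getOrCreate()

spark.sql(
  """CREATE TABLE telecom_usage (
    |  msisdn STRING,
    |  begin_time BIGINT,
    |  host STRING
    |) STORED BY 'org.apache.carbondata.format'
    |TBLPROPERTIES ('SORT_COLUMNS'='msisdn,begin_time,host')""".stripMargin)
```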


---


[GitHub] carbondata issue #2595: [Documentation] [Unsafe Configuration] Added carbon....

2018-08-01 Thread xuchuanyin
Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/2595
  
Why does the driver need this unsafe memory?


---


[jira] [Comment Edited] (CARBONDATA-2802) Creation of Bloomfilter Datamap is failing after UID,compaction,pre-aggregate datamap creation

2018-08-01 Thread xuchuanyin (JIRA)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16566207#comment-16566207
 ] 

xuchuanyin edited comment on CARBONDATA-2802 at 8/2/18 1:32 AM:


The error is not due to BloomFilter datamap. We can reproduce it this way:
 1. create base table;
 2. load data;
 3. create index datamap
 4. create preagg
 5. query on preagg
 6. clear datamaps for table (this will cause the problem) . In test code ,we 
can call 
 ```
 val carbonTable = CarbonEnv.getCarbonTable("default", "table")(sparkSession)
 val tableIdentifier = carbonTable.getAbsoluteTableIdentifier
 DatamapStoreManager.getInstance().clearDataMaps(tableIdentifier)
 ```

If we skip step3 or step5, the result is OK


was (Author: xuchuanyin):
The error is not due to BloomFilter datamap. We can reproduce it this way:
1. create base table;
2. load data;
3. create index datamap
4. create preagg
5. query on preagg
6. clear datamaps for table (this will cause the problem) . In test code ,we 
can call 
```
val carbonTable = CarbonEnv.getCarbonTable("default", "table")(sparkSession)
val tableIdentifier = carbonTable.getAbsoluteTableIdentifier
DatamapStoreManager.getInstance().clearDataMaps(tableIdentifier)
```

If we skip step4 or step5, the result is OK

> Creation of Bloomfilter Datamap is failing after UID,compaction,pre-aggregate 
> datamap creation
> --
>
> Key: CARBONDATA-2802
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2802
> Project: CarbonData
>  Issue Type: Bug
>  Components: other
>Affects Versions: 1.4.1
> Environment: Spark 2.2
>Reporter: Rahul Singha
>Priority: Minor
>  Labels: bloom-filter
>
> *Steps :*
> 1.CREATE TABLE uniqdata(CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION 
> string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 
> bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 
> decimal(36,36),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 
> int) STORED BY 'org.apache.carbondata.format';
> 2.LOAD DATA INPATH 'hdfs://hacluster/user/rahul/2000_UniqData.csv' into table 
> uniqdata OPTIONS('DELIMITER'=',' , 
> 'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1');
> 3.update uniqdata set (active_emui_version) = ('ACTIVE_EMUI_VERSION_1') 
> where cust_id = 9000;
> 4.delete from uniqdata where cust_id = 9000;
> 5.insert into uniqdata select 
> 9000,'CUST_NAME_0','ACTIVE_EMUI_VERSION_0','1970-01-01 
> 01:00:03.0','1970-01-01 
> 02:00:03.0',123372036854,-223372036854,12345678901.123400,22345678901.123400,1.12345674897976E10,
>  -1.12345674897976E10,1;
> 6.alter table uniqdata compact 'major';
> 7.create datamap uniqdata_agg on table uniqdata using 'preaggregate' as 
> select cust_name, avg(cust_id) from uniqdata group by cust_id, cust_name;
> 8.CREATE DATAMAP bloom_dob ON TABLE uniqdata USING 'bloomfilter' DMPROPERTIES 
> ('INDEX_COLUMNS' = 'dob', 'BLOOM_SIZE'='64', 'BLOOM_FPP'='0.1');
> *Actual output :*
> 0: jdbc:hive2://ha-cluster/default> CREATE DATAMAP bloom_dob ON TABLE 
> uniqdata USING 'bloomfilter' DMPROPERTIES ('INDEX_COLUMNS' = 'dob', 
> 'BLOOM_SIZE'='64', 'BLOOM_FPP'='0.1');
> Error: org.apache.spark.SparkException: Job aborted due to stage failure: 
> Task 1 in stage 199.0 failed 4 times, most recent failure: Lost task 1.3 in 
> stage 199.0 (TID 484, BLR125336, executor 182): 
> java.io.InvalidClassException: 
> scala.collection.convert.Wrappers$MutableSetWrapper; no valid constructor
>  at 
> java.io.ObjectStreamClass$ExceptionInfo.newInvalidClassException(ObjectStreamClass.java:157)
>  at java.io.ObjectStreamClass.checkDeserialize(ObjectStreamClass.java:862)
>  at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2041)
>  at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571)
>  at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2285)
>  at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2209)
>  at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067)
>  at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571)
>  at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431)
>  at java.util.ArrayList.readObject(ArrayList.java:797)
>  at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1158)
>  at 

[jira] [Comment Edited] (CARBONDATA-2802) Creation of Bloomfilter Datamap is failing after UID,compaction,pre-aggregate datamap creation

2018-08-01 Thread xuchuanyin (JIRA)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16566207#comment-16566207
 ] 

xuchuanyin edited comment on CARBONDATA-2802 at 8/2/18 1:09 AM:


The error is not due to BloomFilter datamap. We can reproduce it this way:
1. create base table;
2. load data;
3. create index datamap
4. create preagg
5. query on preagg
6. clear datamaps for table (this will cause the problem) . In test code ,we 
can call 
```
val carbonTable = CarbonEnv.getCarbonTable("default", "table")(sparkSession)
val tableIdentifier = carbonTable.getAbsoluteTableIdentifier
DatamapStoreManager.getInstance().clearDataMaps(tableIdentifier)
```

If we skip step4 or step5, the result is OK


was (Author: xuchuanyin):
The error is not due to BloomFilter datamap. We can reproduce it this way:
1. create base table;
2. load data;
3. create index datamap
4. create preagg
5. query on preagg
6. clear datamaps for table (this will cause the problem) . In test code ,we 
can call 
```
val carbonTable = CarbonEnv.getCarbonTable("default", "table")(sparkSession)
val tableIdentifier = carbonTable.getAbsoluteTableIdentifier
DatamapStoreManager.getInstance().clearDataMaps(tableIdentifier)
```

> Creation of Bloomfilter Datamap is failing after UID,compaction,pre-aggregate 
> datamap creation
> --
>
> Key: CARBONDATA-2802
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2802
> Project: CarbonData
>  Issue Type: Bug
>  Components: other
>Affects Versions: 1.4.1
> Environment: Spark 2.2
>Reporter: Rahul Singha
>Priority: Minor
>  Labels: bloom-filter
>
> *Steps :*
> 1.CREATE TABLE uniqdata(CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION 
> string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 
> bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 
> decimal(36,36),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 
> int) STORED BY 'org.apache.carbondata.format';
> 2.LOAD DATA INPATH 'hdfs://hacluster/user/rahul/2000_UniqData.csv' into table 
> uniqdata OPTIONS('DELIMITER'=',' , 
> 'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1');
> 3.update uniqdata set (active_emui_version) = ('ACTIVE_EMUI_VERSION_1') 
> where cust_id = 9000;
> 4.delete from uniqdata where cust_id = 9000;
> 5.insert into uniqdata select 
> 9000,'CUST_NAME_0','ACTIVE_EMUI_VERSION_0','1970-01-01 
> 01:00:03.0','1970-01-01 
> 02:00:03.0',123372036854,-223372036854,12345678901.123400,22345678901.123400,1.12345674897976E10,
>  -1.12345674897976E10,1;
> 6.alter table uniqdata compact 'major';
> 7.create datamap uniqdata_agg on table uniqdata using 'preaggregate' as 
> select cust_name, avg(cust_id) from uniqdata group by cust_id, cust_name;
> 8.CREATE DATAMAP bloom_dob ON TABLE uniqdata USING 'bloomfilter' DMPROPERTIES 
> ('INDEX_COLUMNS' = 'dob', 'BLOOM_SIZE'='64', 'BLOOM_FPP'='0.1');
> *Actual output :*
> 0: jdbc:hive2://ha-cluster/default> CREATE DATAMAP bloom_dob ON TABLE 
> uniqdata USING 'bloomfilter' DMPROPERTIES ('INDEX_COLUMNS' = 'dob', 
> 'BLOOM_SIZE'='64', 'BLOOM_FPP'='0.1');
> Error: org.apache.spark.SparkException: Job aborted due to stage failure: 
> Task 1 in stage 199.0 failed 4 times, most recent failure: Lost task 1.3 in 
> stage 199.0 (TID 484, BLR125336, executor 182): 
> java.io.InvalidClassException: 
> scala.collection.convert.Wrappers$MutableSetWrapper; no valid constructor
>  at 
> java.io.ObjectStreamClass$ExceptionInfo.newInvalidClassException(ObjectStreamClass.java:157)
>  at java.io.ObjectStreamClass.checkDeserialize(ObjectStreamClass.java:862)
>  at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2041)
>  at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571)
>  at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2285)
>  at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2209)
>  at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067)
>  at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571)
>  at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431)
>  at java.util.ArrayList.readObject(ArrayList.java:797)
>  at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1158)
>  at 

[jira] [Commented] (CARBONDATA-2802) Creation of Bloomfilter Datamap is failing after UID,compaction,pre-aggregate datamap creation

2018-08-01 Thread xuchuanyin (JIRA)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16566207#comment-16566207
 ] 

xuchuanyin commented on CARBONDATA-2802:


The error is not due to BloomFilter datamap. We can reproduce it this way:
1. create base table;
2. load data;
3. create index datamap
4. create preagg
5. query on preagg
6. clear datamaps for table (this will cause the problem) . In test code ,we 
can call 
```
val carbonTable = CarbonEnv.getCarbonTable("default", "table")(sparkSession)
val tableIdentifier = carbonTable.getAbsoluteTableIdentifier
DatamapStoreManager.getInstance().clearDataMaps(tableIdentifier)
```

> Creation of Bloomfilter Datamap is failing after UID,compaction,pre-aggregate 
> datamap creation
> --
>
> Key: CARBONDATA-2802
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2802
> Project: CarbonData
>  Issue Type: Bug
>  Components: other
>Affects Versions: 1.4.1
> Environment: Spark 2.2
>Reporter: Rahul Singha
>Priority: Minor
>  Labels: bloom-filter
>
> *Steps :*
> 1.CREATE TABLE uniqdata(CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION 
> string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 
> bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 
> decimal(36,36),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 
> int) STORED BY 'org.apache.carbondata.format';
> 2.LOAD DATA INPATH 'hdfs://hacluster/user/rahul/2000_UniqData.csv' into table 
> uniqdata OPTIONS('DELIMITER'=',' , 
> 'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1');
> 3.update uniqdata set (active_emui_version) = ('ACTIVE_EMUI_VERSION_1') 
> where cust_id = 9000;
> 4.delete from uniqdata where cust_id = 9000;
> 5.insert into uniqdata select 
> 9000,'CUST_NAME_0','ACTIVE_EMUI_VERSION_0','1970-01-01 
> 01:00:03.0','1970-01-01 
> 02:00:03.0',123372036854,-223372036854,12345678901.123400,22345678901.123400,1.12345674897976E10,
>  -1.12345674897976E10,1;
> 6.alter table uniqdata compact 'major';
> 7.create datamap uniqdata_agg on table uniqdata using 'preaggregate' as 
> select cust_name, avg(cust_id) from uniqdata group by cust_id, cust_name;
> 8.CREATE DATAMAP bloom_dob ON TABLE uniqdata USING 'bloomfilter' DMPROPERTIES 
> ('INDEX_COLUMNS' = 'dob', 'BLOOM_SIZE'='64', 'BLOOM_FPP'='0.1');
> *Actual output :*
> 0: jdbc:hive2://ha-cluster/default> CREATE DATAMAP bloom_dob ON TABLE 
> uniqdata USING 'bloomfilter' DMPROPERTIES ('INDEX_COLUMNS' = 'dob', 
> 'BLOOM_SIZE'='64', 'BLOOM_FPP'='0.1');
> Error: org.apache.spark.SparkException: Job aborted due to stage failure: 
> Task 1 in stage 199.0 failed 4 times, most recent failure: Lost task 1.3 in 
> stage 199.0 (TID 484, BLR125336, executor 182): 
> java.io.InvalidClassException: 
> scala.collection.convert.Wrappers$MutableSetWrapper; no valid constructor
>  at 
> java.io.ObjectStreamClass$ExceptionInfo.newInvalidClassException(ObjectStreamClass.java:157)
>  at java.io.ObjectStreamClass.checkDeserialize(ObjectStreamClass.java:862)
>  at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2041)
>  at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571)
>  at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2285)
>  at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2209)
>  at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067)
>  at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571)
>  at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431)
>  at java.util.ArrayList.readObject(ArrayList.java:797)
>  at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1158)
>  at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2176)
>  at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067)
>  at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571)
>  at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2285)
>  at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2209)
>  at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067)
>  at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571)
>  at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2285)
>  at 

[GitHub] carbondata pull request #2576: [CARBONDATA-2795] Add documentation for S3

2018-08-01 Thread sraghunandan
Github user sraghunandan commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2576#discussion_r207072493
  
--- Diff: docs/s3-guide.md ---
@@ -0,0 +1,64 @@
+
+
+#S3 Guide (Alpha Feature 1.4.1)
+S3 is an Object Storage API on cloud, it is recommended for storing large 
data files. You can use 
+this feature if you want to store data on Amazon cloud or Huawei 
cloud(OBS).
+Since the data is stored on to cloud there are no restrictions on the size 
of data and the data can be accessed from anywhere at any time.
+Carbondata can support any Object Storage that conforms to Amazon S3 API.
--- End diff --

This sentence can be merged with the above sentence "You can use this 
feature if you want to store data "


---


[GitHub] carbondata pull request #2576: [CARBONDATA-2795] Add documentation for S3

2018-08-01 Thread sraghunandan
Github user sraghunandan commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2576#discussion_r207071826
  
--- Diff: docs/configuration-parameters.md ---
@@ -106,7 +106,10 @@ This section provides the details of all the 
configurations required for CarbonD
 
|-|--|-|
 | carbon.sort.file.write.buffer.size | 16384 | File write buffer size used 
during sorting. Minimum allowed buffer size is 10240 byte and Maximum allowed 
buffer size is 10485760 byte. |
 | carbon.lock.type | LOCALLOCK | This configuration specifies the type of 
lock to be acquired during concurrent operations on table. There are following 
types of lock implementation: - LOCALLOCK: Lock is created on local file system 
as file. This lock is useful when only one spark driver (thrift server) runs on 
a machine and no other CarbonData spark application is launched concurrently. - 
HDFSLOCK: Lock is created on HDFS file system as file. This lock is useful when 
multiple CarbonData spark applications are launched and no ZooKeeper is running 
on cluster and HDFS supports file based locking. |
-| carbon.lock.path | TABLEPATH | This configuration specifies the path 
where lock files have to be created. Recommended to configure zookeeper lock 
type or configure HDFS lock path(to this property) in case of S3 file system as 
locking is not feasible on S3.
+| carbon.lock.path | TABLEPATH | This configuration specifies the path 
where lock files have to 
--- End diff --

Add a brief description as to why locks are used in CarbonData. What is 
TABLEPATH?


---


[GitHub] carbondata pull request #2576: [CARBONDATA-2795] Add documentation for S3

2018-08-01 Thread sraghunandan
Github user sraghunandan commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2576#discussion_r207073807
  
--- Diff: docs/s3-guide.md ---
@@ -0,0 +1,64 @@
+
+
+#S3 Guide (Alpha Feature 1.4.1)
+S3 is an Object Storage API on cloud, it is recommended for storing large 
data files. You can use 
+this feature if you want to store data on Amazon cloud or Huawei 
cloud(OBS).
+Since the data is stored on to cloud there are no restrictions on the size 
of data and the data can be accessed from anywhere at any time.
+Carbondata can support any Object Storage that conforms to Amazon S3 API.
+
+#Writing to Object Storage
+To store carbondata files on to Object Store location, you need to set 
`carbon
+.storelocation` property to Object Store path in CarbonProperties file. 
For example, carbon
+.storelocation=s3a://mybucket/carbonstore. By setting this property, all 
the tables will be created on the specified Object Store path.
+
+If your existing store is HDFS, and you want to store specific tables on 
S3 location, then `location` parameter has to be set during create 
--- End diff --

If you don't wish to change the existing store location and want to 
store only specific tables on S3, it can be done by setting the 'location' 
parameter in the CREATE TABLE DDL command.


---


[GitHub] carbondata pull request #2576: [CARBONDATA-2795] Add documentation for S3

2018-08-01 Thread sraghunandan
Github user sraghunandan commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2576#discussion_r207072154
  
--- Diff: docs/data-management-on-carbondata.md ---
@@ -730,6 +736,8 @@ Users can specify which columns to include and exclude 
for local dictionary gene
   * If the IGNORE option is used, then bad records are neither loaded nor 
written to the separate CSV file.
   * In loaded data, if all records are bad records, the BAD_RECORDS_ACTION 
is invalid and the load operation fails.
   * The maximum number of characters per column is 32000. If there are 
more than 32000 characters in a column, data loading will fail.
+  * Since Bad Records Path can be specified in both create, load and 
carbon properties. 
--- End diff --

The entire sentence needs to be reworded; it is not a grammatically correct statement.


---


[GitHub] carbondata pull request #2576: [CARBONDATA-2795] Add documentation for S3

2018-08-01 Thread sraghunandan
Github user sraghunandan commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2576#discussion_r207071907
  
--- Diff: docs/configuration-parameters.md ---
@@ -106,7 +106,10 @@ This section provides the details of all the 
configurations required for CarbonD
 
|-|--|-|
 | carbon.sort.file.write.buffer.size | 16384 | File write buffer size used 
during sorting. Minimum allowed buffer size is 10240 byte and Maximum allowed 
buffer size is 10485760 byte. |
 | carbon.lock.type | LOCALLOCK | This configuration specifies the type of 
lock to be acquired during concurrent operations on table. There are following 
types of lock implementation: - LOCALLOCK: Lock is created on local file system 
as file. This lock is useful when only one spark driver (thrift server) runs on 
a machine and no other CarbonData spark application is launched concurrently. - 
HDFSLOCK: Lock is created on HDFS file system as file. This lock is useful when 
multiple CarbonData spark applications are launched and no ZooKeeper is running 
on cluster and HDFS supports file based locking. |
-| carbon.lock.path | TABLEPATH | This configuration specifies the path 
where lock files have to be created. Recommended to configure zookeeper lock 
type or configure HDFS lock path(to this property) in case of S3 file system as 
locking is not feasible on S3.
+| carbon.lock.path | TABLEPATH | This configuration specifies the path 
where lock files have to 
+be created. Recommended to configure HDFS lock path(to this property) in 
case of S3 file system 
+as locking is not feasible on S3. 
+**Note:** If this property is not set to HDFS location for S3 store, then 
there is a possibility of data corruption. 
--- End diff --

can add a brief sentence as to why corruption might happen
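
As a hedged illustration (not part of the patch): since S3 has no file leasing, 
two concurrent operations could both believe they hold the lock and overwrite 
each other's changes, which is why the lock directory should live on HDFS. One 
way to set this, using the CarbonProperties API and a hypothetical HDFS path:

```
import org.apache.carbondata.core.util.CarbonProperties

// Keep the lock files on HDFS so file-based locking still works while the
// table data itself is stored on S3 (the path below is hypothetical).
CarbonProperties.getInstance()
  .addProperty("carbon.lock.path", "hdfs://hacluster/user/carbon/locks")
```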


---


[GitHub] carbondata pull request #2576: [CARBONDATA-2795] Add documentation for S3

2018-08-01 Thread sraghunandan
Github user sraghunandan commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2576#discussion_r207074600
  
--- Diff: docs/s3-guide.md ---
@@ -0,0 +1,64 @@
+
+
+#S3 Guide (Alpha Feature 1.4.1)
+S3 is an Object Storage API on cloud, it is recommended for storing large 
data files. You can use 
+this feature if you want to store data on Amazon cloud or Huawei 
cloud(OBS).
+Since the data is stored on to cloud there are no restrictions on the size 
of data and the data can be accessed from anywhere at any time.
+Carbondata can support any Object Storage that conforms to Amazon S3 API.
+
+#Writing to Object Storage
+To store carbondata files on to Object Store location, you need to set 
`carbon
+.storelocation` property to Object Store path in CarbonProperties file. 
For example, carbon
+.storelocation=s3a://mybucket/carbonstore. By setting this property, all 
the tables will be created on the specified Object Store path.
+
+If your existing store is HDFS, and you want to store specific tables on 
S3 location, then `location` parameter has to be set during create 
+table. 
+For example:
+
+```
+CREATE TABLE IF NOT EXISTS db1.table1(col1 string, col2 int) STORED AS 
carbondata LOCATION 's3a://mybucket/carbonstore'
+``` 
+
+For more details on create table, Refer 
[data-management-on-carbondata](https://github.com/apache/carbondata/blob/master/docs/data-management-on-carbondata.md#create-table)
+
+#Authentication
+You need to set authentication properties to store the carbondata files on 
to S3 location. For 
+more details on authentication properties, refer 
+[hadoop authentication 
document](https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#Authentication_properties)
+
+Another way of setting the authentication parameters is as follows:
+
+```
+ SparkSession
+ .builder()
+ .master(masterURL)
+ .appName("S3Example")
+ .config("spark.driver.host", "localhost")
+ .config("spark.hadoop.fs.s3a.access.key", "")
+ .config("spark.hadoop.fs.s3a.secret.key", "")
+ .config("spark.hadoop.fs.s3a.endpoint", "1.1.1.1")
+ .getOrCreateCarbonSession()
+```
+
+#Recommendations
+1. Object Storage like S3 does not support file leasing 
mechanism(supported by HDFS) that is 
+required to take locks which ensure consistency between concurrent 
operations therefore, it is 
+recommended to set the configurable lock path 
property([carbon.lock.path](https://github.com/apache/carbondata/blob/master/docs/configuration-parameters.md#miscellaneous-configuration))
+ to a HDFS directory.
+2. As Object Storage are eventual consistent meaning that any put request 
can take some time to 
--- End diff --

Concurrent data manipulation operations are not supported. Object stores 
follow eventual consistency semantics, i.e., any put request might take some time 
to be reflected when listing. Because of this behaviour, the data read cannot be 
guaranteed to be consistent or the latest.


---


[GitHub] carbondata pull request #2590: [CARBONDATA-2750] Updated documentation on Lo...

2018-08-01 Thread sraghunandan
Github user sraghunandan commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2590#discussion_r207070998
  
--- Diff: docs/data-management-on-carbondata.md ---
@@ -126,20 +126,33 @@ This tutorial is going to introduce all commands and 
data operations on CarbonDa
 
- **Local Dictionary Configuration**
  
- Local Dictionary is generated only for no-dictionary string/varchar 
datatype columns. It helps in:
+ Local Dictionary is generated only for string/varchar datatype 
columns which are not included in dictionary include. It helps in:
  1. Getting more compression on dimension columns with less 
cardinality.
  2. Filter queries and full scan queries on No-dictionary columns with 
local dictionary will be faster as filter will be done on encoded data.
  3. Reducing the store size and memory footprint as only unique values 
will be stored as part of local dictionary and corresponding data will be 
stored as encoded data.
-   
- By default, Local Dictionary will be enabled and generated for all 
no-dictionary string/varchar datatype columns.
+ 
+ **The cost for Local Dictionary:** The memory size will increase when 
local dictionary is enabled.
+ 
+ **NOTE:** Following Data Types are not Supported for Local Dictionary: 
+  * SMALLINT
+  * INTEGER
+  * BIGINT
+  * DOUBLE
+  * DECIMAL
+  * TIMESTAMP
+  * DATE
+  * CHAR
+  * BOOLEAN
+  
+ By default, Local Dictionary will be disabled.
   
  Users will be able to pass following properties in create table 
command: 
   
  | Properties | Default value | Description |
  | -- | - | --- |
- | LOCAL_DICTIONARY_ENABLE | false | By default, local dictionary will 
not be enabled for the table | 
- | LOCAL_DICTIONARY_THRESHOLD | 1 | The maximum cardinality for 
local dictionary generation (range- 1000 to 10) |
- | LOCAL_DICTIONARY_INCLUDE | all no-dictionary string/varchar columns 
| Columns for which Local Dictionary is generated. |
+ | LOCAL_DICTIONARY_ENABLE | false | By default, local dictionary will 
be disabled for the table |
+ | LOCAL_DICTIONARY_THRESHOLD | 1 | The maximum cardinality for 
local dictionary generation (maximum - 10) |
+ | LOCAL_DICTIONARY_INCLUDE | all string/varchar columns not specified 
in dictionary include| Columns for which Local Dictionary is generated. |
  | LOCAL_DICTIONARY_EXCLUDE | none | Columns for which Local 
Dictionary is not generated |
 
   **NOTE:**  If the cardinality exceeds the threshold, this column 
will not use local dictionary encoding. And in this case, the data loading 
performance will decrease since there is a rollback procedure for local 
dictionary encoding.
--- End diff --

fallback?


---


[GitHub] carbondata pull request #2590: [CARBONDATA-2750] Updated documentation on Lo...

2018-08-01 Thread sraghunandan
Github user sraghunandan commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2590#discussion_r207069841
  
--- Diff: docs/data-management-on-carbondata.md ---
@@ -126,20 +126,20 @@ This tutorial is going to introduce all commands and 
data operations on CarbonDa
 
- **Local Dictionary Configuration**
  
- Local Dictionary is generated only for no-dictionary string/varchar 
datatype columns. It helps in:
+ Local Dictionary is generated only for string/varchar datatype 
columns which are not included in dictionary include. It helps in:
--- End diff --

Add a small sentence on what local dictionary means


---


[GitHub] carbondata pull request #2590: [CARBONDATA-2750] Updated documentation on Lo...

2018-08-01 Thread sraghunandan
Github user sraghunandan commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2590#discussion_r207069983
  
--- Diff: docs/data-management-on-carbondata.md ---
@@ -126,20 +126,33 @@ This tutorial is going to introduce all commands and 
data operations on CarbonDa
 
- **Local Dictionary Configuration**
  
- Local Dictionary is generated only for no-dictionary string/varchar 
datatype columns. It helps in:
+ Local Dictionary is generated only for string/varchar datatype 
columns which are not included in dictionary include. It helps in:
  1. Getting more compression on dimension columns with less 
cardinality.
  2. Filter queries and full scan queries on No-dictionary columns with 
local dictionary will be faster as filter will be done on encoded data.
--- End diff --

remove No-Dictionary


---


[GitHub] carbondata pull request #2590: [CARBONDATA-2750] Updated documentation on Lo...

2018-08-01 Thread sraghunandan
Github user sraghunandan commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2590#discussion_r207071127
  
--- Diff: docs/data-management-on-carbondata.md ---
@@ -170,6 +183,9 @@ This tutorial is going to introduce all commands and 
data operations on CarbonDa
  
TBLPROPERTIES('LOCAL_DICTIONARY_ENABLE'='true','LOCAL_DICTIONARY_THRESHOLD'='1000',
  
'LOCAL_DICTIONARY_INCLUDE'='column1','LOCAL_DICTIONARY_EXCLUDE'='column2')
```
+   
--- End diff --

The sentence is not easy to understand; simpler language is needed to explain the reason.


---


[GitHub] carbondata pull request #2590: [CARBONDATA-2750] Updated documentation on Lo...

2018-08-01 Thread sraghunandan
Github user sraghunandan commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2590#discussion_r207070762
  
--- Diff: docs/data-management-on-carbondata.md ---
@@ -126,20 +126,33 @@ This tutorial is going to introduce all commands and 
data operations on CarbonDa
 
- **Local Dictionary Configuration**
  
- Local Dictionary is generated only for no-dictionary string/varchar 
datatype columns. It helps in:
+ Local Dictionary is generated only for string/varchar datatype 
columns which are not included in dictionary include. It helps in:
  1. Getting more compression on dimension columns with less 
cardinality.
  2. Filter queries and full scan queries on No-dictionary columns with 
local dictionary will be faster as filter will be done on encoded data.
  3. Reducing the store size and memory footprint as only unique values 
will be stored as part of local dictionary and corresponding data will be 
stored as encoded data.
-   
- By default, Local Dictionary will be enabled and generated for all 
no-dictionary string/varchar datatype columns.
+ 
+ **The cost for Local Dictionary:** The memory size will increase when 
local dictionary is enabled.
+ 
+ **NOTE:** Following Data Types are not Supported for Local Dictionary: 
+  * SMALLINT
+  * INTEGER
+  * BIGINT
+  * DOUBLE
+  * DECIMAL
+  * TIMESTAMP
+  * DATE
+  * CHAR
+  * BOOLEAN
+  
+ By default, Local Dictionary will be disabled.
   
  Users will be able to pass following properties in create table 
command: 
   
  | Properties | Default value | Description |
  | -- | - | --- |
- | LOCAL_DICTIONARY_ENABLE | false | By default, local dictionary will 
not be enabled for the table | 
- | LOCAL_DICTIONARY_THRESHOLD | 1 | The maximum cardinality for 
local dictionary generation (range- 1000 to 10) |
- | LOCAL_DICTIONARY_INCLUDE | all no-dictionary string/varchar columns 
| Columns for which Local Dictionary is generated. |
+ | LOCAL_DICTIONARY_ENABLE | false | By default, local dictionary will 
be disabled for the table |
+ | LOCAL_DICTIONARY_THRESHOLD | 1 | The maximum cardinality for 
local dictionary generation (maximum - 10) |
--- End diff --

The description is not correct; it needs to explain what the threshold means and what 
happens when the threshold is exceeded.


---


[GitHub] carbondata pull request #2590: [CARBONDATA-2750] Updated documentation on Lo...

2018-08-01 Thread sraghunandan
Github user sraghunandan commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2590#discussion_r207071343
  
--- Diff: docs/data-management-on-carbondata.md ---
@@ -524,6 +540,9 @@ Users can specify which columns to include and exclude 
for local dictionary gene
 ```
ALTER TABLE tablename UNSET 
TBLPROPERTIES('LOCAL_DICTIONARY_ENABLE','LOCAL_DICTIONARY_THRESHOLD','LOCAL_DICTIONARY_INCLUDE','LOCAL_DICTIONARY_EXCLUDE')
 ```
+
+   **NOTE:** For old tables, by default, local dictionary is disabled. If 
user wants local dictionary, user can enable/disable local dictionary for new 
data on those tables at their discretion. 
--- End diff --

Local dictionary is disabled for new tables as well. Need to mention that it can also be 
enabled for old tables.
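
For example, a minimal sketch of enabling it on an existing table, assuming the same 
ALTER TABLE TBLPROPERTIES syntax shown in the diff and a CarbonSession named `carbon`:

```
// Enable local dictionary for data loaded into an existing (old) table.
carbon.sql(
  "ALTER TABLE tablename SET TBLPROPERTIES('LOCAL_DICTIONARY_ENABLE'='true')")
```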


---


[GitHub] carbondata pull request #2590: [CARBONDATA-2750] Updated documentation on Lo...

2018-08-01 Thread sraghunandan
Github user sraghunandan commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2590#discussion_r207070525
  
--- Diff: docs/data-management-on-carbondata.md ---
@@ -126,20 +126,33 @@ This tutorial is going to introduce all commands and 
data operations on CarbonDa
 
- **Local Dictionary Configuration**
  
- Local Dictionary is generated only for no-dictionary string/varchar 
datatype columns. It helps in:
+ Local Dictionary is generated only for string/varchar datatype 
columns which are not included in dictionary include. It helps in:
  1. Getting more compression on dimension columns with less 
cardinality.
  2. Filter queries and full scan queries on No-dictionary columns with 
local dictionary will be faster as filter will be done on encoded data.
  3. Reducing the store size and memory footprint as only unique values 
will be stored as part of local dictionary and corresponding data will be 
stored as encoded data.
-   
- By default, Local Dictionary will be enabled and generated for all 
no-dictionary string/varchar datatype columns.
+ 
+ **The cost for Local Dictionary:** The memory size will increase when 
local dictionary is enabled.
--- End diff --

can add a sentence as to why it will increase


---


[GitHub] carbondata pull request #2590: [CARBONDATA-2750] Updated documentation on Lo...

2018-08-01 Thread sraghunandan
Github user sraghunandan commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2590#discussion_r207070939
  
--- Diff: docs/data-management-on-carbondata.md ---
@@ -126,20 +126,33 @@ This tutorial is going to introduce all commands and 
data operations on CarbonDa
 
- **Local Dictionary Configuration**
  
- Local Dictionary is generated only for no-dictionary string/varchar 
datatype columns. It helps in:
+ Local Dictionary is generated only for string/varchar datatype 
columns which are not included in dictionary include. It helps in:
  1. Getting more compression on dimension columns with less 
cardinality.
  2. Filter queries and full scan queries on No-dictionary columns with 
local dictionary will be faster as filter will be done on encoded data.
  3. Reducing the store size and memory footprint as only unique values 
will be stored as part of local dictionary and corresponding data will be 
stored as encoded data.
-   
- By default, Local Dictionary will be enabled and generated for all 
no-dictionary string/varchar datatype columns.
+ 
+ **The cost for Local Dictionary:** The memory size will increase when 
local dictionary is enabled.
+ 
+ **NOTE:** Following Data Types are not Supported for Local Dictionary: 
+  * SMALLINT
+  * INTEGER
+  * BIGINT
+  * DOUBLE
+  * DECIMAL
+  * TIMESTAMP
+  * DATE
+  * CHAR
+  * BOOLEAN
+  
+ By default, Local Dictionary will be disabled.
   
  Users will be able to pass following properties in create table 
command: 
   
  | Properties | Default value | Description |
  | -- | - | --- |
- | LOCAL_DICTIONARY_ENABLE | false | By default, local dictionary will 
not be enabled for the table | 
- | LOCAL_DICTIONARY_THRESHOLD | 1 | The maximum cardinality for 
local dictionary generation (range- 1000 to 10) |
- | LOCAL_DICTIONARY_INCLUDE | all no-dictionary string/varchar columns 
| Columns for which Local Dictionary is generated. |
+ | LOCAL_DICTIONARY_ENABLE | false | By default, local dictionary will 
be disabled for the table |
+ | LOCAL_DICTIONARY_THRESHOLD | 1 | The maximum cardinality for 
local dictionary generation (maximum - 10) |
+ | LOCAL_DICTIONARY_INCLUDE | all string/varchar columns not specified 
in dictionary include| Columns for which Local Dictionary is generated. |
--- End diff --

If I don't specify this property, what is the behaviour?


---


[GitHub] carbondata issue #2524: [CARBONDATA-2532][Integration] Carbon to support spa...

2018-08-01 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/2524
  
SDV Build Fail , Please check CI 
http://144.76.159.231:8080/job/ApacheSDVTests/6107/



---


[GitHub] carbondata issue #2589: [WIP][CARBONSTORE] Refactor CarbonStore API

2018-08-01 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/2589
  
SDV Build Fail , Please check CI 
http://144.76.159.231:8080/job/ApacheSDVTests/6106/



---


[GitHub] carbondata issue #2596: [CARBONDATA-2806] Fix clean carbondata files when ta...

2018-08-01 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/2596
  
SDV Build Fail , Please check CI 
http://144.76.159.231:8080/job/ApacheSDVTests/6105/



---


[GitHub] carbondata issue #2579: [HOTFIX][PR 2575] Fixed modular plan creation only i...

2018-08-01 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/2579
  
SDV Build Fail , Please check CI 
http://144.76.159.231:8080/job/ApacheSDVTests/6104/



---


[GitHub] carbondata issue #2595: [Documentation] [Unsafe Configuration] Added carbon....

2018-08-01 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/2595
  
SDV Build Fail , Please check CI 
http://144.76.159.231:8080/job/ApacheSDVTests/6103/



---


[GitHub] carbondata issue #2524: [CARBONDATA-2532][Integration] Carbon to support spa...

2018-08-01 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2524
  
Build Failed with Spark 2.2.1, Please check CI 
http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6440/



---


[GitHub] carbondata issue #2577: [CARBONDATA-2796][32K]Fix data loading problem when ...

2018-08-01 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/2577
  
SDV Build Fail , Please check CI 
http://144.76.159.231:8080/job/ApacheSDVTests/6101/



---


[GitHub] carbondata issue #2524: [CARBONDATA-2532][Integration] Carbon to support spa...

2018-08-01 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2524
  
Build Failed  with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7714/



---


[GitHub] carbondata issue #2596: [CARBONDATA-2806] Fix clean carbondata files when ta...

2018-08-01 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2596
  
Build Success with Spark 2.2.1, Please check CI 
http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6438/



---


[GitHub] carbondata issue #2596: [CARBONDATA-2806] Fix clean carbondata files when ta...

2018-08-01 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2596
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7712/



---


[GitHub] carbondata issue #2589: [WIP][CARBONSTORE] Refactor CarbonStore API

2018-08-01 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2589
  
Build Failed with Spark 2.2.1, Please check CI 
http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6439/



---


[GitHub] carbondata issue #2589: [WIP][CARBONSTORE] Refactor CarbonStore API

2018-08-01 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2589
  
Build Failed  with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7713/



---


[GitHub] carbondata issue #2590: [CARBONDATA-2750] Updated documentation on Local Dic...

2018-08-01 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/2590
  
SDV Build Fail , Please check CI 
http://144.76.159.231:8080/job/ApacheSDVTests/6100/



---


[jira] [Commented] (CARBONDATA-2539) MV Dataset - Subqueries is not accessing the data from the MV datamap.

2018-08-01 Thread Ravindra Pesala (JIRA)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-2539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16565596#comment-16565596
 ] 

Ravindra Pesala commented on CARBONDATA-2539:
-

After datamap creation you should rebuild the datamap before accessing it; otherwise 
it will be disabled.
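
For reference, a minimal sketch of that rebuild step against the datamap from the report 
below, assuming a CarbonSession named `carbon` and the REBUILD DATAMAP syntax of this release:

```
// Rebuild (load) the MV datamap so that it becomes enabled for query rewrite.
carbon.sql("REBUILD DATAMAP datamap_subqry").show()
```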

> MV Dataset - Subqueries is not accessing the data from the MV datamap.
> --
>
> Key: CARBONDATA-2539
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2539
> Project: CarbonData
>  Issue Type: Bug
>  Components: data-query
> Environment: 3 node opensource ANT cluster.
>Reporter: Prasanna Ravichandran
>Assignee: Ravindra Pesala
>Priority: Minor
> Fix For: 1.5.0, 1.4.1
>
> Attachments: data.csv
>
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> Inner subquery is not accessing the data from the MV datamap. It is accessing 
> the data from the main table.
> Test queries - Spark shell:
> scala> carbon.sql("drop table if exists origintable").show()
> ++
> ||
> ++
> ++
>  scala> carbon.sql("CREATE TABLE originTable (empno int, empname String, 
> designation String, doj Timestamp, workgroupcategory int, 
> workgroupcategoryname String, deptno int, deptname String, projectcode int, 
> projectjoindate Timestamp, projectenddate Timestamp,attendance int, 
> utilization int,salary int) STORED BY 
> 'org.apache.carbondata.format'").show(200,false)
> ++
> ||
> ++
> ++
> scala> carbon.sql("LOAD DATA local inpath 
> 'hdfs://hacluster/user/prasanna/data.csv' INTO TABLE originTable 
> OPTIONS('DELIMITER'= ',', 'QUOTECHAR'= 
> '\"','timestampformat'='dd-MM-')").show(200,false)
> ++
> ||
> ++
> ++
>  
> scala> carbon.sql("drop datamap datamap_subqry").show(200,false)
> ++
> ||
> ++
> ++
> scala> carbon.sql("create datamap datamap_subqry using 'mv' as select 
> min(salary) from originTable group by empno").show(200,false)
> ++
> ||
> ++
> ++
> scala> carbon.sql("explain SELECT max(empno) FROM originTable WHERE salary IN 
> (select min(salary) from originTable group by empno ) group by 
> empname").show(200,false)
> ++
> |plan |
> 

[jira] [Commented] (CARBONDATA-2534) MV Dataset - MV creation is not working with the substring()

2018-08-01 Thread Ravindra Pesala (JIRA)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-2534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16565578#comment-16565578
 ] 

Ravindra Pesala commented on CARBONDATA-2534:
-

After datamap creation you should rebuild the datamap before accessing it; otherwise 
it will be disabled.

> MV Dataset - MV creation is not working with the substring() 
> -
>
> Key: CARBONDATA-2534
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2534
> Project: CarbonData
>  Issue Type: Bug
>  Components: data-query
> Environment: 3 node opensource ANT cluster
>Reporter: Prasanna Ravichandran
>Priority: Minor
>  Labels: CarbonData, MV, Materialistic_Views
> Fix For: 1.5.0, 1.4.1
>
> Attachments: MV_substring.docx, data.csv
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> MV creation is not working with the sub string function. We are getting the 
> spark.sql.AnalysisException while trying to create a MV with the substring 
> and aggregate function. 
> *Spark -shell test queries:*
>  scala> carbon.sql("create datamap mv_substr using 'mv' as select 
> sum(salary),substring(empname,2,5),designation from originTable group by 
> substring(empname,2,5),designation").show(200,false)
> *org.apache.spark.sql.AnalysisException: Cannot create a table having a 
> column whose name contains commas in Hive metastore. Table: 
> `default`.`mv_substr_table`; Column: substring_empname,_2,_5;*
>  *at* 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$verifyDataSchema$2.apply(HiveExternalCatalog.scala:150)
>  at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$verifyDataSchema$2.apply(HiveExternalCatalog.scala:148)
>  at scala.collection.immutable.List.foreach(List.scala:381)
>  at 
> org.apache.spark.sql.hive.HiveExternalCatalog.org$apache$spark$sql$hive$HiveExternalCatalog$$verifyDataSchema(HiveExternalCatalog.scala:148)
>  at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$doCreateTable$1.apply$mcV$sp(HiveExternalCatalog.scala:222)
>  at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$doCreateTable$1.apply(HiveExternalCatalog.scala:216)
>  at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$doCreateTable$1.apply(HiveExternalCatalog.scala:216)
>  at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
>  at 
> org.apache.spark.sql.hive.HiveExternalCatalog.doCreateTable(HiveExternalCatalog.scala:216)
>  at 
> org.apache.spark.sql.catalyst.catalog.ExternalCatalog.createTable(ExternalCatalog.scala:110)
>  at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.createTable(SessionCatalog.scala:316)
>  at 
> org.apache.spark.sql.execution.command.CreateDataSourceTableCommand.run(createDataSourceTables.scala:119)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:67)
>  at org.apache.spark.sql.Dataset.(Dataset.scala:183)
>  at 
> org.apache.spark.sql.CarbonSession$$anonfun$sql$1.apply(CarbonSession.scala:108)
>  at 
> org.apache.spark.sql.CarbonSession$$anonfun$sql$1.apply(CarbonSession.scala:97)
>  at org.apache.spark.sql.CarbonSession.withProfiler(CarbonSession.scala:155)
>  at org.apache.spark.sql.CarbonSession.sql(CarbonSession.scala:95)
>  at 
> org.apache.spark.sql.execution.command.table.CarbonCreateTableCommand.processMetadata(CarbonCreateTableCommand.scala:126)
>  at 
> org.apache.spark.sql.execution.command.MetadataCommand.run(package.scala:68)
>  at 
> org.apache.carbondata.mv.datamap.MVHelper$.createMVDataMap(MVHelper.scala:103)
>  at 
> org.apache.carbondata.mv.datamap.MVDataMapProvider.initMeta(MVDataMapProvider.scala:53)
>  at 
> org.apache.spark.sql.execution.command.datamap.CarbonCreateDataMapCommand.processMetadata(CarbonCreateDataMapCommand.scala:118)
>  at 
> org.apache.spark.sql.execution.command.AtomicRunnableCommand.run(package.scala:90)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:67)
>  at org.apache.spark.sql.Dataset.(Dataset.scala:183)
>  at 
> org.apache.spark.sql.CarbonSession$$anonfun$sql$1.apply(CarbonSession.scala:108)
>  at 
> org.apache.spark.sql.CarbonSession$$anonfun$sql$1.apply(CarbonSession.scala:97)
>  at 

[GitHub] carbondata pull request #2596: [CARBONDATA-2806] Fix clean carbondata files ...

2018-08-01 Thread ravipesala
GitHub user ravipesala opened a pull request:

https://github.com/apache/carbondata/pull/2596

[CARBONDATA-2806] Fix clean carbondata files when task has multiple 
carbondata files

Problem:
When a task has multiple blocks and blocklets, clean files does not clean them 
properly.
Solution:
The SegmentFile read returns duplicate block paths, so the cleaning aborts in 
between. Remove the duplicate block paths.
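
Not the actual patch, but a minimal sketch of the idea, with hypothetical file names:

```
// The segment file lists one entry per blocklet, so the same physical
// .carbondata file can appear several times; de-duplicate before deleting.
val blockPaths: Seq[String] = Seq(
  "/store/t1/Fact/Part0/Segment_0/part-0-0.carbondata",
  "/store/t1/Fact/Part0/Segment_0/part-0-0.carbondata", // 2nd blocklet, same block
  "/store/t1/Fact/Part0/Segment_0/part-0-1.carbondata")
val uniqueBlockPaths: Seq[String] = blockPaths.distinct
uniqueBlockPaths.foreach(path => println(s"deleting $path"))
```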

Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [ ] Any interfaces changed?
 
 - [ ] Any backward compatibility impacted?
 
 - [ ] Document update required?

 - [ ] Testing done
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance 
test report.
- Any additional information to help reviewers in testing this 
change.
   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ravipesala/incubator-carbondata 
flat-folder-delete-new

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2596.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2596


commit 6edadc82d488fc0e251a4c63b62e056aac3968ee
Author: ravipesala 
Date:   2018-08-01T16:05:59Z

Fix clean carbondata files when task has multiple carbondata files




---


[GitHub] carbondata issue #2579: [HOTFIX][PR 2575] Fixed modular plan creation only i...

2018-08-01 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2579
  
Build Success with Spark 2.2.1, Please check CI 
http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6437/



---


[GitHub] carbondata issue #2579: [HOTFIX][PR 2575] Fixed modular plan creation only i...

2018-08-01 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2579
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7711/



---


[GitHub] carbondata issue #2552: [CARBONDATA-2781] Added fix for Null Pointer Excpeti...

2018-08-01 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/2552
  
SDV Build Fail , Please check CI 
http://144.76.159.231:8080/job/ApacheSDVTests/6099/



---


[GitHub] carbondata issue #2595: [Documentation] [Unsafe Configuration] Added carbon....

2018-08-01 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2595
  
Build Success with Spark 2.2.1, Please check CI 
http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6436/



---


[GitHub] carbondata issue #2595: [Documentation] [Unsafe Configuration] Added carbon....

2018-08-01 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2595
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7710/



---


[GitHub] carbondata issue #2594: [CARBONDATA-2809][DataMap] Skip rebuilding for non-l...

2018-08-01 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/2594
  
SDV Build Fail , Please check CI 
http://144.76.159.231:8080/job/ApacheSDVTests/6098/



---


[GitHub] carbondata pull request #2579: [HOTFIX][PR 2575] Fixed modular plan creation...

2018-08-01 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2579#discussion_r206914734
  
--- Diff: 
datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVAnalyzerRule.scala
 ---
@@ -80,26 +83,54 @@ class MVAnalyzerRule(sparkSession: SparkSession) 
extends Rule[LogicalPlan] {
   }
 
   def isValidPlan(plan: LogicalPlan, catalog: SummaryDatasetCatalog): 
Boolean = {
-!plan.isInstanceOf[Command] && !isDataMapExists(plan, 
catalog.listAllSchema()) &&
-!plan.isInstanceOf[DeserializeToObject]
+if (!plan.isInstanceOf[Command]  && 
!plan.isInstanceOf[DeserializeToObject]) {
+  val catalogs = extractCatalogs(plan)
+  !isDataMapReplaced(catalog.listAllValidSchema(), catalogs) &&
+  isDataMapExists(catalog.listAllValidSchema(), catalogs)
+} else {
+  false
+}
+
   }
   /**
* Check whether datamap table already updated in the query.
*
-   * @param plan
* @param mvs
* @return
*/
-  def isDataMapExists(plan: LogicalPlan, mvs: Array[SummaryDataset]): 
Boolean = {
-val catalogs = plan collect {
-  case l: LogicalRelation => l.catalogTable
-}
-catalogs.isEmpty || catalogs.exists { c =>
+  def isDataMapReplaced(
+  mvs: Array[SummaryDataset],
+  catalogs: Seq[Option[CatalogTable]]): Boolean = {
+catalogs.exists { c =>
   mvs.exists { mv =>
 val identifier = mv.dataMapSchema.getRelationIdentifier
 identifier.getTableName.equals(c.get.identifier.table) &&
 identifier.getDatabaseName.equals(c.get.database)
   }
 }
   }
+
+  /**
+   * Check whether any suitable datamaps exists for this plan.
+   *
+   * @param mvs
+   * @return
+   */
+  def isDataMapExists(mvs: Array[SummaryDataset], catalogs: 
Seq[Option[CatalogTable]]): Boolean = {
+catalogs.exists { c =>
+  mvs.exists { mv =>
+mv.dataMapSchema.getParentTables.asScala.exists { identifier =>
+  identifier.getTableName.equals(c.get.identifier.table) &&
+  identifier.getDatabaseName.equals(c.get.database)
+}
+  }
+}
+  }
+
+  private def extractCatalogs(plan: LogicalPlan) = {
--- End diff --

ok


---


[GitHub] carbondata pull request #2579: [HOTFIX][PR 2575] Fixed modular plan creation...

2018-08-01 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2579#discussion_r206914703
  
--- Diff: 
datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVAnalyzerRule.scala
 ---
@@ -80,26 +83,54 @@ class MVAnalyzerRule(sparkSession: SparkSession) 
extends Rule[LogicalPlan] {
   }
 
   def isValidPlan(plan: LogicalPlan, catalog: SummaryDatasetCatalog): 
Boolean = {
--- End diff --

ok


---


[GitHub] carbondata pull request #2579: [HOTFIX][PR 2575] Fixed modular plan creation...

2018-08-01 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2579#discussion_r206913481
  
--- Diff: 
datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVAnalyzerRule.scala
 ---
@@ -80,26 +83,54 @@ class MVAnalyzerRule(sparkSession: SparkSession) 
extends Rule[LogicalPlan] {
   }
 
   def isValidPlan(plan: LogicalPlan, catalog: SummaryDatasetCatalog): 
Boolean = {
-!plan.isInstanceOf[Command] && !isDataMapExists(plan, 
catalog.listAllSchema()) &&
-!plan.isInstanceOf[DeserializeToObject]
+if (!plan.isInstanceOf[Command]  && 
!plan.isInstanceOf[DeserializeToObject]) {
+  val catalogs = extractCatalogs(plan)
+  !isDataMapReplaced(catalog.listAllValidSchema(), catalogs) &&
+  isDataMapExists(catalog.listAllValidSchema(), catalogs)
+} else {
+  false
+}
+
   }
   /**
* Check whether datamap table already updated in the query.
*
-   * @param plan
* @param mvs
* @return
*/
-  def isDataMapExists(plan: LogicalPlan, mvs: Array[SummaryDataset]): 
Boolean = {
-val catalogs = plan collect {
-  case l: LogicalRelation => l.catalogTable
-}
-catalogs.isEmpty || catalogs.exists { c =>
+  def isDataMapReplaced(
+  mvs: Array[SummaryDataset],
+  catalogs: Seq[Option[CatalogTable]]): Boolean = {
+catalogs.exists { c =>
   mvs.exists { mv =>
 val identifier = mv.dataMapSchema.getRelationIdentifier
 identifier.getTableName.equals(c.get.identifier.table) &&
 identifier.getDatabaseName.equals(c.get.database)
   }
 }
   }
+
+  /**
+   * Check whether any suitable datamaps exists for this plan.
--- End diff --

Yes, an initial match of the parent table. Updated the comment.


---


[GitHub] carbondata pull request #2579: [HOTFIX][PR 2575] Fixed modular plan creation...

2018-08-01 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2579#discussion_r206912038
  
--- Diff: 
datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVAnalyzerRule.scala
 ---
@@ -80,26 +83,54 @@ class MVAnalyzerRule(sparkSession: SparkSession) 
extends Rule[LogicalPlan] {
   }
 
   def isValidPlan(plan: LogicalPlan, catalog: SummaryDatasetCatalog): 
Boolean = {
-!plan.isInstanceOf[Command] && !isDataMapExists(plan, 
catalog.listAllSchema()) &&
-!plan.isInstanceOf[DeserializeToObject]
+if (!plan.isInstanceOf[Command]  && 
!plan.isInstanceOf[DeserializeToObject]) {
+  val catalogs = extractCatalogs(plan)
+  !isDataMapReplaced(catalog.listAllValidSchema(), catalogs) &&
+  isDataMapExists(catalog.listAllValidSchema(), catalogs)
+} else {
+  false
+}
+
   }
   /**
* Check whether datamap table already updated in the query.
*
-   * @param plan
* @param mvs
--- End diff --

ok


---


[GitHub] carbondata pull request #2577: [CARBONDATA-2796][32K]Fix data loading proble...

2018-08-01 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/carbondata/pull/2577


---


[GitHub] carbondata pull request #2579: [HOTFIX][PR 2575] Fixed modular plan creation...

2018-08-01 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2579#discussion_r206905056
  
--- Diff: 
datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVAnalyzerRule.scala
 ---
@@ -80,26 +83,54 @@ class MVAnalyzerRule(sparkSession: SparkSession) 
extends Rule[LogicalPlan] {
   }
 
   def isValidPlan(plan: LogicalPlan, catalog: SummaryDatasetCatalog): 
Boolean = {
--- End diff --

can you add comment to this func


---


[GitHub] carbondata pull request #2579: [HOTFIX][PR 2575] Fixed modular plan creation...

2018-08-01 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2579#discussion_r206904688
  
--- Diff: 
datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVAnalyzerRule.scala
 ---
@@ -80,26 +83,54 @@ class MVAnalyzerRule(sparkSession: SparkSession) 
extends Rule[LogicalPlan] {
   }
 
   def isValidPlan(plan: LogicalPlan, catalog: SummaryDatasetCatalog): 
Boolean = {
-!plan.isInstanceOf[Command] && !isDataMapExists(plan, 
catalog.listAllSchema()) &&
-!plan.isInstanceOf[DeserializeToObject]
+if (!plan.isInstanceOf[Command]  && 
!plan.isInstanceOf[DeserializeToObject]) {
+  val catalogs = extractCatalogs(plan)
+  !isDataMapReplaced(catalog.listAllValidSchema(), catalogs) &&
+  isDataMapExists(catalog.listAllValidSchema(), catalogs)
+} else {
+  false
+}
+
   }
   /**
* Check whether datamap table already updated in the query.
*
-   * @param plan
* @param mvs
* @return
*/
-  def isDataMapExists(plan: LogicalPlan, mvs: Array[SummaryDataset]): 
Boolean = {
-val catalogs = plan collect {
-  case l: LogicalRelation => l.catalogTable
-}
-catalogs.isEmpty || catalogs.exists { c =>
+  def isDataMapReplaced(
+  mvs: Array[SummaryDataset],
+  catalogs: Seq[Option[CatalogTable]]): Boolean = {
+catalogs.exists { c =>
   mvs.exists { mv =>
 val identifier = mv.dataMapSchema.getRelationIdentifier
 identifier.getTableName.equals(c.get.identifier.table) &&
 identifier.getDatabaseName.equals(c.get.database)
   }
 }
   }
+
+  /**
+   * Check whether any suitable datamaps exists for this plan.
+   *
+   * @param mvs
+   * @return
+   */
+  def isDataMapExists(mvs: Array[SummaryDataset], catalogs: 
Seq[Option[CatalogTable]]): Boolean = {
+catalogs.exists { c =>
+  mvs.exists { mv =>
+mv.dataMapSchema.getParentTables.asScala.exists { identifier =>
+  identifier.getTableName.equals(c.get.identifier.table) &&
+  identifier.getDatabaseName.equals(c.get.database)
+}
+  }
+}
+  }
+
+  private def extractCatalogs(plan: LogicalPlan) = {
--- End diff --

please add return value type
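
For clarity, a sketch of the signature the comment asks for, with the return type spelled 
out (the body is copied from the diff; imports added only for context):

```
import org.apache.spark.sql.catalyst.catalog.CatalogTable
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.execution.datasources.LogicalRelation

// Inside MVAnalyzerRule: collect the catalog tables referenced by the plan.
private def extractCatalogs(plan: LogicalPlan): Seq[Option[CatalogTable]] = {
  plan collect {
    case l: LogicalRelation => l.catalogTable
  }
}
```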


---


[GitHub] carbondata pull request #2579: [HOTFIX][PR 2575] Fixed modular plan creation...

2018-08-01 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2579#discussion_r206903988
  
--- Diff: 
datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVAnalyzerRule.scala
 ---
@@ -80,26 +83,54 @@ class MVAnalyzerRule(sparkSession: SparkSession) 
extends Rule[LogicalPlan] {
   }
 
   def isValidPlan(plan: LogicalPlan, catalog: SummaryDatasetCatalog): 
Boolean = {
-!plan.isInstanceOf[Command] && !isDataMapExists(plan, 
catalog.listAllSchema()) &&
-!plan.isInstanceOf[DeserializeToObject]
+if (!plan.isInstanceOf[Command]  && 
!plan.isInstanceOf[DeserializeToObject]) {
+  val catalogs = extractCatalogs(plan)
+  !isDataMapReplaced(catalog.listAllValidSchema(), catalogs) &&
+  isDataMapExists(catalog.listAllValidSchema(), catalogs)
+} else {
+  false
+}
+
   }
   /**
* Check whether datamap table already updated in the query.
*
-   * @param plan
* @param mvs
* @return
*/
-  def isDataMapExists(plan: LogicalPlan, mvs: Array[SummaryDataset]): 
Boolean = {
-val catalogs = plan collect {
-  case l: LogicalRelation => l.catalogTable
-}
-catalogs.isEmpty || catalogs.exists { c =>
+  def isDataMapReplaced(
+  mvs: Array[SummaryDataset],
+  catalogs: Seq[Option[CatalogTable]]): Boolean = {
+catalogs.exists { c =>
   mvs.exists { mv =>
 val identifier = mv.dataMapSchema.getRelationIdentifier
 identifier.getTableName.equals(c.get.identifier.table) &&
 identifier.getDatabaseName.equals(c.get.database)
   }
 }
   }
+
+  /**
+   * Check whether any suitable datamaps exists for this plan.
--- End diff --

Does "suitable" mean a matched plan?


---


[GitHub] carbondata pull request #2579: [HOTFIX][PR 2575] Fixed modular plan creation...

2018-08-01 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2579#discussion_r206903577
  
--- Diff: 
datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVAnalyzerRule.scala
 ---
@@ -80,26 +83,54 @@ class MVAnalyzerRule(sparkSession: SparkSession) 
extends Rule[LogicalPlan] {
   }
 
   def isValidPlan(plan: LogicalPlan, catalog: SummaryDatasetCatalog): 
Boolean = {
-!plan.isInstanceOf[Command] && !isDataMapExists(plan, 
catalog.listAllSchema()) &&
-!plan.isInstanceOf[DeserializeToObject]
+if (!plan.isInstanceOf[Command]  && 
!plan.isInstanceOf[DeserializeToObject]) {
+  val catalogs = extractCatalogs(plan)
+  !isDataMapReplaced(catalog.listAllValidSchema(), catalogs) &&
+  isDataMapExists(catalog.listAllValidSchema(), catalogs)
+} else {
+  false
+}
+
   }
   /**
* Check whether datamap table already updated in the query.
*
-   * @param plan
* @param mvs
--- End diff --

can you provide the comment for parameter and return value
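
For example, the doc comment could read roughly as follows (a sketch, not the author's 
wording; the method body is as in the diff above):

```
/**
 * Check whether the datamap (MV) table itself is already referenced in the query,
 * i.e. the plan has already been rewritten to use the MV table.
 *
 * @param mvs      valid materialized-view datasets registered in the catalog
 * @param catalogs catalog tables referenced by the logical plan being analyzed
 * @return true if any referenced table is the target table of one of the MVs
 */
def isDataMapReplaced(
    mvs: Array[SummaryDataset],
    catalogs: Seq[Option[CatalogTable]]): Boolean = {
  catalogs.exists { c =>
    mvs.exists { mv =>
      val identifier = mv.dataMapSchema.getRelationIdentifier
      identifier.getTableName.equals(c.get.identifier.table) &&
      identifier.getDatabaseName.equals(c.get.database)
    }
  }
}
```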


---


[GitHub] carbondata issue #2577: [CARBONDATA-2796][32K]Fix data loading problem when ...

2018-08-01 Thread jackylk
Github user jackylk commented on the issue:

https://github.com/apache/carbondata/pull/2577
  
LGTM


---


[GitHub] carbondata pull request #2587: [CARBONDATA-2806] Delete delete delta files u...

2018-08-01 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/carbondata/pull/2587


---


[GitHub] carbondata pull request #2595: [Documentation] [Unsafe Configuration] Added ...

2018-08-01 Thread manishgupta88
GitHub user manishgupta88 opened a pull request:

https://github.com/apache/carbondata/pull/2595

[Documentation] [Unsafe Configuration] Added 
carbon.unsafe.driver.working.memory.in.mb parameter to differentiate between 
driver and executor unsafe memory

Added carbon.unsafe.driver.working.memory.in.mb parameter to differentiate 
between driver and executor unsafe memory
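
As a hedged illustration (the value is hypothetical, not from the patch), the new 
property could also be set programmatically via the CarbonProperties API instead of 
carbon.properties:

```
import org.apache.carbondata.core.util.CarbonProperties

// Hypothetical value: give the driver its own unsafe working-memory pool,
// independent of whatever is configured for the executors.
CarbonProperties.getInstance()
  .addProperty("carbon.unsafe.driver.working.memory.in.mb", "512")
```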

 - [ ] Any interfaces changed?
No 
 - [ ] Any backward compatibility impacted?
 No
 - [ ] Document update required?
Yes. Updated
 - [ ] Testing done
Verified manually   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 
NA


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/manishgupta88/carbondata 
unsafe_driver_property

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2595.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2595


commit 8196be0a50d9d0f38452cfdf5fbe0b0485973e87
Author: manishgupta88 
Date:   2018-08-01T14:08:30Z

Added carbon.unsafe.driver.working.memory.in.mb parameter to differentiate 
between driver and executor unsafe memory




---


[GitHub] carbondata issue #2587: [CARBONDATA-2806] Delete delete delta files upon cle...

2018-08-01 Thread jackylk
Github user jackylk commented on the issue:

https://github.com/apache/carbondata/pull/2587
  
LGTM


---


[GitHub] carbondata pull request #2581: [CARBONDATA-2800][Doc] Add useful tips about ...

2018-08-01 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/carbondata/pull/2581


---


[GitHub] carbondata issue #2581: [CARBONDATA-2800][Doc] Add useful tips about bloomfi...

2018-08-01 Thread jackylk
Github user jackylk commented on the issue:

https://github.com/apache/carbondata/pull/2581
  
LGTM


---


[GitHub] carbondata pull request #2572: [CARBONDATA-2793][32k][Doc] Add 32k support i...

2018-08-01 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/carbondata/pull/2572


---

