[GitHub] carbondata issue #1376: [WIP] Get the detailed blocklet information using de...

2017-09-20 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1376
  
Build Success with Spark 1.6, Please check CI 
http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/129/



---


[GitHub] carbondata issue #1376: [WIP] Get the detailed blocklet information using de...

2017-09-20 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1376
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/253/



---


[GitHub] carbondata issue #1376: [WIP] Get the detailed blocklet information using de...

2017-09-20 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/1376
  
SDV Build Fail , Please check CI 
http://144.76.159.231:8080/job/ApacheSDVTests/883/



---


[GitHub] carbondata issue #1376: [WIP] Get the detailed blocklet information using de...

2017-09-20 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1376
  
Build Failed  with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/252/



---


[GitHub] carbondata issue #1376: [WIP] Get the detailed blocklet information using de...

2017-09-20 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1376
  
Build Failed with Spark 1.6, Please check CI 
http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/128/



---


[jira] [Resolved] (CARBONDATA-1491) Dictionary_exclude columns are not going into no_dictionary flow

2017-09-20 Thread Ravindra Pesala (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-1491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravindra Pesala resolved CARBONDATA-1491.
-
   Resolution: Fixed
Fix Version/s: 1.2.0

> Dictionary_exclude columns are not going into no_dictionary flow
> 
>
> Key: CARBONDATA-1491
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1491
> Project: CarbonData
>  Issue Type: Bug
>Reporter: dhatchayani
>Assignee: dhatchayani
> Fix For: 1.2.0
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Dictionary_exclude columns are not going into the no_dictionary flow. This 
> is visible in the case of alter-add of a measure column with a default 
> value.
> After altering a table to add an int column and then selecting that column, 
> the issue below occurs.
> java.lang.IllegalArgumentException: Wrong length: 2, expected 4
>   at 
> org.apache.carbondata.core.util.ByteUtil.explainWrongLengthOrOffset(ByteUtil.java:581)
>   at org.apache.carbondata.core.util.ByteUtil.toInt(ByteUtil.java:500)
>   at 
> org.apache.carbondata.core.scan.executor.util.RestructureUtil.getNoDictionaryDefaultValue(RestructureUtil.java:222)
>   at 
> org.apache.carbondata.core.scan.executor.util.RestructureUtil.validateAndGetDefaultValue(RestructureUtil.java:162)
>   at 
> org.apache.carbondata.core.scan.executor.util.RestructureUtil.createDimensionInfoAndGetCurrentBlockQueryDimension(RestructureUtil.java:118)
>   at 
> org.apache.carbondata.core.scan.executor.impl.AbstractQueryExecutor.getBlockExecutionInfoForBlock(AbstractQueryExecutor.java:259)
>   at 
> org.apache.carbondata.core.scan.executor.impl.AbstractQueryExecutor.getBlockExecutionInfos(AbstractQueryExecutor.java:223)
>   at 
> org.apache.carbondata.core.scan.executor.impl.VectorDetailQueryExecutor.execute(VectorDetailQueryExecutor.java:36)
>   at 
> org.apache.carbondata.spark.vectorreader.VectorizedCarbonRecordReader.initialize(VectorizedCarbonRecordReader.java:116)
>   at 
> org.apache.carbondata.spark.rdd.CarbonScanRDD.internalCompute(CarbonScanRDD.scala:229)
>   at org.apache.carbondata.spark.rdd.CarbonRDD.compute(CarbonRDD.scala:62)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
>   at org.apache.spark.scheduler.Task.run(Task.scala:99)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:748)
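
A minimal repro sketch of the reported scenario (table, column and property 
names are illustrative; the getOrCreateCarbonSession and DEFAULT.VALUE 
usages are assumed from CarbonData's documented APIs, not from this report):

{code:scala}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.CarbonSession._

val carbon = SparkSession.builder()
  .master("local")
  .appName("carbondata-1491-repro")
  .getOrCreateCarbonSession("/tmp/carbon.store")

// A string column kept out of the dictionary via DICTIONARY_EXCLUDE.
carbon.sql(
  """CREATE TABLE t1 (name STRING) STORED BY 'carbondata'
    |TBLPROPERTIES('DICTIONARY_EXCLUDE'='name')""".stripMargin)
carbon.sql("INSERT INTO t1 SELECT 'a'")

// Alter-add an INT measure column with a default value.
carbon.sql(
  "ALTER TABLE t1 ADD COLUMNS (age INT) TBLPROPERTIES('DEFAULT.VALUE.age'='11')")

// Selecting the added column hits the exception above:
// java.lang.IllegalArgumentException: Wrong length: 2, expected 4
carbon.sql("SELECT age FROM t1").show()
{code}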



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] carbondata pull request #1374: [CARBONDATA-1491] Dictionary_exclude columns ...

2017-09-20 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/carbondata/pull/1374


---


[GitHub] carbondata issue #1374: [CARBONDATA-1491] Dictionary_exclude columns are not...

2017-09-20 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/1374
  
LGTM


---


[GitHub] carbondata issue #1359: [CARBONDATA-1480]Min Max Index Example for DataMap

2017-09-20 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/1359
  
@sounakr Please add example based on the PR 
https://github.com/apache/carbondata/pull/1376 .


---


[GitHub] carbondata pull request #1376: [WIP] Get the detailed blocklet information u...

2017-09-20 Thread ravipesala
GitHub user ravipesala opened a pull request:

https://github.com/apache/carbondata/pull/1376

[WIP] Get the detailed blocklet information using default BlockletDataMap 
from other datamaps

All the detailed blocklet information needed for executing a query is 
present only in BlockletDataMap, which is in fact the default datamap.
So when a new datamap is added, it provides only the blocklet and block 
ids, which is insufficient information to execute a query.
This PR adds the functionality of retrieving the detailed blocklet 
information from the BlockletDataMap based on the block and blocklet ids. 
New datamaps can therefore concentrate purely on their business logic and 
return only block and blocklet ids.
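
In rough pseudo-Scala (type and method names below are illustrative, not 
the actual interfaces touched by this PR), the intent is:

    // Illustrative sketch only -- names are hypothetical.
    case class BlockletId(blockId: String, blockletNo: Int)
    case class DetailedBlocklet(id: BlockletId, filePath: String, rowCount: Int)
    trait FilterExpression

    // A new datamap only returns lightweight block/blocklet ids
    // based on its own business logic (e.g. a min/max index).
    trait SimpleDataMap {
      def prune(filter: FilterExpression): Seq[BlockletId]
    }

    // The default BlockletDataMap holds the detailed blocklet metadata.
    trait DefaultBlockletDataMap {
      def getDetailedBlocklet(id: BlockletId): DetailedBlocklet
    }

    // The framework stitches the two together, so new datamaps never
    // have to materialize the detailed information themselves.
    class PruneWithDetails(custom: SimpleDataMap,
        default: DefaultBlockletDataMap) {
      def prune(filter: FilterExpression): Seq[DetailedBlocklet] =
        custom.prune(filter).map(default.getDetailedBlocklet)
    }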


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ravipesala/incubator-carbondata 
datamap-refactor

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/1376.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1376


commit 5dce79bf6ac1bb279c24b0f56734778abc4664d7
Author: Ravindra Pesala 
Date:   2017-09-21T05:40:57Z

Refactor datamap to get the detailed blocklet information from default 
BlockletDataMap




---


[GitHub] carbondata issue #1374: [CARBONDATA-1491] Dictionary_exclude columns are not...

2017-09-20 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/1374
  
SDV Build Fail , Please check CI 
http://144.76.159.231:8080/job/ApacheSDVTests/882/



---


[jira] [Created] (CARBONDATA-1502) Compaction Support: Enhance Test Cases for Struct DataType

2017-09-20 Thread Pawan Malwal (JIRA)
Pawan Malwal created CARBONDATA-1502:


 Summary: Compaction Support: Enhance Test Cases for Struct DataType
 Key: CARBONDATA-1502
 URL: https://issues.apache.org/jira/browse/CARBONDATA-1502
 Project: CarbonData
  Issue Type: Sub-task
Reporter: Pawan Malwal
Assignee: Pawan Malwal
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] carbondata issue #1374: [CARBONDATA-1491] Dictionary_exclude columns are not...

2017-09-20 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1374
  
Build Success with Spark 1.6, Please check CI 
http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/127/



---


[GitHub] carbondata issue #1374: [CARBONDATA-1491] Dictionary_exclude columns are not...

2017-09-20 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1374
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/251/



---


[GitHub] carbondata pull request #1374: [CARBONDATA-1491] Dictionary_exclude columns ...

2017-09-20 Thread dhatchayani
Github user dhatchayani commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1374#discussion_r140146423
  
--- Diff: 
integration/spark-common/src/main/scala/org/apache/spark/sql/catalyst/CarbonDDLSqlParser.scala
 ---
@@ -617,7 +617,13 @@ abstract class CarbonDDLSqlParser extends 
AbstractCarbonSparkSQLParser {
 // by default consider all String cols as dims and if any dictionary 
include isn't present then
 // add it to noDictionaryDims list. consider all dictionary 
excludes/include cols as dims
 fields.foreach { field =>
-  if (dictIncludeCols.exists(x => x.equalsIgnoreCase(field.column))) {
+  if (dictExcludeCols.exists(x => x.equalsIgnoreCase(field.column))) {
+val dataType = 
DataTypeUtil.getDataType(field.dataType.get.toUpperCase())
+if (dataType != DataType.DATE) {
--- End diff --

We already have a whitelist for dictionary_exclude that filters out date, 
so this check can be removed.


---


[GitHub] carbondata issue #1359: [CARBONDATA-1480]Min Max Index Example for DataMap

2017-09-20 Thread sounakr
Github user sounakr commented on the issue:

https://github.com/apache/carbondata/pull/1359
  
@ravipesala and @jackylk, sure, will make it simple. Will also check 
whether some more interfaces need to be opened.


---


[GitHub] carbondata issue #1352: [CARBONDATA-1174] Streaming Ingestion - schema valid...

2017-09-20 Thread aniketadnaik
Github user aniketadnaik commented on the issue:

https://github.com/apache/carbondata/pull/1352
  
* Please merge this PR (Branch: **StreamIngest-1174**) to the 
"**streaming_ingest**" branch.

* Following is the build and test report:
$>mvn -Pspark-2.1 -Dspark.version=2.1.0 clean verify
.

[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO] 
[INFO] Apache CarbonData :: Parent ................ SUCCESS [  1.829 s]
[INFO] Apache CarbonData :: Common ................ SUCCESS [  4.067 s]
[INFO] Apache CarbonData :: Core .................. SUCCESS [ 46.276 s]
[INFO] Apache CarbonData :: Processing ............ SUCCESS [ 13.728 s]
[INFO] Apache CarbonData :: Hadoop ................ SUCCESS [ 12.789 s]
[INFO] Apache CarbonData :: Spark Common .......... SUCCESS [ 29.121 s]
[INFO] Apache CarbonData :: Spark2 ................ SUCCESS [03:02 min]
[INFO] Apache CarbonData :: Spark Common Test ..... SUCCESS [08:54 min]
[INFO] Apache CarbonData :: Assembly .............. SUCCESS [  1.730 s]
[INFO] Apache CarbonData :: Hive .................. SUCCESS [ 13.845 s]
[INFO] Apache CarbonData :: presto ................ SUCCESS [ 34.772 s]
[INFO] Apache CarbonData :: Spark2 Examples ....... SUCCESS [  9.362 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 14:44 min
[INFO] Finished at: 2017-09-20T18:47:44-07:00
[INFO] Final Memory: 174M/1827M
[INFO] ------------------------------------------------------------------------



$>mvn clean verify
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO] 
[INFO] Apache CarbonData :: Parent ................ SUCCESS [  1.839 s]
[INFO] Apache CarbonData :: Common ................ SUCCESS [  3.985 s]
[INFO] Apache CarbonData :: Core .................. SUCCESS [ 46.732 s]
[INFO] Apache CarbonData :: Processing ............ SUCCESS [ 13.321 s]
[INFO] Apache CarbonData :: Hadoop ................ SUCCESS [ 13.376 s]
[INFO] Apache CarbonData :: Spark Common .......... SUCCESS [ 29.065 s]
[INFO] Apache CarbonData :: Spark2 ................ SUCCESS [03:02 min]
[INFO] Apache CarbonData :: Spark Common Test ..... SUCCESS [08:45 min]
[INFO] Apache CarbonData :: Assembly .............. SUCCESS [  1.772 s]
[INFO] Apache CarbonData :: Hive .................. SUCCESS [ 12.770 s]
[INFO] Apache CarbonData :: presto ................ SUCCESS [ 38.861 s]
[INFO] Apache CarbonData :: Spark2 Examples ....... SUCCESS [  9.452 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 14:39 min
[INFO] Finished at: 2017-09-20T18:30:11-07:00
[INFO] Final Memory: 173M/1628M
[INFO] ------------------------------------------------------------------------


---


[GitHub] carbondata issue #1359: [CARBONDATA-1480]Min Max Index Example for DataMap

2017-09-20 Thread jackylk
Github user jackylk commented on the issue:

https://github.com/apache/carbondata/pull/1359
  
@sounakr I feel the same as Ravindra; let's make the example as simple as 
possible, so that developers can understand the concept of the datamap and 
its usage in a short time.


---


[GitHub] carbondata issue #1352: [CARBONDATA-1174] Streaming Ingestion - schema valid...

2017-09-20 Thread aniketadnaik
Github user aniketadnaik commented on the issue:

https://github.com/apache/carbondata/pull/1352
  
Please rebase the "streaming_ingest" branch on the latest "master" to take 
care of the presto test failures.


---


[GitHub] carbondata issue #1361: [CARBONDATA-1481] Compaction support global sort

2017-09-20 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/1361
  
SDV Build Success , Please check CI 
http://144.76.159.231:8080/job/ApacheSDVTests/881/



---


[GitHub] carbondata issue #1367: [CARBONDATA-1398] [WIP] Support query from specified...

2017-09-20 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/1367
  
SDV Build Success , Please check CI 
http://144.76.159.231:8080/job/ApacheSDVTests/880/



---


[GitHub] carbondata issue #1361: [CARBONDATA-1481] Compaction support global sort

2017-09-20 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1361
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/250/



---


[GitHub] carbondata issue #1361: [CARBONDATA-1481] Compaction support global sort

2017-09-20 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1361
  
Build Success with Spark 1.6, Please check CI 
http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/126/



---


[GitHub] carbondata issue #1367: [CARBONDATA-1398] [WIP] Support query from specified...

2017-09-20 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1367
  
Build Failed with Spark 1.6, Please check CI 
http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/125/



---


[GitHub] carbondata issue #1367: [CARBONDATA-1398] [WIP] Support query from specified...

2017-09-20 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1367
  
Build Failed  with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/249/



---


[jira] [Created] (CARBONDATA-1501) Update Array values

2017-09-20 Thread Venkata Ramana G (JIRA)
Venkata Ramana G created CARBONDATA-1501:


 Summary: Update Array values
 Key: CARBONDATA-1501
 URL: https://issues.apache.org/jira/browse/CARBONDATA-1501
 Project: CarbonData
  Issue Type: Sub-task
  Components: core, spark-integration
Reporter: Venkata Ramana G
Priority: Minor
 Fix For: 1.3.0


Update Array values.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (CARBONDATA-1500) Support Alter table to add and remove Array column

2017-09-20 Thread Venkata Ramana G (JIRA)
Venkata Ramana G created CARBONDATA-1500:


 Summary: Support Alter table to add and remove Array column
 Key: CARBONDATA-1500
 URL: https://issues.apache.org/jira/browse/CARBONDATA-1500
 Project: CarbonData
  Issue Type: Sub-task
  Components: core
Reporter: Venkata Ramana G
Priority: Minor


Implement the DDL; this also requires default-value handling.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (CARBONDATA-1499) Support Array type to be a measure

2017-09-20 Thread Venkata Ramana G (JIRA)
Venkata Ramana G created CARBONDATA-1499:


 Summary: Support Array type to be a measure
 Key: CARBONDATA-1499
 URL: https://issues.apache.org/jira/browse/CARBONDATA-1499
 Project: CarbonData
  Issue Type: Sub-task
  Components: core, data-load, data-query
Reporter: Venkata Ramana G
 Fix For: 1.3.0


Currently only dimensions are supported.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (CARBONDATA-1498) Support multilevel Array

2017-09-20 Thread Venkata Ramana G (JIRA)
Venkata Ramana G created CARBONDATA-1498:


 Summary: Support multilevel Array
 Key: CARBONDATA-1498
 URL: https://issues.apache.org/jira/browse/CARBONDATA-1498
 Project: CarbonData
  Issue Type: Sub-task
  Components: core, spark-integration
Reporter: Venkata Ramana G
 Fix For: 1.3.0


Currently the DDL is validated to allow only 2 levels; remove this restriction.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (CARBONDATA-1497) Support DDL for Array fields Dictionary include and Dictionary Exclude

2017-09-20 Thread Venkata Ramana G (JIRA)
Venkata Ramana G created CARBONDATA-1497:


 Summary: Support DDL for Array fields Dictionary include and 
Dictionary Exclude
 Key: CARBONDATA-1497
 URL: https://issues.apache.org/jira/browse/CARBONDATA-1497
 Project: CarbonData
  Issue Type: Sub-task
  Components: spark-integration
Reporter: Venkata Ramana G
 Fix For: 1.3.0


CarbonDictionaryDecoder also needs to be updated to handle the same; a 
hypothetical DDL shape is sketched below.
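
A hypothetical shape of the requested DDL (syntax assumed from the existing 
DICTIONARY_INCLUDE/DICTIONARY_EXCLUDE table properties; array columns are 
not accepted there at the time of this jira):

{code:scala}
// Illustrative only -- the requested behaviour, not current syntax.
spark.sql(
  """CREATE TABLE t (id INT, tags ARRAY<STRING>) STORED BY 'carbondata'
    |TBLPROPERTIES('DICTIONARY_INCLUDE'='tags')""".stripMargin)
{code}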



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (CARBONDATA-1496) Array type : insert into table support

2017-09-20 Thread Venkata Ramana G (JIRA)
Venkata Ramana G created CARBONDATA-1496:


 Summary: Array type : insert into table support
 Key: CARBONDATA-1496
 URL: https://issues.apache.org/jira/browse/CARBONDATA-1496
 Project: CarbonData
  Issue Type: Sub-task
  Components: data-load
Reporter: Venkata Ramana G
 Fix For: 1.3.0


Source table data containing Array values needs to be converted from the 
Spark datatype to string, as Carbon takes strings as input rows; a sketch 
of the conversion follows below.
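
A minimal sketch of that conversion, assuming the default complex-type 
delimiter '$' (the helper name is hypothetical):

{code:scala}
// Flatten a Spark array value into Carbon's delimited string input form.
def arrayToCarbonInput(values: Seq[String], delimiter: String = "$"): String =
  values.mkString(delimiter)

// e.g. Seq("user1", "root") becomes "user1$root"
{code}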



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (CARBONDATA-1495) Array type : Compaction support

2017-09-20 Thread Venkata Ramana G (JIRA)
Venkata Ramana G created CARBONDATA-1495:


 Summary: Array type : Compaction support
 Key: CARBONDATA-1495
 URL: https://issues.apache.org/jira/browse/CARBONDATA-1495
 Project: CarbonData
  Issue Type: Sub-task
  Components: data-load
Reporter: Venkata Ramana G


As compaction works at the byte level, no changes are required; test cases 
need to be added.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (CARBONDATA-1494) Load, query, filter, NULL values, UDFs, Describe support

2017-09-20 Thread Venkata Ramana G (JIRA)
Venkata Ramana G created CARBONDATA-1494:


 Summary: Load, query, filter, NULL values, UDFs, Describe support
 Key: CARBONDATA-1494
 URL: https://issues.apache.org/jira/browse/CARBONDATA-1494
 Project: CarbonData
  Issue Type: Sub-task
  Components: core, sql
Reporter: Venkata Ramana G
 Fix For: 1.3.0


Implementation is in place; test cases need to be added and bugs fixed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (CARBONDATA-1489) Stabilize Array DataType Support

2017-09-20 Thread Venkata Ramana G (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venkata Ramana G updated CARBONDATA-1489:
-
Issue Type: Improvement  (was: Bug)

> Stabilize Array DataType Support
> 
>
> Key: CARBONDATA-1489
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1489
> Project: CarbonData
>  Issue Type: Improvement
>  Components: core, sql
>Affects Versions: 1.1.1
>Reporter: Venkata Ramana G
>Assignee: Venkata Ramana G
>
> Stabilize Array DataType Support. This is an umbrella jira to track all 
> related tasks.
> The following cases need to be handled:
> ||Sub feature||Pending activity||Remarks||
> |query, filter, NULL values, UDFs, Describe support|Test + Fix|Implementation 
> is in place; test cases and bug fixes need to be added|
> |Compaction support | Test + fix | As compaction works at the byte level, no 
> changes are required; test cases need to be added|
> |Insert into table| Develop | Source table data containing Array values 
> needs to be converted from the Spark datatype to string, as Carbon takes 
> strings as input rows |
> |Support DDL for Array fields Dictionary include and Dictionary Exclude | 
> Develop | CarbonDictionaryDecoder also needs to be updated to handle the 
> same. |
> |Support multilevel Array | Develop | Currently the DDL is validated to 
> allow only 2 levels; remove this restriction|
> |Support Array value to be a measure | Develop | Currently only dimensions 
> are supported |
> |Support Alter table to add and remove Array column | Develop | Implement 
> the DDL; requires default-value handling |
> |Projections of Array values push down to carbon | Develop | This is an 
> optimization for when a large number of values are present in the Array |
> |Filter Array values push down to carbon | Develop | This is an optimization 
> for when a large number of values are present in the Array |
> |Update Array values | Develop | Update array values|



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] carbondata issue #1361: [CARBONDATA-1481] Compaction support global sort

2017-09-20 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/1361
  
SDV Build Fail , Please check CI 
http://144.76.159.231:8080/job/ApacheSDVTests/879/



---


[jira] [Updated] (CARBONDATA-1457) Stabilize Struct DataType Support

2017-09-20 Thread Kanaka Kumar Avvaru (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-1457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kanaka Kumar Avvaru updated CARBONDATA-1457:

Description: 
Stabilize Struct DataType Support. This is an umbrella jira to track all 
related tasks.
The following cases need to be handled:

||Sub feature||Pending activity||Remarks||
|Load, query, filter, NULL values, UDFs, Describe support|Test + 
Fix|Implementation is in place; test cases and bug fixes need to be added|
|Compaction support | Test + fix | As compaction works at the byte level, no 
changes are required; test cases need to be added|
|Insert into table| Develop | Source table data containing struct values 
needs to be converted from the Spark datatype to string, as Carbon takes 
strings as input rows |
|Support DDL for Struct fields Dictionary include and Dictionary Exclude | 
Develop | CarbonDictionaryDecoder also needs to be updated to handle the 
same. |
|Support multilevel struct | Develop | Currently the DDL is validated to 
allow only 2 levels; remove this restriction|
|Support struct field to be a measure | Develop | Currently only dimensions 
are supported |
|Support Alter table to add and remove struct column | Develop | Implement 
the DDL; requires default-value handling |
|Update & Delete Support| Test + Fix | Implementation is in place; test 
cases need to be enhanced and bugs fixed|
|Projections of struct fields push down to carbon | Develop | This is an 
optimization for when a large number of fields are present in the struct |
|Filter struct fields push down to carbon | Develop | This is an 
optimization for when a large number of fields are present in the struct |
|Struct fields participate in sort column | Develop | This can be low 
priority |


  was:
Stabilize Struct DataType Support. This is an umbrella jira to track all 
related tasks.
The following cases need to be handled:

||Sub feature||Pending activity||Remarks||
|Load, query, filter, NULL values, UDFs, Describe support|Test + 
Fix|Implementation is in place; test cases and bug fixes need to be added|
|Compaction support | Test + fix | As compaction works at the byte level, no 
changes are required; test cases need to be added|
|Insert into table| Develop | Source table data containing struct values 
needs to be converted from the Spark datatype to string, as Carbon takes 
strings as input rows |
|Support DDL for Struct fields Dictionary include and Dictionary Exclude | 
Develop | CarbonDictionaryDecoder also needs to be updated to handle the 
same. |
|Support multilevel struct | Develop | Currently the DDL is validated to 
allow only 2 levels; remove this restriction|
|Support struct field to be a measure | Develop | Currently only dimensions 
are supported |
|Support Alter table to add and remove struct column | Develop | Implement 
the DDL; requires default-value handling |
|Projections of struct fields push down to carbon | Develop | This is an 
optimization for when a large number of fields are present in the struct |
|Filter struct fields push down to carbon | Develop | This is an 
optimization for when a large number of fields are present in the struct |
|Struct fields participate in sort column | Develop | This can be low 
priority |




> Stabilize Struct DataType Support
> -
>
> Key: CARBONDATA-1457
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1457
> Project: CarbonData
>  Issue Type: New Feature
>  Components: core, sql
>Affects Versions: 1.1.1
>Reporter: Sharanabasappa G Keriwaddi
>Assignee: Venkata Ramana G
> Fix For: 1.3.0
>
>
> Stabilize Struct DataType Support. This is an umbrella jira to track all 
> related tasks.
> The following cases need to be handled:
> ||Sub feature||Pending activity||Remarks||
> |Load, query, filter, NULL values, UDFs, Describe support|Test + 
> Fix|Implementation is in place; test cases and bug fixes need to be added|
> |Compaction support | Test + fix | As compaction works at the byte level, no 
> changes are required; test cases need to be added|
> |Insert into table| Develop | Source table data containing struct values 
> needs to be converted from the Spark datatype to string, as Carbon takes 
> strings as input rows |
> |Support DDL for Struct fields Dictionary include and Dictionary Exclude | 
> Develop | CarbonDictionaryDecoder also needs to be updated to handle the 
> same. |
> |Support multilevel struct | Develop | Currently the DDL is validated to 
> allow only 2 levels; remove this restriction|
> |Support struct field to be a measure | Develop | Currently only dimensions 
> are supported |
> |Support Alter table to add and remove struct column | Develop | Implement 
> the DDL; requires default-value handling |
> |Update & Delete Support| Test + Fix | Implementation is in place; test 
> cases need to be enhanced and bugs fixed|
> |Projections of struct fields push down to carbon | Develop | This is an 
> optimization for when a large number of fields are present in the struct |
> |Filter struct fields push down to 

[GitHub] carbondata issue #1367: [CARBONDATA-1398] [WIP] Support query from specified...

2017-09-20 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/1367
  
SDV Build Success , Please check CI 
http://144.76.159.231:8080/job/ApacheSDVTests/878/



---


[jira] [Updated] (CARBONDATA-1457) Stabilize Struct DataType Support

2017-09-20 Thread Kanaka Kumar Avvaru (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-1457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kanaka Kumar Avvaru updated CARBONDATA-1457:

Description: 
Stabilize Struct DataType Support. This is an umbrella jira to track all 
related tasks.
The following cases need to be handled:

||Sub feature||Pending activity||Remarks||
|Load, query, filter, NULL values, UDFs, Describe support|Test + 
Fix|Implementation is in place; test cases and bug fixes need to be added|
|Compaction support | Test + fix | As compaction works at the byte level, no 
changes are required; test cases need to be added|
|Insert into table| Develop | Source table data containing struct values 
needs to be converted from the Spark datatype to string, as Carbon takes 
strings as input rows |
|Support DDL for Struct fields Dictionary include and Dictionary Exclude | 
Develop | CarbonDictionaryDecoder also needs to be updated to handle the 
same. |
|Support multilevel struct | Develop | Currently the DDL is validated to 
allow only 2 levels; remove this restriction|
|Support struct field to be a measure | Develop | Currently only dimensions 
are supported |
|Support Alter table to add and remove struct column | Develop | Implement 
the DDL; requires default-value handling |
|Projections of struct fields push down to carbon | Develop | This is an 
optimization for when a large number of fields are present in the struct |
|Filter struct fields push down to carbon | Develop | This is an 
optimization for when a large number of fields are present in the struct |
|Struct fields participate in sort column | Develop | This can be low 
priority |



  was:
Stabilize Struct DataType Support. This is an umbrella jira to track all 
related tasks.
The following cases need to be handled:

||Sub feature||Pending activity||Remarks||
|query, filter, NULL values, UDFs, Describe support|Test + Fix|Implementation 
is in place; test cases and bug fixes need to be added|
|Compaction support | Test + fix | As compaction works at the byte level, no 
changes are required; test cases need to be added|
|Insert into table| Develop | Source table data containing struct values 
needs to be converted from the Spark datatype to string, as Carbon takes 
strings as input rows |
|Support DDL for Struct fields Dictionary include and Dictionary Exclude | 
Develop | CarbonDictionaryDecoder also needs to be updated to handle the 
same. |
|Support multilevel struct | Develop | Currently the DDL is validated to 
allow only 2 levels; remove this restriction|
|Support struct field to be a measure | Develop | Currently only dimensions 
are supported |
|Support Alter table to add and remove struct column | Develop | Implement 
the DDL; requires default-value handling |
|Projections of struct fields push down to carbon | Develop | This is an 
optimization for when a large number of fields are present in the struct |
|Filter struct fields push down to carbon | Develop | This is an 
optimization for when a large number of fields are present in the struct |
|Struct fields participate in sort column | Develop | This can be low 
priority |




> Stabilize Struct DataType Support
> -
>
> Key: CARBONDATA-1457
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1457
> Project: CarbonData
>  Issue Type: New Feature
>  Components: core, sql
>Affects Versions: 1.1.1
>Reporter: Sharanabasappa G Keriwaddi
>Assignee: Venkata Ramana G
> Fix For: 1.3.0
>
>
> Stabilize Struct DataType Support. This is an umbrella jira to track all 
> related tasks.
> The following cases need to be handled:
> ||Sub feature||Pending activity||Remarks||
> |Load, query, filter, NULL values, UDFs, Describe support|Test + 
> Fix|Implementation is in place; test cases and bug fixes need to be added|
> |Compaction support | Test + fix | As compaction works at the byte level, no 
> changes are required; test cases need to be added|
> |Insert into table| Develop | Source table data containing struct values 
> needs to be converted from the Spark datatype to string, as Carbon takes 
> strings as input rows |
> |Support DDL for Struct fields Dictionary include and Dictionary Exclude | 
> Develop | CarbonDictionaryDecoder also needs to be updated to handle the 
> same. |
> |Support multilevel struct | Develop | Currently the DDL is validated to 
> allow only 2 levels; remove this restriction|
> |Support struct field to be a measure | Develop | Currently only dimensions 
> are supported |
> |Support Alter table to add and remove struct column | Develop | Implement 
> the DDL; requires default-value handling |
> |Projections of struct fields push down to carbon | Develop | This is an 
> optimization for when a large number of fields are present in the struct |
> |Filter struct fields push down to carbon | Develop | This is an 
> optimization for when a large number of fields are present in the struct |
> |Struct fields participate in sort column | Develop | This can be low 
> priority |



--
This message was sent by Atlassian

[GitHub] carbondata pull request #1374: [CARBONDATA-1491] Dictionary_exclude columns ...

2017-09-20 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1374#discussion_r139970827
  
--- Diff: 
integration/spark-common/src/main/scala/org/apache/spark/sql/catalyst/CarbonDDLSqlParser.scala
 ---
@@ -617,7 +617,13 @@ abstract class CarbonDDLSqlParser extends 
AbstractCarbonSparkSQLParser {
 // by default consider all String cols as dims and if any dictionary 
include isn't present then
 // add it to noDictionaryDims list. consider all dictionary 
excludes/include cols as dims
 fields.foreach { field =>
-  if (dictIncludeCols.exists(x => x.equalsIgnoreCase(field.column))) {
+  if (dictExcludeCols.exists(x => x.equalsIgnoreCase(field.column))) {
+val dataType = 
DataTypeUtil.getDataType(field.dataType.get.toUpperCase())
+if (dataType != DataType.DATE) {
--- End diff --

yes, please add


---


[GitHub] carbondata issue #1361: [CARBONDATA-1481] Compaction support global sort

2017-09-20 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1361
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/248/



---


[GitHub] carbondata issue #1361: [CARBONDATA-1481] Compaction support global sort

2017-09-20 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1361
  
Build Success with Spark 1.6, Please check CI 
http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/124/



---


[GitHub] carbondata issue #1267: [CARBONDATA-1326] Findbug fixes

2017-09-20 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/1267
  
SDV Build Success , Please check CI 
http://144.76.159.231:8080/job/ApacheSDVTests/877/



---


[GitHub] carbondata issue #1367: [CARBONDATA-1398] [WIP] Support query from specified...

2017-09-20 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1367
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/247/



---


[GitHub] carbondata issue #1367: [CARBONDATA-1398] [WIP] Support query from specified...

2017-09-20 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1367
  
Build Failed with Spark 1.6, Please check CI 
http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/123/



---


[GitHub] carbondata issue #1267: [CARBONDATA-1326] Findbug fixes

2017-09-20 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1267
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/246/



---


[GitHub] carbondata pull request #1374: [CARBONDATA-1491] Dictionary_exclude columns ...

2017-09-20 Thread dhatchayani
Github user dhatchayani commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1374#discussion_r139958907
  
--- Diff: 
integration/spark-common/src/main/scala/org/apache/spark/sql/catalyst/CarbonDDLSqlParser.scala
 ---
@@ -617,7 +617,13 @@ abstract class CarbonDDLSqlParser extends 
AbstractCarbonSparkSQLParser {
 // by default consider all String cols as dims and if any dictionary 
include isn't present then
 // add it to noDictionaryDims list. consider all dictionary 
excludes/include cols as dims
 fields.foreach { field =>
-  if (dictIncludeCols.exists(x => x.equalsIgnoreCase(field.column))) {
+  if (dictExcludeCols.exists(x => x.equalsIgnoreCase(field.column))) {
+val dataType = 
DataTypeUtil.getDataType(field.dataType.get.toUpperCase())
+if (dataType != DataType.DATE) {
--- End diff --

Double, decimal and float will throw an exception as unsupported datatypes; 
maybe we can add date to that list as well.


---


[GitHub] carbondata issue #1267: [CARBONDATA-1326] Findbug fixes

2017-09-20 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1267
  
Build Success with Spark 1.6, Please check CI 
http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/122/



---


[jira] [Comment Edited] (CARBONDATA-1281) Disk hotspot found during data loading

2017-09-20 Thread xuchuanyin (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-1281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16104607#comment-16104607
 ] 

xuchuanyin edited comment on CARBONDATA-1281 at 9/20/17 12:38 PM:
--

Here is the configuration used in my test, for others to reference.

# ENV

3 HUAWEI RH2288 nodes, each with 24 cores (E5-2667 @ 2.90GHz), 256 GB memory 
and 11 SAS disks.

We use JDBCServer to run the loading test.
We have 4 executors in total (one executor on each of the 3 nodes + 1 driver 
executor).
executor: 20 cores, 128 GB each
driver executor: 1 core, 20 GB

# USE CASE

88 million records in CSV format

340+ columns per record

NO dictionary columns

TABLE_BLOCKSIZE 64

INVERTED_INDEX on about 9 columns

# CONF

parameter                             value   origin-value
carbon.number.of.cores                20
carbon.number.of.cores.while.loading  14
sort.inmemory.size.inmb               2048    1024
offheap.sort.chunk.size.inmb          128     64
carbon.sort.intermediate.files.limit  20      20
carbon.sort.file.buffer.size          50      20
carbon.use.local.dir                  true    false
carbon.use.multiple.dir               true    false

# RESULT

Using `LOAD DATA INPATH `, the loading costs about 6 min.

Observing the NMON output, IO usage is spread quite evenly across the disks.


was (Author: xuchuanyin):
Here is the configuration used in my test, for others to reference.

# ENV

3 HUAWEI RH2288 nodes, each with 24 cores (E5-2667 @ 2.90GHz), 256 GB memory 
and 11 SAS disks.

We use JDBCServer to run the loading test.
We have 4 executors in total (one executor on each of the 3 nodes + 1 driver 
executor).
executor: 20 cores, 128 GB each
driver executor: 1 core, 20 GB

# USE CASE

88 billion records in CSV format

340+ columns per record

NO dictionary columns

TABLE_BLOCKSIZE 64

INVERTED_INDEX on about 9 columns

# CONF

parameter                             value   origin-value
carbon.number.of.cores                20
carbon.number.of.cores.while.loading  14
sort.inmemory.size.inmb               2048    1024
offheap.sort.chunk.size.inmb          128     64
carbon.sort.intermediate.files.limit  20      20
carbon.sort.file.buffer.size          50      20
carbon.use.local.dir                  true    false
carbon.use.multiple.dir               true    false

# RESULT

Using `LOAD DATA INPATH `, the loading costs about 6 min.

Observing the NMON output, IO usage is spread quite evenly across the disks.

> Disk hotspot found during data loading
> --
>
> Key: CARBONDATA-1281
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1281
> Project: CarbonData
>  Issue Type: Improvement
>  Components: core, data-load
>Affects Versions: 1.1.0
>Reporter: xuchuanyin
>Assignee: xuchuanyin
> Fix For: 1.2.0
>
>  Time Spent: 17h 40m
>  Remaining Estimate: 0h
>
> # Scenario
> Currently we have done a massive data loading. The input data is about 71GB 
> in CSV format, and has about 88 million records. When using carbondata, we 
> do not use any dictionary encoding. Our testing environment has three nodes 
> and each of them has 11 disks as yarn executor directories. We submit the 
> loading command through JDBCServer. The JDBCServer instance has three 
> executors in total, one on each node. The loading takes about 10 minutes 
> (+-3 min, varying from run to run).
> We have observed the nmon information during the loading and find:
> 1. lots of CPU waits in the first half of loading;
> 2. only one single disk has many writes and almost reaches its bottleneck 
> (Avg. 80M/s, Max. 150M/s on a SAS disk);
> 3. the other disks are quite idle.
> # Analyze
> During data loading, carbondata reads and sorts data locally (default 
> scope) and writes the temp files to local disk. In my case, there is only 
> one executor per node, so carbondata writes all the temp files to one 
> disk (container directory or yarn local directory), thus resulting in a 
> single-disk hotspot.
> # Modification
> We should support multiple directories for writing temp files to avoid the 
> disk hotspot.
> PS: I have implemented this in my environment and the result is quite 
> positive: the loading takes about 6 minutes (10 minutes before the 
> improvement).
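
A minimal sketch of enabling the multi-directory behaviour listed in the 
configuration table above, assuming the CarbonProperties.addProperty API 
(the property names are the ones from that table):

{code:scala}
import org.apache.carbondata.core.util.CarbonProperties

val props = CarbonProperties.getInstance()
// Write sort temp files into the YARN local directories...
props.addProperty("carbon.use.local.dir", "true")
// ...and spread them across all configured local directories (disks).
props.addProperty("carbon.use.multiple.dir", "true")
{code}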



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] carbondata issue #1267: [CARBONDATA-1326] Findbug fixes

2017-09-20 Thread ManoharVanam
Github user ManoharVanam commented on the issue:

https://github.com/apache/carbondata/pull/1267
  
retest this please


---


[GitHub] carbondata pull request #1267: [CARBONDATA-1326] Findbug fixes

2017-09-20 Thread ManoharVanam
Github user ManoharVanam closed the pull request at:

https://github.com/apache/carbondata/pull/1267


---


[GitHub] carbondata pull request #1267: [CARBONDATA-1326] Findbug fixes

2017-09-20 Thread ManoharVanam
GitHub user ManoharVanam reopened a pull request:

https://github.com/apache/carbondata/pull/1267

[CARBONDATA-1326] Findbug fixes

Findbug fixes

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ManoharVanam/incubator-carbondata Fixes

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/1267.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1267


commit bec91f44ab3285c3a88e856653affe56be25f5e3
Author: Manohar 
Date:   2017-08-18T15:46:23Z

Findbug fixes




---


[GitHub] carbondata issue #1361: [CARBONDATA-1481] Compaction support global sort

2017-09-20 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/1361
  
SDV Build Success , Please check CI 
http://144.76.159.231:8080/job/ApacheSDVTests/876/



---


[GitHub] carbondata pull request #1361: [CARBONDATA-1481] Compaction support global s...

2017-09-20 Thread QiangCai
Github user QiangCai commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1361#discussion_r139949848
  
--- Diff: 
integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/datacompaction/CompactionSupportGlobalSortPerformanceTest.scala
 ---
@@ -0,0 +1,106 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.carbondata.spark.testsuite.datacompaction
+
+import org.scalatest.{BeforeAndAfterAll, BeforeAndAfterEach}
+
+import org.apache.spark.sql.test.TestQueryExecutor
+import org.apache.spark.sql.test.util.QueryTest
+
+class CompactionSupportGlobalSortPerformanceTest extends QueryTest with 
BeforeAndAfterEach with BeforeAndAfterAll {
--- End diff --

remove this test case 


---


[GitHub] carbondata pull request #1374: [CARBONDATA-1491] Dictionary_exclude columns ...

2017-09-20 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1374#discussion_r139947636
  
--- Diff: 
integration/spark-common/src/main/scala/org/apache/spark/sql/catalyst/CarbonDDLSqlParser.scala
 ---
@@ -617,7 +617,13 @@ abstract class CarbonDDLSqlParser extends 
AbstractCarbonSparkSQLParser {
 // by default consider all String cols as dims and if any dictionary 
include isn't present then
 // add it to noDictionaryDims list. consider all dictionary 
excludes/include cols as dims
 fields.foreach { field =>
-  if (dictIncludeCols.exists(x => x.equalsIgnoreCase(field.column))) {
+  if (dictExcludeCols.exists(x => x.equalsIgnoreCase(field.column))) {
+val dataType = 
DataTypeUtil.getDataType(field.dataType.get.toUpperCase())
+if (dataType != DataType.DATE) {
--- End diff --

OK, how about double, decimal and float?


---


[GitHub] carbondata issue #1361: [CARBONDATA-1481] Compaction support global sort

2017-09-20 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1361
  
Build Success with Spark 1.6, Please check CI 
http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/121/



---


[GitHub] carbondata issue #1361: [CARBONDATA-1481] Compaction support global sort

2017-09-20 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1361
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/245/



---


[GitHub] carbondata pull request #1374: [CARBONDATA-1491] Dictionary_exclude columns ...

2017-09-20 Thread dhatchayani
Github user dhatchayani commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1374#discussion_r139946254
  
--- Diff: 
integration/spark-common/src/main/scala/org/apache/spark/sql/catalyst/CarbonDDLSqlParser.scala
 ---
@@ -617,7 +617,13 @@ abstract class CarbonDDLSqlParser extends 
AbstractCarbonSparkSQLParser {
 // by default consider all String cols as dims and if any dictionary 
include isn't present then
 // add it to noDictionaryDims list. consider all dictionary 
excludes/include cols as dims
 fields.foreach { field =>
-  if (dictIncludeCols.exists(x => x.equalsIgnoreCase(field.column))) {
+  if (dictExcludeCols.exists(x => x.equalsIgnoreCase(field.column))) {
+val dataType = 
DataTypeUtil.getDataType(field.dataType.get.toUpperCase())
+if (dataType != DataType.DATE) {
--- End diff --

Timestamp can be excluded, as we now support timestamp in both include and 
exclude.


---


[GitHub] carbondata pull request #1374: [CARBONDATA-1491] Dictionary_exclude columns ...

2017-09-20 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1374#discussion_r139945003
  
--- Diff: 
integration/spark-common/src/main/scala/org/apache/spark/sql/catalyst/CarbonDDLSqlParser.scala
 ---
@@ -617,7 +617,13 @@ abstract class CarbonDDLSqlParser extends 
AbstractCarbonSparkSQLParser {
 // by default consider all String cols as dims and if any dictionary 
include isn't present then
 // add it to noDictionaryDims list. consider all dictionary 
excludes/include cols as dims
 fields.foreach { field =>
-  if (dictIncludeCols.exists(x => x.equalsIgnoreCase(field.column))) {
+  if (dictExcludeCols.exists(x => x.equalsIgnoreCase(field.column))) {
+val dataType = 
DataTypeUtil.getDataType(field.dataType.get.toUpperCase())
+if (dataType != DataType.DATE) {
--- End diff --

how about timestamp?


---


[GitHub] carbondata pull request #1101: [CARBONDATA-1143] fix for null struct type

2017-09-20 Thread gvramana
Github user gvramana commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1101#discussion_r139939203
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/datatypes/StructDataType.java
 ---
@@ -145,7 +146,7 @@ public void setSurrogateIndex(int surrIndex) {
   throws IOException, DictionaryGenerationException {
 dataOutputStream.writeInt(children.size());
 if (input == null) {
-  dataOutputStream.writeInt(children.size());
+  // dataOutputStream.writeInt(children.size());
--- End diff --

remove the comment


---


[jira] [Created] (CARBONDATA-1493) Basic Maptype support

2017-09-20 Thread Kunal Kapoor (JIRA)
Kunal Kapoor created CARBONDATA-1493:


 Summary: Basic Maptype support
 Key: CARBONDATA-1493
 URL: https://issues.apache.org/jira/browse/CARBONDATA-1493
 Project: CarbonData
  Issue Type: Sub-task
Reporter: Kunal Kapoor
Assignee: Kunal Kapoor






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] carbondata issue #1375: [CARBONDATA-1288] skip single_pass if dictionary col...

2017-09-20 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/1375
  
SDV Build Fail , Please check CI 
http://144.76.159.231:8080/job/ApacheSDVTests/875/



---


[jira] [Commented] (CARBONDATA-1457) Stabilize Struct DataType Support

2017-09-20 Thread Kanaka Kumar Avvaru (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-1457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16173014#comment-16173014
 ] 

Kanaka Kumar Avvaru commented on CARBONDATA-1457:
-

Linked the earlier-created jiras on the struct data type to this 
stabilization umbrella jira.

> Stabilize Struct DataType Support
> -
>
> Key: CARBONDATA-1457
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1457
> Project: CarbonData
>  Issue Type: New Feature
>  Components: core, sql
>Affects Versions: 1.1.1
>Reporter: Sharanabasappa G Keriwaddi
>Assignee: Venkata Ramana G
> Fix For: 1.3.0
>
>
> Stabilize Struct DataType Support. This is an umbrella jira to track all 
> related tasks.
> The following cases need to be handled:
> ||Sub feature||Pending activity||Remarks||
> |query, filter, NULL values, UDFs, Describe support|Test + Fix|Implementation 
> is in place; test cases and bug fixes need to be added|
> |Compaction support | Test + fix | As compaction works at the byte level, no 
> changes are required; test cases need to be added|
> |Insert into table| Develop | Source table data containing struct values 
> needs to be converted from the Spark datatype to string, as Carbon takes 
> strings as input rows |
> |Support DDL for Struct fields Dictionary include and Dictionary Exclude | 
> Develop | CarbonDictionaryDecoder also needs to be updated to handle the 
> same. |
> |Support multilevel struct | Develop | Currently the DDL is validated to 
> allow only 2 levels; remove this restriction|
> |Support struct field to be a measure | Develop | Currently only dimensions 
> are supported |
> |Support Alter table to add and remove struct column | Develop | Implement 
> the DDL; requires default-value handling |
> |Projections of struct fields push down to carbon | Develop | This is an 
> optimization for when a large number of fields are present in the struct |
> |Filter struct fields push down to carbon | Develop | This is an 
> optimization for when a large number of fields are present in the struct |
> |Struct fields participate in sort column | Develop | This can be low 
> priority |



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] carbondata issue #1375: [CARBONDATA-1288] skip single_pass if dictionary col...

2017-09-20 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1375
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/244/



---


[GitHub] carbondata issue #1375: [CARBONDATA-1288] skip single_pass if dictionary col...

2017-09-20 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1375
  
Build Success with Spark 1.6, Please check CI 
http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/120/



---


[GitHub] carbondata pull request #1375: [CARBONDATA-1288] skip single_pass if diction...

2017-09-20 Thread kunal642
GitHub user kunal642 opened a pull request:

https://github.com/apache/carbondata/pull/1375

[CARBONDATA-1288] skip single_pass if dictionary column is not present

Analysis: If none of the columns in the table has dictionary encoding, 
there is no need to go through the single_pass flow, as it would needlessly 
start the dictionary client.

Solution: set single_pass to false if no column has dictionary encoding.
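
A rough sketch of the check (names below are illustrative, not the PR's 
actual code):

    case class ColumnSchema(name: String, hasDictionaryEncoding: Boolean)

    // single_pass only stays on when at least one column actually needs
    // dictionary generation; otherwise the dictionary client is skipped.
    def effectiveSinglePass(userSinglePass: Boolean,
        columns: Seq[ColumnSchema]): Boolean =
      userSinglePass && columns.exists(_.hasDictionaryEncoding)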

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kunal642/carbondata restrict_single_pass

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/1375.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1375


commit b3010b9a245adbbd6bc68bad52a8eacb50ef40d9
Author: kunal642 
Date:   2017-09-20T09:53:42Z

skip single_pass if dictionary column is not present




---


[jira] [Created] (CARBONDATA-1492) Alter add and remove struct Column

2017-09-20 Thread dhatchayani (JIRA)
dhatchayani created CARBONDATA-1492:
---

 Summary: Alter add and remove struct Column
 Key: CARBONDATA-1492
 URL: https://issues.apache.org/jira/browse/CARBONDATA-1492
 Project: CarbonData
  Issue Type: Sub-task
Reporter: dhatchayani
Assignee: dhatchayani
Priority: Minor


Alter table add and remove struct columns should be supported as a part of this.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] carbondata issue #1368: [CARBONDATA-1486] Fixed issue of table status updati...

2017-09-20 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/1368
  
SDV Build Success , Please check CI 
http://144.76.159.231:8080/job/ApacheSDVTests/874/



---


[GitHub] carbondata issue #1368: [CARBONDATA-1486] Fixed issue of table status updati...

2017-09-20 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1368
  
Build Success with Spark 1.6, Please check CI 
http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/119/



---


[GitHub] carbondata issue #1368: [CARBONDATA-1486] Fixed issue of table status updati...

2017-09-20 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1368
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/243/



---


[GitHub] carbondata issue #1368: [CARBONDATA-1486] Fixed issue of table status updati...

2017-09-20 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/1368
  
retest this please


---


[jira] [Resolved] (CARBONDATA-1490) Unnecessary space is being allocated for measures in carbon row

2017-09-20 Thread Venkata Ramana G (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venkata Ramana G resolved CARBONDATA-1490.
--
   Resolution: Fixed
Fix Version/s: 1.2.0

> Unnecessary space is being allocated for measures in carbon row
> ---
>
> Key: CARBONDATA-1490
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1490
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Kunal Kapoor
>Assignee: Kunal Kapoor
>Priority: Minor
> Fix For: 1.2.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Consider the following table, which has only one column, of complex type. 
> While loading, when we create a carbon row for each record, unnecessary 
> space in the carbon row is being allocated for measures which *do not 
> exist*.
> spark.sql("CREATE TABLE carbon_table( complexData ARRAY) STORED BY 
> 'carbondata'")



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] carbondata pull request #1373: [CARBONDATA-1490] Fixed memory allocation for...

2017-09-20 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/carbondata/pull/1373


---


[GitHub] carbondata issue #1361: [CARBONDATA-1481] Compaction support global sort

2017-09-20 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/1361
  
SDV Build Success , Please check CI 
http://144.76.159.231:8080/job/ApacheSDVTests/873/



---


[GitHub] carbondata issue #1373: [CARBONDATA-1490] Fixed memory allocation for carbon...

2017-09-20 Thread manishgupta88
Github user manishgupta88 commented on the issue:

https://github.com/apache/carbondata/pull/1373
  
LGTM


---


[GitHub] carbondata issue #1361: [CARBONDATA-1481] Compaction support global sort

2017-09-20 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1361
  
Build Failed with Spark 1.6, Please check CI 
http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/118/



---


[GitHub] carbondata issue #1361: [CARBONDATA-1481] Compaction support global sort

2017-09-20 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1361
  
Build Failed  with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/242/



---


[jira] [Updated] (CARBONDATA-45) Support MAP type

2017-09-20 Thread Venkata Ramana G (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-45?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venkata Ramana G updated CARBONDATA-45:
---
Description: 
{code:sql}
>>CREATE TABLE table1 (
 deviceInformationId int,
 channelsId string,
 props map<int, string>)
  STORED BY 'org.apache.carbondata.format'

>>insert into table1 select 10,'channel1', map(1,'user1',101, 'root')
{code}

format of data to be read from csv, with '$' as level 1 delimiter and map keys 
terminated by '#'

{code:sql}
>>load data local inpath '/tmp/data.csv' into table1 options 
>>('COMPLEX_DELIMITER_LEVEL_1'='$', 'COMPLEX_DELIMITER_LEVEL_2'=':', 
>>'COMPLEX_DELIMITER_FOR_KEY'='#')

20,channel2,2#user2$100#usercommon
30,channel3,3#user3$100#usercommon
40,channel4,4#user3$100#usercommon

>>select channelId, props[100] from table1 where deviceInformationId > 10;

20, usercommon
30, usercommon
40, usercommon

>>select channelId, props from table1 where props[2] = 'user2';

20, {2,'user2', 100, 'usercommon'}
{code}
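
As a minimal sketch of how one such map field from the example above could be split under these delimiters (the class and method names are invented; this is not CarbonData's actual parser):

{code:java}
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration only: how a CSV map field such as "2#user2$100#usercommon"
// splits under the delimiters above.
public class MapFieldParseSketch {
  static Map<String, String> parseMapField(String field) {
    Map<String, String> map = new LinkedHashMap<>();
    // COMPLEX_DELIMITER_LEVEL_1 ('$') separates map entries.
    for (String entry : field.split("\\$")) {
      // COMPLEX_DELIMITER_FOR_KEY ('#') separates a key from its value.
      String[] kv = entry.split("#", 2);
      map.put(kv[0], kv[1]);
    }
    return map;
  }

  public static void main(String[] args) {
    // Third column of "20,channel2,2#user2$100#usercommon":
    System.out.println(parseMapField("2#user2$100#usercommon"));
    // prints {2=user2, 100=usercommon}
  }
}
{code}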
Following cases need to be handled:

||Sub feature||Pending activity||Remarks||
|Basic Maptype support|Develop| Create table DDL, Load map data from CSV, 
select * from maptable|
|Maptype lookup in projection and filter|Develop|Projection and filters need 
execution at spark|
|NULL values, UDFs, Describe support|Develop||
|Compaction support | Test + fix | As compaction works at byte level, no 
changes required. Test-cases need to be added|
|Insert into table| Develop | Source table data containing Map data needs to 
be converted from the spark datatype to string, as carbon takes string as the 
input row |
|Support DDL for Map fields Dictionary include and Dictionary Exclude | Develop 
| CarbonDictionaryDecoder also needs to handle the same. |
|Support multilevel Map | Develop | Currently DDL is validated to allow only 2 
levels; remove this restriction|
|Support Map value to be a measure | Develop | Currently array and struct 
support only dimensions, which needs to change|
|Support Alter table to add and remove Map column | Develop | Implement DDL; 
requires default value handling |
|Projections of Map lookup push down to carbon | Develop | This is an 
optimization for when a large number of values are present in the Map |
|Filter Map lookup push down to carbon | Develop | This is an optimization for 
when a large number of values are present in the Map |
|Update Map values | Develop | Update map value|

h4. Design suggestion:

Map can be represented internally as Array<Struct<key, value>>, so only a 
conversion to the Map data type is required while giving data to spark. The 
schema will have a new column of map type, similar to Array.
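
A small sketch of this suggestion, with a hypothetical KeyValue class standing in for Struct<key, value> (illustration only, not CarbonData code):

{code:java}
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for Struct<key, value>; illustration only.
class KeyValue<K, V> {
  final K key;
  final V value;
  KeyValue(K key, V value) { this.key = key; this.value = value; }
}

public class MapAsArraySketch {
  // Store side: flatten a Map into the Array<Struct<key, value>> layout.
  static <K, V> List<KeyValue<K, V>> toArray(Map<K, V> map) {
    List<KeyValue<K, V>> out = new ArrayList<>();
    map.forEach((k, v) -> out.add(new KeyValue<>(k, v)));
    return out;
  }

  // Query side: rebuild the Map before handing rows to spark.
  static <K, V> Map<K, V> toMap(List<KeyValue<K, V>> array) {
    Map<K, V> map = new LinkedHashMap<>();
    for (KeyValue<K, V> kv : array) {
      map.put(kv.key, kv.value);
    }
    return map;
  }
}
{code}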




  was:
{code:sql}
>>CREATE TABLE table1 (
 deviceInformationId int,
 channelsId string,
 props map<int, string>)
  STORED BY 'org.apache.carbondata.format'

>>insert into table1 select 10,'channel1', map(1,'user1',101, 'root')
{code}

format of data to be read from csv, with '$' as level 1 delimiter and map keys 
terminated by '#'

{code:sql}
>>load data local inpath '/tmp/data.csv' into table1 options 
>>('COMPLEX_DELIMITER_LEVEL_1'='$', 'COMPLEX_DELIMITER_LEVEL_2'=':', 
>>'COMPLEX_DELIMITER_FOR_KEY'='#')

20,channel2,2#user2$100#usercommon
30,channel3,3#user3$100#usercommon
40,channel4,4#user3$100#usercommon

>>select channelId, props[100] from table1 where deviceInformationId > 10;

20, usercommon
30, usercommon
40, usercommon

>>select channelId, props from table1 where props[2] == 'user2';

20, {2,'user2', 100, 'usercommon'}
{code}
Following cases need to be handled:

||Sub feature||Pending activity||Remarks||
|Basic Maptype support|Develop| Create table DDL, Load map data from CSV, 
select * from maptable|
|Maptype lookup in projection and filter|Develop|Projection and filters need 
execution at spark|
|NULL values, UDFs, Describe support|Develop||
|Compaction support | Test + fix | As compaction works at byte level, no 
changes required. Test-cases need to be added|
|Insert into table| Develop | Source table data containing Map data needs to 
be converted from the spark datatype to string, as carbon takes string as the 
input row |
|Support DDL for Map fields Dictionary include and Dictionary Exclude | Develop 
| CarbonDictionaryDecoder also needs to handle the same. |
|Support multilevel Map | Develop | Currently DDL is validated to allow only 2 
levels; remove this restriction|
|Support Map value to be a measure | Develop | Currently array and struct 
support only dimensions, which needs to change|
|Support Alter table to add and remove Map column | Develop | Implement DDL; 
requires default value handling |
|Projections of Map lookup push down to carbon | Develop | This is an 
optimization for when a large number of values are present in the Map |
|Filter Map lookup push down to carbon | Develop | This is an optimization for 
when a large number of values are present in the Map |
|Update Map values | Develop | Update map value|

[jira] [Updated] (CARBONDATA-45) Support MAP type

2017-09-20 Thread Venkata Ramana G (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-45?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venkata Ramana G updated CARBONDATA-45:
---
Description: 
{code:sql}
>>CREATE TABLE table1 (
 deviceInformationId int,
 channelsId string,
 props map<int, string>)
  STORED BY 'org.apache.carbondata.format'

>>insert into table1 select 10,'channel1', map(1,'user1',101, 'root')
{code}

format of data to be read from csv, with '$' as level 1 delimiter and map keys 
terminated by '#'

{code:sql}
>>load data local inpath '/tmp/data.csv' into table1 options 
>>('COMPLEX_DELIMITER_LEVEL_1'='$', 'COMPLEX_DELIMITER_LEVEL_2'=':', 
>>'COMPLEX_DELIMITER_FOR_KEY'='#')

20,channel2,2#user2$100#usercommon
30,channel3,3#user3$100#usercommon
40,channel4,4#user3$100#usercommon

>>select channelId, props[100] from table1 where deviceInformationId > 10;

20, usercommon
30, usercommon
40, usercommon

>>select channelId, props from table1 where props[2] == 'user2';

20, {2,'user2', 100, 'usercommon'}
{code}
Following cases need to be handled:

||Sub feature||Pending activity||Remarks||
|Basic Maptype support|Develop| Create table DDL, Load map data from CSV, 
select * from maptable|
|Maptype lookup in projection and filter|Develop|Projection and filters need 
execution at spark|
|NULL values, UDFs, Describe support|Develop||
|Compaction support | Test + fix | As compaction works at byte level, no 
changes required. Test-cases need to be added|
|Insert into table| Develop | Source table data containing Map data needs to 
be converted from the spark datatype to string, as carbon takes string as the 
input row |
|Support DDL for Map fields Dictionary include and Dictionary Exclude | Develop 
| CarbonDictionaryDecoder also needs to handle the same. |
|Support multilevel Map | Develop | Currently DDL is validated to allow only 2 
levels; remove this restriction|
|Support Map value to be a measure | Develop | Currently array and struct 
support only dimensions, which needs to change|
|Support Alter table to add and remove Map column | Develop | Implement DDL; 
requires default value handling |
|Projections of Map lookup push down to carbon | Develop | This is an 
optimization for when a large number of values are present in the Map |
|Filter Map lookup push down to carbon | Develop | This is an optimization for 
when a large number of values are present in the Map |
|Update Map values | Develop | Update map value|

h4. Design suggestion:

Map can be represented internally as Array<Struct<key, value>>, so only a 
conversion to the Map data type is required while giving data to spark. The 
schema will have a new column of map type, similar to Array.




  was:
{code:sql}
>>CREATE TABLE table1 (
 deviceInformationId int,
 channelsId string,
 props map<int, string>)
  STORED BY 'org.apache.carbondata.format'

>>insert into table1 select 10,'channel1', map(1,'user1',101, 'root')
{code}

format of data to be read from csv, with '$' as level 1 delimiter and map keys 
terminated by '#'

{code:sql}
>>load data local inpath '/tmp/data.csv' into table1 options 
>>('COMPLEX_DELIMITER_LEVEL_1'='$', 'COMPLEX_DELIMITER_LEVEL_2'=':', 
>>'COMPLEX_DELIMITER_FOR_KEY'='#')

20,channel2,2#user2$100#usercommon
30,channel3,3#user3$100#usercommon
40,channel4,4#user3$100#usercommon

>>select channelId, props[100] from table1 where deviceInformationId > 10;

20, usercommon
30, usercommon
40, usercommon

>>select channelId, props from table1 where props[2] == 'user2';

20, {2,'user2', 100, 'usercommon'}
{code}
Following cases need to be handled:

||Sub feature||Pending activity||Remarks||
|Basic Maptype support|Develop| Create table DDL, Load map data from CSV, 
select * from maptable|
|Maptype lookup in projection and filter|Develop|Projection and filters need 
execution at spark|
|NULL values, UDFs, Describe support|Develop||
|Compaction support | Test + fix | As compaction works at byte level, no 
changes required. Test-cases need to be added|
|Insert into table| Develop | Source table data containing Map data needs to 
be converted from the spark datatype to string, as carbon takes string as the 
input row |
|Support DDL for Map fields Dictionary include and Dictionary Exclude | Develop 
| CarbonDictionaryDecoder also needs to handle the same. |
|Support multilevel Map | Develop | Currently DDL is validated to allow only 2 
levels; remove this restriction|
|Support Map value to be a measure | Develop | Currently supports only 
dimensions |
|Support Alter table to add and remove Map column | Develop | Implement DDL; 
requires default value handling |
|Projections of Map lookup push down to carbon | Develop | This is an 
optimization for when a large number of values are present in the Map |
|Filter Map lookup push down to carbon | Develop | This is an optimization for 
when a large number of values are present in the Map |
|Update Map values | Develop | Update map value|

[jira] [Updated] (CARBONDATA-45) Support MAP type

2017-09-20 Thread Venkata Ramana G (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-45?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venkata Ramana G updated CARBONDATA-45:
---
Description: 
{code:sql}
>>CREATE TABLE table1 (
 deviceInformationId int,
 channelsId string,
 props map<int, string>)
  STORED BY 'org.apache.carbondata.format'

>>insert into table1 select 10,'channel1', map(1,'user1',101, 'root')
{code}

format of data to be read from csv, with '$' as level 1 delimiter and map keys 
terminated by '#'

{code:sql}
>>load data local inpath '/tmp/data.csv' into table1 options 
>>('COMPLEX_DELIMITER_LEVEL_1'='$', 'COMPLEX_DELIMITER_LEVEL_2'=':', 
>>'COMPLEX_DELIMITER_FOR_KEY'='#')

20,channel2,2#user2$100#usercommon
30,channel3,3#user3$100#usercommon
40,channel4,4#user3$100#usercommon

>>select channelId, props[100] from table1 where deviceInformationId > 10;

20, usercommon
30, usercommon
40, usercommon

>>select channelId, props from table1 where props[2] == 'user2';

20, {2,'user2', 100, 'usercommon'}
{code}
Following cases need to be handled:

||Sub feature||Pending activity||Remarks||
|Basic Maptype support|Develop| Create table DDL, Load map data from CSV, 
select * from maptable|
|Maptype lookup in projection and filter|Develop|Projection and filters need 
execution at spark|
|NULL values, UDFs, Describe support|Develop||
|Compaction support | Test + fix | As compaction works at byte level, no 
changes required. Test-cases need to be added|
|Insert into table| Develop | Source table data containing Map data needs to 
be converted from the spark datatype to string, as carbon takes string as the 
input row |
|Support DDL for Map fields Dictionary include and Dictionary Exclude | Develop 
| CarbonDictionaryDecoder also needs to handle the same. |
|Support multilevel Map | Develop | Currently DDL is validated to allow only 2 
levels; remove this restriction|
|Support Map value to be a measure | Develop | Currently supports only 
dimensions |
|Support Alter table to add and remove Map column | Develop | Implement DDL; 
requires default value handling |
|Projections of Map lookup push down to carbon | Develop | This is an 
optimization for when a large number of values are present in the Map |
|Filter Map lookup push down to carbon | Develop | This is an optimization for 
when a large number of values are present in the Map |
|Update Map values | Develop | Update map value|

h4. Design suggestion:

Map can be represented internally as Array<Struct<key, value>>, so only a 
conversion to the Map data type is required while giving data to spark. The 
schema will have a new column of map type, similar to Array.




  was:

{code:java}
>>CREATE TABLE table1 (
 deviceInformationId int,
 channelsId string,
 props map<int, string>)
  STORED BY 'org.apache.carbondata.format'

>>insert into table1 select 10,'channel1', map(1,'user1',101, 'root')
{code}

format of data to be read from csv, with '$' as level 1 delimiter and map keys 
terminated by '#'

>>load data local inpath '/tmp/data.csv' into table1 options 
>>('COMPLEX_DELIMITER_LEVEL_1'='$', 'COMPLEX_DELIMITER_LEVEL_2'=':', 
>>'COMPLEX_DELIMITER_FOR_KEY'='#')

20,channel2,2#user2$100#usercommon
30,channel3,3#user3$100#usercommon
40,channel4,4#user3$100#usercommon

>>select channelId, props[100] from table1 where deviceInformationId > 10;

20, usercommon
30, usercommon
40, usercommon

>>select channelId, props from table1 where props[2] == 'user2';

20, {2,'user2', 100, 'usercommon'}

Following cases need to be handled:

||Sub feature||Pending activity||Remarks||
|Basic Maptype support|Develop| Create table DDL, Load map data from CSV, 
select * from maptable|
|Maptype lookup in projection and filter|Develop|Projection and filters need 
execution at spark|
|NULL values, UDFs, Describe support|Develop||
|Compaction support | Test + fix | As compaction works at byte level, no 
changes required. Test-cases need to be added|
|Insert into table| Develop | Source table data containing Map data needs to 
be converted from the spark datatype to string, as carbon takes string as the 
input row |
|Support DDL for Map fields Dictionary include and Dictionary Exclude | Develop 
| CarbonDictionaryDecoder also needs to handle the same. |
|Support multilevel Map | Develop | Currently DDL is validated to allow only 2 
levels; remove this restriction|
|Support Map value to be a measure | Develop | Currently supports only 
dimensions |
|Support Alter table to add and remove Map column | Develop | Implement DDL; 
requires default value handling |
|Projections of Map lookup push down to carbon | Develop | This is an 
optimization for when a large number of values are present in the Map |
|Filter Map lookup push down to carbon | Develop | This is an optimization for 
when a large number of values are present in the Map |
|Update Map values | Develop | Update map value|

Design suggestion:
Map can be represented internally as Array<Struct<key, value>>, so only a 
conversion to the Map data type is required while giving data to spark. The 
schema will have a new column of map type, similar to Array.

[jira] [Updated] (CARBONDATA-45) Support MAP type

2017-09-20 Thread Venkata Ramana G (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-45?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venkata Ramana G updated CARBONDATA-45:
---
Description: 

{code:java}
>>CREATE TABLE table1 (
 deviceInformationId int,
 channelsId string,
 props map<int, string>)
  STORED BY 'org.apache.carbondata.format'

>>insert into table1 select 10,'channel1', map(1,'user1',101, 'root')
{code}

format of data to be read from csv, with '$' as level 1 delimiter and map keys 
terminated by '#'

>>load data local inpath '/tmp/data.csv' into table1 options 
>>('COMPLEX_DELIMITER_LEVEL_1'='$', 'COMPLEX_DELIMITER_LEVEL_2'=':', 
>>'COMPLEX_DELIMITER_FOR_KEY'='#')

20,channel2,2#user2$100#usercommon
30,channel3,3#user3$100#usercommon
40,channel4,4#user3$100#usercommon

>>select channelId, props[100] from table1 where deviceInformationId > 10;

20, usercommon
30, usercommon
40, usercommon

>>select channelId, props from table1 where props[2] == 'user2';

20, {2,'user2', 100, 'usercommon'}

Following cases need to be handled:

||Sub feature||Pending activity||Remarks||
|Basic Maptype support|Develop| Create table DDL, Load map data from CSV, 
select * from maptable|
|Maptype lookup in projection and filter|Develop|Projection and filters need 
execution at spark|
|NULL values, UDFs, Describe support|Develop||
|Compaction support | Test + fix | As compaction works at byte level, no 
changes required. Test-cases need to be added|
|Insert into table| Develop | Source table data containing Map data needs to 
be converted from the spark datatype to string, as carbon takes string as the 
input row |
|Support DDL for Map fields Dictionary include and Dictionary Exclude | Develop 
| CarbonDictionaryDecoder also needs to handle the same. |
|Support multilevel Map | Develop | Currently DDL is validated to allow only 2 
levels; remove this restriction|
|Support Map value to be a measure | Develop | Currently supports only 
dimensions |
|Support Alter table to add and remove Map column | Develop | Implement DDL; 
requires default value handling |
|Projections of Map lookup push down to carbon | Develop | This is an 
optimization for when a large number of values are present in the Map |
|Filter Map lookup push down to carbon | Develop | This is an optimization for 
when a large number of values are present in the Map |
|Update Map values | Develop | Update map value|

Design suggestion:
Map can be represented internally as Array<Struct<key, value>>, so only a 
conversion to the Map data type is required while giving data to spark. The 
schema will have a new column of map type, similar to Array.




  was:
We have many tables which use the map type, and general file formats like ORC 
and Parquet support the map type. So can carbondata support the map type?

For SQL such as "select map['id'] from table", ORC will read all keys in the 
map type. Can we read just the key 'id'?





> Support MAP type
> 
>
> Key: CARBONDATA-45
> URL: https://issues.apache.org/jira/browse/CARBONDATA-45
> Project: CarbonData
>  Issue Type: New Feature
>  Components: core, sql
>Reporter: cen yuhai
>Assignee: Venkata Ramana G
> Fix For: 1.3.0
>
>
> {code:java}
> >>CREATE TABLE table1 (
>  deviceInformationId int,
>  channelsId string,
>  props map<int, string>)
>   STORED BY 'org.apache.carbondata.format'
> >>insert into table1 select 10,'channel1', map(1,'user1',101, 'root')
> {code}
> format of data to be read from csv, with '$' as level 1 delimiter and map 
> keys terminated by '#'
> >>load data local inpath '/tmp/data.csv' into table1 options 
> >>('COMPLEX_DELIMITER_LEVEL_1'='$', 'COMPLEX_DELIMITER_LEVEL_2'=':', 
> >>'COMPLEX_DELIMITER_FOR_KEY'='#')
> 20,channel2,2#user2$100#usercommon
> 30,channel3,3#user3$100#usercommon
> 40,channel4,4#user3$100#usercommon
> >>select channelId, props[100] from table1 where deviceInformationId > 10;
> 20, usercommon
> 30, usercommon
> 40, usercommon
> >>select channelId, props from table1 where props[2] == 'user2';
> 20, {2,'user2', 100, 'usercommon'}
> Following cases need to be handled:
> ||Sub feature||Pending activity||Remarks||
> |Basic Maptype support|Develop| Create table DDL, Load map data from CSV, 
> select * from maptable|
> |Maptype lookup in projection and filter|Develop|Projection and filters need 
> execution at spark|
> |NULL values, UDFs, Describe support|Develop||
> |Compaction support | Test + fix | As compaction works at byte level, no 
> changes required. Test-cases need to be added|
> |Insert into table| Develop | Source table data containing Map data needs to 
> be converted from the spark datatype to string, as carbon takes string as the 
> input row |
> |Support DDL for Map fields Dictionary include and Dictionary Exclude | 
> Develop | CarbonDictionaryDecoder also needs to handle the same. |
> |Support multilevel Map | Develop | Currently DDL is validated to allow only 
> 2 levels; remove this restriction|

[GitHub] carbondata issue #1361: [CARBONDATA-1481] Compaction support global sort

2017-09-20 Thread xubo245
Github user xubo245 commented on the issue:

https://github.com/apache/carbondata/pull/1361
  

@QiangCai Please review it


---


[GitHub] carbondata issue #1374: [CARBONDATA-1491] Dictionary_exclude columns are not...

2017-09-20 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/1374
  
SDV Build Fail , Please check CI 
http://144.76.159.231:8080/job/ApacheSDVTests/872/



---


[GitHub] carbondata issue #1374: [CARBONDATA-1491] Dictionary_exclude columns are not...

2017-09-20 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1374
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/241/



---


[jira] [Assigned] (CARBONDATA-1489) Stabilize Array DataType Support

2017-09-20 Thread Venkata Ramana G (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venkata Ramana G reassigned CARBONDATA-1489:


Assignee: Venkata Ramana G

> Stabilize Array DataType Support
> 
>
> Key: CARBONDATA-1489
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1489
> Project: CarbonData
>  Issue Type: Bug
>  Components: core, sql
>Affects Versions: 1.1.1
>Reporter: Venkata Ramana G
>Assignee: Venkata Ramana G
>
> Stabilize Array DataType Support. This is umbrella jira to track all related 
> tasks.
> Following cases need to be handled:
> ||Sub feature||Pending activity||Remarks||
> |query, filter, NULL values, UDFs, Describe support|Test + Fix|Implementation 
> is in place; test-cases and bug fixes are needed|
> |Compaction support | Test + fix | As compaction works at byte level, no 
> changes required. Test-cases need to be added|
> |Insert into table| Develop | Source table data containing Array data needs 
> to be converted from the spark datatype to string, as carbon takes string as 
> the input row (see the sketch after this message) |
> |Support DDL for Array fields Dictionary include and Dictionary Exclude | 
> Develop | CarbonDictionaryDecoder also needs to handle the same. |
> |Support multilevel Array | Develop | Currently DDL is validated to allow 
> only 2 levels; remove this restriction|
> |Support Array value to be a measure | Develop | Currently supports only 
> dimensions |
> |Support Alter table to add and remove Array column | Develop | Implement DDL 
> and requires default value handling |
> |Projections of Array values push down to carbon | Develop | This is an 
> optimization for when a large number of values are present in the Array |
> |Filter Array values push down to carbon | Develop | This is an optimization 
> for when a large number of values are present in the Array |
> |Update Array values | Develop | Update array value|



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
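
An aside on the "Insert into table" row in the table above: a minimal sketch of the spark-datatype-to-string conversion it calls for, assuming the '$' level-1 complex delimiter used in the load examples earlier in this digest; the class and method names are invented.

{code:java}
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative only (invented names): Array data from a source table
// must reach carbon as a delimited string, since carbon takes string
// as the input row. '$' is the assumed level-1 complex delimiter.
public class ArrayToStringSketch {
  static String toCarbonInputField(List<?> array) {
    return array.stream().map(String::valueOf).collect(Collectors.joining("$"));
  }

  public static void main(String[] args) {
    System.out.println(toCarbonInputField(Arrays.asList("a", "b", "c"))); // a$b$c
  }
}
{code}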


[GitHub] carbondata issue #1374: [CARBONDATA-1491] Dictionary_exclude columns are not...

2017-09-20 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1374
  
Build Success with Spark 1.6, Please check CI 
http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/117/



---


[GitHub] carbondata issue #1374: [CARBONDATA-1491] Dictionary_exclude columns are not...

2017-09-20 Thread dhatchayani
Github user dhatchayani commented on the issue:

https://github.com/apache/carbondata/pull/1374
  
retest this please


---


[GitHub] carbondata issue #1374: [CARBONDATA-1491] Dictionary_exclude columns are not...

2017-09-20 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/1374
  
SDV Build Fail , Please check CI 
http://144.76.159.231:8080/job/ApacheSDVTests/871/



---


[GitHub] carbondata issue #1359: [CARBONDATA-1480]Min Max Index Example for DataMap

2017-09-20 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/1359
  
@sounakr can you make it simpler? Please add a datamap that can just return 
blocklet details with block + blockletId. Let's work on integration in another PR.


---


[GitHub] carbondata pull request #1359: [CARBONDATA-1480]Min Max Index Example for Da...

2017-09-20 Thread sounakr
Github user sounakr commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1359#discussion_r139890369
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datamap/dev/DataMap.java ---
@@ -42,6 +44,15 @@
*/
   List<Blocklet> prune(FilterResolverIntf filterExp);
 
+  /**
+   * Prune the datamap with blockletId. It returns the list of
+   * blocklets where these filters can exist.
+   *
+   * @param filterExp
+   * @param blockletId
+   * @return
+   */
+  List<Blocklet> pruneBlockletFromBlockId(FilterResolverIntf filterExp, int blockletId);
--- End diff --

BlockletId is the output of the Min Max DataMap, and the same is passed to 
BlockletDataMap in order to form the complete blocklet. 
Instead of declaring the method pruneBlockletFromBlockId in DataMap, it 
can be made a function local to BlockletDataMap.
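
For illustration, a rough sketch of the two-phase flow described here, with hypothetical interfaces standing in for the real DataMap types (only the idea, not the actual CarbonData APIs): a min-max index narrows the search to blocklet ids, and the blocklet index expands each id into a detailed blocklet.

{code:java}
import java.util.ArrayList;
import java.util.List;

// Hypothetical types, illustration only: a min-max index first narrows
// the search to blocklet ids, then a blocklet index expands each id
// into a detailed blocklet for the same filter.
class TwoPhasePruneSketch {
  interface MinMaxIndex {
    List<Integer> pruneToBlockletIds(Object filterExp);
  }

  interface BlockletIndex {
    Object toDetailedBlocklet(Object filterExp, int blockletId);
  }

  static List<Object> prune(MinMaxIndex minMax, BlockletIndex blocklets,
      Object filterExp) {
    List<Object> result = new ArrayList<>();
    for (int id : minMax.pruneToBlockletIds(filterExp)) {
      result.add(blocklets.toDetailedBlocklet(filterExp, id));
    }
    return result;
  }
}
{code}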



---


[GitHub] carbondata pull request #1359: [CARBONDATA-1480]Min Max Index Example for Da...

2017-09-20 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1359#discussion_r139889234
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datamap/dev/DataMap.java ---
@@ -42,6 +44,15 @@
*/
   List<Blocklet> prune(FilterResolverIntf filterExp);
 
+  /**
+   * Prune the datamap with blockletId. It returns the list of
+   * blocklets where these filters can exist.
+   *
+   * @param filterExp
+   * @param blockletId
+   * @return
+   */
+  List<Blocklet> pruneBlockletFromBlockId(FilterResolverIntf filterExp, int blockletId);
--- End diff --

what is blockletId? I don't think this method is required in datamap


---