[GitHub] [carbondata] CarbonDataQA commented on issue #3084: [CARBONDATA-3258] Add more test case for mv datamap

2019-05-31 Thread GitBox
CarbonDataQA commented on issue #3084: [CARBONDATA-3258] Add more test case for 
mv datamap
URL: https://github.com/apache/carbondata/pull/3084#issuecomment-497907568
 
 
   Build Success with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/3635/
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [carbondata] CarbonDataQA commented on issue #3084: [CARBONDATA-3258] Add more test case for mv datamap

2019-05-31 Thread GitBox
CarbonDataQA commented on issue #3084: [CARBONDATA-3258] Add more test case for 
mv datamap
URL: https://github.com/apache/carbondata/pull/3084#issuecomment-497905227
 
 
   Build Success with Spark 2.3.2, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/11699/
   




[GitHub] [carbondata] CarbonDataQA commented on issue #3084: [CARBONDATA-3258] Add more test case for mv datamap

2019-05-31 Thread GitBox
CarbonDataQA commented on issue #3084: [CARBONDATA-3258] Add more test case for 
mv datamap
URL: https://github.com/apache/carbondata/pull/3084#issuecomment-497899912
 
 
   Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/3432/
   




[GitHub] [carbondata] qiuchenjian commented on a change in pull request #3084: [CARBONDATA-3258] Add more test case for mv datamap

2019-05-31 Thread GitBox
qiuchenjian commented on a change in pull request #3084: [CARBONDATA-3258] Add 
more test case for mv datamap
URL: https://github.com/apache/carbondata/pull/3084#discussion_r289583768
 
 

 ##
 File path: datamap/mv/core/src/test/scala/org/apache/carbondata/mv/rewrite/MVValidFunctionTest.scala
 ##
 @@ -0,0 +1,70 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.carbondata.mv.rewrite
+
+import org.apache.spark.sql.Row
+import org.apache.spark.sql.test.util.QueryTest
+import org.scalatest.BeforeAndAfterAll
+
+class MVValidFunctionTest extends QueryTest with BeforeAndAfterAll {
 
 Review comment:
   done




[GitHub] [carbondata] qiuchenjian commented on a change in pull request #3084: [CARBONDATA-3258] Add more test case for mv datamap

2019-05-31 Thread GitBox
qiuchenjian commented on a change in pull request #3084: [CARBONDATA-3258] Add 
more test case for mv datamap
URL: https://github.com/apache/carbondata/pull/3084#discussion_r289583566
 
 

 ##
 File path: datamap/mv/core/src/test/scala/org/apache/carbondata/mv/rewrite/MVInvalidTestCase.scala
 ##
 @@ -0,0 +1,47 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.carbondata.mv.rewrite
+
+import org.apache.spark.sql.test.util.QueryTest
+import org.scalatest.BeforeAndAfterAll
+
+class MVInvalidTestCase  extends QueryTest with BeforeAndAfterAll {
 
 Review comment:
   I think we should put different scenarios in different classes, so it's OK as it is.




[GitHub] [carbondata] qiuchenjian commented on a change in pull request #3084: [CARBONDATA-3258] Add more test case for mv datamap

2019-05-31 Thread GitBox
qiuchenjian commented on a change in pull request #3084: [CARBONDATA-3258] Add 
more test case for mv datamap
URL: https://github.com/apache/carbondata/pull/3084#discussion_r289583498
 
 

 ##
 File path: datamap/mv/core/src/test/scala/org/apache/carbondata/mv/rewrite/MVValidFunctionTest.scala
 ##
 @@ -0,0 +1,70 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.carbondata.mv.rewrite
+
+import org.apache.spark.sql.Row
+import org.apache.spark.sql.test.util.QueryTest
+import org.scalatest.BeforeAndAfterAll
+
+class MVValidFunctionTest extends QueryTest with BeforeAndAfterAll {
+
+  override def beforeAll(): Unit = {
+    drop
+    sql("create table main_table (name string,age int,height int) stored by 'carbondata'")
+    sql("create table dim_table (name string,age int,height int) stored by 'carbondata'")
+    sql("create table sdr_table (name varchar(20),score int) stored by 'carbondata'")
+  }
+
+  def drop() {
+    sql("drop table if exists main_table")
+    sql("drop datamap if exists main_table_mv")
 
 Review comment:
   same as above




[GitHub] [carbondata] qiuchenjian commented on a change in pull request #3084: [CARBONDATA-3258] Add more test case for mv datamap

2019-05-31 Thread GitBox
qiuchenjian commented on a change in pull request #3084: [CARBONDATA-3258] Add 
more test case for mv datamap
URL: https://github.com/apache/carbondata/pull/3084#discussion_r289583399
 
 

 ##
 File path: datamap/mv/core/src/test/scala/org/apache/carbondata/mv/rewrite/MVExceptionTestCase.scala
 ##
 @@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.carbondata.mv.rewrite
+
+import org.apache.carbondata.common.exceptions.sql.MalformedDataMapCommandException
+import org.apache.spark.sql.catalyst.analysis.NoSuchTableException
+import org.apache.spark.sql.test.util.QueryTest
+import org.scalatest.BeforeAndAfterAll
+
+class MVExceptionTestCase  extends QueryTest with BeforeAndAfterAll {
+  override def beforeAll: Unit = {
+    drop()
+    sql("create table main_table (name string,age int,height int) stored by 'carbondata'")
+  }
+
+  test("test mv no base table") {
+    val ex = intercept[NoSuchTableException] {
+      sql("create datamap main_table_mv on table main_table_error using 'mv' as select sum(age),name from main_table group by name")
+    }
+    assertResult("Table or view 'main_table_error' not found in database 'default';")(ex.getMessage())
+  }
+
+  test("test mv reduplicate mv table") {
+    val ex = intercept[MalformedDataMapCommandException] {
+      sql("create datamap main_table_mv1 on table main_table using 'mv' as select sum(age),name from main_table group by name")
+      sql("create datamap main_table_mv1 on table main_table using 'mv' as select sum(age),name from main_table group by name")
+    }
+    assertResult("DataMap with name main_table_mv1 already exists in storage")(ex.getMessage)
+  }
+
+  def drop(): Unit = {
+    sql("drop table IF EXISTS main_table")
+    sql("drop table if exists main_table_error")
+    sql("drop datamap if exists main_table_mv")
 
 Review comment:
   No, it is needed: it covers the case where this datamap affects other tables, not this test case's main table.




[GitHub] [carbondata] CarbonDataQA commented on issue #3245: [CARBONDATA-3398] Handled show cache for index server and MV

2019-05-31 Thread GitBox
CarbonDataQA commented on issue #3245: [CARBONDATA-3398] Handled show cache for 
index server and MV
URL: https://github.com/apache/carbondata/pull/3245#issuecomment-497845883
 
 
   Build Failed with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/3634/
   




[GitHub] [carbondata] CarbonDataQA commented on issue #3245: [CARBONDATA-3398] Handled show cache for index server and MV

2019-05-31 Thread GitBox
CarbonDataQA commented on issue #3245: [CARBONDATA-3398] Handled show cache for 
index server and MV
URL: https://github.com/apache/carbondata/pull/3245#issuecomment-497840788
 
 
   Build Failed with Spark 2.3.2, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/11698/
   




[GitHub] [carbondata] CarbonDataQA commented on issue #3245: [CARBONDATA-3398] Handled show cache for index server and MV

2019-05-31 Thread GitBox
CarbonDataQA commented on issue #3245: [CARBONDATA-3398] Handled show cache for 
index server and MV
URL: https://github.com/apache/carbondata/pull/3245#issuecomment-497807607
 
 
   Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/3431/
   




[GitHub] [carbondata] CarbonDataQA commented on issue #3245: [CARBONDATA-3398] Handled show cache for index server and MV

2019-05-31 Thread GitBox
CarbonDataQA commented on issue #3245: [CARBONDATA-3398] Handled show cache for 
index server and MV
URL: https://github.com/apache/carbondata/pull/3245#issuecomment-497803722
 
 
   Build Success with Spark 2.3.2, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/11697/
   




[GitHub] [carbondata] CarbonDataQA commented on issue #3245: [CARBONDATA-3398] Handled show cache for index server and MV

2019-05-31 Thread GitBox
CarbonDataQA commented on issue #3245: [CARBONDATA-3398] Handled show cache for 
index server and MV
URL: https://github.com/apache/carbondata/pull/3245#issuecomment-497788764
 
 
   Build Success with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/3633/
   




[GitHub] [carbondata] CarbonDataQA commented on issue #3249: [CARBONDATA-3407]Fix distinct, count, Sum query failure when MV is created on single projection column

2019-05-31 Thread GitBox
CarbonDataQA commented on issue #3249: [CARBONDATA-3407]Fix distinct, count, 
Sum query failure when MV is created on single projection column
URL: https://github.com/apache/carbondata/pull/3249#issuecomment-497773116
 
 
   Build Success with Spark 2.3.2, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/11695/
   




[GitHub] [carbondata] CarbonDataQA commented on issue #3253: Carbondata 3410 support udfsql function for binary

2019-05-31 Thread GitBox
CarbonDataQA commented on issue #3253: Carbondata 3410 support udfsql function 
for binary
URL: https://github.com/apache/carbondata/pull/3253#issuecomment-497768304
 
 
   Build Success with Spark 2.3.2, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/11693/
   




[GitHub] [carbondata] CarbonDataQA commented on issue #3251: [CARBONDATA-3408] CarbonSession partition support binary data type

2019-05-31 Thread GitBox
CarbonDataQA commented on issue #3251: [CARBONDATA-3408] CarbonSession 
partition support binary data type
URL: https://github.com/apache/carbondata/pull/3251#issuecomment-497767402
 
 
   Build Failed with Spark 2.3.2, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/11694/
   




[GitHub] [carbondata] CarbonDataQA commented on issue #3249: [CARBONDATA-3407]Fix distinct, count, Sum query failure when MV is created on single projection column

2019-05-31 Thread GitBox
CarbonDataQA commented on issue #3249: [CARBONDATA-3407]Fix distinct, count, 
Sum query failure when MV is created on single projection column
URL: https://github.com/apache/carbondata/pull/3249#issuecomment-497765183
 
 
   Build Failed with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/3629/
   




[GitHub] [carbondata] CarbonDataQA commented on issue #3245: [CARBONDATA-3398] Handled show cache for index server and MV

2019-05-31 Thread GitBox
CarbonDataQA commented on issue #3245: [CARBONDATA-3398] Handled show cache for 
index server and MV
URL: https://github.com/apache/carbondata/pull/3245#issuecomment-497765185
 
 
   Build Failed with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/3430/
   




[GitHub] [carbondata] CarbonDataQA commented on issue #3245: [CARBONDATA-3398] Handled show cache for index server and MV

2019-05-31 Thread GitBox
CarbonDataQA commented on issue #3245: [CARBONDATA-3398] Handled show cache for 
index server and MV
URL: https://github.com/apache/carbondata/pull/3245#issuecomment-497751990
 
 
   Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/3429/
   




[GitHub] [carbondata] CarbonDataQA commented on issue #3202: [CARBONDATA-3350] Enhance custom compaction to resort old single segment by new sort_columns

2019-05-31 Thread GitBox
CarbonDataQA commented on issue #3202: [CARBONDATA-3350] Enhance custom 
compaction to resort old single segment by new sort_columns
URL: https://github.com/apache/carbondata/pull/3202#issuecomment-497747606
 
 
   Build Success with Spark 2.3.2, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/11692/
   




[GitHub] [carbondata] CarbonDataQA commented on issue #3251: [CARBONDATA-3408] CarbonSession partition support binary data type

2019-05-31 Thread GitBox
CarbonDataQA commented on issue #3251: [CARBONDATA-3408] CarbonSession 
partition support binary data type
URL: https://github.com/apache/carbondata/pull/3251#issuecomment-497747050
 
 
   Build Success with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/3631/
   




[GitHub] [carbondata] CarbonDataQA commented on issue #3253: Carbondata 3410 support udfsql function for binary

2019-05-31 Thread GitBox
CarbonDataQA commented on issue #3253: Carbondata 3410 support udfsql function 
for binary
URL: https://github.com/apache/carbondata/pull/3253#issuecomment-497745986
 
 
   Build Success with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/3630/
   




[GitHub] [carbondata] CarbonDataQA commented on issue #3245: [CARBONDATA-3398] Handled show cache for index server and MV

2019-05-31 Thread GitBox
CarbonDataQA commented on issue #3245: [CARBONDATA-3398] Handled show cache for 
index server and MV
URL: https://github.com/apache/carbondata/pull/3245#issuecomment-497737731
 
 
   Build Success with Spark 2.3.2, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/11691/
   




[GitHub] [carbondata] CarbonDataQA commented on issue #3245: [CARBONDATA-3398] Handled show cache for index server and MV

2019-05-31 Thread GitBox
CarbonDataQA commented on issue #3245: [CARBONDATA-3398] Handled show cache for 
index server and MV
URL: https://github.com/apache/carbondata/pull/3245#issuecomment-497733079
 
 
   Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/3428/
   




[GitHub] [carbondata] CarbonDataQA commented on issue #3251: [CARBONDATA-3408] CarbonSession partition support binary data type

2019-05-31 Thread GitBox
CarbonDataQA commented on issue #3251: [CARBONDATA-3408] CarbonSession 
partition support binary data type
URL: https://github.com/apache/carbondata/pull/3251#issuecomment-497730373
 
 
   Build Failed with Spark 2.3.2, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/11690/
   




[GitHub] [carbondata] CarbonDataQA commented on issue #3249: [CARBONDATA-3407]Fix distinct, count, Sum query failure when MV is created on single projection column

2019-05-31 Thread GitBox
CarbonDataQA commented on issue #3249: [CARBONDATA-3407]Fix distinct, count, 
Sum query failure when MV is created on single projection column
URL: https://github.com/apache/carbondata/pull/3249#issuecomment-497729366
 
 
   Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/3427/
   




[GitHub] [carbondata] CarbonDataQA commented on issue #3251: [CARBONDATA-3408] CarbonSession partition support binary data type

2019-05-31 Thread GitBox
CarbonDataQA commented on issue #3251: [CARBONDATA-3408] CarbonSession 
partition support binary data type
URL: https://github.com/apache/carbondata/pull/3251#issuecomment-497725968
 
 
   Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/3426/
   




[GitHub] [carbondata] CarbonDataQA commented on issue #3253: Carbondata 3410 support udfsql function for binary

2019-05-31 Thread GitBox
CarbonDataQA commented on issue #3253: Carbondata 3410 support udfsql function 
for binary
URL: https://github.com/apache/carbondata/pull/3253#issuecomment-497722645
 
 
   Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/3425/
   




[GitHub] [carbondata] CarbonDataQA commented on issue #3245: [CARBONDATA-3398] Handled show cache for index server and MV

2019-05-31 Thread GitBox
CarbonDataQA commented on issue #3245: [CARBONDATA-3398] Handled show cache for 
index server and MV
URL: https://github.com/apache/carbondata/pull/3245#issuecomment-497720633
 
 
   Build Success with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/3628/
   




[GitHub] [carbondata] CarbonDataQA commented on issue #3252: [CARBONDATA-3409] Fix Concurrent dataloading Issue with mv

2019-05-31 Thread GitBox
CarbonDataQA commented on issue #3252: [CARBONDATA-3409] Fix Concurrent 
dataloading Issue with mv
URL: https://github.com/apache/carbondata/pull/3252#issuecomment-497719424
 
 
   Build Success with Spark 2.3.2, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/11689/
   




[GitHub] [carbondata] CarbonDataQA commented on issue #3251: [CARBONDATA-3408] CarbonSession partition support binary data type

2019-05-31 Thread GitBox
CarbonDataQA commented on issue #3251: [CARBONDATA-3408] CarbonSession 
partition support binary data type
URL: https://github.com/apache/carbondata/pull/3251#issuecomment-497717751
 
 
   Build Failed with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/3627/
   




[GitHub] [carbondata] CarbonDataQA commented on issue #3202: [CARBONDATA-3350] Enhance custom compaction to resort old single segment by new sort_columns

2019-05-31 Thread GitBox
CarbonDataQA commented on issue #3202: [CARBONDATA-3350] Enhance custom 
compaction to resort old single segment by new sort_columns
URL: https://github.com/apache/carbondata/pull/3202#issuecomment-497713237
 
 
   Build Success with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/3625/
   




[GitHub] [carbondata] CarbonDataQA commented on issue #3249: [CARBONDATA-3407]Fix distinct, count, Sum query failure when MV is created on single projection column

2019-05-31 Thread GitBox
CarbonDataQA commented on issue #3249: [CARBONDATA-3407]Fix distinct, count, 
Sum query failure when MV is created on single projection column
URL: https://github.com/apache/carbondata/pull/3249#issuecomment-497709086
 
 
   Build Success with Spark 2.3.2, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/11687/
   




[GitHub] [carbondata] CarbonDataQA commented on issue #3202: [CARBONDATA-3350] Enhance custom compaction to resort old single segment by new sort_columns

2019-05-31 Thread GitBox
CarbonDataQA commented on issue #3202: [CARBONDATA-3350] Enhance custom 
compaction to resort old single segment by new sort_columns
URL: https://github.com/apache/carbondata/pull/3202#issuecomment-497705949
 
 
   Build Success with Spark 2.3.2, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/11688/
   




[GitHub] [carbondata] CarbonDataQA commented on issue #3245: [CARBONDATA-3398] Handled show cache for index server and MV

2019-05-31 Thread GitBox
CarbonDataQA commented on issue #3245: [CARBONDATA-3398] Handled show cache for 
index server and MV
URL: https://github.com/apache/carbondata/pull/3245#issuecomment-497698232
 
 
   Build Success with Spark 2.3.2, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/11686/
   




[GitHub] [carbondata] CarbonDataQA commented on issue #3252: [CARBONDATA-3409] Fix Concurrent dataloading Issue with mv

2019-05-31 Thread GitBox
CarbonDataQA commented on issue #3252: [CARBONDATA-3409] Fix Concurrent 
dataloading Issue with mv
URL: https://github.com/apache/carbondata/pull/3252#issuecomment-497697434
 
 
   Build Success with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/3626/
   




[GitHub] [carbondata] xubo245 opened a new pull request #3253: Carbondata 3410 support udfsql function for binary

2019-05-31 Thread GitBox
xubo245 opened a new pull request #3253: Carbondata 3410 support udfsql 
function for binary
URL: https://github.com/apache/carbondata/pull/3253
 
 
   Be sure to do all of the following checklist to help us incorporate 
   your contribution quickly and easily:
   
- [ ] Any interfaces changed?

- [ ] Any backward compatibility impacted?

- [ ] Document update required?
   
- [ ] Testing done
   Please provide details on 
   - Whether new unit test cases have been added or why no new tests 
are required?
   - How it is tested? Please attach test report.
   - Is it a performance related change? Please attach the performance 
test report.
   - Any additional information to help reviewers in testing this 
change.
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 
   
   




[GitHub] [carbondata] akashrn5 commented on issue #3249: [CARBONDATA-3407]Fix distinct, count, Sum query failure when MV is created on single projection column

2019-05-31 Thread GitBox
akashrn5 commented on issue #3249: [CARBONDATA-3407]Fix distinct, count, Sum 
query failure when MV is created on single projection column
URL: https://github.com/apache/carbondata/pull/3249#issuecomment-497694086
 
 
   retest this please




[GitHub] [carbondata] CarbonDataQA commented on issue #3249: [CARBONDATA-3407]Fix distinct, count, Sum query failure when MV is created on single projection column

2019-05-31 Thread GitBox
CarbonDataQA commented on issue #3249: [CARBONDATA-3407]Fix distinct, count, 
Sum query failure when MV is created on single projection column
URL: https://github.com/apache/carbondata/pull/3249#issuecomment-497693575
 
 
   Build Failed with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/3624/
   




[jira] [Commented] (CARBONDATA-3410) Add UDF, Hex/Base64 SQL functions for binary

2019-05-31 Thread xubo245 (JIRA)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-3410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16852972#comment-16852972
 ] 

xubo245 commented on CARBONDATA-3410:
-

CREATE TABLE uniqdata (
  CUST_ID int, CUST_NAME binary, ACTIVE_EMUI_VERSION string,
  DOB timestamp, DOJ timestamp,
  BIGINT_COLUMN1 bigint, BIGINT_COLUMN2 bigint,
  DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 decimal(36,10),
  Double_COLUMN1 double, Double_COLUMN2 double,
  INTEGER_COLUMN1 int)
STORED BY 'org.apache.carbondata.format'
TBLPROPERTIES('table_blocksize'='2000');

LOAD DATA inpath 'hdfs://hacluster/chetan/2000_UniqData.csv' into table uniqdata
OPTIONS('DELIMITER'=',', 'QUOTECHAR'='"', 'BAD_RECORDS_ACTION'='FORCE',
  'FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1');

A select query with the average function on a substring of the binary column is executed:

select
  max(substr(CUST_NAME,1,2)), min(substr(CUST_NAME,1,2)),
  avg(substr(CUST_NAME,1,2)), count(substr(CUST_NAME,1,2)),
  sum(substr(CUST_NAME,1,2)), variance(substr(CUST_NAME,1,2))
from uniqdata
where CUST_ID IS NULL or DOB IS NOT NULL or BIGINT_COLUMN1 = 1233720368578
  or DECIMAL_COLUMN1 = 12345678901.123458 or Double_COLUMN1 = 1.12345674897976E10
  or INTEGER_COLUMN1 IS NULL
limit 10;

select
  max(substring(CUST_NAME,1,2)), min(substring(CUST_NAME,1,2)),
  avg(substring(CUST_NAME,1,2)), count(substring(CUST_NAME,1,2)),
  sum(substring(CUST_NAME,1,2)), variance(substring(CUST_NAME,1,2))
from uniqdata
where CUST_ID IS NULL or DOB IS NOT NULL or BIGINT_COLUMN1 = 1233720368578
  or DECIMAL_COLUMN1 = 12345678901.123458 or Double_COLUMN1 = 1.12345674897976E10
  or INTEGER_COLUMN1 IS NULL
limit 10;

> Add UDF, Hex/Base64 SQL functions for binary
> 
>
> Key: CARBONDATA-3410
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3410
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: xubo245
>Assignee: xubo245
>Priority: Major
>
> Add UDF, Hex/Base64 SQL functions for binary



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (CARBONDATA-3410) Add UDF, Hex/Base64 SQL functions for binary

2019-05-31 Thread xubo245 (JIRA)
xubo245 created CARBONDATA-3410:
---

 Summary: Add UDF, Hex/Base64 SQL functions for binary
 Key: CARBONDATA-3410
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3410
 Project: CarbonData
  Issue Type: Sub-task
Reporter: xubo245
Assignee: xubo245


Add UDF, Hex/Base64 SQL functions for binary
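
For orientation, hex and Base64 are two standard text encodings of a byte array. The following is a minimal Java sketch of what each encoding produces for a small binary value. It uses only the JDK (java.util.Base64 and String.format) and is not the CarbonData UDF implementation, which this issue does not describe; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper, for illustration only; not a CarbonData API.
class BinaryTextEncodings {
    // Render each byte as two upper-case hexadecimal digits.
    static String toHex(byte[] value) {
        StringBuilder sb = new StringBuilder(value.length * 2);
        for (byte b : value) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Standard (RFC 4648) Base64 with padding.
    static String toBase64(byte[] value) {
        return Base64.getEncoder().encodeToString(value);
    }

    public static void main(String[] args) {
        byte[] value = "abc".getBytes(StandardCharsets.US_ASCII);
        System.out.println(toHex(value));     // 616263
        System.out.println(toBase64(value));  // YWJj
    }
}
```

Hex doubles the size of the value and Base64 grows it by about one third, which is one reason a query might prefer one decoding over the other for display.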





[GitHub] [carbondata] CarbonDataQA commented on issue #3202: [CARBONDATA-3350] Enhance custom compaction to resort old single segment by new sort_columns

2019-05-31 Thread GitBox
CarbonDataQA commented on issue #3202: [CARBONDATA-3350] Enhance custom 
compaction to resort old single segment by new sort_columns
URL: https://github.com/apache/carbondata/pull/3202#issuecomment-497692048
 
 
   Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/3424/
   




[jira] [Resolved] (CARBONDATA-3351) Support Binary Data Type

2019-05-31 Thread xubo245 (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xubo245 resolved CARBONDATA-3351.
-
Resolution: Fixed

> Support Binary Data Type
> 
>
> Key: CARBONDATA-3351
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3351
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: xubo245
>Assignee: xubo245
>Priority: Major
>  Time Spent: 35h
>  Remaining Estimate: 0h
>
> Background :
> Binary is a basic data type that is widely used in various scenarios, so it is 
> better to support the binary data type in CarbonData. Downloading data from S3 
> is slow when a dataset contains many small pieces of binary data. Most 
> application scenarios involve storing small binary data in CarbonData, which 
> avoids the small-binary-files problem, speeds up S3 access, and reduces the 
> cost of accessing OBS by reducing the number of S3 API calls. Storing 
> structured data and unstructured (binary) data together in CarbonData also 
> makes them easier to manage. 
> Goals:
> 1. Supporting write binary data type by Carbon Java SDK.
> 2. Supporting read binary data type by Spark Carbon file format(carbon 
> datasource) and CarbonSession.
> 3. Supporting read binary data type by Carbon SDK
> 4. Supporting write binary by spark
> Approach and Detail:
>   1.Supporting write binary data type by Carbon Java SDK [Formal]:
>   1.1 Java SDK needs support write data with specific data types, 
> like int, double, byte[ ] data type, no need to convert all data type to 
> string array. User read binary file as byte[], then SDK writes byte[] into 
> binary column.=>Done
>   1.2 CarbonData compress binary column because now the compressor is 
> table level.=>Done
>   1.3 CarbonData stores binary as dimension. => Done
>   1.4 Support configure page size for binary data type because binary 
> data usually is big, such as 200k. Otherwise it will be very big for one 
> blocklet (32000 rows). =>Done
>   1.5 Avro, JSON convert need consider
>   •   AVRO fixed and variable length binary can be supported
>   => Avro don't support binary data type => No 
> need
>Support read binary from JSON  => done.
>   1.6 Binary data type as a child column in Struct, Map
>    => support it in the future, but the priority is not very 
> high; not in 1.5.4
>   1.7 Verify the maximum supported size of a binary value
> => snappy only supports about 1.71 G; the max data size should be 2 GB, 
> but this needs confirmation
>   
>   2. Supporting read and manage binary data type by Spark Carbon file 
> format(carbon DataSource) and CarbonSession.[Formal]
>   2.1 Supporting read binary data type from non-transaction table, 
> read binary column and return as byte[] =>Done
>   2.2 Support create table with binary column, table property doesn’t 
> support sort_columns, dictionary, COLUMN_META_CACHE, RANGE_COLUMN for binary 
> column => Done
>=> CARBON Datasource don't support dictionary include column
>=>support  carbon.column.compressor= snappy,zstd,gzip for binary, 
> compress is for all columns(table level)
>   2.3 Support CTAS for binary=> transaction/non-transaction,  
> Carbon/Hive/Parquet => Done 
>   2.4 Support external table for binary=> Done
>   2.5 Support projection for binary column=> Done
>   2.6 Support desc formatted=> Done
>=> Carbon Datasource don't support  ALTER TABLE add 
> columns sql
>support  ALTER TABLE for(add column, rename, drop column) 
> binary data type in carbon session=> Done
>Don't support change the data type for binary by alter 
> table => Done
>   2.7 Don’t support PARTITION, BUCKETCOLUMNS  for binary  => Done
>   2.8 Support compaction for binary=> Done
>   2.9 datamap? Don’t support bloomfilter, lucene, timeseries datamap, 
>  no need min max datamap for binary, support mv and pre-aggregate in the 
> future=> TODO
>   2.10 CSDK / python SDK support binary in the future.=> TODO
>   2.11 Support S3=> Done
> 2.12 support UDF, hex, base64, cast:.=> TODO
>select hex(bin) from carbon_table..=> TODO
> 
> 2.15 support filter for binary => Done
> 2.16 select CAST(s AS BINARY) from carbon_table. => Done
>   3. Supporting read binary data type by Carbon SDK
>   3.1 Supporting read binary data type from non-transaction table, 
> read binary column and return as byte[]=> Done
>   3.2 Supporting projection for binary column=> Done
>  
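
Item 1.4 above can be made concrete with a little arithmetic. The sketch below assumes that "200k" means roughly 200,000 bytes per value, which is an interpretation and not something the issue states. Under that assumption, 32000 rows of such values hold about 6.4 GB of raw binary data, well beyond what a single Java byte[] can address. The class and method names are hypothetical.

```java
// Hypothetical helper, for illustration only; not a CarbonData API.
class BinaryPageSizeEstimate {
    // Raw (uncompressed) bytes held by `rows` values of `bytesPerValue` bytes each.
    static long rawBytes(long rows, long bytesPerValue) {
        return Math.multiplyExact(rows, bytesPerValue);
    }

    public static void main(String[] args) {
        long rows = 32_000L;            // row count quoted in item 1.4
        long bytesPerValue = 200_000L;  // assumption: "200k" read as 200,000 bytes
        long total = rawBytes(rows, bytesPerValue);
        System.out.println(total);                      // 6400000000
        System.out.println(total > Integer.MAX_VALUE);  // true
    }
}
```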

[GitHub] [carbondata] CarbonDataQA commented on issue #3249: [CARBONDATA-3407]Fix distinct, count, Sum query failure when MV is created on single projection column

2019-05-31 Thread GitBox
CarbonDataQA commented on issue #3249: [CARBONDATA-3407]Fix distinct, count, 
Sum query failure when MV is created on single projection column
URL: https://github.com/apache/carbondata/pull/3249#issuecomment-497689460
 
 
   Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/3423/
   




[GitHub] [carbondata] CarbonDataQA commented on issue #3251: [CARBONDATA-3408] CarbonSession partition support binary data type

2019-05-31 Thread GitBox
CarbonDataQA commented on issue #3251: [CARBONDATA-3408] CarbonSession 
partition support binary data type
URL: https://github.com/apache/carbondata/pull/3251#issuecomment-497686992
 
 
   Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/3422/
   




[GitHub] [carbondata] CarbonDataQA commented on issue #3252: [CARBONDATA-3409] Fix Concurrent dataloading Issue with mv

2019-05-31 Thread GitBox
CarbonDataQA commented on issue #3252: [CARBONDATA-3409] Fix Concurrent 
dataloading Issue with mv
URL: https://github.com/apache/carbondata/pull/3252#issuecomment-497684524
 
 
   Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/3421/
   




[GitHub] [carbondata] CarbonDataQA commented on issue #3249: [CARBONDATA-3407]Fix distinct, count, Sum query failure when MV is created on single projection column

2019-05-31 Thread GitBox
CarbonDataQA commented on issue #3249: [CARBONDATA-3407]Fix distinct, count, 
Sum query failure when MV is created on single projection column
URL: https://github.com/apache/carbondata/pull/3249#issuecomment-497682090
 
 
   Build Success with Spark 2.3.2, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/11681/
   




[GitHub] [carbondata] CarbonDataQA commented on issue #3202: [CARBONDATA-3350] Enhance custom compaction to resort old single segment by new sort_columns

2019-05-31 Thread GitBox
CarbonDataQA commented on issue #3202: [CARBONDATA-3350] Enhance custom 
compaction to resort old single segment by new sort_columns
URL: https://github.com/apache/carbondata/pull/3202#issuecomment-497680694
 
 
   Build Success with Spark 2.3.2, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/11683/
   




[GitHub] [carbondata] CarbonDataQA commented on issue #3251: [CARBONDATA-3408] CarbonSession partition support binary data type

2019-05-31 Thread GitBox
CarbonDataQA commented on issue #3251: [CARBONDATA-3408] CarbonSession 
partition support binary data type
URL: https://github.com/apache/carbondata/pull/3251#issuecomment-497679861
 
 
   Build Failed  with Spark 2.3.2, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/11685/
   




[GitHub] [carbondata] CarbonDataQA commented on issue #3245: [CARBONDATA-3398] Handled show cache for index server and MV

2019-05-31 Thread GitBox
CarbonDataQA commented on issue #3245: [CARBONDATA-3398] Handled show cache for 
index server and MV
URL: https://github.com/apache/carbondata/pull/3245#issuecomment-497678953
 
 
   Build Failed with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/3623/
   




[jira] [Updated] (CARBONDATA-3336) Support Binary Data Type

2019-05-31 Thread xubo245 (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xubo245 updated CARBONDATA-3336:

Description: 
CarbonData supports binary data type



Version  Changes                                    Owner  Date
0.1      Init doc for Supporting binary data type   Xubo   2019-4-10

Background :
Binary is a basic data type that is widely used in various scenarios, so it is 
better to support the binary data type in CarbonData. Downloading data from S3 
is slow when a dataset contains many small pieces of binary data. Most 
application scenarios involve storing small binary data in CarbonData, which 
avoids the small-binary-files problem, speeds up S3 access, and reduces the 
cost of accessing OBS by reducing the number of S3 API calls. Storing 
structured data and unstructured (binary) data together in CarbonData also 
makes them easier to manage. 

Goals:
1. Supporting write binary data type by Carbon Java SDK.
2. Supporting read binary data type by Spark Carbon file format(carbon 
datasource) and CarbonSession.
3. Supporting read binary data type by Carbon SDK
4. Supporting write binary by spark


Approach and Detail:
1.Supporting write binary data type by Carbon Java SDK [Formal]:
1.1 Java SDK needs support write data with specific data types, 
like int, double, byte[ ] data type, no need to convert all data type to string 
array. User read binary file as byte[], then SDK writes byte[] into binary 
column.=>Done
1.2 CarbonData compress binary column because now the compressor is 
table level.=>Done
=>TODO, support configuration for compress  and no compress, 
default no compress because binary usually is already compressed, like jpg 
format image. So no need to uncompress for binary column. 1.5.4 will support 
column level compression, after that, we can implement no compress for binary. 
We can talk with community.
1.3 CarbonData stores binary as dimension. => Done
1.4 Support configure page size for binary data type because binary 
data usually is big, such as 200k. Otherwise it will be very big for one 
blocklet (32000 rows). =>Done
1.5 Avro, JSON convert need consider
•   AVRO fixed and variable length binary can be supported
=> Avro don't support binary data type => No 
need
 Support read binary from JSON  => done.
1.6 Binary data type as a child column in Struct, Map
 => support it in the future, but the priority is not very 
high; not in 1.5.4
1.7 Verify the maximum supported size of a binary value
=> snappy only supports about 1.71 G; the max data size should be 2 GB, but 
this needs confirmation


2. Supporting read and manage binary data type by Spark Carbon file 
format(carbon DataSource) and CarbonSession.[Formal]
2.1 Supporting read binary data type from non-transaction table, 
read binary column and return as byte[] =>Done
2.2 Support create table with binary column, table property doesn’t 
support sort_columns, dictionary, COLUMN_META_CACHE, RANGE_COLUMN for binary 
column => Done
   => CARBON Datasource don't support dictionary include column
   =>support  carbon.column.compressor= snappy,zstd,gzip for binary, 
compress is for all columns(table level)
2.3 Support CTAS for binary=> transaction/non-transaction,  
Carbon/Hive/Parquet => Done 
2.4 Support external table for binary=> Done
2.5 Support projection for binary column=> Done
2.6 Support desc formatted=> Done
   => Carbon Datasource don't support  ALTER TABLE add columns 
sql
   support  ALTER TABLE for(add column, rename, drop column) 
binary data type in carbon session=> Done
   Don't support change the data type for binary by alter table 
=> Done
2.7 Don’t support BUCKETCOLUMNS  for binary => Done
2.8 Support compaction for binary=> Done
2.9 datamap
Support bloomfilter,mv and pre-aggregate
Don’t support lucene, timeseries datamap,  no need min max 
datamap for binary
=>Done
2.10 CSDK / python SDK support binary in the future.=> TODO, python 
sdk already merge to pycarbon
2.11 Support S3=> Done
2.12 support UDF, hex, base64, cast:.=> TODO
   select hex(bin) from carbon_table..=> TODO
  
2.13 support configurable decode for query, support base64 and Hex 
decode.=> Done
2.15 How big data size binary data type can support for writing and 
reading?=> TODO
2.16 support filter for binary => Done
2.17 select CAST(s AS BINARY) from carbon_table. => 
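
The TODO under item 1.2 argues that binary values such as JPEG images are usually already compressed, so compressing them again gains little. A minimal sketch of that effect follows. It uses the JDK's gzip stream rather than CarbonData's column compressor, and seeded pseudo-random bytes as a stand-in for already-compressed content; both are assumptions made for illustration, and the class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Random;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, for illustration only; not a CarbonData API.
class RecompressionSketch {
    // Size in bytes of the gzip-compressed form of `input`.
    static int gzipSize(byte[] input) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
            gzip.write(input);
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        return out.size();
    }

    // High-entropy bytes, standing in for already-compressed content such as a JPEG.
    static byte[] highEntropy(int length) {
        byte[] data = new byte[length];
        new Random(42L).nextBytes(data);
        return data;
    }

    public static void main(String[] args) {
        int length = 64 * 1024;
        byte[] zeros = new byte[length];  // highly compressible input
        System.out.println("zeros:        " + gzipSize(zeros));
        System.out.println("high entropy: " + gzipSize(highEntropy(length)));
    }
}
```

The all-zero input shrinks to a tiny fraction of its size, while the high-entropy input stays at about its original size, so the CPU spent compressing and decompressing it buys essentially nothing.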

[GitHub] [carbondata] xubo245 commented on issue #3251: [CARBONDATA-3408] CarbonSession partition support binary data type

2019-05-31 Thread GitBox
xubo245 commented on issue #3251: [CARBONDATA-3408] CarbonSession partition 
support binary data type
URL: https://github.com/apache/carbondata/pull/3251#issuecomment-497677622
 
 
   retest this please




[GitHub] [carbondata] CarbonDataQA commented on issue #3245: [CARBONDATA-3398] Handled show cache for index server and MV

2019-05-31 Thread GitBox
CarbonDataQA commented on issue #3245: [CARBONDATA-3398] Handled show cache for 
index server and MV
URL: https://github.com/apache/carbondata/pull/3245#issuecomment-497675051
 
 
   Build Success with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/3618/
   




[GitHub] [carbondata] CarbonDataQA commented on issue #3251: [CARBONDATA-3408] CarbonSession partition support binary data type

2019-05-31 Thread GitBox
CarbonDataQA commented on issue #3251: [CARBONDATA-3408] CarbonSession 
partition support binary data type
URL: https://github.com/apache/carbondata/pull/3251#issuecomment-497674393
 
 
   Build Failed with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/3621/
   




[GitHub] [carbondata] Indhumathi27 opened a new pull request #3252: [CARBONDATA-3409] Fix Concurrent dataloading Issue with mv

2019-05-31 Thread GitBox
Indhumathi27 opened a new pull request #3252: [CARBONDATA-3409] Fix Concurrent 
dataloading Issue with mv
URL: https://github.com/apache/carbondata/pull/3252
 
 
   Problem:
   During concurrent data loading to an MV datamap, if any load could not 
acquire the TableStatusLock, newLoadName and segmentMap were empty, so that 
load performed a full rebuild.
   
   Solution:
   If a load cannot acquire the TableStatusLock, disable the datamap and 
return.
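
The control flow described in the solution can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in this pull request: an AtomicBoolean stands in for the table status lock, and every class, field, and method name here is hypothetical.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for an MV datamap load; none of these names are CarbonData APIs.
class MvLoadSketch {
    // Stand-in for the table status lock: true while some load holds it.
    final AtomicBoolean tableStatusLock = new AtomicBoolean(false);
    volatile boolean dataMapEnabled = true;
    final AtomicInteger loadsApplied = new AtomicInteger();

    // Returns true if this load was applied, false if the datamap was disabled instead.
    boolean load() {
        // Try to take the lock without blocking.
        if (!tableStatusLock.compareAndSet(false, true)) {
            // Lock not acquired: rather than continuing with empty load
            // information (which led to a full rebuild), disable and return.
            dataMapEnabled = false;
            return false;
        }
        try {
            loadsApplied.incrementAndGet();  // placeholder for the real load work
            return true;
        } finally {
            tableStatusLock.set(false);      // always release the lock
        }
    }

    public static void main(String[] args) {
        MvLoadSketch mv = new MvLoadSketch();
        System.out.println(mv.load());          // true
        mv.tableStatusLock.set(true);           // simulate a concurrent load holding the lock
        System.out.println(mv.load());          // false
        System.out.println(mv.dataMapEnabled);  // false
    }
}
```

The point of the early return is that the losing load never reaches the rebuild path; the disabled datamap can be brought back in sync by a later load or rebuild.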
   
- [ ] Any interfaces changed?

- [ ] Any backward compatibility impacted?

- [ ] Document update required?
   
- [ ] Testing done
   manually tested
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 
   
   




[GitHub] [carbondata] CarbonDataQA commented on issue #3245: [CARBONDATA-3398] Handled show cache for index server and MV

2019-05-31 Thread GitBox
CarbonDataQA commented on issue #3245: [CARBONDATA-3398] Handled show cache for 
index server and MV
URL: https://github.com/apache/carbondata/pull/3245#issuecomment-497672940
 
 
   Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/3420/
   




[GitHub] [carbondata] ajantha-bhat commented on issue #3202: [CARBONDATA-3350] Enhance custom compaction to resort old single segment by new sort_columns

2019-05-31 Thread GitBox
ajantha-bhat commented on issue #3202: [CARBONDATA-3350] Enhance custom 
compaction to resort old single segment by new sort_columns
URL: https://github.com/apache/carbondata/pull/3202#issuecomment-497672139
 
 
   retest this please




[GitHub] [carbondata] ravipesala commented on issue #3245: [CARBONDATA-3398] Handled show cache for index server and MV

2019-05-31 Thread GitBox
ravipesala commented on issue #3245: [CARBONDATA-3398] Handled show cache for 
index server and MV
URL: https://github.com/apache/carbondata/pull/3245#issuecomment-497670305
 
 
   LGTM




[GitHub] [carbondata] CarbonDataQA commented on issue #3245: [CARBONDATA-3398] Handled show cache for index server and MV

2019-05-31 Thread GitBox
CarbonDataQA commented on issue #3245: [CARBONDATA-3398] Handled show cache for 
index server and MV
URL: https://github.com/apache/carbondata/pull/3245#issuecomment-497670219
 
 
   Build Success with Spark 2.3.2, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/11682/
   




[GitHub] [carbondata] asfgit closed pull request #3247: [CARBONDATA-3405] Fix getSplits() should clear the cache in SDK

2019-05-31 Thread GitBox
asfgit closed pull request #3247: [CARBONDATA-3405] Fix getSplits() should 
clear the cache in SDK
URL: https://github.com/apache/carbondata/pull/3247
 
 
   




[jira] [Resolved] (CARBONDATA-3405) SDK reader getSplits() must clear the cache.

2019-05-31 Thread Kunal Kapoor (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor resolved CARBONDATA-3405.
--
   Resolution: Fixed
Fix Version/s: 1.6.0

> SDK reader getSplits() must clear the cache. 
> -
>
> Key: CARBONDATA-3405
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3405
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Ajantha Bhat
>Priority: Minor
> Fix For: 1.6.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> a. The cache key is not filled in by the SDK reader; it always has a null 
> table name. Fill it in.
> b. Clear the cache after the splits are obtained in getSplits().
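
The two points above can be sketched in a few lines. This is an illustration under stated assumptions only: a map keyed by table name stands in for the SDK's cache, and all names are hypothetical rather than taken from the CarbonData SDK.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for an SDK reader; none of these names are CarbonData APIs.
class SplitCacheSketch {
    // Stand-in for the index cache, keyed by table name.
    final Map<String, List<String>> cache = new ConcurrentHashMap<>();

    List<String> getSplits(String tableName) {
        // (a) The cache key must carry a real table name, never null.
        Objects.requireNonNull(tableName, "tableName");
        try {
            List<String> index = cache.computeIfAbsent(tableName, SplitCacheSketch::loadIndex);
            return new ArrayList<>(index);  // copy, so the result outlives the cache entry
        } finally {
            // (b) Clear the entry once the splits have been obtained.
            cache.remove(tableName);
        }
    }

    // Placeholder for reading index information from storage.
    static List<String> loadIndex(String tableName) {
        return Arrays.asList(tableName + "/split-0", tableName + "/split-1");
    }

    public static void main(String[] args) {
        SplitCacheSketch reader = new SplitCacheSketch();
        System.out.println(reader.getSplits("t1").size());  // 2
        System.out.println(reader.cache.isEmpty());         // true
    }
}
```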





[GitHub] [carbondata] ajantha-bhat commented on issue #3202: [CARBONDATA-3350] Enhance custom compaction to resort old single segment by new sort_columns

2019-05-31 Thread GitBox
ajantha-bhat commented on issue #3202: [CARBONDATA-3350] Enhance custom 
compaction to resort old single segment by new sort_columns
URL: https://github.com/apache/carbondata/pull/3202#issuecomment-497668660
 
 
   @QiangCai: the build has failed, please check




[GitHub] [carbondata] akashrn5 commented on issue #3249: [CARBONDATA-3407]Fix distinct, count, Sum query failure when MV is created on single projection column

2019-05-31 Thread GitBox
akashrn5 commented on issue #3249: [CARBONDATA-3407]Fix distinct, count, Sum 
query failure when MV is created on single projection column
URL: https://github.com/apache/carbondata/pull/3249#issuecomment-497663076
 
 
   > @akashrn5 Please rebase
   
   done




[GitHub] [carbondata] ravipesala commented on issue #3249: [CARBONDATA-3407]Fix distinct, count, Sum query failure when MV is created on single projection column

2019-05-31 Thread GitBox
ravipesala commented on issue #3249: [CARBONDATA-3407]Fix distinct, count, Sum 
query failure when MV is created on single projection column
URL: https://github.com/apache/carbondata/pull/3249#issuecomment-497660661
 
 
   @akashrn5 Please rebase




[GitHub] [carbondata] ravipesala commented on issue #3249: [CARBONDATA-3407]Fix distinct, count, Sum query failure when MV is created on single projection column

2019-05-31 Thread GitBox
ravipesala commented on issue #3249: [CARBONDATA-3407]Fix distinct, count, Sum 
query failure when MV is created on single projection column
URL: https://github.com/apache/carbondata/pull/3249#issuecomment-497660519
 
 
   LGTM




[jira] [Created] (CARBONDATA-3409) Fix Concurrent dataloading Issue with mv

2019-05-31 Thread Indhumathi Muthumurugesh (JIRA)
Indhumathi Muthumurugesh created CARBONDATA-3409:


 Summary: Fix Concurrent dataloading Issue with mv
 Key: CARBONDATA-3409
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3409
 Project: CarbonData
  Issue Type: Sub-task
Reporter: Indhumathi Muthumurugesh








[GitHub] [carbondata] kunal642 commented on a change in pull request #3245: [CARBONDATA-3398] Handled show cache for index server and MV

2019-05-31 Thread GitBox
kunal642 commented on a change in pull request #3245: [CARBONDATA-3398] Handled 
show cache for index server and MV
URL: https://github.com/apache/carbondata/pull/3245#discussion_r289334912
 
 

 ##
 File path: 
integration/spark2/src/main/scala/org/apache/carbondata/indexserver/DistributedShowCacheRDD.scala
 ##
 @@ -43,15 +44,25 @@ class DistributedShowCacheRDD(@transient private val ss: SparkSession, tableName
 
   override def internalCompute(split: Partition, context: TaskContext): Iterator[String] = {
     val dataMaps = DataMapStoreManager.getInstance().getAllDataMaps.asScala
+    val tableList = tableName.split(",")
     val iterator = dataMaps.collect {
       case (table, tableDataMaps) if table.isEmpty ||
-                                     (tableName.nonEmpty && tableName.equalsIgnoreCase(table)) =>
+                                     (tableName.nonEmpty && tableList.contains(table)) =>
         val sizeAndIndexLengths = tableDataMaps.asScala
-          .map(_.getBlockletDetailsFetcher.getCacheSize)
-        // return tableName_indexFileLength_indexCachesize for each executor.
-        sizeAndIndexLengths.map {
-          x => s"$table:$x"
-        }
+          .map { dataMap =>
+            if (!dataMap.getDataMapSchema.getProviderName
+              .equals(DataMapClassProvider.BLOOMFILTER.getShortName)) {
 
 Review comment:
   changed the condition to check if dataMapName is null (for BlockletDataMap) 
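
   For readers following the diff above: the table-name guard changes from a
   single case-insensitive comparison to membership in a comma-split list. A
   minimal Java sketch of that matching logic follows. The class and method
   names (TableNameFilter, matches) are hypothetical and not part of
   CarbonData, and the sketch is in Java rather than the Scala of the patch.
   One thing it makes visible: a list membership test is case-sensitive,
   unlike the equalsIgnoreCase call it replaces.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of CarbonData: mirrors the guard shown in the diff.
class TableNameFilter {

    // True when `table` is empty, or when `requested` is non-empty and
    // `table` is one of its comma-separated entries (exact, case-sensitive match).
    public static boolean matches(String requested, String table) {
        List<String> tableList = Arrays.asList(requested.split(","));
        return table.isEmpty() || (!requested.isEmpty() && tableList.contains(table));
    }

    public static void main(String[] args) {
        System.out.println(matches("db.t1,db.t2", "db.t2")); // exact entry in the list
        System.out.println(matches("db.t1,db.t2", "DB.T2")); // differs only by case
        System.out.println(matches("db.t1", "db.t3"));       // not requested
    }
}
```

   If case-insensitive matching is still wanted, both sides could be
   lower-cased before the membership test; whether the patch intends that is
   not stated in this thread.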




[GitHub] [carbondata] kunal642 commented on a change in pull request #3245: [CARBONDATA-3398] Handled show cache for index server and MV

2019-05-31 Thread GitBox
kunal642 commented on a change in pull request #3245: [CARBONDATA-3398] Handled 
show cache for index server and MV
URL: https://github.com/apache/carbondata/pull/3245#discussion_r289334935
 
 

 ##
 File path: 
integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/datamap/CGDataMapTestCase.scala
 ##
 @@ -559,4 +559,6 @@ class CGDataMapTestCase extends QueryTest with BeforeAndAfterAll {
 CarbonCommonConstants.ENABLE_QUERY_STATISTICS_DEFAULT)
   }
 
+
+
 
 Review comment:
   removed




[GitHub] [carbondata] kunal642 commented on a change in pull request #3245: [CARBONDATA-3398] Handled show cache for index server and MV

2019-05-31 Thread GitBox
kunal642 commented on a change in pull request #3245: [CARBONDATA-3398] Handled 
show cache for index server and MV
URL: https://github.com/apache/carbondata/pull/3245#discussion_r289334957
 
 

 ##
 File path: 
datamap/lucene/src/main/java/org/apache/carbondata/datamap/lucene/LuceneFineGrainDataMapFactory.java
 ##
 @@ -112,4 +112,5 @@ public DataMapLevel getDataMapLevel() {
 return false;
 }
   }
+
 
 Review comment:
   reverted




[GitHub] [carbondata] CarbonDataQA commented on issue #3202: [CARBONDATA-3350] Enhance custom compaction to resort old single segment by new sort_columns

2019-05-31 Thread GitBox
CarbonDataQA commented on issue #3202: [CARBONDATA-3350] Enhance custom 
compaction to resort old single segment by new sort_columns
URL: https://github.com/apache/carbondata/pull/3202#issuecomment-497657356
 
 
   Build Failed with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/3619/
   




[GitHub] [carbondata] CarbonDataQA commented on issue #3245: [CARBONDATA-3398] Handled show cache for index server and MV

2019-05-31 Thread GitBox
CarbonDataQA commented on issue #3245: [CARBONDATA-3398] Handled show cache for 
index server and MV
URL: https://github.com/apache/carbondata/pull/3245#issuecomment-497655508
 
 
   Build Failed  with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/3419/
   




[jira] [Resolved] (CARBONDATA-3403) MV is not working for like and filter AND and OR queries

2019-05-31 Thread Ravindra Pesala (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravindra Pesala resolved CARBONDATA-3403.
-
   Resolution: Fixed
Fix Version/s: 1.6.0

> MV is not working for like and filter AND and OR queries
> 
>
> Key: CARBONDATA-3403
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3403
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Akash R Nilugal
>Priority: Minor
> Fix For: 1.6.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> MV is not working for like and filter AND and OR queries
>  
> Steps:
> create table brinjal (imei string,AMSize string,channelsId 
> string,ActiveCountry string, Activecity string,gamePointId 
> double,deviceInformationId double,productionDate Timestamp,deliveryDate 
> timestamp,deliverycharge double) STORED BY 'org.apache.carbondata.format' ;
>  
> create datamap brinjal_mv_tab_nlz_aa016 on table brinjal using 'mv' as select 
> imei,AMSize,channelsId from brinjal where ActiveCountry NOT LIKE 'US' group 
> by imei,AMSize,channelsId;
> create datamap brinjal_mv_tab_nlz_aa018 on table brinjal using 'mv' as select 
> imei,AMSize,channelsId,ActiveCountry from brinjal where ActiveCountry 
> ='Chinese' or channelsId =4 group by imei,AMSize,channelsId,ActiveCountry;
>  
> then 
> select imei,AMSize,channelsId from brinjal where ActiveCountry NOT LIKE 'US' 
> group by imei,AMSize,channelsId; and 
>   select imei,AMSize,channelsId,ActiveCountry from brinjal where 
> ActiveCountry ='Chinese' or channelsId =4 group by 
> imei,AMSize,channelsId,ActiveCountry;
> are not hitting the datamap created





[GitHub] [carbondata] CarbonDataQA commented on issue #3251: [CARBONDATA-3408] CarbonSession partition support binary data type

2019-05-31 Thread GitBox
CarbonDataQA commented on issue #3251: [CARBONDATA-3408] CarbonSession 
partition support binary data type
URL: https://github.com/apache/carbondata/pull/3251#issuecomment-497654952
 
 
   Build Failed  with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/3418/
   




[GitHub] [carbondata] asfgit closed pull request #3242: [CARBONDATA-3403]Fix MV is not working for like and filter AND and OR queries

2019-05-31 Thread GitBox
asfgit closed pull request #3242: [CARBONDATA-3403]Fix MV is not working for 
like and filter AND and OR queries
URL: https://github.com/apache/carbondata/pull/3242
 
 
   




[GitHub] [carbondata] CarbonDataQA commented on issue #3249: [CARBONDATA-3407]Fix distinct, count, Sum query failure when MV is created on single projection column

2019-05-31 Thread GitBox
CarbonDataQA commented on issue #3249: [CARBONDATA-3407]Fix distinct, count, 
Sum query failure when MV is created on single projection column
URL: https://github.com/apache/carbondata/pull/3249#issuecomment-497654447
 
 
   Build Success with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/3617/
   




[jira] [Updated] (CARBONDATA-3336) Support Binary Data Type

2019-05-31 Thread xubo245 (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xubo245 updated CARBONDATA-3336:

Description: 
CarbonData supports binary data type



Version | Changes                                  | Owner | Date
0.1     | Init doc for Supporting binary data type | Xubo  | 2019-4-10

Background:
Binary is a basic data type that is widely used in many scenarios, so it is 
better to support it in CarbonData. Downloading data from S3 is slow when a 
dataset contains many small binary objects. Most application scenarios involve 
storing small binary data in CarbonData, which avoids the small binary files 
problem, speeds up S3 access, and reduces the cost of accessing OBS by lowering 
the number of S3 API calls. Storing structured data and unstructured (binary) 
data together in CarbonData also makes both easier to manage.

Goals:
1. Support writing the binary data type with the Carbon Java SDK.
2. Support reading the binary data type with the Spark Carbon file format 
(carbon datasource) and CarbonSession.
3. Support reading the binary data type with the Carbon SDK.
4. Support writing binary data with Spark.


Approach and Detail:
1. Supporting write binary data type by Carbon Java SDK [Formal]:
1.1 The Java SDK needs to support writing data with specific data types, such 
as int, double, and byte[], without converting every data type to a string 
array. The user reads a binary file as byte[], then the SDK writes the byte[] 
into the binary column. => Done
1.2 CarbonData compresses the binary column because the compressor is currently 
table level. => Done
    => TODO: support configuration for compress and no compress, defaulting to 
no compress because binary data is usually already compressed, like JPG 
images, so there is no need to decompress the binary column. 1.5.4 will support 
column-level compression; after that, we can implement no compress for binary. 
We can discuss this with the community.
1.3 CarbonData stores binary as a dimension. => Done
1.4 Support configuring the page size for the binary data type, because binary 
values are usually big, such as 200 KB. Otherwise one blocklet (32000 rows) 
would be very big. => Done
1.5 Avro and JSON conversion need to be considered
    • AVRO fixed and variable length binary can be supported
      => Avro doesn't support binary data type => No need
    • Support reading binary from JSON => Done
1.6 Binary data type as a child column in Struct, Map
    => support it in the future, but the priority is not very high; not in 1.5.4
1.7 Verify the maximum size of the binary value supported
    => snappy only supports about 1.71 GB; the max data size should be 2 GB, 
but this needs confirmation
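
Item 1.4 can be made concrete with a little arithmetic. The sketch below takes 
the 200 KB value size and the 32000-row blocklet mentioned above and estimates 
the uncompressed size of one blocklet's binary column, plus how many rows fit 
under a page-size budget. The 1 MB budget and the names BinaryPageSizing, 
blockletBytes, and rowsPerPage are illustrative assumptions, not CarbonData 
settings or APIs.

```java
// Illustrative arithmetic only; the names and the 1 MB page budget are assumptions.
class BinaryPageSizing {

    // Uncompressed bytes needed for `rows` values of `valueBytes` each.
    public static long blockletBytes(long rows, long valueBytes) {
        return rows * valueBytes;
    }

    // Rows that fit in a page of `pageBytes`, keeping at least one row per page.
    public static long rowsPerPage(long pageBytes, long valueBytes) {
        return Math.max(1, pageBytes / valueBytes);
    }

    public static void main(String[] args) {
        long valueBytes = 200L * 1024;                  // 200 KB per binary value
        long total = blockletBytes(32_000, valueBytes); // 32000 rows in one blocklet
        System.out.println(total);                      // 6553600000 bytes, about 6.1 GiB
        System.out.println(rowsPerPage(1024L * 1024, valueBytes)); // 5 rows in a 1 MB page
    }
}
```

At roughly 6.1 GiB of raw values per blocklet, a fixed row count per page is 
clearly impractical for large binary values, which is the motivation for making 
the page size configurable.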


2. Supporting read and manage binary data type by Spark Carbon file format 
(carbon DataSource) and CarbonSession. [Formal]
2.1 Support reading the binary data type from a non-transactional table: read 
the binary column and return it as byte[] => Done
2.2 Support create table with a binary column; the table properties 
sort_columns, dictionary, COLUMN_META_CACHE, and RANGE_COLUMN are not supported 
for a binary column => Done
    => Carbon Datasource doesn't support dictionary include columns
    => support carbon.column.compressor = snappy, zstd, gzip for binary; 
compression applies to all columns (table level)
2.3 Support CTAS for binary => transaction/non-transaction, 
Carbon/Hive/Parquet => Done
2.4 Support external table for binary => Done
2.5 Support projection for binary column => Done
2.6 Support desc formatted => Done
    => Carbon Datasource doesn't support ALTER TABLE add columns SQL
    => support ALTER TABLE (add column, rename, drop column) for the binary 
data type in CarbonSession => Done
    => changing the data type of a binary column by ALTER TABLE is not 
supported => Done
2.7 Don't support PARTITION, BUCKETCOLUMNS for binary => Done
2.8 Support compaction for binary => Done
2.9 Datamap
    Support bloomfilter, mv, and pre-aggregate
    Don't support lucene or timeseries datamap; no need for min/max datamap 
for binary
    => Done
2.10 CSDK / Python SDK will support binary in the future => TODO; the Python 
SDK has already been merged into pycarbon
2.11 Support S3 => Done
2.12 Support UDF, hex, base64, cast => TODO
     select hex(bin) from carbon_table => TODO
2.13 Support configurable decode for query, supporting base64 and hex 
decode => Done
2.15 How large a value can the binary data type support for writing and 
reading? => TODO
2.16 Support filter for binary => Done
2.17 select CAST(s AS BINARY) 
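
Item 2.13 mentions a configurable decode step with base64 and hex options. As a 
rough sketch of what such a step involves, the Java below turns a text value 
into byte[] using one of the two encodings. The class name BinaryDecoder, the 
method decode, and the option strings "base64" and "hex" are assumptions made 
for illustration; this is not CarbonData's implementation or its property 
names.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper for illustration; not CarbonData's decoder.
class BinaryDecoder {

    // Decodes `text` into bytes; `decoder` is "base64" or "hex" (assumed option names).
    public static byte[] decode(String text, String decoder) {
        if ("base64".equalsIgnoreCase(decoder)) {
            return Base64.getDecoder().decode(text);
        }
        if ("hex".equalsIgnoreCase(decoder)) {
            if (text.length() % 2 != 0) {
                throw new IllegalArgumentException("hex input must have even length");
            }
            byte[] out = new byte[text.length() / 2];
            for (int i = 0; i < out.length; i++) {
                // Each output byte comes from two hex digits: high nibble, then low nibble.
                int hi = Character.digit(text.charAt(2 * i), 16);
                int lo = Character.digit(text.charAt(2 * i + 1), 16);
                if (hi < 0 || lo < 0) {
                    throw new IllegalArgumentException("invalid hex digit");
                }
                out[i] = (byte) ((hi << 4) | lo);
            }
            return out;
        }
        throw new IllegalArgumentException("unsupported decoder: " + decoder);
    }

    public static void main(String[] args) {
        // Both inputs encode the ASCII bytes of "abc".
        System.out.println(new String(decode("YWJj", "base64"), StandardCharsets.US_ASCII));
        System.out.println(new String(decode("616263", "hex"), StandardCharsets.US_ASCII));
    }
}
```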

[GitHub] [carbondata] akashrn5 commented on a change in pull request #3084: [CARBONDATA-3258] Add more test case for mv datamap

2019-05-31 Thread GitBox
akashrn5 commented on a change in pull request #3084: [CARBONDATA-3258] Add 
more test case for mv datamap
URL: https://github.com/apache/carbondata/pull/3084#discussion_r289324774
 
 

 ##
 File path: 
datamap/mv/core/src/test/scala/org/apache/carbondata/mv/rewrite/MVValidFunctionTest.scala
 ##
 @@ -0,0 +1,70 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.carbondata.mv.rewrite
+
+import org.apache.spark.sql.Row
+import org.apache.spark.sql.test.util.QueryTest
+import org.scalatest.BeforeAndAfterAll
+
+class MVValidFunctionTest extends QueryTest with BeforeAndAfterAll {
 
 Review comment:
   Can you give the test class a better name?




[GitHub] [carbondata] akashrn5 commented on a change in pull request #3084: [CARBONDATA-3258] Add more test case for mv datamap

2019-05-31 Thread GitBox
akashrn5 commented on a change in pull request #3084: [CARBONDATA-3258] Add 
more test case for mv datamap
URL: https://github.com/apache/carbondata/pull/3084#discussion_r289324316
 
 

 ##
 File path: 
datamap/mv/core/src/test/scala/org/apache/carbondata/mv/rewrite/MVValidFunctionTest.scala
 ##
 @@ -0,0 +1,70 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.carbondata.mv.rewrite
+
+import org.apache.spark.sql.Row
+import org.apache.spark.sql.test.util.QueryTest
+import org.scalatest.BeforeAndAfterAll
+
+class MVValidFunctionTest extends QueryTest with BeforeAndAfterAll {
+
+  override def beforeAll(): Unit = {
+    drop
+    sql("create table main_table (name string,age int,height int) stored by 'carbondata'")
+    sql("create table dim_table (name string,age int,height int) stored by 'carbondata'")
+    sql("create table sdr_table (name varchar(20),score int) stored by 'carbondata'")
+  }
+
+  def drop() {
+    sql("drop table if exists main_table")
+    sql("drop datamap if exists main_table_mv")
 
 Review comment:
   same as above, you can remove drop datamap query from here




[GitHub] [carbondata] akashrn5 commented on a change in pull request #3084: [CARBONDATA-3258] Add more test case for mv datamap

2019-05-31 Thread GitBox
akashrn5 commented on a change in pull request #3084: [CARBONDATA-3258] Add 
more test case for mv datamap
URL: https://github.com/apache/carbondata/pull/3084#discussion_r289323471
 
 

 ##
 File path: 
datamap/mv/core/src/test/scala/org/apache/carbondata/mv/rewrite/MVExceptionTestCase.scala
 ##
 @@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.carbondata.mv.rewrite
+
+import org.apache.carbondata.common.exceptions.sql.MalformedDataMapCommandException
+import org.apache.spark.sql.catalyst.analysis.NoSuchTableException
+import org.apache.spark.sql.test.util.QueryTest
+import org.scalatest.BeforeAndAfterAll
+
+class MVExceptionTestCase  extends QueryTest with BeforeAndAfterAll {
+  override def beforeAll: Unit = {
+    drop()
+    sql("create table main_table (name string,age int,height int) stored by 'carbondata'")
+  }
+
+  test("test mv no base table") {
+    val ex = intercept[NoSuchTableException] {
+      sql("create datamap main_table_mv on table main_table_error using 'mv' as select sum(age),name from main_table group by name")
+    }
+    assertResult("Table or view 'main_table_error' not found in database 'default';")(ex.getMessage())
+  }
+
+  test("test mv reduplicate mv table") {
+    val ex = intercept[MalformedDataMapCommandException] {
+      sql("create datamap main_table_mv1 on table main_table using 'mv' as select sum(age),name from main_table group by name")
+      sql("create datamap main_table_mv1 on table main_table using 'mv' as select sum(age),name from main_table group by name")
+    }
+    assertResult("DataMap with name main_table_mv1 already exists in storage")(ex.getMessage)
+  }
+
+  def drop(): Unit = {
+    sql("drop table IF EXISTS main_table")
+    sql("drop table if exists main_table_error")
+    sql("drop datamap if exists main_table_mv")
 
 Review comment:
   You can remove the drop datamap statement here, since dropping the main 
table also drops its datamaps.




[GitHub] [carbondata] akashrn5 commented on a change in pull request #3084: [CARBONDATA-3258] Add more test case for mv datamap

2019-05-31 Thread GitBox
akashrn5 commented on a change in pull request #3084: [CARBONDATA-3258] Add 
more test case for mv datamap
URL: https://github.com/apache/carbondata/pull/3084#discussion_r289323936
 
 

 ##
 File path: 
datamap/mv/core/src/test/scala/org/apache/carbondata/mv/rewrite/MVInvalidTestCase.scala
 ##
 @@ -0,0 +1,47 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.carbondata.mv.rewrite
+
+import org.apache.spark.sql.test.util.QueryTest
+import org.scalatest.BeforeAndAfterAll
+
+class MVInvalidTestCase  extends QueryTest with BeforeAndAfterAll {
 
 Review comment:
   I suggest combining the MVExceptionTestCase.scala tests and this class's 
tests into a single class, MVdataMapValidationTests.scala; that would be much 
cleaner.




[GitHub] [carbondata] CarbonDataQA commented on issue #3251: [CARBONDATA-3408] CarbonSession partition support binary data type

2019-05-31 Thread GitBox
CarbonDataQA commented on issue #3251: [CARBONDATA-3408] CarbonSession 
partition support binary data type
URL: https://github.com/apache/carbondata/pull/3251#issuecomment-497641485
 
 
   Build Failed  with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/3417/
   




[GitHub] [carbondata] CarbonDataQA commented on issue #3251: [CARBONDATA-3408] CarbonSession partition support binary data type

2019-05-31 Thread GitBox
CarbonDataQA commented on issue #3251: [CARBONDATA-3408] CarbonSession 
partition support binary data type
URL: https://github.com/apache/carbondata/pull/3251#issuecomment-497641208
 
 
   Build Failed  with Spark 2.3.2, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/11684/
   




[GitHub] [carbondata] xubo245 opened a new pull request #3251: [CARBONDATA-3408] CarbonSession partition support binary data type

2019-05-31 Thread GitBox
xubo245 opened a new pull request #3251: [CARBONDATA-3408] CarbonSession 
partition support binary data type
URL: https://github.com/apache/carbondata/pull/3251
 
 
   Be sure to do all of the following checklist to help us incorporate 
   your contribution quickly and easily:
   
- [ ] Any interfaces changed?
No
- [ ] Any backward compatibility impacted?
NA
- [ ] Document update required?
   Yes
- [ ] Testing done
   Please provide details on 
   - Whether new unit test cases have been added or why no new tests 
are required?
   - How it is tested? Please attach test report.
   - Is it a performance related change? Please attach the performance 
test report.
   - Any additional information to help reviewers in testing this 
change.
  Added
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 
   JIRA-3336
   




[GitHub] [carbondata] ravipesala commented on issue #3242: [CARBONDATA-3403]Fix MV is not working for like and filter AND and OR queries

2019-05-31 Thread GitBox
ravipesala commented on issue #3242: [CARBONDATA-3403]Fix MV is not working for 
like and filter AND and OR queries
URL: https://github.com/apache/carbondata/pull/3242#issuecomment-497639394
 
 
   LGTM




[GitHub] [carbondata] CarbonDataQA commented on issue #3245: [CARBONDATA-3398] Handled show cache for index server and MV

2019-05-31 Thread GitBox
CarbonDataQA commented on issue #3245: [CARBONDATA-3398] Handled show cache for 
index server and MV
URL: https://github.com/apache/carbondata/pull/3245#issuecomment-497639355
 
 
   Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/3415/
   




[GitHub] [carbondata] CarbonDataQA commented on issue #3202: [CARBONDATA-3350] Enhance custom compaction to resort old single segment by new sort_columns

2019-05-31 Thread GitBox
CarbonDataQA commented on issue #3202: [CARBONDATA-3350] Enhance custom 
compaction to resort old single segment by new sort_columns
URL: https://github.com/apache/carbondata/pull/3202#issuecomment-497639174
 
 
   Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/3416/
   




[jira] [Created] (CARBONDATA-3408) CarbonSession partition support binary data type

2019-05-31 Thread xubo245 (JIRA)
xubo245 created CARBONDATA-3408:
---

 Summary: CarbonSession partition support binary data type
 Key: CARBONDATA-3408
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3408
 Project: CarbonData
  Issue Type: Sub-task
Reporter: xubo245
Assignee: xubo245


CarbonSession partition support binary data type





[GitHub] [carbondata] kumarvishal09 commented on issue #3247: [CARBONDATA-3405] Fix getSplits() should clear the cache in SDK

2019-05-31 Thread GitBox
kumarvishal09 commented on issue #3247: [CARBONDATA-3405] Fix getSplits() 
should clear the cache in SDK
URL: https://github.com/apache/carbondata/pull/3247#issuecomment-497636518
 
 
   LGTM




[GitHub] [carbondata] CarbonDataQA commented on issue #3084: [CARBONDATA-3258] Add more test case for mv datamap

2019-05-31 Thread GitBox
CarbonDataQA commented on issue #3084: [CARBONDATA-3258] Add more test case for 
mv datamap
URL: https://github.com/apache/carbondata/pull/3084#issuecomment-497635726
 
 
   Build Success with Spark 2.3.2, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/11679/
   




[GitHub] [carbondata] CarbonDataQA commented on issue #3084: [CARBONDATA-3258] Add more test case for mv datamap

2019-05-31 Thread GitBox
CarbonDataQA commented on issue #3084: [CARBONDATA-3258] Add more test case for 
mv datamap
URL: https://github.com/apache/carbondata/pull/3084#issuecomment-497635709
 
 
   Build Success with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/3615/
   




[GitHub] [carbondata] CarbonDataQA commented on issue #3247: [CARBONDATA-3405] Fix getSplits() should clear the cache in SDK

2019-05-31 Thread GitBox
CarbonDataQA commented on issue #3247: [CARBONDATA-3405] Fix getSplits() should 
clear the cache in SDK
URL: https://github.com/apache/carbondata/pull/3247#issuecomment-497635427
 
 
   Build Success with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/3612/
   




[GitHub] [carbondata] QiangCai commented on issue #3202: [CARBONDATA-3350] Enhance custom compaction to resort old single segment by new sort_columns

2019-05-31 Thread GitBox
QiangCai commented on issue #3202: [CARBONDATA-3350] Enhance custom compaction 
to resort old single segment by new sort_columns
URL: https://github.com/apache/carbondata/pull/3202#issuecomment-497630877
 
 
   @ravipesala I have already squashed the commits into one and rebased onto master.




[GitHub] [carbondata] CarbonDataQA commented on issue #3249: [wip]check

2019-05-31 Thread GitBox
CarbonDataQA commented on issue #3249: [wip]check
URL: https://github.com/apache/carbondata/pull/3249#issuecomment-497627009
 
 
   Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/3414/
   




[jira] [Updated] (CARBONDATA-3407) distinct, count, Sum query fails when MV is created on single projection column

2019-05-31 Thread Akash R Nilugal (JIRA)


 [ https://issues.apache.org/jira/browse/CARBONDATA-3407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Akash R Nilugal updated CARBONDATA-3407:

Description: 
distinct, count, and sum queries fail when an MV is created on a single projection column.

sql("drop table if exists maintable")
sql("create table maintable(name string, age int, add string) stored by 'carbondata'")
sql("create datamap single_mv using 'mv' as select age from maintable")
sql("insert into maintable select 'pheobe',31,'NY'")
sql("insert into maintable select 'rachel',32,'NY'")
sql("select distinct(age) from maintable")
sql("select sum(age) from maintable")
sql("select count(age) from maintable")

Fails with the exception below:
{quote}requirement failed: Fragment is not supported.  Current frag:
org.apache.carbondata.mv.plans.util.SQLBuildDSL$$anon$1@1f7f2e76
java.lang.IllegalArgumentException: requirement failed: Fragment is not supported.  Current frag:
org.apache.carbondata.mv.plans.util.SQLBuildDSL$$anon$1@1f7f2e76
at scala.Predef$.require(Predef.scala:224)
at org.apache.carbondata.mv.plans.util.Printers$SQLFragmentCompactPrinter.printFragment(Printers.scala:248)
at org.apache.carbondata.mv.plans.util.Printers$FragmentPrinter$$anonfun$print$1.apply(Printers.scala:82)
at org.apache.carbondata.mv.plans.util.Printers$FragmentPrinter$$anonfun$print$1.apply(Printers.scala:80)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35)
at org.apache.carbondata.mv.plans.util.Printers$FragmentPrinter.print(Printers.scala:80)
at org.apache.carbondata.mv.plans.util.Printers$class.render(Printers.scala:318)
at org.apache.carbondata.mv.plans.modular.ModularPlan.render(ModularPlan.scala:35)
at org.apache.carbondata.mv.plans.util.Printers$class.asCompactString(Printers.scala:323)
at org.apache.carbondata.mv.plans.modular.ModularPlan.asCompactString(ModularPlan.scala:35)
at org.apache.carbondata.mv.plans.modular.ModularPlan.asCompactSQL(ModularPlan.scala:156)
at org.apache.carbondata.mv.datamap.MVAnalyzerRule.apply(MVAnalyzerRule.scala:83)
at org.apache.carbondata.mv.datamap.MVAnalyzerRule.apply(MVAnalyzerRule.scala:43)
at org.apache.spark.sql.hive.CarbonAnalyzer.execute(CarbonAnalyzer.scala:46){quote}



  was:
distinct, count, and sum queries fail when an MV is created on a single projection column.

sql("drop table if exists maintable")
sql("create table maintable(name string, age int, add string) stored by 'carbondata'")
sql("create datamap single_mv using 'mv' as select age from maintable")
sql("insert into maintable select 'pheobe',31,'NY'")
sql("insert into maintable select 'rachel',32,'NY'")
sql("select distinct(age) from maintable")
sql("select sum(age) from maintable")
sql("select count(age) from maintable")

Fails with the exception below:
requirement failed: Fragment is not supported.  Current frag:
org.apache.carbondata.mv.plans.util.SQLBuildDSL$$anon$1@1f7f2e76
java.lang.IllegalArgumentException: requirement failed: Fragment is not supported.  Current frag:
org.apache.carbondata.mv.plans.util.SQLBuildDSL$$anon$1@1f7f2e76
at scala.Predef$.require(Predef.scala:224)
at org.apache.carbondata.mv.plans.util.Printers$SQLFragmentCompactPrinter.printFragment(Printers.scala:248)
at org.apache.carbondata.mv.plans.util.Printers$FragmentPrinter$$anonfun$print$1.apply(Printers.scala:82)
at org.apache.carbondata.mv.plans.util.Printers$FragmentPrinter$$anonfun$print$1.apply(Printers.scala:80)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35)
at org.apache.carbondata.mv.plans.util.Printers$FragmentPrinter.print(Printers.scala:80)
at org.apache.carbondata.mv.plans.util.Printers$class.render(Printers.scala:318)
at org.apache.carbondata.mv.plans.modular.ModularPlan.render(ModularPlan.scala:35)
at org.apache.carbondata.mv.plans.util.Printers$class.asCompactString(Printers.scala:323)
at org.apache.carbondata.mv.plans.modular.ModularPlan.asCompactString(ModularPlan.scala:35)
at org.apache.carbondata.mv.plans.modular.ModularPlan.asCompactSQL(ModularPlan.scala:156)
at org.apache.carbondata.mv.datamap.MVAnalyzerRule.apply(MVAnalyzerRule.scala:83)
at org.apache.carbondata.mv.datamap.MVAnalyzerRule.apply(MVAnalyzerRule.scala:43)
at org.apache.spark.sql.hive.CarbonAnalyzer.execute(CarbonAnalyzer.scala:46)




> distinct, count, Sum query fails when MV is created on single projection 
> column
> ---
>
> 

[jira] [Created] (CARBONDATA-3407) distinct, count, Sum query fails when MV is created on single projection column

2019-05-31 Thread Akash R Nilugal (JIRA)
Akash R Nilugal created CARBONDATA-3407:
---

 Summary: distinct, count, Sum query fails when MV is created on 
single projection column
 Key: CARBONDATA-3407
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3407
 Project: CarbonData
  Issue Type: Bug
Reporter: Akash R Nilugal


distinct, count, and sum queries fail when an MV is created on a single projection column.

sql("drop table if exists maintable")
sql("create table maintable(name string, age int, add string) stored by 'carbondata'")
sql("create datamap single_mv using 'mv' as select age from maintable")
sql("insert into maintable select 'pheobe',31,'NY'")
sql("insert into maintable select 'rachel',32,'NY'")
sql("select distinct(age) from maintable")
sql("select sum(age) from maintable")
sql("select count(age) from maintable")

Fails with the exception below:
requirement failed: Fragment is not supported.  Current frag:
org.apache.carbondata.mv.plans.util.SQLBuildDSL$$anon$1@1f7f2e76
java.lang.IllegalArgumentException: requirement failed: Fragment is not supported.  Current frag:
org.apache.carbondata.mv.plans.util.SQLBuildDSL$$anon$1@1f7f2e76
at scala.Predef$.require(Predef.scala:224)
at org.apache.carbondata.mv.plans.util.Printers$SQLFragmentCompactPrinter.printFragment(Printers.scala:248)
at org.apache.carbondata.mv.plans.util.Printers$FragmentPrinter$$anonfun$print$1.apply(Printers.scala:82)
at org.apache.carbondata.mv.plans.util.Printers$FragmentPrinter$$anonfun$print$1.apply(Printers.scala:80)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35)
at org.apache.carbondata.mv.plans.util.Printers$FragmentPrinter.print(Printers.scala:80)
at org.apache.carbondata.mv.plans.util.Printers$class.render(Printers.scala:318)
at org.apache.carbondata.mv.plans.modular.ModularPlan.render(ModularPlan.scala:35)
at org.apache.carbondata.mv.plans.util.Printers$class.asCompactString(Printers.scala:323)
at org.apache.carbondata.mv.plans.modular.ModularPlan.asCompactString(ModularPlan.scala:35)
at org.apache.carbondata.mv.plans.modular.ModularPlan.asCompactSQL(ModularPlan.scala:156)
at org.apache.carbondata.mv.datamap.MVAnalyzerRule.apply(MVAnalyzerRule.scala:83)
at org.apache.carbondata.mv.datamap.MVAnalyzerRule.apply(MVAnalyzerRule.scala:43)
at org.apache.spark.sql.hive.CarbonAnalyzer.execute(CarbonAnalyzer.scala:46)
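
For readers unfamiliar with the top frame of the trace: the "requirement failed: " prefix is added by Scala's Predef.require, which throws an IllegalArgumentException when its condition is false. The sketch below only illustrates that mechanism. The object RequireSketch, the method printFragment, and its supported flag are hypothetical stand-ins; the actual condition checked at Printers.scala:248 is not shown in this report.

```scala
import scala.util.Try

object RequireSketch {
  // Hypothetical stand-in for the check that fails in the trace above.
  // The real condition inside SQLFragmentCompactPrinter.printFragment
  // is not part of this report, so a plain boolean flag is assumed here.
  def printFragment(supported: Boolean, frag: String): String = {
    // Predef.require throws IllegalArgumentException with the message
    // "requirement failed: " followed by the supplied text.
    require(supported, s"Fragment is not supported.  Current frag:\n$frag")
    frag
  }

  def main(args: Array[String]): Unit = {
    val result = Try(printFragment(supported = false, "someFragment"))
    // Print the exception message produced by the failed requirement.
    result.failed.foreach(e => println(e.getMessage))
  }
}
```

Under this reading, the exception signals that the MV rewrite reached a plan fragment the compact SQL printer does not handle, rather than a problem with the data itself.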





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] [carbondata] CarbonDataQA commented on issue #3084: [CARBONDATA-3258] Add more test case for mv datamap

2019-05-31 Thread GitBox
CarbonDataQA commented on issue #3084: [CARBONDATA-3258] Add more test case for 
mv datamap
URL: https://github.com/apache/carbondata/pull/3084#issuecomment-497620253
 
 
   Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/3412/
   




[GitHub] [carbondata] CarbonDataQA commented on issue #3245: [CARBONDATA-3398] Handled show cache for index server and MV

2019-05-31 Thread GitBox
CarbonDataQA commented on issue #3245: [CARBONDATA-3398] Handled show cache for 
index server and MV
URL: https://github.com/apache/carbondata/pull/3245#issuecomment-497618754
 
 
   Build Success with Spark 2.3.2, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/11677/
   




[GitHub] [carbondata] CarbonDataQA commented on issue #3242: [CARBONDATA-3403]Fix MV is not working for like and filter AND and OR queries

2019-05-31 Thread GitBox
CarbonDataQA commented on issue #3242: [CARBONDATA-3403]Fix MV is not working 
for like and filter AND and OR queries
URL: https://github.com/apache/carbondata/pull/3242#issuecomment-497617511
 
 
   Build Success with Spark 2.3.2, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/11678/
   




[GitHub] [carbondata] CarbonDataQA commented on issue #3247: [CARBONDATA-3405] Fix getSplits() should clear the cache in SDK

2019-05-31 Thread GitBox
CarbonDataQA commented on issue #3247: [CARBONDATA-3405] Fix getSplits() should 
clear the cache in SDK
URL: https://github.com/apache/carbondata/pull/3247#issuecomment-497617273
 
 
   Build Success with Spark 2.3.2, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/11676/
   




[GitHub] [carbondata] ravipesala commented on a change in pull request #3245: [CARBONDATA-3398] Handled show cache for index server and MV

2019-05-31 Thread GitBox
ravipesala commented on a change in pull request #3245: [CARBONDATA-3398] 
Handled show cache for index server and MV
URL: https://github.com/apache/carbondata/pull/3245#discussion_r289284518
 
 

 ##
 File path: 
integration/spark2/src/main/scala/org/apache/carbondata/indexserver/DistributedShowCacheRDD.scala
 ##
 @@ -43,15 +44,25 @@ class DistributedShowCacheRDD(@transient private val ss: SparkSession, tableName
 
   override def internalCompute(split: Partition, context: TaskContext): Iterator[String] = {
 val dataMaps = DataMapStoreManager.getInstance().getAllDataMaps.asScala
+val tableList = tableName.split(",")
 val iterator = dataMaps.collect {
   case (table, tableDataMaps) if table.isEmpty ||
- (tableName.nonEmpty && tableName.equalsIgnoreCase(table)) =>
+ (tableName.nonEmpty && tableList.contains(table)) =>
 val sizeAndIndexLengths = tableDataMaps.asScala
-  .map(_.getBlockletDetailsFetcher.getCacheSize)
-// return tableName_indexFileLength_indexCachesize for each executor.
-sizeAndIndexLengths.map {
-  x => s"$table:$x"
-}
+  .map { dataMap =>
+if (!dataMap.getDataMapSchema.getProviderName
+  .equals(DataMapClassProvider.BLOOMFILTER.getShortName)) {
 
 Review comment:
   I don't get why we are specifically checking for `BLOOMFILTER`. We are not 
supposed to check for any specific provider here.
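
As a side note on the same hunk, the table-name predicate also changes behaviour: the removed line compares one name case-insensitively, while the added line does an exact match against a comma-separated list. The sketch below reproduces both predicates in isolation so the difference is easy to see. The object name TableFilterSketch and the matchesNormalized variant are hypothetical and are not part of the PR; whether case or whitespace differences can occur in practice depends on how callers build tableName, which this hunk does not show.

```scala
object TableFilterSketch {
  // Predicate from the removed lines: a single name, compared case-insensitively.
  def matchesOld(tableName: String, table: String): Boolean =
    tableName.nonEmpty && tableName.equalsIgnoreCase(table)

  // Predicate from the added lines: comma-separated names, exact match only.
  def matchesNew(tableName: String, table: String): Boolean = {
    val tableList = tableName.split(",")
    tableName.nonEmpty && tableList.contains(table)
  }

  // Hypothetical variant (not in the PR): trims each entry and
  // keeps the case-insensitive comparison of the original code.
  def matchesNormalized(tableName: String, table: String): Boolean =
    tableName.split(",").map(_.trim).filter(_.nonEmpty)
      .exists(_.equalsIgnoreCase(table))
}
```

With these definitions, matchesOld("Default_T1", "default_t1") is true while matchesNew("Default_T1", "default_t1") is false, and matchesNew("a, b", "b") is false because of the space after the comma.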




[GitHub] [carbondata] CarbonDataQA commented on issue #3242: [CARBONDATA-3403]Fix MV is not working for like and filter AND and OR queries

2019-05-31 Thread GitBox
CarbonDataQA commented on issue #3242: [CARBONDATA-3403]Fix MV is not working 
for like and filter AND and OR queries
URL: https://github.com/apache/carbondata/pull/3242#issuecomment-497607317
 
 
   Build Success with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/3614/
   



