[GitHub] carbondata issue #2991: [CARBONDATA-3043] Add build script and add test case...

2019-01-11 Thread BJangir
Github user BJangir commented on the issue:

https://github.com/apache/carbondata/pull/2991
  
retest this please


---


[GitHub] carbondata issue #3064: [CARBONDATA-3243] Updated DOC for No-Sort Compaction...

2019-01-11 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/3064
  
Build Success with Spark 2.3.2, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/10525/



---


[GitHub] carbondata issue #2991: [CARBONDATA-3043] Add build script and add test case...

2019-01-11 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2991
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2268/



---


[GitHub] carbondata issue #3064: [CARBONDATA-3243] Updated DOC for No-Sort Compaction...

2019-01-11 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/3064
  
Build Success with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2486/



---


[GitHub] carbondata issue #2991: [CARBONDATA-3043] Add build script and add test case...

2019-01-11 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2991
  
Build Success with Spark 2.3.2, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/10526/



---


[GitHub] carbondata issue #2991: [CARBONDATA-3043] Add build script and add test case...

2019-01-11 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2991
  
Build Success with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2487/



---


[GitHub] carbondata pull request #3068: [HOTFIX] Fixed NPE during query with Local Di...

2019-01-11 Thread kumarvishal09
GitHub user kumarvishal09 opened a pull request:

https://github.com/apache/carbondata/pull/3068

[HOTFIX] Fixed NPE during query with Local Dictionary

**Problem:**
Query fails with an NPE when some blocklets are encoded with a local 
dictionary and some are not.
**Root Cause:** 
In CarbonVectorProxy, calling setDictionary with null does not actually set 
the dictionary to null, so the column is treated as a local-dictionary 
column even though it is not encoded with a dictionary.
**Solution:**
Set the dictionary to null.
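A minimal sketch of the failure mode and fix described above. The class and method names here are hypothetical stand-ins, not CarbonData's real CarbonVectorProxy API; the point is only that setDictionary(null) must actually clear the stored dictionary, otherwise a blocklet written without a local dictionary is still decoded as if it had one and the stale dictionary reference leads to the NPE:

```java
// Illustrative sketch only: names are hypothetical, not CarbonData internals.
public class VectorProxySketch {
    private Object dictionary;

    // Before the fix, a null argument was effectively ignored, so a
    // dictionary set for a previous blocklet leaked into the next one.
    public void setDictionary(Object dict) {
        this.dictionary = dict; // after the fix: null is stored as-is
    }

    // The decode path uses this to decide whether to look values up
    // in the local dictionary.
    public boolean hasDictionary() {
        return dictionary != null;
    }

    public static void main(String[] args) {
        VectorProxySketch proxy = new VectorProxySketch();
        proxy.setDictionary(new Object());  // blocklet with local dictionary
        proxy.setDictionary(null);          // next blocklet has none
        System.out.println(proxy.hasDictionary()); // prints "false"
    }
}
```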
 - [ ] Any interfaces changed?
 
 - [ ] Any backward compatibility impacted?
 
 - [ ] Document update required?

 - [ ] Testing done
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance 
test report.
- Any additional information to help reviewers in testing this 
change.
   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kumarvishal09/incubator-carbondata 
master_102019

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/3068.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3068


commit 7fdc042bcf2c7bc05135a809cab8ccd45dfbc01c
Author: kumarvishal09 
Date:   2019-01-11T09:44:53Z

fixed NPE in LocalDictionary Query




---


[GitHub] carbondata issue #3068: [HOTFIX] Fixed NPE during query with Local Dictionar...

2019-01-11 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/3068
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2269/



---


[GitHub] carbondata pull request #3069: [WIP][CARBONDATA-3232] Add test case for allu...

2019-01-11 Thread xubo245
GitHub user xubo245 opened a pull request:

https://github.com/apache/carbondata/pull/3069

[WIP][CARBONDATA-3232] Add test case for alluxio UT

Add test case for alluxio UT
1. Install the Alluxio UT environment and start the Alluxio mini-cluster before 
running the test cases.
2. Add CarbonSession test cases for UT.
3. Add Spark carbon file format test cases for UT => TODO
4. Add external table test cases for UT => TODO
5. Add SDK test cases for UT => TODO


Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [ ] Any interfaces changed?
 No
 - [ ] Any backward compatibility impacted?
 No
 - [ ] Document update required?
No
 - [ ] Testing done
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance 
test report.
- Any additional information to help reviewers in testing this 
change.
   Yes, many test cases are added for Alluxio.
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 
No


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/xubo245/carbondata 
CARBONDATA-3232_AddTestCaseForAlluxio

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/3069.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3069


commit 5d014aceb664c0259c3135f8c129fa9752f3abb1
Author: xubo245 
Date:   2019-01-11T10:08:14Z

[CARBONDATA-3232] Add test case for alluxio UT
1. install alluxio UT environment
2. add CarbonSession test case for UT
3. add spark carbon file format test case for UT
4. add external table test case for UT
5. add SDK test case for UT

add




---


[GitHub] carbondata issue #3069: [WIP][CARBONDATA-3232] Add test case for alluxio UT

2019-01-11 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/3069
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2270/



---


[jira] [Created] (CARBONDATA-3246) SDK reader fails if vectorReader is false for concurrent read scenario and batch size is zero.

2019-01-11 Thread Shardul Singh (JIRA)
Shardul Singh created CARBONDATA-3246:

 Summary: SDK reader fails if vectorReader is false for concurrent 
read scenario and batch size is zero.
 Key: CARBONDATA-3246
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3246
 Project: CarbonData
  Issue Type: Bug
Reporter: Shardul Singh
Assignee: Shardul Singh


SDK reader fails if vectorReader is false for concurrent read scenario and 
batch size is zero.

If the batch size is zero, we should assign the batch size as 
DETAIL_QUERY_BATCH_SIZE_DEFAULT.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] carbondata pull request #3070: [CARBONDATA-3246]Fix sdk reader issue if batc...

2019-01-11 Thread shardul-cr7
GitHub user shardul-cr7 opened a pull request:

https://github.com/apache/carbondata/pull/3070

[CARBONDATA-3246]Fix sdk reader issue if batch size is given as zero and 
vectorRead False.

This PR fixes an SDK reader issue when the batch size is given as zero and 
vectorRead is false.

**Problem:** The SDK reader fails if vectorRead is false and the detail query 
batch size is given as 0. A StackOverflowError is thrown after the reader gets 
stuck in ChunkRowIterator.hasNext recursion.

**Solution:** Since 0 is an invalid batch size, we take 
DETAIL_QUERY_BATCH_SIZE_DEFAULT as the batch size.
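The guard described above can be sketched as follows; the class name and the default value are illustrative stand-ins, not CarbonData's actual AbstractDetailQueryResultIterator internals:

```java
// Illustrative sketch of the batch-size guard; names and the default
// value are assumptions, not the real CarbonData constants.
public class BatchSizeGuard {
    // stand-in for CarbonCommonConstants.DETAIL_QUERY_BATCH_SIZE_DEFAULT
    static final int DETAIL_QUERY_BATCH_SIZE_DEFAULT = 100;

    static int resolveBatchSize(String batchSizeString) {
        int batchSize;
        try {
            batchSize = Integer.parseInt(batchSizeString);
        } catch (NumberFormatException e) {
            return DETAIL_QUERY_BATCH_SIZE_DEFAULT;
        }
        // A batch size of 0 makes hasNext() recurse without ever consuming
        // rows, so treat any non-positive value as "use the default".
        if (batchSize <= 0) {
            batchSize = DETAIL_QUERY_BATCH_SIZE_DEFAULT;
        }
        return batchSize;
    }

    public static void main(String[] args) {
        System.out.println(resolveBatchSize("0"));   // falls back to 100
        System.out.println(resolveBatchSize("500")); // kept as configured
    }
}
```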

Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [x] Any interfaces changed? - No
 
 - [x] Any backward compatibility impacted? - No
 
 - [x] Document update required? - No


 - [x] Testing done
added test case 
   
 - [x] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shardul-cr7/carbondata batchSize_fix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/3070.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3070


commit 4c002f80903076ebd7707fe7cf1384e45f823bbd
Author: shardul-cr7 
Date:   2019-01-11T10:40:27Z

[CARBONDATA-3246]Fix sdk reader issue if batch size is given as zero




---


[GitHub] carbondata issue #3068: [HOTFIX] Fixed NPE during query with Local Dictionar...

2019-01-11 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/3068
  
Build Failed with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2488/



---


[GitHub] carbondata issue #3068: [HOTFIX] Fixed NPE during query with Local Dictionar...

2019-01-11 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/3068
  
Build Success with Spark 2.3.2, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/10527/



---


[GitHub] carbondata issue #3070: [CARBONDATA-3246]Fix sdk reader issue if batch size ...

2019-01-11 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/3070
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2271/



---


[GitHub] carbondata issue #3069: [WIP][CARBONDATA-3232] Add test case for alluxio UT

2019-01-11 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/3069
  
Build Failed with Spark 2.3.2, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/10528/



---


[GitHub] carbondata issue #3065: [HOTFIX] Optimize presto-guide

2019-01-11 Thread zzcclp
Github user zzcclp commented on the issue:

https://github.com/apache/carbondata/pull/3065
  
@xubo245 Yup, you are right, but we can't test them one by one. We can note 
that it also supports Presto 0.214, is that OK?


---


[GitHub] carbondata issue #3070: [CARBONDATA-3246]Fix sdk reader issue if batch size ...

2019-01-11 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/3070
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2272/



---


[GitHub] carbondata issue #3069: [WIP][CARBONDATA-3232] Add test case for alluxio UT

2019-01-11 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/3069
  
Build Failed with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2489/



---


[GitHub] carbondata issue #3067: [HOTFIX] Fix compile error after merging PR#3001

2019-01-11 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/3067
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2273/



---


[GitHub] carbondata pull request #3064: [CARBONDATA-3243] Updated DOC for No-Sort Com...

2019-01-11 Thread shardul-cr7
Github user shardul-cr7 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3064#discussion_r247096965
  
--- Diff: 
integration/spark-common/src/main/scala/org/apache/spark/sql/catalyst/CarbonDDLSqlParser.scala
 ---
@@ -1201,6 +1202,17 @@ abstract class CarbonDDLSqlParser extends 
AbstractCarbonSparkSQLParser {
 }
   }
 
+// Validate SORT_SCOPE
+if(options.exists(_._1.equalsIgnoreCase("SORT_SCOPE"))) {
+  val optionValue: String = options.get("sort_scope").get.head._2
+  if (!CarbonUtil.isValidSortOption(optionValue)) {
+throw new InvalidConfigurationException(
+  s"Passing invalid SORT_SCOPE '$optionValue', valid SORT_SCOPE 
are 'NO_SORT'," +
+  s" 'BATCH_SORT', 'LOCAL_SORT' and 'GLOBAL_SORT' ")
+  }
+
+}
+
--- End diff --



Remove empty lines and properly format the code.



---


[GitHub] carbondata issue #3070: [CARBONDATA-3246]Fix sdk reader issue if batch size ...

2019-01-11 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/3070
  
Build Success with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2491/



---


[GitHub] carbondata pull request #3064: [CARBONDATA-3243] Updated DOC for No-Sort Com...

2019-01-11 Thread NamanRastogi
Github user NamanRastogi commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3064#discussion_r247097474
  
--- Diff: 
integration/spark-common/src/main/scala/org/apache/spark/sql/catalyst/CarbonDDLSqlParser.scala
 ---
@@ -1201,6 +1202,17 @@ abstract class CarbonDDLSqlParser extends 
AbstractCarbonSparkSQLParser {
 }
   }
 
+// Validate SORT_SCOPE
+if(options.exists(_._1.equalsIgnoreCase("SORT_SCOPE"))) {
+  val optionValue: String = options.get("sort_scope").get.head._2
+  if (!CarbonUtil.isValidSortOption(optionValue)) {
+throw new InvalidConfigurationException(
+  s"Passing invalid SORT_SCOPE '$optionValue', valid SORT_SCOPE 
are 'NO_SORT'," +
+  s" 'BATCH_SORT', 'LOCAL_SORT' and 'GLOBAL_SORT' ")
+  }
+
+}
+
--- End diff --

Done.


---


[GitHub] carbondata issue #3070: [CARBONDATA-3246]Fix sdk reader issue if batch size ...

2019-01-11 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/3070
  
Build Success with Spark 2.3.2, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/10530/



---


[GitHub] carbondata issue #3064: [CARBONDATA-3243] Updated DOC for No-Sort Compaction...

2019-01-11 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/3064
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2274/



---


[GitHub] carbondata issue #3068: [HOTFIX] Fixed NPE during query with Local Dictionar...

2019-01-11 Thread qiuchenjian
Github user qiuchenjian commented on the issue:

https://github.com/apache/carbondata/pull/3068
  
Why does one segment have some blocklets encoded with a local dictionary and 
some without?


---


[GitHub] carbondata issue #3069: [WIP][CARBONDATA-3232] Add test case for alluxio UT

2019-01-11 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/3069
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2275/



---


[GitHub] carbondata issue #3067: [HOTFIX] Fix compile error after merging PR#3001

2019-01-11 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/3067
  
Build Success with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2492/



---


[GitHub] carbondata issue #3067: [HOTFIX] Fix compile error after merging PR#3001

2019-01-11 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/3067
  
Build Success with Spark 2.3.2, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/10531/



---


[GitHub] carbondata issue #3069: [WIP][CARBONDATA-3232] Add test case for alluxio UT

2019-01-11 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/3069
  
Build Failed with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2495/



---


[GitHub] carbondata issue #3069: [WIP][CARBONDATA-3232] Add test case for alluxio UT

2019-01-11 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/3069
  
Build Failed with Spark 2.3.2, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/10534/



---


[GitHub] carbondata pull request #3066: [CARBONDATA-3244] Add benchmark for Change Da...

2019-01-11 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3066#discussion_r247119761
  
--- Diff: 
examples/spark2/src/main/scala/org/apache/carbondata/benchmark/CDCBenchmark.scala
 ---
@@ -0,0 +1,256 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.benchmark
+
+import java.io.File
+import java.sql.Date
+
+import org.apache.commons.lang3.time.DateUtils
+import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}
+
+/**
+ * Benchmark for Change Data Capture scenario.
+ * This test simulates updates to history table using CDC table.
+ *
+ * The benchmark shows performance of two update methods:
+ * 1. hive_solution, which uses INSERT OVERWRITE. This is a popular method 
for hive warehouse.
+ * 2. carbon_solution, which uses CarbonData's update syntax to update the 
history table directly.
+ *
+ * When running in a 8-cores laptop, the benchmark shows:
+ *
+ * 1. test one
+ * History table 1M records, update 10K records everyday and insert 10K 
records everyday,
+ * simulated 3 days.
+ * hive_solution: total process time takes 13,516 ms
+ * carbon_solution: total process time takes 7,521 ms
+ *
+ *
+ * 2. test two
+ * History table 10M records, update 10K records everyday and insert 10K 
records everyday,
+ * simulated 3 days.
+ * hive_solution: total process time takes 104,250 ms
+ * carbon_solution: total process time takes 17,384 ms
+ *
+ */
+object CDCBenchmark {
+
+  // Schema for history table
+  // Table name: dw_order
+  // +-+---+-+
+  // | Column name | Data type | Cardinality |
+  // +-+---+-+
+  // | order_id| string| 10,000,000  |
+  // +-+---+-+
+  // | customer_id | string| 10,000,000  |
+  // +-+---+-+
+  // | start_date  | date  | NA  |
+  // +-+---+-+
+  // | end_date| date  | NA  |
+  // +-+---+-+
+  // | state   | int   | 4   |
+  // +-+---+-+
+  case class Order (order_id: String, customer_id: String, start_date: 
Date, end_date: Date,
+  state: Int)
+
+  // Schema for CDC data which is used for update to history table every 
day
+  // Table name: ods_order
+  // +-+---+-+
+  // | Column name | Data type | Cardinality |
+  // +-+---+-+
+  // | order_id| string| 10,000,000  |
+  // +-+---+-+
+  // | customer_id | string| 10,000,000  |
+  // +-+---+-+
+  // | update_date | date  | NA  |
+  // +-+---+-+
+  // | state   | int   | 4   |
+  // +-+---+-+
+  case class CDC (order_id: String, customer_id: String, update_date: 
Date, state: Int)
+
+  // number of records for first day
+  val numOrders = 1000
+
+  // number of records to update every day
+  val numUpdateOrdersDaily = 1
+
+  // number of new records to insert every day
+  val newNewOrdersDaily = 1
+
+  // number of days to simulate
+  val numDays = 3
+
+  // print eveyday result or not to console
+  val printDetail = false
+
+  def generateDataForDay0(
+  sparkSession: SparkSession,
+  numOrders: Int = 100,
+  startDate: Date = Date.valueOf("2018-05-01")): DataFrame = {
+import sparkSession.implicits._
+sparkSession.sparkContext.parallelize(1 to numOrders, 4)
+  .map { x => Order(s"order$x", s"customer$x", startDate, 
Date.valueOf("-01-01"), 1)
+  }.toDS().toDF()
+  }
+
+  def generateDailyCDC(


---

[GitHub] carbondata issue #3064: [CARBONDATA-3243] Updated DOC for No-Sort Compaction...

2019-01-11 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/3064
  
Build Success with Spark 2.3.2, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/10533/



---


[GitHub] carbondata issue #3066: [CARBONDATA-3244] Add benchmark for Change Data Capt...

2019-01-11 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/3066
  
> i think the query performance of carbon_solution is lower than 
hive_solution's, because carbon_solution has more segment (insert generates a 
segment and update generates more segment)
> Do we have some method to optimize this?

Since we are updating existing data, it creates extra files like delete 
deltas and incremental CarbonData files. It may degrade query performance a 
little, but once compaction is done it will improve.


---


[GitHub] carbondata issue #3068: [HOTFIX] Fixed NPE during query with Local Dictionar...

2019-01-11 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/3068
  
> why does one segment have some blocklet encoded with local dictionary and 
some without local dictionary ?

It is because Carbon generates a dictionary based on a column value count 
threshold; once it reaches that threshold it stops generating the dictionary. 
There are scenarios where some blocks/blocklets are within the threshold and 
some are not; that's why some blocks have a local dictionary and some don't.
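The threshold behaviour described above can be sketched roughly as follows. This is a simplified illustration, not CarbonData's real local-dictionary generator (the class, method, and threshold handling here are assumptions): values are dictionary-encoded until the distinct-value count crosses the threshold, after which the generator gives up and the block is written without a local dictionary.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified sketch of threshold-based local dictionary generation;
// names and behaviour details are illustrative assumptions.
public class LocalDictionarySketch {
    private final int threshold;
    private final Map<String, Integer> dictionary = new HashMap<>();
    private boolean overThreshold = false;

    public LocalDictionarySketch(int threshold) {
        this.threshold = threshold;
    }

    /** Returns the surrogate key, or -1 once the threshold is crossed. */
    public int add(String value) {
        if (overThreshold) {
            return -1;
        }
        Integer key = dictionary.get(value);
        if (key != null) {
            return key;
        }
        if (dictionary.size() >= threshold) {
            // Too many distinct values: fall back to plain encoding,
            // so this block carries no local dictionary.
            overThreshold = true;
            dictionary.clear();
            return -1;
        }
        key = dictionary.size();
        dictionary.put(value, key);
        return key;
    }

    public boolean hasDictionary() {
        return !overThreshold;
    }
}
```

A block whose column stays under the threshold keeps its dictionary, while a high-cardinality block in the same segment ends up without one, which is exactly the mixed state the query path has to handle.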


---


[GitHub] carbondata issue #3067: [HOTFIX] Fix compile error after merging PR#3001

2019-01-11 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/3067
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2276/



---


[GitHub] carbondata issue #3068: [HOTFIX] Fixed NPE during query with Local Dictionar...

2019-01-11 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/3068
  
LGTM


---


[GitHub] carbondata issue #3065: [HOTFIX] Optimize presto-guide

2019-01-11 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/3065
  
@zzcclp Can you verify Presto with the current master? Changes related to the 
Hive metastore have been done now, so Carbon now behaves as one of the 
Hive-supported formats in Presto. Please check and let us know your feedback.


---


[GitHub] carbondata issue #3064: [CARBONDATA-3243] Updated DOC for No-Sort Compaction...

2019-01-11 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/3064
  
Build Success with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2494/



---


[jira] [Updated] (CARBONDATA-3234) Unable to read data from carbondata table stored in S3 using Presto running on EMR

2019-01-11 Thread charles horrell (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

charles horrell updated CARBONDATA-3234:

Priority: Blocker  (was: Major)

> Unable to read data from carbondata table stored in S3 using Presto running 
> on EMR
> --
>
> Key: CARBONDATA-3234
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3234
> Project: CarbonData
>  Issue Type: Bug
>  Components: presto-integration
> Environment: Amazon EMR 5.19
>Reporter: charles horrell
>Priority: Blocker
>
> We are unable to use presto to query a carbondata table stored in S3.
> {code:java}
> presto:default> select count(*) from test_table;
> Query 20190107_135333_00026_8r2c8 failed: tried to access method 
> org.apache.hadoop.metrics2.lib.MutableCounterLong.(Lorg/apache/hadoop/metrics2/MetricsInfo;J)V
>  from class org.apache.hadoop.fs.s3a.S3AInstrumentation
>  
> presto:default> select * from test_table;
> Query 20190107_135610_00028_8r2c8 failed: tried to access method 
> org.apache.hadoop.metrics2.lib.MutableCounterLong.(Lorg/apache/hadoop/metrics2/MetricsInfo;J)V
>  from class org.apache.hadoop.fs.s3a.S3AInstrumentation
> {code}
> The catalog appears to have been picked up okay: show tables works as 
> expected, as does describing the table. It is only when actually trying to 
> access the data that we see the error.
> We configured presto as per the examples here: 
> [http://carbondata.apache.org/quick-start-guide.html]
> Querying from Spark works okay; however, it is vital for our use case that 
> Presto also works with S3.
> Amazon EMR version 5.19
>  Spark 2.3.2
>  Hadoop 2.8.5
>  Presto 0.212
>  
> Stack trace from presto server log
> {code:java}
> 2019-01-07T12:19:57.562Z WARN statement-response-4 
> com.facebook.presto.server.ThrowableMapper Request failed for 
> /v1/statement/20190107_121957_4_k6t5p/1
> java.lang.IllegalAccessError: tried to access method 
> org.apache.hadoop.metrics2.lib.MutableCounterLong.(Lorg/apache/hadoop/metrics2/MetricsInfo;J)V
>  from class org.apache.hadoop.fs.s3a.S3AInstrumentation
> at 
> org.apache.hadoop.fs.s3a.S3AInstrumentation.streamCounter(S3AInstrumentation.java:194)
> at 
> org.apache.hadoop.fs.s3a.S3AInstrumentation.streamCounter(S3AInstrumentation.java:216)
> at 
> org.apache.hadoop.fs.s3a.S3AInstrumentation.(S3AInstrumentation.java:139)
> at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:174)
> at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669)
> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
> at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
> at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
> at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
> at 
> org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile.(AbstractDFSCarbonFile.java:74)
> at 
> org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile.(AbstractDFSCarbonFile.java:66)
> at 
> org.apache.carbondata.core.datastore.filesystem.HDFSCarbonFile.(HDFSCarbonFile.java:41)
> at 
> org.apache.carbondata.core.datastore.filesystem.S3CarbonFile.(S3CarbonFile.java:41)
> at 
> org.apache.carbondata.core.datastore.impl.DefaultFileTypeProvider.getCarbonFile(DefaultFileTypeProvider.java:53)
> at 
> org.apache.carbondata.core.datastore.impl.FileFactory.getCarbonFile(FileFactory.java:102)
> at 
> org.apache.carbondata.presto.impl.CarbonTableReader.updateCarbonFile(CarbonTableReader.java:202)
> at 
> org.apache.carbondata.presto.impl.CarbonTableReader.updateSchemaList(CarbonTableReader.java:216)
> at 
> org.apache.carbondata.presto.impl.CarbonTableReader.getSchemaNames(CarbonTableReader.java:189)
> at 
> org.apache.carbondata.presto.CarbondataMetadata.listSchemaNamesInternal(CarbondataMetadata.java:86)
> at 
> org.apache.carbondata.presto.CarbondataMetadata.getTableMetadata(CarbondataMetadata.java:135)
> at 
> org.apache.carbondata.presto.CarbondataMetadata.getTableMetadataInternal(CarbondataMetadata.java:240)
> at 
> org.apache.carbondata.presto.CarbondataMetadata.getTableMetadata(CarbondataMetadata.java:232)
> at 
> com.facebook.presto.spi.connector.classloader.ClassLoaderSafeConnectorMetadata.getTableMetadata(ClassLoaderSafeConnectorMetadata.java:145)
> at 
> com.facebook.presto.metadata.MetadataManager.getTableMetadata(MetadataManager.java:388)
> at 
> com.facebook.presto.sql.analyzer.StatementAnalyzer$Visitor.visitTable(StatementAnalyzer.java:850)
> at 
> com.facebook.presto.sql.analyzer.StatementAnalyzer$Visitor.visitTable(StatementAnalyzer.java:258)
> at com.facebook.presto.sql.tree.Table.accept(Table.java:53)
> at com.facebook.presto.sql.tree.AstVisitor.pro


---

[GitHub] carbondata issue #3067: [HOTFIX] Fix compile error after merging PR#3001

2019-01-11 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/3067
  
Build Success with Spark 2.3.2, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/10535/



---


[GitHub] carbondata issue #3067: [HOTFIX] Fix compile error after merging PR#3001

2019-01-11 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/3067
  
Build Success with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2496/



---


[GitHub] carbondata issue #3065: [HOTFIX] Optimize presto-guide

2019-01-11 Thread zzcclp
Github user zzcclp commented on the issue:

https://github.com/apache/carbondata/pull/3065
  
@ravipesala No problem, another team in our company will test this along with 
the feature of reading stream segments. Will let you know the feedback.


---


[GitHub] carbondata issue #3067: [HOTFIX] Fix compile error after merging PR#3001

2019-01-11 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/3067
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2277/



---


[GitHub] carbondata issue #3067: [HOTFIX] Fix compile error after merging PR#3001

2019-01-11 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/3067
  
Build Success with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2497/



---


[GitHub] carbondata issue #3067: [HOTFIX] Fix compile error after merging PR#3001

2019-01-11 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/3067
  
Build Success with Spark 2.3.2, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/10536/



---


[GitHub] carbondata issue #3067: [HOTFIX] Fix compile error after merging PR#3001

2019-01-11 Thread zzcclp
Github user zzcclp commented on the issue:

https://github.com/apache/carbondata/pull/3067
  
@QiangCai @xubo245 please take a look.


---


[GitHub] carbondata issue #3068: [HOTFIX] Fixed NPE during query with Local Dictionar...

2019-01-11 Thread qiuchenjian
Github user qiuchenjian commented on the issue:

https://github.com/apache/carbondata/pull/3068
  
@ravipesala I remember that if the local dictionary value count reaches the 
threshold, it falls back to the original values, right?
Do some nodes' local dictionaries reach the threshold while others' don't, so 
this scenario appears? (Some shards have a local dictionary and some shards 
don't.)


---


[GitHub] carbondata pull request #3070: [CARBONDATA-3246]Fix sdk reader issue if batc...

2019-01-11 Thread qiuchenjian
Github user qiuchenjian commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3070#discussion_r247292846
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/scan/result/iterator/AbstractDetailQueryResultIterator.java
 ---
@@ -94,6 +94,9 @@
 if (null != batchSizeString) {
   try {
 batchSize = Integer.parseInt(batchSizeString);
+if (0 == batchSize) {
--- End diff --

```suggestion
if (0 >= batchSize) {
```


---