[GitHub] carbondata pull request #1198: [CARBONDATA-1281] Support multiple temp dirs ...
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1198#discussion_r129765977 --- Diff: processing/src/main/java/org/apache/carbondata/processing/util/CarbonDataProcessorUtil.java --- @@ -145,21 +146,31 @@ public static void renameBadRecordsFromInProgressToNormal( /** * This method will be used to delete sort temp location is it is exites */ - public static void deleteSortLocationIfExists(String tempFileLocation) { -// create new temp file location where this class -//will write all the temp files -File file = new File(tempFileLocation); - -if (file.exists()) { - try { -CarbonUtil.deleteFoldersAndFiles(file); - } catch (IOException | InterruptedException e) { -LOGGER.error(e); + public static void deleteSortLocationIfExists(String[] locations) { +for (String loc : locations) { + File file = new File(loc); + if (file.exists()) { +try { + CarbonUtil.deleteFoldersAndFiles(file); +} catch (IOException | InterruptedException e) { + LOGGER.error(e, "Failed to delete " + loc); +} } } } /** + * This method will be used to create dirs + * @param locations locations to create + */ + public static void createLocations(String[] locations) { +for (String loc : locations) { + if (new File(loc).mkdirs()) { --- End diff -- :+1: nice --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] carbondata pull request #1198: [CARBONDATA-1281] Support multiple temp dirs ...
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1198#discussion_r129765796 --- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java --- @@ -1296,6 +1296,18 @@ public static final String CARBON_LEASE_RECOVERY_RETRY_INTERVAL = "carbon.lease.recovery.retry.interval"; + /** + * whether to use multi directories when loading data, + * the main purpose is to avoid single-disk-hot-spot + */ + @CarbonProperty + public static final String CARBON_USE_MULTI_TEMP_DIR = "carbon.use.multiple.temp.dir"; + + /** + * default value for multi temp dir + */ + public static final String CARBON_USING_MULTI_TEMP_DIR_DEFAULT = "false"; --- End diff -- :+1: fixed
[jira] [Created] (CARBONDATA-1335) Duplicated & time-consuming method call found in query
xuchuanyin created CARBONDATA-1335: -- Summary: Duplicated & time-consuming method call found in query Key: CARBONDATA-1335 URL: https://issues.apache.org/jira/browse/CARBONDATA-1335 Project: CarbonData Issue Type: Improvement Components: data-query Affects Versions: 1.1.1 Reporter: xuchuanyin Priority: Minor

# Scenario
We ran 14 concurrent queries on CarbonData. The queries were identical but ran against different tables. We observed the following:
+ A single query took about 5s;
+ In the concurrent scenario, each query took about 15s.
By adding checkpoints to the log, we found a large latency before the query jobs started in Spark.

# Analysis
When a query is fired, CarbonData first does some work on the client side: it parses and analyzes the plan and prepares the filtered blocks and inputSplits. Only then does it submit the query job to Spark. We found that this first step took about 7s in the concurrent scenario but less than 1s in the single-query scenario. Studying the related code showed that the most time-consuming call was `CarbonSessionCatalog.lookupRelation`. Inside this method, `super.lookupRelation` was called twice, and each call took about 3s.

# Solution
CarbonData needs to call `super.lookupRelation` only once, so the redundant second call should be removed. I have tested this in my environment and it works well: in the concurrent scenario, each query now takes about 12s (3s saved by the improvement).

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
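The fix described in CARBONDATA-1335 can be illustrated with a small sketch. This is not the actual `CarbonSessionCatalog` code (which is Scala); the class names `BaseCatalog`, `DuplicatedLookupCatalog` and `SingleLookupCatalog` are hypothetical, and a call counter stands in for the roughly 3s cost of each `super.lookupRelation` call reported above.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the parent catalog. It counts calls instead of
// reproducing the roughly 3s cost reported in the issue.
class BaseCatalog {
    final AtomicInteger lookups = new AtomicInteger();

    String lookupRelation(String table) {
        lookups.incrementAndGet();
        return "relation:" + table;
    }
}

// Before: the override resolves the same relation twice.
class DuplicatedLookupCatalog extends BaseCatalog {
    @Override
    String lookupRelation(String table) {
        String first = super.lookupRelation(table);   // used only for a check
        if (first.isEmpty()) {
            throw new IllegalStateException("no relation for " + table);
        }
        return super.lookupRelation(table);           // redundant second call
    }
}

// After: resolve once and reuse the result.
class SingleLookupCatalog extends BaseCatalog {
    @Override
    String lookupRelation(String table) {
        String relation = super.lookupRelation(table);
        if (relation.isEmpty()) {
            throw new IllegalStateException("no relation for " + table);
        }
        return relation;
    }
}
```

With one of the two 3s calls removed, the reported drop from about 15s to about 12s per concurrent query is what one would expect.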
[GitHub] carbondata issue #1192: [CARBONDATA-940] alter table add/split partition for...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1192 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3209/
[GitHub] carbondata issue #1192: [CARBONDATA-940] alter table add/split partition for...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1192 Build Success with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/614/
[GitHub] carbondata pull request #1198: [CARBONDATA-1281] Support multiple temp dirs ...
Github user sraghunandan commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1198#discussion_r129753971 --- Diff: processing/src/main/java/org/apache/carbondata/processing/util/CarbonDataProcessorUtil.java --- @@ -145,21 +146,31 @@ public static void renameBadRecordsFromInProgressToNormal( /** * This method will be used to delete sort temp location is it is exites */ - public static void deleteSortLocationIfExists(String tempFileLocation) { -// create new temp file location where this class -//will write all the temp files -File file = new File(tempFileLocation); - -if (file.exists()) { - try { -CarbonUtil.deleteFoldersAndFiles(file); - } catch (IOException | InterruptedException e) { -LOGGER.error(e); + public static void deleteSortLocationIfExists(String[] locations) { +for (String loc : locations) { + File file = new File(loc); + if (file.exists()) { +try { + CarbonUtil.deleteFoldersAndFiles(file); +} catch (IOException | InterruptedException e) { + LOGGER.error(e, "Failed to delete " + loc); +} } } } /** + * This method will be used to create dirs + * @param locations locations to create + */ + public static void createLocations(String[] locations) { +for (String loc : locations) { + if (new File(loc).mkdirs()) { --- End diff -- Should this not be `!new File(loc).mkdirs()`?
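The diff is cut off before the body of the `if`, so the following is only a sketch of what the negated check could look like, assuming the branch is meant to handle a failed creation. The class name `TempDirs` and the returned list of failures are hypothetical. One detail worth keeping in mind is that `File.mkdirs()` also returns false when the directory already exists, so a bare `!mkdirs()` would report an existing directory as a failure.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not the CarbonDataProcessorUtil implementation.
class TempDirs {
    /**
     * Creates every location and returns the ones that could not be created.
     * File.mkdirs() returns false both on failure and when the directory
     * already exists, so the extra isDirectory() check avoids a false alarm.
     */
    static List<String> createLocations(String[] locations) {
        List<String> failed = new ArrayList<>();
        for (String loc : locations) {
            File dir = new File(loc);
            if (!dir.mkdirs() && !dir.isDirectory()) {
                failed.add(loc);
            }
        }
        return failed;
    }
}
```

A caller could log each entry of the returned list, in the same way `deleteSortLocationIfExists` logs a failed delete.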
[GitHub] carbondata pull request #1198: [CARBONDATA-1281] Support multiple temp dirs ...
Github user sraghunandan commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1198#discussion_r129753676 --- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java --- @@ -1296,6 +1296,18 @@ public static final String CARBON_LEASE_RECOVERY_RETRY_INTERVAL = "carbon.lease.recovery.retry.interval"; + /** + * whether to use multi directories when loading data, + * the main purpose is to avoid single-disk-hot-spot + */ + @CarbonProperty + public static final String CARBON_USE_MULTI_TEMP_DIR = "carbon.use.multiple.temp.dir"; + + /** + * default value for multi temp dir + */ + public static final String CARBON_USING_MULTI_TEMP_DIR_DEFAULT = "false"; --- End diff -- Change this to match the configuration above.
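One reading of this comment is that the default constant (`CARBON_USING_...`) should follow the same naming as the key constant (`CARBON_USE_...`). The sketch below assumes that reading. The name `CARBON_USE_MULTI_TEMP_DIR_DEFAULT` and the class `MultiTempDirConfig` are hypothetical, and plain `java.util.Properties` is used in place of CarbonData's own property handling.

```java
import java.util.Properties;

// Hypothetical illustration of a key constant paired with a matching default.
class MultiTempDirConfig {
    static final String CARBON_USE_MULTI_TEMP_DIR = "carbon.use.multiple.temp.dir";

    // Assumed name: mirrors the key constant so the pair is easy to find.
    static final String CARBON_USE_MULTI_TEMP_DIR_DEFAULT = "false";

    /** Reads the flag, falling back to the default when the key is absent. */
    static boolean useMultiTempDir(Properties props) {
        return Boolean.parseBoolean(
            props.getProperty(CARBON_USE_MULTI_TEMP_DIR, CARBON_USE_MULTI_TEMP_DIR_DEFAULT));
    }
}
```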
[jira] [Created] (CARBONDATA-1334) Delete Operation Hung in large dataset
sounak chakraborty created CARBONDATA-1334: -- Summary: Delete Operation Hung in large dataset Key: CARBONDATA-1334 URL: https://issues.apache.org/jira/browse/CARBONDATA-1334 Project: CarbonData Issue Type: Bug Reporter: sounak chakraborty The delete operation hangs on large datasets. Because of a wrong equals check in DeleteDeltaBlockletDetails.java, many DeleteDeltaBlockDetails objects are created (almost one object per delete offset). With so many objects, the search cost becomes very high, which causes the hang.
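The failure mode described in CARBONDATA-1334 can be sketched in isolation. The classes below are hypothetical, simplified stand-ins (they are not `DeleteDeltaBlockletDetails` or `DeleteDeltaBlockDetails`), and the identifying fields are assumptions. The point is only that a lookup which relies on `equals()` merges delete offsets into one record per blocklet when `equals()` compares the identifying fields; if that check never matches, every offset ends up in its own record and each later search has to scan all of them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical per-blocklet record of deleted row offsets.
class BlockletDeletes {
    final int blockletId;
    final int pageId;
    final Set<Integer> deletedRows = new TreeSet<>();

    BlockletDeletes(int blockletId, int pageId) {
        this.blockletId = blockletId;
        this.pageId = pageId;
    }

    // Equality is based on the identifying fields only, not on the offsets.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof BlockletDeletes)) return false;
        BlockletDeletes other = (BlockletDeletes) o;
        return blockletId == other.blockletId && pageId == other.pageId;
    }

    @Override
    public int hashCode() {
        return Objects.hash(blockletId, pageId);
    }
}

// Hypothetical per-block container that groups offsets by blocklet.
class BlockDeletes {
    final List<BlockletDeletes> blocklets = new ArrayList<>();

    void addDeletedRow(int blockletId, int pageId, int row) {
        BlockletDeletes probe = new BlockletDeletes(blockletId, pageId);
        int idx = blocklets.indexOf(probe);   // relies on equals()
        if (idx < 0) {
            blocklets.add(probe);
            idx = blocklets.size() - 1;
        }
        blocklets.get(idx).deletedRows.add(row);
    }
}
```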
[GitHub] carbondata issue #1197: [CARBONDATA-1238] Decouple the datatype convert from...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1197 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3208/
[GitHub] carbondata issue #1197: [CARBONDATA-1238] Decouple the datatype convert from...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1197 Build Failed with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/613/
[GitHub] carbondata issue #1192: [CARBONDATA-940] alter table add/split partition for...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1192 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3207/
[GitHub] carbondata issue #1192: [CARBONDATA-940] alter table add/split partition for...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1192 Build Success with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/612/
[GitHub] carbondata pull request #1192: [CARBONDATA-940] alter table add/split partit...
Github user lionelcao commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1192#discussion_r129744678 --- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java --- @@ -308,6 +308,10 @@ @CarbonProperty public static final String NUM_CORES_COMPACTING = "carbon.number.of.cores.while.compacting"; /** + * Number of cores to be used while alter partition + */ + public static final String NUM_CORES_ALT_PARTITION = "carbon.number.of.cores.while.altPartition"; + /** --- End diff -- The other variables here have no blank line between them, so this keeps to the same style.
[GitHub] carbondata issue #1079: [CARBONDATA-1257] Measure Filter implementation
Github user zzcclp commented on the issue: https://github.com/apache/carbondata/pull/1079 @sounakr @ravipesala Any progress on this PR? It was merged onto branch-1.1.
[GitHub] carbondata pull request #1192: [CARBONDATA-940] alter table add/split partit...
Github user lionelcao commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1192#discussion_r129740890 --- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/CarbonPartitionExample.scala --- @@ -101,17 +126,40 @@ object CarbonPartitionExample { spark.sql(""" | CREATE TABLE IF NOT EXISTS t5 | ( + | id Int, | vin String, | logdate Timestamp, | phonenumber Long, - | area String + | area String, + | salary Int |) | PARTITIONED BY (country String) | STORED BY 'carbondata' | TBLPROPERTIES('PARTITION_TYPE'='LIST', - | 'LIST_INFO'='(China,United States),UK ,japan,(Canada,Russia), South Korea ') + | 'LIST_INFO'='(China, US),UK ,Japan,(Canada,Russia, Good, NotGood), Korea ') --- End diff -- Hi @chenerlu, the spacing in this DDL statement is deliberately left untidy, to mimic what a customer might actually write.
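To show why the untidy spacing in `LIST_INFO` is a useful test input, here is a minimal sketch of a whitespace-tolerant parser for that value. It is not CarbonData's parser; the class `ListInfoParser` is hypothetical, and it assumes that parentheses group several values into one partition while a bare value forms its own partition. It does no validation of unbalanced parentheses.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser for a LIST_INFO-style string such as
// "(China, US),UK ,Japan,(Canada,Russia, Good, NotGood), Korea ".
class ListInfoParser {
    static List<List<String>> parse(String listInfo) {
        List<List<String>> groups = new ArrayList<>();
        List<String> current = null;              // non-null while inside "(...)"
        StringBuilder token = new StringBuilder();
        for (int i = 0; i <= listInfo.length(); i++) {
            // A trailing comma is used as a sentinel to flush the last token.
            char c = i < listInfo.length() ? listInfo.charAt(i) : ',';
            if (c == '(') {
                current = new ArrayList<>();
            } else if (c == ')' || c == ',') {
                String value = token.toString().trim();   // drop stray spaces
                token.setLength(0);
                if (!value.isEmpty()) {
                    if (current != null) {
                        current.add(value);
                    } else {
                        List<String> single = new ArrayList<>();
                        single.add(value);
                        groups.add(single);
                    }
                }
                if (c == ')' && current != null) {
                    groups.add(current);
                    current = null;
                }
            } else {
                token.append(c);
            }
        }
        return groups;
    }
}
```

Applied to the new `LIST_INFO` value from the diff, this yields five partitions: two multi-value groups and three single values, with the surrounding spaces trimmed.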
[GitHub] carbondata issue #1198: [CARBONDATA-1281] Support multiple temp dirs for wri...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1198 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3206/
[GitHub] carbondata issue #1198: [CARBONDATA-1281] Support multiple temp dirs for wri...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1198 Build Success with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/611/
[GitHub] carbondata issue #1198: [CARBONDATA-1281] Support multiple temp dirs for wri...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/1198 retest this please
[GitHub] carbondata issue #1198: [CARBONDATA-1281] Support multiple temp dirs for wri...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1198 Build Failed with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/610/
[GitHub] carbondata issue #1198: [CARBONDATA-1281] Support multiple temp dirs for wri...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1198 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3205/
[GitHub] carbondata issue #1198: [CARBONDATA-1281] Support multiple temp dirs for wri...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/1198 retest this please
[GitHub] carbondata pull request #1197: [CARBONDATA-1238] Decouple the datatype conve...
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1197#discussion_r129736254 --- Diff: integration/spark-common/src/main/java/org/apache/carbondata/spark/util/SparkDataTypeConverterImp.java --- @@ -0,0 +1,49 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.spark.util; + +import java.io.Serializable; + +import org.apache.carbondata.core.util.DataTypeConverter; + +import org.apache.spark.unsafe.types.UTF8String; + +/** + * Convert java data type to spark data type + */ +public final class SparkDataTypeConverterImp implements DataTypeConverter, Serializable { --- End diff -- ok
[GitHub] carbondata pull request #1197: [CARBONDATA-1238] Decouple the datatype conve...
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1197#discussion_r129736197 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/CarbonInputFormat.java --- @@ -768,29 +818,6 @@ public QueryModel getQueryModel(InputSplit inputSplit, TaskAttemptContext taskAt return queryModel; } - public CarbonReadSupport getReadSupportClass(Configuration configuration) { --- End diff -- No code change here; I only moved each "set" and "get" method pair next to each other. For example, setFilterPredicates and getFilterPredicates now sit together.
[GitHub] carbondata issue #1195: [CARBONDATA-1281] Support multiple temp dirs for wri...
Github user xuchuanyin commented on the issue: https://github.com/apache/carbondata/pull/1195 @chenliang613 This PR picked up commits from other people because of an incorrect rebase, so I have closed it and created a new one, #1198.
[GitHub] carbondata pull request #943: [CARBONDATA-1086]Added documentation for BATCH...
Github user vandana7 closed the pull request at: https://github.com/apache/carbondata/pull/943
[GitHub] carbondata issue #943: [CARBONDATA-1086]Added documentation for BATCH SORT S...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/943 Build Failed with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/609/
[GitHub] carbondata issue #943: [CARBONDATA-1086]Added documentation for BATCH SORT S...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/943 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3204/
[GitHub] carbondata issue #1198: [CARBONDATA-1281] Support multiple temp dirs for wri...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1198 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3203/
[GitHub] carbondata issue #1198: [CARBONDATA-1281] Support multiple temp dirs for wri...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1198 Build Success with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/608/
[GitHub] carbondata issue #1196: Rebase datamap onto master
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1196 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3202/
[GitHub] carbondata issue #1196: Rebase datamap onto master
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1196 Build Failed with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/607/
[jira] [Commented] (CARBONDATA-1333) Fix Coverity_Fortify issue
[ https://issues.apache.org/jira/browse/CARBONDATA-1333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16101782#comment-16101782 ] Kushal Sah commented on CARBONDATA-1333: Can this issue be assigned to me? > Fix Coverity_Fortify issue > -- > > Key: CARBONDATA-1333 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1333 > Project: CarbonData > Issue Type: Improvement > Reporter: Kushal Sah > Priority: Minor > > Fixed coverity and fortify issue detected by the codex tool
[jira] [Created] (CARBONDATA-1333) Fix Coverity_Fortify issue
Kushal Sah created CARBONDATA-1333: -- Summary: Fix Coverity_Fortify issue Key: CARBONDATA-1333 URL: https://issues.apache.org/jira/browse/CARBONDATA-1333 Project: CarbonData Issue Type: Improvement Reporter: Kushal Sah Priority: Minor Fixed coverity and fortify issue detected by the codex tool
[GitHub] carbondata pull request #1192: [CARBONDATA-940] alter table add/split partit...
Github user chenerlu commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1192#discussion_r129598686 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/CarbonInputFormat.java --- @@ -440,9 +510,16 @@ protected Expression getFilterPredicates(Configuration configuration) { for (Map.Entry entry : segmentIndexMap.entrySet()) { SegmentTaskIndexStore.TaskBucketHolder taskHolder = entry.getKey(); - int taskId = CarbonTablePath.DataFileUtil.getTaskIdFromTaskNo(taskHolder.taskNo); + int partitionId = CarbonTablePath.DataFileUtil.getTaskIdFromTaskNo(taskHolder.taskNo); + //oldPartitionIdList is only used in alter table partition command because it change + //partition info first and then read data. + //for other normal query should use newest partitionIdList --- End diff -- Use /** */ instead for multi-line comments.
[GitHub] carbondata pull request #1192: [CARBONDATA-940] alter table add/split partit...
Github user chenerlu commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1192#discussion_r129598531 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/CarbonInputFormat.java --- @@ -440,9 +510,16 @@ protected Expression getFilterPredicates(Configuration configuration) { for (Map.Entry entry : segmentIndexMap.entrySet()) { SegmentTaskIndexStore.TaskBucketHolder taskHolder = entry.getKey(); - int taskId = CarbonTablePath.DataFileUtil.getTaskIdFromTaskNo(taskHolder.taskNo); + int partitionId = CarbonTablePath.DataFileUtil.getTaskIdFromTaskNo(taskHolder.taskNo); + //oldPartitionIdList is only used in alter table partition command because it change + //partition info first and then read data. + //for other normal query should use newest partitionIdList --- End diff -- use /** */ instead
[GitHub] carbondata pull request #1197: [CARBONDATA-1238] Decouple the datatype conve...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1197#discussion_r129597130 --- Diff: integration/spark-common/src/main/java/org/apache/carbondata/spark/util/SparkDataTypeConverterImp.java --- @@ -0,0 +1,49 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.spark.util; + +import java.io.Serializable; + +import org.apache.carbondata.core.util.DataTypeConverter; + +import org.apache.spark.unsafe.types.UTF8String; + +/** + * Convert java data type to spark data type + */ +public final class SparkDataTypeConverterImp implements DataTypeConverter, Serializable { --- End diff -- Typo: this should be `SparkDataTypeConverterImpl`.
[GitHub] carbondata pull request #1197: [CARBONDATA-1238] Decouple the datatype conve...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1197#discussion_r129596893 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/CarbonInputFormat.java --- @@ -768,29 +818,6 @@ public QueryModel getQueryModel(InputSplit inputSplit, TaskAttemptContext taskAt return queryModel; } - public CarbonReadSupport getReadSupportClass(Configuration configuration) { --- End diff -- Is there any change to this method?
[GitHub] carbondata issue #1196: Rebase datamap onto master
Github user jackylk commented on the issue: https://github.com/apache/carbondata/pull/1196 retest this please
[GitHub] carbondata pull request #1193: [CARBONDATA-1327] Add carbon sort column exam...
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1193#discussion_r129592396 --- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/CarbonSortColumnsExample.scala --- @@ -0,0 +1,127 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.carbondata.examples + +import java.io.File + +import org.apache.spark.sql.SparkSession + +import org.apache.carbondata.core.constants.CarbonCommonConstants +import org.apache.carbondata.core.util.CarbonProperties + +object CarbonSortColumnsExample { + + def main(args: Array[String]) { +val rootPath = new File(this.getClass.getResource("/").getPath ++ "../../../..").getCanonicalPath +val storeLocation = s"$rootPath/examples/spark2/target/store" +val warehouse = s"$rootPath/examples/spark2/target/warehouse" +val metastoredb = s"$rootPath/examples/spark2/target" + +CarbonProperties.getInstance() + .addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, "/MM/dd HH:mm:ss") + .addProperty(CarbonCommonConstants.CARBON_DATE_FORMAT, "/MM/dd") + +import org.apache.spark.sql.CarbonSession._ +val spark = SparkSession + .builder() + .master("local") + .appName("CarbonSortColumnsExample") + .config("spark.sql.warehouse.dir", warehouse) + .config("spark.driver.host", "localhost") + .getOrCreateCarbonSession(storeLocation, metastoredb) + +spark.sparkContext.setLogLevel("WARN") + +spark.sql("DROP TABLE IF EXISTS sort_columns_table") + +// Create table with no sort columns +spark.sql( + s""" + | CREATE TABLE no_sort_columns_table( + | shortField SHORT, + | intField INT, + | bigintField LONG, + | doubleField DOUBLE, + | stringField STRING, + | timestampField TIMESTAMP, + | decimalField DECIMAL(18,2), + | dateField DATE, + | charField CHAR(5), + | floatField FLOAT, + | complexData ARRAY + | ) + | STORED BY 'carbondata' + | TBLPROPERTIES('SORT_COLUMNS'='') + """.stripMargin) + +// Create table with sort columns +// Currently sort_column don't support "FLOAD, DOUBLE, DECIMAL" +// but can support other numeric type(like: INT, LONG) --- End diff -- How about changing the below comments from // Currently sort_column don't support "FLOAD, DOUBLE, DECIMAL" +// but can support other numeric type(like: INT, LONG) to // you can specify any columns to sort columns 
for building MDX index, remark: currently sort columns don't support "FLOAT, DOUBLE, DECIMAL"
[jira] [Resolved] (CARBONDATA-1323) Presto Performace Improvement at Integration Layer
[ https://issues.apache.org/jira/browse/CARBONDATA-1323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang Chen resolved CARBONDATA-1323. Resolution: Fixed Assignee: Bhavya Aggarwal Fix Version/s: 1.2.0 > Presto Performace Improvement at Integration Layer > -- > > Key: CARBONDATA-1323 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1323 > Project: CarbonData > Issue Type: Improvement > Components: presto-integration >Affects Versions: 1.2.0 >Reporter: Bhavya Aggarwal >Assignee: Bhavya Aggarwal > Fix For: 1.2.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Presto Performace Improvement -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[GitHub] carbondata pull request #1190: [CARBONDATA-1323] Presto Optimization for Int...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/1190
[GitHub] carbondata pull request #1190: [CARBONDATA-1323] Presto Optimization for Int...
Github user bhavya411 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1190#discussion_r129588108 --- Diff: integration/presto/pom.xml --- @@ -228,6 +228,33 @@ true + + org.scala-tools + maven-scala-plugin --- End diff -- I have written the dictionary decoding in Scala because it is more optimized and easier to understand, so we have to add this plugin to compile the Scala code.
[GitHub] carbondata issue #1198: [CARBONDATA-1281] Support multiple temp dirs for wri...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1198 Build Failed with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/606/
[GitHub] carbondata issue #1198: [CARBONDATA-1281] Support multiple temp dirs for wri...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1198 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3201/
[GitHub] carbondata issue #1198: [CARBONDATA-1281] Support multiple temp dirs for wri...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1198 Build Failed with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/605/
[GitHub] carbondata pull request #1192: [CARBONDATA-940] alter table add/split partit...
Github user chenerlu commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1192#discussion_r129584623 --- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/CarbonPartitionExample.scala --- @@ -101,17 +126,40 @@ object CarbonPartitionExample { spark.sql(""" | CREATE TABLE IF NOT EXISTS t5 | ( + | id Int, | vin String, | logdate Timestamp, | phonenumber Long, - | area String + | area String, + | salary Int |) | PARTITIONED BY (country String) | STORED BY 'carbondata' | TBLPROPERTIES('PARTITION_TYPE'='LIST', - | 'LIST_INFO'='(China,United States),UK ,japan,(Canada,Russia), South Korea ') + | 'LIST_INFO'='(China, US),UK ,Japan,(Canada,Russia, Good, NotGood), Korea ') --- End diff -- add space before ,
[GitHub] carbondata issue #1198: [CARBONDATA-1281] Support multiple temp dirs for wri...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1198 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3200/
[GitHub] carbondata pull request #1192: [CARBONDATA-940] alter table add/split partit...
Github user chenerlu commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1192#discussion_r129583765 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/result/iterator/PartitionSpliterRawResultIterator.java --- @@ -0,0 +1,95 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.carbondata.core.scan.result.iterator; + +import org.apache.carbondata.common.CarbonIterator; +import org.apache.carbondata.common.logging.LogService; +import org.apache.carbondata.common.logging.LogServiceFactory; +import org.apache.carbondata.core.scan.result.BatchResult; + + +public class PartitionSpliterRawResultIterator extends CarbonIterator { + + private CarbonIterator iterator; + private BatchResult batch; + private int counter; + + /** + * LOGGER + */ + private static final LogService LOGGER = + LogServiceFactory.getLogService(PartitionSpliterRawResultIterator.class.getName()); + + public PartitionSpliterRawResultIterator(CarbonIterator iterator) { +this.iterator = iterator; + } + + + @Override public boolean hasNext() { +if (null == batch || checkBatchEnd(batch)) { + if (iterator.hasNext()) { +batch = iterator.next(); +counter = 0; + } else { +return false; + } +} + +if (!checkBatchEnd(batch)) { + return true; +} else { + return false; +} + } + + @Override public Object[] next() { +if (batch == null) { + batch = iterator.next(); +} +if (!checkBatchEnd(batch)) { + try { +return batch.getRawRow(counter++); + } catch (Exception e) { +LOGGER.error(e.getMessage()); +return null; + } +} else { + batch = iterator.next(); + counter = 0; +} +try { + return batch.getRawRow(counter++); +} catch (Exception e) { + LOGGER.error(e.getMessage()); + return null; --- End diff -- This logic can be optimized.
[GitHub] carbondata pull request #1192: [CARBONDATA-940] alter table add/split partit...
Github user chenerlu commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1192#discussion_r129583246 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/result/iterator/PartitionSpliterRawResultIterator.java --- @@ -0,0 +1,95 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.carbondata.core.scan.result.iterator; + +import org.apache.carbondata.common.CarbonIterator; +import org.apache.carbondata.common.logging.LogService; +import org.apache.carbondata.common.logging.LogServiceFactory; +import org.apache.carbondata.core.scan.result.BatchResult; + + +public class PartitionSpliterRawResultIterator extends CarbonIterator { + + private CarbonIterator iterator; + private BatchResult batch; + private int counter; + + /** + * LOGGER + */ + private static final LogService LOGGER = + LogServiceFactory.getLogService(PartitionSpliterRawResultIterator.class.getName()); + + public PartitionSpliterRawResultIterator(CarbonIterator iterator) { +this.iterator = iterator; + } + + + @Override public boolean hasNext() { +if (null == batch || checkBatchEnd(batch)) { + if (iterator.hasNext()) { +batch = iterator.next(); +counter = 0; + } else { +return false; + } +} + +if (!checkBatchEnd(batch)) { + return true; +} else { + return false; +} --- End diff -- Use `return !checkBatchEnd(batch);` instead.
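The two review comments on this iterator (return the condition directly in `hasNext()`, and simplify `next()`) could be applied roughly as in the sketch below. This is an illustration under stated assumptions, not the code that was merged: `RawRowIteratorSketch` is a hypothetical stand-in, a batch is modelled as a plain `List<Object[]>` instead of CarbonData's `BatchResult`, and `checkBatchEnd` is assumed to mean "every row of the current batch has been returned", since its body is not shown in the diff. The try/catch around `getRawRow` is left out because the stand-in has nothing to catch; in the real class it would wrap the single remaining call.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for PartitionSpliterRawResultIterator: a batch is a
// plain List<Object[]> rather than CarbonData's BatchResult.
class RawRowIteratorSketch implements Iterator<Object[]> {
  private final Iterator<List<Object[]>> batches;
  private List<Object[]> batch;
  private int counter;

  RawRowIteratorSketch(Iterator<List<Object[]>> batches) {
    this.batches = batches;
  }

  // Assumed meaning of checkBatchEnd(batch): all rows of the batch were returned.
  private boolean checkBatchEnd(List<Object[]> b) {
    return counter >= b.size();
  }

  @Override public boolean hasNext() {
    if (batch == null || checkBatchEnd(batch)) {
      if (!batches.hasNext()) {
        return false;
      }
      batch = batches.next();
      counter = 0;
    }
    // Return the condition directly instead of branching to true/false.
    return !checkBatchEnd(batch);
  }

  @Override public Object[] next() {
    // A single refill check replaces the two duplicated return branches.
    if (batch == null || checkBatchEnd(batch)) {
      batch = batches.next();
      counter = 0;
    }
    return batch.get(counter++);
  }

  public static void main(String[] args) {
    List<Object[]> first = new ArrayList<>();
    first.add(new Object[] {"a", 1});
    first.add(new Object[] {"b", 2});
    List<Object[]> second = new ArrayList<>();
    second.add(new Object[] {"c", 3});
    List<List<Object[]>> all = new ArrayList<>();
    all.add(first);
    all.add(second);

    Iterator<Object[]> rows = new RawRowIteratorSketch(all.iterator());
    while (rows.hasNext()) {
      Object[] row = rows.next();
      System.out.println(row[0] + " -> " + row[1]);
    }
  }
}
```

One behavioural difference is worth noting: the merged refill check in `next()` fetches at most one new batch per call, so it relies on a freshly fetched batch being non-empty, whereas the original code would fetch a second batch if the very first one was empty.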
[GitHub] carbondata pull request #1192: [CARBONDATA-940] alter table add/split partit...
Github user lionelcao commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1192#discussion_r129583064 --- Diff: core/src/main/java/org/apache/carbondata/core/metadata/schema/PartitionInfo.java --- @@ -65,6 +65,31 @@ public PartitionInfo(List columnSchemaList, PartitionType partitio this.partitionIds = new ArrayList<>(); } + /** + * add partition means split default partition, add in last directly --- End diff -- Because data may already exist in the default partition and would need to be moved into the new partition.
[GitHub] carbondata issue #1198: [CARBONDATA-1281] Support multiple temp dirs for wri...
Github user xuchuanyin commented on the issue: https://github.com/apache/carbondata/pull/1198 There is no useful information in the compilation message. retest this please
[GitHub] carbondata pull request #1192: [CARBONDATA-940] alter table add/split partit...
Github user chenerlu commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1192#discussion_r129582365 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/result/iterator/PartitionSpliterRawResultIterator.java --- @@ -0,0 +1,95 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.carbondata.core.scan.result.iterator; + +import org.apache.carbondata.common.CarbonIterator; +import org.apache.carbondata.common.logging.LogService; +import org.apache.carbondata.common.logging.LogServiceFactory; +import org.apache.carbondata.core.scan.result.BatchResult; + + --- End diff -- Delete the extra blank line.
[GitHub] carbondata pull request #1192: [CARBONDATA-940] alter table add/split partit...
Github user chenerlu commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1192#discussion_r129582258 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/result/iterator/PartitionSpliterRawResultIterator.java --- @@ -0,0 +1,95 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.carbondata.core.scan.result.iterator; + +import org.apache.carbondata.common.CarbonIterator; +import org.apache.carbondata.common.logging.LogService; +import org.apache.carbondata.common.logging.LogServiceFactory; +import org.apache.carbondata.core.scan.result.BatchResult; + + +public class PartitionSpliterRawResultIterator extends CarbonIterator { + + private CarbonIterator iterator; + private BatchResult batch; + private int counter; + + /** + * LOGGER + */ --- End diff -- I think this is not necessary.
[GitHub] carbondata pull request #1192: [CARBONDATA-940] alter table add/split partit...
Github user chenerlu commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1192#discussion_r129582111 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/result/iterator/PartitionSpliterRawResultIterator.java --- @@ -0,0 +1,95 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.carbondata.core.scan.result.iterator; + +import org.apache.carbondata.common.CarbonIterator; +import org.apache.carbondata.common.logging.LogService; +import org.apache.carbondata.common.logging.LogServiceFactory; +import org.apache.carbondata.core.scan.result.BatchResult; + + +public class PartitionSpliterRawResultIterator extends CarbonIterator { + + private CarbonIterator iterator; + private BatchResult batch; + private int counter; + + /** + * LOGGER + */ + private static final LogService LOGGER = + LogServiceFactory.getLogService(PartitionSpliterRawResultIterator.class.getName()); + + public PartitionSpliterRawResultIterator(CarbonIterator iterator) { +this.iterator = iterator; + } + + --- End diff -- Delete the extra blank line.
[GitHub] carbondata pull request #1198: [CARBONDATA-1281] Support multiple temp dirs ...
GitHub user xuchuanyin reopened a pull request: https://github.com/apache/carbondata/pull/1198

[CARBONDATA-1281] Support multiple temp dirs for writing files while loading data

# Modifications
This feature mainly focuses on avoiding a disk hot-spot during a single massive data load. Changes are made in two parts:
1. randomly choose a yarn local folder each time a sort temp file is written in the sort process;
2. randomly choose a yarn local folder each time a carbondata file is written in the write process.

# Usage
To enable this feature, the user should set `carbon.use.multi.temp.dir=true` and `carbon.use.local.dir=true`.

# Performance
In my case, this feature improves the loading performance from 35M/s/node to 70+M/s/node.

You can merge this pull request into a Git repository by running:
$ git pull https://github.com/xuchuanyin/carbondata new_feature_mtd4l

Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/carbondata/pull/1198.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
This closes #1198

commit 46da65a1a0579c62a7f4196ae622f83dd5197e3a
Author: xuchuanyin
Date: 2017-07-25T11:17:53Z
Support multiple temp dirs for writing files while loading data
randomly choose a dir to write sort temp files
randomly choose a dir to write carbondata files
Fix errors in spelling
optimize default value for using multiple temp dir
update document for multiple temp dirs feature
update property name
(cherry picked from commit 71ab293ef8d2ff24a122bb074b7b95bca8c1b77e)

commit 6e35dec70196a12aaac24a69c795d3597f946386
Author: xuchuanyin
Date: 2017-07-25T11:20:32Z
Add tests for multiple temp dirs during data loading
Fix bugs in tests
remove header in test data
remove useless comment
remove added useless testdata
update data source for tests
(cherry picked from commit ee355b78c0d703d5bc2d2767837c32b6cc422361)

commit 3e633070c3f793867c03ba350048994ced0e5527
Author: xuchuanyin
Date: 2017-07-25T12:28:17Z
resolve review comments
+ update documents
+ update parameter name
+ optimize code to avoid duplicate lines

commit 9f746178600d7c16267bd0276b8a492f69871802
Author: xuchuanyin
Date: 2017-07-25T12:42:35Z
fix checkstyle error
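The "randomly choose a folder each time" step described in this pull request can be pictured with the minimal sketch below. It is not taken from the patch: `TempDirPicker`, `pickRandom`, and the example paths are hypothetical names chosen for illustration, and the only assumption is that the configured temp locations are available as a `String[]`, as in the `deleteSortLocationIfExists(String[] locations)` signature quoted earlier in this thread.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper, not taken from the pull request.
final class TempDirPicker {
  private TempDirPicker() { }

  /** Returns one entry of {@code locations}, chosen uniformly at random. */
  static String pickRandom(String[] locations) {
    if (locations == null || locations.length == 0) {
      throw new IllegalArgumentException("at least one temp location is required");
    }
    return locations[ThreadLocalRandom.current().nextInt(locations.length)];
  }

  public static void main(String[] args) {
    // Example directories only; on YARN these would be the container's local dirs.
    String[] yarnLocalDirs = {"/data1/yarn/local", "/data2/yarn/local", "/data3/yarn/local"};
    for (int i = 0; i < 5; i++) {
      System.out.println("sort temp file " + i + " -> " + pickRandom(yarnLocalDirs));
    }
  }
}
```

Picking a directory per file rather than once per load is what spreads the writes across disks; `ThreadLocalRandom` is used here so that concurrent writer threads do not contend on a shared random generator.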
[GitHub] carbondata issue #1198: [CARBONDATA-1281] Support multiple temp dirs for wri...
Github user asfgit commented on the issue: https://github.com/apache/carbondata/pull/1198 Can one of the admins verify this patch?
[GitHub] carbondata issue #1198: [CARBONDATA-1281] Support multiple temp dirs for wri...
Github user asfgit commented on the issue: https://github.com/apache/carbondata/pull/1198 Can one of the admins verify this patch?
[GitHub] carbondata pull request #1198: [CARBONDATA-1281] Support multiple temp dirs ...
Github user xuchuanyin closed the pull request at: https://github.com/apache/carbondata/pull/1198
[GitHub] carbondata issue #1198: [CARBONDATA-1281] Support multiple temp dirs for wri...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1198 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3199/
[GitHub] carbondata issue #1198: [CARBONDATA-1281] Support multiple temp dirs for wri...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1198 Build Failed with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/604/
[GitHub] carbondata issue #1195: [CARBONDATA-1281] Support multiple temp dirs for wri...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1195 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3198/
[GitHub] carbondata issue #1195: [CARBONDATA-1281] Support multiple temp dirs for wri...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1195 Build Failed with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/603/
[GitHub] carbondata issue #1198: [CARBONDATA-1281] Support multiple temp dirs for wri...
Github user xuchuanyin commented on the issue: https://github.com/apache/carbondata/pull/1198 I created this PR and closed #1195
[GitHub] carbondata issue #1195: [CARBONDATA-1281] Support multiple temp dirs for wri...
Github user xuchuanyin commented on the issue: https://github.com/apache/carbondata/pull/1195 @chenliang613 Sorry for adding irrelevant commits to this PR through an incorrect rebase. :disappointed: I've created a new PR #1198 for this.
[jira] [Created] (CARBONDATA-1332) Dictionary generation time in spark 2.1 is more than spark 1.5
Venkata Ramana G created CARBONDATA-1332: Summary: Dictionary generation time in spark 2.1 is more than spark 1.5 Key: CARBONDATA-1332 URL: https://issues.apache.org/jira/browse/CARBONDATA-1332 Project: CarbonData Issue Type: Bug Components: spark-integration Affects Versions: 1.1.1 Reporter: Venkata Ramana G Fix For: 1.2.0 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[GitHub] carbondata issue #1193: [CARBONDATA-1327] Add carbon sort column examples
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1193 Build Failed with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/602/
[GitHub] carbondata issue #1193: [CARBONDATA-1327] Add carbon sort column examples
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1193 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3197/
[GitHub] carbondata issue #1198: [CARBONDATA-1281] Support multiple temp dirs for wri...
Github user asfgit commented on the issue: https://github.com/apache/carbondata/pull/1198 Can one of the admins verify this patch?
[GitHub] carbondata pull request #1198: [CARBONDATA-1281] Support multiple temp dirs ...
GitHub user xuchuanyin opened a pull request: https://github.com/apache/carbondata/pull/1198

[CARBONDATA-1281] Support multiple temp dirs for writing files while loading data

# Modifications
This feature mainly focuses on avoiding a disk hot-spot during a single massive data load. Changes are made in two parts:
1. randomly choose a yarn local folder each time a sort temp file is written in the sort process;
2. randomly choose a yarn local folder each time a carbondata file is written in the write process.

# Usage
To enable this feature, the user should set `carbon.use.multiple.temp.dir=true` and `carbon.use.local.dir=true`.

# Performance
In my case, this feature improves the loading performance from 35M/s/node to 70+M/s/node.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/xuchuanyin/carbondata new_feature_mtd4l

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/1198.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #1198

commit 46da65a1a0579c62a7f4196ae622f83dd5197e3a
Author: xuchuanyin
Date: 2017-07-25T11:17:53Z

Support multiple temp dirs for writing files while loading data
randomly choose a dir to write sort temp files
randomly choose a dir to write carbondata files
Fix errors in spelling
optimize default value for using multiple temp dir
update document for multiple temp dirs feature
update property name
(cherry picked from commit 71ab293ef8d2ff24a122bb074b7b95bca8c1b77e)

commit 6e35dec70196a12aaac24a69c795d3597f946386
Author: xuchuanyin
Date: 2017-07-25T11:20:32Z

Add tests for multiple temp dirs during data loading
Fix bugs in tests
remove header in test data
remove useless comment
remove added useless testdata
update data source for tests
(cherry picked from commit ee355b78c0d703d5bc2d2767837c32b6cc422361)

commit 3e633070c3f793867c03ba350048994ced0e5527
Author: xuchuanyin
Date: 2017-07-25T12:28:17Z

resolve review comments
+ update documents
+ update parameter name
+ optimize code to avoid duplicate lines

commit 9f746178600d7c16267bd0276b8a492f69871802
Author: xuchuanyin
Date: 2017-07-25T12:42:35Z

fix checkstyle error
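The two modifications described in this pull request both come down to picking one of several local directories at random each time a file is written. A minimal sketch of that idea follows; `TempDirChooser`, `chooseRandomly`, and the example paths are hypothetical names for illustration and are not taken from the CarbonData code.

```java
import java.io.File;
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical helper (not CarbonData code): picks one of several temp dirs at random. */
final class TempDirChooser {

  private TempDirChooser() { }

  /**
   * Returns a randomly chosen entry of locations, so that successive
   * temp files are spread across the configured directories.
   */
  static String chooseRandomly(String[] locations) {
    if (locations == null || locations.length == 0) {
      throw new IllegalArgumentException("at least one temp location is required");
    }
    int index = ThreadLocalRandom.current().nextInt(locations.length);
    return locations[index];
  }

  public static void main(String[] args) {
    // Illustrative paths only; in a YARN deployment these would be the container's local dirs.
    String[] localDirs = {"/data1/tmp", "/data2/tmp", "/data3/tmp"};
    for (int i = 0; i < 5; i++) {
      File sortTempFile = new File(chooseRandomly(localDirs), "sorttemp_" + i + ".tmp");
      System.out.println(sortTempFile.getPath());
    }
  }
}
```

Choosing a directory per file, rather than once per load, is what spreads the writes of a single large load over several disks.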
[jira] [Created] (CARBONDATA-1331) Fixed failing test cases
Mohammad Shahid Khan created CARBONDATA-1331: Summary: Fixed failing test cases Key: CARBONDATA-1331 URL: https://issues.apache.org/jira/browse/CARBONDATA-1331 Project: CarbonData Issue Type: Bug Reporter: Mohammad Shahid Khan Priority: Minor
[GitHub] carbondata pull request #1195: [CARBONDATA-1281] Support multiple temp dirs ...
Github user xuchuanyin closed the pull request at: https://github.com/apache/carbondata/pull/1195
[GitHub] carbondata issue #1193: [CARBONDATA-1327] Add carbon sort column examples
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1193 Build Success with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/601/
[GitHub] carbondata issue #1193: [CARBONDATA-1327] Add carbon sort column examples
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1193 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3196/
[GitHub] carbondata pull request #1193: [CARBONDATA-1327] Add carbon sort column exam...
Github user mayunSaicmotor commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1193#discussion_r129566524 --- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/CarbonSortColumnsExample.scala --- @@ -0,0 +1,126 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.carbondata.examples + +import java.io.File + +import org.apache.spark.sql.SparkSession + +import org.apache.carbondata.core.constants.CarbonCommonConstants +import org.apache.carbondata.core.util.CarbonProperties + +object CarbonSortColumnsExample { + + def main(args: Array[String]) { +val rootPath = new File(this.getClass.getResource("/").getPath ++ "../../../..").getCanonicalPath +val storeLocation = s"$rootPath/examples/spark2/target/store" +val warehouse = s"$rootPath/examples/spark2/target/warehouse" +val metastoredb = s"$rootPath/examples/spark2/target" + +CarbonProperties.getInstance() + .addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, "/MM/dd HH:mm:ss") + .addProperty(CarbonCommonConstants.CARBON_DATE_FORMAT, "/MM/dd") + .addProperty(CarbonCommonConstants.ENABLE_UNSAFE_COLUMN_PAGE_LOADING, "true") + +import org.apache.spark.sql.CarbonSession._ +val spark = SparkSession + .builder() + .master("local") + .appName("CarbonSortColumnsExample") + .config("spark.sql.warehouse.dir", warehouse) + .config("spark.driver.host", "localhost") + .getOrCreateCarbonSession(storeLocation, metastoredb) + +spark.sparkContext.setLogLevel("WARN") + +spark.sql("DROP TABLE IF EXISTS sort_columns_table") + +// Create table with no sort columns +spark.sql( + s""" + | CREATE TABLE no_sort_columns_table( + | shortField SHORT, + | intField INT, + | bigintField LONG, + | doubleField DOUBLE, + | stringField STRING, + | timestampField TIMESTAMP, + | decimalField DECIMAL(18,2), + | dateField DATE, + | charField CHAR(5), + | floatField FLOAT, + | complexData ARRAY + | ) + | STORED BY 'carbondata' + | TBLPROPERTIES('SORT_COLUMNS'='') + """.stripMargin) + +// Create table with sort columns +spark.sql( + s""" + | CREATE TABLE sort_columns_table( + | shortField SHORT, + | intField INT, + | bigintField LONG, + | doubleField DOUBLE, + | stringField STRING, + | timestampField TIMESTAMP, + | decimalField DECIMAL(18,2), + | dateField DATE, + | charField CHAR(5), 
+ | floatField FLOAT, + | complexData ARRAY + | ) + | STORED BY 'carbondata' + | TBLPROPERTIES('SORT_COLUMNS'='intField, stringField, charField') + """.stripMargin) + +val path = s"$rootPath/examples/spark2/src/main/resources/data.csv" + +// scalastyle:off +spark.sql( + s""" + | LOAD DATA LOCAL INPATH '$path' + | INTO TABLE no_sort_columns_table + | OPTIONS('FILEHEADER'='shortField,intField,bigintField,doubleField,stringField,timestampField,decimalField,dateField,charField,floatField,complexData', + | 'COMPLEX_DELIMITER_LEVEL_1'='#') --- End diff -- Added comments on line 74 and line 75.
[GitHub] carbondata issue #1193: [CARBONDATA-1327] Add carbon sort column examples
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1193 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3195/
[GitHub] carbondata issue #1193: [CARBONDATA-1327] Add carbon sort column examples
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1193 Build Success with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/600/
[GitHub] carbondata issue #1197: [CARBONDATA-1238] Decouple the datatype convert from...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1197 Build Success with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/599/
[GitHub] carbondata issue #1197: [CARBONDATA-1238] Decouple the datatype convert from...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1197 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3194/
[GitHub] carbondata pull request #1192: [CARBONDATA-940] alter table add/split partit...
Github user chenerlu commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1192#discussion_r129534273 --- Diff: core/src/main/java/org/apache/carbondata/core/metadata/schema/PartitionInfo.java --- @@ -65,6 +65,31 @@ public PartitionInfo(List columnSchemaList, PartitionType partitio this.partitionIds = new ArrayList<>(); } + /** + * add partition means split default partition, add in last directly --- End diff -- The default partition is 0, so why split the partition?
[GitHub] carbondata pull request #1192: [CARBONDATA-940] alter table add/split partit...
Github user chenerlu commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1192#discussion_r129533186 --- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java --- @@ -308,6 +308,10 @@ @CarbonProperty public static final String NUM_CORES_COMPACTING = "carbon.number.of.cores.while.compacting"; /** + * Number of cores to be used while alter partition + */ + public static final String NUM_CORES_ALT_PARTITION = "carbon.number.of.cores.while.altPartition"; + /** --- End diff -- Add a blank line.
[GitHub] carbondata issue #1197: [CARBONDATA-1238] Decouple the datatype convert from...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1197 Build Failed with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/598/
[GitHub] carbondata issue #1110: [CARBONDATA-1238] Decouple the datatype convert in c...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/1110 Raised a new PR, #1197; closing the old one.
[GitHub] carbondata issue #1197: [CARBONDATA-1238] Decouple the datatype convert from...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1197 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3193/
[GitHub] carbondata pull request #1110: [CARBONDATA-1238] Decouple the datatype conve...
Github user chenliang613 closed the pull request at: https://github.com/apache/carbondata/pull/1110
[GitHub] carbondata issue #1197: [CARBONDATA-1238] Decouple the datatype convert from...
Github user asfgit commented on the issue: https://github.com/apache/carbondata/pull/1197 Can one of the admins verify this patch?
[GitHub] carbondata pull request #1197: [CARBONDATA-1238] Decouple the datatype conve...
GitHub user chenliang613 opened a pull request: https://github.com/apache/carbondata/pull/1197 [CARBONDATA-1238] Decouple the datatype convert from Spark code in core module Decouple the datatype convert from Spark code in core module. You can merge this pull request into a Git repository by running: $ git pull https://github.com/chenliang613/carbondata decouple_sparkcode Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/1197.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1197 commit 45e1d4c6cdf131743d0558e302ecd77a1aa9ef32 Author: chenliang613 Date: 2017-06-28T15:45:50Z [CARBONDATA-1238] Decouple the datatype convert from Spark code in core module [CARBONDATA-1238] Decouple the datatype convert from Spark code in core module
[GitHub] carbondata pull request #1190: [CARBONDATA-1323] Presto Optimization for Int...
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1190#discussion_r129528008 --- Diff: integration/presto/pom.xml --- @@ -228,6 +228,33 @@ true + + org.scala-tools + maven-scala-plugin --- End diff -- Can you explain why this plugin needs to be added to the pom file?
[GitHub] carbondata issue #1195: [CARBONDATA-1281] Support multiple temp dirs for wri...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/1195 @xuchuanyin Everything looks OK; please rebase.
[GitHub] carbondata pull request #1192: [CARBONDATA-940] alter table add/split partit...
Github user lionelcao commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1192#discussion_r129516143 --- Diff: integration/spark-common-test/src/test/resources/partition_data.csv --- @@ -0,0 +1,27 @@ +id,vin,logdate,phonenumber,country,area,salary --- End diff -- Oh, this file is copied from the example package. Maybe I can remove the duplicates and keep only one.
[GitHub] carbondata pull request #1192: [CARBONDATA-940] alter table add/split partit...
Github user lionelcao commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1192#discussion_r129515426 --- Diff: conf/carbon.properties.template --- @@ -42,6 +42,9 @@ carbon.enableXXHash=true #carbon.max.level.cache.size=-1 #enable prefetch of data during merge sort while reading data from sort temp files in data loading #carbon.merge.sort.prefetch=true + Alter Partition Configuration +#Number of cores to be used while alter partition +carbon.number.of.cores.while.altPartition=2 --- End diff -- Yes, it is used when acting on multiple segments in parallel. This configuration allows users to set the number of threads according to their hardware. Sure, I will make the change.
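As a rough illustration of how such a setting could drive per-segment parallelism, the sketch below reads a core count from a properties object and sizes a fixed thread pool with it. Everything here is an assumption made for the example: it uses plain `java.util.Properties` rather than CarbonData's own property handling, the key is the spelling suggested in review, the default of 2 mirrors the template value in the diff, and the per-segment task is a placeholder string.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch (not CarbonData code): one task per segment on a configurable pool. */
final class AlterPartitionPool {

  // Assumed key and default, chosen for illustration only.
  static final String NUM_CORES_KEY = "carbon.number.of.cores.while.alterPartition";
  static final int NUM_CORES_DEFAULT = 2;

  private AlterPartitionPool() { }

  /** Parses the configured core count, falling back to the default on bad input. */
  static int numCores(Properties props) {
    String raw = props.getProperty(NUM_CORES_KEY);
    if (raw == null) {
      return NUM_CORES_DEFAULT;
    }
    try {
      int parsed = Integer.parseInt(raw.trim());
      return parsed > 0 ? parsed : NUM_CORES_DEFAULT;
    } catch (NumberFormatException e) {
      return NUM_CORES_DEFAULT;
    }
  }

  /** Runs one placeholder task per segment and returns the results in segment order. */
  static List<String> processSegments(List<String> segmentIds, Properties props) {
    ExecutorService pool = Executors.newFixedThreadPool(numCores(props));
    try {
      List<Future<String>> futures = new ArrayList<>();
      for (String id : segmentIds) {
        // Placeholder for the real work: read, repartition, and rewrite one segment.
        futures.add(pool.submit(() -> "segment " + id + " repartitioned"));
      }
      List<String> results = new ArrayList<>();
      for (Future<String> future : futures) {
        results.add(future.get());
      }
      return results;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while altering partitions", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("segment task failed", e.getCause());
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty(NUM_CORES_KEY, "4");
    System.out.println(processSegments(Arrays.asList("0", "1", "2"), props));
  }
}
```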
[GitHub] carbondata pull request #1192: [CARBONDATA-940] alter table add/split partit...
Github user lionelcao commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1192#discussion_r129514953 --- Diff: integration/spark-common-test/src/test/resources/partition_data.csv --- @@ -0,0 +1,27 @@ +id,vin,logdate,phonenumber,country,area,salary --- End diff -- Hi @chenliang613, this CSV data already exists for the partition example and test case. It is simple and makes the partition concept easy to understand. This PR just adds two columns.
[GitHub] carbondata issue #1192: [CARBONDATA-940] alter table add/split partition for...
Github user lionelcao commented on the issue: https://github.com/apache/carbondata/pull/1192

# Feature Description
This feature adds support for the ADD and SPLIT partition functions on CarbonData.

# Scope
Supports range-partitioned and list-partitioned tables.

# Syntax Example
Suppose a carbon table is list partitioned on the COUNTRY column. The current partition definition is ('China', 'US', 'UK', 'India', 'Canada, Japan, South Korea, North Korea').

### add a partition
ALTER TABLE t1 ADD PARTITION('Russia')

### split a partition
ALTER TABLE t1 SPLIT PARTITION(5) INTO ('Canada', 'Japan', '(South Korea, North Korea)')

# Modification
### parser
Added a new parser to support the alter table add/split partition statements.

### validate new RangeInfo and ListInfo
- ensure the new rangeInfo after adding/splitting is in the correct order
- ensure a newly added listInfo does not already exist
- ensure the target listInfo can be split

### read target partition data
Added a function to read the data of one partition in one segment.

### use ALTER_PARTITION as key of temp directories
Added isAltPartitionFlow to the getTempStoreLocationKey function.

### repartition and write data
Decode the partition column, repartition, and write to new data blocks.

### refresh cache
Drop the old cache.

### multi threads operation in different segments
Support changing multiple segments in parallel.
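The list-partition validation rules listed in this comment can be sketched roughly as follows. This is only an illustration under stated assumptions: `ListPartitionChecks`, `canAdd`, and `canSplit` are hypothetical names, each partition is modeled as a plain list of string values, and the PR's actual validation code may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch (not CarbonData code) of the list-partition checks described above. */
final class ListPartitionChecks {

  private ListPartitionChecks() { }

  /** True if newGroup is non-empty and none of its values is in an existing partition. */
  static boolean canAdd(List<List<String>> existing, List<String> newGroup) {
    Set<String> seen = new HashSet<>();
    for (List<String> group : existing) {
      seen.addAll(group);
    }
    for (String value : newGroup) {
      if (seen.contains(value)) {
        return false;
      }
    }
    return !newGroup.isEmpty();
  }

  /** True if 'into' has 2+ groups that together hold exactly the target's values, each once. */
  static boolean canSplit(List<String> target, List<List<String>> into) {
    List<String> flattened = new ArrayList<>();
    for (List<String> group : into) {
      flattened.addAll(group);
    }
    Set<String> unique = new HashSet<>(flattened);
    return into.size() > 1
        && unique.size() == flattened.size()
        && unique.equals(new HashSet<>(target));
  }

  public static void main(String[] args) {
    List<List<String>> existing = Arrays.asList(
        Arrays.asList("China"), Arrays.asList("US"), Arrays.asList("UK"), Arrays.asList("India"),
        Arrays.asList("Canada", "Japan", "South Korea", "North Korea"));

    System.out.println(canAdd(existing, Arrays.asList("Russia"))); // prints true
    System.out.println(canAdd(existing, Arrays.asList("China")));  // prints false

    // Index 4 is only a position in this example list, not a CarbonData partition id.
    List<String> target = existing.get(4);
    System.out.println(canSplit(target, Arrays.asList(
        Arrays.asList("Canada"), Arrays.asList("Japan"),
        Arrays.asList("South Korea", "North Korea")))); // prints true
  }
}
```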
[GitHub] carbondata pull request #1192: [CARBONDATA-940] alter table add/split partit...
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1192#discussion_r129514137 --- Diff: conf/carbon.properties.template --- @@ -42,6 +42,9 @@ carbon.enableXXHash=true #carbon.max.level.cache.size=-1 #enable prefetch of data during merge sort while reading data from sort temp files in data loading #carbon.merge.sort.prefetch=true + Alter Partition Configuration +#Number of cores to be used while alter partition +carbon.number.of.cores.while.altPartition=2 --- End diff -- 1. Please check whether the parameter "carbon.number.of.cores.while.altPartition=2" is necessary. 2. If it is, I suggest naming it carbon.number.of.cores.while.alterPartition directly.
[GitHub] carbondata pull request #1192: [CARBONDATA-940] alter table add/split partit...
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1192#discussion_r129513477 --- Diff: integration/spark-common-test/src/test/resources/partition_data.csv --- @@ -0,0 +1,27 @@ +id,vin,logdate,phonenumber,country,area,salary --- End diff -- Can you try to reuse the existing CSV files, or generate the data? I don't suggest adding so many CSV files to the repo.
[GitHub] carbondata pull request #1193: [CARBONDATA-1327] Add carbon sort column exam...
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1193#discussion_r129511350 --- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/CarbonSortColumnsExample.scala --- @@ -0,0 +1,126 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.carbondata.examples + +import java.io.File + +import org.apache.spark.sql.SparkSession + +import org.apache.carbondata.core.constants.CarbonCommonConstants +import org.apache.carbondata.core.util.CarbonProperties + +object CarbonSortColumnsExample { + + def main(args: Array[String]) { +val rootPath = new File(this.getClass.getResource("/").getPath ++ "../../../..").getCanonicalPath +val storeLocation = s"$rootPath/examples/spark2/target/store" +val warehouse = s"$rootPath/examples/spark2/target/warehouse" +val metastoredb = s"$rootPath/examples/spark2/target" + +CarbonProperties.getInstance() + .addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, "/MM/dd HH:mm:ss") + .addProperty(CarbonCommonConstants.CARBON_DATE_FORMAT, "/MM/dd") + .addProperty(CarbonCommonConstants.ENABLE_UNSAFE_COLUMN_PAGE_LOADING, "true") + +import org.apache.spark.sql.CarbonSession._ +val spark = SparkSession + .builder() + .master("local") + .appName("CarbonSortColumnsExample") + .config("spark.sql.warehouse.dir", warehouse) + .config("spark.driver.host", "localhost") + .getOrCreateCarbonSession(storeLocation, metastoredb) + +spark.sparkContext.setLogLevel("WARN") + +spark.sql("DROP TABLE IF EXISTS sort_columns_table") + +// Create table with no sort columns +spark.sql( + s""" + | CREATE TABLE no_sort_columns_table( + | shortField SHORT, + | intField INT, + | bigintField LONG, + | doubleField DOUBLE, + | stringField STRING, + | timestampField TIMESTAMP, + | decimalField DECIMAL(18,2), + | dateField DATE, + | charField CHAR(5), + | floatField FLOAT, + | complexData ARRAY + | ) + | STORED BY 'carbondata' + | TBLPROPERTIES('SORT_COLUMNS'='') + """.stripMargin) + +// Create table with sort columns +spark.sql( + s""" + | CREATE TABLE sort_columns_table( + | shortField SHORT, + | intField INT, + | bigintField LONG, + | doubleField DOUBLE, + | stringField STRING, + | timestampField TIMESTAMP, + | decimalField DECIMAL(18,2), + | dateField DATE, + | charField CHAR(5), 
+ | floatField FLOAT, + | complexData ARRAY + | ) + | STORED BY 'carbondata' + | TBLPROPERTIES('SORT_COLUMNS'='intField, stringField, charField') + """.stripMargin) + +val path = s"$rootPath/examples/spark2/src/main/resources/data.csv" + +// scalastyle:off +spark.sql( + s""" + | LOAD DATA LOCAL INPATH '$path' + | INTO TABLE no_sort_columns_table + | OPTIONS('FILEHEADER'='shortField,intField,bigintField,doubleField,stringField,timestampField,decimalField,dateField,charField,floatField,complexData', + | 'COMPLEX_DELIMITER_LEVEL_1'='#') --- End diff -- Currently, SORT_COLUMNS does not support float, double, or decimal columns; please add a comment about this in the example. Other numeric types (such as int and long) are supported.
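The restriction mentioned in this review comment can be pictured as a small pre-check on a table definition. The sketch below is an illustration only, assuming a schema given as a column-name-to-type map; `SortColumnsCheck` and `unsupportedSortColumns` are hypothetical names, and the rejected types are just the three named in the comment (float, double, decimal) plus unknown columns.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch (not CarbonData code): flags sort columns with unsupported types. */
final class SortColumnsCheck {

  private SortColumnsCheck() { }

  /** Returns the sort columns that are unknown or typed FLOAT, DOUBLE, or DECIMAL. */
  static List<String> unsupportedSortColumns(Map<String, String> schema, List<String> sortColumns) {
    List<String> rejected = new ArrayList<>();
    for (String column : sortColumns) {
      String type = schema.get(column);
      if (type == null) {
        rejected.add(column); // column is not part of the table
        continue;
      }
      String upper = type.trim().toUpperCase(Locale.ROOT);
      if (upper.equals("FLOAT") || upper.equals("DOUBLE") || upper.startsWith("DECIMAL")) {
        rejected.add(column);
      }
    }
    return rejected;
  }

  public static void main(String[] args) {
    Map<String, String> schema = new LinkedHashMap<>();
    schema.put("intField", "INT");
    schema.put("stringField", "STRING");
    schema.put("charField", "CHAR(5)");
    schema.put("doubleField", "DOUBLE");
    schema.put("decimalField", "DECIMAL(18,2)");

    // The sort columns used in the example table pass the check.
    System.out.println(unsupportedSortColumns(schema,
        Arrays.asList("intField", "stringField", "charField"))); // prints []
    System.out.println(unsupportedSortColumns(schema,
        Arrays.asList("intField", "doubleField", "decimalField"))); // prints [doubleField, decimalField]
  }
}
```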
[GitHub] carbondata issue #1196: Rebase datamap onto master
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/1196 Please change the PR name to: Rebase datamap branch onto master
[GitHub] carbondata pull request #1193: [CARBONDATA-1327] Add carbon sort column exam...
Github user chenliang613 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1193#discussion_r129509969 --- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/CarbonSortColumnsExample.scala --- @@ -0,0 +1,126 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.carbondata.examples + +import java.io.File + +import org.apache.spark.sql.SparkSession + +import org.apache.carbondata.core.constants.CarbonCommonConstants +import org.apache.carbondata.core.util.CarbonProperties + +object CarbonSortColumnsExample { + + def main(args: Array[String]) { +val rootPath = new File(this.getClass.getResource("/").getPath ++ "../../../..").getCanonicalPath +val storeLocation = s"$rootPath/examples/spark2/target/store" +val warehouse = s"$rootPath/examples/spark2/target/warehouse" +val metastoredb = s"$rootPath/examples/spark2/target" + +CarbonProperties.getInstance() + .addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, "/MM/dd HH:mm:ss") + .addProperty(CarbonCommonConstants.CARBON_DATE_FORMAT, "/MM/dd") + .addProperty(CarbonCommonConstants.ENABLE_UNSAFE_COLUMN_PAGE_LOADING, "true") --- End diff -- Can you explain why addProperty(CarbonCommonConstants.ENABLE_UNSAFE_COLUMN_PAGE_LOADING, "true") is added?
[GitHub] carbondata pull request #1193: [CARBONDATA-1327] Add carbon sort column exam...
Github user chenliang613 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1193#discussion_r129508099

--- Diff: docs/ddl-operation-on-carbondata.md ---
@@ -101,6 +101,14 @@ The following DDL operations are supported in CarbonData :
    - All dimensions except complex datatype columns are part of multi dimensional key(MDK). This behavior can be overridden by using TBLPROPERTIES. If the user wants to keep any column (except columns of complex datatype) in multi dimensional key then he can keep the columns either in DICTIONARY_EXCLUDE or DICTIONARY_INCLUDE.
+  - **Sort Columns Configuration**
+
+    It is used to specify the multi dimensional key(MDK) columns. By default MDK is composed of all dimension columns except complex datatype column.
+
--- End diff --

A description of the "SORT_COLUMN" property is needed here: the "SORT_COLUMN" property lets users specify which columns belong to the MDK index. If the user does not specify the "SORT_COLUMN" property, the MDK index is built by default from all dimension columns except complex datatype columns.
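The rule described in the review comment above (an explicit column list when the property is set, otherwise all dimension columns except complex datatype columns) can be sketched in a few lines of Java. This is only an illustration of the described behavior, not CarbonData's implementation: the `Column` class, the `resolveMdkColumns` method, and the sample schema are hypothetical names, and rejecting unknown or complex columns in an explicit list is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: models the rule stated in the review comment.
// Column, resolveMdkColumns and the sample schema are hypothetical names,
// not part of CarbonData.
final class MdkColumnsSketch {

  /** Hypothetical, minimal description of a table column. */
  static final class Column {
    final String name;
    final boolean dimension; // true for dimension columns, false for measures
    final boolean complex;   // true for complex datatype columns

    Column(String name, boolean dimension, boolean complex) {
      this.name = name;
      this.dimension = dimension;
      this.complex = complex;
    }
  }

  /**
   * Returns the names of the columns that would form the MDK.
   * A null sortColumns argument means the property was not specified.
   */
  static List<String> resolveMdkColumns(List<Column> schema, List<String> sortColumns) {
    List<String> mdk = new ArrayList<>();
    if (sortColumns == null) {
      // Default: every dimension column except complex datatype columns.
      for (Column c : schema) {
        if (c.dimension && !c.complex) {
          mdk.add(c.name);
        }
      }
      return mdk;
    }
    // Explicit list: keep the order given by the user.
    // Assumption: unknown columns and complex columns are rejected.
    for (String requested : sortColumns) {
      Column match = null;
      for (Column c : schema) {
        if (c.name.equalsIgnoreCase(requested)) {
          match = c;
          break;
        }
      }
      if (match == null || match.complex) {
        throw new IllegalArgumentException("Not usable as a sort column: " + requested);
      }
      mdk.add(match.name);
    }
    return mdk;
  }

  public static void main(String[] args) {
    List<Column> schema = Arrays.asList(
        new Column("country", true, false),
        new Column("name", true, false),
        new Column("tags", true, true),     // complex datatype column
        new Column("salary", false, false)  // measure
    );
    // Property not specified: falls back to the default rule.
    System.out.println(resolveMdkColumns(schema, null));
    // Property specified: only the listed column is used.
    System.out.println(resolveMdkColumns(schema, Arrays.asList("name")));
  }
}
```

Treating "not specified" as `null` keeps the default rule and the explicit list as two clearly separate branches, which mirrors the two sentences of the proposed documentation text.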
[GitHub] carbondata issue #943: [CARBONDATA-1086]Added documentation for BATCH SORT S...
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/943

Build Success with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/597/