[GitHub] incubator-carbondata pull request #207: [CARBONDATA-283] VT enhancement for ...
Github user gvramana commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/207#discussion_r86347851 --- Diff: processing/src/main/java/org/apache/carbondata/processing/util/CarbonTableStatusUtil.java --- @@ -0,0 +1,92 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.carbondata.processing.util; + +import java.text.SimpleDateFormat; +import java.util.ArrayList; +import java.util.Arrays; +import java.util.Date; +import java.util.List; + +import org.apache.carbondata.common.logging.LogService; +import org.apache.carbondata.common.logging.LogServiceFactory; +import org.apache.carbondata.core.constants.CarbonCommonConstants; +import org.apache.carbondata.core.load.LoadMetadataDetails; + +/** + * This class contains all table status file utilities + */ +public final class CarbonTableStatusUtil { + private static final LogService LOGGER = + LogServiceFactory.getLogService(CarbonTableStatusUtil.class.getName()); + + private CarbonTableStatusUtil() { + + } + + /** + * updates table status details using latest metadata + * + * @param oldMetadata + * @param newMetadata + * @return + */ + + public static List updateLatestTableStatusDetails( + LoadMetadataDetails[] oldMetadata, LoadMetadataDetails[] newMetadata) { + +List newListMetadata = +new ArrayList(Arrays.asList(newMetadata)); +for (LoadMetadataDetails oldSegment : oldMetadata) { + if (CarbonCommonConstants.MARKED_FOR_DELETE.equalsIgnoreCase(oldSegment.getLoadStatus())) { + updateSegmentMetadataDetails(newListMetadata.get(newListMetadata.indexOf(oldSegment))); + } +} +return newListMetadata; + } + + /** + * returns current time + * + * @return + */ + private static String readCurrentTime() { +SimpleDateFormat sdf = new SimpleDateFormat(CarbonCommonConstants.CARBON_TIMESTAMP); +String date = null; + +date = sdf.format(new Date()); + +return date; + } + + /** + * updates segment status and modificaton time details + * + * @param loadMetadata + */ + public static void updateSegmentMetadataDetails(LoadMetadataDetails loadMetadata) { --- End diff -- Move these function to status Manager --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-carbondata pull request #237: [CARBONDATA-317] - CSV having only s...
Github user gvramana commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/237#discussion_r83584249 --- Diff: integration/spark/src/main/scala/org/apache/carbondata/spark/csv/CarbonCsvRelation.scala --- @@ -148,6 +150,10 @@ case class CarbonCsvRelation protected[spark] ( .withSkipHeaderRecord(false) CSVParser.parse(firstLine, csvFormat).getRecords.get(0).asScala.toArray } + if(null == firstRow) { +throw new DataLoadingException("Please check your input path and make sure " + --- End diff -- "Give a message First line cannot be empty", irrespective of header present or not. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-carbondata pull request #189: [CARBONDATA-267] Set block_size for ...
Github user gvramana commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/189#discussion_r81474992 --- Diff: format/src/main/thrift/schema.thrift --- @@ -124,6 +124,7 @@ struct TableSchema{ 1: required string table_id; // ID used to 2: required list table_columns; // Columns in the table 3: required SchemaEvolution schema_evolution; // History of schema evolution of this table + 4: optional i32 block_size --- End diff -- @Zhangshunyu Actually need not change Schema for such extra properties which are being added. We can directly store TABLE_PROPERTIES as a string key value pairs in thrift. So that all such properties configuration parameters can be effectively handled without modifying thrift in future. @jackylk please comment. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-carbondata pull request #190: [CARBONDATA-268]Improve carbonoptimi...
Github user gvramana commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/190#discussion_r80220071 --- Diff: integration/spark/src/main/scala/org/apache/spark/sql/optimizer/CarbonOptimizer.scala --- @@ -68,9 +70,50 @@ object CarbonOptimizer { */ class ResolveCarbonFunctions(relations: Seq[CarbonDecoderRelation]) extends Rule[LogicalPlan] with PredicateHelper { - + val LOGGER = LogServiceFactory.getLogService(this.getClass.getName) def apply(plan: LogicalPlan): LogicalPlan = { -transformCarbonPlan(plan, relations) +if (relations.nonEmpty && !isOptimized(plan)) { + LOGGER.info("Starting to optimize plan") + val startTime = System.currentTimeMillis + val result = transformCarbonPlan(plan, relations) + LOGGER.info("Time taken to optimize plan: " + ( System.currentTimeMillis - startTime)) --- End diff -- add this to STATISTICS logger type --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-carbondata pull request #190: [CARBONDATA-268]Improve carbonoptimi...
Github user gvramana commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/190#discussion_r80220168 --- Diff: integration/spark/src/main/scala/org/apache/spark/sql/optimizer/CarbonOptimizer.scala --- @@ -397,20 +452,38 @@ class ResolveCarbonFunctions(relations: Seq[CarbonDecoderRelation]) case others => others } +} + +val tStartTime = System.currentTimeMillis --- End diff -- this variable not used --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-carbondata pull request #194: [CARBONDATA-270] Double data type va...
Github user gvramana commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/194#discussion_r80221140 --- Diff: core/src/main/java/org/apache/carbondata/scan/filter/FilterUtil.java --- @@ -1401,8 +1401,7 @@ public static void logError(Throwable e, boolean invalidRowsPresent) { public static boolean nanSafeEqualsDoubles(Double d1, Double d2) { Boolean xIsNan = Double.isNaN(d1); Boolean yIsNan = Double.isNaN(d2); -if ((xIsNan && yIsNan) || (d1.doubleValue() == d2.doubleValue())) { - +if ((d1.doubleValue() == d2.doubleValue()) || (xIsNan && yIsNan)) { --- End diff -- Boolean xIsNan = Double.isNaN(d1); these variables are already computed, no use of moving them. Instead use ( Double.isNaN(d1) && Double.isNaN(d2)) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-carbondata pull request #190: [CARBONDATA-268]Improve carbonoptimi...
Github user gvramana commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/190#discussion_r80219989 --- Diff: integration/spark/src/main/scala/org/apache/spark/sql/optimizer/CarbonOptimizer.scala --- @@ -68,9 +70,50 @@ object CarbonOptimizer { */ class ResolveCarbonFunctions(relations: Seq[CarbonDecoderRelation]) extends Rule[LogicalPlan] with PredicateHelper { - + val LOGGER = LogServiceFactory.getLogService(this.getClass.getName) def apply(plan: LogicalPlan): LogicalPlan = { -transformCarbonPlan(plan, relations) +if (relations.nonEmpty && !isOptimized(plan)) { + LOGGER.info("Starting to optimize plan") + val startTime = System.currentTimeMillis + val result = transformCarbonPlan(plan, relations) + LOGGER.info("Time taken to optimize plan: " + ( System.currentTimeMillis - startTime)) --- End diff -- change the statement to "Time taken for Carbon Optimizer to optimize" --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-carbondata pull request #190: [CARBONDATA-268]Improve carbonoptimi...
Github user gvramana commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/190#discussion_r80219576 --- Diff: integration/spark/src/main/scala/org/apache/spark/sql/optimizer/CarbonOptimizer.scala --- @@ -87,20 +130,32 @@ class ResolveCarbonFunctions(relations: Seq[CarbonDecoderRelation]) return plan } var decoder = false +val mapOfNodes = new java.util.HashMap[LogicalPlan, ExtraNodeInfo] +fillNodeInfo(plan, mapOfNodes) val aliasMap = CarbonAliasDecoderRelation() // collect alias information before hand. collectInformationOnAttributes(plan, aliasMap) -val transFormedPlan = - plan transformDown { -case cd: CarbonDictionaryTempDecoder if cd.isOuter => - decoder = true - cd + +def hasCarbonRelation(currentPlan: LogicalPlan): Boolean = { + val extraNodeInfo = mapOfNodes.get(currentPlan) --- End diff -- Change the name of mapofNodes to mapOfNonCarbonPlanNodes --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-carbondata pull request #158: [CARBONDATA-241]Fixed out of memory ...
Github user gvramana commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/158#discussion_r79290374 --- Diff: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonScanRDD.scala --- @@ -102,7 +115,7 @@ class CarbonScanRDD[V: ClassTag]( val splits = carbonInputFormat.getSplits(job) if (!splits.isEmpty) { val carbonInputSplits = splits.asScala.map(_.asInstanceOf[CarbonInputSplit]) - + queryModel.setInvalidSegmentIds(validAndInvalidSegments.getInvalidSegments) --- End diff -- move this to common getSplits, other wise validAndInvalidSegments can be null, if parallel deletion happens. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-carbondata pull request #158: [CARBONDATA-241]Fixed out of memory ...
Github user gvramana commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/158#discussion_r79290196 --- Diff: hadoop/src/test/java/org/apache/carbondata/hadoop/ft/CarbonInputMapperTest.java --- @@ -129,6 +132,37 @@ private int countTheColumns(String outPath) throws Exception { return 0; } + private void runJob(String outPath, CarbonProjection projection, Expression filter) --- End diff -- move back the code to original place --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-carbondata pull request #158: [CARBONDATA-241]Fixed out of memory ...
Github user gvramana commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/158#discussion_r79290267 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/CarbonInputFormat.java --- @@ -706,8 +725,9 @@ private String getUpdateExtension() { /** * @return updateExtension */ - private String[] getValidSegments(JobContext job) throws IOException { -String segmentString = job.getConfiguration().get(INPUT_SEGMENT_NUMBERS, ""); + private String[] getSegmentsFromConfiguration(JobContext job, String segmentType) + throws IOException { +String segmentString = job.getConfiguration().get(segmentType, ""); --- End diff -- change signature to previous one --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-carbondata pull request #161: [CARBONDATA-246] compaction is wrong...
Github user gvramana commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/161#discussion_r79289289 --- Diff: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonMergerRDD.scala --- @@ -102,6 +102,11 @@ class CarbonMergerRDD[K, V]( var dataloadStatus = CarbonCommonConstants.STORE_LOADSTATUS_FAILURE val carbonSparkPartition = theSplit.asInstanceOf[CarbonSparkPartition] +// get destination segment properties as sent from driver which is of last segment. + +val segmentProperties = new SegmentProperties(carbonMergerMapping.columnSchemaList.asJava, --- End diff -- rename it to maxSegmentcolumnSchemaList and maxSegmentColCardinality. Write comment where it is declared --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-carbondata pull request #161: [CARBONDATA-246] compaction is wrong...
Github user gvramana commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/161#discussion_r79289428 --- Diff: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonMergerRDD.scala --- @@ -259,6 +253,9 @@ class CarbonMergerRDD[K, V]( ) ) + // keep on assigning till last one is reached. + blocksOfLastSegment = blocksOfOneSegment.asJava --- End diff -- add null check and size check --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-carbondata pull request #158: [CARBONDATA-241]Fixed out of memory ...
Github user gvramana commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/158#discussion_r79288466 --- Diff: core/src/main/java/org/apache/carbondata/core/carbon/datastore/BlockIndexStore.java --- @@ -260,11 +295,29 @@ public void removeTableBlocks(List removeTableBlocksInfos, } Map<TableBlockInfo, AbstractIndex> map = tableBlocksMap.get(absoluteTableIdentifier); // if there is no loaded blocks then return -if (null == map) { +if (null == map || map.isEmpty()) { + return; +} +Map<String, List> segmentIdToBlockInfoMap = +segmentIdToBlockListMap.get(absoluteTableIdentifier); +if (null == segmentIdToBlockInfoMap || segmentIdToBlockInfoMap.isEmpty()) { return; } -for (TableBlockInfo blockInfos : removeTableBlocksInfos) { - map.remove(blockInfos); +synchronized (lockObject) { + for (String segmentId : segmentsToBeRemoved) { +List tableBlockInfoList = segmentIdToBlockInfoMap.get(segmentId); +if (null == tableBlockInfoList) { + continue; +} +Iterator tableBlockInfoIterator = tableBlockInfoList.iterator(); +while (tableBlockInfoIterator.hasNext()) { + TableBlockInfo info = tableBlockInfoIterator.next(); + AbstractIndex remove = map.remove(info); + if (null != remove) { --- End diff -- tableBlockInfoIterator.remove needs to called irrespective of null != remove --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-carbondata pull request #168: [CARBONDATA-251] making the auto com...
Github user gvramana commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/168#discussion_r79287073 --- Diff: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala --- @@ -623,14 +623,7 @@ object CarbonDataRDDFactory extends Logging { } } } - if(compactionModel.isDDLTrigger) { -// making this an blocking call for DDL -compactionThread.run() - } - else { -// non blocking call in case of auto compaction. -compactionThread.start() - } +compactionThread.run() --- End diff -- Add a comment, why run is called instead of start, future will make it concurrent. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-carbondata pull request #168: [CARBONDATA-251] making the auto com...
Github user gvramana commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/168#discussion_r79287056 --- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java --- @@ -920,7 +920,7 @@ * Default value of Property for enabling system level compaction lock.1 compaction can run * at once. */ - public static String DEFAULT_ENABLE_CONCURRENT_COMPACTION = "false"; + public static String DEFAULT_ENABLE_CONCURRENT_COMPACTION = "true"; --- End diff -- Add comment that this property "carbon.concurrent.compaction" is @Depricated --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-carbondata pull request #158: [CARBONDATA-241]Fixed out of memory ...
Github user gvramana commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/158#discussion_r79222167 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/CarbonInputFormat.java --- @@ -101,6 +106,8 @@ //comma separated list of input segment numbers public static final String INPUT_SEGMENT_NUMBERS = "mapreduce.input.carboninputformat.segmentnumbers"; + public static final String INVALID_SEGMENT_NUMBERS = + "mapreduce.input.carboninputformat.invalidsegmentnumbers"; --- End diff -- Invalid segment deletion, need not be through CarbonInputFormat, When Invalid segments list given to Btree(both in Driver and executor it should able it delete invalid blocks). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-carbondata pull request #155: [CARBONDATA-239] Handling the except...
Github user gvramana commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/155#discussion_r79208835 --- Diff: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala --- @@ -590,16 +605,25 @@ object CarbonDataRDDFactory extends Logging { executor, sqlContext, kettleHomePath, storeLocation ) } +catch { + case e: Exception => +logger.error("Exception in compaction thread for table " + tableForCompaction + .carbonTable.getDatabaseName + "." + + tableForCompaction.carbonTableIdentifier + .getTableName) + // not handling the exception. only logging as this is not the table triggered + // by user. +} finally { - // delete the compaction required file + // delete the compaction required file in case of failure or success also. --- End diff -- If compact file deletion fails, add failed table to skipped list. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-carbondata pull request #158: [CARBONDATA-241]Fixed out of memory ...
Github user gvramana commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/158#discussion_r79003748 --- Diff: processing/src/main/java/org/apache/carbondata/lcm/status/SegmentStatusManager.java --- @@ -102,6 +91,60 @@ public long getTableStatusLastModifiedTime() throws IOException { /** * get valid segment for given table + * + * @return + * @throws IOException + */ + public InvalidSegmentsInfo getInvalidSegments() throws IOException { --- End diff -- This requires reading SegmentInfo twice, once for valid blocks and next for Invalid Blocks. Instead send a single class containting ValidAndInvalidBlocks. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-carbondata pull request #140: [CARBONDATA-227] In block distributi...
Github user gvramana commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/140#discussion_r77950753 --- Diff: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonScanRDD.scala --- @@ -73,7 +73,7 @@ class CarbonScanRDD[V: ClassTag]( baseStoreLocation: String) extends RDD[V](sc, Nil) with Logging { - val defaultParallelism = sc.defaultParallelism + var defaultParallelism = sc.defaultParallelism --- End diff -- now member is not required, make it a private val. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-carbondata pull request #137: [CARBONDATA-222] Handled query issue...
Github user gvramana commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/137#discussion_r77950155 --- Diff: core/src/main/java/org/apache/carbondata/core/carbon/datastore/impl/btree/BTreeDataRefNodeFinder.java --- @@ -240,9 +240,16 @@ private int compareIndexes(IndexKey first, IndexKey second) { firstNoDictionaryKeyBuffer.getShort(nonDictionaryKeyOffset + SHORT_SIZE_IN_BYTES); secondNodeDictionaryLength = secondNoDictionaryKeyBuffer.getShort(nonDictionaryKeyOffset + SHORT_SIZE_IN_BYTES); - compareResult = ByteUtil.UnsafeComparer.INSTANCE - .compareTo(first.getNoDictionaryKeys(), actualOffset, firstNoDcitionaryLength, - second.getNoDictionaryKeys(), actualOffset, secondNodeDictionaryLength); + int minLength = Math.min(firstNoDcitionaryLength, secondNodeDictionaryLength); --- End diff -- @kumarvishal09 please comment on this. I am not getting how actualOffset is same for both keybuffers. Also as I know length can be calculated by nextOffset-currentOffset. Please give the structure of data stored and how this function works --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-carbondata pull request #139: [CARBONDATA-224]Fixed data mismatch ...
Github user gvramana commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/139#discussion_r77947424 --- Diff: integration/spark/src/main/scala/org/apache/spark/sql/CarbonSqlParser.scala --- @@ -864,8 +864,8 @@ class CarbonSqlParser() /** * detects whether double or decimal column is part of dictionary_exclude */ - def isDoubleDecimalColDictionaryExclude(columnDataType: String): Boolean = { -val dataTypes = Array("double", "decimal") + def isNumberInColDictionaryExclude(columnDataType: String): Boolean = { + val dataTypes = Array("double", "decimal", "int") --- End diff -- Add a reverse check that only string or (varchar-check) is only supported. As other datatypes like bigint etc will have problem in above check --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-carbondata pull request #116: [CARBONDATA-198] System level lock f...
Github user gvramana commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/116#discussion_r77129093 --- Diff: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala --- @@ -333,7 +531,7 @@ object CarbonDataRDDFactory extends Logging { ) } -var loadsToMerge = CarbonDataMergerUtil.identifySegmentsToBeMerged( +val loadsToMerge = CarbonDataMergerUtil.identifySegmentsToBeMerged( --- End diff -- this condition is not required before thread launch --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-carbondata pull request #116: [CARBONDATA-198] System level lock f...
Github user gvramana commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/116#discussion_r77128213 --- Diff: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala --- @@ -297,15 +366,144 @@ object CarbonDataRDDFactory extends Logging { } else { logger -.audit("Not able to acquire the compaction lock for table " + +.audit("Not able to acquire the system level compaction lock for table " + s"${carbonLoadModel.getDatabaseName}.${carbonLoadModel.getTableName}" ) logger .error("Not able to acquire the compaction lock for table " + carbonLoadModel .getDatabaseName + "." + carbonLoadModel.getTableName ) - sys.error("Table is already locked for compaction. Please try after some time.") + if (!CarbonCompactionUtil +.createCompactionRequiredFile(carbonTable.getMetaDataFilepath, compactionType)) { +logger.error("Not able to create a compaction required file for table " + carbonLoadModel + .getDatabaseName + "." + carbonLoadModel.getTableName +) + } + else { +logger + .info("successfully created a compaction required file for table " + carbonLoadModel +.getDatabaseName + "." + carbonLoadModel.getTableName + ) + } + sys.error("System is already locked for compaction. Please try after some time.") --- End diff -- change message to "Compaction is in progress, compaction request for table "" in queue". Print error only during ddl trigger. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-carbondata pull request #116: [CARBONDATA-198] System level lock f...
Github user gvramana commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/116#discussion_r77127755 --- Diff: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala --- @@ -297,15 +366,144 @@ object CarbonDataRDDFactory extends Logging { } else { logger -.audit("Not able to acquire the compaction lock for table " + +.audit("Not able to acquire the system level compaction lock for table " + s"${carbonLoadModel.getDatabaseName}.${carbonLoadModel.getTableName}" ) logger .error("Not able to acquire the compaction lock for table " + carbonLoadModel .getDatabaseName + "." + carbonLoadModel.getTableName ) - sys.error("Table is already locked for compaction. Please try after some time.") + if (!CarbonCompactionUtil +.createCompactionRequiredFile(carbonTable.getMetaDataFilepath, compactionType)) { --- End diff -- move printing log to function --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-carbondata pull request #111: [CARBONDATA-194] ArrayIndexOfBoundEx...
Github user gvramana commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/111#discussion_r76936740 --- Diff: processing/src/main/java/org/apache/carbondata/processing/csvreaderstep/UnivocityCsvParser.java --- @@ -104,6 +108,20 @@ public void initialize() throws IOException { } /** + * This method will decide the number of coulmns to be parsed for a row by univocity parser + * + * @param columnCountInSchema total number of columns in schema + * @return + */ + private int getMaxColumnsForParsing(int columnCountInSchema) { +int maxNumberOfColumnsForParsing = columnCountInSchema; +if (columnCountInSchema < MAX_NUMBER_OF_COLUMNS_FOR_PARSING) { + maxNumberOfColumnsForParsing = MAX_NUMBER_OF_COLUMNS_FOR_PARSING; +} +return maxNumberOfColumnsForParsing; --- End diff -- Add +10 if schema columns are considered. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---