[GitHub] incubator-carbondata pull request #594: [CARBONDATA-701]Fix memory leak issu...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/594#discussion_r101460125 --- Diff: processing/src/main/java/org/apache/carbondata/processing/newflow/sort/Sorter.java --- @@ -39,11 +39,13 @@ * Sorts the data of all iterators, this iterators can be * read parallely depends on implementation. * - * @param iterators array of iterators to read data. * @return * @throws CarbonDataLoadingException */ - Iterator[] sort(Iterator[] iterators) + Iterator[] sort() + throws CarbonDataLoadingException; + + void prepare(Iterator[] iterators) --- End diff -- Better to invoke child.close() before final merger --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
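The refactor under discussion splits the old `sort(iterators)` into a `prepare(iterators)` step plus a no-argument `sort()`, which also lets the caller close the child sorter before the final merge, as the comment suggests. A minimal sketch of that lifecycle (illustrative types and names, not CarbonData's actual classes):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Illustrative two-phase sorter: inputs are handed over in prepare(),
// the work happens in sort(), and close() releases intermediate buffers
// so the caller can free the child sorter before the final merge.
class TwoPhaseSorter {
  private List<Integer> buffered = new ArrayList<>();

  // prepare() only takes ownership of the input iterators; no sorting yet
  void prepare(List<Iterator<Integer>> iterators) {
    for (Iterator<Integer> it : iterators) {
      while (it.hasNext()) {
        buffered.add(it.next());
      }
    }
  }

  List<Integer> sort() {
    if (buffered == null) {
      throw new IllegalStateException("sorter already closed");
    }
    Collections.sort(buffered);
    return buffered;
  }

  void close() {
    buffered = null; // drop references so the rows can be GC'd early
  }
}
```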
[GitHub] incubator-carbondata pull request #594: [CARBONDATA-701]Fix memory leak issu...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/594#discussion_r101432953 --- Diff: processing/src/main/java/org/apache/carbondata/processing/sortandgroupby/sortdata/IntermediateFileMerger.java --- @@ -116,8 +116,15 @@ public IntermediateFileMerger(SortParameters mergerParameters, File[] intermedia writeDataTofile(next()); } } else { +int i = 0; while (hasNext()) { + i++; writeDataTofileWithOutKettle(next()); + if (i % 1 == 0) { --- End diff -- ok, I will test a better value of the default buffer size.
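The `i % 1 == 0` in the diff flushes after every record, which is what the reviewer is promising to tune. A sketch of the intended pattern with an illustrative interval constant (the actual default buffer size was still being tested at this point):

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.Iterator;

// Flush the sort temp stream every FLUSH_INTERVAL records rather than
// after every record; 10000 is an illustrative value, not CarbonData's
// eventual default.
class BatchedWriter {
  static final int FLUSH_INTERVAL = 10000;

  static int writeAll(Iterator<byte[]> rows, OutputStream out) throws IOException {
    int written = 0;
    while (rows.hasNext()) {
      out.write(rows.next());
      written++;
      if (written % FLUSH_INTERVAL == 0) {
        out.flush(); // bound how much data sits in the in-memory buffer
      }
    }
    out.flush(); // flush the final partial batch
    return written;
  }
}
```

The interval trades memory held in the buffer against the cost of frequent flush syscalls, which is why it needs measurement rather than a guess.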
[GitHub] incubator-carbondata pull request #594: [CARBONDATA-701]Fix memory leak issu...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/594#discussion_r101436969 --- Diff: processing/src/main/java/org/apache/carbondata/processing/sortandgroupby/sortdata/IntermediateFileMerger.java --- @@ -116,8 +116,15 @@ public IntermediateFileMerger(SortParameters mergerParameters, File[] intermedia writeDataTofile(next()); } } else { +int i = 0; while (hasNext()) { + i++; writeDataTofileWithOutKettle(next()); + if (i % 1 == 0) { --- End diff -- fixed
[GitHub] incubator-carbondata pull request #594: [CARBONDATA-701]Fix memory leak issu...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/594#discussion_r101436963 --- Diff: processing/src/main/java/org/apache/carbondata/processing/sortandgroupby/sortdata/SortDataRows.java --- @@ -375,6 +376,9 @@ private void writeDataWithOutKettle(Object[][] recordHolderList, int entryCountL stream.write((byte) 0); } } +if (i % 1 == 0) { --- End diff -- fixed
[GitHub] incubator-carbondata pull request #594: [CARBONDATA-701]Fix memory leak issu...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/594#discussion_r101462875 --- Diff: processing/src/main/java/org/apache/carbondata/processing/newflow/sort/Sorter.java --- @@ -39,11 +39,13 @@ * Sorts the data of all iterators, this iterators can be * read parallely depends on implementation. * - * @param iterators array of iterators to read data. * @return * @throws CarbonDataLoadingException */ - Iterator[] sort(Iterator[] iterators) + Iterator[] sort() + throws CarbonDataLoadingException; + + void prepare(Iterator[] iterators) --- End diff -- fixed
[GitHub] incubator-carbondata issue #596: [WIP]Test for repository
Github user QiangCai commented on the issue: https://github.com/apache/incubator-carbondata/pull/596 Looks good
[GitHub] incubator-carbondata pull request #594: [CARBONDATA-701]Fix memory leak issu...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/594#discussion_r101432584 --- Diff: processing/src/main/java/org/apache/carbondata/processing/newflow/sort/impl/ParallelReadMergeSorterImpl.java --- @@ -86,11 +88,10 @@ public void initialize(SortParameters sortParameters) { sortParameters.getNoDictionaryDimnesionColumn(), sortParameters.isUseKettle()); } - @Override - public Iterator[] sort(Iterator[] iterators) + public void prepare(Iterator[] iterators) --- End diff -- Yes, but the method initialize already exists.
[GitHub] incubator-carbondata issue #539: [CARBONDATA-659]add WhitespaceAround and Pa...
Github user QiangCai commented on the issue: https://github.com/apache/incubator-carbondata/pull/539 retest this please
[GitHub] incubator-carbondata pull request #528: [CARBONDATA-617]Fix InsertInto test ...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/528#discussion_r96129390 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonMetastore.scala --- @@ -782,7 +782,47 @@ case class CarbonRelation( nullable = true)()) } - override val output = dimensionsAttr ++ measureAttr + override val output = { +val columns = tableMeta.carbonTable.getCreateOrderColumn(tableMeta.carbonTable.getFactTableName) + .asScala +columns.filter(!_.isInvisible).map { column => --- End diff -- ok
[GitHub] incubator-carbondata pull request #528: [CARBONDATA-617]Fix InsertInto test ...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/528#discussion_r96129389 --- Diff: processing/src/main/java/org/apache/carbondata/processing/csvreaderstep/CsvInput.java --- @@ -194,28 +196,25 @@ public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws K } class RddScanCallable implements Callable { -List<CarbonIterator<String[]>> iterList; - -RddScanCallable() { - this.iterList = new ArrayList<CarbonIterator<String[]>>(1000); -} - -public void addJavaRddIterator(CarbonIterator<String[]> iter) { - this.iterList.add(iter); -} - -@Override -public Void call() throws Exception { - StandardLogService.setThreadName(("PROCESS_DataFrame_PARTITIONS"), - Thread.currentThread().getName()); +@Override public Void call() throws Exception { + StandardLogService + .setThreadName(("PROCESS_DataFrame_PARTITIONS"), Thread.currentThread().getName()); try { String[] values = null; -for (CarbonIterator<String[]> iter: iterList) { - iter.initialize(); - while (iter.hasNext()) { -values = iter.next(); -synchronized (putRowLock) { - putRow(data.outputRowMeta, values); +boolean hasNext = true; +CarbonIterator<String[]> iter; +boolean isInitialized = false; +while (hasNext) { + iter = getRddIterator(isInitialized); --- End diff -- ok
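The rewrite in the diff above replaces the pre-collected `iterList` with on-demand fetching via `getRddIterator(isInitialized)`. A simplified, queue-based sketch of the same consume-as-you-go loop (names here are illustrative stand-ins for the CsvInput internals):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Queue;

// Instead of buffering up to 1000 iterators per callable, fetch the next
// partition iterator only when the previous one is exhausted.
class LazyScan {
  static List<String> drain(Queue<Iterator<String>> partitions) {
    List<String> out = new ArrayList<>();
    Iterator<String> iter;
    // poll() stands in for getRddIterator(); null means no partitions left
    while ((iter = partitions.poll()) != null) {
      while (iter.hasNext()) {
        out.add(iter.next()); // putRow() in the real code
      }
    }
    return out;
  }
}
```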
[GitHub] incubator-carbondata issue #537: fix unapproved licenses
Github user QiangCai commented on the issue: https://github.com/apache/incubator-carbondata/pull/537 @ravipesala already added the header check to the java style
[GitHub] incubator-carbondata issue #537: fix unapproved licenses
Github user QiangCai commented on the issue: https://github.com/apache/incubator-carbondata/pull/537 @chenliang613 java's license header is the same as scala's.
[GitHub] incubator-carbondata pull request #537: fix unapproved licenses
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/537#discussion_r96141120 --- Diff: pom.xml --- @@ -439,15 +439,22 @@ **/*.csv **/*.dictionary **/*.ktr +**/*.rat **/_SUCCESS **/non-csv **/.invisibilityfile **/noneCsvFormat.cs **/org.apache.spark.sql.sources.DataSourceRegister + **/org.apache.spark.sql.test.TestQueryExecutorRegister **/derby.log **/meta.lock **/loadmetadata.metadata **/modifiedTime.mdt +**/PULL_REQUEST_TEMPLATE.md +**/dict.txt +**/dict.txt --- End diff -- fixed. removed one
[GitHub] incubator-carbondata issue #537: fix unapproved licenses
Github user QiangCai commented on the issue: https://github.com/apache/incubator-carbondata/pull/537 @chenliang613 @jackylk @ravipesala now the license header of Java file is different with the license header of Scala file. Which one we should choose? java file header: ``` /* * Licensed to the Apache Software Foundation (ASF) under one * or more contributor license agreements. See the NOTICE file * distributed with this work for additional information * regarding copyright ownership. The ASF licenses this file * to you under the Apache License, Version 2.0 (the * "License"); you may not use this file except in compliance * with the License. You may obtain a copy of the License at * *http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, * software distributed under the License is distributed on an * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY * KIND, either express or implied. See the License for the * specific language governing permissions and limitations * under the License. */ ``` scala file header:(same with spark) ``` /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * *http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ ``` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. 
If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-carbondata issue #537: fix unapproved licenses
Github user QiangCai commented on the issue: https://github.com/apache/incubator-carbondata/pull/537 I agree.
[GitHub] incubator-carbondata issue #217: [CARBONDATA-287]Using multi local directory...
Github user QiangCai commented on the issue: https://github.com/apache/incubator-carbondata/pull/217 close this pr
[GitHub] incubator-carbondata pull request #217: [CARBONDATA-287]Using multi local di...
Github user QiangCai closed the pull request at: https://github.com/apache/incubator-carbondata/pull/217
[GitHub] incubator-carbondata issue #285: [WIP]Insert into carbon table feature
Github user QiangCai commented on the issue: https://github.com/apache/incubator-carbondata/pull/285 @ashokblend please rebase
[GitHub] incubator-carbondata issue #360: [CARBONDATA-462] Clean up carbonTableSchema...
Github user QiangCai commented on the issue: https://github.com/apache/incubator-carbondata/pull/360 LGTM
[GitHub] incubator-carbondata issue #278: [CARBONDATA-368]Imporve performance of data...
Github user QiangCai commented on the issue: https://github.com/apache/incubator-carbondata/pull/278 http://136.243.101.176:8080/job/ApacheCarbonManualPRBuilder/682/
[GitHub] incubator-carbondata issue #363: [CARBONDATA-461] clean partitioner in carbo...
Github user QiangCai commented on the issue: https://github.com/apache/incubator-carbondata/pull/363 LGTM
[GitHub] incubator-carbondata pull request #278: [CARBONDATA-368]Imporve performance ...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/278#discussion_r89950862 --- Diff: processing/src/main/java/org/apache/carbondata/processing/csvreaderstep/CsvInput.java --- @@ -384,21 +383,75 @@ public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws K CarbonTimeStatisticsFactory.getLoadStatisticsInstance().recordCsvInputStepTime( meta.getPartitionID(), System.currentTimeMillis()); } else { - scanRddIterator(); + scanRddIterator(numberOfNodes); } setOutputDone(); return false; } - private void scanRddIterator() throws RuntimeException { -Iterator<String[]> iterator = RddInputUtils.getAndRemove(rddIteratorKey); -if (iterator != null) { - try{ -while(iterator.hasNext()){ - putRow(data.outputRowMeta, iterator.next()); + class RddScanCallable implements Callable { +List<JavaRddIterator<String[]>> iterList; + +RddScanCallable() { + this.iterList = new ArrayList<JavaRddIterator<String[]>>(1000); +} + +public void addJavaRddIterator(JavaRddIterator<String[]> iter) { + this.iterList.add(iter); +} + +@Override +public Void call() throws Exception { + StandardLogService.setThreadName(("PROCESS_DataFrame_PARTITIONS"), + Thread.currentThread().getName()); + try { +String[] values = null; +for (JavaRddIterator<String[]> iter: iterList) { + iter.initialize(); + while (iter.hasNext()) { +values = iter.next(); +synchronized (putRowLock) { + putRow(data.outputRowMeta, values); +} + } +} + } catch (Exception e) { +LOGGER.error(e, "Scan rdd during data load is terminated due to error."); +throw e; + } + return null; +} + }; --- End diff -- fixed
[GitHub] incubator-carbondata pull request #278: [CARBONDATA-368]Imporve performance ...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/278#discussion_r89950872 --- Diff: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala --- @@ -932,7 +942,8 @@ object CarbonDataRDDFactory { loadDataFile() } val newStatusMap = scala.collection.mutable.Map.empty[String, String] -status.foreach { eachLoadStatus => +if (status.nonEmpty) { + status.foreach { eachLoadStatus => val state = newStatusMap.get(eachLoadStatus._1) --- End diff -- fixed
[GitHub] incubator-carbondata pull request #278: [CARBONDATA-368]Imporve performance ...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/278#discussion_r89950829 --- Diff: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataLoadRDD.scala --- @@ -548,77 +552,53 @@ class DataFrameLoaderRDD[K, V]( override protected def getPartitions: Array[Partition] = firstParent[Row].partitions } +class PartitionIterator(partitionIter: Iterator[DataLoadPartitionWrap[Row]], +carbonLoadModel: CarbonLoadModel, +context: TaskContext) extends JavaRddIterator[JavaRddIterator[Array[String]]] { --- End diff -- fixed
[GitHub] incubator-carbondata pull request #278: [CARBONDATA-368]Imporve performance ...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/278#discussion_r88588734 --- Diff: integration/spark/src/main/scala/org/apache/carbondata/spark/util/CarbonScalaUtil.scala --- @@ -164,4 +165,53 @@ object CarbonScalaUtil extends Logging { kettleHomePath } + def getString(value: Any, + serializationNullFormat: String, + delimiterLevel1: String, + delimiterLevel2: String, + format: SimpleDateFormat, + level: Int = 1): String = { +value == null match { + case true => serializationNullFormat --- End diff -- fixed
[GitHub] incubator-carbondata issue #278: [CARBONDATA-368]Imporve performance of data...
Github user QiangCai commented on the issue: https://github.com/apache/incubator-carbondata/pull/278 Rebase done
[GitHub] incubator-carbondata issue #366: [WIP][CARBONDATA-368]Insert into carbon tab...
Github user QiangCai commented on the issue: https://github.com/apache/incubator-carbondata/pull/366 please rebase
[GitHub] incubator-carbondata pull request #413: [CARBONDATA-516][SPARK2]fix union is...
GitHub user QiangCai opened a pull request: https://github.com/apache/incubator-carbondata/pull/413 [CARBONDATA-516][SPARK2]fix union issue in CarbonLateDecoderRule In spark2, the Union class is no longer a sub-class of BinaryNode. We need to fix the union issue in CarbonLateDecoderRule for spark2. You can merge this pull request into a Git repository by running: $ git pull https://github.com/QiangCai/incubator-carbondata fixUnionIssue Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-carbondata/pull/413.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #413 commit a86342453ab969107944182ead36ac4cf80f74ef Author: QiangCai <qiang...@qq.com> Date: 2016-12-08T11:06:33Z fixUnionIssue
[GitHub] incubator-carbondata pull request #339: [CARBONDATA-429][WIP] Remove unneces...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/339#discussion_r92332659 --- Diff: core/src/main/java/org/apache/carbondata/core/cache/dictionary/ReverseDictionaryCache.java --- @@ -167,12 +167,9 @@ private Dictionary getDictionary( DictionaryColumnUniqueIdentifier dictionaryColumnUniqueIdentifier) throws CarbonUtilException { Dictionary reverseDictionary = null; -// create column dictionary info object only if dictionary and its -// metadata file exists for a given column identifier -if (!isFileExistsForGivenColumn(dictionaryColumnUniqueIdentifier)) { - throw new CarbonUtilException( - "Either dictionary or its metadata does not exist for column identifier :: " - + dictionaryColumnUniqueIdentifier.getColumnIdentifier()); +// create column dictionary info object only if it is primitive type. +if (dictionaryColumnUniqueIdentifier.getDataType().isComplexType()) { + return null; --- End diff -- We will not invoke the getDictionary() method on a complex type column directly. For a complex type column, we call getDictionary() on its primitive type sub-columns.
[GitHub] incubator-carbondata issue #411: [WIP]Support data type: date and char
Github user QiangCai commented on the issue: https://github.com/apache/incubator-carbondata/pull/411 @ravipesala Please correct CI to support -Pbuild-with-format.
[GitHub] incubator-carbondata issue #413: [CARBONDATA-516][SPARK2]fix union issue in ...
Github user QiangCai commented on the issue: https://github.com/apache/incubator-carbondata/pull/413 @jackylk Added a test case. Local test cases pass for spark1.5 and spark2.
[GitHub] incubator-carbondata pull request #439: [CARBONDATA-536]initialize updateTab...
GitHub user QiangCai opened a pull request: https://github.com/apache/incubator-carbondata/pull/439 [CARBONDATA-536]initialize updateTableMetadata method in LoadTable for Spark2 For spark2, GlobalDictionaryUtil.updateTableMetadataFunc should be initialized. You can merge this pull request into a Git repository by running: $ git pull https://github.com/QiangCai/incubator-carbondata fixBugInLoadTable Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-carbondata/pull/439.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #439 commit 1f28ef9864e2d45807bb7c6bb1cbb51f65f423e3 Author: QiangCai <qiang...@qq.com> Date: 2016-12-15T16:26:09Z fixLoadTableForSpark2
[GitHub] incubator-carbondata pull request #345: [CARBONDATA-443]Nosort dataloading
Github user QiangCai closed the pull request at: https://github.com/apache/incubator-carbondata/pull/345
[GitHub] incubator-carbondata issue #345: [CARBONDATA-443]Nosort dataloading
Github user QiangCai commented on the issue: https://github.com/apache/incubator-carbondata/pull/345 Close this PR. In the future, I will raise another PR to support mixed data format tables.
[GitHub] incubator-carbondata pull request #518: [WIP]unify file header reader
GitHub user QiangCai opened a pull request: https://github.com/apache/incubator-carbondata/pull/518 [WIP]unify file header reader You can merge this pull request into a Git repository by running: $ git pull https://github.com/QiangCai/incubator-carbondata fileheader Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-carbondata/pull/518.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #518 commit 5440b9c16799d935f9da1728344564a65a2d6ef2 Author: QiangCai <qiang...@qq.com> Date: 2017-01-10T13:32:51Z readfileheader
[GitHub] incubator-carbondata pull request #524: [CARBONDATA-627]fix union test case ...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/524#discussion_r95714873 --- Diff: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/allqueries/AllDataTypesTestCaseAggregate.scala --- @@ -59,21 +59,4 @@ class AllDataTypesTestCaseAggregate extends QueryTest with BeforeAndAfterAll { Seq(Row(15.8))) }) - test("CARBONDATA-60-union-defect")({ --- End diff -- Because the previous builder 559 added one test case, builder 560 shows two deleted test cases.
[GitHub] incubator-carbondata pull request #524: [CARBONDATA-627]fix union test case ...
GitHub user QiangCai opened a pull request: https://github.com/apache/incubator-carbondata/pull/524 [CARBONDATA-627]fix union test case for spark2 Analyze: The union test case failed in spark2. The result of the union query is twice the result of the left query. Root Cause: CarbonLateDecodeRule only uses the union.children.head plan to build all CarbonDictionaryTempDecoder. Changes: Use each child plan to build its own CarbonDictionaryTempDecoder. You can merge this pull request into a Git repository by running: $ git pull https://github.com/QiangCai/incubator-carbondata fixUnionTestCase Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-carbondata/pull/524.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #524 commit 0abc4f8f1fe6cfe0e8fe8842f7b7ba40f1e191a7 Author: QiangCai <qiang...@qq.com> Date: 2017-01-11T15:47:25Z fixUnionTestCase
[GitHub] incubator-carbondata issue #520: fix dependency issue for IntelliJ IDEA
Github user QiangCai commented on the issue: https://github.com/apache/incubator-carbondata/pull/520 Closing this PR. I could not reproduce this issue.
[GitHub] incubator-carbondata pull request #518: [CARBONDATA-622]unify file header re...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/518#discussion_r95518312 --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CommonUtil.scala --- @@ -301,4 +304,45 @@ object CommonUtil { LOGGER.info(s"mapreduce.input.fileinputformat.split.maxsize: ${ newSplitSize.toString }") } } + + def getCsvHeaderColumns(carbonLoadModel: CarbonLoadModel): Array[String] = { +val delimiter = if (StringUtils.isEmpty(carbonLoadModel.getCsvDelimiter)) { + CarbonCommonConstants.COMMA +} else { + CarbonUtil.delimiterConverter(carbonLoadModel.getCsvDelimiter) +} +var csvFile: String = null +var csvHeader: String = carbonLoadModel.getCsvHeader +val csvColumns = if (StringUtils.isBlank(csvHeader)) { + // read header from csv file + csvFile = carbonLoadModel.getFactFilePath.split(",")(0) + csvHeader = CarbonUtil.readHeader(csvFile) + if (StringUtils.isBlank(csvHeader)) { +throw new CarbonDataLoadingException("First line of the csv is not valid.") + } + csvHeader.toLowerCase().split(delimiter).map(_.replaceAll("\"", "").trim) +} else { + csvHeader.toLowerCase.split(CarbonCommonConstants.COMMA).map(_.trim) +} + +if (!CarbonDataProcessorUtil.isHeaderValid(carbonLoadModel.getTableName, csvColumns, +carbonLoadModel.getCarbonDataLoadSchema)) { + if (csvFile == null) { +LOGGER.error("CSV header provided in DDL is not proper." + + " Column names in schema and CSV header are not the same.") +throw new CarbonDataLoadingException( + "CSV header provided in DDL is not proper. Column names in schema and CSV header are " + + "not the same.") + } else { +LOGGER.error( + "CSV File provided is not proper. Column names in schema and csv header are not same. " --- End diff -- fixed
[GitHub] incubator-carbondata pull request #518: [CARBONDATA-622]unify file header re...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/518#discussion_r95518311 --- Diff: processing/src/main/java/org/apache/carbondata/processing/util/CarbonDataProcessorUtil.java --- @@ -373,83 +368,15 @@ private static void addAllComplexTypeChildren(CarbonDimension dimension, StringB return complexTypesMap; } - /** - * Get the csv file to read if it the path is file otherwise get the first file of directory. - * - * @param csvFilePath - * @return File - */ - public static CarbonFile getCsvFileToRead(String csvFilePath) { -CarbonFile csvFile = -FileFactory.getCarbonFile(csvFilePath, FileFactory.getFileType(csvFilePath)); - -CarbonFile[] listFiles = null; -if (csvFile.isDirectory()) { - listFiles = csvFile.listFiles(new CarbonFileFilter() { -@Override public boolean accept(CarbonFile pathname) { - if (!pathname.isDirectory()) { -if (pathname.getName().endsWith(CarbonCommonConstants.CSV_FILE_EXTENSION) || pathname - .getName().endsWith(CarbonCommonConstants.CSV_FILE_EXTENSION -+ CarbonCommonConstants.FILE_INPROGRESS_STATUS)) { - return true; -} - } - return false; -} - }); -} else { - listFiles = new CarbonFile[1]; - listFiles[0] = csvFile; -} -return listFiles[0]; - } - - /** - * Get the file header from csv file. 
- */ - public static String getFileHeader(CarbonFile csvFile) - throws DataLoadingException { -DataInputStream fileReader = null; -BufferedReader bufferedReader = null; -String readLine = null; - -FileType fileType = FileFactory.getFileType(csvFile.getAbsolutePath()); - -if (!csvFile.exists()) { - csvFile = FileFactory - .getCarbonFile(csvFile.getAbsolutePath() + CarbonCommonConstants.FILE_INPROGRESS_STATUS, - fileType); -} - -try { - fileReader = FileFactory.getDataInputStream(csvFile.getAbsolutePath(), fileType); - bufferedReader = - new BufferedReader(new InputStreamReader(fileReader, Charset.defaultCharset())); - readLine = bufferedReader.readLine(); -} catch (FileNotFoundException e) { - LOGGER.error(e, "CSV Input File not found " + e.getMessage()); - throw new DataLoadingException("CSV Input File not found ", e); -} catch (IOException e) { - LOGGER.error(e, "Not able to read CSV input File " + e.getMessage()); - throw new DataLoadingException("Not able to read CSV input File ", e); -} finally { - CarbonUtil.closeStreams(fileReader, bufferedReader); -} - -return readLine; - } - - public static boolean isHeaderValid(String tableName, String header, - CarbonDataLoadSchema schema, String delimiter) throws DataLoadingException { -delimiter = CarbonUtil.delimiterConverter(delimiter); + public static boolean isHeaderValid(String tableName, String[] csvHeader, + CarbonDataLoadSchema schema) throws DataLoadingException { --- End diff -- fixed
[GitHub] incubator-carbondata pull request #518: [CARBONDATA-622]unify file header re...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/518#discussion_r95518309 --- Diff: processing/src/main/java/org/apache/carbondata/processing/util/CarbonDataProcessorUtil.java --- @@ -462,6 +389,13 @@ public static boolean isHeaderValid(String tableName, String header, return count == columnNames.length; } + public static boolean isHeaderValid(String tableName, String header, + CarbonDataLoadSchema schema, String delimiter) throws DataLoadingException { +delimiter = CarbonUtil.delimiterConverter(delimiter); --- End diff -- fixed
[GitHub] incubator-carbondata pull request #530: fix default profile for spark-common...
GitHub user QiangCai opened a pull request: https://github.com/apache/incubator-carbondata/pull/530 fix default profile for spark-common-test Now the profile spark-1.6 should be active by default. You can merge this pull request into a Git repository by running: $ git pull https://github.com/QiangCai/incubator-carbondata fixDefaultProfile Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-carbondata/pull/530.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #530 commit 67415ad64a823f5cf303cd22676b4f9cfc2b78f5 Author: QiangCai <qiang...@qq.com> Date: 2017-01-13T02:59:15Z fix default profile
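For reference, a Maven profile is made active by default via `activeByDefault`. This is only an illustrative sketch of the mechanism the PR title refers to (the element values here are assumptions, not the actual spark-common-test pom); note that default activation is suppressed whenever any other profile in the pom is explicitly activated, e.g. with `-Pspark-2.1`.

```xml
<!-- Illustrative only: how a default profile is declared in a pom.xml -->
<profile>
  <id>spark-1.6</id>
  <activation>
    <activeByDefault>true</activeByDefault>
  </activation>
  <properties>
    <!-- property values here are placeholders -->
    <spark.binary.version>1.6</spark.binary.version>
  </properties>
</profile>
```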
[GitHub] incubator-carbondata issue #529: Fixed testcase issues in spark 1.6 and 2.1 ...
Github user QiangCai commented on the issue: https://github.com/apache/incubator-carbondata/pull/529 @ravipesala OK, there is no conflict; it just needs a rebase. PR 528 is for the kettle flow in spark2 and moves the test cases to spark-common-test.
[GitHub] incubator-carbondata issue #529: [WIP]Fixed testcase issues in spark 1.6 and...
Github user QiangCai commented on the issue: https://github.com/apache/incubator-carbondata/pull/529 @ravipesala In PR 528, I fixed the InsertInto issue for the kettle flow and moved the test cases to spark-common-test.
[GitHub] incubator-carbondata pull request #528: [CARBONDATA-617]Fix InsertInto test ...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/528#discussion_r95973885 --- Diff: integration/spark/src/main/scala/org/apache/spark/sql/optimizer/CarbonOptimizer.scala --- @@ -237,9 +237,15 @@ class ResolveCarbonFunctions(relations: Seq[CarbonDecoderRelation]) val leftCondAttrs = new util.HashSet[AttributeReferenceWrapper] val rightCondAttrs = new util.HashSet[AttributeReferenceWrapper] union.left.output.foreach(attr => --- End diff -- fixed
[GitHub] incubator-carbondata pull request #528: [CARBONDATA-617]Fix InsertInto test ...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/528#discussion_r95973907 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonMetastore.scala --- @@ -782,7 +781,49 @@ case class CarbonRelation( nullable = true)()) } - override val output = dimensionsAttr ++ measureAttr + override val output = { +val columns = tableMeta.carbonTable.getCreateOrderColumn(tableMeta.carbonTable.getFactTableName) + .asScala +columns.filter(!_.isInvisible).map { column => + if (column.isDimesion()) { +val output: DataType = column.getDataType.toString.toLowerCase match { + case "array" => + CarbonMetastoreTypes.toDataType(s"array<${getArrayChildren(column.getColName)}>") + case "struct" => + CarbonMetastoreTypes.toDataType(s"struct<${getStructChildren(column.getColName)}>") + case dType => +val dataType = addDecimalScaleAndPrecision(column, dType) +CarbonMetastoreTypes.toDataType(dataType) +} +AttributeReference(column.getColName, output, + nullable = true +)(qualifier = Option(tableName + "." + column.getColName)) --- End diff -- fixed
[GitHub] incubator-carbondata issue #514: [CARBONDATA-614]Fix issue: Dictionary file ...
Github user QiangCai commented on the issue: https://github.com/apache/incubator-carbondata/pull/514 retest this please
[GitHub] incubator-carbondata pull request #514: [CARBONDATA-614]Fix issue: Dictionar...
GitHub user QiangCai opened a pull request: https://github.com/apache/incubator-carbondata/pull/514 [CARBONDATA-614]Fix issue: Dictionary file name is locked for updation. 1. set the carbon property CarbonCommonConstants.STORE_LOCATION for CarbonBlockDistinctValuesCombineRDD and CarbonGlobalDictionaryGenerateRDD to avoid java.lang.RuntimeException: Dictionary file name is locked for updation. 2. pass CARBON_TIMESTAMP_FORMAT from the driver side to the executor side during dictionary generation 3. fix code style for carbonTableSchema.scala You can merge this pull request into a Git repository by running: $ git pull https://github.com/QiangCai/incubator-carbondata fixDictLockedIssue Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-carbondata/pull/514.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #514 commit 5ab7d80e7bd65f31b462d8687ee7aefd326cbc02 Author: QiangCai <qiang...@qq.com> Date: 2017-01-10T06:34:20Z fixDictLockIssue
[GitHub] incubator-carbondata pull request #520: fix dependency issue for IntelliJ ID...
GitHub user QiangCai opened a pull request: https://github.com/apache/incubator-carbondata/pull/520 fix dependency issue for IntelliJ IDEA When using the profile spark-2.1, the test cases of spark-common-test cannot be run in IntelliJ IDEA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/QiangCai/incubator-carbondata fixIdeaMavenIssue Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-carbondata/pull/520.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #520 commit 74d4bf8933540348525a16fdba361e780fe0f494 Author: QiangCai <qiang...@qq.com> Date: 2017-01-11T08:37:24Z fix dependency issue for IntelliJ IDEA
[GitHub] incubator-carbondata pull request #518: [CARBONDATA-622]unify file header re...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/518#discussion_r95507937 --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CommonUtil.scala --- @@ -301,4 +304,45 @@ object CommonUtil { LOGGER.info(s"mapreduce.input.fileinputformat.split.maxsize: ${ newSplitSize.toString }") } } + + def getCsvHeaderColumns(carbonLoadModel: CarbonLoadModel): Array[String] = { +val delimiter = if (StringUtils.isEmpty(carbonLoadModel.getCsvDelimiter)) { --- End diff -- I think the delimiter may be a blank " ".
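The point above is why the diff checks `isEmpty` rather than `isBlank`: a single space is a legal CSV delimiter, and a blank-check would wrongly treat it as "no delimiter provided" and fall back to a comma. A minimal sketch of that distinction, using plain-Java helpers that mirror commons-lang `StringUtils` semantics (the helper and method names here are illustrative, not the reviewed code):

```java
// Sketch of the isEmpty-vs-isBlank distinction for delimiters.
// isEmpty(" ") is false, isBlank(" ") is true; only the former
// lets a single-space delimiter survive the default-to-comma check.
public class DelimiterCheck {

  static boolean isEmpty(String s) {
    return s == null || s.length() == 0;
  }

  static boolean isBlank(String s) {
    return s == null || s.trim().isEmpty();
  }

  // Fall back to comma only when no delimiter at all was provided.
  static String[] splitHeader(String header, String delimiter) {
    String d = isEmpty(delimiter) ? "," : delimiter;
    // quote the delimiter so split() treats it literally, not as regex
    return header.toLowerCase().split(java.util.regex.Pattern.quote(d));
  }
}
```

With an `isBlank` check, `splitHeader("A B C", " ")` would split on commas and return one column instead of three.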
[GitHub] incubator-carbondata pull request #528: Fix InsertInto test case for spark2
GitHub user QiangCai opened a pull request: https://github.com/apache/incubator-carbondata/pull/528 Fix InsertInto test case for spark2 Changes: 1. move the insertInto test cases from the spark module to the spark-common-test module 2. add test case: insert into a carbon table from a carbon table union query 3. CarbonDecoderOptimizerHelper supports InsertIntoTable for spark2 4. CreateTable and CarbonRelation use the original ordinal of columns for spark2 5. Optimize CSVInput for InsertInto to avoid allocating too much memory at once. Impact: 1. data loading 2. query You can merge this pull request into a Git repository by running: $ git pull https://github.com/QiangCai/incubator-carbondata fixInsertIntoFromUnionQuery Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-carbondata/pull/528.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #528 commit 96e94fb967936192520f64dd8404e148c7e5fad2 Author: QiangCai <qiang...@qq.com> Date: 2017-01-12T17:26:30Z fix InsertInto issue for spark2
[GitHub] incubator-carbondata pull request #390: [CARBONDATA-492]fix a bug of profile...
GitHub user QiangCai opened a pull request: https://github.com/apache/incubator-carbondata/pull/390 [CARBONDATA-492]fix a bug of profile spark-2.0 for intellij idea When the profile spark-2.0 is chosen, CarbonExample has errors in IntelliJ IDEA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/QiangCai/incubator-carbondata fixprofileforidea Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-carbondata/pull/390.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #390 commit d626af30664948b149218859271af88cc41853e2 Author: QiangCai <qiang...@qq.com> Date: 2016-12-03T18:16:18Z fix profile issue for idea
[GitHub] incubator-carbondata pull request #377: [CARBONDATA-478]Spark2 module should...
GitHub user QiangCai opened a pull request: https://github.com/apache/incubator-carbondata/pull/377 [CARBONDATA-478]Spark2 module should have different SparkRowReadSupportImpl with spark1 You can merge this pull request into a Git repository by running: $ git pull https://github.com/QiangCai/incubator-carbondata master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-carbondata/pull/377.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #377 commit 3bc55a38c5d645ca1e07381910692ac0b2bb6297 Author: QiangCai <qiang...@qq.com> Date: 2016-12-01T11:32:04Z fixLatedecoderIssueForSpark2
[GitHub] incubator-carbondata pull request #384: [CARBONDATA-488][SPARK2]add InsertIn...
GitHub user QiangCai opened a pull request: https://github.com/apache/incubator-carbondata/pull/384 [CARBONDATA-488][SPARK2]add InsertInto feature for spark2 1. add InsertInto feature for spark2 2. optimize CarbonExample to use a relative path and use InsertInto to load data Link: [CARBONDATA-488](https://issues.apache.org/jira/browse/CARBONDATA-488) You can merge this pull request into a Git repository by running: $ git pull https://github.com/QiangCai/incubator-carbondata insertinto_for_spark2 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-carbondata/pull/384.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #384 commit a1b3b2962a8e12f09cf5efabc15de071e105c885 Author: QiangCai <qiang...@qq.com> Date: 2016-12-02T17:53:32Z insertinto for spark2
[GitHub] incubator-carbondata pull request #382: [CARBONDATA-486]fix bug for reading ...
GitHub user QiangCai opened a pull request: https://github.com/apache/incubator-carbondata/pull/382 [CARBONDATA-486]fix bug for reading dataframe concurrently Fix an insertInto bug when reading from a hive table concurrently. You can merge this pull request into a Git repository by running: $ git pull https://github.com/QiangCai/incubator-carbondata fixbugforinsertinto2 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-carbondata/pull/382.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #382 commit 9dcdf7de6bde64d1c800fd268f2099d2278e8f33 Author: QiangCai <qiang...@qq.com> Date: 2016-12-02T09:41:23Z fix bug for reading dataframe concurrently
[GitHub] incubator-carbondata pull request #403: [CARBONDATA-497][SPARK2]fix datatype...
GitHub user QiangCai opened a pull request: https://github.com/apache/incubator-carbondata/pull/403 [CARBONDATA-497][SPARK2]fix datatype issue of CarbonLateDecoderRule 1. Fix the data type of dictionary dimensions to resolve the logical plan 2. Improve the translateFilter method to push down more filters to CarbonScanRDD. 3. Add a decimal type field to CarbonExample You can merge this pull request into a Git repository by running: $ git pull https://github.com/QiangCai/incubator-carbondata fixbugforlatedecoder Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-carbondata/pull/403.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #403 commit 7159713725ac6bef057e27144021cdd06e4adba0 Author: QiangCai <qiang...@qq.com> Date: 2016-12-06T09:40:21Z fixlatedecoder
[GitHub] incubator-carbondata pull request #472: [CARBONDATA-568] clean up code for c...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/472#discussion_r94367280 --- Diff: core/src/main/java/org/apache/carbondata/core/carbon/datastore/chunk/reader/dimension/v2/CompressedDimensionChunkFileBasedReaderV2.java --- @@ -135,6 +135,7 @@ public CompressedDimensionChunkFileBasedReaderV2(final BlockletInfo blockletInfo dimensionChunk = fileReader.readByteArray(filePath, dimensionChunksOffset.get(blockIndex), dimensionChunksLength.get(blockIndex)); dimensionColumnChunk = CarbonUtil.readDataChunk(dimensionChunk); + assert dimensionColumnChunk != null; --- End diff -- I prefer to throw an exception.
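The preference stated above makes sense because Java `assert` statements are disabled at runtime unless the JVM is started with `-ea`, so the null case would pass silently in production. A minimal sketch of the suggested pattern; `readDataChunk` here is a stand-in for illustration, not CarbonData's real method:

```java
import java.io.IOException;

// Sketch: replace a runtime-disabled `assert x != null` with an explicit
// exception so a corrupt/unreadable chunk fails loudly in production too.
public class NullCheckSketch {

  // Stand-in for a reader that may return null on a bad chunk.
  static byte[] readDataChunk(byte[] raw) {
    return raw;
  }

  static byte[] readDataChunkChecked(byte[] raw, String filePath) throws IOException {
    byte[] chunk = readDataChunk(raw);
    if (chunk == null) {
      // unlike `assert chunk != null`, this check always runs
      throw new IOException("Could not read data chunk from " + filePath);
    }
    return chunk;
  }
}
```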
[GitHub] incubator-carbondata pull request #499: [CARBONDATA-218]fix data loading iss...
GitHub user QiangCai opened a pull request: https://github.com/apache/incubator-carbondata/pull/499 [CARBONDATA-218]fix data loading issue for UT You can merge this pull request into a Git repository by running: $ git pull https://github.com/QiangCai/incubator-carbondata fixDataLoadingIssue Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-carbondata/pull/499.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #499 commit 0cfefbb450b596da23f87a9cab65016c94f96a0a Author: QiangCai <qiang...@qq.com> Date: 2017-01-05T03:03:25Z fixDataLoadingIssue
[GitHub] incubator-carbondata issue #450: [CARBONDATA-545]Added support for offheap s...
Github user QiangCai commented on the issue: https://github.com/apache/incubator-carbondata/pull/450 @kumarvishal09 please rebase and fix some known issues.
[GitHub] incubator-carbondata pull request #481: [CARBONDATA-601]reuse test case for ...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/481#discussion_r95060180 --- Diff: integration/spark-common-test/pom.xml --- @@ -0,0 +1,232 @@ + + +http://maven.apache.org/POM/4.0.0; xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance; xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd;> + + 4.0.0 + + +org.apache.carbondata +carbondata-parent +1.0.0-incubating-SNAPSHOT +../../pom.xml + + + carbondata-spark-common-test + Apache CarbonData :: Spark Common Test + + +${basedir}/../../dev + + + + + org.apache.carbondata + carbondata-spark-common + ${project.version} + test + + + org.apache.spark + spark-hive-thriftserver_2.10 + + + + + org.apache.spark + spark-hive-thriftserver_${scala.binary.version} + test + + + junit + junit + + + org.scalatest + scalatest_${scala.binary.version} + 2.2.1 + test + + + + +src/test/scala + + +src/resources + + +. + + CARBON_SPARK_INTERFACELogResource.properties + + + + + +org.scala-tools +maven-scala-plugin +2.15.2 + + +compile + + compile + +compile + + +testCompile + + testCompile + +test + + +process-resources + + compile + + + + + +maven-compiler-plugin + + 1.7 + 1.7 + + + +org.apache.maven.plugins +maven-surefire-plugin +2.18 + + + +**/Test*.java +**/*Test.java +**/*TestCase.java +**/*Suite.java + + ${project.build.directory}/surefire-reports + -Xmx3g -XX:MaxPermSize=512m -XX:ReservedCodeCacheSize=512m + +true + + false + + + +org.scalatest +scalatest-maven-plugin +1.0 + + + ${project.build.directory}/surefire-reports + . 
+ CarbonTestSuite.txt + -ea -Xmx3g -XX:MaxPermSize=512m -XX:ReservedCodeCacheSize=512m + + + + + +true +${use.kettle} + + + + +test + + test + + + + + + + + + spark-1.5 + +true + + + + org.apache.carbondata + carbondata-spark + ${project.version} + test + + + org.apache.spark + spark-hive-thriftserver_2.10 --- End diff -- This exclusion fixes the dependency issue in IntelliJ IDEA.
[GitHub] incubator-carbondata pull request #481: [CARBONDATA-601]reuse test case for ...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/481#discussion_r95060705 --- Diff: integration/spark/src/main/scala/org/apache/spark/sql/test/TestQueryExecutorImplV1.scala --- @@ -0,0 +1,51 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.test + +import org.apache.spark.{SparkConf, SparkContext} +import org.apache.spark.sql.{CarbonContext, DataFrame, SQLContext} + +import org.apache.carbondata.common.logging.LogServiceFactory +import org.apache.carbondata.core.constants.CarbonCommonConstants +import org.apache.carbondata.core.util.CarbonProperties + +class TestQueryExecutorImplV1 extends TestQueryExecutorRegister { --- End diff -- fixed
[GitHub] incubator-carbondata issue #508: [CARBONDATA-611] Make the default maven com...
Github user QiangCai commented on the issue: https://github.com/apache/incubator-carbondata/pull/508 @ravipesala In my opinion, if a profile is active by default and only contains property elements, we can remove that profile and move its property elements into the properties element. For assembly/pom.xml, I prefer to add these three provided property elements to the properties element rather than to each profile.
[GitHub] incubator-carbondata issue #508: [CARBONDATA-611] Make the default maven com...
Github user QiangCai commented on the issue: https://github.com/apache/incubator-carbondata/pull/508 LGTM
[GitHub] incubator-carbondata pull request #481: [WIP]reuse test case for integration...
GitHub user QiangCai opened a pull request: https://github.com/apache/incubator-carbondata/pull/481 [WIP]reuse test case for integration module You can merge this pull request into a Git repository by running: $ git pull https://github.com/QiangCai/incubator-carbondata refactoryTestCase Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-carbondata/pull/481.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #481 commit ea6ac0d143a7af33124bc647c7e6a99dafe012c1 Author: QiangCai <qiang...@qq.com> Date: 2016-12-29T14:43:29Z reuse test case for integration module ---
[GitHub] incubator-carbondata pull request #494: [CARBONDATA-218]Using CSVInputFormat...
GitHub user QiangCai opened a pull request: https://github.com/apache/incubator-carbondata/pull/494 [CARBONDATA-218]Using CSVInputFormat instead of spark-csv during dictionary generation 1. Use CSVInputFormat instead of spark-csv during dictionary generation. 2. Remove the spark-csv dependency from the whole project. You can merge this pull request into a Git repository by running: $ git pull https://github.com/QiangCai/incubator-carbondata unifyCsvReader Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-carbondata/pull/494.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #494 commit 34f55769d74969cff5d9a322ad8ae0cbf14befdc Author: QiangCai <qiang...@qq.com> Date: 2017-01-03T08:28:06Z unify csv reader ---
[GitHub] incubator-carbondata pull request #494: [CARBONDATA-218]Using CSVInputFormat...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/494#discussion_r94545504 --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/GlobalDictionaryUtil.scala --- @@ -356,37 +363,49 @@ object GlobalDictionaryUtil { */ def loadDataFrame(sqlContext: SQLContext, carbonLoadModel: CarbonLoadModel): DataFrame = { -val df = sqlContext.read - .format("com.databricks.spark.csv.newapi") - .option("header", { -if (StringUtils.isEmpty(carbonLoadModel.getCsvHeader)) { - "true" -} else { - "false" -} - }) - .option("delimiter", { -if (StringUtils.isEmpty(carbonLoadModel.getCsvDelimiter)) { - "" + DEFAULT_SEPARATOR -} else { - carbonLoadModel.getCsvDelimiter + val hadoopConfiguration = new Configuration() + CommonUtil.configureCSVInputFormat(hadoopConfiguration, carbonLoadModel) + hadoopConfiguration.set(FileInputFormat.INPUT_DIR, carbonLoadModel.getFactFilePath) --- End diff -- The FileInputFormat.addInputPath method needs a Job-type parameter. In addition, this FactFilePath already consists of all the file paths, so we can set the input path directly; there is no need to split the paths and add them one by one. ---
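The "set directly, don't re-split" point can be sketched with a plain map standing in for Hadoop's Configuration (the key name mirrors FileInputFormat.INPUT_DIR; the map and the file paths are illustrative, not CarbonData's real code):

```java
import java.util.HashMap;
import java.util.Map;

public class InputDirDemo {
  // INPUT_DIR-style key; the value is a comma-separated list of paths
  static final String INPUT_DIR = "mapreduce.input.fileinputformat.input.dir";

  static int inputPathCount() {
    Map<String, String> conf = new HashMap<>();
    // the fact-file path is already comma-joined, so one put() is enough
    String factFilePath = "/data/part1.csv,/data/part2.csv";
    conf.put(INPUT_DIR, factFilePath);
    // no need to split the list and add each path individually
    return conf.get(INPUT_DIR).split(",").length;
  }

  public static void main(String[] args) {
    System.out.println(inputPathCount()); // prints 2
  }
}
```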
[GitHub] incubator-carbondata issue #481: [CARBONDATA-601]reuse test case for integra...
Github user QiangCai commented on the issue: https://github.com/apache/incubator-carbondata/pull/481 retest this please ---
[GitHub] incubator-carbondata issue #481: [CARBONDATA-601]reuse test case for integra...
Github user QiangCai commented on the issue: https://github.com/apache/incubator-carbondata/pull/481 retest this please ---
[GitHub] incubator-carbondata pull request #452: [CARBONDATA-546] Extract data manage...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/452#discussion_r93733275 --- Diff: integration/spark2/src/main/scala/org/apache/spark/util/TableAPIUtil.scala --- @@ -51,4 +56,15 @@ object TableAPIUtil { .config(CarbonCommonConstants.STORE_LOCATION, storePath) --- End diff -- Can you remove this one? ---
[GitHub] incubator-carbondata pull request #452: [CARBONDATA-546] Extract data manage...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/452#discussion_r93733676 --- Diff: integration/spark2/src/main/scala/org/apache/spark/util/TableAPIUtil.scala --- @@ -51,4 +56,15 @@ object TableAPIUtil { .config(CarbonCommonConstants.STORE_LOCATION, storePath) --- End diff -- BTW, CarbonEnv now gets carbon.storelocation from the CarbonProperties object. So there is no need to add storePath to the configuration; instead, add the property to the CarbonProperties object: CarbonProperties.getInstance().addProperty(CarbonCommonConstants.STORE_LOCATION, storePath). ---
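The suggested pattern above relies on a process-wide property singleton. A minimal sketch of such a CarbonProperties-like store (the class here is a hand-rolled stand-in, not CarbonData's actual implementation; the key and path values are illustrative):

```java
import java.util.Properties;

public class PropsDemo {
  // stand-in for CarbonProperties: a singleton wrapping java.util.Properties
  static final class CarbonPropertiesLike {
    private static final CarbonPropertiesLike INSTANCE = new CarbonPropertiesLike();
    private final Properties props = new Properties();

    static CarbonPropertiesLike getInstance() { return INSTANCE; }

    void addProperty(String key, String value) { props.setProperty(key, value); }

    String getProperty(String key) { return props.getProperty(key); }
  }

  static String storedLocation() {
    // mirrors CarbonProperties.getInstance().addProperty(STORE_LOCATION, storePath)
    CarbonPropertiesLike.getInstance().addProperty("carbon.storelocation", "/tmp/carbonstore");
    // any later reader (e.g. CarbonEnv) sees the same value via the singleton
    return CarbonPropertiesLike.getInstance().getProperty("carbon.storelocation");
  }

  public static void main(String[] args) {
    System.out.println(storedLocation());
  }
}
```

Because every component reads from the same singleton, setting the store path once is enough, which is why duplicating it in the Spark configuration is unnecessary.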
[GitHub] incubator-carbondata issue #449: [CARBONDATA-540]Support insertInto without ...
Github user QiangCai commented on the issue: https://github.com/apache/incubator-carbondata/pull/449 @chenliang613 @jackylk Rebase is done and the review comments are fixed. ---
[GitHub] incubator-carbondata pull request #672: [CARBONDATA-815] add hive integratio...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/672#discussion_r107841211 --- Diff: dev/java-code-format-template.xml --- @@ -34,8 +34,8 @@ - + --- End diff -- Yes, the javax packages should come after the java packages. ---
[GitHub] incubator-carbondata pull request #672: [CARBONDATA-815] add hive integratio...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/672#discussion_r107890848 --- Diff: integration/hive/src/main/java/org/apache/carbondata/hive/MapredCarbonOutputFormat.java --- @@ -0,0 +1,49 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.carbondata.hive; + + +import java.io.IOException; +import java.util.Properties; + +import org.apache.hadoop.fs.FileSystem; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.hive.ql.exec.FileSinkOperator; +import org.apache.hadoop.hive.ql.io.HiveOutputFormat; +import org.apache.hadoop.io.Writable; +import org.apache.hadoop.mapred.FileOutputFormat; +import org.apache.hadoop.mapred.JobConf; +import org.apache.hadoop.mapred.RecordWriter; +import org.apache.hadoop.util.Progressable; + + +public class MapredCarbonOutputFormat extends FileOutputFormat<Void, T> --- End diff -- Is same with CarbonTableOutputFormat? So we only support reading carbondata table in hive. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. 
[GitHub] incubator-carbondata pull request #672: [CARBONDATA-815] add hive integratio...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/672#discussion_r107863484 --- Diff: integration/hive/src/main/java/org/apache/carbondata/hive/CarbonArrayInspector.java --- @@ -0,0 +1,191 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.carbondata.hive; + +import java.util.ArrayList; +import java.util.List; + +import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector; +import org.apache.hadoop.hive.serde2.objectinspector.SettableListObjectInspector; +import org.apache.hadoop.io.ArrayWritable; +import org.apache.hadoop.io.Writable; + +/** + * The CarbonHiveArrayInspector will inspect an ArrayWritable, considering it as an Hive array. + * It can also inspect a List if Hive decides to inspect the result of an inspection. 
+ */ +public class CarbonArrayInspector implements SettableListObjectInspector { + + ObjectInspector arrayElementInspector; + + public CarbonArrayInspector(final ObjectInspector arrayElementInspector) { +this.arrayElementInspector = arrayElementInspector; + } + + @Override + public String getTypeName() { +return "array<" + arrayElementInspector.getTypeName() + ">"; + } + + @Override + public Category getCategory() { +return Category.LIST; + } + + @Override + public ObjectInspector getListElementObjectInspector() { +return arrayElementInspector; + } + + @Override + public Object getListElement(final Object data, final int index) { +if (data == null) { + return null; +} + +if (data instanceof ArrayWritable) { + final Writable[] listContainer = ((ArrayWritable) data).get(); + + if (listContainer == null || listContainer.length == 0) { +return null; + } + + final Writable subObj = listContainer[0]; + + if (subObj == null) { +return null; + } + + if (index >= 0 && index < ((ArrayWritable) subObj).get().length) { +return ((ArrayWritable) subObj).get()[index]; + } else { +return null; + } +} + +throw new UnsupportedOperationException("Cannot inspect " + + data.getClass().getCanonicalName()); + } + + @Override + public int getListLength(final Object data) { +if (data == null) { + return -1; +} + +if (data instanceof ArrayWritable) { + final Writable[] listContainer = ((ArrayWritable) data).get(); + + if (listContainer == null || listContainer.length == 0) { +return -1; + } + + final Writable subObj = listContainer[0]; + + if (subObj == null) { +return 0; + } + + return ((ArrayWritable) subObj).get().length; +} + +throw new UnsupportedOperationException("Cannot inspect " + + data.getClass().getCanonicalName()); + } + + @Override + public List getList(final Object data) { +if (data == null) { + return null; +} + +if (data instanceof ArrayWritable) { + final Writable[] listContainer = ((ArrayWritable) data).get(); + + if (listContainer == null || listContainer.length == 0) { 
+return null; + } + + final Writable subObj = listContainer[0]; + + if (subObj == null) { +return null; + } + + final Writable[] array = ((ArrayWritable) subObj).get(); + final List list = new ArrayList(); + + for (final Writable obj : array) { +list.add(obj); + } + + return list; +} + +throw new UnsupportedOperationException("Cannot inspect " + + data.getClass().getCanonicalName());
[GitHub] incubator-carbondata pull request #672: [CARBONDATA-815] add hive integratio...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/672#discussion_r107863410 --- Diff: integration/hive/src/main/java/org/apache/carbondata/hive/CarbonArrayInspector.java --- @@ -0,0 +1,191 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.carbondata.hive; + +import java.util.ArrayList; +import java.util.List; + +import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector; +import org.apache.hadoop.hive.serde2.objectinspector.SettableListObjectInspector; +import org.apache.hadoop.io.ArrayWritable; +import org.apache.hadoop.io.Writable; + +/** + * The CarbonHiveArrayInspector will inspect an ArrayWritable, considering it as an Hive array. + * It can also inspect a List if Hive decides to inspect the result of an inspection. 
+ */ +public class CarbonArrayInspector implements SettableListObjectInspector { + + ObjectInspector arrayElementInspector; + + public CarbonArrayInspector(final ObjectInspector arrayElementInspector) { +this.arrayElementInspector = arrayElementInspector; + } + + @Override + public String getTypeName() { +return "array<" + arrayElementInspector.getTypeName() + ">"; + } + + @Override + public Category getCategory() { +return Category.LIST; + } + + @Override + public ObjectInspector getListElementObjectInspector() { +return arrayElementInspector; + } + + @Override + public Object getListElement(final Object data, final int index) { +if (data == null) { + return null; +} + +if (data instanceof ArrayWritable) { + final Writable[] listContainer = ((ArrayWritable) data).get(); + + if (listContainer == null || listContainer.length == 0) { +return null; + } + + final Writable subObj = listContainer[0]; + + if (subObj == null) { +return null; + } + + if (index >= 0 && index < ((ArrayWritable) subObj).get().length) { +return ((ArrayWritable) subObj).get()[index]; + } else { +return null; + } +} + +throw new UnsupportedOperationException("Cannot inspect " + + data.getClass().getCanonicalName()); + } + + @Override + public int getListLength(final Object data) { +if (data == null) { + return -1; +} + +if (data instanceof ArrayWritable) { + final Writable[] listContainer = ((ArrayWritable) data).get(); + + if (listContainer == null || listContainer.length == 0) { +return -1; + } + + final Writable subObj = listContainer[0]; + + if (subObj == null) { +return 0; + } + + return ((ArrayWritable) subObj).get().length; +} + +throw new UnsupportedOperationException("Cannot inspect " + + data.getClass().getCanonicalName()); + } + + @Override + public List getList(final Object data) { +if (data == null) { + return null; +} + +if (data instanceof ArrayWritable) { + final Writable[] listContainer = ((ArrayWritable) data).get(); + + if (listContainer == null || listContainer.length == 0) { 
+return null; + } + + final Writable subObj = listContainer[0]; + + if (subObj == null) { +return null; + } + + final Writable[] array = ((ArrayWritable) subObj).get(); + final List list = new ArrayList(); + + for (final Writable obj : array) { +list.add(obj); + } + + return list; +} + +throw new UnsupportedOperationException("Cannot inspect " + + data.getClass().getCanonicalName());
[GitHub] incubator-carbondata pull request #672: [CARBONDATA-815] add hive integratio...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/672#discussion_r107881262 --- Diff: integration/hive/src/main/java/org/apache/carbondata/hive/CarbonHiveRecordReader.java --- @@ -0,0 +1,249 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.carbondata.hive; + + +import java.io.IOException; +import java.sql.Date; +import java.sql.Timestamp; +import java.util.ArrayList; +import java.util.Arrays; +import java.util.Iterator; +import java.util.List; + +import org.apache.carbondata.core.datastore.block.TableBlockInfo; +import org.apache.carbondata.core.scan.executor.exception.QueryExecutionException; +import org.apache.carbondata.core.scan.model.QueryModel; +import org.apache.carbondata.core.scan.result.iterator.ChunkRowIterator; +import org.apache.carbondata.hadoop.CarbonRecordReader; +import org.apache.carbondata.hadoop.readsupport.CarbonReadSupport; + +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.hive.common.type.HiveDecimal; +import org.apache.hadoop.hive.serde.serdeConstants; +import org.apache.hadoop.hive.serde2.SerDeException; +import org.apache.hadoop.hive.serde2.io.DateWritable; +import org.apache.hadoop.hive.serde2.io.DoubleWritable; +import org.apache.hadoop.hive.serde2.io.HiveDecimalWritable; +import org.apache.hadoop.hive.serde2.io.ShortWritable; +import org.apache.hadoop.hive.serde2.io.TimestampWritable; +import org.apache.hadoop.hive.serde2.objectinspector.*; +import org.apache.hadoop.hive.serde2.typeinfo.StructTypeInfo; +import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo; +import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory; +import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils; +import org.apache.hadoop.io.ArrayWritable; +import org.apache.hadoop.io.IntWritable; +import org.apache.hadoop.io.LongWritable; +import org.apache.hadoop.io.Text; +import org.apache.hadoop.io.Writable; +import org.apache.hadoop.mapred.InputSplit; +import org.apache.hadoop.mapred.JobConf; + +public class CarbonHiveRecordReader extends CarbonRecordReader +implements org.apache.hadoop.mapred.RecordReader<Void, ArrayWritable> { + + ArrayWritable valueObj = null; + private CarbonObjectInspector objInspector; + + public 
CarbonHiveRecordReader(QueryModel queryModel, CarbonReadSupport readSupport, +InputSplit inputSplit, JobConf jobConf) throws IOException { +super(queryModel, readSupport); +initialize(inputSplit, jobConf); + } + + public void initialize(InputSplit inputSplit, Configuration conf) throws IOException { +// The input split can contain single HDFS block or multiple blocks, so firstly get all the +// blocks and then set them in the query model. +List splitList; +if (inputSplit instanceof CarbonHiveInputSplit) { + splitList = new ArrayList<>(1); + splitList.add((CarbonHiveInputSplit) inputSplit); +} else { + throw new RuntimeException("unsupported input split type: " + inputSplit); +} +List tableBlockInfoList = CarbonHiveInputSplit.createBlocks(splitList); +queryModel.setTableBlockInfos(tableBlockInfoList); +readSupport.initialize(queryModel.getProjectionColumns(), +queryModel.getAbsoluteTableIdentifier()); +try { + carbonIterator = new ChunkRowIterator(queryExecutor.execute(queryModel)); +} catch (QueryExecutionException e) { + throw new IOException(e.getMessage(), e.getCause()); +} +if (valueObj == null) { + valueObj = new ArrayWritable(Writable.class, + new Writable[queryModel.getProjectionColumns().length]); +} + +final TypeInfo rowTypeInfo; +final List columnNames; +List columnTypes; +// Get column names and sort orde
[GitHub] incubator-carbondata pull request #672: [CARBONDATA-815] add hive integratio...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/672#discussion_r107862824 --- Diff: integration/hive/src/main/java/org/apache/carbondata/hive/CarbonArrayInspector.java --- @@ -0,0 +1,191 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.carbondata.hive; + +import java.util.ArrayList; +import java.util.List; + +import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector; +import org.apache.hadoop.hive.serde2.objectinspector.SettableListObjectInspector; +import org.apache.hadoop.io.ArrayWritable; +import org.apache.hadoop.io.Writable; + +/** + * The CarbonHiveArrayInspector will inspect an ArrayWritable, considering it as an Hive array. + * It can also inspect a List if Hive decides to inspect the result of an inspection. 
+ */ +public class CarbonArrayInspector implements SettableListObjectInspector { + + ObjectInspector arrayElementInspector; + + public CarbonArrayInspector(final ObjectInspector arrayElementInspector) { +this.arrayElementInspector = arrayElementInspector; + } + + @Override + public String getTypeName() { +return "array<" + arrayElementInspector.getTypeName() + ">"; + } + + @Override + public Category getCategory() { +return Category.LIST; + } + + @Override + public ObjectInspector getListElementObjectInspector() { +return arrayElementInspector; + } + + @Override + public Object getListElement(final Object data, final int index) { +if (data == null) { + return null; +} + +if (data instanceof ArrayWritable) { + final Writable[] listContainer = ((ArrayWritable) data).get(); + + if (listContainer == null || listContainer.length == 0) { +return null; + } + + final Writable subObj = listContainer[0]; + + if (subObj == null) { +return null; + } + + if (index >= 0 && index < ((ArrayWritable) subObj).get().length) { +return ((ArrayWritable) subObj).get()[index]; + } else { +return null; + } +} + +throw new UnsupportedOperationException("Cannot inspect " + + data.getClass().getCanonicalName()); + } + + @Override + public int getListLength(final Object data) { +if (data == null) { + return -1; +} + +if (data instanceof ArrayWritable) { + final Writable[] listContainer = ((ArrayWritable) data).get(); + + if (listContainer == null || listContainer.length == 0) { +return -1; + } + + final Writable subObj = listContainer[0]; + + if (subObj == null) { +return 0; + } + + return ((ArrayWritable) subObj).get().length; +} + +throw new UnsupportedOperationException("Cannot inspect " + + data.getClass().getCanonicalName()); + } + + @Override + public List getList(final Object data) { +if (data == null) { + return null; +} + +if (data instanceof ArrayWritable) { + final Writable[] listContainer = ((ArrayWritable) data).get(); + + if (listContainer == null || listContainer.length == 0) { 
+return null; + } + + final Writable subObj = listContainer[0]; + + if (subObj == null) { +return null; + } + + final Writable[] array = ((ArrayWritable) subObj).get(); + final List list = new ArrayList(); + + for (final Writable obj : array) { --- End diff -- Better to use Arrays.asList(array). ---
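The Arrays.asList suggestion replaces the element-by-element copy in the diff with a single call. A small sketch (using String[] as a stand-in for the Writable[] in the original code):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AsListDemo {
  static boolean sameResult() {
    String[] array = {"a", "b", "c"};          // stand-in for the Writable[] in the diff
    List<String> manual = new ArrayList<>();   // element-by-element copy, as in the diff
    for (String obj : array) {
      manual.add(obj);
    }
    // suggested one-liner; wrap in ArrayList when a mutable list is required,
    // since Arrays.asList alone returns a fixed-size view backed by the array
    List<String> viaAsList = new ArrayList<>(Arrays.asList(array));
    return manual.equals(viaAsList);
  }

  public static void main(String[] args) {
    System.out.println(sameResult()); // true
  }
}
```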
[GitHub] incubator-carbondata pull request #672: [CARBONDATA-815] add hive integratio...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/672#discussion_r107873741 --- Diff: integration/hive/src/main/java/org/apache/carbondata/hive/MapredCarbonInputFormat.java --- @@ -0,0 +1,99 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.carbondata.hive; + +import java.io.IOException; +import java.util.List; + +import org.apache.carbondata.core.metadata.AbsoluteTableIdentifier; +import org.apache.carbondata.core.metadata.schema.table.CarbonTable; +import org.apache.carbondata.core.scan.expression.Expression; +import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf; +import org.apache.carbondata.core.scan.model.CarbonQueryPlan; +import org.apache.carbondata.core.scan.model.QueryModel; +import org.apache.carbondata.hadoop.CarbonInputFormat; +import org.apache.carbondata.hadoop.CarbonInputSplit; +import org.apache.carbondata.hadoop.readsupport.CarbonReadSupport; +import org.apache.carbondata.hadoop.util.CarbonInputFormatUtil; + +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.hive.ql.io.CombineHiveInputFormat; +import org.apache.hadoop.io.ArrayWritable; +import org.apache.hadoop.mapred.InputFormat; +import org.apache.hadoop.mapred.InputSplit; +import org.apache.hadoop.mapred.JobConf; +import org.apache.hadoop.mapred.RecordReader; +import org.apache.hadoop.mapred.Reporter; +import org.apache.hadoop.mapreduce.Job; + + +public class MapredCarbonInputFormat extends CarbonInputFormat --- End diff -- CarbonInputFormat is an implementation of the MRv2 API, while MapredCarbonInputFormat is an implementation of the MRv1 API. So I think MapredCarbonInputFormat shouldn't extend CarbonInputFormat. ---
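The alternative to inheritance here is delegation: an MRv1-style class that wraps an MRv2-style implementation and adapts its results. A simplified sketch (both classes are toy stand-ins with abridged signatures, not the real Hadoop or CarbonData types):

```java
import java.util.Arrays;
import java.util.List;

public class DelegationDemo {
  // stands in for an MRv2-style format: getSplits(JobContext) returning a List
  static class NewApiFormat {
    List<String> getSplits(String jobContext) {
      return Arrays.asList("split-0", "split-1");
    }
  }

  // stands in for an MRv1-style format: getSplits(JobConf, int) returning an array;
  // it wraps the MRv2 implementation instead of extending it
  static class OldApiFormat {
    private final NewApiFormat delegate = new NewApiFormat();

    String[] getSplits(String jobConf, int numSplits) {
      List<String> splits = delegate.getSplits(jobConf); // delegate, then adapt the result
      return splits.toArray(new String[0]);
    }
  }

  static int splitCount() {
    return new OldApiFormat().getSplits("job-conf", 2).length;
  }

  public static void main(String[] args) {
    System.out.println(splitCount()); // 2
  }
}
```

This keeps the two incompatible API generations from being tangled in one class hierarchy while still reusing the MRv2 logic.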
[GitHub] incubator-carbondata pull request #672: [CARBONDATA-815] add hive integratio...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/672#discussion_r107893281 --- Diff: integration/hive/src/main/java/org/apache/carbondata/hive/MapredCarbonInputFormat.java --- @@ -0,0 +1,99 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.carbondata.hive; + +import java.io.IOException; +import java.util.List; + +import org.apache.carbondata.core.metadata.AbsoluteTableIdentifier; +import org.apache.carbondata.core.metadata.schema.table.CarbonTable; +import org.apache.carbondata.core.scan.expression.Expression; +import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf; +import org.apache.carbondata.core.scan.model.CarbonQueryPlan; +import org.apache.carbondata.core.scan.model.QueryModel; +import org.apache.carbondata.hadoop.CarbonInputFormat; +import org.apache.carbondata.hadoop.CarbonInputSplit; +import org.apache.carbondata.hadoop.readsupport.CarbonReadSupport; +import org.apache.carbondata.hadoop.util.CarbonInputFormatUtil; + +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.hive.ql.io.CombineHiveInputFormat; +import org.apache.hadoop.io.ArrayWritable; +import org.apache.hadoop.mapred.InputFormat; +import org.apache.hadoop.mapred.InputSplit; +import org.apache.hadoop.mapred.JobConf; +import org.apache.hadoop.mapred.RecordReader; +import org.apache.hadoop.mapred.Reporter; +import org.apache.hadoop.mapreduce.Job; + + +public class MapredCarbonInputFormat extends CarbonInputFormat +implements InputFormat<Void, ArrayWritable>, CombineHiveInputFormat.AvoidSplitCombination { + + @Override + public InputSplit[] getSplits(JobConf jobConf, int numSplits) throws IOException { +org.apache.hadoop.mapreduce.JobContext jobContext = Job.getInstance(jobConf); +List splitList = super.getSplits(jobContext); --- End diff -- For Hive, we need to remove the InputSplits of invalid segments. ---
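Filtering out splits that belong to invalid segments amounts to dropping any split whose segment id appears in an invalid-segment set. A toy sketch of that filtering step (the Split class, segment ids, and invalid-segment set here are all hypothetical; CarbonData's real split and segment-status handling is more involved):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SplitFilterDemo {
  // toy split carrying only a segment id; real CarbonHiveInputSplits carry much more
  static final class Split {
    final String segmentId;
    Split(String segmentId) { this.segmentId = segmentId; }
  }

  static int validSplitCount() {
    List<Split> splits = Arrays.asList(new Split("0"), new Split("1"), new Split("2"));
    // hypothetical invalid-segment list, e.g. from a segment-status lookup
    Set<String> invalidSegments = new HashSet<>(Arrays.asList("1"));
    return (int) splits.stream()
        .filter(s -> !invalidSegments.contains(s.segmentId))
        .count();
  }

  public static void main(String[] args) {
    System.out.println(validSplitCount()); // 2
  }
}
```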
[GitHub] incubator-carbondata pull request #672: [CARBONDATA-815] add hive integratio...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/672#discussion_r107875522 --- Diff: integration/hive/src/main/java/org/apache/carbondata/hive/CarbonHiveRecordReader.java --- @@ -0,0 +1,249 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.carbondata.hive; + + +import java.io.IOException; +import java.sql.Date; +import java.sql.Timestamp; +import java.util.ArrayList; +import java.util.Arrays; +import java.util.Iterator; +import java.util.List; + +import org.apache.carbondata.core.datastore.block.TableBlockInfo; +import org.apache.carbondata.core.scan.executor.exception.QueryExecutionException; +import org.apache.carbondata.core.scan.model.QueryModel; +import org.apache.carbondata.core.scan.result.iterator.ChunkRowIterator; +import org.apache.carbondata.hadoop.CarbonRecordReader; +import org.apache.carbondata.hadoop.readsupport.CarbonReadSupport; + +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.hive.common.type.HiveDecimal; +import org.apache.hadoop.hive.serde.serdeConstants; +import org.apache.hadoop.hive.serde2.SerDeException; +import org.apache.hadoop.hive.serde2.io.DateWritable; +import org.apache.hadoop.hive.serde2.io.DoubleWritable; +import org.apache.hadoop.hive.serde2.io.HiveDecimalWritable; +import org.apache.hadoop.hive.serde2.io.ShortWritable; +import org.apache.hadoop.hive.serde2.io.TimestampWritable; +import org.apache.hadoop.hive.serde2.objectinspector.*; +import org.apache.hadoop.hive.serde2.typeinfo.StructTypeInfo; +import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo; +import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory; +import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils; +import org.apache.hadoop.io.ArrayWritable; +import org.apache.hadoop.io.IntWritable; +import org.apache.hadoop.io.LongWritable; +import org.apache.hadoop.io.Text; +import org.apache.hadoop.io.Writable; +import org.apache.hadoop.mapred.InputSplit; +import org.apache.hadoop.mapred.JobConf; + +public class CarbonHiveRecordReader extends CarbonRecordReader --- End diff -- CarbonRecordReader is for MRv2, CarbonHiveRecordReader is for MRv1. CarbonHiveRecordReader shouldn't extend from CarbonRecordReader. 
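The point above is that the MRv1 (Hive) record reader should wrap the MRv2 reader rather than extend it. A hedged sketch of that composition, with hypothetical simplified types (an iterator stands in for the MRv2-style reader):

```java
import java.util.Arrays;
import java.util.Iterator;

// MRv1-style contract: fill a value holder, report whether data was available.
interface OldApiReader<V> {
  boolean next(V holder);
}

class StringHolder { String value; }

// Composition instead of inheritance: the MRv1-facing reader holds an
// MRv2-style source and adapts its pull model to the next(holder) contract.
class WrappingReader implements OldApiReader<StringHolder> {
  private final Iterator<String> mrv2Source;
  WrappingReader(Iterator<String> mrv2Source) { this.mrv2Source = mrv2Source; }
  @Override public boolean next(StringHolder holder) {
    if (!mrv2Source.hasNext()) {
      return false;
    }
    holder.value = mrv2Source.next();
    return true;
  }
}
```

This keeps the two API generations decoupled: the MRv1 reader can change its adaptation logic without touching the MRv2 reader's class hierarchy.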
[GitHub] incubator-carbondata pull request #672: [CARBONDATA-815] add hive integratio...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/672#discussion_r107863523 --- Diff: integration/hive/src/main/java/org/apache/carbondata/hive/CarbonArrayInspector.java --- @@ -0,0 +1,191 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.carbondata.hive; + +import java.util.ArrayList; +import java.util.List; + +import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector; +import org.apache.hadoop.hive.serde2.objectinspector.SettableListObjectInspector; +import org.apache.hadoop.io.ArrayWritable; +import org.apache.hadoop.io.Writable; + +/** + * The CarbonHiveArrayInspector will inspect an ArrayWritable, considering it as an Hive array. + * It can also inspect a List if Hive decides to inspect the result of an inspection. 
+ */ +public class CarbonArrayInspector implements SettableListObjectInspector { + + ObjectInspector arrayElementInspector; + + public CarbonArrayInspector(final ObjectInspector arrayElementInspector) { +this.arrayElementInspector = arrayElementInspector; + } + + @Override + public String getTypeName() { +return "array<" + arrayElementInspector.getTypeName() + ">"; + } + + @Override + public Category getCategory() { +return Category.LIST; + } + + @Override + public ObjectInspector getListElementObjectInspector() { +return arrayElementInspector; + } + + @Override + public Object getListElement(final Object data, final int index) { +if (data == null) { + return null; +} + +if (data instanceof ArrayWritable) { + final Writable[] listContainer = ((ArrayWritable) data).get(); + + if (listContainer == null || listContainer.length == 0) { +return null; + } + + final Writable subObj = listContainer[0]; + + if (subObj == null) { +return null; + } + + if (index >= 0 && index < ((ArrayWritable) subObj).get().length) { +return ((ArrayWritable) subObj).get()[index]; + } else { +return null; + } +} + +throw new UnsupportedOperationException("Cannot inspect " + + data.getClass().getCanonicalName()); + } + + @Override + public int getListLength(final Object data) { +if (data == null) { + return -1; +} + +if (data instanceof ArrayWritable) { + final Writable[] listContainer = ((ArrayWritable) data).get(); + + if (listContainer == null || listContainer.length == 0) { +return -1; + } + + final Writable subObj = listContainer[0]; + + if (subObj == null) { +return 0; + } + + return ((ArrayWritable) subObj).get().length; +} + +throw new UnsupportedOperationException("Cannot inspect " + + data.getClass().getCanonicalName()); + } + + @Override + public List getList(final Object data) { +if (data == null) { + return null; +} + +if (data instanceof ArrayWritable) { + final Writable[] listContainer = ((ArrayWritable) data).get(); + + if (listContainer == null || listContainer.length == 0) { 
+return null; + } + + final Writable subObj = listContainer[0]; + + if (subObj == null) { +return null; + } + + final Writable[] array = ((ArrayWritable) subObj).get(); + final List list = new ArrayList(); + + for (final Writable obj : array) { +list.add(obj); + } + + return list; +} + +throw new UnsupportedOperationException("Cannot inspect " + + data.getClass().getCanonicalName());
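The inspector above assumes a two-level layout: the outer ArrayWritable holds a single element, which is itself the array of list entries. A plain-Object sketch of that unwrapping convention (the helper names are hypothetical; the null/empty handling follows the code above):

```java
// Unwrap a list stored as { inner } where inner holds the actual elements.
class ListUnwrapper {
  static Object getElement(Object[] outer, int index) {
    if (outer == null || outer.length == 0) {
      return null;
    }
    Object[] inner = (Object[]) outer[0];  // the single wrapper element
    if (inner == null) {
      return null;
    }
    return (index >= 0 && index < inner.length) ? inner[index] : null;
  }

  static int getLength(Object[] outer) {
    if (outer == null || outer.length == 0) {
      return -1;  // matches getListLength: missing container reports -1
    }
    Object[] inner = (Object[]) outer[0];
    return inner == null ? 0 : inner.length;  // null wrapper reports length 0
  }
}
```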
[GitHub] incubator-carbondata pull request #672: [CARBONDATA-815] add hive integratio...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/672#discussion_r107894484 --- Diff: integration/hive/src/main/java/org/apache/carbondata/hive/MapredCarbonInputFormat.java --- @@ -0,0 +1,99 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.carbondata.hive; + +import java.io.IOException; +import java.util.List; + +import org.apache.carbondata.core.metadata.AbsoluteTableIdentifier; +import org.apache.carbondata.core.metadata.schema.table.CarbonTable; +import org.apache.carbondata.core.scan.expression.Expression; +import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf; +import org.apache.carbondata.core.scan.model.CarbonQueryPlan; +import org.apache.carbondata.core.scan.model.QueryModel; +import org.apache.carbondata.hadoop.CarbonInputFormat; +import org.apache.carbondata.hadoop.CarbonInputSplit; +import org.apache.carbondata.hadoop.readsupport.CarbonReadSupport; +import org.apache.carbondata.hadoop.util.CarbonInputFormatUtil; + +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.hive.ql.io.CombineHiveInputFormat; +import org.apache.hadoop.io.ArrayWritable; +import org.apache.hadoop.mapred.InputFormat; +import org.apache.hadoop.mapred.InputSplit; +import org.apache.hadoop.mapred.JobConf; +import org.apache.hadoop.mapred.RecordReader; +import org.apache.hadoop.mapred.Reporter; +import org.apache.hadoop.mapreduce.Job; + + +public class MapredCarbonInputFormat extends CarbonInputFormat +implements InputFormat<Void, ArrayWritable>, CombineHiveInputFormat.AvoidSplitCombination { + + @Override + public InputSplit[] getSplits(JobConf jobConf, int numSplits) throws IOException { +org.apache.hadoop.mapreduce.JobContext jobContext = Job.getInstance(jobConf); +List splitList = super.getSplits(jobContext); +InputSplit[] splits = new InputSplit[splitList.size()]; +CarbonInputSplit split = null; +for (int i = 0; i < splitList.size(); i++) { + split = (CarbonInputSplit) splitList.get(i); + splits[i] = new CarbonHiveInputSplit(split.getSegmentId(), split.getPath(), + split.getStart(), split.getLength(), split.getLocations(), + split.getNumberOfBlocklets(), split.getVersion(), split.getBlockStorageIdMap()); +} +return 
splits; + } + + @Override + public RecordReader<Void, ArrayWritable> getRecordReader(InputSplit inputSplit, JobConf jobConf, + Reporter reporter) throws IOException { +QueryModel queryModel = getQueryModel(jobConf); +CarbonReadSupport readSupport = getReadSupportClass(jobConf); --- End diff -- need decode all dictionary columns and direct-dictionary columns. Better to use SparkRowReadSupportImpl in spark1 module. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
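The review note asks the read support to decode dictionary and direct-dictionary columns before rows reach Hive. A minimal illustration of surrogate-key decoding using a plain Map as the dictionary; the class and method names are hypothetical, not CarbonData's CarbonReadSupport API:

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in dictionary: surrogate key -> actual value. Real CarbonData
// dictionaries are loaded per column; a Map suffices to show the decode step.
class DictionaryDecoder {
  private final Map<Integer, String> dictionary = new HashMap<>();

  void put(int surrogateKey, String value) {
    dictionary.put(surrogateKey, value);
  }

  // Decode one row: replace each surrogate key with its dictionary value.
  String[] decodeRow(int[] surrogates) {
    String[] row = new String[surrogates.length];
    for (int i = 0; i < surrogates.length; i++) {
      row[i] = dictionary.get(surrogates[i]);
    }
    return row;
  }
}
```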
[GitHub] incubator-carbondata pull request #672: [CARBONDATA-815] add hive integratio...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/672#discussion_r107862312 --- Diff: integration/hive/src/main/java/org/apache/carbondata/hive/CarbonArrayInspector.java --- @@ -0,0 +1,191 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.carbondata.hive; + +import java.util.ArrayList; +import java.util.List; + +import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector; +import org.apache.hadoop.hive.serde2.objectinspector.SettableListObjectInspector; +import org.apache.hadoop.io.ArrayWritable; +import org.apache.hadoop.io.Writable; + +/** + * The CarbonHiveArrayInspector will inspect an ArrayWritable, considering it as an Hive array. + * It can also inspect a List if Hive decides to inspect the result of an inspection. + */ +public class CarbonArrayInspector implements SettableListObjectInspector { + + ObjectInspector arrayElementInspector; --- End diff -- add private. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. 
[GitHub] incubator-carbondata pull request #672: [CARBONDATA-815] add hive integratio...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/672#discussion_r107886267 --- Diff: integration/hive/src/main/java/org/apache/carbondata/hive/CarbonHiveSerDe.java --- @@ -0,0 +1,232 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.carbondata.hive; + +import java.util.ArrayList; +import java.util.Arrays; +import java.util.Iterator; +import java.util.List; +import java.util.Properties; +import javax.annotation.Nullable; + +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.hive.serde.serdeConstants; +import org.apache.hadoop.hive.serde2.AbstractSerDe; +import org.apache.hadoop.hive.serde2.SerDeException; +import org.apache.hadoop.hive.serde2.SerDeSpec; +import org.apache.hadoop.hive.serde2.SerDeStats; +import org.apache.hadoop.hive.serde2.io.DoubleWritable; +import org.apache.hadoop.hive.serde2.io.ShortWritable; +import org.apache.hadoop.hive.serde2.objectinspector.ListObjectInspector; +import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector; +import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector; +import org.apache.hadoop.hive.serde2.objectinspector.StructField; +import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector; +import org.apache.hadoop.hive.serde2.objectinspector.primitive.DateObjectInspector; +import org.apache.hadoop.hive.serde2.objectinspector.primitive.DoubleObjectInspector; +import org.apache.hadoop.hive.serde2.objectinspector.primitive.HiveDecimalObjectInspector; +import org.apache.hadoop.hive.serde2.objectinspector.primitive.IntObjectInspector; +import org.apache.hadoop.hive.serde2.objectinspector.primitive.LongObjectInspector; +import org.apache.hadoop.hive.serde2.objectinspector.primitive.ShortObjectInspector; +import org.apache.hadoop.hive.serde2.objectinspector.primitive.StringObjectInspector; +import org.apache.hadoop.hive.serde2.objectinspector.primitive.TimestampObjectInspector; +import org.apache.hadoop.hive.serde2.typeinfo.StructTypeInfo; +import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo; +import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory; +import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils; +import org.apache.hadoop.io.ArrayWritable; +import 
org.apache.hadoop.io.IntWritable; +import org.apache.hadoop.io.LongWritable; +import org.apache.hadoop.io.Writable; + + +/** + * A serde class for Carbondata. + * It transparently passes the object to/from the Carbon file reader/writer. + */ +@SerDeSpec(schemaProps = {serdeConstants.LIST_COLUMNS, serdeConstants.LIST_COLUMN_TYPES}) +public class CarbonHiveSerDe extends AbstractSerDe { + private SerDeStats stats; + private ObjectInspector objInspector; + + private enum LAST_OPERATION { +SERIALIZE, +DESERIALIZE, +UNKNOWN + } + + private LAST_OPERATION status; + private long serializedSize; + private long deserializedSize; + + public CarbonHiveSerDe() { +stats = new SerDeStats(); + } + + @Override + public void initialize(@Nullable Configuration configuration, Properties tbl) + throws SerDeException { + +final TypeInfo rowTypeInfo; +final List columnNames; +final List columnTypes; +// Get column names and sort order +final String columnNameProperty = tbl.getProperty(serdeConstants.LIST_COLUMNS); +final String columnTypeProperty = tbl.getProperty(serdeConstants.LIST_COLUMN_TYPES); + +if (columnNameProperty.length() == 0) { + columnNames = new ArrayList(); +} else { + columnNames = Arrays.asList(columnNameProperty.split(",")); +} +if (columnTypeProperty.length() == 0) { + columnTypes = new ArrayList(); +} else { + columnTypes = TypeInfoUtils.getTypeInfosFro
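The `initialize()` method above derives the table schema from the comma-separated `serdeConstants.LIST_COLUMNS` property. A minimal version of that parsing, assuming simple column names with no embedded commas (the helper class is hypothetical):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Parse the comma-separated column-name table property, as the SerDe does.
class SchemaProps {
  static List<String> parseColumns(String columnNameProperty) {
    if (columnNameProperty == null || columnNameProperty.isEmpty()) {
      return new ArrayList<>();  // empty property means no declared columns
    }
    return Arrays.asList(columnNameProperty.split(","));
  }
}
```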
[GitHub] incubator-carbondata issue #696: [CARBONDATA-818] Make the file_name in carb...
Github user QiangCai commented on the issue: https://github.com/apache/incubator-carbondata/pull/696 LGTM --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-carbondata pull request #659: [CARBONDATA-781] Store one SegmentPr...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/659#discussion_r108033699 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/block/SegmentTaskIndex.java --- @@ -16,30 +16,52 @@ */ package org.apache.carbondata.core.datastore.block; +import java.util.HashMap; import java.util.List; +import java.util.Map; import org.apache.carbondata.core.datastore.BTreeBuilderInfo; import org.apache.carbondata.core.datastore.BtreeBuilder; import org.apache.carbondata.core.datastore.impl.btree.BlockBTreeBuilder; +import org.apache.carbondata.core.metadata.AbsoluteTableIdentifier; import org.apache.carbondata.core.metadata.blocklet.DataFileFooter; /** * Class which is responsible for loading the b+ tree block. This class will * persist all the detail of a table segment */ public class SegmentTaskIndex extends AbstractIndex { + private static Map<SegmentKey, SegmentProperties> segmentPropertiesCached = --- End diff -- why not use TableSegmentUniqueIdentifier? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
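The diff above caches one SegmentProperties instance per segment so every task index of a segment shares it instead of rebuilding it. A hedged sketch of such a keyed cache; `SegmentKey` and the cached value are hypothetical simplifications of the classes in the diff:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Cache key: one entry per (table, segment) pair.
class SegmentKey {
  final String tableId;
  final String segmentId;
  SegmentKey(String tableId, String segmentId) {
    this.tableId = tableId;
    this.segmentId = segmentId;
  }
  @Override public boolean equals(Object o) {
    if (!(o instanceof SegmentKey)) return false;
    SegmentKey k = (SegmentKey) o;
    return tableId.equals(k.tableId) && segmentId.equals(k.segmentId);
  }
  @Override public int hashCode() { return Objects.hash(tableId, segmentId); }
}

class SegmentPropertiesCache {
  private final Map<SegmentKey, Object> cache = new HashMap<>();
  // Return the shared instance for this (table, segment), creating it once.
  synchronized Object getOrCreate(SegmentKey key) {
    return cache.computeIfAbsent(key, k -> new Object());
  }
}
```

The reviewer's question is exactly about the key type: an existing identifier class such as TableSegmentUniqueIdentifier would avoid introducing a second key with the same (table, segment) semantics.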
[GitHub] incubator-carbondata pull request #659: [CARBONDATA-781] Store one SegmentPr...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/659#discussion_r108146058 --- Diff: core/src/test/java/org/apache/carbondata/core/datastore/block/SegmentTaskIndexTest.java --- @@ -58,7 +58,9 @@ @Mock public void build(BTreeBuilderInfo segmentBuilderInfos) {} }; long numberOfRows = 100; -SegmentTaskIndex segmentTaskIndex = new SegmentTaskIndex(); +SegmentProperties properties = new SegmentProperties(footerList.get(0).getColumnInTable(), --- End diff -- This should come after the initialization of the variable footerList; move it to line 72.
[GitHub] incubator-carbondata issue #635: [CARBONDATA-782]support SORT_COLUMNS
Github user QiangCai commented on the issue: https://github.com/apache/incubator-carbondata/pull/635 retest this please
[GitHub] incubator-carbondata pull request #715: [CARBONDATA-782]support SORT_COLUMNS...
GitHub user QiangCai opened a pull request: https://github.com/apache/incubator-carbondata/pull/715 [CARBONDATA-782]support SORT_COLUMNS The tasks of SORT_COLUMNS:
1. Support creating a table with the sort_columns property, e.g. tblproperties('sort_columns' = 'col7,col3'). A table with the SORT_COLUMNS property will be sorted by SORT_COLUMNS, and the sort order of columns is decided by SORT_COLUMNS.
2. Change the encoding rule of SORT_COLUMNS. Firstly, the column encoding rule stays consistent with the previous behavior. Secondly, if a SORT_COLUMNS column was previously a measure, it will now be created as a dimension, and this dimension will be a no-dictionary column (better to use direct-dictionary where possible). Thirdly, dimensions in SORT_COLUMNS have RLE and ROWID pages, while other dimensions have only RLE (not sorted).
3. The start/end key should be composed of SORT_COLUMNS: use SORT_COLUMNS to build the start/end key during data loading and select queries.
You can merge this pull request into a Git repository by running: $ git pull https://github.com/QiangCai/incubator-carbondata sort_columns Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-carbondata/pull/715.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #715 commit 287337170650e6ed19fb8d45e32df953d3a1d166 Author: QiangCai <qiang...@qq.com> Date: 2017-03-02T09:48:54Z sort columns
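The PR description above says loaded data is ordered by the SORT_COLUMNS subset, in the order the property lists the columns. A hedged pure-Java sketch of that multi-column ordering over `Object[]` rows (assumption: the selected columns hold mutually comparable values; this is an illustration, not the PR's sort implementation):

```java
import java.util.Arrays;

// Sort rows by the listed column indices, earlier indices taking precedence,
// the way 'sort_columns' = 'col7,col3' orders by col7 first, then col3.
class SortColumnsSorter {
  @SuppressWarnings("unchecked")
  static void sort(Object[][] rows, int[] sortColumnIndices) {
    Arrays.sort(rows, (a, b) -> {
      for (int idx : sortColumnIndices) {
        int cmp = ((Comparable<Object>) a[idx]).compareTo(b[idx]);
        if (cmp != 0) {
          return cmp;
        }
      }
      return 0;  // rows equal on all sort columns keep their relative order
    });
  }
}
```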
[GitHub] incubator-carbondata issue #635: [CARBONDATA-782]support SORT_COLUMNS
Github user QiangCai commented on the issue: https://github.com/apache/incubator-carbondata/pull/635 Closing this PR. I will raise another PR to merge into the 12-dev branch.
[GitHub] incubator-carbondata pull request #635: [CARBONDATA-782]support SORT_COLUMNS
Github user QiangCai closed the pull request at: https://github.com/apache/incubator-carbondata/pull/635
[GitHub] incubator-carbondata issue #659: [CARBONDATA-781] Store one SegmentPropertie...
Github user QiangCai commented on the issue: https://github.com/apache/incubator-carbondata/pull/659 @kumarvishal09 the dump picture shows the driver-side tree. @watermen this PR only implements reuse of segment properties on the driver side. Can you try to do the same on the executor side? For the building of the executor-side tree, please have a look at AbstractQueryExecutor.initQuery and BlockIndexStore.getAll.
[GitHub] incubator-carbondata issue #715: [CARBONDATA-782]support SORT_COLUMNS
Github user QiangCai commented on the issue: https://github.com/apache/incubator-carbondata/pull/715 done
[GitHub] incubator-carbondata issue #691: [CARBONDATA-783] Fixed message fails with o...
Github user QiangCai commented on the issue: https://github.com/apache/incubator-carbondata/pull/691 Looks good.
[GitHub] incubator-carbondata issue #691: [CARBONDATA-783] Fixed message fails with o...
Github user QiangCai commented on the issue: https://github.com/apache/incubator-carbondata/pull/691 retest this please
[GitHub] incubator-carbondata issue #696: [CARBONDATA-818] Make the file_name in carb...
Github user QiangCai commented on the issue: https://github.com/apache/incubator-carbondata/pull/696 @watermen It is unnecessary to store the carbondata file path in the carbonindex file. During B-tree building, the carbondata file name alone is used to sort the TableBlockInfos. Please check CarbonUtil.readCarbonIndexFile and TableBlockInfo.compareTo.
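The comment above notes that the index file need not store full paths because B-tree building only sorts blocks by the carbondata file name. A sketch of name-based ordering over plain path strings; the helper class is hypothetical, not TableBlockInfo.compareTo itself:

```java
import java.util.Arrays;

// Order block paths by the trailing file name, ignoring the directory part.
class BlockPathOrder {
  static String fileName(String path) {
    int slash = path.lastIndexOf('/');
    return slash < 0 ? path : path.substring(slash + 1);
  }

  static void sortByFileName(String[] paths) {
    Arrays.sort(paths, (a, b) -> fileName(a).compareTo(fileName(b)));
  }
}
```

Because only the file name participates in the comparison, two blocks in different segment directories still sort consistently, which is why the full path adds no information for this purpose.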
[GitHub] incubator-carbondata pull request #697: [CARBONDATA-708] Fixed Between and L...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/697#discussion_r107920260 --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/CarbonFilters.scala --- @@ -111,16 +111,24 @@ object CarbonFilters { } def getCarbonLiteralExpression(name: String, value: Any): CarbonExpression = { - new CarbonLiteralExpression(value, -CarbonScalaUtil.convertSparkToCarbonDataType(dataTypeOf(name))) + val dataTypeOfAttribute = CarbonScalaUtil.convertSparkToCarbonDataType(dataTypeOf(name)) + val dataType = if (Option(value).isDefined + && dataTypeOfAttribute == DataType.STRING + && value.isInstanceOf[Double]) { +DataType.DOUBLE --- End diff -- what's the reason to change datatype? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-carbondata pull request #697: [CARBONDATA-708] Fixed Between and L...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/697#discussion_r107919139 --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/CarbonFilters.scala --- @@ -111,16 +111,24 @@ object CarbonFilters { } def getCarbonLiteralExpression(name: String, value: Any): CarbonExpression = { - new CarbonLiteralExpression(value, -CarbonScalaUtil.convertSparkToCarbonDataType(dataTypeOf(name))) + val dataTypeOfAttribute = CarbonScalaUtil.convertSparkToCarbonDataType(dataTypeOf(name)) + val dataType = if (Option(value).isDefined + && dataTypeOfAttribute == DataType.STRING + && value.isInstanceOf[Double]) { +DataType.DOUBLE + } + else { --- End diff -- take care codestyle --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-carbondata pull request #697: [CARBONDATA-708] Fixed Between and L...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/697#discussion_r107918771 --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/CarbonFilters.scala --- @@ -111,16 +111,24 @@ object CarbonFilters { } def getCarbonLiteralExpression(name: String, value: Any): CarbonExpression = { - new CarbonLiteralExpression(value, -CarbonScalaUtil.convertSparkToCarbonDataType(dataTypeOf(name))) + val dataTypeOfAttribute = CarbonScalaUtil.convertSparkToCarbonDataType(dataTypeOf(name)) + val dataType = if (Option(value).isDefined + && dataTypeOfAttribute == DataType.STRING + && value.isInstanceOf[Double]) { +DataType.DOUBLE + } + else { +dataTypeOfAttribute + } + new CarbonLiteralExpression(value, dataType) } createFilter(predicate) } // Check out which filters can be pushed down to carbon, remaining can be handled in spark layer. - // Mostly dimension filters are only pushed down since it is faster in carbon. + // Mostly dimension filters are only pushed down since it is faster in carbo n. --- End diff -- redundant blank --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-carbondata issue #694: [CARBONDATA-814] bad record log file writin...
Github user QiangCai commented on the issue: https://github.com/apache/incubator-carbondata/pull/694 retest this please