[GitHub] carbondata issue #2465: [WIP] Refactored CarbonFile interface
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2465 SDV Build Fail, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6277/ ---
[GitHub] carbondata issue #2465: [WIP] Refactored CarbonFile interface
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2465 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6648/ ---
[GitHub] carbondata issue #2465: [WIP] Refactored CarbonFile interface
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2465 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7925/ ---
[GitHub] carbondata issue #2640: [CARBONDATA-2862][DataMap] Fix exception message for...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2640 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6646/ ---
[GitHub] carbondata issue #2624: [CARBONDATA-2845][BloomDataMap] Merge bloom index fi...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2624 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7924/ ---
[GitHub] carbondata issue #2640: [CARBONDATA-2862][DataMap] Fix exception message for...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2640 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7923/ ---
[GitHub] carbondata issue #2640: [CARBONDATA-2862][DataMap] Fix exception message for...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2640 SDV Build Fail, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6276/ ---
[GitHub] carbondata issue #2624: [CARBONDATA-2845][BloomDataMap] Merge bloom index fi...
Github user kevinjmh commented on the issue: https://github.com/apache/carbondata/pull/2624 retest this please ---
[GitHub] carbondata pull request #2640: [CARBONDATA-2862][DataMap] Fix exception mess...
GitHub user kevinjmh opened a pull request: https://github.com/apache/carbondata/pull/2640 [CARBONDATA-2862][DataMap] Fix exception message for datamap rebuild command

Since the datamap rebuild command can be executed without specifying a table name, in which case it scans all datamaps, the error message must not unconditionally look up the table name, otherwise an exception is thrown. This PR modifies the error message to show the table name only when it is available.

Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:
- [ ] Any interfaces changed?
- [ ] Any backward compatibility impacted?
- [ ] Document update required?
- [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change.
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running: $ git pull https://github.com/kevinjmh/carbondata dmRebuild Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2640.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2640 commit f9255d6c2f44de5f66ab7ef0dd1ba0010ba62cbb Author: Manhua Date: 2018-08-16T03:07:08Z fix error msg ---
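The idea behind the fix above (the rebuild command may run table-scoped or unscoped, so the error message must only include a table name when one is actually available) can be sketched as follows. This is a hypothetical Python illustration of the idea, not the PR's actual Scala code; `rebuild_error` and its parameters are invented for the example.

```python
# Hypothetical sketch: the datamap rebuild command may run without a table
# name (rebuilding all datamaps), so the error message must not assume a
# table name is present.
def rebuild_error(datamap_name, table_name=None):
    if table_name is None:
        # unscoped rebuild: no table to mention
        return "Datamap %s does not exist" % datamap_name
    # table-scoped rebuild: show the detailed table name on demand
    return "Datamap %s does not exist on table %s" % (datamap_name, table_name)
```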
[jira] [Created] (CARBONDATA-2862) Fix exception message for datamap rebuild command
jiangmanhua created CARBONDATA-2862: --- Summary: Fix exception message for datamap rebuild command Key: CARBONDATA-2862 URL: https://issues.apache.org/jira/browse/CARBONDATA-2862 Project: CarbonData Issue Type: Bug Reporter: jiangmanhua Assignee: jiangmanhua -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (CARBONDATA-2819) cannot drop preagg datamap on table if the table has other index datamaps
[ https://issues.apache.org/jira/browse/CARBONDATA-2819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16581850#comment-16581850 ] Jacky Li edited comment on CARBONDATA-2819 at 8/16/18 2:36 AM: --- Currently the preagg schema is not written into the _system folder, so the logic in CarbonDropDataMapCommand may be wrong when dropping a datamap. I think for this issue we can modify CarbonDropDataMapCommand in a simple way, but in the long run we need to consider whether to also put the preagg datamap schema into the _system folder so that its processing is unified was (Author: jackylk): Currently preagg schema is not written into _system folder, so when dropping datamap the logic in CarbonDropDataMapCommand maybe wrong. I think for this issue, we can modify CarbonDropDataMapCommand in a simple way, but for long run, we need to consider whether to put preagg datamap also into _system folder
> cannot drop preagg datamap on table if the table has other index datamaps
> -
>
> Key: CARBONDATA-2819
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2819
> Project: CarbonData
> Issue Type: Improvement
> Affects Versions: 1.4.1
> Reporter: lianganping
> Priority: Major
> Time Spent: 7h 10m
> Remaining Estimate: 0h
>
> 1.create table student_test(id int,name string,class_number int,male
> int,female int) stored by 'carbondata';
> 2.create datamap dm1_preaggr_student_test ON TABLE student_test USING
> 'preaggregate' as select class_number,sum(male) from student_test group by
> class_number
> 3.create datamap dm_lucene_student_test on table student_test using 'lucene'
> dmproperties('index_columns' = 'name');
> 4.drop datamap dm1_preaggr_student_test on table student_test;
> and will get this error:
> Error: org.apache.carbondata.common.exceptions.sql.NoSuchDataMapException:
> Datamap with name dm1_preaggr_student_test does not exist (state=,code=0)
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CARBONDATA-2819) cannot drop preagg datamap on table if the table has other index datamaps
[ https://issues.apache.org/jira/browse/CARBONDATA-2819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16581850#comment-16581850 ] Jacky Li commented on CARBONDATA-2819: -- Currently the preagg schema is not written into the _system folder, so the logic in CarbonDropDataMapCommand may be wrong when dropping a datamap. I think for this issue we can modify CarbonDropDataMapCommand in a simple way, but in the long run we need to consider whether to also put the preagg datamap into the _system folder
> cannot drop preagg datamap on table if the table has other index datamaps
> -
>
> Key: CARBONDATA-2819
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2819
> Project: CarbonData
> Issue Type: Improvement
> Affects Versions: 1.4.1
> Reporter: lianganping
> Priority: Major
> Time Spent: 7h 10m
> Remaining Estimate: 0h
>
> 1.create table student_test(id int,name string,class_number int,male
> int,female int) stored by 'carbondata';
> 2.create datamap dm1_preaggr_student_test ON TABLE student_test USING
> 'preaggregate' as select class_number,sum(male) from student_test group by
> class_number
> 3.create datamap dm_lucene_student_test on table student_test using 'lucene'
> dmproperties('index_columns' = 'name');
> 4.drop datamap dm1_preaggr_student_test on table student_test;
> and will get this error:
> Error: org.apache.carbondata.common.exceptions.sql.NoSuchDataMapException:
> Datamap with name dm1_preaggr_student_test does not exist (state=,code=0)
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] carbondata issue #2627: [CARBONDATA-2835] [MVDataMap] Block MV datamap on st...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2627 SDV Build Fail, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6275/ ---
[GitHub] carbondata pull request #2639: [CARBONDATA-2858] Fix external table schema b...
Github user jackylk closed the pull request at: https://github.com/apache/carbondata/pull/2639 ---
[GitHub] carbondata issue #2627: [CARBONDATA-2835] [MVDataMap] Block MV datamap on st...
Github user ndwangsen commented on the issue: https://github.com/apache/carbondata/pull/2627 retest sdv please ---
[jira] [Created] (CARBONDATA-2861) Issues with download page
Sebb created CARBONDATA-2861: Summary: Issues with download page Key: CARBONDATA-2861 URL: https://issues.apache.org/jira/browse/CARBONDATA-2861 Project: CarbonData Issue Type: Bug Reporter: Sebb

The download page currently links to https://dist.apache.org/ However, that host is only intended as a staging area for use by developers. Download pages must use the ASF mirror system for build artifacts, and must use https://www.apache.org/dist/... for KEYS, sigs and hashes.

The download page must provide public download links where current official source releases and accompanying cryptographic files may be obtained. [2] Links to the download artifacts must support downloads from mirrors, e.g. via links to dyn/closer. Links to metadata (SHA, ASC) must be to https://www.apache.org/dist///*

MD5 is no longer considered useful and should not be used; SHA is required. Similarly, SHA-1 is no longer considered useful and should not be used; SHA-512 (preferred) or SHA-256 is required for new releases. Older releases need not be updated, may continue unchanged, and might use MD5 or SHA-1.

The KEYS link must be to https://www.apache.org/dist//KEYS

[1] http://www.apache.org/legal/release-policy.html#release-announcements [2] https://www.apache.org/dev/release-distribution#download-links -- This message was sent by Atlassian JIRA (v7.6.3#76005)
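As a sketch of what the checksum part of the policy asks for, the following generates and checks a SHA-512 file for a release artifact. The artifact name is a placeholder, and Python's `hashlib` stands in for whatever tool a release manager would actually use (e.g. `shasum -a 512` or `gpg`); this is illustrative, not part of any ASF tooling.

```python
import hashlib

def write_sha512(path):
    """Write <path>.sha512 next to the artifact, as required for new releases."""
    digest = hashlib.sha512(open(path, "rb").read()).hexdigest()
    sha_path = path + ".sha512"
    with open(sha_path, "w") as f:
        f.write("%s  %s\n" % (digest, path))  # same layout as `sha512sum` output
    return sha_path

def verify_sha512(path):
    """Check the artifact against its published .sha512 file (what a downloader does)."""
    expected = open(path + ".sha512").read().split()[0]
    actual = hashlib.sha512(open(path, "rb").read()).hexdigest()
    return expected == actual
```

A file produced by `write_sha512` can also be verified with `sha512sum -c`, since it uses the same two-column layout.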
[jira] [Created] (CARBONDATA-2860) Please delete old releases from mirroring system
Sebb created CARBONDATA-2860: Summary: Please delete old releases from mirroring system Key: CARBONDATA-2860 URL: https://issues.apache.org/jira/browse/CARBONDATA-2860 Project: CarbonData Issue Type: Bug Environment: https://dist.apache.org/repos/dist/release/carbondata/ Reporter: Sebb To reduce the load on the ASF mirrors, projects are required to delete old releases [1] Please can you remove all non-current releases? It's unfair to expect the 3rd party mirrors to carry old releases. However you can still link to the archives for historic releases. Please also update your release procedures (if relevant) Thanks! [1] http://www.apache.org/dev/release.html#when-to-archive -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] carbondata issue #2639: [CARBONDATA-2858] Fix external table schema bug
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2639 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6645/ ---
[GitHub] carbondata issue #2639: [CARBONDATA-2858] Fix external table schema bug
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2639 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7922/ ---
[GitHub] carbondata issue #2639: [CARBONDATA-2858] Fix external table schema bug
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2639 SDV Build Fail, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6274/ ---
[GitHub] carbondata pull request #2639: [CARBONDATA-2858] Fix external table schema b...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2639#discussion_r210323959 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/CarbonSource.scala --- @@ -351,10 +349,7 @@ object CarbonSource {
       query: Option[LogicalPlan]): Map[String, String] = {
     val model = createTableInfoFromParams(properties, dataSchema, identifier, query, sparkSession)
     val tableInfo: TableInfo = TableNewProcessor(model)
-    val isExternal = properties.getOrElse("isExternal", "false")
--- End diff -- This variable is not used ---
[GitHub] carbondata pull request #2639: [CARBONDATA-2858] Fix external table schema b...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2639#discussion_r210323933 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/CarbonSource.scala --- @@ -324,9 +324,7 @@ object CarbonSource {
       tableDesc.copy(storage = updatedFormat)
     } else {
       val tableInfo = CarbonUtil.convertGsonToTableInfo(properties.asJava)
-      val isExternal = properties.getOrElse("isExternal", "false")
--- End diff -- This variable is not used ---
[GitHub] carbondata pull request #2639: [CARBONDATA-2858] Fix external table schema b...
GitHub user jackylk opened a pull request: https://github.com/apache/carbondata/pull/2639 [CARBONDATA-2858] Fix external table schema bug

If the user specifies a schema in CREATE EXTERNAL TABLE, we should:
1. Validate the schema against the schema inferred from the table path.
2. Honor the schema specified by the user instead of the inferred schema (full schema).

- [X] Any interfaces changed? No
- [X] Any backward compatibility impacted? No
- [X] Document update required? No
- [X] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. two test cases added
- [X] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. NA

You can merge this pull request into a Git repository by running: $ git pull https://github.com/jackylk/incubator-carbondata fix_external_table Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2639.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2639 commit 6d14a819d3e371b8a147edee9259da5ca4a60996 Author: Jacky Li Date: 2018-08-15T16:12:17Z fix external table ---
[jira] [Assigned] (CARBONDATA-2858) Schema not correct when user gives schema in CREATE EXTERNAL TABLE
[ https://issues.apache.org/jira/browse/CARBONDATA-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacky Li reassigned CARBONDATA-2858: Assignee: Jacky Li > Schema not correct when user gives schema in CREATE EXTERNAL TABLE > -- > > Key: CARBONDATA-2858 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2858 > Project: CarbonData > Issue Type: Bug >Affects Versions: 1.4.1 >Reporter: Jacky Li >Assignee: Jacky Li >Priority: Major > > A bug can be reproduced by following step: > 1. create a normal table and load data. For example > CREATE TABLE origin (key int, value string) > STORED AS carbondata > 2. create an external table on table folder of "origin", and specify a new > schema for it > CREATE EXTERNAL TABLE source (value string) > STORED AS carbondata > LOCATION 'path-to-origin' > 3. query the external table and show, it should show only value column but > currently it is showing key and value columns -- This message was sent by Atlassian JIRA (v7.6.3#76005)
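The intended behavior described in this issue (validate a user-supplied schema against the schema inferred from the table path, then expose the user's possibly narrower schema rather than the full inferred one) can be modeled in a few lines. This is an illustrative Python sketch with invented names, not CarbonSource's actual logic; schemas are modeled as simple name-to-type dicts.

```python
def resolve_external_schema(user_schema, inferred_schema):
    """Return the schema an external table should expose.

    user_schema / inferred_schema: dicts mapping column name -> type string.
    With no user schema, fall back to the full inferred schema; otherwise
    every user column must exist in the inferred schema with the same type,
    and the user's (possibly narrower) schema wins.
    """
    if not user_schema:
        return inferred_schema
    for col, typ in user_schema.items():
        if inferred_schema.get(col) != typ:
            raise ValueError("Column %s does not match the table schema" % col)
    return user_schema  # honor the user's schema, not the inferred one
```

Under this model, the reproduction in the issue (`CREATE EXTERNAL TABLE source (value string)` over a table with columns `key, value`) would expose only the `value` column.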
[GitHub] carbondata issue #2637: [HOTFIX] Correct the sentence to be meaningful
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2637 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7921/ ---
[GitHub] carbondata issue #2636: [CARBONDATA-2857] Correct the Contribution content
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2636 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7920/ ---
[GitHub] carbondata issue #2636: [CARBONDATA-2857] Correct the Contribution content
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2636 SDV Build Fail, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6273/ ---
[GitHub] carbondata pull request #2637: [HOTFIX] Correct the sentence to be meaningfu...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/2637 ---
[GitHub] carbondata issue #2637: [HOTFIX] Correct the sentence to be meaningful
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2637 add to whitelist ---
[jira] [Assigned] (CARBONDATA-2857) Improvement in "How to contribute to Apache CarbonData" page
[ https://issues.apache.org/jira/browse/CARBONDATA-2857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang Chen reassigned CARBONDATA-2857: -- Assignee: Vadiraj Muradi > Improvement in "How to contribute to Apache CarbonData" page > > > Key: CARBONDATA-2857 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2857 > Project: CarbonData > Issue Type: Improvement > Components: docs >Reporter: Vadiraj Muradi >Assignee: Vadiraj Muradi >Priority: Minor > Fix For: 1.5.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Improvement to "How to contribute to Apache CarbonData" page -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (CARBONDATA-2857) Improvement in "How to contribute to Apache CarbonData" page
[ https://issues.apache.org/jira/browse/CARBONDATA-2857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang Chen resolved CARBONDATA-2857. Resolution: Fixed Fix Version/s: 1.5.0 > Improvement in "How to contribute to Apache CarbonData" page > > > Key: CARBONDATA-2857 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2857 > Project: CarbonData > Issue Type: Improvement > Components: docs >Reporter: Vadiraj Muradi >Assignee: Vadiraj Muradi >Priority: Minor > Fix For: 1.5.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Improvement to "How to contribute to Apache CarbonData" page -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] carbondata pull request #2636: [CARBONDATA-2857] Correct the Contribution co...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/2636 ---
[GitHub] carbondata issue #2636: [CARBONDATA-2857] Correct the Contribution content
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2636 LGTM ---
[GitHub] carbondata issue #2636: [CARBONDATA-2857] Correct the Contribution content
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/2636 add to whitelist ---
[GitHub] carbondata issue #2624: [CARBONDATA-2845][BloomDataMap] Merge bloom index fi...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2624 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6643/ ---
[GitHub] carbondata issue #2624: [CARBONDATA-2845][BloomDataMap] Merge bloom index fi...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2624 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7919/ ---
[GitHub] carbondata issue #2627: [CARBONDATA-2835] [MVDataMap] Block MV datamap on st...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2627 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6642/ ---
[GitHub] carbondata issue #2627: [CARBONDATA-2835] [MVDataMap] Block MV datamap on st...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2627 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7918/ ---
[GitHub] carbondata issue #2624: [CARBONDATA-2845][BloomDataMap] Merge bloom index fi...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2624 SDV Build Fail, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6272/ ---
[GitHub] carbondata pull request #2627: [CARBONDATA-2835] [MVDataMap] Block MV datama...
Github user ndwangsen commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2627#discussion_r210241324 --- Diff: datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVHelper.scala --- @@ -80,6 +81,16 @@ object MVHelper {
     dmProperties.foreach(t => tableProperties.put(t._1, t._2))
     val selectTables = getTables(logicalPlan)
+    selectTables.map { selectTable =>
+      val mainCarbonTable = CarbonEnv.getCarbonTableOption(selectTable.identifier.database,
+        selectTable.identifier.table)(sparkSession)
+
+      if (!mainCarbonTable.isEmpty && mainCarbonTable.get.isStreamingSink) {
+        throw new MalformedCarbonCommandException(s"Streaming table does not support creating " +
+          s"MV datamap")
+      }
+      selectTable
--- End diff -- ok, I removed it ---
[GitHub] carbondata pull request #2627: [CARBONDATA-2835] [MVDataMap] Block MV datama...
Github user ndwangsen commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2627#discussion_r210241115 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/datamap/CarbonCreateDataMapCommand.scala --- @@ -73,13 +73,8 @@ case class CarbonCreateDataMapCommand(
       }
     }
-    if (mainTable != null &&
-        mainTable.isStreamingSink &&
-        !(dmProviderName.equalsIgnoreCase(DataMapClassProvider.PREAGGREGATE.toString)
-        || dmProviderName.equalsIgnoreCase(DataMapClassProvider.TIMESERIES.toString))) {
-      throw new MalformedCarbonCommandException(s"Streaming table does not support creating " +
-        s"$dmProviderName datamap")
-    }
+    // delete this code because streaming table only does not support creating MV datamap,
--- End diff -- Delete the comments here ---
[GitHub] carbondata pull request #2627: [CARBONDATA-2835] [MVDataMap] Block MV datama...
Github user ndwangsen commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2627#discussion_r210239499 --- Diff: datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVHelper.scala --- @@ -80,6 +81,16 @@ object MVHelper { dmProperties.foreach(t => tableProperties.put(t._1, t._2)) val selectTables = getTables(logicalPlan) +selectTables.map { selectTable => --- End diff -- ok ---
[GitHub] carbondata issue #2627: [CARBONDATA-2835] [MVDataMap] Block MV datamap on st...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2627 SDV Build Fail, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6271/ ---
[GitHub] carbondata pull request #2627: [CARBONDATA-2835] [MVDataMap] Block MV datama...
Github user ndwangsen commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2627#discussion_r210239381 --- Diff: datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVHelper.scala --- @@ -80,6 +81,16 @@ object MVHelper {
     dmProperties.foreach(t => tableProperties.put(t._1, t._2))
     val selectTables = getTables(logicalPlan)
+    selectTables.map { selectTable =>
+      val mainCarbonTable = CarbonEnv.getCarbonTableOption(selectTable.identifier.database,
+        selectTable.identifier.table)(sparkSession)
+
+      if (!mainCarbonTable.isEmpty && mainCarbonTable.get.isStreamingSink) {
+        throw new MalformedCarbonCommandException(s"Streaming table does not support creating " +
--- End diff -- ok, I modified it ---
[GitHub] carbondata issue #2624: [CARBONDATA-2845][BloomDataMap] Merge bloom index fi...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2624 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6641/ ---
[GitHub] carbondata issue #2624: [CARBONDATA-2845][BloomDataMap] Merge bloom index fi...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2624 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7917/ ---
[GitHub] carbondata issue #2624: [CARBONDATA-2845][BloomDataMap] Merge bloom index fi...
Github user xuchuanyin commented on the issue: https://github.com/apache/carbondata/pull/2624 My main concern is that you use thrift to write the merged bloom index file. I am not sure whether we should add a new thrift file to Carbon-Format when that file only works for Carbon-Bloom. @ravipesala @jackylk What do you think about it? ---
[GitHub] carbondata issue #2624: [CARBONDATA-2845][BloomDataMap] Merge bloom index fi...
Github user xuchuanyin commented on the issue: https://github.com/apache/carbondata/pull/2624 Please provide a brief description of the modifications you have made in this PR, such as add Listener, add RDD, use thrift, etc... ---
[GitHub] carbondata pull request #2624: [CARBONDATA-2845][BloomDataMap] Merge bloom i...
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2624#discussion_r210213202 --- Diff: datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomIndexFileStore.java --- @@ -0,0 +1,259 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.carbondata.datamap.bloom;
+
+import java.io.ByteArrayInputStream;
+import java.io.DataInputStream;
+import java.io.File;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.carbondata.common.logging.LogService;
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.datastore.filesystem.CarbonFile;
+import org.apache.carbondata.core.datastore.filesystem.CarbonFileFilter;
+import org.apache.carbondata.core.datastore.impl.FileFactory;
+import org.apache.carbondata.core.fileoperations.FileWriteOperation;
+import org.apache.carbondata.core.reader.ThriftReader;
+import org.apache.carbondata.core.util.CarbonUtil;
+import org.apache.carbondata.core.writer.ThriftWriter;
+import org.apache.carbondata.format.MergedBloomIndex;
+import org.apache.carbondata.format.MergedBloomIndexHeader;
+
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.util.bloom.CarbonBloomFilter;
+import org.apache.thrift.TBase;
+
+public class BloomIndexFileStore {
+
+  private static final LogService LOGGER =
+      LogServiceFactory.getLogService(BloomIndexFileStore.class.getName());
+
+  /* suffix of original generated file */
+  public static final String BLOOM_INDEX_SUFFIX = ".bloomindex";
+  /* suffix of merged bloom index file */
+  public static final String MERGE_BLOOM_INDEX_SUFFIX = ".bloomindexmerge";
+  /* directory to store merged bloom index files */
+  public static final String MERGE_BLOOM_INDEX_SHARD_NAME = "mergeShard";
+  /**
+   * flag file for merging
+   * if flag file exists, query won't use mergeShard
+   * if flag file not exists and mergeShard generated, query will use mergeShard
+   */
+  public static final String MERGE_INPROGRESS_FILE = "mergeShard.inprogress";
+
+  public static void mergeBloomIndexFile(String dmSegmentPathString, List indexCols) {
+    // get all shard paths of old store
+    CarbonFile segmentPath = FileFactory.getCarbonFile(dmSegmentPathString,
+        FileFactory.getFileType(dmSegmentPathString));
+    CarbonFile[] shardPaths = segmentPath.listFiles(new CarbonFileFilter() {
+      @Override
+      public boolean accept(CarbonFile file) {
+        return file.isDirectory() && !file.getName().equals(MERGE_BLOOM_INDEX_SHARD_NAME);
+      }
+    });
+
+    String mergeShardPath = dmSegmentPathString + File.separator + MERGE_BLOOM_INDEX_SHARD_NAME;
+    String mergeInprogressFile = dmSegmentPathString + File.separator + MERGE_INPROGRESS_FILE;
+    try {
+      // delete mergeShard folder if exists
+      if (FileFactory.isFileExist(mergeShardPath)) {
+        FileFactory.deleteFile(mergeShardPath, FileFactory.getFileType(mergeShardPath));
+      }
+      // create flag file before creating mergeShard folder
+      if (!FileFactory.isFileExist(mergeInprogressFile)) {
+        FileFactory.createNewFile(
+            mergeInprogressFile, FileFactory.getFileType(mergeInprogressFile));
+      }
+      // prepare mergeShard output folder
+      if (!FileFactory.mkdirs(mergeShardPath, FileFactory.getFileType(mergeShardPath))) {
+        throw new RuntimeException("Failed to create directory " + mergeShardPath);
+      }
+    } catch (IOException e) {
+      LOGGER.error(e, "Error occurs while create directory " + mergeShardPath);
+      throw new RuntimeException("Error occurs while create directory " + mergeShardPath);
+    }
+
+    // for each index column, merge the bloomindex files from all
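The flag-file protocol in the diff above (queries ignore `mergeShard` while `mergeShard.inprogress` exists, and only switch to the merged index once the flag is removed) can be sketched independently of CarbonData's file APIs. The following is a minimal Python model of that protocol with invented function names, not the actual implementation:

```python
import os
import shutil

MERGE_SHARD = "mergeShard"
MERGE_FLAG = "mergeShard.inprogress"

def merge_shards(segment_dir, do_merge):
    """Model of the merge protocol: create the flag file first, then build the
    merged output, then remove the flag so queries switch over to mergeShard."""
    merge_dir = os.path.join(segment_dir, MERGE_SHARD)
    flag = os.path.join(segment_dir, MERGE_FLAG)
    if os.path.isdir(merge_dir):   # stale output from an earlier failed merge
        shutil.rmtree(merge_dir)
    open(flag, "w").close()        # flag exists: queries keep using per-shard files
    os.makedirs(merge_dir)
    do_merge(merge_dir)            # write the merged index files
    os.remove(flag)                # merge complete: queries may now use mergeShard

def query_uses_merge_shard(segment_dir):
    """A query uses mergeShard only if it exists and no in-progress flag remains."""
    return (os.path.isdir(os.path.join(segment_dir, MERGE_SHARD))
            and not os.path.exists(os.path.join(segment_dir, MERGE_FLAG)))
```

The ordering matters: because the flag is created before the merged folder and removed only after the merge finishes, a reader never observes a half-written `mergeShard` as usable.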
[GitHub] carbondata pull request #2624: [CARBONDATA-2845][BloomDataMap] Merge bloom i...
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2624#discussion_r210212039 --- Diff: datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomCoarseGrainDataMapFactory.java --- @@ -222,102 +220,95 @@ public DataMapBuilder createBuilder(Segment segment, String shardName, @Override public List getDataMaps(Segment segment) throws IOException { -List dataMaps = new ArrayList(1); +List dataMaps = new ArrayList<>(); try { Set shardPaths = segmentMap.get(segment.getSegmentNo()); if (shardPaths == null) { -String dataMapStorePath = DataMapWriter.getDefaultDataMapPath( -getCarbonTable().getTablePath(), segment.getSegmentNo(), dataMapName); -CarbonFile[] carbonFiles = FileFactory.getCarbonFile(dataMapStorePath).listFiles(); -shardPaths = new HashSet<>(); -for (CarbonFile carbonFile : carbonFiles) { - shardPaths.add(carbonFile.getAbsolutePath()); -} +shardPaths = getAllShardPaths(getCarbonTable().getTablePath(), segment.getSegmentNo()); segmentMap.put(segment.getSegmentNo(), shardPaths); } + Set filteredShards = segment.getFilteredIndexShardNames(); for (String shard : shardPaths) { -BloomCoarseGrainDataMap bloomDM = new BloomCoarseGrainDataMap(); -bloomDM.init(new BloomDataMapModel(shard, cache)); -bloomDM.initIndexColumnConverters(getCarbonTable(), dataMapMeta.getIndexedColumns()); -dataMaps.add(bloomDM); +if (shard.endsWith(BloomIndexFileStore.MERGE_BLOOM_INDEX_SHARD_NAME) || +filteredShards.contains(new File(shard).getName())) { + // Filter out the tasks which are filtered through Main datamap. 
+ // for merge shard, shard pruning delay to be done before pruning blocklet + BloomCoarseGrainDataMap bloomDM = new BloomCoarseGrainDataMap(); + bloomDM.init(new BloomDataMapModel(shard, cache)); + bloomDM.initIndexColumnConverters(getCarbonTable(), dataMapMeta.getIndexedColumns()); + bloomDM.setFilteredShard(filteredShards); + dataMaps.add(bloomDM); +} } } catch (Exception e) { throw new IOException("Error occurs while init Bloom DataMap", e); } return dataMaps; } - @Override - public List getDataMaps(DataMapDistributable distributable) - throws IOException { -List coarseGrainDataMaps = new ArrayList<>(); -BloomCoarseGrainDataMap bloomCoarseGrainDataMap = new BloomCoarseGrainDataMap(); -String indexPath = ((BloomDataMapDistributable) distributable).getIndexPath(); -bloomCoarseGrainDataMap.init(new BloomDataMapModel(indexPath, cache)); -bloomCoarseGrainDataMap.initIndexColumnConverters(getCarbonTable(), -dataMapMeta.getIndexedColumns()); -coarseGrainDataMaps.add(bloomCoarseGrainDataMap); -return coarseGrainDataMaps; - } - /** - * returns all the directories of lucene index files for query - * Note: copied from luceneDataMapFactory, will extract to a common interface + * returns all shard directories of bloom index files for query + * if bloom index files are merged we should get only one shard path */ - private CarbonFile[] getAllIndexDirs(String tablePath, String segmentId) { -List indexDirs = new ArrayList<>(); -List dataMaps; -try { - // there can be multiple bloom datamaps present on a table, so get all datamaps and form - // the path till the index file directories in all datamaps folders present in each segment - dataMaps = DataMapStoreManager.getInstance().getAllDataMap(getCarbonTable()); -} catch (IOException ex) { - LOGGER.error(ex, String.format("failed to get datamaps for tablePath %s, segmentId %s", - tablePath, segmentId)); - throw new RuntimeException(ex); -} -if (dataMaps.size() > 0) { - for (TableDataMap dataMap : dataMaps) { -if 
(dataMap.getDataMapSchema().getDataMapName().equals(this.dataMapName)) { - List indexFiles; - String dmPath = CarbonTablePath.getDataMapStorePath(tablePath, segmentId, - dataMap.getDataMapSchema().getDataMapName()); - FileFactory.FileType fileType = FileFactory.getFileType(dmPath); - final CarbonFile dirPath = FileFactory.getCarbonFile(dmPath, fileType); - indexFiles = Arrays.asList(dirPath.listFiles(new CarbonFileFilter() { -@Override -public boolean accept(CarbonFile file) { - return file.isDirectory(); -} -
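The shard-selection logic in the diff above keeps a shard path only when it is the merged shard (pruned later, per blocklet) or when the main datamap has already selected that shard name. A minimal self-contained sketch of that check (`MERGE_SHARD` and `selectShards` are illustrative names, not CarbonData API):

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class ShardFilterSketch {
    // Mirrors BloomIndexFileStore.MERGE_BLOOM_INDEX_SHARD_NAME in spirit
    static final String MERGE_SHARD = "mergeShard";

    // Keep a shard if it is the merged shard (its pruning is deferred to
    // blocklet pruning) or if the main datamap already selected its name.
    static List<String> selectShards(Set<String> shardPaths, Set<String> filteredShards) {
        List<String> selected = new ArrayList<>();
        for (String shard : shardPaths) {
            if (shard.endsWith(MERGE_SHARD)
                || filteredShards.contains(new File(shard).getName())) {
                selected.add(shard);
            }
        }
        return selected;
    }
}
```

With shards `task_1`, `task_2`, and `mergeShard`, and `task_1` selected by the main datamap, only `task_2` is dropped.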
[GitHub] carbondata pull request #2624: [CARBONDATA-2845][BloomDataMap] Merge bloom i...
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2624#discussion_r210214357 --- Diff: integration/spark2/src/main/scala/org/apache/carbondata/datamap/CarbonMergeBloomIndexFilesRDD.scala --- @@ -0,0 +1,91 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.datamap + +import scala.collection.JavaConverters._ + +import org.apache.spark.Partition +import org.apache.spark.rdd.CarbonMergeFilePartition +import org.apache.spark.SparkContext +import org.apache.spark.TaskContext + +import org.apache.carbondata.core.metadata.schema.table.CarbonTable +import org.apache.carbondata.core.util.path.CarbonTablePath +import org.apache.carbondata.datamap.bloom.BloomIndexFileStore +import org.apache.carbondata.spark.rdd.CarbonRDD + + +/** + * RDD to merge all bloomindex files of each segment for bloom datamap. 
+ * + * @param sc + * @param carbonTable + * @param segmentIds segments to be merged + * @param bloomDatamapNames list of bloom datamap + * @param bloomIndexColumns list of index columns correspond to datamap + */ +class CarbonMergeBloomIndexFilesRDD( + sc: SparkContext, + carbonTable: CarbonTable, + segmentIds: Seq[String], + bloomDatamapNames: Seq[String], + bloomIndexColumns: Seq[Seq[String]]) + extends CarbonRDD[String](sc, Nil, sc.hadoopConfiguration) { + + override def getPartitions: Array[Partition] = { +segmentIds.zipWithIndex.map {s => + CarbonMergeFilePartition(id, s._2, s._1) +}.toArray + } + + override def internalCompute(theSplit: Partition, context: TaskContext): Iterator[String] = { +val tablePath = carbonTable.getTablePath +val split = theSplit.asInstanceOf[CarbonMergeFilePartition] +logInfo("Merging bloom index files of segment : " + split.segmentId) --- End diff -- s"Merging bloom index files of segment ${SEG_ID} for ${TABLE} ---
[GitHub] carbondata pull request #2624: [CARBONDATA-2845][BloomDataMap] Merge bloom i...
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2624#discussion_r210213840 --- Diff: datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomIndexFileStore.java --- @@ -0,0 +1,259 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.carbondata.datamap.bloom; + +import java.io.ByteArrayInputStream; +import java.io.DataInputStream; +import java.io.File; +import java.io.IOException; +import java.nio.ByteBuffer; +import java.util.ArrayList; +import java.util.List; + +import org.apache.carbondata.common.logging.LogService; +import org.apache.carbondata.common.logging.LogServiceFactory; +import org.apache.carbondata.core.datastore.filesystem.CarbonFile; +import org.apache.carbondata.core.datastore.filesystem.CarbonFileFilter; +import org.apache.carbondata.core.datastore.impl.FileFactory; +import org.apache.carbondata.core.fileoperations.FileWriteOperation; +import org.apache.carbondata.core.reader.ThriftReader; +import org.apache.carbondata.core.util.CarbonUtil; +import org.apache.carbondata.core.writer.ThriftWriter; +import org.apache.carbondata.format.MergedBloomIndex; +import org.apache.carbondata.format.MergedBloomIndexHeader; + +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.util.bloom.CarbonBloomFilter; +import org.apache.thrift.TBase; + +public class BloomIndexFileStore { + + private static final LogService LOGGER = + LogServiceFactory.getLogService(BloomIndexFileStore.class.getName()); + + /*suffix of original generated file*/ + public static final String BLOOM_INDEX_SUFFIX = ".bloomindex"; + /*suffix of merged bloom index file*/ + public static final String MERGE_BLOOM_INDEX_SUFFIX = ".bloomindexmerge"; + /* directory to store merged bloom index files */ + public static final String MERGE_BLOOM_INDEX_SHARD_NAME = "mergeShard"; + /** + * flag file for merging + * if flag file exists, query won't use mergeShard + * if flag file not exists and mergeShard generated, query will use mergeShard + */ + public static final String MERGE_INPROGRESS_FILE = "mergeShard.inprogress"; + + + public static void mergeBloomIndexFile(String dmSegmentPathString, List indexCols) { +// get all shard paths of old store +CarbonFile segmentPath = 
FileFactory.getCarbonFile(dmSegmentPathString, +FileFactory.getFileType(dmSegmentPathString)); +CarbonFile[] shardPaths = segmentPath.listFiles(new CarbonFileFilter() { + @Override + public boolean accept(CarbonFile file) { +return file.isDirectory() && !file.getName().equals(MERGE_BLOOM_INDEX_SHARD_NAME); + } +}); + +String mergeShardPath = dmSegmentPathString + File.separator + MERGE_BLOOM_INDEX_SHARD_NAME; +String mergeInprofressFile = dmSegmentPathString + File.separator + MERGE_INPROGRESS_FILE; +try { + // delete mergeShard folder if exists + if (FileFactory.isFileExist(mergeShardPath)) { +FileFactory.deleteFile(mergeShardPath, FileFactory.getFileType(mergeShardPath)); + } + // create flag file before creating mergeShard folder + if (!FileFactory.isFileExist(mergeInprofressFile)) { +FileFactory.createNewFile( +mergeInprofressFile, FileFactory.getFileType(mergeInprofressFile)); + } + // prepare mergeShard output folder + if (!FileFactory.mkdirs(mergeShardPath, FileFactory.getFileType(mergeShardPath))) { +throw new RuntimeException("Failed to create directory " + mergeShardPath); + } +} catch (IOException e) { + LOGGER.error(e, "Error occurs while create directory " + mergeShardPath); + throw new RuntimeException("Error occurs while create directory " + mergeShardPath); +} + +// for each index column, merge the bloomindex files from all
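The flag-file protocol described in the class above (queries ignore `mergeShard` while `mergeShard.inprogress` exists; once the flag is gone and `mergeShard` exists, queries may use it) could be sketched as follows. This is a simplified sketch using plain `java.nio.file` instead of `FileFactory`, and `canUseMergeShard`/`mergeWithFlag` are hypothetical helper names, not CarbonData API:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class MergeFlagSketch {
    static final String MERGE_SHARD = "mergeShard";
    static final String MERGE_INPROGRESS = "mergeShard.inprogress";

    // Query side: the merged shard is usable only when the folder exists
    // and no in-progress flag is present.
    static boolean canUseMergeShard(Path segmentDir) {
        return Files.isDirectory(segmentDir.resolve(MERGE_SHARD))
            && !Files.exists(segmentDir.resolve(MERGE_INPROGRESS));
    }

    // Writer side: raise the flag first, then build the merged shard,
    // then drop the flag so queries switch over atomically enough.
    // (The real code also deletes a stale mergeShard folder first.)
    static void mergeWithFlag(Path segmentDir, Runnable doMerge) throws IOException {
        Path flag = segmentDir.resolve(MERGE_INPROGRESS);
        Files.createFile(flag);                              // queries now ignore mergeShard
        Files.createDirectories(segmentDir.resolve(MERGE_SHARD));
        doMerge.run();                                       // write merged bloom index files
        Files.delete(flag);                                  // mergeShard becomes visible
    }
}
```

The ordering matters: the flag must exist before the folder is created, otherwise a concurrent query could read a half-written merged shard.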
[GitHub] carbondata pull request #2624: [CARBONDATA-2845][BloomDataMap] Merge bloom i...
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2624#discussion_r210212518 --- Diff: datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomIndexFileStore.java --- @@ -0,0 +1,259 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */
+package org.apache.carbondata.datamap.bloom;
+
+import java.io.ByteArrayInputStream;
+import java.io.DataInputStream;
+import java.io.File;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.carbondata.common.logging.LogService;
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.datastore.filesystem.CarbonFile;
+import org.apache.carbondata.core.datastore.filesystem.CarbonFileFilter;
+import org.apache.carbondata.core.datastore.impl.FileFactory;
+import org.apache.carbondata.core.fileoperations.FileWriteOperation;
+import org.apache.carbondata.core.reader.ThriftReader;
+import org.apache.carbondata.core.util.CarbonUtil;
+import org.apache.carbondata.core.writer.ThriftWriter;
+import org.apache.carbondata.format.MergedBloomIndex;
+import org.apache.carbondata.format.MergedBloomIndexHeader;
+
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.util.bloom.CarbonBloomFilter;
+import org.apache.thrift.TBase;
+
+public class BloomIndexFileStore { --- End diff -- Add the `InterfaceAudience.Internal` annotation to this class. Add a comment for this class too. ---
[GitHub] carbondata pull request #2624: [CARBONDATA-2845][BloomDataMap] Merge bloom i...
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2624#discussion_r210212700 --- Diff: datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomIndexFileStore.java --- @@ -0,0 +1,259 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */
+package org.apache.carbondata.datamap.bloom;
+
+import java.io.ByteArrayInputStream;
+import java.io.DataInputStream;
+import java.io.File;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.carbondata.common.logging.LogService;
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.datastore.filesystem.CarbonFile;
+import org.apache.carbondata.core.datastore.filesystem.CarbonFileFilter;
+import org.apache.carbondata.core.datastore.impl.FileFactory;
+import org.apache.carbondata.core.fileoperations.FileWriteOperation;
+import org.apache.carbondata.core.reader.ThriftReader;
+import org.apache.carbondata.core.util.CarbonUtil;
+import org.apache.carbondata.core.writer.ThriftWriter;
+import org.apache.carbondata.format.MergedBloomIndex;
+import org.apache.carbondata.format.MergedBloomIndexHeader;
+
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.util.bloom.CarbonBloomFilter;
+import org.apache.thrift.TBase;
+
+public class BloomIndexFileStore {
+
+  private static final LogService LOGGER =
+      LogServiceFactory.getLogService(BloomIndexFileStore.class.getName());
+
+  /*suffix of original generated file*/ --- End diff -- use `//` for one line comment ---
[GitHub] carbondata pull request #2624: [CARBONDATA-2845][BloomDataMap] Merge bloom i...
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2624#discussion_r210214549 --- Diff: integration/spark2/src/main/scala/org/apache/carbondata/datamap/CarbonMergeBloomIndexFilesRDD.scala --- @@ -0,0 +1,91 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.datamap + +import scala.collection.JavaConverters._ + +import org.apache.spark.Partition +import org.apache.spark.rdd.CarbonMergeFilePartition +import org.apache.spark.SparkContext +import org.apache.spark.TaskContext + +import org.apache.carbondata.core.metadata.schema.table.CarbonTable +import org.apache.carbondata.core.util.path.CarbonTablePath +import org.apache.carbondata.datamap.bloom.BloomIndexFileStore +import org.apache.carbondata.spark.rdd.CarbonRDD + + +/** + * RDD to merge all bloomindex files of each segment for bloom datamap. --- End diff -- `of each segment` => `of specified segment` ---
[GitHub] carbondata pull request #2624: [CARBONDATA-2845][BloomDataMap] Merge bloom i...
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2624#discussion_r210211799 --- Diff: datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomCoarseGrainDataMapFactory.java --- @@ -222,102 +220,95 @@ public DataMapBuilder createBuilder(Segment segment, String shardName, @Override public List getDataMaps(Segment segment) throws IOException { -List dataMaps = new ArrayList(1); +List dataMaps = new ArrayList<>(); try { Set shardPaths = segmentMap.get(segment.getSegmentNo()); if (shardPaths == null) { -String dataMapStorePath = DataMapWriter.getDefaultDataMapPath( -getCarbonTable().getTablePath(), segment.getSegmentNo(), dataMapName); -CarbonFile[] carbonFiles = FileFactory.getCarbonFile(dataMapStorePath).listFiles(); -shardPaths = new HashSet<>(); -for (CarbonFile carbonFile : carbonFiles) { - shardPaths.add(carbonFile.getAbsolutePath()); -} +shardPaths = getAllShardPaths(getCarbonTable().getTablePath(), segment.getSegmentNo()); segmentMap.put(segment.getSegmentNo(), shardPaths); } + Set filteredShards = segment.getFilteredIndexShardNames(); for (String shard : shardPaths) { -BloomCoarseGrainDataMap bloomDM = new BloomCoarseGrainDataMap(); -bloomDM.init(new BloomDataMapModel(shard, cache)); -bloomDM.initIndexColumnConverters(getCarbonTable(), dataMapMeta.getIndexedColumns()); -dataMaps.add(bloomDM); +if (shard.endsWith(BloomIndexFileStore.MERGE_BLOOM_INDEX_SHARD_NAME) || +filteredShards.contains(new File(shard).getName())) { + // Filter out the tasks which are filtered through Main datamap. 
+ // for merge shard, shard pruning delay to be done before pruning blocklet + BloomCoarseGrainDataMap bloomDM = new BloomCoarseGrainDataMap(); + bloomDM.init(new BloomDataMapModel(shard, cache)); + bloomDM.initIndexColumnConverters(getCarbonTable(), dataMapMeta.getIndexedColumns()); + bloomDM.setFilteredShard(filteredShards); + dataMaps.add(bloomDM); +} } } catch (Exception e) { throw new IOException("Error occurs while init Bloom DataMap", e); } return dataMaps; } - @Override - public List getDataMaps(DataMapDistributable distributable) - throws IOException { -List coarseGrainDataMaps = new ArrayList<>(); -BloomCoarseGrainDataMap bloomCoarseGrainDataMap = new BloomCoarseGrainDataMap(); -String indexPath = ((BloomDataMapDistributable) distributable).getIndexPath(); -bloomCoarseGrainDataMap.init(new BloomDataMapModel(indexPath, cache)); -bloomCoarseGrainDataMap.initIndexColumnConverters(getCarbonTable(), -dataMapMeta.getIndexedColumns()); -coarseGrainDataMaps.add(bloomCoarseGrainDataMap); -return coarseGrainDataMaps; - } - /** - * returns all the directories of lucene index files for query - * Note: copied from luceneDataMapFactory, will extract to a common interface + * returns all shard directories of bloom index files for query + * if bloom index files are merged we should get only one shard path */ - private CarbonFile[] getAllIndexDirs(String tablePath, String segmentId) { -List indexDirs = new ArrayList<>(); -List dataMaps; -try { - // there can be multiple bloom datamaps present on a table, so get all datamaps and form - // the path till the index file directories in all datamaps folders present in each segment - dataMaps = DataMapStoreManager.getInstance().getAllDataMap(getCarbonTable()); -} catch (IOException ex) { - LOGGER.error(ex, String.format("failed to get datamaps for tablePath %s, segmentId %s", - tablePath, segmentId)); - throw new RuntimeException(ex); -} -if (dataMaps.size() > 0) { - for (TableDataMap dataMap : dataMaps) { -if 
(dataMap.getDataMapSchema().getDataMapName().equals(this.dataMapName)) { - List indexFiles; - String dmPath = CarbonTablePath.getDataMapStorePath(tablePath, segmentId, - dataMap.getDataMapSchema().getDataMapName()); - FileFactory.FileType fileType = FileFactory.getFileType(dmPath); - final CarbonFile dirPath = FileFactory.getCarbonFile(dmPath, fileType); - indexFiles = Arrays.asList(dirPath.listFiles(new CarbonFileFilter() { -@Override -public boolean accept(CarbonFile file) { - return file.isDirectory(); -} -
[GitHub] carbondata issue #2624: [CARBONDATA-2845][BloomDataMap] Merge bloom index fi...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2624 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6270/ ---
[GitHub] carbondata pull request #2627: [CARBONDATA-2835] [MVDataMap] Block MV datama...
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2627#discussion_r210202221 --- Diff: datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVHelper.scala --- @@ -80,6 +81,16 @@ object MVHelper { dmProperties.foreach(t => tableProperties.put(t._1, t._2)) val selectTables = getTables(logicalPlan) +selectTables.map { selectTable => + val mainCarbonTable = CarbonEnv.getCarbonTableOption(selectTable.identifier.database, +selectTable.identifier.table)(sparkSession) + + if (!mainCarbonTable.isEmpty && mainCarbonTable.get.isStreamingSink ) { +throw new MalformedCarbonCommandException(s"Streaming table does not support creating " + --- End diff -- Weird indentation here. You can start the statement on a new line so the whole message stays on one line, which reads better. ---
[GitHub] carbondata pull request #2627: [CARBONDATA-2835] [MVDataMap] Block MV datama...
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2627#discussion_r210204028 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/CarbonEnv.scala --- @@ -237,6 +237,21 @@ object CarbonEnv { getCarbonTable(tableIdentifier.database, tableIdentifier.table)(sparkSession) } + /** + * whether the classification identifier is a carbon table --- End diff -- The current wording is misleading: it suggests this method returns a boolean, which it does not. It can be improved to ``` This method returns the corresponding CarbonTable; it returns None if it's not a CarbonTable ``` ---
[GitHub] carbondata pull request #2627: [CARBONDATA-2835] [MVDataMap] Block MV datama...
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2627#discussion_r210202460 --- Diff: datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVHelper.scala --- @@ -80,6 +81,16 @@ object MVHelper { dmProperties.foreach(t => tableProperties.put(t._1, t._2)) val selectTables = getTables(logicalPlan) +selectTables.map { selectTable => --- End diff -- Since the result isn't used, you can use `foreach` instead of `map`. ---
[GitHub] carbondata pull request #2627: [CARBONDATA-2835] [MVDataMap] Block MV datama...
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2627#discussion_r210203693 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/strategy/DDLStrategy.scala --- @@ -277,6 +278,10 @@ class DDLStrategy(sparkSession: SparkSession) extends SparkStrategy { throw new MalformedCarbonCommandException( "Streaming property value is incorrect") } +if (CarbonTable.hasMVDataMap(carbonTable)) { + throw new MalformedCarbonCommandException( +"The table which has MV datamap, does not support set streaming property") --- End diff -- Please optimize the error message to ``` Cannot set streaming property on table which has MV datamap ``` OR just remove the comma in your original error message. ---
[GitHub] carbondata pull request #2627: [CARBONDATA-2835] [MVDataMap] Block MV datama...
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2627#discussion_r210206238 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/datamap/CarbonCreateDataMapCommand.scala --- @@ -73,13 +73,8 @@ case class CarbonCreateDataMapCommand( } } -if (mainTable != null && -mainTable.isStreamingSink && - !(dmProviderName.equalsIgnoreCase(DataMapClassProvider.PREAGGREGATE.toString) - || dmProviderName.equalsIgnoreCase(DataMapClassProvider.TIMESERIES.toString))) { - throw new MalformedCarbonCommandException(s"Streaming table does not support creating " + -s"$dmProviderName datamap") -} +// delete this code because streaming table only does not support creating MV datamap, --- End diff -- These two statements can be optimized to ``` // CarbonData supports index/preagg datamaps on streaming tables but does not support MV on streaming tables. // It will be blocked when we parse the original table of the MV query statement; refer to CARBONDATA-2835 ``` ---
[GitHub] carbondata issue #2638: [CARBONDATA-2859][SDV] Add sdv test cases for bloomf...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2638 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6640/ ---
[GitHub] carbondata issue #2638: [CARBONDATA-2859][SDV] Add sdv test cases for bloomf...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2638 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7916/ ---
[GitHub] carbondata issue #2638: [CARBONDATA-2859][SDV] Add sdv test cases for bloomf...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2638 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6269/ ---
[GitHub] carbondata issue #2637: [HOTFIX] Correct the sentence to be meaningful
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2637 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6639/ ---
[jira] [Updated] (CARBONDATA-2859) add sdv test case for bloomfilter datamap
[ https://issues.apache.org/jira/browse/CARBONDATA-2859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xuchuanyin updated CARBONDATA-2859: --- Issue Type: Sub-task (was: Bug) Parent: CARBONDATA-2632 > add sdv test case for bloomfilter datamap > - > > Key: CARBONDATA-2859 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2859 > Project: CarbonData > Issue Type: Sub-task >Reporter: xuchuanyin >Assignee: xuchuanyin >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > add sdv test case for bloomfilter datamap -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] carbondata pull request #2638: [CARBONDATA-2859][SDV] Add sdv test cases for...
GitHub user xuchuanyin opened a pull request: https://github.com/apache/carbondata/pull/2638 [CARBONDATA-2859][SDV] Add sdv test cases for bloomfilter datamap add sdv test cases for bloomfilter datamap Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [x] Any interfaces changed? `NO` - [x] Any backward compatibility impacted? `NO` - [x] Document update required? `NO` - [x] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? `Add SDV tests` - How it is tested? Please attach test report. `NA` - Is it a performance related change? Please attach the performance test report. `NA` - Any additional information to help reviewers in testing this change. `NA` - [x] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. `NA` You can merge this pull request into a Git repository by running: $ git pull https://github.com/xuchuanyin/carbondata 0814_bloom_sdv Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2638.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2638 commit 111cb618c51f7b589d5b08c908a49749a200ef97 Author: xuchuanyin Date: 2018-08-15T07:01:21Z Add sdv test cases for bloomfilter datamap add sdv test cases for bloomfilter datamap ---
[GitHub] carbondata issue #2637: [HOTFIX] Correct the sentence to be meaningful
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2637 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7915/ ---
[GitHub] carbondata issue #2635: [CARBONDATA-2856][BloomDataMap] Fix bug in bloom ind...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2635 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6638/ ---
[jira] [Created] (CARBONDATA-2859) add sdv test case for bloomfilter datamap
xuchuanyin created CARBONDATA-2859: -- Summary: add sdv test case for bloomfilter datamap Key: CARBONDATA-2859 URL: https://issues.apache.org/jira/browse/CARBONDATA-2859 Project: CarbonData Issue Type: Bug Reporter: xuchuanyin Assignee: xuchuanyin add sdv test case for bloomfilter datamap
[GitHub] carbondata issue #2635: [CARBONDATA-2856][BloomDataMap] Fix bug in bloom ind...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2635 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7914/ ---
[GitHub] carbondata issue #2637: [HOTFIX] Correct the sentence to be meaningful
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2637 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6268/ ---
[GitHub] carbondata issue #2637: [HOTFIX] Correct the sentence to be meaningful
Github user sraghunandan commented on the issue: https://github.com/apache/carbondata/pull/2637 LGTM ---
[GitHub] carbondata pull request #2637: [HOTFIX]Correct the sentence to be meaningful...
GitHub user brijoobopanna opened a pull request: https://github.com/apache/carbondata/pull/2637 [HOTFIX]Correct the sentence to be meaningfull Modify the wording to make it meaningful with the context Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [x ] Any interfaces changed? No - [x] Any backward compatibility impacted? No - [x] Document update required? No You can merge this pull request into a Git repository by running: $ git pull https://github.com/brijoobopanna/carbondata patch-1 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2637.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2637 commit 795342aa98f07aa0a0ef95ff392b067c254be005 Author: Brijoo Bopanna Date: 2018-08-15T06:14:23Z [HOTFIX]Correct the sentence to be meaningfull Modify the wording to make it meaningful with the context ---
[GitHub] carbondata issue #2636: [CARBONDATA-2857] Correct the Contribution content
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2636 Can one of the admins verify this patch? ---
[GitHub] carbondata issue #2636: [CARBONDATA-2857] Correct the Contribution content
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2636 Can one of the admins verify this patch? ---
[GitHub] carbondata issue #2636: [CARBONDATA-2857] Correct the Contribution content
Github user sraghunandan commented on the issue: https://github.com/apache/carbondata/pull/2636 LGTM ---
[GitHub] carbondata issue #2636: [CARBONDATA-2857] Correct the Contribution content
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2636 Can one of the admins verify this patch? ---
[GitHub] carbondata pull request #2636: [CARBONDATA-2857] Correct the Contribution co...
GitHub user vrajmuradi opened a pull request: https://github.com/apache/carbondata/pull/2636 [CARBONDATA-2857] Correct the Contribution content changed 1. Updated Design section 2. Updated Fork section Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [X] Any interfaces changed? NA - [X] Any backward compatibility impacted? NA - [X] Document update required? YES You can merge this pull request into a Git repository by running: $ git pull https://github.com/vrajmuradi/carbondata patch-1 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2636.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2636 ---
[GitHub] carbondata issue #2635: [CARBONDATA-2856][BloomDataMap] Fix bug in bloom ind...
Github user xuchuanyin commented on the issue: https://github.com/apache/carbondata/pull/2635 retest this please ---