[GitHub] carbondata pull request #2623: [CARBONDATA-2844] Pass SK/AK to executor by s...
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2623#discussion_r212869553 --- Diff: integration/presto/src/main/java/org/apache/carbondata/presto/CarbondataPageSourceProvider.java --- @@ -79,6 +80,7 @@ @Override public ConnectorPageSource createPageSource(ConnectorTransactionHandle transactionHandle, ConnectorSession session, ConnectorSplit split, List columns) { +ThreadLocalSessionInfo.getOrCreateCarbonSessionInfo(); --- End diff -- removed ---
[GitHub] carbondata issue #2623: [CARBONDATA-2844] Pass SK/AK to executor by serializ...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2623 @ravipesala fixed the comments ---
[GitHub] carbondata issue #2623: [CARBONDATA-2844] add sk ak to file factory on creat...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2623 @ravipesala Please review. ---
[GitHub] carbondata pull request #2655: [WIP][TEST] sk ak test
Github user kunal642 closed the pull request at: https://github.com/apache/carbondata/pull/2655 ---
[GitHub] carbondata pull request #2655: [WIP][TEST] sk ak test
GitHub user kunal642 opened a pull request: https://github.com/apache/carbondata/pull/2655 [WIP][TEST] sk ak test Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? - [ ] Any backward compatibility impacted? - [ ] Document update required? - [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/kunal642/carbondata sk_ak_test Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2655.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2655 commit 1f95301cc74e042bdf71564d687b604646c16ce3 Author: kunal642 Date: 2018-08-08T16:20:44Z add sk ak to ThreadLocal in driver and executor. commit 629fac6bc7f9e81ab91f8b300416ffeae3a6faa9 Author: kunal642 Date: 2018-08-23T18:54:05Z test PR ---
[GitHub] carbondata issue #2643: [Documentation] Formatting fix s3
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2643 LGTM ---
[GitHub] carbondata pull request #2381: [WIP] fixed SparkCarbonFileFormat for flat fo...
Github user kunal642 closed the pull request at: https://github.com/apache/carbondata/pull/2381 ---
[GitHub] carbondata pull request #2484: [WIP] added hadoop conf to thread local
Github user kunal642 closed the pull request at: https://github.com/apache/carbondata/pull/2484 ---
[GitHub] carbondata pull request #2623: [CARBONDATA-2844] add sk ak to file factory o...
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2623#discussion_r208935162 --- Diff: core/src/main/java/org/apache/carbondata/core/util/CarbonConfiguration.java --- @@ -0,0 +1,55 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.carbondata.core.util; + +import java.io.Serializable; +import java.util.HashMap; +import java.util.Map; + +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.Path; + +public class CarbonConfiguration implements Serializable { + + private static final long serialVersionUID = 3811544264223154007L; + private transient Configuration configuration; + private HashMap<String, String> extraConf; + + public CarbonConfiguration(Configuration configuration) { +this.configuration = configuration; +HashMap<String, String> s3Conf = new HashMap<>(); +s3Conf.put("fs.s3a.access.key", configuration.get("fs.s3a.access.key")); +s3Conf.put("fs.s3a.secret.key", configuration.get("fs.s3a.secret.key")); --- End diff -- done ---
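The pattern in the diff above is worth spelling out: Hadoop's `Configuration` is not `Serializable`, so it is held in a `transient` field and only the needed S3 keys are copied into a serializable map that survives the trip to executors. A minimal sketch of that idea, with a plain `Map` standing in for Hadoop's `Configuration` so the example is self-contained; the class and method names here are illustrative, not CarbonData's actual API:

```java
import java.io.*;
import java.util.HashMap;
import java.util.Map;

// Illustrative stand-in for the CarbonConfiguration idea: the live
// configuration is transient (Hadoop's Configuration is not Serializable),
// so the S3 credentials are copied into a HashMap that Java serialization
// can carry from the driver to the executors.
public class SerializableConf implements Serializable {
    private static final long serialVersionUID = 1L;

    // In the real class this field is org.apache.hadoop.conf.Configuration.
    private transient Map<String, String> liveConf;
    private final HashMap<String, String> extraConf = new HashMap<>();

    public SerializableConf(Map<String, String> conf) {
        this.liveConf = conf;
        // Keep only the keys the executor needs.
        extraConf.put("fs.s3a.access.key", conf.get("fs.s3a.access.key"));
        extraConf.put("fs.s3a.secret.key", conf.get("fs.s3a.secret.key"));
    }

    // On the executor the transient field is null after deserialization,
    // so rebuild it from the serialized key/value copies.
    public Map<String, String> getConf() {
        if (liveConf == null) {
            liveConf = new HashMap<>(extraConf);
        }
        return liveConf;
    }

    static SerializableConf roundTrip(SerializableConf in) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(bos);
        oos.writeObject(in);
        oos.flush();
        ObjectInputStream ois =
            new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()));
        return (SerializableConf) ois.readObject();
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> driverConf = new HashMap<>();
        driverConf.put("fs.s3a.access.key", "AK");
        driverConf.put("fs.s3a.secret.key", "SK");
        // Simulate shipping the object to an executor and reading a key there.
        SerializableConf onExecutor = roundTrip(new SerializableConf(driverConf));
        System.out.println(onExecutor.getConf().get("fs.s3a.access.key"));
    }
}
```

The round trip shows why the extra map is needed: after deserialization the transient field is gone, and only the explicitly copied credentials remain available.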
[GitHub] carbondata issue #2623: [CARBONDATA-2844] add sk ak to file factory on creat...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2623 retest this please ---
[GitHub] carbondata issue #2623: [HOTFIX] add sk ak to file factory on creation of ca...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2623 Retest this please ---
[GitHub] carbondata pull request #2623: [HOTFIX] add sk ak to file factory on creatio...
GitHub user kunal642 opened a pull request: https://github.com/apache/carbondata/pull/2623 [HOTFIX] add sk ak to file factory on creation of carbon env add SK AK to file factory on creation of carbon env Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? - [ ] Any backward compatibility impacted? - [ ] Document update required? - [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/kunal642/carbondata sk_ak_fix Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2623.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2623 commit 3a69dc5818ee9d31f195a5737bffd1b80298ed3f Author: kunal642 Date: 2018-08-08T16:20:44Z add sk ak to file factory on creation of carbon env ---
[GitHub] carbondata pull request #2612: [CARBONDATA-2834] Remove unnecessary nested l...
GitHub user kunal642 opened a pull request: https://github.com/apache/carbondata/pull/2612 [CARBONDATA-2834] Remove unnecessary nested looping over loadMetadatadetails. removed nested for loop which causes query performance degradation if… Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? - [ ] Any backward compatibility impacted? - [ ] Document update required? - [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/kunal642/carbondata nestedloop_fix Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2612.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2612 commit ebe22d331dc4ea4ef6904e779702801c3eb5d859 Author: kunal642 Date: 2018-08-06T12:47:28Z removed nested for loop which causes query performance degradation if number of segments are too many ---
[GitHub] carbondata issue #2603: [Documentation] Editorial review comment fixed
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2603 LGTM ---
[GitHub] carbondata pull request #2576: [CARBONDATA-2795] Add documentation for S3
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2576#discussion_r207250192 --- Diff: docs/s3-guide.md --- @@ -0,0 +1,64 @@ + + +#S3 Guide (Alpha Feature 1.4.1) +S3 is an Object Storage API on cloud, it is recommended for storing large data files. You can use +this feature if you want to store data on Amazon cloud or Huawei cloud(OBS). +Since the data is stored on to cloud there are no restrictions on the size of data and the data can be accessed from anywhere at any time. +Carbondata can support any Object Storage that conforms to Amazon S3 API. --- End diff -- merged ---
[GitHub] carbondata pull request #2576: [CARBONDATA-2795] Add documentation for S3
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2576#discussion_r207250096 --- Diff: docs/configuration-parameters.md --- @@ -106,7 +106,10 @@ This section provides the details of all the configurations required for CarbonD |-|--|-| | carbon.sort.file.write.buffer.size | 16384 | File write buffer size used during sorting. Minimum allowed buffer size is 10240 byte and Maximum allowed buffer size is 10485760 byte. | | carbon.lock.type | LOCALLOCK | This configuration specifies the type of lock to be acquired during concurrent operations on table. There are following types of lock implementation: - LOCALLOCK: Lock is created on local file system as file. This lock is useful when only one spark driver (thrift server) runs on a machine and no other CarbonData spark application is launched concurrently. - HDFSLOCK: Lock is created on HDFS file system as file. This lock is useful when multiple CarbonData spark applications are launched and no ZooKeeper is running on cluster and HDFS supports file based locking. | -| carbon.lock.path | TABLEPATH | This configuration specifies the path where lock files have to be created. Recommended to configure zookeeper lock type or configure HDFS lock path(to this property) in case of S3 file system as locking is not feasible on S3. +| carbon.lock.path | TABLEPATH | This configuration specifies the path where lock files have to +be created. Recommended to configure HDFS lock path(to this property) in case of S3 file system +as locking is not feasible on S3. +**Note:** If this property is not set to HDFS location for S3 store, then there is a possibility of data corruption. --- End diff -- done ---
[GitHub] carbondata pull request #2576: [CARBONDATA-2795] Add documentation for S3
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2576#discussion_r207250066 --- Diff: docs/configuration-parameters.md --- @@ -106,7 +106,10 @@ This section provides the details of all the configurations required for CarbonD |-|--|-| | carbon.sort.file.write.buffer.size | 16384 | File write buffer size used during sorting. Minimum allowed buffer size is 10240 byte and Maximum allowed buffer size is 10485760 byte. | | carbon.lock.type | LOCALLOCK | This configuration specifies the type of lock to be acquired during concurrent operations on table. There are following types of lock implementation: - LOCALLOCK: Lock is created on local file system as file. This lock is useful when only one spark driver (thrift server) runs on a machine and no other CarbonData spark application is launched concurrently. - HDFSLOCK: Lock is created on HDFS file system as file. This lock is useful when multiple CarbonData spark applications are launched and no ZooKeeper is running on cluster and HDFS supports file based locking. | -| carbon.lock.path | TABLEPATH | This configuration specifies the path where lock files have to be created. Recommended to configure zookeeper lock type or configure HDFS lock path(to this property) in case of S3 file system as locking is not feasible on S3. +| carbon.lock.path | TABLEPATH | This configuration specifies the path where lock files have to --- End diff -- added description ---
[GitHub] carbondata pull request #2576: [CARBONDATA-2795] Add documentation for S3
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2576#discussion_r207249973 --- Diff: docs/data-management-on-carbondata.md --- @@ -730,6 +736,8 @@ Users can specify which columns to include and exclude for local dictionary gene * If the IGNORE option is used, then bad records are neither loaded nor written to the separate CSV file. * In loaded data, if all records are bad records, the BAD_RECORDS_ACTION is invalid and the load operation fails. * The maximum number of characters per column is 32000. If there are more than 32000 characters in a column, data loading will fail. + * Since Bad Records Path can be specified in both create, load and carbon properties. --- End diff -- done ---
[GitHub] carbondata pull request #2576: [CARBONDATA-2795] Add documentation for S3
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2576#discussion_r207249849 --- Diff: docs/datamap/preaggregate-datamap-guide.md --- @@ -7,6 +24,7 @@ * [Querying Data](#querying-data) * [Compaction](#compacting-pre-aggregate-tables) * [Data Management](#data-management-with-pre-aggregate-tables) +* [Limitations](#Limitations) --- End diff -- removed ---
[GitHub] carbondata pull request #2576: [CARBONDATA-2795] Add documentation for S3
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2576#discussion_r207249941 --- Diff: docs/s3-guide.md --- @@ -0,0 +1,63 @@ + + +#S3 Guide (Alpha Feature 1.4.1) +Amazon S3 is a cloud storage service that is recommended for storing large data files. You can +use this feature if you want to store data on amazon cloud. Since the data is stored on to cloud +storage there are no restrictions on the size of data and the data can be accessed from anywhere at any time. +Carbon can support any Object store that conforms to Amazon S3 API. + +#Writing to Object Store +To store carbondata files on to Object Store location, you need to set `carbon +.storelocation` property to Object Store path in CarbonProperties file. For example, carbon +.storelocation=s3a://mybucket/carbonstore. By setting this property, all the tables will be created on the specified Object Store path. + +If your existing store is HDFS, and you want to store specific tables on S3 location, then `location` parameter has to be set during create +table. +For example: + +``` +CREATE TABLE IF NOT EXISTS db1.table1(col1 string, col2 int) STORED AS carbondata LOCATION 's3a://mybucket/carbonstore' +``` + +For more details on create table, Refer [data-management-on-carbondata](https://github.com/apache/carbondata/blob/master/docs/data-management-on-carbondata.md#create-table) + +#Authentication +You need to set authentication properties to store the carbondata files on to S3 location. For +more details on authentication properties, refer +[hadoop authentication document](https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#Authentication_properties) + +Another way of setting the authentication parameters is as follows: + +``` + SparkSession + .builder() + .master(masterURL) + .appName("S3Example") + .config("spark.driver.host", "localhost") + .config("spark.hadoop.fs.s3a.access.key", "") + .config("spark.hadoop.fs.s3a.secret.key", "") + .config("spark.hadoop.fs.s3a.endpoint", "1.1.1.1") + .getOrCreateCarbonSession() +``` + +#Recommendations +1. Object stores like S3 does not support file leasing mechanism(supported by HDFS) that is +required to take locks which ensure consistency between concurrent operations therefore, it is +recommended to set the configurable lock path property([carbon.lock.path](https://github.com/apache/carbondata/blob/master/docs/configuration-parameters.md#miscellaneous-configuration)) + to a HDFS directory. +2. As Object stores are eventual consistent meaning that any put request can take some time to reflect when trying to list objects from that bucket therefore concurrent queries are not supported. --- End diff -- done ---
[GitHub] carbondata pull request #2576: [CARBONDATA-2795] Add documentation for S3
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2576#discussion_r207249910 --- Diff: docs/s3-guide.md --- @@ -0,0 +1,63 @@ + + +#S3 Guide (Alpha Feature 1.4.1) +Amazon S3 is a cloud storage service that is recommended for storing large data files. You can --- End diff -- done ---
[GitHub] carbondata pull request #2576: [CARBONDATA-2795] Add documentation for S3
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2576#discussion_r207249485 --- Diff: docs/configuration-parameters.md --- @@ -106,7 +106,12 @@ This section provides the details of all the configurations required for CarbonD |-|--|-| | carbon.sort.file.write.buffer.size | 16384 | File write buffer size used during sorting. Minimum allowed buffer size is 10240 byte and Maximum allowed buffer size is 10485760 byte. | | carbon.lock.type | LOCALLOCK | This configuration specifies the type of lock to be acquired during concurrent operations on table. There are following types of lock implementation: - LOCALLOCK: Lock is created on local file system as file. This lock is useful when only one spark driver (thrift server) runs on a machine and no other CarbonData spark application is launched concurrently. - HDFSLOCK: Lock is created on HDFS file system as file. This lock is useful when multiple CarbonData spark applications are launched and no ZooKeeper is running on cluster and HDFS supports file based locking. | -| carbon.lock.path | TABLEPATH | This configuration specifies the path where lock files have to be created. Recommended to configure zookeeper lock type or configure HDFS lock path(to this property) in case of S3 file system as locking is not feasible on S3. +| carbon.lock.path | TABLEPATH | Locks on the files are used to prevent concurrent operation from modifying the same files. This +configuration specifies the path where lock files have to be created. Recommended to configure +HDFS lock path(to this property) in case of S3 file system as locking is not feasible on S3. +**Note:** If this property is not set to HDFS location for S3 store, then there is a possibility +of data corruption because multiple data manipulation calls might try to update the status file +and as lock is not acquired before updation data might get overwritten. --- End diff -- added ---
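The doc text under review describes the classic lost-update hazard: without a lock, two concurrent writers can each read the table status file, modify their own copy, and write it back, so one writer silently overwrites the other. A minimal sketch of the read-modify-write-under-lock discipline, with an in-memory list standing in for the segment status file and a `ReentrantLock` standing in for CarbonData's file-based lock (which is why `carbon.lock.path` must point at a filesystem such as HDFS that can actually provide locking):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

public class StatusFileLockDemo {
    // Stand-in for the table status file: a list of segment entries.
    private static final List<String> statusFile = new ArrayList<>();
    // Stand-in for the HDFS lock file; S3 cannot provide this guarantee.
    private static final ReentrantLock lock = new ReentrantLock();

    private static void appendSegment(String name) {
        lock.lock();                    // acquire before read-modify-write
        try {
            List<String> copy = new ArrayList<>(statusFile); // read
            copy.add(name);                                  // modify
            statusFile.clear();                              // write back
            statusFile.addAll(copy);
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread[] writers = new Thread[8];
        for (int i = 0; i < writers.length; i++) {
            final int id = i;
            writers[i] = new Thread(() -> {
                for (int j = 0; j < 100; j++) appendSegment("seg-" + id + "-" + j);
            });
            writers[i].start();
        }
        for (Thread t : writers) t.join();
        // With the lock, no append is lost; without it, the copy-then-write
        // sequence would race and entries could silently disappear.
        System.out.println(statusFile.size());
    }
}
```

All 800 appends survive because each writer holds the lock for the full read-modify-write cycle; removing the lock makes the final count nondeterministic, which is exactly the "data might get overwritten" corruption the note warns about.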
[GitHub] carbondata issue #2599: [CARBONDATA-2812] Implement freeMemory for complex p...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2599 LGTM ---
[GitHub] carbondata issue #2571: [CARBONDATA-2792][schema restructure] Create externa...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2571 LGTM ---
[GitHub] carbondata issue #2600: [CARBONDATA-2813] Fixed code to get data size from L...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2600 retest this please ---
[GitHub] carbondata issue #2576: [CARBONDATA-2795] Add documentation for S3
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2576 retest this please ---
[GitHub] carbondata pull request #2600: [CARBONDATA-2813] Fixed code to get data size...
GitHub user kunal642 opened a pull request: https://github.com/apache/carbondata/pull/2600 [CARBONDATA-2813] Fixed code to get data size from LoadDetails if size is written there. Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? - [ ] Any backward compatibility impacted? - [ ] Document update required? - [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/kunal642/carbondata major_comp Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2600.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2600 commit 44b66bb1e4fc2621c3641f536247d400a963cff1 Author: kunal642 Date: 2018-08-02T06:14:20Z Fixed code to get data size from LoadDetails if size is written there. ---
[GitHub] carbondata issue #2578: [CARBONDATA-2798] Fix Dictionary_Include for Complex...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2578 LGTM ---
[GitHub] carbondata pull request #2578: [CARBONDATA-2798] Fix Dictionary_Include for ...
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2578#discussion_r206531160 --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestNonTransactionalCarbonTable.scala --- @@ -52,10 +52,11 @@ import org.apache.carbondata.core.datastore.page.encoding.DefaultEncodingFactory import org.apache.carbondata.core.metadata.ColumnarFormatVersion import org.apache.carbondata.core.metadata.datatype.DataTypes import org.apache.carbondata.core.util.{CarbonProperties, CarbonUtil, DataFileFooterConverterV3} +import org.apache.carbondata.processing.loading.exception.CarbonDataLoadingException import org.apache.carbondata.sdk.file._ -class TestNonTransactionalCarbonTable extends QueryTest with BeforeAndAfterAll { +class TestNonTransactiogitnalCarbonTable extends QueryTest with BeforeAndAfterAll { --- End diff -- Please revert this change ---
[GitHub] carbondata pull request #2571: [CARBONDATA-2792][schema restructure] Create ...
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2571#discussion_r206480372 --- Diff: integration/spark2/src/main/scala/org/apache/carbondata/spark/util/CarbonSparkUtil.scala --- @@ -87,18 +88,20 @@ object CarbonSparkUtil { val fields = new Array[String]( carbonRelation.dimensionsAttr.size + carbonRelation.measureAttr.size) val carbonTable = carbonRelation.carbonTable +val columnSchemas: mutable.Buffer[ColumnSchema] = carbonTable.getTableInfo.getFactTable. + getListOfColumns.asScala.filter(_.getSchemaOrdinal != -1).filter(!_.isInvisible). --- End diff -- both filter conditions can be combined to one like .filter(a>2 && b<10). ---
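The suggestion above (merge two chained `filter` calls into one predicate) rendered as a runnable Java-stream analogue of the Scala code; the `Column` class here is a hypothetical stand-in for CarbonData's `ColumnSchema`, not its real API. Combining the predicates with `&&` makes a single pass over the collection and reads as one condition:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class CombinedFilterDemo {
    // Hypothetical stand-in for ColumnSchema with the two fields being filtered.
    static class Column {
        final String name; final int schemaOrdinal; final boolean invisible;
        Column(String name, int schemaOrdinal, boolean invisible) {
            this.name = name; this.schemaOrdinal = schemaOrdinal; this.invisible = invisible;
        }
    }

    public static void main(String[] args) {
        List<Column> columns = Arrays.asList(
            new Column("id", 0, false),
            new Column("implicit", -1, false),   // dropped: ordinal == -1
            new Column("shadow", 1, true),       // dropped: invisible
            new Column("name", 2, false));

        // Instead of .filter(c -> c.schemaOrdinal != -1).filter(c -> !c.invisible),
        // combine both predicates into one traversal:
        List<String> visible = columns.stream()
            .filter(c -> c.schemaOrdinal != -1 && !c.invisible)
            .map(c -> c.name)
            .collect(Collectors.toList());

        System.out.println(visible);
    }
}
```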
[GitHub] carbondata pull request #2578: Fix Dictionary_Include for ComplexDataType
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2578#discussion_r206396378 --- Diff: processing/src/main/java/org/apache/carbondata/processing/loading/model/LoadOption.java --- @@ -236,7 +236,7 @@ } } -if (carbonLoadModel.isCarbonTransactionalTable() && !CarbonDataProcessorUtil +if (hadoopConf != null && !CarbonDataProcessorUtil --- End diff -- why hadoopConf null check is needed? ---
[GitHub] carbondata issue #2576: [CARBONDATA-2795] Add documentation for S3
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2576 @sraghunandan @sgururajshetty @chenliang613 Please review ---
[GitHub] carbondata issue #2578: Fix Dictionary_Include for ComplexDataType
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2578 Please add a test case to verify the fix ---
[GitHub] carbondata pull request #2578: Fix Dictionary_Include for ComplexDataType
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2578#discussion_r206091924 --- Diff: processing/src/main/java/org/apache/carbondata/processing/datatypes/ArrayDataType.java --- @@ -172,7 +191,7 @@ public void setSurrogateIndex(int surrIndex) { @Override public void fillCardinality(List dimCardWithComplex) { -if (children.getIsColumnDictionary()) { +if (this.isDictionaryColumn) { --- End diff -- return isDictionaryColumn from getIsColumnDictionary() also. ---
[GitHub] carbondata issue #2517: [CARBONDATA-2749][dataload] In HDFS Empty tablestatu...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2517 LGTM. The failures in the 2.2.1 build are not related to this PR. Those are failing on master as well. Need to fix. ---
[GitHub] carbondata pull request #2563: WIP test
Github user kunal642 closed the pull request at: https://github.com/apache/carbondata/pull/2563 ---
[GitHub] carbondata pull request #2576: [CARBONDATA-2795] disable preagg datamap on d...
GitHub user kunal642 opened a pull request: https://github.com/apache/carbondata/pull/2576 [CARBONDATA-2795] disable preagg datamap on dataload for s3 store Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? - [ ] Any backward compatibility impacted? - [ ] Document update required? - [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/kunal642/carbondata disable_preagg Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2576.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2576 commit ee154f1e94e8ada5fcba08cda3534e9d80ee506a Author: kunal642 Date: 2018-07-29T16:14:22Z disable preagg datamap on dataload for s3 store ---
[GitHub] carbondata issue #2575: [WIP] fixed for ModularPlan exception during update ...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2575 @rahulforallp Please fix the build. ---
[GitHub] carbondata issue #2562: [HOTFIX] CreateDataMapPost Event was skipped in case...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2562 LGTM ---
[GitHub] carbondata issue #2552: [CARBONDATA-2781] Added fix for Null Pointer Excpeti...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2552 @praveenmeenakshi56 Please add detailed description stating the issue and the solution(Why specific handling for preagg and timeseries is required). ---
[GitHub] carbondata issue #2553: [HOTFIX] Fixed random test failure
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2553 @mohammadshahidkhan Please add description ---
[GitHub] carbondata issue #2563: WIP test
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2563 retest this please ---
[GitHub] carbondata issue #2563: WIP test
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2563 retest this please ---
[GitHub] carbondata pull request #2563: WIP test
GitHub user kunal642 opened a pull request: https://github.com/apache/carbondata/pull/2563 WIP test Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? - [ ] Any backward compatibility impacted? - [ ] Document update required? - [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/kunal642/carbondata temp Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2563.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2563 commit f3e6792f2b96df0852c43cf816283a8f7dc64a4b Author: kunal642 Date: 2018-07-12T11:39:54Z added hadoop conf in thread params ---
[GitHub] carbondata issue #2549: [CARBONDATA-2606][Complex DataType Enhancements]Fix ...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2549 @dhatchayani Please rebase ---
[GitHub] carbondata issue #2535: [CARBONDATA-2606]Fix Complex array Pushdown
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2535 LGTM ---
[GitHub] carbondata pull request #2535: [CARBONDATA-2606]Fix Complex array Pushdown
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2535#discussion_r204983701 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonLoadDataCommand.scala --- @@ -77,6 +77,7 @@ import org.apache.carbondata.spark.dictionary.provider.SecureDictionaryServicePr import org.apache.carbondata.spark.dictionary.server.SecureDictionaryServer import org.apache.carbondata.spark.load.{CsvRDDHelper, DataLoadProcessorStepOnSpark} import org.apache.carbondata.spark.rdd.CarbonDataRDDFactory +import org.apache.carbondata.spark.rdd.CarbonDataRDDFactory.LOGGER --- End diff -- no need for this import. CarbonLoadDataCommand already has a LOGGER ---
[GitHub] carbondata issue #2514: [CARBONDATA-2740]segment file is not getting deleted...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2514 LGTM ---
[GitHub] carbondata pull request #2548: [CARBONDATA-2778]Fixed bug when select after ...
GitHub user kunal642 opened a pull request: https://github.com/apache/carbondata/pull/2548 [CARBONDATA-2778]Fixed bug when select after delete and cleanup is showing empty records Problem: In a delete operation, if the data being deleted leads to a state where one complete block's data is deleted, the status of that block is marked for delete, and during the next delete operation run the block is deleted along with its carbonIndex file. The problem arises from the deletion of the carbonIndex file, because one carbonIndex file can represent multiple blocks (one carbonIndex file represents one task). Solution: Do not delete the carbondata and carbonIndex files. Compaction will automatically take care of deleting the stale data and stale segments. Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? - [ ] Any backward compatibility impacted? - [ ] Document update required? - [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/kunal642/carbondata iud_fix Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2548.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2548 commit b5f9c3c857a93d7633580417f63e9c745cbcbbe6 Author: kunal642 Date: 2018-07-24T10:42:54Z Fixed bug when select after delete and cleanup is showing empty records ---
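The problem described in this PR boils down to a shared-resource accounting mistake: one carbonIndex file covers every block its task wrote, so deleting it when a single block becomes empty orphans the surviving blocks. An illustrative sketch of the hazard and the safe rule (only drop the shared index once no live block still references it); all names here are made up for the example, not CarbonData's internals:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SharedIndexDemo {
    public static void main(String[] args) {
        // One task wrote one index file covering two data blocks.
        Map<String, List<String>> indexToBlocks = new HashMap<>();
        indexToBlocks.put("task0.carbonindex",
            new ArrayList<>(Arrays.asList("block-0.carbondata", "block-1.carbondata")));

        // A delete empties block-0 entirely, so block-0 itself is removed.
        List<String> blocks = indexToBlocks.get("task0.carbonindex");
        blocks.remove("block-0.carbondata");

        // Buggy behaviour: delete the index file along with the emptied block,
        // which leaves block-1 unreadable (select returns empty records).
        // Safe rule: only delete the index once NO block references it.
        boolean safeToDeleteIndex = blocks.isEmpty();
        System.out.println(safeToDeleteIndex);
    }
}
```

Here `safeToDeleteIndex` is `false` because block-1 still depends on the index; the fix in the PR sidesteps the bookkeeping entirely by leaving both files in place and letting compaction clean up stale data.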
[GitHub] carbondata issue #2484: [HOTFIX] added hadoop conf to thread local
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2484 retest this please ---
[GitHub] carbondata pull request #2535: [CARBONDATA-2606]Fix Complex array Pushdown
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2535#discussion_r204642515 --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/complexType/TestComplexDataType.scala --- @@ -885,4 +885,47 @@ class TestComplexDataType extends QueryTest with BeforeAndAfterAll { checkExistence(sql("select * from table1"),true,"1.0E9") } + test("test block compaction - auto merge") { +sql("DROP TABLE IF EXISTS table1") +CarbonProperties.getInstance() + .addProperty(CarbonCommonConstants.ENABLE_AUTO_LOAD_MERGE, "true") +sql( + "create table table1 (roll int,person Struct) stored " + + "by 'carbondata'") +sql( + "load data inpath '" + resourcesPath + + "/Struct.csv' into table table1 options('delimiter'=','," + + "'quotechar'='\"','fileheader'='roll,person','complex_delimiter_level_1'='$'," + + "'complex_delimiter_level_2'='&')") +sql( + "load data inpath '" + resourcesPath + + "/Struct.csv' into table table1 options('delimiter'=','," + + "'quotechar'='\"','fileheader'='roll,person','complex_delimiter_level_1'='$'," + + "'complex_delimiter_level_2'='&')") +sql( + "load data inpath '" + resourcesPath + + "/Struct.csv' into table table1 options('delimiter'=','," + + "'quotechar'='\"','fileheader'='roll,person','complex_delimiter_level_1'='$'," + + "'complex_delimiter_level_2'='&')") +sql( + "load data inpath '" + resourcesPath + + "/Struct.csv' into table table1 options('delimiter'=','," + + "'quotechar'='\"','fileheader'='roll,person','complex_delimiter_level_1'='$'," + + "'complex_delimiter_level_2'='&')") +checkExistence(sql("show segments for table table1"),false, "Compacted") +CarbonProperties.getInstance() + .addProperty(CarbonCommonConstants.ENABLE_AUTO_LOAD_MERGE, "false") --- End diff -- add this in afterAll too ---
[GitHub] carbondata issue #2511: [CARBONDATA-2745] Added atomic file operations for S...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2511 @gvramana Build passed. Please review ---
[GitHub] carbondata pull request #2535: [CARBONDATA-2606]Fix Complex array Pushdown
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2535#discussion_r204296432 --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/complexType/TestComplexDataType.scala --- @@ -885,4 +885,36 @@ class TestComplexDataType extends QueryTest with BeforeAndAfterAll { checkExistence(sql("select * from table1"),true,"1.0E9") } + test("test block compaction - auto merge") { +sql("DROP TABLE IF EXISTS table1") +CarbonProperties.getInstance() + .addProperty(CarbonCommonConstants.ENABLE_AUTO_LOAD_MERGE, "true") +sql( + "create table table1 (roll int,person Struct) stored " + + "by 'carbondata'") +sql( + "load data inpath '" + resourcesPath + + "/Struct.csv' into table table1 options('delimiter'=','," + + "'quotechar'='\"','fileheader'='roll,person','complex_delimiter_level_1'='$'," + + "'complex_delimiter_level_2'='&')") +sql( + "load data inpath '" + resourcesPath + + "/Struct.csv' into table table1 options('delimiter'=','," + + "'quotechar'='\"','fileheader'='roll,person','complex_delimiter_level_1'='$'," + + "'complex_delimiter_level_2'='&')") +sql( + "load data inpath '" + resourcesPath + + "/Struct.csv' into table table1 options('delimiter'=','," + + "'quotechar'='\"','fileheader'='roll,person','complex_delimiter_level_1'='$'," + + "'complex_delimiter_level_2'='&')") +sql( + "load data inpath '" + resourcesPath + + "/Struct.csv' into table table1 options('delimiter'=','," + + "'quotechar'='\"','fileheader'='roll,person','complex_delimiter_level_1'='$'," + + "'complex_delimiter_level_2'='&')") +checkAnswer(sql("select count(*) from table1"),Seq(Row(40))) --- End diff -- check for segments whether compaction has happened or not ---
[GitHub] carbondata pull request #2535: [CARBONDATA-2606]Fix Complex array Pushdown
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2535#discussion_r204296035 --- Diff: integration/spark2/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala --- @@ -578,13 +578,19 @@ object CarbonDataRDDFactory { if (carbonTable.isHivePartitionTable) { carbonLoadModel.setFactTimeStamp(System.currentTimeMillis()) } -val compactedSegments = new util.ArrayList[String]() -handleSegmentMerging(sqlContext, - carbonLoadModel, - carbonTable, - compactedSegments, - operationContext) -carbonLoadModel.setMergedSegmentIds(compactedSegments) +// Block compaction for table containing complex datatype +if (carbonTable.getTableInfo.getFactTable.getListOfColumns.asScala + .exists(m => m.getDataType.isComplexType)) { + LOGGER.info("Compaction is skipped as table contains complex columns") --- End diff -- change to warn ---
[GitHub] carbondata pull request #2535: [CARBONDATA-2606]Fix Complex array Pushdown
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2535#discussion_r204296013 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonLoadDataCommand.scala --- @@ -823,15 +824,21 @@ case class CarbonLoadDataCommand( } try { carbonLoadModel.setFactTimeStamp(System.currentTimeMillis()) - val compactedSegments = new util.ArrayList[String]() - // Trigger auto compaction - CarbonDataRDDFactory.handleSegmentMerging( -sparkSession.sqlContext, -carbonLoadModel, -table, -compactedSegments, -operationContext) - carbonLoadModel.setMergedSegmentIds(compactedSegments) + // Block compaction for table containing complex datatype + if (table.getTableInfo.getFactTable.getListOfColumns.asScala +.exists(m => m.getDataType.isComplexType)) { +LOGGER.info("Compaction is skipped as table contains complex columns") --- End diff -- change to warn ---
[GitHub] carbondata issue #2511: [CARBONDATA-2745] Added atomic file operations for S...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2511 retest this please ---
[GitHub] carbondata pull request #2536: [CARBONDATA-2766] Added null check on filesta...
Github user kunal642 closed the pull request at: https://github.com/apache/carbondata/pull/2536 ---
[GitHub] carbondata issue #2536: [CARBONDATA-2766] Added null check on filestatus
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2536 @xuchuanyin Closing this PR. Refer #2465 ---
[GitHub] carbondata issue #2484: [HOTFIX] added hadoop conf to thread local
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2484 retest this please ---
[GitHub] carbondata pull request #2536: [CARBONDATA-2766] Added null check on filesta...
GitHub user kunal642 opened a pull request: https://github.com/apache/carbondata/pull/2536 [CARBONDATA-2766] Added null check on filestatus **Problem:** While performing any operation on a CarbonFile, if acquiring the file status throws an exception, the exception is only logged and not rethrown; any subsequent operation on that object that requires the file status can then throw an NPE. **Solution:** Add a null check and throw a proper exception. Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? - [ ] Any backward compatibility impacted? - [ ] Document update required? - [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/kunal642/carbondata fix_npe Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2536.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2536 commit 060dc7fc461d9f2eec6603b45651d76588dbb9bc Author: kunal642 Date: 2018-07-22T11:39:08Z added null check on filestatus ---
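The fail-fast pattern this PR describes can be sketched as follows. The classes below are illustrative stand-ins, not CarbonData's actual API; an unchecked exception is used here for brevity where the real fix presumably throws a checked IOException.

```java
// Minimal sketch: null-check the file status before use so the caller gets a
// descriptive error instead of an NPE from deep inside the call chain.
public class FileStatusGuard {

    // Illustrative stand-in for Hadoop's FileStatus.
    static class FileStatus {
        long getLen() { return 42L; }
    }

    private final FileStatus fileStatus; // may be null if acquisition failed

    FileStatusGuard(FileStatus fileStatus) {
        this.fileStatus = fileStatus;
    }

    long getSize() {
        if (fileStatus == null) {
            // Fail fast with a proper exception rather than a later NPE.
            throw new IllegalStateException(
                "File status could not be acquired; see earlier log for the cause");
        }
        return fileStatus.getLen();
    }
}
```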
[GitHub] carbondata pull request #2525: [CARBONDATA-2756] refactored code to use ZSTD...
GitHub user kunal642 opened a pull request: https://github.com/apache/carbondata/pull/2525 [CARBONDATA-2756] refactored code to use ZSTD compression using Reflection 1. Refactored the code to use ZSTD compression via reflection. 2. Added the license header. Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? - [ ] Any backward compatibility impacted? - [ ] Document update required? - [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/kunal642/carbondata zstd_fix Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2525.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2525 commit 1cada8b2b23f846d31ab80c1592346804583d17b Author: kunal642 Date: 2018-07-19T06:47:04Z refactored code to use ZSTD compression using Reflection ---
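The reflection approach (resolving the compressor class by name at runtime so the ZSTD jar can remain an optional dependency) can be sketched generically. Class and method names are passed in as parameters here; this is a hedged illustration, not CarbonData's actual wiring.

```java
import java.lang.reflect.Method;

// Looking the compressor up by name means there is no compile-time reference
// to it: the code compiles and loads even when the jar is absent.
public class ReflectiveCompressorLoader {

    // True if the named class is on the classpath.
    public static boolean isAvailable(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    // Invoke a static byte[] -> byte[] method (e.g. "compress") reflectively.
    public static byte[] invokeStatic(String className, String methodName, byte[] input)
            throws ReflectiveOperationException {
        Class<?> clazz = Class.forName(className);
        Method m = clazz.getMethod(methodName, byte[].class);
        return (byte[]) m.invoke(null, (Object) input);
    }
}
```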
[GitHub] carbondata issue #2484: [HOTFIX] added hadoop conf to thread local
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2484 retest this please ---
[GitHub] carbondata pull request #2484: [HOTFIX] added hadoop conf to thread local
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2484#discussion_r203263551 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/impl/FileFactory.java --- @@ -43,11 +45,19 @@ */ private static final LogService LOGGER = LogServiceFactory.getLogService(FileFactory.class.getName()); - private static Configuration configuration = null; - static { -configuration = new Configuration(); -configuration.addResource(new Path("../core-default.xml")); + public static Configuration getConfiguration() { +Configuration configuration; +if (ThreadLocalSessionInfo.getCarbonSessionInfo() == null) { + configuration = new Configuration(); + configuration.addResource(new Path("../core-default.xml")); +} else { + CarbonConfiguration carbonConfiguration = + (CarbonConfiguration) ThreadLocalSessionInfo.getCarbonSessionInfo().getThreadParams() --- End diff -- Already handled. If carbonConf is not present, a new CarbonConfiguration is taken as the default value. ---
[GitHub] carbondata pull request #2484: [HOTFIX] added hadoop conf to thread local
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2484#discussion_r203263435 --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/CarbonRDD.scala --- @@ -79,14 +66,11 @@ abstract class CarbonRDD[T: ClassTag](@transient sc: SparkContext, } def getConf: Configuration = { -val configuration = new Configuration(false) -val bai = new ByteArrayInputStream(CompressorFactory.getInstance().getCompressor - .unCompressByte(confBytes)) -val ois = new ObjectInputStream(bai) -configuration.readFields(ois) -ois.close() -configuration +val carbonConfiguration = carbonSessionInfo.getThreadParams.getExtraInfo("carbonConf") --- End diff -- done ---
[GitHub] carbondata issue #2502: [CARBONDATA-2738]Update documentation for Complex da...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2502 LGTM ---
[GitHub] carbondata issue #2501: [CARBONDATA-2738]Block Preaggregate, Compaction, Dic...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2501 LGTM ---
[GitHub] carbondata pull request #2484: [HOTFIX] added hadoop conf to thread local
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2484#discussion_r203071359 --- Diff: core/src/main/java/org/apache/carbondata/core/util/CarbonConfiguration.java --- @@ -0,0 +1,74 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.carbondata.core.util; + +import java.io.ByteArrayInputStream; +import java.io.ByteArrayOutputStream; +import java.io.IOException; +import java.io.ObjectInputStream; +import java.io.ObjectOutputStream; +import java.io.Serializable; + +import org.apache.carbondata.core.datastore.compression.CompressorFactory; + +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.Path; + +public class CarbonConfiguration implements Serializable { + + private static final long serialVersionUID = 3811544264223154007L; + private transient Configuration configuration; + private byte[] confBytes; + + public CarbonConfiguration(Configuration configuration) { +ByteArrayOutputStream bao = new ByteArrayOutputStream(); +try { + ObjectOutputStream oos = new ObjectOutputStream(bao); + configuration.write(oos); + oos.close(); + this.confBytes = + CompressorFactory.getInstance().getCompressor().compressByte(bao.toByteArray()); +} catch (IOException e) { + throw new RuntimeException(e); +} + } + + public CarbonConfiguration() { +this.configuration = new Configuration(); +configuration.addResource(new Path("../core-default.xml")); + } + + public Configuration getConfiguration() { --- End diff -- Once CarbonConfiguration is set into ThreadLocal only FileFactory and CarbonRDD will try to deserialize the configuration. This will be done once per instance of CarbonConfiguration. If configuration is already deserialized we will return the same object. ---
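The deserialize-at-most-once contract described in this comment can be sketched generically, with java.util.Properties standing in for Hadoop's Configuration. This is a hedged illustration of the pattern, not the CarbonConfiguration code itself.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Properties;

// The heavy object is rebuilt from its serialized bytes on first access and
// cached for every later call, so deserialization happens once per instance.
public class LazyConfigHolder {

    private final byte[] confBytes;
    private transient Properties cached;
    private int deserializeCount = 0; // for illustration only

    LazyConfigHolder(Properties props) throws IOException {
        ByteArrayOutputStream bao = new ByteArrayOutputStream();
        props.store(bao, null);
        this.confBytes = bao.toByteArray();
    }

    // Deserializes at most once; later calls return the same cached object.
    Properties get() throws IOException {
        if (cached == null) {
            deserializeCount++;
            cached = new Properties();
            cached.load(new ByteArrayInputStream(confBytes));
        }
        return cached;
    }

    int deserializeCount() { return deserializeCount; }
}
```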
[GitHub] carbondata issue #2448: [HotFix] Getting carbon table identifier to datamap ...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2448 LGTM ---
[GitHub] carbondata pull request #2501: [CARBONDATA-2738]Block Preaggregate, Compacti...
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2501#discussion_r202903937 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/mutation/CarbonProjectForUpdateCommand.scala --- @@ -59,6 +60,13 @@ private[sql] case class CarbonProjectForUpdateCommand( return Seq.empty } val carbonTable = CarbonEnv.getCarbonTable(databaseNameOp, tableName)(sparkSession) +columns.foreach { col => + val dataType = carbonTable.getColumnByName(tableName, col).getColumnSchema.getDataType + if (dataType.isComplexType) { +throw new MalformedCarbonCommandException("Unsupported operation on Complex data type") --- End diff -- throw UnsupportedOperationException ---
[GitHub] carbondata pull request #2501: [CARBONDATA-2738]Block Preaggregate, Compacti...
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2501#discussion_r202903870 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonAlterTableCompactionCommand.scala --- @@ -82,6 +82,12 @@ case class CarbonAlterTableCompactionCommand( throw new MalformedCarbonCommandException("Unsupported operation on non transactional table") } +if (table.getTableInfo.getFactTable.getListOfColumns.asScala + .exists(m => m.getDataType.isComplexType)) { + throw new MalformedCarbonCommandException( --- End diff -- throw UnsupportedOperationException ---
[GitHub] carbondata pull request #2501: [CARBONDATA-2738]Block Preaggregate, Compacti...
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2501#discussion_r202904039 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/preaaggregate/PreAggregateUtil.scala --- @@ -354,7 +354,13 @@ object PreAggregateUtil { !expression.isInstanceOf[AttributeReference]) { newColumnName } else { - expression.asInstanceOf[AttributeReference].name + if (expression.isInstanceOf[GetStructField] || expression.isInstanceOf[GetArrayItem]) { +throw new MalformedCarbonCommandException( --- End diff -- throw UnsupportedOperationException ---
[GitHub] carbondata pull request #2501: [CARBONDATA-2738]Block Preaggregate, Compacti...
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2501#discussion_r202904751 --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/complexType/TestComplexDataType.scala --- @@ -712,5 +713,166 @@ class TestComplexDataType extends QueryTest with BeforeAndAfterAll { checkAnswer(sql("select a.b,id,a.c,person.detail[0],d.e,d.f,person.detail[1],id from test"),Seq(Row(2,1,3,5,3,2,6,1))) checkAnswer(sql("select a.b,id,a.c,person.detail[0],d.e,d.f,person.detail[1],id,1,a.b from test"),Seq(Row(2,1,3,5,3,2,6,1,1,2))) } - + + test("test block Update for complex datatype") { +sql("DROP TABLE IF EXISTS test") +sql("create table test(id int,a struct,d array) stored by 'carbondata'") +sql("insert into test values(1,'2$3',4)") +val structException = intercept[MalformedCarbonCommandException]( +sql("update test set(a.b)=(4) where id=1").show(false)) +assertResult("Unsupported operation on Complex data type")(structException.getMessage) +val arrayException = intercept[MalformedCarbonCommandException]( +sql("update test set(a)=(4) where id=1").show(false)) +assertResult("Unsupported operation on Complex data type")(arrayException.getMessage) + } + + test("test block partition column") { +sql("DROP TABLE IF EXISTS list_table_area_origin") +val arrayException = intercept[AnalysisException]( +sql(""" + | CREATE TABLE IF NOT EXISTS list_table_area_origin + | ( + | id Int, + | vin string, + | logdate Timestamp, + | phonenumber Long, + | country array, + | salary Int + | ) + | PARTITIONED BY (area array) + | STORED BY 'carbondata' +""".stripMargin)) +assertResult("Cannot use array for partition column;")(arrayException.getMessage) +sql("DROP TABLE IF EXISTS list_table_area_origin") +val structException = intercept[AnalysisException]( + sql(""" +| CREATE TABLE IF NOT EXISTS list_table_area_origin +| ( +| id Int, +| vin string, +| logdate Timestamp, +| phonenumber Long, +| country array, 
+| salary Int +| ) +| PARTITIONED BY (area struct) +| STORED BY 'carbondata' + """.stripMargin) +) +assertResult("Cannot use struct for partition column;")(structException.getMessage) + } + + test("test block preaggregate") { +sql("DROP TABLE IF EXISTS test") +sql("create table test(id int,a struct) stored by 'carbondata'") +sql("insert into test values (1,2)") +sql("insert into test values (1,2)") +sql("insert into test values (1,2)") +val structException = intercept[MalformedCarbonCommandException]( + sql("create datamap preagg_sum on table test using 'preaggregate' as select id,sum(a.b) from test group by id")) +assertResult("Preaggregate is unsupported for ComplexData type column: a.b")(structException.getMessage) +sql("DROP TABLE IF EXISTS test") +sql("create table test(id int,a array) stored by 'carbondata'") +sql("insert into test values (1,2)") +val arrayException = intercept[MalformedCarbonCommandException]( + sql("create datamap preagg_sum on table test using 'preaggregate' as select id,sum(a[0]) from test group by id")) +assertResult("Preaggregate is unsupported for ComplexData type column: a[0]")(arrayException.getMessage) + } + + test("test block dictionary exclude for child column") { +sql("DROP TABLE IF EXISTS table1") +sql( + "create table table1 (roll int,a struct,j:int>) stored " + + "by " + + "'carbondata' tblproperties('dictionary_exclude'='a')") +sql("insert into table1 values(1,'1$abc$2$efg$3:mno:4$5')") +checkAnswer(sql("select a.b from table1"), Seq(Row(1))) +sql("DROP TABLE IF EXISTS table1") +val structException = intercept[MalformedCarbonCommandException]( +sql( + "create table table1 (roll int,a struct,j:int>) stored " + + "by " + + "'carbondata' tblproperties(
[GitHub] carbondata pull request #2501: [CARBONDATA-2738]Block Preaggregate, Compacti...
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2501#discussion_r202904600 --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/complexType/TestComplexDataType.scala --- @@ -712,5 +713,166 @@ class TestComplexDataType extends QueryTest with BeforeAndAfterAll { checkAnswer(sql("select a.b,id,a.c,person.detail[0],d.e,d.f,person.detail[1],id from test"),Seq(Row(2,1,3,5,3,2,6,1))) checkAnswer(sql("select a.b,id,a.c,person.detail[0],d.e,d.f,person.detail[1],id,1,a.b from test"),Seq(Row(2,1,3,5,3,2,6,1,1,2))) } - + + test("test block Update for complex datatype") { +sql("DROP TABLE IF EXISTS test") +sql("create table test(id int,a struct,d array) stored by 'carbondata'") +sql("insert into test values(1,'2$3',4)") +val structException = intercept[MalformedCarbonCommandException]( +sql("update test set(a.b)=(4) where id=1").show(false)) +assertResult("Unsupported operation on Complex data type")(structException.getMessage) +val arrayException = intercept[MalformedCarbonCommandException]( +sql("update test set(a)=(4) where id=1").show(false)) +assertResult("Unsupported operation on Complex data type")(arrayException.getMessage) + } + + test("test block partition column") { +sql("DROP TABLE IF EXISTS list_table_area_origin") +val arrayException = intercept[AnalysisException]( +sql(""" + | CREATE TABLE IF NOT EXISTS list_table_area_origin + | ( + | id Int, + | vin string, + | logdate Timestamp, + | phonenumber Long, + | country array, + | salary Int + | ) + | PARTITIONED BY (area array) + | STORED BY 'carbondata' +""".stripMargin)) +assertResult("Cannot use array for partition column;")(arrayException.getMessage) +sql("DROP TABLE IF EXISTS list_table_area_origin") +val structException = intercept[AnalysisException]( + sql(""" +| CREATE TABLE IF NOT EXISTS list_table_area_origin +| ( +| id Int, +| vin string, +| logdate Timestamp, +| phonenumber Long, +| country array, 
+| salary Int +| ) +| PARTITIONED BY (area struct) +| STORED BY 'carbondata' + """.stripMargin) +) +assertResult("Cannot use struct for partition column;")(structException.getMessage) + } + + test("test block preaggregate") { +sql("DROP TABLE IF EXISTS test") +sql("create table test(id int,a struct) stored by 'carbondata'") +sql("insert into test values (1,2)") +sql("insert into test values (1,2)") +sql("insert into test values (1,2)") +val structException = intercept[MalformedCarbonCommandException]( + sql("create datamap preagg_sum on table test using 'preaggregate' as select id,sum(a.b) from test group by id")) +assertResult("Preaggregate is unsupported for ComplexData type column: a.b")(structException.getMessage) +sql("DROP TABLE IF EXISTS test") +sql("create table test(id int,a array) stored by 'carbondata'") +sql("insert into test values (1,2)") +val arrayException = intercept[MalformedCarbonCommandException]( + sql("create datamap preagg_sum on table test using 'preaggregate' as select id,sum(a[0]) from test group by id")) +assertResult("Preaggregate is unsupported for ComplexData type column: a[0]")(arrayException.getMessage) + } + + test("test block dictionary exclude for child column") { +sql("DROP TABLE IF EXISTS table1") +sql( + "create table table1 (roll int,a struct,j:int>) stored " + + "by " + + "'carbondata' tblproperties('dictionary_exclude'='a')") +sql("insert into table1 values(1,'1$abc$2$efg$3:mno:4$5')") +checkAnswer(sql("select a.b from table1"), Seq(Row(1))) +sql("DROP TABLE IF EXISTS table1") +val structException = intercept[MalformedCarbonCommandException]( +sql( + "create table table1 (roll int,a struct,j:int>) stored " + + "by " + + "'carbondata' tblproperties('dictionary_exclude'='a.b')")) +assertResult( + "DICTIONARY_EXCLUDE column: a.b does not exist in table or unsupported for complex child " + + "column. 
Please check create table statement.")( + structException.getMessage) +sql("DROP TABLE IF EXISTS table1") --- End diff -- The drop will not be executed if the assertion fails. Please make sure all tables are also deleted in afterAll. ---
[GitHub] carbondata pull request #2511: [CARBONDATA-2745] Added atomic file operation...
GitHub user kunal642 opened a pull request: https://github.com/apache/carbondata/pull/2511 [CARBONDATA-2745] Added atomic file operations for S3 Problem: AtomicFileOperationImpl creates a temporary file and then renames it to the actual file name. This is risky on S3 storage, where a rename means the file has to be deleted and then recreated. Solution: Create a separate implementation for S3 that writes directly to the target file name in overwrite mode. Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? - [ ] Any backward compatibility impacted? - [ ] Document update required? - [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/kunal642/carbondata s3atomicfileoperations Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2511.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2511 commit 8e84566cc5bd4a572ec31ad4a017ac7128de6998 Author: kunal642 Date: 2018-07-16T08:58:00Z Added atomic file operations for S3 ---
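The two write strategies contrasted in this description can be sketched against the local filesystem. The names below are illustrative; the real implementations sit behind CarbonData's file abstraction.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

public class AtomicWriteSketch {

    // HDFS-style: write a temp file, then rename it over the target.
    // Rename is cheap and atomic on HDFS, but on S3 it degrades to
    // copy plus delete of the whole object.
    static void writeViaRename(Path target, byte[] data) throws IOException {
        Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
        Files.write(tmp, data);
        Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
    }

    // S3-style: skip the temp file and write the final object directly in
    // overwrite mode, relying on the store's atomic PUT of the object.
    static void writeViaOverwrite(Path target, byte[] data) throws IOException {
        Files.write(target, data,
            StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING,
            StandardOpenOption.WRITE);
    }
}
```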
[GitHub] carbondata pull request #2484: [WIP] added hadoop conf to thread local
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2484#discussion_r202586005 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/impl/FileFactory.java --- @@ -43,11 +45,19 @@ */ private static final LogService LOGGER = LogServiceFactory.getLogService(FileFactory.class.getName()); - private static Configuration configuration = null; - static { -configuration = new Configuration(); + public static Configuration getConfiguration() { +Configuration configuration; +if (ThreadLocalSessionInfo.getCarbonSessionInfo() == null) { + configuration = new Configuration(); +} else { + CarbonConfiguration carbonConfiguration = + (CarbonConfiguration) ThreadLocalSessionInfo.getCarbonSessionInfo().getThreadParams() + .getExtraInfo("carbonConf", new CarbonConfiguration()); + configuration = carbonConfiguration.getConfiguration(); +} configuration.addResource(new Path("../core-default.xml")); --- End diff -- done ---
[GitHub] carbondata pull request #2484: [WIP] added hadoop conf to thread local
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2484#discussion_r202585892 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/impl/FileFactory.java --- @@ -43,11 +45,19 @@ */ private static final LogService LOGGER = LogServiceFactory.getLogService(FileFactory.class.getName()); - private static Configuration configuration = null; - static { -configuration = new Configuration(); + public static Configuration getConfiguration() { --- End diff -- removed config parameters from FileFactory ---
[GitHub] carbondata pull request #2484: [WIP] added hadoop conf to thread local
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2484#discussion_r202585994 --- Diff: core/src/main/java/org/apache/carbondata/core/util/CarbonConfiguration.java --- @@ -0,0 +1,73 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.carbondata.core.util; + +import java.io.ByteArrayInputStream; +import java.io.ByteArrayOutputStream; +import java.io.IOException; +import java.io.ObjectInputStream; +import java.io.ObjectOutputStream; +import java.io.Serializable; + +import org.apache.carbondata.core.datastore.compression.CompressorFactory; + +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.Path; + +public class CarbonConfiguration implements Serializable { + + private static final long serialVersionUID = 3811544264223154007L; + private transient Configuration configuration; + private byte[] confBytes; + + public CarbonConfiguration(Configuration configuration) { +ByteArrayOutputStream bao = new ByteArrayOutputStream(); +try { + ObjectOutputStream oos = new ObjectOutputStream(bao); + configuration.write(oos); + oos.close(); + this.confBytes = + CompressorFactory.getInstance().getCompressor().compressByte(bao.toByteArray()); +} catch (IOException e) { + throw new RuntimeException(e); +} + } + + public CarbonConfiguration() { +this.configuration = new Configuration(); + } + + public Configuration getConfiguration() { +if (configuration == null) { + if (confBytes == null) { +throw new RuntimeException("Configuration not specified"); + } + configuration = new Configuration(false); + ByteArrayInputStream bias = new ByteArrayInputStream( + CompressorFactory.getInstance().getCompressor().unCompressByte(confBytes)); + try { +ObjectInputStream ois = new ObjectInputStream(bias); +configuration.readFields(ois); +ois.close(); + } catch (IOException e) { +throw new RuntimeException(e); + } +} +configuration.addResource(new Path("../core-default.xml")); --- End diff -- done ---
[GitHub] carbondata pull request #2484: [WIP] added hadoop conf to thread local
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2484#discussion_r202585871 --- Diff: integration/spark-common/src/main/scala/org/apache/spark/rdd/UpdateCoalescedRDD.scala --- @@ -0,0 +1,89 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.rdd + +import scala.reflect.ClassTag + +import org.apache.spark._ + +import org.apache.carbondata.spark.rdd.CarbonRDD + + +// This RDD distributes previous RDD data based on number of nodes. i.e., one partition for one node + +class UpdateCoalescedRDD[T: ClassTag]( --- End diff -- removed ---
[GitHub] carbondata issue #2489: [CARBONDATA-2606][Complex DataType Enhancements]Fix ...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2489 LGTM ---
[GitHub] carbondata pull request #2489: [CARBONDATA-2606][Complex DataType Enhancemen...
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2489#discussion_r202076454

--- Diff: core/src/main/java/org/apache/carbondata/core/scan/collector/impl/DictionaryBasedResultCollector.java ---
@@ -140,6 +140,13 @@ public DictionaryBasedResultCollector(BlockExecutionInfo blockExecutionInfos) {
         continue;
       }
       fillMeasureData(scannedResult, row);
+      if (scannedResult.complexParentIndexToQueryMap.toString().contains("StructQueryType")) {
--- End diff --

put a comment explaining the logic.
---
[GitHub] carbondata pull request #2484: [WIP] added hadoop conf to thread local
GitHub user kunal642 opened a pull request:

    https://github.com/apache/carbondata/pull/2484

    [WIP] added hadoop conf to thread local

    Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:

     - [ ] Any interfaces changed?
     - [ ] Any backward compatibility impacted?
     - [ ] Document update required?
     - [ ] Testing done
           Please provide details on
           - Whether new unit test cases have been added or why no new tests are required?
           - How it is tested? Please attach test report.
           - Is it a performance related change? Please attach the performance test report.
           - Any additional information to help reviewers in testing this change.
     - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kunal642/carbondata hadoop_conf_fix

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2484.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2484

commit 89ff67e601ea842db2693c269f783d7ddc701c02
Author: kunal642
Date: 2018-07-10T06:38:41Z

    added hadoop conf to thread local

---
[GitHub] carbondata pull request #2472: [CARBONDATA-2717] fixed table id empty proble...
GitHub user kunal642 opened a pull request:

    https://github.com/apache/carbondata/pull/2472

    [CARBONDATA-2717] fixed table id empty problem while taking drop lock

    Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:

     - [ ] Any interfaces changed?
     - [ ] Any backward compatibility impacted?
     - [ ] Document update required?
     - [ ] Testing done
           Please provide details on
           - Whether new unit test cases have been added or why no new tests are required?
           - How it is tested? Please attach test report.
           - Is it a performance related change? Please attach the performance test report.
           - Any additional information to help reviewers in testing this change.
     - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kunal642/carbondata table_identfifier_fix

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2472.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2472

commit 25ca99908cd22b46cf295867d61c1f2c18c09249
Author: kunal642
Date: 2018-07-10T09:08:05Z

    fixed table id empty problem while taking drop lock

---
[GitHub] carbondata issue #2451: [CARBONDATA-2585][CARBONDATA-2586]Fix local dictiona...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2451 LGTM ---
[GitHub] carbondata issue #2447: [CARBONDATA-2589][CARBONDATA-2590][CARBONDATA-2602]L...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2447 LGTM ---
[GitHub] carbondata issue #2465: [WIP] Refactored CarbonFile interface
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2465 retest sdv please ---
[GitHub] carbondata pull request #2465: [WIP] Refactored CarbonFile interface
GitHub user kunal642 opened a pull request:

    https://github.com/apache/carbondata/pull/2465

    [WIP] Refactored CarbonFile interface

    Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:

     - [ ] Any interfaces changed?
     - [ ] Any backward compatibility impacted?
     - [ ] Document update required?
     - [ ] Testing done
           Please provide details on
           - Whether new unit test cases have been added or why no new tests are required?
           - How it is tested? Please attach test report.
           - Is it a performance related change? Please attach the performance test report.
           - Any additional information to help reviewers in testing this change.
     - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kunal642/carbondata fix_carbonfile_impl

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2465.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2465

commit 6c957c21357648b3b0754cda54525f8ab433c7a6
Author: kunal642
Date: 2018-07-09T07:16:54Z

    refactored CarbonFile interface

---
[GitHub] carbondata pull request #2451: [CARBONDATA-2585][CARBONDATA-2586]Fix local d...
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2451#discussion_r200894716

--- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/preaaggregate/PreAggregateTableHelper.scala ---
@@ -136,14 +136,33 @@ case class PreAggregateTableHelper(
         parentTable.getTableInfo.getFactTable.getTableProperties.asScala
           .getOrElse(CarbonCommonConstants.LOCAL_DICTIONARY_THRESHOLD,
             CarbonCommonConstants.LOCAL_DICTIONARY_THRESHOLD_DEFAULT))
-    tableProperties
-      .put(CarbonCommonConstants.LOCAL_DICTIONARY_INCLUDE,
-        parentTable.getTableInfo.getFactTable.getTableProperties.asScala
-          .getOrElse(CarbonCommonConstants.LOCAL_DICTIONARY_INCLUDE, ""))
-    tableProperties
-      .put(CarbonCommonConstants.LOCAL_DICTIONARY_EXCLUDE,
-        parentTable.getTableInfo.getFactTable.getTableProperties.asScala
-          .getOrElse(CarbonCommonConstants.LOCAL_DICTIONARY_EXCLUDE, ""))
+    val parentDictInclude = parentTable.getTableInfo.getFactTable.getTableProperties.asScala
+      .getOrElse(CarbonCommonConstants.LOCAL_DICTIONARY_INCLUDE, "").split(",")
+
+    val parentDictExclude = parentTable.getTableInfo.getFactTable.getTableProperties.asScala
+      .getOrElse(CarbonCommonConstants.LOCAL_DICTIONARY_EXCLUDE, "").split(",")
+
+    var newDictInclude = Seq[String]()
+    var newDictExclude = Seq[String]()
+    parentDictInclude.foreach(parentcol =>
+      fields.filter(col => fieldRelationMap(col).aggregateFunction.isEmpty &&
+        parentcol.equals(fieldRelationMap(col).
+          columnTableRelationList.get.head.parentColumnName))
+        .map(cols => newDictInclude :+= cols.column))
+    parentDictExclude.foreach(parentcol =>
+      fields.filter(col => fieldRelationMap(col).aggregateFunction.isEmpty &&
+        parentcol.equals(fieldRelationMap(col).
+          columnTableRelationList.get.head.parentColumnName))
+        .map(cols => newDictExclude :+= cols.column))
--- End diff --

change .filter(..).map(..) to .collect
---
[GitHub] carbondata pull request #2451: [CARBONDATA-2585][CARBONDATA-2586]Fix local d...
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2451#discussion_r200894679

--- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/preaaggregate/PreAggregateTableHelper.scala ---
@@ -136,14 +136,33 @@ case class PreAggregateTableHelper(
         parentTable.getTableInfo.getFactTable.getTableProperties.asScala
           .getOrElse(CarbonCommonConstants.LOCAL_DICTIONARY_THRESHOLD,
             CarbonCommonConstants.LOCAL_DICTIONARY_THRESHOLD_DEFAULT))
-    tableProperties
-      .put(CarbonCommonConstants.LOCAL_DICTIONARY_INCLUDE,
-        parentTable.getTableInfo.getFactTable.getTableProperties.asScala
-          .getOrElse(CarbonCommonConstants.LOCAL_DICTIONARY_INCLUDE, ""))
-    tableProperties
-      .put(CarbonCommonConstants.LOCAL_DICTIONARY_EXCLUDE,
-        parentTable.getTableInfo.getFactTable.getTableProperties.asScala
-          .getOrElse(CarbonCommonConstants.LOCAL_DICTIONARY_EXCLUDE, ""))
+    val parentDictInclude = parentTable.getTableInfo.getFactTable.getTableProperties.asScala
+      .getOrElse(CarbonCommonConstants.LOCAL_DICTIONARY_INCLUDE, "").split(",")
+
+    val parentDictExclude = parentTable.getTableInfo.getFactTable.getTableProperties.asScala
+      .getOrElse(CarbonCommonConstants.LOCAL_DICTIONARY_EXCLUDE, "").split(",")
+
+    var newDictInclude = Seq[String]()
+    var newDictExclude = Seq[String]()
+    parentDictInclude.foreach(parentcol =>
+      fields.filter(col => fieldRelationMap(col).aggregateFunction.isEmpty &&
--- End diff --

change .filter(..).map(..) to .collect
---
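The review suggestion above is about replacing the `foreach` loops that accumulate into mutable `Seq`s with a single declarative pass (Scala's `collect` over a partial function). The same idea can be sketched in Java; the `Field` class and its members here are hypothetical stand-ins for the PreAggregate field-relation data, not CarbonData's actual types.

```java
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

public class CollectSketch {

    // Hypothetical stand-in for one entry of the field-relation map.
    static class Field {
        final String column;
        final Optional<String> aggregateFunction;
        final String parentColumnName;
        Field(String column, Optional<String> agg, String parent) {
            this.column = column;
            this.aggregateFunction = agg;
            this.parentColumnName = parent;
        }
    }

    // One declarative pass replaces "foreach + mutable accumulator":
    // Scala:  fields.collect { case f if f.aggregateFunction.isEmpty
    //                               && parentCol == f.parentColumnName => f.column }
    static List<String> childDictColumns(String parentCol, List<Field> fields) {
        return fields.stream()
            .filter(f -> !f.aggregateFunction.isPresent() && parentCol.equals(f.parentColumnName))
            .map(f -> f.column)
            .collect(Collectors.toList());
    }
}
```

Besides being shorter, the single-expression form avoids the side effect hidden inside `.map(..)` in the original code, which only type-checks because the closure mutates an outer `var`.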
[GitHub] carbondata issue #2450: [CARBONDATA-2689] Added validations for complex colu...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2450 LGTM ---
[GitHub] carbondata pull request #2406: [CARBONDATA-2642] Added configurable Lock pat...
Github user kunal642 closed the pull request at: https://github.com/apache/carbondata/pull/2406 ---
[GitHub] carbondata pull request #2450: [CARBONDATA-2689] Added validations for compl...
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2450#discussion_r200610555

--- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/localdictionary/LocalDictionarySupportAlterTableTest.scala ---
@@ -822,17 +821,17 @@ class LocalDictionarySupportAlterTableTest extends QueryTest with BeforeAndAfter
     sql(
       """
         | CREATE TABLE local1(id int, name string, city string, age int,add string)
-        | STORED BY 'carbondata' tblproperties('local_dictionary_include'='city')
+        | STORED BY 'carbondata' tblproperties('local_dictionary_exclude'='city')
--- End diff --

change local_dictionary_exclude to local_dictionary_include as the name suggests
---
[GitHub] carbondata issue #2422: [CARBONDATA-2585][CARBONDATA-2586][Local Dictionary]...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2422 LGTM ---
[GitHub] carbondata pull request #2406: [CARBONDATA-2642] Added configurable Lock pat...
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2406#discussion_r200554082

--- Diff: processing/src/test/java/org/apache/carbondata/lcm/locks/LocalFileLockTest.java ---
@@ -68,4 +80,18 @@
     Assert.assertTrue(localLock2.unlock());
   }

+  @Test public void testConfigurablePathForLock() throws Exception {
+    Field f = secretClass.getDeclaredField("lockPath");
+    f.setAccessible(true);
+    f.set(secretClass, rootPath + "/target/");
+    AbsoluteTableIdentifier absoluteTableIdentifier = AbsoluteTableIdentifier
+        .from(CarbonProperties.getInstance().getProperty("carbon.storelocation"), "databaseName",
+            "tableName", "1");
--- End diff --

done
---
[GitHub] carbondata pull request #2406: [CARBONDATA-2642] Added configurable Lock pat...
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2406#discussion_r200554062

--- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonFileMetastore.scala ---
@@ -386,7 +386,8 @@ class CarbonFileMetastore extends CarbonMetaStore {
     val schemaMetadataPath = CarbonTablePath.getFolderContainingFile(schemaFilePath)
     val fileType = FileFactory.getFileType(schemaMetadataPath)
     if (!FileFactory.isFileExist(schemaMetadataPath, fileType)) {
-      val isDirCreated = FileFactory.mkdirs(schemaMetadataPath, fileType)
+      val isDirCreated = FileFactory
--- End diff --

removed
---
[GitHub] carbondata pull request #2406: [CARBONDATA-2642] Added configurable Lock pat...
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2406#discussion_r200554022

--- Diff: core/src/main/java/org/apache/carbondata/core/locks/CarbonLockFactory.java ---
@@ -105,4 +120,10 @@ private static void getLockTypeConfigured() {
     LOGGER.info("Configured lock type is: " + lockTypeConfigured);
   }

+  private static void getLockpath() {
+    lockPath = CarbonProperties.getInstance()
+        .getProperty(CarbonCommonConstants.LOCK_PATH, "")
+        .toUpperCase();
--- End diff --

done
---
[GitHub] carbondata pull request #2406: [CARBONDATA-2642] Added configurable Lock pat...
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2406#discussion_r200554007

--- Diff: core/src/main/java/org/apache/carbondata/core/locks/CarbonLockFactory.java ---
@@ -52,44 +55,56 @@
    */
   public static ICarbonLock getCarbonLockObj(AbsoluteTableIdentifier absoluteTableIdentifier,
       String lockFile) {
-
-    String tablePath = absoluteTableIdentifier.getTablePath();
+    String tablePath;
+    if (lockPath.isEmpty()) {
+      tablePath = absoluteTableIdentifier.getTablePath();
+    } else {
+      if (absoluteTableIdentifier
+          .getCarbonTableIdentifier().getTableId().isEmpty()) {
+        throw new RuntimeException("Table id is empty");
+      }
+      tablePath = lockPath + CarbonCommonConstants.FILE_SEPARATOR + absoluteTableIdentifier
+          .getCarbonTableIdentifier().getTableId();
+    }
     if (lockTypeConfigured.equals(CarbonCommonConstants.CARBON_LOCK_TYPE_ZOOKEEPER)) {
-      return new ZooKeeperLocking(absoluteTableIdentifier, lockFile);
-    } else if (tablePath.startsWith(CarbonCommonConstants.S3A_PREFIX) ||
-        tablePath.startsWith(CarbonCommonConstants.S3N_PREFIX) ||
-        tablePath.startsWith(CarbonCommonConstants.S3_PREFIX)) {
+      return new ZooKeeperLocking(tablePath, lockFile);
+    } else if (tablePath.startsWith(CarbonCommonConstants.S3A_PREFIX) || tablePath
+        .startsWith(CarbonCommonConstants.S3N_PREFIX) || tablePath
+        .startsWith(CarbonCommonConstants.S3_PREFIX)) {
       lockTypeConfigured = CarbonCommonConstants.CARBON_LOCK_TYPE_S3;
-      return new S3FileLock(absoluteTableIdentifier, lockFile);
+      return new S3FileLock(tablePath,
--- End diff --

Currently the configured lock type is only used for ZooKeeper locking; the others directly create the desired lock based on the path URI.
---
[GitHub] carbondata pull request #2406: [CARBONDATA-2642] Added configurable Lock pat...
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2406#discussion_r200553901

--- Diff: core/src/main/java/org/apache/carbondata/core/locks/CarbonLockFactory.java ---
@@ -52,44 +55,56 @@
    */
   public static ICarbonLock getCarbonLockObj(AbsoluteTableIdentifier absoluteTableIdentifier,
       String lockFile) {
-
-    String tablePath = absoluteTableIdentifier.getTablePath();
+    String tablePath;
+    if (lockPath.isEmpty()) {
+      tablePath = absoluteTableIdentifier.getTablePath();
+    } else {
+      if (absoluteTableIdentifier
+          .getCarbonTableIdentifier().getTableId().isEmpty()) {
+        throw new RuntimeException("Table id is empty");
+      }
+      tablePath = lockPath + CarbonCommonConstants.FILE_SEPARATOR + absoluteTableIdentifier
+          .getCarbonTableIdentifier().getTableId();
+    }
     if (lockTypeConfigured.equals(CarbonCommonConstants.CARBON_LOCK_TYPE_ZOOKEEPER)) {
-      return new ZooKeeperLocking(absoluteTableIdentifier, lockFile);
-    } else if (tablePath.startsWith(CarbonCommonConstants.S3A_PREFIX) ||
-        tablePath.startsWith(CarbonCommonConstants.S3N_PREFIX) ||
-        tablePath.startsWith(CarbonCommonConstants.S3_PREFIX)) {
+      return new ZooKeeperLocking(tablePath, lockFile);
+    } else if (tablePath.startsWith(CarbonCommonConstants.S3A_PREFIX) || tablePath
+        .startsWith(CarbonCommonConstants.S3N_PREFIX) || tablePath
+        .startsWith(CarbonCommonConstants.S3_PREFIX)) {
       lockTypeConfigured = CarbonCommonConstants.CARBON_LOCK_TYPE_S3;
-      return new S3FileLock(absoluteTableIdentifier, lockFile);
+      return new S3FileLock(tablePath,
--- End diff --

I think it is better to create an S3 file lock as the default for S3 paths, since we cannot be sure whether HDFS is present.
---
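The lock-path resolution logic being reviewed in the `CarbonLockFactory` diff can be condensed into two small rules: if a lock path is configured, locks live under `<lockPath>/<tableId>` (and an empty table id is an error); otherwise they live under the table path, with the path's scheme (not the configured lock type) selecting the S3 implementation. The following is an illustrative sketch with simplified names, not CarbonData's actual API:

```java
public class LockPathResolver {

    static final String FILE_SEPARATOR = "/";

    // When a lock path is configured, locks are created under
    // <lockPath>/<tableId>; otherwise under the table path itself.
    static String resolveLockDir(String lockPath, String tablePath, String tableId) {
        if (lockPath.isEmpty()) {
            return tablePath;
        }
        if (tableId.isEmpty()) {
            // mirrors the "Table id is empty" check in the diff
            throw new RuntimeException("Table id is empty");
        }
        return lockPath + FILE_SEPARATOR + tableId;
    }

    // The scheme check from the same diff: an s3/s3a/s3n path selects the
    // S3 lock implementation regardless of the configured lock type.
    static boolean isS3Path(String path) {
        return path.startsWith("s3a://") || path.startsWith("s3n://") || path.startsWith("s3://");
    }
}
```

Keying the configured lock directory by table id rather than table name is what the companion fix in PR #2472 depends on: the id is stable across renames, so concurrent operations always agree on the lock location.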
[GitHub] carbondata pull request #2406: [CARBONDATA-2642] Added configurable Lock pat...
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2406#discussion_r200553686

--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/filesystem/S3CarbonFile.java ---
@@ -0,0 +1,136 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.carbondata.core.datastore.filesystem;
+
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.carbondata.common.logging.LogService;
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.datastore.impl.FileFactory;
+import org.apache.carbondata.core.util.CarbonUtil;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FSDataOutputStream;
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.LocatedFileStatus;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.RemoteIterator;
+
+public class S3CarbonFile extends HDFSCarbonFile {
+
+  private static final LogService LOGGER =
+      LogServiceFactory.getLogService(HDFSCarbonFile.class.getName());
+
+  public S3CarbonFile(String filePath) {
+    super(filePath);
+  }
+
+  public S3CarbonFile(String filePath, Configuration hadoopConf) {
+    super(filePath, hadoopConf);
+  }
+
+  public S3CarbonFile(Path path) {
+    super(path);
+  }
+
+  public S3CarbonFile(Path path, Configuration hadoopConf) {
+    super(path, hadoopConf);
+  }
+
+  public S3CarbonFile(FileStatus fileStatus) {
+    super(fileStatus);
+  }
+
+  @Override
+  public boolean renameForce(String changetoName) {
+    FileSystem fs;
+    try {
+      fs = fileStatus.getPath().getFileSystem(hadoopConf);
+      return fs.rename(fileStatus.getPath(), new Path(changetoName));
+    } catch (IOException e) {
+      LOGGER.error("Exception occured: " + e.getMessage());
+      return false;
+    }
+  }
+
+  @Override
+  public DataOutputStream getDataOutputStreamUsingAppend(String path, FileFactory.FileType fileType)
+      throws IOException {
+    return getDataOutputStream(path, fileType, CarbonCommonConstants.BYTEBUFFER_SIZE, true);
+  }
+
+  @Override public DataOutputStream getDataOutputStream(String path, FileFactory.FileType fileType,
+      int bufferSize, boolean append) throws IOException {
+    Path pt = new Path(path);
+    FileSystem fileSystem = pt.getFileSystem(FileFactory.getConfiguration());
+    FSDataOutputStream stream;
+    if (append) {
--- End diff --

The HDFS client only honours append mode for DistributedFileSystem; for other file systems it creates the file with overwrite. Therefore we need to read the file and keep its contents in memory before overwriting it.
---
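The append workaround described in the reply above — read the existing bytes into memory, then rewrite the whole object with old + new content — can be sketched with plain `java.nio`. This is a minimal local-filesystem stand-in for the idea, not the actual Hadoop-based `S3CarbonFile` code:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class AppendByRewrite {

    // On stores without a real append (e.g. S3), "append" means:
    // read the existing bytes, concatenate the new ones, and overwrite
    // the whole object in one shot.
    static void append(Path file, byte[] extra) throws IOException {
        byte[] existing = Files.exists(file) ? Files.readAllBytes(file) : new byte[0];
        byte[] merged = new byte[existing.length + extra.length];
        System.arraycopy(existing, 0, merged, 0, existing.length);
        System.arraycopy(extra, 0, merged, existing.length, extra.length);
        // full overwrite, the only write mode an object store guarantees
        Files.write(file, merged,
            StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING);
    }
}
```

The obvious cost is that the whole file passes through memory on every "append", which is why this path is acceptable for small metadata files (locks, status files) but not for data files.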
[GitHub] carbondata pull request #2406: [CARBONDATA-2642] Added configurable Lock pat...
Github user kunal642 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2406#discussion_r200553521

--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/filesystem/S3CarbonFile.java ---
@@ -0,0 +1,136 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.carbondata.core.datastore.filesystem;
+
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.carbondata.common.logging.LogService;
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.datastore.impl.FileFactory;
+import org.apache.carbondata.core.util.CarbonUtil;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FSDataOutputStream;
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.LocatedFileStatus;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.RemoteIterator;
+
+public class S3CarbonFile extends HDFSCarbonFile {
+
+  private static final LogService LOGGER =
+      LogServiceFactory.getLogService(HDFSCarbonFile.class.getName());
+
+  public S3CarbonFile(String filePath) {
+    super(filePath);
+  }
+
+  public S3CarbonFile(String filePath, Configuration hadoopConf) {
+    super(filePath, hadoopConf);
+  }
+
+  public S3CarbonFile(Path path) {
+    super(path);
+  }
+
+  public S3CarbonFile(Path path, Configuration hadoopConf) {
+    super(path, hadoopConf);
+  }
+
+  public S3CarbonFile(FileStatus fileStatus) {
+    super(fileStatus);
+  }
+
+  @Override
+  public boolean renameForce(String changetoName) {
+    FileSystem fs;
--- End diff --

This is done because the majority of our existing code uses rename force. To handle this we will delete the destination directory and then recreate it with the new name.
---
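The "rename force" semantics explained in the reply — a plain rename fails when the destination exists, so the destination is removed first and the source is then moved into its place — can be sketched with `java.nio` for a single file. This is an illustrative local-filesystem analogue, not the Hadoop `FileSystem.rename` call used by `S3CarbonFile`:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class RenameForce {

    // Force-rename: delete an existing destination, then move the source
    // into its place. Returns false on any I/O failure, mirroring the
    // boolean contract of CarbonFile.renameForce.
    static boolean renameForce(Path src, Path dest) {
        try {
            Files.deleteIfExists(dest);   // clear the way, as the comment describes
            Files.move(src, dest);
            return true;
        } catch (IOException e) {
            return false;
        }
    }
}
```

Note the two steps are not atomic: a crash between the delete and the move loses the destination. On a real object store the same window exists, which is why the surrounding lock files exist to serialize these operations.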