[GitHub] carbondata pull request #2623: [CARBONDATA-2844] Pass SK/AK to executor by s...

2018-08-26 Thread kunal642
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2623#discussion_r212869553
  
--- Diff: 
integration/presto/src/main/java/org/apache/carbondata/presto/CarbondataPageSourceProvider.java
 ---
@@ -79,6 +80,7 @@
   @Override
   public ConnectorPageSource createPageSource(ConnectorTransactionHandle 
transactionHandle,
   ConnectorSession session, ConnectorSplit split, List 
columns) {
+ThreadLocalSessionInfo.getOrCreateCarbonSessionInfo();
--- End diff --

removed


---


[GitHub] carbondata issue #2623: [CARBONDATA-2844] Pass SK/AK to executor by serializ...

2018-08-26 Thread kunal642
Github user kunal642 commented on the issue:

https://github.com/apache/carbondata/pull/2623
  
@ravipesala fixed the comments


---


[GitHub] carbondata issue #2623: [CARBONDATA-2844] add sk ak to file factory on creat...

2018-08-26 Thread kunal642
Github user kunal642 commented on the issue:

https://github.com/apache/carbondata/pull/2623
  
@ravipesala Please review.


---


[GitHub] carbondata pull request #2655: [WIP][TEST] sk ak test

2018-08-23 Thread kunal642
Github user kunal642 closed the pull request at:

https://github.com/apache/carbondata/pull/2655


---


[GitHub] carbondata pull request #2655: [WIP][TEST] sk ak test

2018-08-23 Thread kunal642
GitHub user kunal642 opened a pull request:

https://github.com/apache/carbondata/pull/2655

[WIP][TEST] sk ak test

Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [ ] Any interfaces changed?
 
 - [ ] Any backward compatibility impacted?
 
 - [ ] Document update required?

 - [ ] Testing done
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance 
test report.
- Any additional information to help reviewers in testing this 
change.
   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kunal642/carbondata sk_ak_test

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2655.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2655


commit 1f95301cc74e042bdf71564d687b604646c16ce3
Author: kunal642 
Date:   2018-08-08T16:20:44Z

add sk ak to ThreadLocal in driver and executor.

commit 629fac6bc7f9e81ab91f8b300416ffeae3a6faa9
Author: kunal642 
Date:   2018-08-23T18:54:05Z

test PR




---


[GitHub] carbondata issue #2643: [Documentation] Formatting fix s3

2018-08-23 Thread kunal642
Github user kunal642 commented on the issue:

https://github.com/apache/carbondata/pull/2643
  
LGTM


---


[GitHub] carbondata pull request #2381: [WIP] fixed SparkCarbonFileFormat for flat fo...

2018-08-22 Thread kunal642
Github user kunal642 closed the pull request at:

https://github.com/apache/carbondata/pull/2381


---


[GitHub] carbondata pull request #2484: [WIP] added hadoop conf to thread local

2018-08-17 Thread kunal642
Github user kunal642 closed the pull request at:

https://github.com/apache/carbondata/pull/2484


---


[GitHub] carbondata pull request #2623: [CARBONDATA-2844] add sk ak to file factory o...

2018-08-09 Thread kunal642
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2623#discussion_r208935162
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/util/CarbonConfiguration.java ---
@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.carbondata.core.util;
+
+import java.io.Serializable;
+import java.util.HashMap;
+import java.util.Map;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.Path;
+
+public class CarbonConfiguration implements Serializable {
+
+  private static final long serialVersionUID = 3811544264223154007L;
+  private transient Configuration configuration;
+  private HashMap extraConf;
+
+  public CarbonConfiguration(Configuration configuration) {
+this.configuration = configuration;
+HashMap s3Conf = new HashMap();
+s3Conf.put("fs.s3a.access.key", 
configuration.get("fs.s3a.access.key"));
+s3Conf.put("fs.s3a.secret.key", 
configuration.get("fs.s3a.secret.key"));
--- End diff --

done


---


[GitHub] carbondata issue #2623: [CARBONDATA-2844] add sk ak to file factory on creat...

2018-08-09 Thread kunal642
Github user kunal642 commented on the issue:

https://github.com/apache/carbondata/pull/2623
  
retest this please


---


[GitHub] carbondata issue #2623: [HOTFIX] add sk ak to file factory on creation of ca...

2018-08-09 Thread kunal642
Github user kunal642 commented on the issue:

https://github.com/apache/carbondata/pull/2623
  
Retest this please


---


[GitHub] carbondata pull request #2623: [HOTFIX] add sk ak to file factory on creatio...

2018-08-08 Thread kunal642
GitHub user kunal642 opened a pull request:

https://github.com/apache/carbondata/pull/2623

[HOTFIX] add sk ak to file factory on creation of carbon env

add SK AK to file factory on creation of carbon env

Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [ ] Any interfaces changed?
 
 - [ ] Any backward compatibility impacted?
 
 - [ ] Document update required?

 - [ ] Testing done
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance 
test report.
- Any additional information to help reviewers in testing this 
change.
   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kunal642/carbondata sk_ak_fix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2623.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2623


commit 3a69dc5818ee9d31f195a5737bffd1b80298ed3f
Author: kunal642 
Date:   2018-08-08T16:20:44Z

add sk ak to file factory on creation of carbon env




---


[GitHub] carbondata pull request #2612: [CARBONDATA-2834] Remove unnecessary nested l...

2018-08-06 Thread kunal642
GitHub user kunal642 opened a pull request:

https://github.com/apache/carbondata/pull/2612

[CARBONDATA-2834] Remove unnecessary nested looping over 
loadMetadatadetails.

removed nested for loop which causes query performance degradation if…

Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [ ] Any interfaces changed?
 
 - [ ] Any backward compatibility impacted?
 
 - [ ] Document update required?

 - [ ] Testing done
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance 
test report.
- Any additional information to help reviewers in testing this 
change.
   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kunal642/carbondata nestedloop_fix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2612.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2612


commit ebe22d331dc4ea4ef6904e779702801c3eb5d859
Author: kunal642 
Date:   2018-08-06T12:47:28Z

removed nested for loop which causes query performance degradation if 
number of segments are too many




---


[GitHub] carbondata issue #2603: [Documentation] Editorial review comment fixed

2018-08-03 Thread kunal642
Github user kunal642 commented on the issue:

https://github.com/apache/carbondata/pull/2603
  
LGTM


---


[GitHub] carbondata pull request #2576: [CARBONDATA-2795] Add documentation for S3

2018-08-02 Thread kunal642
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2576#discussion_r207250192
  
--- Diff: docs/s3-guide.md ---
@@ -0,0 +1,64 @@
+
+
+#S3 Guide (Alpha Feature 1.4.1)
+S3 is an Object Storage API on cloud, it is recommended for storing large 
data files. You can use 
+this feature if you want to store data on Amazon cloud or Huawei 
cloud(OBS).
+Since the data is stored on to cloud there are no restrictions on the size 
of data and the data can be accessed from anywhere at any time.
+Carbondata can support any Object Storage that conforms to Amazon S3 API.
--- End diff --

merged


---


[GitHub] carbondata pull request #2576: [CARBONDATA-2795] Add documentation for S3

2018-08-02 Thread kunal642
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2576#discussion_r207250096
  
--- Diff: docs/configuration-parameters.md ---
@@ -106,7 +106,10 @@ This section provides the details of all the 
configurations required for CarbonD
 
|-|--|-|
 | carbon.sort.file.write.buffer.size | 16384 | File write buffer size used 
during sorting. Minimum allowed buffer size is 10240 byte and Maximum allowed 
buffer size is 10485760 byte. |
 | carbon.lock.type | LOCALLOCK | This configuration specifies the type of 
lock to be acquired during concurrent operations on table. There are following 
types of lock implementation: - LOCALLOCK: Lock is created on local file system 
as file. This lock is useful when only one spark driver (thrift server) runs on 
a machine and no other CarbonData spark application is launched concurrently. - 
HDFSLOCK: Lock is created on HDFS file system as file. This lock is useful when 
multiple CarbonData spark applications are launched and no ZooKeeper is running 
on cluster and HDFS supports file based locking. |
-| carbon.lock.path | TABLEPATH | This configuration specifies the path 
where lock files have to be created. Recommended to configure zookeeper lock 
type or configure HDFS lock path(to this property) in case of S3 file system as 
locking is not feasible on S3.
+| carbon.lock.path | TABLEPATH | This configuration specifies the path 
where lock files have to 
+be created. Recommended to configure HDFS lock path(to this property) in 
case of S3 file system 
+as locking is not feasible on S3. 
+**Note:** If this property is not set to HDFS location for S3 store, then 
there is a possibility of data corruption. 
--- End diff --

done


---


[GitHub] carbondata pull request #2576: [CARBONDATA-2795] Add documentation for S3

2018-08-02 Thread kunal642
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2576#discussion_r207250066
  
--- Diff: docs/configuration-parameters.md ---
@@ -106,7 +106,10 @@ This section provides the details of all the 
configurations required for CarbonD
 
|-|--|-|
 | carbon.sort.file.write.buffer.size | 16384 | File write buffer size used 
during sorting. Minimum allowed buffer size is 10240 byte and Maximum allowed 
buffer size is 10485760 byte. |
 | carbon.lock.type | LOCALLOCK | This configuration specifies the type of 
lock to be acquired during concurrent operations on table. There are following 
types of lock implementation: - LOCALLOCK: Lock is created on local file system 
as file. This lock is useful when only one spark driver (thrift server) runs on 
a machine and no other CarbonData spark application is launched concurrently. - 
HDFSLOCK: Lock is created on HDFS file system as file. This lock is useful when 
multiple CarbonData spark applications are launched and no ZooKeeper is running 
on cluster and HDFS supports file based locking. |
-| carbon.lock.path | TABLEPATH | This configuration specifies the path 
where lock files have to be created. Recommended to configure zookeeper lock 
type or configure HDFS lock path(to this property) in case of S3 file system as 
locking is not feasible on S3.
+| carbon.lock.path | TABLEPATH | This configuration specifies the path 
where lock files have to 
--- End diff --

added description


---


[GitHub] carbondata pull request #2576: [CARBONDATA-2795] Add documentation for S3

2018-08-02 Thread kunal642
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2576#discussion_r207249973
  
--- Diff: docs/data-management-on-carbondata.md ---
@@ -730,6 +736,8 @@ Users can specify which columns to include and exclude 
for local dictionary gene
   * If the IGNORE option is used, then bad records are neither loaded nor 
written to the separate CSV file.
   * In loaded data, if all records are bad records, the BAD_RECORDS_ACTION 
is invalid and the load operation fails.
   * The maximum number of characters per column is 32000. If there are 
more than 32000 characters in a column, data loading will fail.
+  * Since Bad Records Path can be specified in both create, load and 
carbon properties. 
--- End diff --

done


---


[GitHub] carbondata pull request #2576: [CARBONDATA-2795] Add documentation for S3

2018-08-02 Thread kunal642
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2576#discussion_r207249849
  
--- Diff: docs/datamap/preaggregate-datamap-guide.md ---
@@ -7,6 +24,7 @@
 * [Querying Data](#querying-data)
 * [Compaction](#compacting-pre-aggregate-tables)
 * [Data Management](#data-management-with-pre-aggregate-tables)
+* [Limitations](#Limitations)
--- End diff --

removed


---


[GitHub] carbondata pull request #2576: [CARBONDATA-2795] Add documentation for S3

2018-08-02 Thread kunal642
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2576#discussion_r207249941
  
--- Diff: docs/s3-guide.md ---
@@ -0,0 +1,63 @@
+
+
+#S3 Guide (Alpha Feature 1.4.1)
+Amazon S3 is a cloud storage service that is recommended for storing large 
data files. You can 
+use this feature if you want to store data on amazon cloud. Since the data 
is stored on to cloud 
+storage there are no restrictions on the size of data and the data can be 
accessed from anywhere at any time.
+Carbon can support any Object store that conforms to Amazon S3 API. 
+
+#Writing to Object Store
+To store carbondata files on to Object Store location, you need to set 
`carbon
+.storelocation` property to Object Store path in CarbonProperties file. 
For example, carbon
+.storelocation=s3a://mybucket/carbonstore. By setting this property, all 
the tables will be created on the specified Object Store path.
+
+If your existing store is HDFS, and you want to store specific tables on 
S3 location, then `location` parameter has to be set during create 
+table. 
+For example:
+
+```
+CREATE TABLE IF NOT EXISTS db1.table1(col1 string, col2 int) STORED AS 
carbondata LOCATION 's3a://mybucket/carbonstore'
+``` 
+
+For more details on create table, Refer 
[data-management-on-carbondata](https://github.com/apache/carbondata/blob/master/docs/data-management-on-carbondata.md#create-table)
+
+#Authentication
+You need to set authentication properties to store the carbondata files on 
to S3 location. For 
+more details on authentication properties, refer 
+[hadoop authentication 
document](https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#Authentication_properties)
+
+Another way of setting the authentication parameters is as follows:
+
+```
+ SparkSession
+ .builder()
+ .master(masterURL)
+ .appName("S3Example")
+ .config("spark.driver.host", "localhost")
+ .config("spark.hadoop.fs.s3a.access.key", "")
+ .config("spark.hadoop.fs.s3a.secret.key", "")
+ .config("spark.hadoop.fs.s3a.endpoint", "1.1.1.1")
+ .getOrCreateCarbonSession()
+```
+
+#Recommendations
+1. Object stores like S3 does not support file leasing mechanism(supported 
by HDFS) that is 
+required to take locks which ensure consistency between concurrent 
operations therefore, it is 
+recommended to set the configurable lock path 
property([carbon.lock.path](https://github.com/apache/carbondata/blob/master/docs/configuration-parameters.md#miscellaneous-configuration))
+ to a HDFS directory.
+2. As Object stores are eventual consistent meaning that any put request 
can take some time to reflect when trying to list objects from that bucket 
therefore concurrent queries are not supported. 
--- End diff --

done


---


[GitHub] carbondata pull request #2576: [CARBONDATA-2795] Add documentation for S3

2018-08-02 Thread kunal642
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2576#discussion_r207249910
  
--- Diff: docs/s3-guide.md ---
@@ -0,0 +1,63 @@
+
+
+#S3 Guide (Alpha Feature 1.4.1)
+Amazon S3 is a cloud storage service that is recommended for storing large 
data files. You can 
--- End diff --

done


---


[GitHub] carbondata pull request #2576: [CARBONDATA-2795] Add documentation for S3

2018-08-02 Thread kunal642
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2576#discussion_r207249485
  
--- Diff: docs/configuration-parameters.md ---
@@ -106,7 +106,12 @@ This section provides the details of all the 
configurations required for CarbonD
 
|-|--|-|
 | carbon.sort.file.write.buffer.size | 16384 | File write buffer size used 
during sorting. Minimum allowed buffer size is 10240 byte and Maximum allowed 
buffer size is 10485760 byte. |
 | carbon.lock.type | LOCALLOCK | This configuration specifies the type of 
lock to be acquired during concurrent operations on table. There are following 
types of lock implementation: - LOCALLOCK: Lock is created on local file system 
as file. This lock is useful when only one spark driver (thrift server) runs on 
a machine and no other CarbonData spark application is launched concurrently. - 
HDFSLOCK: Lock is created on HDFS file system as file. This lock is useful when 
multiple CarbonData spark applications are launched and no ZooKeeper is running 
on cluster and HDFS supports file based locking. |
-| carbon.lock.path | TABLEPATH | This configuration specifies the path 
where lock files have to be created. Recommended to configure zookeeper lock 
type or configure HDFS lock path(to this property) in case of S3 file system as 
locking is not feasible on S3.
+| carbon.lock.path | TABLEPATH | Locks on the files are used to prevent 
concurrent operation from modifying the same files. This 
+configuration specifies the path where lock files have to be created. 
Recommended to configure 
+HDFS lock path(to this property) in case of S3 file system as locking is 
not feasible on S3. 
+**Note:** If this property is not set to HDFS location for S3 store, then 
there is a possibility 
+of data corruption because multiple data manipulation calls might try to 
update the status file 
+and as lock is not acquired before updation data might get overwritten.
--- End diff --

added


---


[GitHub] carbondata issue #2599: [CARBONDATA-2812] Implement freeMemory for complex p...

2018-08-02 Thread kunal642
Github user kunal642 commented on the issue:

https://github.com/apache/carbondata/pull/2599
  
LGTM


---


[GitHub] carbondata issue #2571: [CARBONDATA-2792][schema restructure] Create externa...

2018-08-02 Thread kunal642
Github user kunal642 commented on the issue:

https://github.com/apache/carbondata/pull/2571
  
LGTM


---


[GitHub] carbondata issue #2600: [CARBONDATA-2813] Fixed code to get data size from L...

2018-08-02 Thread kunal642
Github user kunal642 commented on the issue:

https://github.com/apache/carbondata/pull/2600
  
retest this please


---


[GitHub] carbondata issue #2576: [CARBONDATA-2795] Add documentation for S3

2018-08-02 Thread kunal642
Github user kunal642 commented on the issue:

https://github.com/apache/carbondata/pull/2576
  
retest this please


---


[GitHub] carbondata pull request #2600: [CARBONDATA-2813] Fixed code to get data size...

2018-08-02 Thread kunal642
GitHub user kunal642 opened a pull request:

https://github.com/apache/carbondata/pull/2600

[CARBONDATA-2813] Fixed code to get data size from LoadDetails if size is 
written there.

Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [ ] Any interfaces changed?
 
 - [ ] Any backward compatibility impacted?
 
 - [ ] Document update required?

 - [ ] Testing done
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance 
test report.
- Any additional information to help reviewers in testing this 
change.
   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kunal642/carbondata major_comp

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2600.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2600


commit 44b66bb1e4fc2621c3641f536247d400a963cff1
Author: kunal642 
Date:   2018-08-02T06:14:20Z

Fixed code to get data size from LoadDetails if size is written there.




---


[GitHub] carbondata issue #2578: [CARBONDATA-2798] Fix Dictionary_Include for Complex...

2018-07-31 Thread kunal642
Github user kunal642 commented on the issue:

https://github.com/apache/carbondata/pull/2578
  
LGTM


---


[GitHub] carbondata pull request #2578: [CARBONDATA-2798] Fix Dictionary_Include for ...

2018-07-31 Thread kunal642
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2578#discussion_r206531160
  
--- Diff: 
integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestNonTransactionalCarbonTable.scala
 ---
@@ -52,10 +52,11 @@ import 
org.apache.carbondata.core.datastore.page.encoding.DefaultEncodingFactory
 import org.apache.carbondata.core.metadata.ColumnarFormatVersion
 import org.apache.carbondata.core.metadata.datatype.DataTypes
 import org.apache.carbondata.core.util.{CarbonProperties, CarbonUtil, 
DataFileFooterConverterV3}
+import 
org.apache.carbondata.processing.loading.exception.CarbonDataLoadingException
 import org.apache.carbondata.sdk.file._
 
 
-class TestNonTransactionalCarbonTable extends QueryTest with 
BeforeAndAfterAll {
+class TestNonTransactiogitnalCarbonTable extends QueryTest with 
BeforeAndAfterAll {
--- End diff --

Please revert this change


---


[GitHub] carbondata pull request #2571: [CARBONDATA-2792][schema restructure] Create ...

2018-07-31 Thread kunal642
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2571#discussion_r206480372
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/carbondata/spark/util/CarbonSparkUtil.scala
 ---
@@ -87,18 +88,20 @@ object CarbonSparkUtil {
 val fields = new Array[String](
   carbonRelation.dimensionsAttr.size + carbonRelation.measureAttr.size)
 val carbonTable = carbonRelation.carbonTable
+val columnSchemas: mutable.Buffer[ColumnSchema] = 
carbonTable.getTableInfo.getFactTable.
+  getListOfColumns.asScala.filter(_.getSchemaOrdinal != 
-1).filter(!_.isInvisible).
--- End diff --

both filter conditions can be combined to one like .filter(a>2 && b<10).


---


[GitHub] carbondata pull request #2578: Fix Dictionary_Include for ComplexDataType

2018-07-30 Thread kunal642
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2578#discussion_r206396378
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/loading/model/LoadOption.java
 ---
@@ -236,7 +236,7 @@
   }
 }
 
-if (carbonLoadModel.isCarbonTransactionalTable() && 
!CarbonDataProcessorUtil
+if (hadoopConf != null && !CarbonDataProcessorUtil
--- End diff --

why hadoopConf null check is needed?


---


[GitHub] carbondata issue #2576: [CARBONDATA-2795] Add documentation for S3

2018-07-30 Thread kunal642
Github user kunal642 commented on the issue:

https://github.com/apache/carbondata/pull/2576
  
@sraghunandan @sgururajshetty @chenliang613 Please review


---


[GitHub] carbondata issue #2578: Fix Dictionary_Include for ComplexDataType

2018-07-30 Thread kunal642
Github user kunal642 commented on the issue:

https://github.com/apache/carbondata/pull/2578
  
Please add a test case to verify the fix



---


[GitHub] carbondata pull request #2578: Fix Dictionary_Include for ComplexDataType

2018-07-30 Thread kunal642
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2578#discussion_r206091924
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/datatypes/ArrayDataType.java
 ---
@@ -172,7 +191,7 @@ public void setSurrogateIndex(int surrIndex) {
 
   @Override
   public void fillCardinality(List dimCardWithComplex) {
-if (children.getIsColumnDictionary()) {
+if (this.isDictionaryColumn) {
--- End diff --

return isDictionaryColumn from getIsColumnDictionary() also.


---


[GitHub] carbondata issue #2517: [CARBONDATA-2749][dataload] In HDFS Empty tablestatu...

2018-07-30 Thread kunal642
Github user kunal642 commented on the issue:

https://github.com/apache/carbondata/pull/2517
  
LGTM. The failures in 2.2.1 build are note related to this PR. Those are 
failing on master as well. Need to fix.


---


[GitHub] carbondata pull request #2563: WIP test

2018-07-29 Thread kunal642
Github user kunal642 closed the pull request at:

https://github.com/apache/carbondata/pull/2563


---


[GitHub] carbondata pull request #2576: [CARBONDATA-2795] disable preagg datamap on d...

2018-07-29 Thread kunal642
GitHub user kunal642 opened a pull request:

https://github.com/apache/carbondata/pull/2576

[CARBONDATA-2795] disable preagg datamap on dataload for s3 store

Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [ ] Any interfaces changed?
 
 - [ ] Any backward compatibility impacted?
 
 - [ ] Document update required?

 - [ ] Testing done
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance 
test report.
- Any additional information to help reviewers in testing this 
change.
   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kunal642/carbondata disable_preagg

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2576.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2576


commit ee154f1e94e8ada5fcba08cda3534e9d80ee506a
Author: kunal642 
Date:   2018-07-29T16:14:22Z

disable preagg datamap on dataload for s3 store




---


[GitHub] carbondata issue #2575: [WIP] fixed for ModularPlan exception during update ...

2018-07-29 Thread kunal642
Github user kunal642 commented on the issue:

https://github.com/apache/carbondata/pull/2575
  
@rahulforallp Please fix the build.


---


[GitHub] carbondata issue #2562: [HOTFIX] CreateDataMapPost Event was skipped in case...

2018-07-29 Thread kunal642
Github user kunal642 commented on the issue:

https://github.com/apache/carbondata/pull/2562
  
LGTM


---


[GitHub] carbondata issue #2552: [CARBONDATA-2781] Added fix for Null Pointer Excpeti...

2018-07-29 Thread kunal642
Github user kunal642 commented on the issue:

https://github.com/apache/carbondata/pull/2552
  
@praveenmeenakshi56 Please add detailed description stating the issue and 
the solution(Why specific handling for preagg and timeseries is required).


---


[GitHub] carbondata issue #2553: [HOTFIX] Fixed random test failure

2018-07-29 Thread kunal642
Github user kunal642 commented on the issue:

https://github.com/apache/carbondata/pull/2553
  
@mohammadshahidkhan Please add description


---


[GitHub] carbondata issue #2563: WIP test

2018-07-26 Thread kunal642
Github user kunal642 commented on the issue:

https://github.com/apache/carbondata/pull/2563
  
retest this please


---


[GitHub] carbondata issue #2563: WIP test

2018-07-26 Thread kunal642
Github user kunal642 commented on the issue:

https://github.com/apache/carbondata/pull/2563
  
retest this please


---


[GitHub] carbondata pull request #2563: WIP test

2018-07-26 Thread kunal642
GitHub user kunal642 opened a pull request:

https://github.com/apache/carbondata/pull/2563

WIP test

Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [ ] Any interfaces changed?
 
 - [ ] Any backward compatibility impacted?
 
 - [ ] Document update required?

 - [ ] Testing done
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance 
test report.
- Any additional information to help reviewers in testing this 
change.
   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kunal642/carbondata temp

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2563.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2563


commit f3e6792f2b96df0852c43cf816283a8f7dc64a4b
Author: kunal642 
Date:   2018-07-12T11:39:54Z

added hadoop conf in thread params




---


[GitHub] carbondata issue #2549: [CARBONDATA-2606][Complex DataType Enhancements]Fix ...

2018-07-25 Thread kunal642
Github user kunal642 commented on the issue:

https://github.com/apache/carbondata/pull/2549
  
@dhatchayani Please rebase


---


[GitHub] carbondata issue #2535: [CARBONDATA-2606]Fix Complex array Pushdown

2018-07-25 Thread kunal642
Github user kunal642 commented on the issue:

https://github.com/apache/carbondata/pull/2535
  
LGTM


---


[GitHub] carbondata pull request #2535: [CARBONDATA-2606]Fix Complex array Pushdown

2018-07-24 Thread kunal642
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2535#discussion_r204983701
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonLoadDataCommand.scala
 ---
@@ -77,6 +77,7 @@ import 
org.apache.carbondata.spark.dictionary.provider.SecureDictionaryServicePr
 import org.apache.carbondata.spark.dictionary.server.SecureDictionaryServer
 import org.apache.carbondata.spark.load.{CsvRDDHelper, 
DataLoadProcessorStepOnSpark}
 import org.apache.carbondata.spark.rdd.CarbonDataRDDFactory
+import org.apache.carbondata.spark.rdd.CarbonDataRDDFactory.LOGGER
--- End diff --

no need for this import. CarbonLoadDataCommand already has a LOGGER


---


[GitHub] carbondata issue #2514: [CARBONDATA-2740]segment file is not getting deleted...

2018-07-24 Thread kunal642
Github user kunal642 commented on the issue:

https://github.com/apache/carbondata/pull/2514
  
LGTM


---


[GitHub] carbondata pull request #2548: [CARBONDATA-2778]Fixed bug when select after ...

2018-07-24 Thread kunal642
GitHub user kunal642 opened a pull request:

https://github.com/apache/carbondata/pull/2548

[CARBONDATA-2778]Fixed bug when select after delete and cleanup is showing 
empty records

Problem: In case if delete operation when it is found that the data being 
deleted is leading to a state where one complete block data is getting deleted. 
In that case the status if that block is marked for delete and during the next 
delete operation run the block is deleted along with its carbonIndex file. The 
problem arises due to deletion of carbonIndex file because for multiple blocks 
there can be one carbonIndex file as one carbonIndex file represents one task.

Solution: Do not delete the carbondata and carbonIndex file. After 
compaction it will automatically take care of deleting the stale data and stale 
segments.

Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [ ] Any interfaces changed?
 
 - [ ] Any backward compatibility impacted?
 
 - [ ] Document update required?

 - [ ] Testing done
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance 
test report.
- Any additional information to help reviewers in testing this 
change.
   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kunal642/carbondata iud_fix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2548.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2548


commit b5f9c3c857a93d7633580417f63e9c745cbcbbe6
Author: kunal642 
Date:   2018-07-24T10:42:54Z

Fixed bug when select after delete and cleanup is showing empty records




---


[GitHub] carbondata issue #2484: [HOTFIX] added hadoop conf to thread local

2018-07-24 Thread kunal642
Github user kunal642 commented on the issue:

https://github.com/apache/carbondata/pull/2484
  
retest this please


---


[GitHub] carbondata pull request #2535: [CARBONDATA-2606]Fix Complex array Pushdown

2018-07-24 Thread kunal642
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2535#discussion_r204642515
  
--- Diff: 
integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/complexType/TestComplexDataType.scala
 ---
@@ -885,4 +885,47 @@ class TestComplexDataType extends QueryTest with 
BeforeAndAfterAll {
 checkExistence(sql("select * from table1"),true,"1.0E9")
   }
 
+  test("test block compaction - auto merge") {
+sql("DROP TABLE IF EXISTS table1")
+CarbonProperties.getInstance()
+  .addProperty(CarbonCommonConstants.ENABLE_AUTO_LOAD_MERGE, "true")
+sql(
+  "create table table1 (roll int,person 
Struct) stored " +
+  "by 'carbondata'")
+sql(
+  "load data inpath '" + resourcesPath +
+  "/Struct.csv' into table table1 options('delimiter'=','," +
+  
"'quotechar'='\"','fileheader'='roll,person','complex_delimiter_level_1'='$'," +
+  "'complex_delimiter_level_2'='&')")
+sql(
+  "load data inpath '" + resourcesPath +
+  "/Struct.csv' into table table1 options('delimiter'=','," +
+  
"'quotechar'='\"','fileheader'='roll,person','complex_delimiter_level_1'='$'," +
+  "'complex_delimiter_level_2'='&')")
+sql(
+  "load data inpath '" + resourcesPath +
+  "/Struct.csv' into table table1 options('delimiter'=','," +
+  
"'quotechar'='\"','fileheader'='roll,person','complex_delimiter_level_1'='$'," +
+  "'complex_delimiter_level_2'='&')")
+sql(
+  "load data inpath '" + resourcesPath +
+  "/Struct.csv' into table table1 options('delimiter'=','," +
+  
"'quotechar'='\"','fileheader'='roll,person','complex_delimiter_level_1'='$'," +
+  "'complex_delimiter_level_2'='&')")
+checkExistence(sql("show segments for table table1"),false, 
"Compacted")
+CarbonProperties.getInstance()
+  .addProperty(CarbonCommonConstants.ENABLE_AUTO_LOAD_MERGE, "false")
--- End diff --

add this in afterAll too


---


[GitHub] carbondata issue #2511: [CARBONDATA-2745] Added atomic file operations for S...

2018-07-24 Thread kunal642
Github user kunal642 commented on the issue:

https://github.com/apache/carbondata/pull/2511
  
@gvramana Build passed. Please review


---


[GitHub] carbondata pull request #2535: [CARBONDATA-2606]Fix Complex array Pushdown

2018-07-23 Thread kunal642
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2535#discussion_r204296432
  
--- Diff: 
integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/complexType/TestComplexDataType.scala
 ---
@@ -885,4 +885,36 @@ class TestComplexDataType extends QueryTest with 
BeforeAndAfterAll {
 checkExistence(sql("select * from table1"),true,"1.0E9")
   }
 
+  test("test block compaction - auto merge") {
+sql("DROP TABLE IF EXISTS table1")
+CarbonProperties.getInstance()
+  .addProperty(CarbonCommonConstants.ENABLE_AUTO_LOAD_MERGE, "true")
+sql(
+  "create table table1 (roll int,person 
Struct) stored " +
+  "by 'carbondata'")
+sql(
+  "load data inpath '" + resourcesPath +
+  "/Struct.csv' into table table1 options('delimiter'=','," +
+  
"'quotechar'='\"','fileheader'='roll,person','complex_delimiter_level_1'='$'," +
+  "'complex_delimiter_level_2'='&')")
+sql(
+  "load data inpath '" + resourcesPath +
+  "/Struct.csv' into table table1 options('delimiter'=','," +
+  
"'quotechar'='\"','fileheader'='roll,person','complex_delimiter_level_1'='$'," +
+  "'complex_delimiter_level_2'='&')")
+sql(
+  "load data inpath '" + resourcesPath +
+  "/Struct.csv' into table table1 options('delimiter'=','," +
+  
"'quotechar'='\"','fileheader'='roll,person','complex_delimiter_level_1'='$'," +
+  "'complex_delimiter_level_2'='&')")
+sql(
+  "load data inpath '" + resourcesPath +
+  "/Struct.csv' into table table1 options('delimiter'=','," +
+  
"'quotechar'='\"','fileheader'='roll,person','complex_delimiter_level_1'='$'," +
+  "'complex_delimiter_level_2'='&')")
+checkAnswer(sql("select count(*) from table1"),Seq(Row(40)))
--- End diff --

check for segments whether compaction has happened or not


---


[GitHub] carbondata pull request #2535: [CARBONDATA-2606]Fix Complex array Pushdown

2018-07-23 Thread kunal642
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2535#discussion_r204296035
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala
 ---
@@ -578,13 +578,19 @@ object CarbonDataRDDFactory {
 if (carbonTable.isHivePartitionTable) {
   carbonLoadModel.setFactTimeStamp(System.currentTimeMillis())
 }
-val compactedSegments = new util.ArrayList[String]()
-handleSegmentMerging(sqlContext,
-  carbonLoadModel,
-  carbonTable,
-  compactedSegments,
-  operationContext)
-carbonLoadModel.setMergedSegmentIds(compactedSegments)
+// Block compaction for table containing complex datatype
+if (carbonTable.getTableInfo.getFactTable.getListOfColumns.asScala
+  .exists(m => m.getDataType.isComplexType)) {
+  LOGGER.info("Compaction is skipped as table contains complex 
columns")
--- End diff --

change to warn


---


[GitHub] carbondata pull request #2535: [CARBONDATA-2606]Fix Complex array Pushdown

2018-07-23 Thread kunal642
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2535#discussion_r204296013
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonLoadDataCommand.scala
 ---
@@ -823,15 +824,21 @@ case class CarbonLoadDataCommand(
 }
 try {
   carbonLoadModel.setFactTimeStamp(System.currentTimeMillis())
-  val compactedSegments = new util.ArrayList[String]()
-  // Trigger auto compaction
-  CarbonDataRDDFactory.handleSegmentMerging(
-sparkSession.sqlContext,
-carbonLoadModel,
-table,
-compactedSegments,
-operationContext)
-  carbonLoadModel.setMergedSegmentIds(compactedSegments)
+  // Block compaction for table containing complex datatype
+  if (table.getTableInfo.getFactTable.getListOfColumns.asScala
+.exists(m => m.getDataType.isComplexType)) {
+LOGGER.info("Compaction is skipped as table contains complex 
columns")
--- End diff --

change to warn


---


[GitHub] carbondata issue #2511: [CARBONDATA-2745] Added atomic file operations for S...

2018-07-23 Thread kunal642
Github user kunal642 commented on the issue:

https://github.com/apache/carbondata/pull/2511
  
retest this please


---


[GitHub] carbondata pull request #2536: [CARBONDATA-2766] Added null check on filesta...

2018-07-23 Thread kunal642
Github user kunal642 closed the pull request at:

https://github.com/apache/carbondata/pull/2536


---


[GitHub] carbondata issue #2536: [CARBONDATA-2766] Added null check on filestatus

2018-07-23 Thread kunal642
Github user kunal642 commented on the issue:

https://github.com/apache/carbondata/pull/2536
  
@xuchuanyin Closing this PR. Refer #2465


---


[GitHub] carbondata issue #2484: [HOTFIX] added hadoop conf to thread local

2018-07-22 Thread kunal642
Github user kunal642 commented on the issue:

https://github.com/apache/carbondata/pull/2484
  
retest this please


---


[GitHub] carbondata pull request #2536: [CARBONDATA-2766] Added null check on filesta...

2018-07-22 Thread kunal642
GitHub user kunal642 opened a pull request:

https://github.com/apache/carbondata/pull/2536

[CARBONDATA-2766] Added null check on filestatus

**Problem:**  While doing any operation on Carbon File if file status 
acquiring throws exception then it is logged and not thrown again therefore if 
any further operation on that object required filestatus then it can throw NPE.

**Solution:** Add null check and throw proper exception.
Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [ ] Any interfaces changed?
 
 - [ ] Any backward compatibility impacted?
 
 - [ ] Document update required?

 - [ ] Testing done
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance 
test report.
- Any additional information to help reviewers in testing this 
change.
   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kunal642/carbondata fix_npe

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2536.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2536


commit 060dc7fc461d9f2eec6603b45651d76588dbb9bc
Author: kunal642 
Date:   2018-07-22T11:39:08Z

added null check on filestatus




---


[GitHub] carbondata pull request #2525: [CARBONDATA-2756] refactored code to use ZSTD...

2018-07-19 Thread kunal642
GitHub user kunal642 opened a pull request:

https://github.com/apache/carbondata/pull/2525

[CARBONDATA-2756] refactored code to use ZSTD compression using Reflection

1. refactored code to use ZSTD compression using Reflection
2. add license

Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [ ] Any interfaces changed?
 
 - [ ] Any backward compatibility impacted?
 
 - [ ] Document update required?

 - [ ] Testing done
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance 
test report.
- Any additional information to help reviewers in testing this 
change.
   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kunal642/carbondata zstd_fix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2525.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2525


commit 1cada8b2b23f846d31ab80c1592346804583d17b
Author: kunal642 
Date:   2018-07-19T06:47:04Z

refactored code to use ZSTD compression using Reflection




---


[GitHub] carbondata issue #2484: [HOTFIX] added hadoop conf to thread local

2018-07-18 Thread kunal642
Github user kunal642 commented on the issue:

https://github.com/apache/carbondata/pull/2484
  
retest this please


---


[GitHub] carbondata pull request #2484: [HOTFIX] added hadoop conf to thread local

2018-07-18 Thread kunal642
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2484#discussion_r203263551
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/impl/FileFactory.java 
---
@@ -43,11 +45,19 @@
*/
   private static final LogService LOGGER =
   LogServiceFactory.getLogService(FileFactory.class.getName());
-  private static Configuration configuration = null;
 
-  static {
-configuration = new Configuration();
-configuration.addResource(new Path("../core-default.xml"));
+  public static Configuration getConfiguration() {
+Configuration configuration;
+if (ThreadLocalSessionInfo.getCarbonSessionInfo() == null) {
+  configuration = new Configuration();
+  configuration.addResource(new Path("../core-default.xml"));
+} else {
+  CarbonConfiguration carbonConfiguration =
+  (CarbonConfiguration) 
ThreadLocalSessionInfo.getCarbonSessionInfo().getThreadParams()
--- End diff --

already handled If carbonConf is not present taking new CarbonConfiguration 
 as default value


---


[GitHub] carbondata pull request #2484: [HOTFIX] added hadoop conf to thread local

2018-07-18 Thread kunal642
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2484#discussion_r203263435
  
--- Diff: 
integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/CarbonRDD.scala
 ---
@@ -79,14 +66,11 @@ abstract class CarbonRDD[T: ClassTag](@transient sc: 
SparkContext,
   }
 
   def getConf: Configuration = {
-val configuration = new Configuration(false)
-val bai = new 
ByteArrayInputStream(CompressorFactory.getInstance().getCompressor
-  .unCompressByte(confBytes))
-val ois = new ObjectInputStream(bai)
-configuration.readFields(ois)
-ois.close()
-configuration
+val carbonConfiguration = 
carbonSessionInfo.getThreadParams.getExtraInfo("carbonConf")
--- End diff --

done


---


[GitHub] carbondata issue #2502: [CARBONDATA-2738]Update documentation for Complex da...

2018-07-17 Thread kunal642
Github user kunal642 commented on the issue:

https://github.com/apache/carbondata/pull/2502
  
LGTM


---


[GitHub] carbondata issue #2501: [CARBONDATA-2738]Block Preaggregate, Compaction, Dic...

2018-07-17 Thread kunal642
Github user kunal642 commented on the issue:

https://github.com/apache/carbondata/pull/2501
  
LGTM


---


[GitHub] carbondata pull request #2484: [HOTFIX] added hadoop conf to thread local

2018-07-17 Thread kunal642
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2484#discussion_r203071359
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/util/CarbonConfiguration.java ---
@@ -0,0 +1,74 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.carbondata.core.util;
+
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.io.ObjectInputStream;
+import java.io.ObjectOutputStream;
+import java.io.Serializable;
+
+import org.apache.carbondata.core.datastore.compression.CompressorFactory;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.Path;
+
+public class CarbonConfiguration implements Serializable {
+
+  private static final long serialVersionUID = 3811544264223154007L;
+  private transient Configuration configuration;
+  private byte[] confBytes;
+
+  public CarbonConfiguration(Configuration configuration) {
+ByteArrayOutputStream bao = new ByteArrayOutputStream();
+try {
+  ObjectOutputStream oos = new ObjectOutputStream(bao);
+  configuration.write(oos);
+  oos.close();
+  this.confBytes =
+  
CompressorFactory.getInstance().getCompressor().compressByte(bao.toByteArray());
+} catch (IOException e) {
+  throw new RuntimeException(e);
+}
+  }
+
+  public CarbonConfiguration() {
+this.configuration = new Configuration();
+configuration.addResource(new Path("../core-default.xml"));
+  }
+
+  public Configuration getConfiguration() {
--- End diff --

Once CarbonConfiguration is set into ThreadLocal only FileFactory and 
CarbonRDD will try to deserialize the configuration. This will be done once per 
instance of CarbonConfiguration. If configuration is already deserialized we 
will return the same object.


---


[GitHub] carbondata issue #2448: [HotFix] Getting carbon table identifier to datamap ...

2018-07-17 Thread kunal642
Github user kunal642 commented on the issue:

https://github.com/apache/carbondata/pull/2448
  
LGTM


---


[GitHub] carbondata pull request #2501: [CARBONDATA-2738]Block Preaggregate, Compacti...

2018-07-17 Thread kunal642
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2501#discussion_r202903937
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/mutation/CarbonProjectForUpdateCommand.scala
 ---
@@ -59,6 +60,13 @@ private[sql] case class CarbonProjectForUpdateCommand(
   return Seq.empty
 }
 val carbonTable = CarbonEnv.getCarbonTable(databaseNameOp, 
tableName)(sparkSession)
+columns.foreach { col =>
+  val dataType = carbonTable.getColumnByName(tableName, 
col).getColumnSchema.getDataType
+  if (dataType.isComplexType) {
+throw new MalformedCarbonCommandException("Unsupported operation 
on Complex data type")
--- End diff --

throw UnsupportedOperationException


---


[GitHub] carbondata pull request #2501: [CARBONDATA-2738]Block Preaggregate, Compacti...

2018-07-17 Thread kunal642
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2501#discussion_r202903870
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonAlterTableCompactionCommand.scala
 ---
@@ -82,6 +82,12 @@ case class CarbonAlterTableCompactionCommand(
   throw new MalformedCarbonCommandException("Unsupported operation on 
non transactional table")
 }
 
+if (table.getTableInfo.getFactTable.getListOfColumns.asScala
+  .exists(m => m.getDataType.isComplexType)) {
+  throw new MalformedCarbonCommandException(
--- End diff --

throw UnsupportedOperationException


---


[GitHub] carbondata pull request #2501: [CARBONDATA-2738]Block Preaggregate, Compacti...

2018-07-17 Thread kunal642
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2501#discussion_r202904039
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/preaaggregate/PreAggregateUtil.scala
 ---
@@ -354,7 +354,13 @@ object PreAggregateUtil {
 !expression.isInstanceOf[AttributeReference]) {
   newColumnName
 } else {
-  expression.asInstanceOf[AttributeReference].name
+  if (expression.isInstanceOf[GetStructField] || 
expression.isInstanceOf[GetArrayItem]) {
+throw new MalformedCarbonCommandException(
--- End diff --

throw UnsupportedOperationException


---


[GitHub] carbondata pull request #2501: [CARBONDATA-2738]Block Preaggregate, Compacti...

2018-07-17 Thread kunal642
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2501#discussion_r202904751
  
--- Diff: 
integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/complexType/TestComplexDataType.scala
 ---
@@ -712,5 +713,166 @@ class TestComplexDataType extends QueryTest with 
BeforeAndAfterAll {
 checkAnswer(sql("select 
a.b,id,a.c,person.detail[0],d.e,d.f,person.detail[1],id from 
test"),Seq(Row(2,1,3,5,3,2,6,1)))
 checkAnswer(sql("select 
a.b,id,a.c,person.detail[0],d.e,d.f,person.detail[1],id,1,a.b from 
test"),Seq(Row(2,1,3,5,3,2,6,1,1,2)))
   }
-  
+
+  test("test block Update for complex datatype") {
+sql("DROP TABLE IF EXISTS test")
+sql("create table test(id int,a struct,d array) 
stored by 'carbondata'")
+sql("insert into test values(1,'2$3',4)")
+val structException = intercept[MalformedCarbonCommandException](
+sql("update test set(a.b)=(4) where id=1").show(false))
+assertResult("Unsupported operation on Complex data 
type")(structException.getMessage)
+val arrayException = intercept[MalformedCarbonCommandException](
+sql("update test set(a)=(4) where id=1").show(false))
+assertResult("Unsupported operation on Complex data 
type")(arrayException.getMessage)
+  }
+
+  test("test block partition column") {
+sql("DROP TABLE IF EXISTS list_table_area_origin")
+val arrayException = intercept[AnalysisException](
+sql("""
+  | CREATE TABLE IF NOT EXISTS list_table_area_origin
+  | (
+  | id Int,
+  | vin string,
+  | logdate Timestamp,
+  | phonenumber Long,
+  | country array,
+  | salary Int
+  | )
+  | PARTITIONED BY (area array)
+  | STORED BY 'carbondata'
+""".stripMargin))
+assertResult("Cannot use array for partition 
column;")(arrayException.getMessage)
+sql("DROP TABLE IF EXISTS list_table_area_origin")
+val structException = intercept[AnalysisException](
+  sql("""
+| CREATE TABLE IF NOT EXISTS list_table_area_origin
+| (
+| id Int,
+| vin string,
+| logdate Timestamp,
+| phonenumber Long,
+| country array,
+| salary Int
+| )
+| PARTITIONED BY (area struct)
+| STORED BY 'carbondata'
+  """.stripMargin)
+)
+assertResult("Cannot use struct for partition 
column;")(structException.getMessage)
+  }
+
+  test("test block preaggregate") {
+sql("DROP TABLE IF EXISTS test")
+sql("create table test(id int,a struct) stored by 'carbondata'")
+sql("insert into test values (1,2)")
+sql("insert into test values (1,2)")
+sql("insert into test values (1,2)")
+val structException = intercept[MalformedCarbonCommandException](
+  sql("create datamap preagg_sum on table test using 'preaggregate' as 
select id,sum(a.b) from test group by id"))
+assertResult("Preaggregate is unsupported for ComplexData type column: 
a.b")(structException.getMessage)
+sql("DROP TABLE IF EXISTS test")
+sql("create table test(id int,a array) stored by 'carbondata'")
+sql("insert into test values (1,2)")
+val arrayException = intercept[MalformedCarbonCommandException](
+  sql("create datamap preagg_sum on table test using 'preaggregate' as 
select id,sum(a[0]) from test group by id"))
+assertResult("Preaggregate is unsupported for ComplexData type column: 
a[0]")(arrayException.getMessage)
+  }
+
+  test("test block dictionary exclude for child column") {
+sql("DROP TABLE IF EXISTS table1")
+sql(
+  "create table table1 (roll int,a 
struct,j:int>) stored " +
+  "by " +
+  "'carbondata' tblproperties('dictionary_exclude'='a')")
+sql("insert into table1 values(1,'1$abc$2$efg$3:mno:4$5')")
+checkAnswer(sql("select a.b from table1"), Seq(Row(1)))
+sql("DROP TABLE IF EXISTS table1")
+val structException = intercept[MalformedCarbonCommandException](
+sql(
+  "create table table1 (roll int,a 
struct,j:int>) stored " +
+  "by " +
+  "'carbondata' tblproperties(

[GitHub] carbondata pull request #2501: [CARBONDATA-2738]Block Preaggregate, Compacti...

2018-07-17 Thread kunal642
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2501#discussion_r202904600
  
--- Diff: 
integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/complexType/TestComplexDataType.scala
 ---
@@ -712,5 +713,166 @@ class TestComplexDataType extends QueryTest with 
BeforeAndAfterAll {
 checkAnswer(sql("select 
a.b,id,a.c,person.detail[0],d.e,d.f,person.detail[1],id from 
test"),Seq(Row(2,1,3,5,3,2,6,1)))
 checkAnswer(sql("select 
a.b,id,a.c,person.detail[0],d.e,d.f,person.detail[1],id,1,a.b from 
test"),Seq(Row(2,1,3,5,3,2,6,1,1,2)))
   }
-  
+
+  test("test block Update for complex datatype") {
+sql("DROP TABLE IF EXISTS test")
+sql("create table test(id int,a struct,d array) 
stored by 'carbondata'")
+sql("insert into test values(1,'2$3',4)")
+val structException = intercept[MalformedCarbonCommandException](
+sql("update test set(a.b)=(4) where id=1").show(false))
+assertResult("Unsupported operation on Complex data 
type")(structException.getMessage)
+val arrayException = intercept[MalformedCarbonCommandException](
+sql("update test set(a)=(4) where id=1").show(false))
+assertResult("Unsupported operation on Complex data 
type")(arrayException.getMessage)
+  }
+
+  test("test block partition column") {
+sql("DROP TABLE IF EXISTS list_table_area_origin")
+val arrayException = intercept[AnalysisException](
+sql("""
+  | CREATE TABLE IF NOT EXISTS list_table_area_origin
+  | (
+  | id Int,
+  | vin string,
+  | logdate Timestamp,
+  | phonenumber Long,
+  | country array<string>,
+  | salary Int
+  | )
+  | PARTITIONED BY (area array<string>)
+  | STORED BY 'carbondata'
+""".stripMargin))
+assertResult("Cannot use array for partition 
column;")(arrayException.getMessage)
+sql("DROP TABLE IF EXISTS list_table_area_origin")
+val structException = intercept[AnalysisException](
+  sql("""
+| CREATE TABLE IF NOT EXISTS list_table_area_origin
+| (
+| id Int,
+| vin string,
+| logdate Timestamp,
+| phonenumber Long,
+| country array<string>,
+| salary Int
+| )
+| PARTITIONED BY (area struct<b:int>)
+| STORED BY 'carbondata'
+  """.stripMargin)
+)
+assertResult("Cannot use struct for partition 
column;")(structException.getMessage)
+  }
+
+  test("test block preaggregate") {
+sql("DROP TABLE IF EXISTS test")
+sql("create table test(id int,a struct) stored by 'carbondata'")
+sql("insert into test values (1,2)")
+sql("insert into test values (1,2)")
+sql("insert into test values (1,2)")
+val structException = intercept[MalformedCarbonCommandException](
+  sql("create datamap preagg_sum on table test using 'preaggregate' as 
select id,sum(a.b) from test group by id"))
+assertResult("Preaggregate is unsupported for ComplexData type column: 
a.b")(structException.getMessage)
+sql("DROP TABLE IF EXISTS test")
+sql("create table test(id int,a array) stored by 'carbondata'")
+sql("insert into test values (1,2)")
+val arrayException = intercept[MalformedCarbonCommandException](
+  sql("create datamap preagg_sum on table test using 'preaggregate' as 
select id,sum(a[0]) from test group by id"))
+assertResult("Preaggregate is unsupported for ComplexData type column: 
a[0]")(arrayException.getMessage)
+  }
+
+  test("test block dictionary exclude for child column") {
+sql("DROP TABLE IF EXISTS table1")
+sql(
+  "create table table1 (roll int,a 
struct,j:int>) stored " +
+  "by " +
+  "'carbondata' tblproperties('dictionary_exclude'='a')")
+sql("insert into table1 values(1,'1$abc$2$efg$3:mno:4$5')")
+checkAnswer(sql("select a.b from table1"), Seq(Row(1)))
+sql("DROP TABLE IF EXISTS table1")
+val structException = intercept[MalformedCarbonCommandException](
+sql(
+  "create table table1 (roll int,a 
struct,j:int>) stored " +
+  "by " +
+  "'carbondata' tblproperties('dictionary_exclude'='a.b')"))
+assertResult(
+  "DICTIONARY_EXCLUDE column: a.b does not exist in table or 
unsupported for complex child " +
+  "column. Please check create table statement.")(
+  structException.getMessage)
+sql("DROP TABLE IF EXISTS table1")
--- End diff --

drop will not be executed if the assertion fails. Please make sure all tables 
are also deleted in afterAll.
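
For reference, a minimal afterAll cleanup could look like this (a sketch
reusing the table names from the quoted test, not the actual patch):

    // Hedged sketch: drop every table the suite creates so a failed
    // assertion cannot leave stale tables behind for later tests.
    override def afterAll(): Unit = {
      sql("DROP TABLE IF EXISTS test")
      sql("DROP TABLE IF EXISTS table1")
      sql("DROP TABLE IF EXISTS list_table_area_origin")
    }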


---


[GitHub] carbondata pull request #2511: [CARBONDATA-2745] Added atomic file operation...

2018-07-16 Thread kunal642
GitHub user kunal642 opened a pull request:

https://github.com/apache/carbondata/pull/2511

[CARBONDATA-2745] Added atomic file operations for S3

Problem: AtomicFileOperationImpl creates a temporary file and then renames 
it to the actual file name. This is risky on S3 storage because the file has 
to be deleted and then recreated. 

Solution: Create a separate implementation for S3 that writes directly to 
the final file name in overwrite mode.
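
For illustration, the overwrite-based write could look like this (a minimal
sketch against the Hadoop FileSystem API; S3AtomicFileWriter and
openForWrite are assumed names, not the PR's classes):

    import java.io.DataOutputStream

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    // Hedged sketch: skip the temp-file-plus-rename dance (rename on S3 is
    // copy + delete) and open the final file directly in overwrite mode,
    // so the content lands under the real name in a single write.
    class S3AtomicFileWriter(filePath: String, conf: Configuration) {
      def openForWrite(): DataOutputStream = {
        val path = new Path(filePath)
        val fs: FileSystem = path.getFileSystem(conf)
        fs.create(path, true) // overwrite = true; no temp file, no rename
      }
    }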

Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [ ] Any interfaces changed?
 
 - [ ] Any backward compatibility impacted?
 
 - [ ] Document update required?

 - [ ] Testing done
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance 
test report.
- Any additional information to help reviewers in testing this 
change.
   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kunal642/carbondata s3atomicfileoperations

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2511.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2511


commit 8e84566cc5bd4a572ec31ad4a017ac7128de6998
Author: kunal642 
Date:   2018-07-16T08:58:00Z

Added atomic file operations for S3




---


[GitHub] carbondata pull request #2484: [WIP] added hadoop conf to thread local

2018-07-16 Thread kunal642
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2484#discussion_r202586005
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/impl/FileFactory.java 
---
@@ -43,11 +45,19 @@
*/
   private static final LogService LOGGER =
   LogServiceFactory.getLogService(FileFactory.class.getName());
-  private static Configuration configuration = null;
 
-  static {
-configuration = new Configuration();
+  public static Configuration getConfiguration() {
+Configuration configuration;
+if (ThreadLocalSessionInfo.getCarbonSessionInfo() == null) {
+  configuration = new Configuration();
+} else {
+  CarbonConfiguration carbonConfiguration =
+  (CarbonConfiguration) 
ThreadLocalSessionInfo.getCarbonSessionInfo().getThreadParams()
+  .getExtraInfo("carbonConf", new CarbonConfiguration());
+  configuration = carbonConfiguration.getConfiguration();
+}
 configuration.addResource(new Path("../core-default.xml"));
--- End diff --

done


---


[GitHub] carbondata pull request #2484: [WIP] added hadoop conf to thread local

2018-07-16 Thread kunal642
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2484#discussion_r202585892
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/impl/FileFactory.java 
---
@@ -43,11 +45,19 @@
*/
   private static final LogService LOGGER =
   LogServiceFactory.getLogService(FileFactory.class.getName());
-  private static Configuration configuration = null;
 
-  static {
-configuration = new Configuration();
+  public static Configuration getConfiguration() {
--- End diff --

removed config parameters from FileFactory


---


[GitHub] carbondata pull request #2484: [WIP] added hadoop conf to thread local

2018-07-16 Thread kunal642
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2484#discussion_r202585994
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/util/CarbonConfiguration.java ---
@@ -0,0 +1,73 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.carbondata.core.util;
+
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.io.ObjectInputStream;
+import java.io.ObjectOutputStream;
+import java.io.Serializable;
+
+import org.apache.carbondata.core.datastore.compression.CompressorFactory;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.Path;
+
+public class CarbonConfiguration implements Serializable {
+
+  private static final long serialVersionUID = 3811544264223154007L;
+  private transient Configuration configuration;
+  private byte[] confBytes;
+
+  public CarbonConfiguration(Configuration configuration) {
+ByteArrayOutputStream bao = new ByteArrayOutputStream();
+try {
+  ObjectOutputStream oos = new ObjectOutputStream(bao);
+  configuration.write(oos);
+  oos.close();
+  this.confBytes =
+  
CompressorFactory.getInstance().getCompressor().compressByte(bao.toByteArray());
+} catch (IOException e) {
+  throw new RuntimeException(e);
+}
+  }
+
+  public CarbonConfiguration() {
+this.configuration = new Configuration();
+  }
+
+  public Configuration getConfiguration() {
+if (configuration == null) {
+  if (confBytes == null) {
+throw new RuntimeException("Configuration not specified");
+  }
+  configuration = new Configuration(false);
+  ByteArrayInputStream bias = new ByteArrayInputStream(
+  
CompressorFactory.getInstance().getCompressor().unCompressByte(confBytes));
+  try {
+ObjectInputStream ois = new ObjectInputStream(bias);
+configuration.readFields(ois);
+ois.close();
+  } catch (IOException e) {
+throw new RuntimeException(e);
+  }
+}
+configuration.addResource(new Path("../core-default.xml"));
--- End diff --

done


---


[GitHub] carbondata pull request #2484: [WIP] added hadoop conf to thread local

2018-07-16 Thread kunal642
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2484#discussion_r202585871
  
--- Diff: 
integration/spark-common/src/main/scala/org/apache/spark/rdd/UpdateCoalescedRDD.scala
 ---
@@ -0,0 +1,89 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.rdd
+
+import scala.reflect.ClassTag
+
+import org.apache.spark._
+
+import org.apache.carbondata.spark.rdd.CarbonRDD
+
+
+// This RDD distributes previous RDD data based on number of nodes. i.e., 
one partition for one node
+
+class UpdateCoalescedRDD[T: ClassTag](
--- End diff --

removed


---


[GitHub] carbondata issue #2489: [CARBONDATA-2606][Complex DataType Enhancements]Fix ...

2018-07-13 Thread kunal642
Github user kunal642 commented on the issue:

https://github.com/apache/carbondata/pull/2489
  
LGTM


---


[GitHub] carbondata issue #2489: [CARBONDATA-2606][Complex DataType Enhancements]Fix ...

2018-07-12 Thread kunal642
Github user kunal642 commented on the issue:

https://github.com/apache/carbondata/pull/2489
  
LGTM


---


[GitHub] carbondata pull request #2489: [CARBONDATA-2606][Complex DataType Enhancemen...

2018-07-12 Thread kunal642
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2489#discussion_r202076454
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/scan/collector/impl/DictionaryBasedResultCollector.java
 ---
@@ -140,6 +140,13 @@ public 
DictionaryBasedResultCollector(BlockExecutionInfo blockExecutionInfos) {
 continue;
   }
   fillMeasureData(scannedResult, row);
+  if 
(scannedResult.complexParentIndexToQueryMap.toString().contains("StructQueryType"))
 {
--- End diff --

put a comment explaining the logic.


---


[GitHub] carbondata pull request #2484: [WIP] added hadoop conf to thread local

2018-07-11 Thread kunal642
GitHub user kunal642 opened a pull request:

https://github.com/apache/carbondata/pull/2484

[WIP] added hadoop conf to thread local

Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [ ] Any interfaces changed?
 
 - [ ] Any backward compatibility impacted?
 
 - [ ] Document update required?

 - [ ] Testing done
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance 
test report.
- Any additional information to help reviewers in testing this 
change.
   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kunal642/carbondata hadoop_conf_fix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2484.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2484


commit 89ff67e601ea842db2693c269f783d7ddc701c02
Author: kunal642 
Date:   2018-07-10T06:38:41Z

added hadoop conf to thread local




---


[GitHub] carbondata pull request #2472: [CARBONDATA-2717] fixed table id empty proble...

2018-07-10 Thread kunal642
GitHub user kunal642 opened a pull request:

https://github.com/apache/carbondata/pull/2472

[CARBONDATA-2717] fixed table id empty problem while taking drop lock

Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [ ] Any interfaces changed?
 
 - [ ] Any backward compatibility impacted?
 
 - [ ] Document update required?

 - [ ] Testing done
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance 
test report.
- Any additional information to help reviewers in testing this 
change.
   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kunal642/carbondata table_identfifier_fix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2472.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2472


commit 25ca99908cd22b46cf295867d61c1f2c18c09249
Author: kunal642 
Date:   2018-07-10T09:08:05Z

fixed table id empty problem while taking drop lock




---


[GitHub] carbondata issue #2451: [CARBONDATA-2585][CARBONDATA-2586]Fix local dictiona...

2018-07-10 Thread kunal642
Github user kunal642 commented on the issue:

https://github.com/apache/carbondata/pull/2451
  
LGTM


---


[GitHub] carbondata issue #2447: [CARBONDATA-2589][CARBONDATA-2590][CARBONDATA-2602]L...

2018-07-09 Thread kunal642
Github user kunal642 commented on the issue:

https://github.com/apache/carbondata/pull/2447
  
LGTM


---


[GitHub] carbondata issue #2465: [WIP] Refactored CarbonFile interface

2018-07-09 Thread kunal642
Github user kunal642 commented on the issue:

https://github.com/apache/carbondata/pull/2465
  
retest sdv please


---


[GitHub] carbondata pull request #2465: [WIP] Refactored CarbonFile interface

2018-07-09 Thread kunal642
GitHub user kunal642 opened a pull request:

https://github.com/apache/carbondata/pull/2465

[WIP] Refactored CarbonFile interface

Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [ ] Any interfaces changed?
 
 - [ ] Any backward compatibility impacted?
 
 - [ ] Document update required?

 - [ ] Testing done
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance 
test report.
- Any additional information to help reviewers in testing this 
change.
   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kunal642/carbondata fix_carbonfile_impl

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2465.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2465


commit 6c957c21357648b3b0754cda54525f8ab433c7a6
Author: kunal642 
Date:   2018-07-09T07:16:54Z

refactored CarbonFile interface




---


[GitHub] carbondata pull request #2451: [CARBONDATA-2585][CARBONDATA-2586]Fix local d...

2018-07-09 Thread kunal642
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2451#discussion_r200894716
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/preaaggregate/PreAggregateTableHelper.scala
 ---
@@ -136,14 +136,33 @@ case class PreAggregateTableHelper(
 parentTable.getTableInfo.getFactTable.getTableProperties.asScala
   .getOrElse(CarbonCommonConstants.LOCAL_DICTIONARY_THRESHOLD,
 CarbonCommonConstants.LOCAL_DICTIONARY_THRESHOLD_DEFAULT))
-tableProperties
-  .put(CarbonCommonConstants.LOCAL_DICTIONARY_INCLUDE,
-parentTable.getTableInfo.getFactTable.getTableProperties.asScala
-  .getOrElse(CarbonCommonConstants.LOCAL_DICTIONARY_INCLUDE, ""))
-tableProperties
-  .put(CarbonCommonConstants.LOCAL_DICTIONARY_EXCLUDE,
-parentTable.getTableInfo.getFactTable.getTableProperties.asScala
-  .getOrElse(CarbonCommonConstants.LOCAL_DICTIONARY_EXCLUDE, ""))
+val parentDictInclude = 
parentTable.getTableInfo.getFactTable.getTableProperties.asScala
+  .getOrElse(CarbonCommonConstants.LOCAL_DICTIONARY_INCLUDE, 
"").split(",")
+
+val parentDictExclude = 
parentTable.getTableInfo.getFactTable.getTableProperties.asScala
+  .getOrElse(CarbonCommonConstants.LOCAL_DICTIONARY_EXCLUDE, 
"").split(",")
+
+var newDictInclude = Seq[String]()
+var newDictExclude = Seq[String]()
+parentDictInclude.foreach(parentcol =>
+  fields.filter(col => fieldRelationMap(col).aggregateFunction.isEmpty 
&&
+   parentcol.equals(fieldRelationMap(col).
+ 
columnTableRelationList.get.head.parentColumnName))
+.map(cols => newDictInclude :+= cols.column))
+parentDictExclude.foreach(parentcol =>
+  fields.filter(col => fieldRelationMap(col).aggregateFunction.isEmpty 
&&
+   parentcol.equals(fieldRelationMap(col).
+ 
columnTableRelationList.get.head.parentColumnName))
+.map(cols => newDictExclude :+= cols.column))
--- End diff --

change .filter(..).map(..) to .collect
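
For reference, the suggested refactor looks roughly like this (a sketch
over a simplified field list, not the actual PR code):

    // Hedged sketch: .collect with a partial function fuses the
    // .filter(..).map(..) pair into a single pass over the fields.
    case class Field(column: String, parentColumnName: String)

    val fields = Seq(Field("c1", "p1"), Field("c2", "p2"))
    val parentcol = "p1"

    // Before: two traversals
    val before = fields.filter(col => parentcol.equals(col.parentColumnName))
      .map(col => col.column)

    // After: one traversal with collect
    val after = fields.collect {
      case col if parentcol.equals(col.parentColumnName) => col.column
    }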


---


[GitHub] carbondata pull request #2451: [CARBONDATA-2585][CARBONDATA-2586]Fix local d...

2018-07-09 Thread kunal642
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2451#discussion_r200894679
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/preaaggregate/PreAggregateTableHelper.scala
 ---
@@ -136,14 +136,33 @@ case class PreAggregateTableHelper(
 parentTable.getTableInfo.getFactTable.getTableProperties.asScala
   .getOrElse(CarbonCommonConstants.LOCAL_DICTIONARY_THRESHOLD,
 CarbonCommonConstants.LOCAL_DICTIONARY_THRESHOLD_DEFAULT))
-tableProperties
-  .put(CarbonCommonConstants.LOCAL_DICTIONARY_INCLUDE,
-parentTable.getTableInfo.getFactTable.getTableProperties.asScala
-  .getOrElse(CarbonCommonConstants.LOCAL_DICTIONARY_INCLUDE, ""))
-tableProperties
-  .put(CarbonCommonConstants.LOCAL_DICTIONARY_EXCLUDE,
-parentTable.getTableInfo.getFactTable.getTableProperties.asScala
-  .getOrElse(CarbonCommonConstants.LOCAL_DICTIONARY_EXCLUDE, ""))
+val parentDictInclude = 
parentTable.getTableInfo.getFactTable.getTableProperties.asScala
+  .getOrElse(CarbonCommonConstants.LOCAL_DICTIONARY_INCLUDE, 
"").split(",")
+
+val parentDictExclude = 
parentTable.getTableInfo.getFactTable.getTableProperties.asScala
+  .getOrElse(CarbonCommonConstants.LOCAL_DICTIONARY_EXCLUDE, 
"").split(",")
+
+var newDictInclude = Seq[String]()
+var newDictExclude = Seq[String]()
+parentDictInclude.foreach(parentcol =>
+  fields.filter(col => fieldRelationMap(col).aggregateFunction.isEmpty 
&&
--- End diff --

change .filter(..).map(..) to .collect


---


[GitHub] carbondata issue #2450: [CARBONDATA-2689] Added validations for complex colu...

2018-07-09 Thread kunal642
Github user kunal642 commented on the issue:

https://github.com/apache/carbondata/pull/2450
  
LGTM


---


[GitHub] carbondata pull request #2406: [CARBONDATA-2642] Added configurable Lock pat...

2018-07-06 Thread kunal642
Github user kunal642 closed the pull request at:

https://github.com/apache/carbondata/pull/2406


---


[GitHub] carbondata pull request #2450: [CARBONDATA-2689] Added validations for compl...

2018-07-06 Thread kunal642
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2450#discussion_r200610555
  
--- Diff: 
integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/localdictionary/LocalDictionarySupportAlterTableTest.scala
 ---
@@ -822,17 +821,17 @@ class LocalDictionarySupportAlterTableTest extends 
QueryTest with BeforeAndAfter
 sql(
   """
 | CREATE TABLE local1(id int, name string, city string, age 
int,add string)
-| STORED BY 'carbondata' 
tblproperties('local_dictionary_include'='city')
+| STORED BY 'carbondata' 
tblproperties('local_dictionary_exclude'='city')
--- End diff --

change local_dictionary_exclude to local_dictionary_include as the name 
suggests



---


[GitHub] carbondata issue #2422: [CARBONDATA-2585][CARBONDATA-2586][Local Dictionary]...

2018-07-06 Thread kunal642
Github user kunal642 commented on the issue:

https://github.com/apache/carbondata/pull/2422
  
LGTM


---


[GitHub] carbondata pull request #2406: [CARBONDATA-2642] Added configurable Lock pat...

2018-07-05 Thread kunal642
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2406#discussion_r200554082
  
--- Diff: 
processing/src/test/java/org/apache/carbondata/lcm/locks/LocalFileLockTest.java 
---
@@ -68,4 +80,18 @@
 Assert.assertTrue(localLock2.unlock());
   }
 
+  @Test public void testConfigurablePathForLock() throws Exception {
+Field f = secretClass.getDeclaredField("lockPath");
+f.setAccessible(true);
+f.set(secretClass, rootPath + "/target/");
+AbsoluteTableIdentifier absoluteTableIdentifier = 
AbsoluteTableIdentifier
+
.from(CarbonProperties.getInstance().getProperty("carbon.storelocation"), 
"databaseName",
+"tableName", "1");
--- End diff --

done


---


[GitHub] carbondata pull request #2406: [CARBONDATA-2642] Added configurable Lock pat...

2018-07-05 Thread kunal642
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2406#discussion_r200554062
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonFileMetastore.scala
 ---
@@ -386,7 +386,8 @@ class CarbonFileMetastore extends CarbonMetaStore {
 val schemaMetadataPath = 
CarbonTablePath.getFolderContainingFile(schemaFilePath)
 val fileType = FileFactory.getFileType(schemaMetadataPath)
 if (!FileFactory.isFileExist(schemaMetadataPath, fileType)) {
-  val isDirCreated = FileFactory.mkdirs(schemaMetadataPath, fileType)
+  val isDirCreated = FileFactory
--- End diff --

removed


---


[GitHub] carbondata pull request #2406: [CARBONDATA-2642] Added configurable Lock pat...

2018-07-05 Thread kunal642
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2406#discussion_r200554022
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/locks/CarbonLockFactory.java ---
@@ -105,4 +120,10 @@ private static void getLockTypeConfigured() {
 LOGGER.info("Configured lock type is: " + lockTypeConfigured);
   }
 
+  private static void getLockpath() {
+lockPath = CarbonProperties.getInstance()
+.getProperty(CarbonCommonConstants.LOCK_PATH, "")
+.toUpperCase();
--- End diff --

done


---


[GitHub] carbondata pull request #2406: [CARBONDATA-2642] Added configurable Lock pat...

2018-07-05 Thread kunal642
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2406#discussion_r200554007
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/locks/CarbonLockFactory.java ---
@@ -52,44 +55,56 @@
*/
   public static ICarbonLock getCarbonLockObj(AbsoluteTableIdentifier 
absoluteTableIdentifier,
   String lockFile) {
-
-String tablePath = absoluteTableIdentifier.getTablePath();
+String tablePath;
+if (lockPath.isEmpty()) {
+  tablePath = absoluteTableIdentifier.getTablePath();
+} else {
+  if (absoluteTableIdentifier
+  .getCarbonTableIdentifier().getTableId().isEmpty()) {
+throw new RuntimeException("Table id is empty");
+  }
+  tablePath = lockPath + CarbonCommonConstants.FILE_SEPARATOR + 
absoluteTableIdentifier
+  .getCarbonTableIdentifier().getTableId();
+}
 if 
(lockTypeConfigured.equals(CarbonCommonConstants.CARBON_LOCK_TYPE_ZOOKEEPER)) {
-  return new ZooKeeperLocking(absoluteTableIdentifier, lockFile);
-} else if (tablePath.startsWith(CarbonCommonConstants.S3A_PREFIX) ||
-tablePath.startsWith(CarbonCommonConstants.S3N_PREFIX) ||
-tablePath.startsWith(CarbonCommonConstants.S3_PREFIX)) {
+  return new ZooKeeperLocking(tablePath, lockFile);
+} else if (tablePath.startsWith(CarbonCommonConstants.S3A_PREFIX) || 
tablePath
+.startsWith(CarbonCommonConstants.S3N_PREFIX) || tablePath
+.startsWith(CarbonCommonConstants.S3_PREFIX)) {
   lockTypeConfigured = CarbonCommonConstants.CARBON_LOCK_TYPE_S3;
-  return new S3FileLock(absoluteTableIdentifier, lockFile);
+  return new S3FileLock(tablePath,
--- End diff --

Currently the configured lock type is only used for ZooKeeper locking; the 
others directly create the desired lock based on the path URI.


---


[GitHub] carbondata pull request #2406: [CARBONDATA-2642] Added configurable Lock pat...

2018-07-05 Thread kunal642
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2406#discussion_r200553901
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/locks/CarbonLockFactory.java ---
@@ -52,44 +55,56 @@
*/
   public static ICarbonLock getCarbonLockObj(AbsoluteTableIdentifier 
absoluteTableIdentifier,
   String lockFile) {
-
-String tablePath = absoluteTableIdentifier.getTablePath();
+String tablePath;
+if (lockPath.isEmpty()) {
+  tablePath = absoluteTableIdentifier.getTablePath();
+} else {
+  if (absoluteTableIdentifier
+  .getCarbonTableIdentifier().getTableId().isEmpty()) {
+throw new RuntimeException("Table id is empty");
+  }
+  tablePath = lockPath + CarbonCommonConstants.FILE_SEPARATOR + 
absoluteTableIdentifier
+  .getCarbonTableIdentifier().getTableId();
+}
 if 
(lockTypeConfigured.equals(CarbonCommonConstants.CARBON_LOCK_TYPE_ZOOKEEPER)) {
-  return new ZooKeeperLocking(absoluteTableIdentifier, lockFile);
-} else if (tablePath.startsWith(CarbonCommonConstants.S3A_PREFIX) ||
-tablePath.startsWith(CarbonCommonConstants.S3N_PREFIX) ||
-tablePath.startsWith(CarbonCommonConstants.S3_PREFIX)) {
+  return new ZooKeeperLocking(tablePath, lockFile);
+} else if (tablePath.startsWith(CarbonCommonConstants.S3A_PREFIX) || 
tablePath
+.startsWith(CarbonCommonConstants.S3N_PREFIX) || tablePath
+.startsWith(CarbonCommonConstants.S3_PREFIX)) {
   lockTypeConfigured = CarbonCommonConstants.CARBON_LOCK_TYPE_S3;
-  return new S3FileLock(absoluteTableIdentifier, lockFile);
+  return new S3FileLock(tablePath,
--- End diff --

I think it is better to create an S3 file lock by default for S3 paths. We 
cannot be sure whether HDFS is present or not.


---


[GitHub] carbondata pull request #2406: [CARBONDATA-2642] Added configurable Lock pat...

2018-07-05 Thread kunal642
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2406#discussion_r200553686
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/filesystem/S3CarbonFile.java
 ---
@@ -0,0 +1,136 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.carbondata.core.datastore.filesystem;
+
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.carbondata.common.logging.LogService;
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.datastore.impl.FileFactory;
+import org.apache.carbondata.core.util.CarbonUtil;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FSDataOutputStream;
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.LocatedFileStatus;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.RemoteIterator;
+
+public class S3CarbonFile extends HDFSCarbonFile {
+
+  private static final LogService LOGGER =
+  LogServiceFactory.getLogService(HDFSCarbonFile.class.getName());
+
+  public S3CarbonFile(String filePath) {
+super(filePath);
+  }
+
+  public S3CarbonFile(String filePath, Configuration hadoopConf) {
+super(filePath, hadoopConf);
+  }
+
+  public S3CarbonFile(Path path) {
+super(path);
+  }
+
+  public S3CarbonFile(Path path, Configuration hadoopConf) {
+super(path, hadoopConf);
+  }
+
+  public S3CarbonFile(FileStatus fileStatus) {
+super(fileStatus);
+  }
+
+  @Override
+  public boolean renameForce(String changetoName) {
+FileSystem fs;
+try {
+  fs = fileStatus.getPath().getFileSystem(hadoopConf);
+  return fs.rename(fileStatus.getPath(), new Path(changetoName));
+} catch (IOException e) {
+  LOGGER.error("Exception occured: " + e.getMessage());
+  return false;
+}
+  }
+
+  @Override
+  public DataOutputStream getDataOutputStreamUsingAppend(String path, 
FileFactory.FileType fileType)
+  throws IOException {
+return getDataOutputStream(path, fileType, 
CarbonCommonConstants.BYTEBUFFER_SIZE, true);
+  }
+
+  @Override public DataOutputStream getDataOutputStream(String path, 
FileFactory.FileType fileType,
+  int bufferSize, boolean append) throws IOException {
+Path pt = new Path(path);
+FileSystem fileSystem = 
pt.getFileSystem(FileFactory.getConfiguration());
+FSDataOutputStream stream;
+if (append) {
--- End diff --

The HDFS client only honours append mode for DistributedFileSystem; for 
other file systems it creates the file with overwrite. Therefore we need to 
read the file and keep the contents in memory before overwriting it.
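
For illustration, the read-then-overwrite emulation of append could look
like this (a sketch with an assumed helper name, not the PR's exact method):

    import java.io.DataOutputStream

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.Path
    import org.apache.hadoop.io.IOUtils

    // Hedged sketch: object stores have no true append, so buffer the
    // existing bytes in memory and replay them into a fresh overwrite
    // stream before the caller writes the new data.
    def openForAppend(filePath: String, conf: Configuration): DataOutputStream = {
      val path = new Path(filePath)
      val fs = path.getFileSystem(conf)
      var existing = Array.emptyByteArray
      if (fs.exists(path)) {
        val len = fs.getFileStatus(path).getLen.toInt
        existing = new Array[Byte](len)
        val in = fs.open(path)
        try IOUtils.readFully(in, existing, 0, len) finally in.close()
      }
      val out = fs.create(path, true) // overwrite = true
      out.write(existing)             // replay old content; caller appends after this
      out
    }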


---


[GitHub] carbondata pull request #2406: [CARBONDATA-2642] Added configurable Lock pat...

2018-07-05 Thread kunal642
Github user kunal642 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2406#discussion_r200553521
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/filesystem/S3CarbonFile.java
 ---
@@ -0,0 +1,136 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.carbondata.core.datastore.filesystem;
+
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.carbondata.common.logging.LogService;
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.datastore.impl.FileFactory;
+import org.apache.carbondata.core.util.CarbonUtil;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FSDataOutputStream;
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.LocatedFileStatus;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.RemoteIterator;
+
+public class S3CarbonFile extends HDFSCarbonFile {
+
+  private static final LogService LOGGER =
+  LogServiceFactory.getLogService(HDFSCarbonFile.class.getName());
+
+  public S3CarbonFile(String filePath) {
+super(filePath);
+  }
+
+  public S3CarbonFile(String filePath, Configuration hadoopConf) {
+super(filePath, hadoopConf);
+  }
+
+  public S3CarbonFile(Path path) {
+super(path);
+  }
+
+  public S3CarbonFile(Path path, Configuration hadoopConf) {
+super(path, hadoopConf);
+  }
+
+  public S3CarbonFile(FileStatus fileStatus) {
+super(fileStatus);
+  }
+
+  @Override
+  public boolean renameForce(String changetoName) {
+FileSystem fs;
--- End diff --

This is done because the majority of our existing code uses renameForce. To 
handle this we will delete the destination directory and then recreate it 
with the new name.
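
For illustration, the delete-then-rename fallback could look like this (a
sketch only; the real method works on the CarbonFile's FileStatus):

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    // Hedged sketch: plain rename does not overwrite an existing
    // destination, so delete the destination first, then rename.
    def renameForce(src: Path, dest: Path, conf: Configuration): Boolean = {
      val fs: FileSystem = src.getFileSystem(conf)
      if (fs.exists(dest)) {
        fs.delete(dest, true) // recursive delete of the old destination
      }
      fs.rename(src, dest)
    }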


---

