[GitHub] carbondata issue #3053: [CARBONDATA-3233]Fix JVM crash issue in snappy compr...

2019-01-09 Thread akashrn5
Github user akashrn5 commented on the issue:

https://github.com/apache/carbondata/pull/3053
  
> Does this PR fix two problems?
> If it is yes, better to separate it into two.
> 

the one-line change of rowId to rowId + 1 is coupled with this: when I removed 
the compress method in UnsafeFixLengthColumnPage, I hit this issue and fixed it 
here, so it is required in this PR
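
A minimal sketch of the off-by-one being fixed, assuming rowId holds the 
zero-based index of the last row written to the page (the names below are 
illustrative, not CarbonData's actual fields):

```scala
// If rowId is the zero-based index of the last written row, the number of
// valid entries is rowId + 1, so sizing or copying with rowId alone drops
// the final row.
def validBytes(lastRowId: Int, elementSize: Int): Int = (lastRowId + 1) * elementSize

// e.g. rows 0..9 of 8-byte longs occupy 80 bytes, not 72:
assert(validBytes(9, 8) == 80)
```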


---


[GitHub] carbondata issue #3053: [CARBONDATA-3233]Fix JVM crash issue in snappy compr...

2019-01-09 Thread akashrn5
Github user akashrn5 commented on the issue:

https://github.com/apache/carbondata/pull/3053
  
@kumarvishal09 I have tested the fallback scenario by changing the code; it 
fails with that as well. I have also raised a discussion in the snappy 
community: 
https://groups.google.com/forum/#!topic/snappy-compression/4noNVKCMBqM


---


[GitHub] carbondata issue #3053: [CARBONDATA-3233]Fix JVM crash issue in snappy compr...

2019-01-07 Thread akashrn5
Github user akashrn5 commented on the issue:

https://github.com/apache/carbondata/pull/3053
  
> I think the performance of rawCompress is better than 
compressLong/compressInt; can we find the root cause of the JVM crash?

I don't think there is much timing difference, but the problem is that the 
JVM crash happens randomly. We get the max size for compression from snappy 
itself, we even allocated that memory successfully and passed the address to 
snappy, and after that the JVM crashed. This is very random, so it is better 
to remove that path.
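
For context, a hedged sketch of the two snappy-java paths in question. The 
safe array-based API is runnable as shown; the raw address-based call that 
this PR removes is left as a comment, since a wrong address or length there 
aborts the JVM in native code instead of throwing:

```scala
import org.xerial.snappy.Snappy

val input: Array[Byte] = Array.fill(1024)(42.toByte)
// snappy itself reports the worst-case compressed size for an input length
val worstCase: Int = Snappy.maxCompressedLength(input.length)
// safe path: snappy-java owns the buffers and validates bounds
val compressed: Array[Byte] = Snappy.compress(input)
assert(compressed.length <= worstCase)
assert(Snappy.uncompress(compressed).sameElements(input))

// removed path (sketch): compress directly between off-heap addresses;
// any mismatch between address/size and the memory actually owned crashes
// in native code rather than raising an exception.
// Snappy.rawCompress(inputAddress, inputLength, destinationAddress)
```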


---


[GitHub] carbondata pull request #3053: [CARBONDATA-3233]Fix JVM crash issue in snapp...

2019-01-07 Thread akashrn5
Github user akashrn5 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3053#discussion_r245890379
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/page/UnsafeFixLengthColumnPage.java
 ---
@@ -369,7 +367,7 @@ public BigDecimal getDecimal(int rowId) {
 
   @Override
   public double[] getDoublePage() {
-double[] data = new double[getPageSize()];
+double[] data = new double[getEndLoop()];
--- End diff --

during the complex type enhancement, pageSize handling was changed for 
converting and reading values and a new method called getEndLoop was added; 
this was missed for double, so when a double datatype appears inside a 
complex type a data mismatch happens. That is handled here. You can refer to 
#2417 
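
A hedged sketch of the fix, with an illustrative class standing in for 
UnsafeFixLengthColumnPage: size the returned array by the count of values 
actually written (what getEndLoop reports), not by the configured page size:

```scala
class FixLengthPage(pageSize: Int) {
  private val storage = new Array[Double](pageSize)
  private var written = 0                      // plays the role of getEndLoop()

  def putDouble(value: Double): Unit = { storage(written) = value; written += 1 }

  def getDoublePage: Array[Double] = {
    val data = new Array[Double](written)      // was: new Array[Double](pageSize)
    Array.copy(storage, 0, data, 0, written)
    data
  }
}
```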


---


[GitHub] carbondata pull request #3053: [WIP]JVM crash issue in snappy compressor

2019-01-07 Thread akashrn5
GitHub user akashrn5 opened a pull request:

https://github.com/apache/carbondata/pull/3053

[WIP]JVM crash issue in snappy compressor

Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [ ] Any interfaces changed?
 
 - [ ] Any backward compatibility impacted?
 
 - [ ] Document update required?

 - [ ] Testing done
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance 
test report.
- Any additional information to help reviewers in testing this 
change.
   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/akashrn5/incubator-carbondata jvmcrash

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/3053.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3053


commit b50d1c8b9c69565231cabf1d5dd507a006312a19
Author: akashrn5 
Date:   2019-01-07T11:04:48Z

JVM crash issue in snappy compressor




---


[GitHub] carbondata issue #3044: [CARBONDATA-3202]Documentation for alter table colum...

2019-01-01 Thread akashrn5
Github user akashrn5 commented on the issue:

https://github.com/apache/carbondata/pull/3044
  
@SteNicholas I did not get your point correctly. Basically, are you asking:

1. about the column comment after a column rename?
2. whether Spark 2.1 supports column rename?

Am I right? If not, please explain it more clearly so that I can resolve 
your doubt


---


[GitHub] carbondata pull request #3044: [CARBONDATA-3202]Documentation for alter tabl...

2019-01-01 Thread akashrn5
GitHub user akashrn5 opened a pull request:

https://github.com/apache/carbondata/pull/3044

[CARBONDATA-3202]Documentation for alter table column rename

Add documentation for alter table column rename

Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [x] Any interfaces changed?
 NA
 - [x] Any backward compatibility impacted?
 NA
 - [x] Document update required?
YES
 - [x] Testing done
NA
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance 
test report.
- Any additional information to help reviewers in testing this 
change.
   
 - [x] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/akashrn5/incubator-carbondata alter_doc

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/3044.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3044


commit 0c5df2ef11d3e7376e7a71ef79af56d6311b822f
Author: akashrn5 
Date:   2019-01-02T06:39:05Z

Documentation for alter table column rename




---


[GitHub] carbondata issue #3027: [CARBONDATA-3202]update the schema to session catalo...

2018-12-28 Thread akashrn5
Github user akashrn5 commented on the issue:

https://github.com/apache/carbondata/pull/3027
  
retest this please


---


[GitHub] carbondata issue #3027: [CARBONDATA-3202]update the schema to session catalo...

2018-12-28 Thread akashrn5
Github user akashrn5 commented on the issue:

https://github.com/apache/carbondata/pull/3027
  
retest this please


---


[GitHub] carbondata issue #3027: [CARBONDATA-3202]update the schema to session catalo...

2018-12-28 Thread akashrn5
Github user akashrn5 commented on the issue:

https://github.com/apache/carbondata/pull/3027
  
@manishgupta88 handled review comments, please review


---


[GitHub] carbondata pull request #3014: [CARBONDATA-3201] Added load level SORT_SCOPE

2018-12-26 Thread akashrn5
Github user akashrn5 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3014#discussion_r244092819
  
--- Diff: 
integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestCreateTableWithSortScope.scala
 ---
@@ -32,25 +32,6 @@ class TestCreateTableWithSortScope extends QueryTest 
with BeforeAndAfterAll {
 sql("DROP TABLE IF EXISTS tableWithBatchSort")
 sql("DROP TABLE IF EXISTS tableWithNoSort")
 sql("DROP TABLE IF EXISTS tableWithUnsupportSortScope")
-sql("DROP TABLE IF EXISTS tableLoadWithSortScope")
-  }
-
-  test("Do not support load data with specify sort scope") {
-sql(
-s"""
-   | CREATE TABLE tableLoadWithSortScope(
-   | intField INT,
-   | stringField STRING
-   | )
-   | STORED BY 'carbondata'
-   | TBLPROPERTIES('SORT_COLUMNS'='stringField')
-   """.stripMargin)
-
-val exception_loaddata_sortscope: Exception = intercept[Exception] {
-  sql("LOAD DATA LOCAL INPATH '/path/to/data' INTO TABLE 
tableLoadWithSortScope " +
-  "OPTIONS('SORT_SCOPE'='GLOBAL_SORT')")
-}
-assert(exception_loaddata_sortscope.getMessage.contains("Error: 
Invalid option(s): sort_scope"))
--- End diff --

please add a test case: give a different sort scope in create table, in the 
load options and in the carbon property, and check the effective sort scope 
in DESCRIBE FORMATTED.
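
A rough sketch of such a test, assuming the QueryTest helpers used elsewhere 
in this suite; the table name and data path are placeholders:

```scala
test("load-level SORT_SCOPE overrides the table property") {
  sql("DROP TABLE IF EXISTS sortScopePrecedence")
  sql(
    """
      | CREATE TABLE sortScopePrecedence(intField INT, stringField STRING)
      | STORED BY 'carbondata'
      | TBLPROPERTIES('SORT_COLUMNS'='stringField', 'SORT_SCOPE'='LOCAL_SORT')
    """.stripMargin)
  sql("LOAD DATA LOCAL INPATH '/path/to/data' INTO TABLE sortScopePrecedence " +
      "OPTIONS('SORT_SCOPE'='GLOBAL_SORT')")
  // the load option should win over the table property
  checkExistence(sql("DESCRIBE FORMATTED sortScopePrecedence"), true, "GLOBAL_SORT")
}
```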


---


[GitHub] carbondata pull request #3014: [CARBONDATA-3201] Added load level SORT_SCOPE

2018-12-26 Thread akashrn5
Github user akashrn5 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3014#discussion_r244092650
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonLoadDataCommand.scala
 ---
@@ -191,10 +191,17 @@ case class CarbonLoadDataCommand(
 optionsFinal
   .put("complex_delimiter_level_4",
 ComplexDelimitersEnum.COMPLEX_DELIMITERS_LEVEL_4.value())
-optionsFinal.put("sort_scope", 
tableProperties.asScala.getOrElse("sort_scope",
-  
carbonProperty.getProperty(CarbonLoadOptionConstants.CARBON_OPTIONS_SORT_SCOPE,
-carbonProperty.getProperty(CarbonCommonConstants.LOAD_SORT_SCOPE,
-  CarbonCommonConstants.LOAD_SORT_SCOPE_DEFAULT
+optionsFinal.put(
+  "sort_scope",
+  options.getOrElse(
+"sort_scope",
+tableProperties.asScala.getOrElse(
+  "sort_scope",
+  carbonProperty.getProperty(
+CarbonLoadOptionConstants.CARBON_OPTIONS_SORT_SCOPE,
+carbonProperty.getProperty(
+  CarbonCommonConstants.LOAD_SORT_SCOPE,
+  CarbonCommonConstants.LOAD_SORT_SCOPE_DEFAULT)
--- End diff --

please handle for SDK and file format also


---


[GitHub] carbondata pull request #3014: [CARBONDATA-3201] Added load level SORT_SCOPE

2018-12-26 Thread akashrn5
Github user akashrn5 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/3014#discussion_r244092283
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonLoadDataCommand.scala
 ---
@@ -191,10 +191,17 @@ case class CarbonLoadDataCommand(
 optionsFinal
   .put("complex_delimiter_level_4",
 ComplexDelimitersEnum.COMPLEX_DELIMITERS_LEVEL_4.value())
-optionsFinal.put("sort_scope", 
tableProperties.asScala.getOrElse("sort_scope",
-  
carbonProperty.getProperty(CarbonLoadOptionConstants.CARBON_OPTIONS_SORT_SCOPE,
-carbonProperty.getProperty(CarbonCommonConstants.LOAD_SORT_SCOPE,
-  CarbonCommonConstants.LOAD_SORT_SCOPE_DEFAULT
+optionsFinal.put(
--- End diff --

please format the code


---


[GitHub] carbondata pull request #3027: [CARBONDATA-3202]update the schema to session...

2018-12-26 Thread akashrn5
GitHub user akashrn5 opened a pull request:

https://github.com/apache/carbondata/pull/3027

[CARBONDATA-3202]update the schema to session catalog after add column, 
drop column and column rename

### Why this PR?

**Problem:** For alter table rename, once we change the table name in 
carbon, we fire the alter table rename DDL using the Hive client. But for add 
column, drop column and column rename, Spark does not support these features 
while Hive does, so after a rename, add column or drop column, the updated 
schema is not reflected in the catalog.

**Solution:** We can directly call the Spark API **alterTableDataSchema**, 
passing the updated schema, which in turn updates the schema in the session 
catalog.
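
A minimal sketch of the idea, not the PR's actual code; alterTableDataSchema 
is the Spark SessionCatalog API named above (its availability depends on the 
Spark version):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.TableIdentifier
import org.apache.spark.sql.types.StructType

// After carbon persists its own schema change, push the new column layout to
// Spark's session catalog so subsequent queries see it.
def syncSchemaToCatalog(spark: SparkSession, db: String, table: String,
    updatedSchema: StructType): Unit = {
  spark.sessionState.catalog.alterTableDataSchema(
    TableIdentifier(table, Some(db)), updatedSchema)
}
```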

Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [x] Any interfaces changed?
 YES
 - [x] Any backward compatibility impacted?
 NA
 - [x] Document update required?
NA
 - [x] Testing done
tested in three node cluster for various spark versions
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance 
test report.
- Any additional information to help reviewers in testing this 
change.
   
 - [x] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 
NA


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/akashrn5/incubator-carbondata addcolumn

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/3027.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3027


commit ee78136e55b309a4521914fdee57adf59cda8531
Author: akashrn5 
Date:   2018-12-27T06:01:44Z

update the schema to session catalog after add column, drop column and 
column rename




---


[GitHub] carbondata issue #2990: [CARBONDATA-3149]Support alter table column rename

2018-12-20 Thread akashrn5
Github user akashrn5 commented on the issue:

https://github.com/apache/carbondata/pull/2990
  
@jackylk thank you 


---


[GitHub] carbondata issue #2990: [CARBONDATA-3149]Support alter table column rename

2018-12-20 Thread akashrn5
Github user akashrn5 commented on the issue:

https://github.com/apache/carbondata/pull/2990
  
@jackylk handled comments, please review


---


[GitHub] carbondata pull request #2990: [CARBONDATA-3149]Support alter table column r...

2018-12-20 Thread akashrn5
Github user akashrn5 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2990#discussion_r243261978
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/schema/CarbonAlterTableColRenameDataTypeChangeCommand.scala
 ---
@@ -0,0 +1,324 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.command.schema
+
+import scala.collection.JavaConverters._
+import scala.collection.mutable
+
+import org.apache.spark.sql.{CarbonEnv, Row, SparkSession}
+import 
org.apache.spark.sql.execution.command.{AlterTableDataTypeChangeModel, 
DataTypeInfo,
+  MetadataCommand}
+import org.apache.spark.sql.hive.CarbonSessionCatalog
+import org.apache.spark.util.AlterTableUtil
+
+import 
org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException
+import org.apache.carbondata.common.logging.LogServiceFactory
+import org.apache.carbondata.core.features.TableOperation
+import org.apache.carbondata.core.locks.{ICarbonLock, LockUsage}
+import 
org.apache.carbondata.core.metadata.converter.ThriftWrapperSchemaConverterImpl
+import org.apache.carbondata.core.metadata.datatype.DecimalType
+import org.apache.carbondata.core.metadata.schema.table.CarbonTable
+import org.apache.carbondata.core.metadata.schema.table.column.CarbonColumn
+import 
org.apache.carbondata.events.{AlterTableColRenameAndDataTypeChangePostEvent,
+  AlterTableColRenameAndDataTypeChangePreEvent, OperationContext, 
OperationListenerBus}
+import org.apache.carbondata.format.{ColumnSchema, SchemaEvolutionEntry, 
TableInfo}
+import org.apache.carbondata.spark.util.DataTypeConverterUtil
+
+abstract class CarbonAlterTableColumnRenameCommand(oldColumnName: String, 
newColumnName: String)
+  extends MetadataCommand {
+
+  protected def validColumnsForRenaming(carbonColumns: 
mutable.Buffer[CarbonColumn],
+  oldCarbonColumn: CarbonColumn,
+  carbonTable: CarbonTable): Unit = {
+// check whether new column name is already an existing column name
+if 
(carbonColumns.exists(_.getColName.equalsIgnoreCase(newColumnName))) {
+  throw new MalformedCarbonCommandException(s"Column Rename Operation 
failed. New " +
+s"column name 
$newColumnName already exists" +
+s" in table ${ 
carbonTable.getTableName }")
+}
+
+// if the column rename is for complex column, block the operation
+if (oldCarbonColumn.isComplex) {
+  throw new MalformedCarbonCommandException(s"Column Rename Operation 
failed. Rename " +
+s"column is unsupported 
for complex datatype " +
+s"column ${ 
oldCarbonColumn.getColName }")
+}
+
+// if column rename operation is on partition column, then fail the 
rename operation
+if (null != carbonTable.getPartitionInfo) {
+  val partitionColumns = 
carbonTable.getPartitionInfo.getColumnSchemaList
+  partitionColumns.asScala.foreach {
+col =>
+  if (col.getColumnName.equalsIgnoreCase(oldColumnName)) {
+throw new MalformedCarbonCommandException(
+  s"Column Rename Operation failed. Renaming " +
+  s"the partition column $newColumnName is not " +
+  s"allowed")
+  }
+  }
+}
+
+  }
+}
+
+private[sql] case class CarbonAlterTableColRenameDataTypeChangeCommand(
+alterTableColRenameAndDataTypeChangeModel: 
AlterTableDataTypeChangeModel,
+childTableColumnRename: Boolean = false)
+  extends 
CarbonAlterTableColumnRenameCommand(alterTableColRenameAndDataTypeChangeModel.columnName,
alterTableColRenameAndDataTypeChangeModel.newColu


---

[GitHub] carbondata pull request #2990: [CARBONDATA-3149]Support alter table column r...

2018-12-20 Thread akashrn5
Github user akashrn5 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2990#discussion_r243260747
  
--- Diff: 
integration/spark-common/src/main/scala/org/apache/spark/sql/catalyst/CarbonDDLSqlParser.scala
 ---
@@ -1511,7 +1514,16 @@ abstract class CarbonDDLSqlParser extends 
AbstractCarbonSparkSQLParser {
 }
 DataTypeInfo("decimal", precision, scale)
   case _ =>
-throw new MalformedCarbonCommandException("Data type provided is 
invalid.")
+if (isColumnRename) {
+  dataType match {
--- End diff --

ok, refactored: if it is a rename operation then just return the DataTypeInfo 
object, else throw the invalid datatype exception as before
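
A hedged sketch of the described branch; constructing DataTypeInfo with 
default precision/scale is an assumption for illustration:

```scala
// sketch of the final branch of parseDataType's dataType match
dataType match {
  case other =>
    if (isColumnRename) {
      // pure rename: the datatype is unchanged, so just echo it back
      DataTypeInfo(other)
    } else {
      throw new MalformedCarbonCommandException("Data type provided is invalid.")
    }
}
```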


---


[GitHub] carbondata issue #2990: [CARBONDATA-3149]Support alter table column rename

2018-12-19 Thread akashrn5
Github user akashrn5 commented on the issue:

https://github.com/apache/carbondata/pull/2990
  
@manishgupta88 handled, please review


---


[GitHub] carbondata issue #2990: [CARBONDATA-3149]Support alter table column rename

2018-12-19 Thread akashrn5
Github user akashrn5 commented on the issue:

https://github.com/apache/carbondata/pull/2990
  
@manishgupta88 handled comments, please review


---


[GitHub] carbondata pull request #2990: [CARBONDATA-3149]Support alter table column r...

2018-12-16 Thread akashrn5
Github user akashrn5 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2990#discussion_r242030254
  
--- Diff: 
integration/spark-common/src/main/scala/org/apache/spark/sql/catalyst/CarbonDDLSqlParser.scala
 ---
@@ -1464,31 +1464,46 @@ abstract class CarbonDDLSqlParser extends 
AbstractCarbonSparkSQLParser {
* @param values
* @return
*/
-  def parseDataType(dataType: String, values: Option[List[(Int, Int)]]): 
DataTypeInfo = {
+  def parseDataType(
+  dataType: String,
+  values: Option[List[(Int, Int)]],
+  isColumnRename: Boolean): DataTypeInfo = {
+def validateAndGetDecimalDatatype: DataTypeInfo = {
+  var precision: Int = 0
+  var scale: Int = 0
+  if (values.isDefined) {
+precision = values.get(0)._1
+scale = values.get(0)._2
+  } else {
+throw new MalformedCarbonCommandException("Decimal format provided 
is invalid")
+  }
+  // precision should be > 0 and <= 38 and scale should be >= 0 and <= 
38
+  if (precision < 1 || precision > 38) {
--- End diff --

since the number is fixed, I haven't changed the old code; there is no need 
to define constants, so I kept it the same as before


---


[GitHub] carbondata pull request #2990: [CARBONDATA-3149]Support alter table column r...

2018-12-16 Thread akashrn5
Github user akashrn5 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2990#discussion_r242026814
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/schema/CarbonAlterTableColRenameDataTypeChangeCommand.scala
 ---
@@ -0,0 +1,345 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.command.schema
+
+import scala.collection.JavaConverters._
+import scala.collection.mutable
+import scala.collection.mutable.ArrayBuffer
+
+import org.apache.spark.sql.{CarbonEnv, Row, SparkSession}
+import 
org.apache.spark.sql.execution.command.{AlterTableColRenameAndDataTypeChangeModel,
 DataTypeInfo, MetadataCommand}
+import org.apache.spark.sql.hive.CarbonSessionCatalog
+import org.apache.spark.util.AlterTableUtil
+
+import 
org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException
+import org.apache.carbondata.common.logging.LogServiceFactory
+import org.apache.carbondata.core.features.TableOperation
+import org.apache.carbondata.core.locks.{ICarbonLock, LockUsage}
+import 
org.apache.carbondata.core.metadata.converter.ThriftWrapperSchemaConverterImpl
+import org.apache.carbondata.core.metadata.schema.table.CarbonTable
+import org.apache.carbondata.core.metadata.schema.table.column.CarbonColumn
+import 
org.apache.carbondata.events.{AlterTableColRenameAndDataTypeChangePostEvent, 
AlterTableColRenameAndDataTypeChangePreEvent, OperationContext, 
OperationListenerBus}
+import org.apache.carbondata.format.{ColumnSchema, SchemaEvolutionEntry, 
TableInfo}
+import org.apache.carbondata.spark.util.DataTypeConverterUtil
+
+private[sql] case class CarbonAlterTableColRenameDataTypeChangeCommand(
+alterTableColRenameAndDataTypeChangeModel: 
AlterTableColRenameAndDataTypeChangeModel)
+  extends MetadataCommand {
+
+  override def processMetadata(sparkSession: SparkSession): Seq[Row] = {
+val LOGGER = 
LogServiceFactory.getLogService(this.getClass.getCanonicalName)
+val tableName = alterTableColRenameAndDataTypeChangeModel.tableName
+val dbName = alterTableColRenameAndDataTypeChangeModel.databaseName
+  .getOrElse(sparkSession.catalog.currentDatabase)
+var isColumnRenameOnly = false
+var isDataTypeChangeOnly = false
+var isBothColRenameAndDataTypeChange = false
+setAuditTable(dbName, tableName)
+setAuditInfo(Map(
+  "column" -> alterTableColRenameAndDataTypeChangeModel.columnName,
+  "newColumn" -> 
alterTableColRenameAndDataTypeChangeModel.newColumnName,
+  "newType" -> 
alterTableColRenameAndDataTypeChangeModel.dataTypeInfo.dataType))
+val locksToBeAcquired = List(LockUsage.METADATA_LOCK, 
LockUsage.COMPACTION_LOCK)
+var locks = List.empty[ICarbonLock]
+// get the latest carbon table and check for column existence
+var carbonTable: CarbonTable = null
+var timeStamp = 0L
+try {
+  locks = AlterTableUtil
+.validateTableAndAcquireLock(dbName, tableName, 
locksToBeAcquired)(sparkSession)
+  val metastore = CarbonEnv.getInstance(sparkSession).carbonMetastore
+  carbonTable = CarbonEnv.getCarbonTable(Some(dbName), 
tableName)(sparkSession)
+  if (!carbonTable.canAllow(carbonTable, 
TableOperation.ALTER_COL_RENAME_AND_CHANGE_DATATYPE,
+alterTableColRenameAndDataTypeChangeModel.columnName)) {
+throw new MalformedCarbonCommandException(
+  "alter table change datatype or column rename is not supported 
for index datamap")
+  }
+  val operationContext = new OperationContext
+  val alterTableColRenameAndDataTypeChangePreEvent =
+AlterTableColRenameAndDataTypeChangePreEvent(sparkSession, 
carbonTable,
+  alterTableColRenameAndDataTypeChangeModel)
+  OperationListenerBus.getInstance()
.fireEvent(alterTableColRenameAndDa


---

[GitHub] carbondata pull request #2990: [CARBONDATA-3149]Support alter table column r...

2018-12-16 Thread akashrn5
Github user akashrn5 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2990#discussion_r242026928
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonFileMetastore.scala
 ---
@@ -286,12 +286,16 @@ class CarbonFileMetastore extends CarbonMetaStore {
   newTableIdentifier: CarbonTableIdentifier,
   oldTableIdentifier: CarbonTableIdentifier,
   thriftTableInfo: org.apache.carbondata.format.TableInfo,
-  schemaEvolutionEntry: SchemaEvolutionEntry,
+  schemaEvolutionEntry: List[SchemaEvolutionEntry],
--- End diff --

yes, can change to schemaEvolutionEntryList


---


[GitHub] carbondata pull request #2990: [CARBONDATA-3149]Support alter table column r...

2018-12-16 Thread akashrn5
Github user akashrn5 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2990#discussion_r242026723
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/schema/CarbonAlterTableColRenameDataTypeChangeCommand.scala
 ---
@@ -0,0 +1,345 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.command.schema
+
+import scala.collection.JavaConverters._
+import scala.collection.mutable
+import scala.collection.mutable.ArrayBuffer
+
+import org.apache.spark.sql.{CarbonEnv, Row, SparkSession}
+import 
org.apache.spark.sql.execution.command.{AlterTableColRenameAndDataTypeChangeModel,
 DataTypeInfo, MetadataCommand}
+import org.apache.spark.sql.hive.CarbonSessionCatalog
+import org.apache.spark.util.AlterTableUtil
+
+import 
org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException
+import org.apache.carbondata.common.logging.LogServiceFactory
+import org.apache.carbondata.core.features.TableOperation
+import org.apache.carbondata.core.locks.{ICarbonLock, LockUsage}
+import 
org.apache.carbondata.core.metadata.converter.ThriftWrapperSchemaConverterImpl
+import org.apache.carbondata.core.metadata.schema.table.CarbonTable
+import org.apache.carbondata.core.metadata.schema.table.column.CarbonColumn
+import 
org.apache.carbondata.events.{AlterTableColRenameAndDataTypeChangePostEvent, 
AlterTableColRenameAndDataTypeChangePreEvent, OperationContext, 
OperationListenerBus}
+import org.apache.carbondata.format.{ColumnSchema, SchemaEvolutionEntry, 
TableInfo}
+import org.apache.carbondata.spark.util.DataTypeConverterUtil
+
+private[sql] case class CarbonAlterTableColRenameDataTypeChangeCommand(
+alterTableColRenameAndDataTypeChangeModel: 
AlterTableColRenameAndDataTypeChangeModel)
+  extends MetadataCommand {
+
+  override def processMetadata(sparkSession: SparkSession): Seq[Row] = {
+val LOGGER = 
LogServiceFactory.getLogService(this.getClass.getCanonicalName)
+val tableName = alterTableColRenameAndDataTypeChangeModel.tableName
+val dbName = alterTableColRenameAndDataTypeChangeModel.databaseName
+  .getOrElse(sparkSession.catalog.currentDatabase)
+var isColumnRenameOnly = false
+var isDataTypeChangeOnly = false
+var isBothColRenameAndDataTypeChange = false
+setAuditTable(dbName, tableName)
+setAuditInfo(Map(
+  "column" -> alterTableColRenameAndDataTypeChangeModel.columnName,
+  "newColumn" -> 
alterTableColRenameAndDataTypeChangeModel.newColumnName,
+  "newType" -> 
alterTableColRenameAndDataTypeChangeModel.dataTypeInfo.dataType))
+val locksToBeAcquired = List(LockUsage.METADATA_LOCK, 
LockUsage.COMPACTION_LOCK)
+var locks = List.empty[ICarbonLock]
+// get the latest carbon table and check for column existence
+var carbonTable: CarbonTable = null
+var timeStamp = 0L
+try {
+  locks = AlterTableUtil
+.validateTableAndAcquireLock(dbName, tableName, 
locksToBeAcquired)(sparkSession)
+  val metastore = CarbonEnv.getInstance(sparkSession).carbonMetastore
+  carbonTable = CarbonEnv.getCarbonTable(Some(dbName), 
tableName)(sparkSession)
+  if (!carbonTable.canAllow(carbonTable, 
TableOperation.ALTER_COL_RENAME_AND_CHANGE_DATATYPE,
+alterTableColRenameAndDataTypeChangeModel.columnName)) {
+throw new MalformedCarbonCommandException(
+  "alter table change datatype or column rename is not supported 
for index datamap")
+  }
+  val operationContext = new OperationContext
+  val alterTableColRenameAndDataTypeChangePreEvent =
+AlterTableColRenameAndDataTypeChangePreEvent(sparkSession, 
carbonTable,
+  alterTableColRenameAndDataTypeChangeModel)
+  OperationListenerBus.getInstance()
+.fireEvent(alterTableColRenameAndDa


---

[GitHub] carbondata pull request #2990: [CARBONDATA-3149]Support alter table column r...

2018-12-14 Thread akashrn5
GitHub user akashrn5 opened a pull request:

https://github.com/apache/carbondata/pull/2990

[CARBONDATA-3149]Support alter table column rename

### Why this PR?
This PR supports the column rename feature in CarbonData. Carbon already 
supports datatype change, alter table add column and drop column. This PR 
uses the same DDL as datatype change and adds support for column rename.

Any column can be renamed, except complex type columns and partition columns 
(see the validation in the command below).
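
Illustrative DDL, with placeholder table and column names, using the same 
CHANGE syntax carbon already uses for datatype change:

```scala
// rename only: the datatype is restated unchanged
sql("ALTER TABLE t CHANGE oldName newName STRING")
// rename and change the datatype in one statement
sql("ALTER TABLE t CHANGE amount price BIGINT")
```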


Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [ ] Any interfaces changed?
 
 - [ ] Any backward compatibility impacted?
 
 - [ ] Document update required?

 - [ ] Testing done
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance 
test report.
- Any additional information to help reviewers in testing this 
change.
   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/akashrn5/incubator-carbondata alter_rename

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2990.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2990


commit e84b09dd244803746fcacd3e9c6eda4105dd7bef
Author: akashrn5 
Date:   2018-12-14T11:20:09Z

Support alter table column rename




---


[GitHub] carbondata pull request #2847: [WIP]Support Gzip as column compressor

2018-12-04 Thread akashrn5
Github user akashrn5 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2847#discussion_r238658593
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/compression/GzipCompressor.java
 ---
@@ -0,0 +1,203 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.datastore.compression;
+
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.DoubleBuffer;
+import java.nio.FloatBuffer;
+import java.nio.IntBuffer;
+import java.nio.LongBuffer;
+import java.nio.ShortBuffer;
+
+import org.apache.carbondata.core.util.ByteUtil;
+
+import 
org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream;
+import 
org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream;
+
+public class GzipCompressor implements Compressor {
+
+  public GzipCompressor() {
+  }
+
+  @Override public String getName() {
+return "gzip";
+  }
+
+  /*
+   * Method called for compressing the data and
--- End diff --

change the comment to a standard Javadoc comment


---


[GitHub] carbondata pull request #2847: [WIP]Support Gzip as column compressor

2018-12-04 Thread akashrn5
Github user akashrn5 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2847#discussion_r238657955
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/compression/GzipCompressor.java
 ---
@@ -0,0 +1,203 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.datastore.compression;
+
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.DoubleBuffer;
+import java.nio.FloatBuffer;
+import java.nio.IntBuffer;
+import java.nio.LongBuffer;
+import java.nio.ShortBuffer;
+
+import org.apache.carbondata.core.util.ByteUtil;
+
+import 
org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream;
+import 
org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream;
+
+public class GzipCompressor implements Compressor {
+
+  public GzipCompressor() {
+  }
+
+  @Override public String getName() {
+return "gzip";
+  }
+
+  /*
+   * Method called for compressing the data and
+   * return a byte array
+   */
+  private byte[] compressData(byte[] data) {
+
+ByteArrayOutputStream byteArrayOutputStream = new 
ByteArrayOutputStream();
+try {
+  GzipCompressorOutputStream gzipCompressorOutputStream =
+  new GzipCompressorOutputStream(byteArrayOutputStream);
+  try {
+gzipCompressorOutputStream.write(data);
+  } catch (IOException e) {
+throw new RuntimeException("Error during Compression step " + 
e.getMessage());
+  } finally {
+gzipCompressorOutputStream.close();
+  }
+} catch (IOException e) {
+  throw new RuntimeException("Error during Compression step " + 
e.getMessage());
+}
+
+return byteArrayOutputStream.toByteArray();
+  }
+
+  /*
+   * Method called for decompressing the data and
+   * return a byte array
+   */
+  private byte[] decompressData(byte[] data) {
+
+ByteArrayInputStream byteArrayOutputStream = new 
ByteArrayInputStream(data);
+ByteArrayOutputStream byteOutputStream = new ByteArrayOutputStream();
+
--- End diff --

remove empty line


---


[GitHub] carbondata pull request #2967: [CARBONDATA-3140]Block create like table comm...

2018-11-30 Thread akashrn5
GitHub user akashrn5 opened a pull request:

https://github.com/apache/carbondata/pull/2967

[CARBONDATA-3140]Block create like table command in carbon

### Why this PR?
when a like table is created using a carbon table as the source table and the 
new table is dropped, it deletes the source table on spark-2.1 (it works fine 
on the other versions). Blocking the create like command for carbon tables.
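
Illustratively, the command being blocked (table names are placeholders):

```scala
// with a carbon source table, dropping likeTable on spark-2.1 also deleted
// carbonSourceTable's data, hence the block
sql("CREATE TABLE likeTable LIKE carbonSourceTable")
```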


Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [ ] Any interfaces changed?
 
 - [ ] Any backward compatibility impacted?
 
 - [ ] Document update required?

 - [ ] Testing done
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance 
test report.
- Any additional information to help reviewers in testing this 
change.
   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/akashrn5/incubator-carbondata like

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2967.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2967


commit cc8dbbf337976bae9e3a4a43b317c853b783fcff
Author: akashrn5 
Date:   2018-11-30T10:31:34Z

Block create like table command in carbon




---


[GitHub] carbondata issue #2967: [CARBONDATA-3140]Block create like table command in ...

2018-11-30 Thread akashrn5
Github user akashrn5 commented on the issue:

https://github.com/apache/carbondata/pull/2967
  
@KanakaKumar @ravipesala please review


---


[GitHub] carbondata pull request #2967: [CARBONDATA-3140]Block create like table comm...

2018-11-30 Thread akashrn5
Github user akashrn5 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2967#discussion_r237817147
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/memory/UnsafeMemoryManager.java 
---
@@ -129,7 +129,7 @@ private synchronized MemoryBlock 
allocateMemory(MemoryType memoryType, String ta
   memoryBlock = MemoryAllocator.HEAP.allocate(memoryRequested);
   if (LOGGER.isDebugEnabled()) {
 LOGGER.debug(String
-.format("Creating onheap working Memory block (%d) with size: 
", memoryBlock.size()));
+.format("Creating onheap working Memory block with size: 
(%d)", memoryBlock.size()));
--- End diff --

corrected the error message here


---


[GitHub] carbondata pull request #2791: [WIP][HOTFIX]correct the exception handling i...

2018-11-29 Thread akashrn5
Github user akashrn5 closed the pull request at:

https://github.com/apache/carbondata/pull/2791


---


[GitHub] carbondata pull request #2953: [CARBONDATA-3132]Correct the task disrtibutio...

2018-11-27 Thread akashrn5
Github user akashrn5 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2953#discussion_r236646439
  
--- Diff: 
integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/CarbonMergerRDD.scala
 ---
@@ -401,7 +401,11 @@ class CarbonMergerRDD[K, V](
   .add(new CarbonInputSplitTaskInfo(entry._1, 
entry._2).asInstanceOf[Distributable])
 )
 
-val nodeBlockMap = CarbonLoaderUtil.nodeBlockMapping(taskInfoList, -1)
+// get all the active nodes of cluster and prepare the nodeBlockMap 
based on these nodes
+val activeNodes = DistributionUtil
+  .ensureExecutorsAndGetNodeList(taskInfoList.asScala, sparkContext)
+
+val nodeBlockMap = CarbonLoaderUtil.nodeBlockMapping(taskInfoList, -1, 
activeNodes.asJava)
--- End diff --

done


---


[GitHub] carbondata pull request #2953: [CARBONDATA-3132]Correct the task disrtibutio...

2018-11-26 Thread akashrn5
Github user akashrn5 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2953#discussion_r236552294
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/util/CarbonLoaderUtil.java
 ---
@@ -536,7 +537,8 @@ public static Dictionary 
getDictionary(AbsoluteTableIdentifier absoluteTableIden
*/
   public static Map> 
nodeBlockMapping(List blockInfos) {
 // -1 if number of nodes has to be decided based on block location 
information
-return nodeBlockMapping(blockInfos, -1);
+return nodeBlockMapping(blockInfos, -1, null,
--- End diff --

here I just did the refactoring


---


[GitHub] carbondata pull request #2953: [CARBONDATA-3132]Correct the task disrtibutio...

2018-11-26 Thread akashrn5
Github user akashrn5 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2953#discussion_r236551537
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/util/CarbonLoaderUtil.java
 ---
@@ -536,7 +537,8 @@ public static Dictionary 
getDictionary(AbsoluteTableIdentifier absoluteTableIden
*/
   public static Map> 
nodeBlockMapping(List blockInfos) {
 // -1 if number of nodes has to be decided based on block location 
information
-return nodeBlockMapping(blockInfos, -1);
+return nodeBlockMapping(blockInfos, -1, null,
--- End diff --

yes, in the compaction case `BlockAssignmentStrategy.BLOCK_NUM_FIRST` is the 
default, same as before


---


[GitHub] carbondata pull request #2953: [CARBONDATA-3132]Correct the task disrtibutio...

2018-11-26 Thread akashrn5
Github user akashrn5 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2953#discussion_r236537747
  
--- Diff: 
tools/cli/src/test/java/org/apache/carbondata/tool/CarbonCliTest.java ---
@@ -205,7 +206,7 @@ public void testSummaryOutputAll() {
 expectedOutput = buildLines(
 "## version Details",
 "written_by  Version ",
-"TestUtil1.6.0-SNAPSHOT  ");
+"TestUtil"+ CarbonVersionConstants.CARBONDATA_VERSION+"  ");
--- End diff --

removed the hardcoded value here


---


[GitHub] carbondata pull request #2953: [CARBONDATA-3132]Correct the task disrtibutio...

2018-11-26 Thread akashrn5
GitHub user akashrn5 opened a pull request:

https://github.com/apache/carbondata/pull/2953

[CARBONDATA-3132]Correct the task disrtibution in case of compaction when 
the actual block nodes and active nodes are different

### Why This PR?
There is an unequal distribution of tasks during compaction.
For example: the load is done with replication factor 2 while all nodes are 
active, and during compaction one node is down, i.e. it is no longer an 
active executor. Task distribution should then spread the tasks equally among 
all the active executors, instead of giving more tasks to one executor and 
fewer to another. But sometimes this unequal distribution happens and the 
compaction becomes slow.

### Solution
Currently we do not fetch the active executors before the node block mapping 
and send the list of active executors as null, which sometimes leads to the 
above problem. So fetch the active executors and pass them to the node block 
mapping logic, which will distribute the blocks equally.



Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [x] Any interfaces changed?
 NA
 - [x] Any backward compatibility impacted?
 NA
 - [x] Document update required?
NA

 - [x] Testing done
tested using 3 node and 6 node cluster
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance 
test report.
- Any additional information to help reviewers in testing this 
change.
   
 - [x] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 
NA


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/akashrn5/incubator-carbondata distribute

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2953.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2953


commit caaa044b461a64f28be6107e55e8ab2a1ae2bf4d
Author: brijoobopanna 
Date:   2018-11-27T06:38:15Z

Correct the task disrtibution in case of compaction when the actual block 
nodes and active nodes are different




---


[GitHub] carbondata pull request #2925: [CARBONDATA-3102] Fix NoClassDefFoundError wh...

2018-11-20 Thread akashrn5
Github user akashrn5 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2925#discussion_r235015512
  
--- Diff: 
examples/spark2/src/main/scala/org/apache/carbondata/examples/S3Example.scala 
---
@@ -157,8 +157,8 @@ object S3Example {
   }
 
   def getSparkMaster(args: Array[String]): String = {
-  if (args.length == 5) args(4)
-  else if (args(3).contains("spark:") || args(3).contains("mesos:")) 
args(3)
-  else "local"
-}
--- End diff --

revert the unnecessary changes


---


[GitHub] carbondata issue #2928: [CARBONDATA-3106] Written_by_APPNAME not serialized ...

2018-11-19 Thread akashrn5
Github user akashrn5 commented on the issue:

https://github.com/apache/carbondata/pull/2928
  
In `CarbonFactDataWriterImplV3.java`, `writeFooterToFile`: if the appName 
fetched from carbonProperties is null, you can fail the data load with a 
data load exception; please handle that also
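
A hedged sketch of the requested handling; the property key and exception 
type shown are assumptions based on this PR's context, not verified 
constants:

```scala
// assumption: the writer app name is stored under this carbon property key
val appName = CarbonProperties.getInstance().getProperty("carbon.writtenby.app.name")
if (appName == null) {
  // fail the load instead of writing a footer without the writer identity
  throw new CarbonDataLoadingException(
    "Data load failed: written_by application name is not set")
}
```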


---


[GitHub] carbondata pull request #2872: [WIP] Added reusable buffer code

2018-11-16 Thread akashrn5
Github user akashrn5 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2872#discussion_r234233366
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/adaptive/AdaptiveIntegralCodec.java
 ---
@@ -119,24 +120,34 @@ public ColumnPage decode(byte[] input, int offset, 
int length)
 return LazyColumnPage.newPage(page, converter);
   }
 
-  @Override
-  public void decodeAndFillVector(byte[] input, int offset, int length,
-  ColumnVectorInfo vectorInfo, BitSet nullBits, boolean 
isLVEncoded)
-  throws MemoryException, IOException {
-ColumnPage page = null;
-if (DataTypes.isDecimal(meta.getSchemaDataType())) {
-  page = ColumnPage.decompressDecimalPage(meta, input, offset, 
length);
-  vectorInfo.decimalConverter = ((DecimalColumnPage) 
page).getDecimalConverter();
+  @Override public void decodeAndFillVector(byte[] input, int offset, 
int length,
+  ColumnVectorInfo vectorInfo, BitSet nullBits, boolean 
isLVEncoded, int pageSize,
+  ReusableDataBuffer reusableDataBuffer) throws MemoryException, 
IOException {
+Compressor compressor =
+
CompressorFactory.getInstance().getCompressor(meta.getCompressorName());
+byte[] unCompressData;
+if (null != reusableDataBuffer && 
compressor.supportReusableBuffer()) {
+  int uncompressedLength = compressor.unCompressedLength(input, 
offset, length);
+  unCompressData = 
reusableDataBuffer.getDataBuffer(uncompressedLength);
 } else {
-  page = ColumnPage.decompress(meta, input, offset, length, 
isLVEncoded);
+  unCompressData = compressor.unCompressByte(input, offset, 
length);
 }
-page.setNullBits(nullBits);
-converter.decodeAndFillVector(page, vectorInfo);
+compressor.rawUncompress(input, offset, length, unCompressData);
--- End diff --

I think when the ZSTD compressor is configured this will throw an 
unsupported-operation exception; please check and handle it for 
`AdaptiveIntegralCodec` and `AdaptiveDeltaIntegralCodec`
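
A hedged sketch of the safer flow implied by this comment, reusing the 
variables and compressor methods visible in the diff above: call 
rawUncompress only when the compressor supports reusable buffers; otherwise 
unCompressByte already returns the decompressed bytes:

```scala
val unCompressData: Array[Byte] =
  if (reusableDataBuffer != null && compressor.supportReusableBuffer()) {
    val uncompressedLength = compressor.unCompressedLength(input, offset, length)
    val buffer = reusableDataBuffer.getDataBuffer(uncompressedLength)
    // decompress into the reused buffer only on this branch
    compressor.rawUncompress(input, offset, length, buffer)
    buffer
  } else {
    compressor.unCompressByte(input, offset, length)
  }
```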


---


[GitHub] carbondata issue #2922: [HOTFIX]s3 lock file fix

2018-11-15 Thread akashrn5
Github user akashrn5 commented on the issue:

https://github.com/apache/carbondata/pull/2922
  
@jackylk please review 


---


[GitHub] carbondata issue #2922: [HOTFIX]s3 lock file fix

2018-11-15 Thread akashrn5
Github user akashrn5 commented on the issue:

https://github.com/apache/carbondata/pull/2922
  
@jackylk CARBONDATA-3103 is a different issue


---


[GitHub] carbondata pull request #2922: [WIP]s3 lock file fix

2018-11-14 Thread akashrn5
GitHub user akashrn5 opened a pull request:

https://github.com/apache/carbondata/pull/2922

[WIP]s3 lock file fix

Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [ ] Any interfaces changed?
 
 - [ ] Any backward compatibility impacted?
 
 - [ ] Document update required?

 - [ ] Testing done
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance 
test report.
- Any additional information to help reviewers in testing this 
change.
   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/akashrn5/incubator-carbondata s3

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2922.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2922


commit a34f191f8e9026c5d6950ea86962ead0282d957f
Author: akashrn5 
Date:   2018-11-14T16:23:39Z

s3 lock file fix




---


[GitHub] carbondata issue #2886: [CARBONDATA-3065]make inverted index false by defaul...

2018-11-12 Thread akashrn5
Github user akashrn5 commented on the issue:

https://github.com/apache/carbondata/pull/2886
  
retest this please


---


[GitHub] carbondata issue #2903: [CARBONDATA-3084]dataload failure fix when float val...

2018-11-12 Thread akashrn5
Github user akashrn5 commented on the issue:

https://github.com/apache/carbondata/pull/2903
  
retest this please


---


[GitHub] carbondata issue #2886: [CARBONDATA-3065]make inverted index false by defaul...

2018-11-11 Thread akashrn5
Github user akashrn5 commented on the issue:

https://github.com/apache/carbondata/pull/2886
  
@ravipesala please review the documentation commit that was added


---


[GitHub] carbondata pull request #2903: [CARBONDATA-3084]dataload failure fix when fl...

2018-11-05 Thread akashrn5
GitHub user akashrn5 opened a pull request:

https://github.com/apache/carbondata/pull/2903

[CARBONDATA-3084]dataload failure fix when float value exceeds the limit

### Problem:
when the float value exceeds the range and we try to insert that data, the 
data load fails.

### Analysis:
when the value exceeds the range, the max is stored as `Infinity`, so the 
decimal count of that value will be 0; when the decimal count is zero we go 
for CodecByAlgorithmForIntegral, and that fails

### Solution:
when the value exceeds the range, the decimal count is zero and the source 
datatype is float, select DirectCompressCodec instead

### How this was tested:
test cases are added to validate the load and the data
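
A small, hedged illustration of the analysis above; the codec-selection rule 
is paraphrased from this description, not quoted from the code:

```scala
// a float literal beyond Float's range parses to Infinity rather than failing
val overflowed: Float = "3.5e38".toFloat
assert(overflowed.isInfinity)

// Infinity has a decimal count of 0, which used to steer codec selection to
// CodecByAlgorithmForIntegral; the fix routes this case (decimal count 0,
// source type float, value out of range) to DirectCompressCodec instead.
```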

Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [ ] Any interfaces changed?
 
 - [ ] Any backward compatibility impacted?
 
 - [ ] Document update required?

 - [ ] Testing done
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance 
test report.
- Any additional information to help reviewers in testing this 
change.
   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/akashrn5/incubator-carbondata float

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2903.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2903


commit 798592441f77300dd89b1dec4dd51c15bc6b2c07
Author: akashrn5 
Date:   2018-11-06T07:06:24Z

dataload failure fix when float value exceeds the limit




---


[GitHub] carbondata issue #2886: [CARBONDATA-3065]make inverted index false by defaul...

2018-11-05 Thread akashrn5
Github user akashrn5 commented on the issue:

https://github.com/apache/carbondata/pull/2886
  
the 2.2.1 and 2.3.1 builds have failed test cases, which are not related to 
this change; these failures are random


---


[GitHub] carbondata issue #2886: [CARBONDATA-3065]make inverted index false by defaul...

2018-11-05 Thread akashrn5
Github user akashrn5 commented on the issue:

https://github.com/apache/carbondata/pull/2886
  
retest this please


---


[GitHub] carbondata pull request #2886: [CARBONDATA-3065]make inverted index false by...

2018-11-05 Thread akashrn5
Github user akashrn5 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2886#discussion_r230684559
  
--- Diff: 
integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/dataload/TestNoInvertedIndexLoadAndQuery.scala
 ---
@@ -289,6 +292,44 @@ class TestNoInvertedIndexLoadAndQuery extends 
QueryTest with BeforeAndAfterAll {
 checkAnswer(sql("""select c2 from testNull where c2 is null"""), 
Seq(Row(null), Row(null), Row(null), Row(null), Row(null), Row(null)))
   }
 
+  test("inverted index with Dictionary_EXCLUDE and INVERTED_INDEX") {
+sql("drop table if exists index1")
+sql(
+  """
+   CREATE TABLE IF NOT EXISTS index1
+   (id Int, name String, city String)
+   STORED BY 'org.apache.carbondata.format'
+   
TBLPROPERTIES('DICTIONARY_EXCLUDE'='city','INVERTED_INDEX'='city')
+  """)
+sql(
+  s"""
+   LOAD DATA LOCAL INPATH '$testData1' into table index1
+   """)
+checkAnswer(
+  sql(
+"""
+   SELECT * FROM index1 WHERE city = "Bangalore"
+"""),
+  Seq(Row(19.0, "Emily", "Bangalore")))
--- End diff --

done


---


[GitHub] carbondata issue #2886: [CARBONDATA-3065]make inverted index false by defaul...

2018-11-05 Thread akashrn5
Github user akashrn5 commented on the issue:

https://github.com/apache/carbondata/pull/2886
  
@ravipesala
In the existing flow, all the sort columns and dimensions get an inverted 
index.
In the current implementation, if measure columns are given in the 
inverted_index property, I do not set them as inverted index; basically I 
won't throw any exception. Do we need to throw an exception?
Also, I currently allow the inverted index for all dimension columns, not 
restricting it to sort columns only.
So basically, a user can mention (dimension) columns in the inverted_index 
property and I set those as inverted index. Is that correct?

---


[GitHub] carbondata pull request #2882: [CARBONDATA-3060]Improve the command for cli ...

2018-11-04 Thread akashrn5
Github user akashrn5 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2882#discussion_r230637603
  
--- Diff: 
tools/cli/src/main/java/org/apache/carbondata/tool/DataSummary.java ---
@@ -311,26 +309,39 @@ private void printColumnStats(String columnName) 
throws IOException, MemoryExcep
 max = new String(blockletMax, 
Charset.forName(DEFAULT_CHARSET));
   }
 } else {
-  minPercent = String.format("%.1f", 
blocklet.getColumnChunk().getMinPercentage() * 100);
-  maxPercent = String.format("%.1f", 
blocklet.getColumnChunk().getMaxPercentage() * 100);
+  // for complex columns min and max and percentage
+  if 
(blocklet.getColumnChunk().column.getColumnName().contains(".val") ||
+  
blocklet.getColumnChunk().column.getColumnName().contains(".")) {
+minPercent = "NA";
+maxPercent = "NA";
+  } else {
+minPercent =
+String.format("%.1f", 
Math.abs(blocklet.getColumnChunk().getMinPercentage() * 100));
+maxPercent =
+String.format("%.1f", 
Math.abs(blocklet.getColumnChunk().getMaxPercentage() * 100));
+  }
   DataFile.ColumnChunk columnChunk = blocklet.columnChunk;
-  if (columnChunk.column.isDimensionColumn() && DataTypeUtil
+  // need to consider no  dictionary complex column
+  if (columnChunk.column.hasEncoding(Encoding.DICTIONARY) || 
blocklet
+  .getColumnChunk().column.getColumnName().contains(".val") || 
blocklet
--- End diff --

we can have a method which tells whether a column is a complex column based 
on its name; we already have a method which identifies complex columns by 
datatype. And there is no need to check dictionary include and complex type 
together, because for child columns we would again need another method, as we 
can't give child columns in dictionary include

---


[GitHub] carbondata pull request #2882: [CARBONDATA-3060]Improve the command for cli ...

2018-11-04 Thread akashrn5
Github user akashrn5 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2882#discussion_r230637639
  
--- Diff: 
tools/cli/src/main/java/org/apache/carbondata/tool/DataSummary.java ---
@@ -311,26 +309,39 @@ private void printColumnStats(String columnName) 
throws IOException, MemoryExcep
 max = new String(blockletMax, 
Charset.forName(DEFAULT_CHARSET));
   }
 } else {
-  minPercent = String.format("%.1f", 
blocklet.getColumnChunk().getMinPercentage() * 100);
-  maxPercent = String.format("%.1f", 
blocklet.getColumnChunk().getMaxPercentage() * 100);
+  // for complex columns min and max and percentage
+  if 
(blocklet.getColumnChunk().column.getColumnName().contains(".val") ||
+  
blocklet.getColumnChunk().column.getColumnName().contains(".")) {
+minPercent = "NA";
+maxPercent = "NA";
+  } else {
+minPercent =
+String.format("%.1f", 
Math.abs(blocklet.getColumnChunk().getMinPercentage() * 100));
+maxPercent =
+String.format("%.1f", 
Math.abs(blocklet.getColumnChunk().getMaxPercentage() * 100));
+  }
   DataFile.ColumnChunk columnChunk = blocklet.columnChunk;
-  if (columnChunk.column.isDimensionColumn() && DataTypeUtil
+  // need to consider no  dictionary complex column
+  if (columnChunk.column.hasEncoding(Encoding.DICTIONARY) || 
blocklet
+  .getColumnChunk().column.getColumnName().contains(".val") || 
blocklet
--- End diff --

handled, please review


---


[GitHub] carbondata pull request #2882: [CARBONDATA-3060]Improve the command for cli ...

2018-11-04 Thread akashrn5
Github user akashrn5 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2882#discussion_r230637383
  
--- Diff: 
examples/spark2/src/main/java/org/apache/carbondata/examples/sdk/CarbonReaderExample.java
 ---
@@ -61,7 +61,8 @@ public static void main(String[] args) {
 CarbonWriter writer = CarbonWriter.builder()
 .outputPath(path)
 .withLoadOptions(map)
-.withCsvInput(new Schema(fields)).build();
+.withCsvInput(new Schema(fields))
+.writtenBy("CarbonReaderExample").build();
--- End diff --

done. Actually it was like that initially, and I'm using carbon formatting 
only, but I do not know why it is formatting like this; need to check the 
formatting xml once


---


[GitHub] carbondata pull request #2882: [CARBONDATA-3060]Improve the command for cli ...

2018-11-02 Thread akashrn5
Github user akashrn5 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2882#discussion_r230340585
  
--- Diff: tools/cli/src/main/java/org/apache/carbondata/tool/DataFile.java 
---
@@ -453,7 +455,16 @@ private double computePercentage(byte[] data, byte[] 
min, byte[] max, ColumnSche
 return dataValue.divide(factorValue).doubleValue();
   }
   double dataValue, minValue, factorValue;
-  if (column.getDataType() == DataTypes.SHORT) {
+  if (columnChunk.column.isDimensionColumn() && DataTypeUtil
--- End diff --

done


---


[GitHub] carbondata pull request #2882: [CARBONDATA-3060]Improve the command for cli ...

2018-11-02 Thread akashrn5
Github user akashrn5 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2882#discussion_r230319095
  
--- Diff: 
examples/spark2/src/main/java/org/apache/carbondata/examples/sdk/CarbonReaderExample.java
 ---
@@ -61,7 +61,7 @@ public static void main(String[] args) {
 CarbonWriter writer = CarbonWriter.builder()
 .outputPath(path)
 .withLoadOptions(map)
-.withCsvInput(new Schema(fields)).build();
+.withCsvInput(new 
Schema(fields)).writtenBy("CarbonReaderExample").build();
--- End diff --

done


---


[GitHub] carbondata pull request #2882: [CARBONDATA-3060]Improve the command for cli ...

2018-11-02 Thread akashrn5
Github user akashrn5 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2882#discussion_r230319129
  
--- Diff: 
tools/cli/src/main/java/org/apache/carbondata/tool/DataSummary.java ---
@@ -314,23 +312,26 @@ private void printColumnStats(String columnName) 
throws IOException, MemoryExcep
   minPercent = String.format("%.1f", 
blocklet.getColumnChunk().getMinPercentage() * 100);
   maxPercent = String.format("%.1f", 
blocklet.getColumnChunk().getMaxPercentage() * 100);
   DataFile.ColumnChunk columnChunk = blocklet.columnChunk;
-  if (columnChunk.column.isDimensionColumn() && DataTypeUtil
+  if (columnChunk.column.hasEncoding(Encoding.DICTIONARY) || 
blocklet
+  .getColumnChunk().column.getColumnName().contains(".val") || 
blocklet
--- End diff --

done


---


[GitHub] carbondata pull request #2886: [CARBONDATA-3065]make inverted index false by...

2018-11-02 Thread akashrn5
Github user akashrn5 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2886#discussion_r230309345
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
 ---
@@ -359,8 +359,13 @@ private CarbonCommonConstants() {
   public static final String TABLE_BLOCKSIZE = "table_blocksize";
   // table blocklet size in MB
   public static final String TABLE_BLOCKLET_SIZE = "table_blocklet_size";
-  // set in column level to disable inverted index
+  /**
+   * set in column level to disable inverted index
+   * @Deprecated :This property is deprecated, it is kep just for 
compatibility
--- End diff --

done


---


[GitHub] carbondata issue #2886: [CARBONDATA-3065]make inverted index false by defaut

2018-11-02 Thread akashrn5
Github user akashrn5 commented on the issue:

https://github.com/apache/carbondata/pull/2886
  
@kevinjmh basically, with my changes the column-level sorting will be 
skipped and only row-level sorting will be done in the sort step.


---


[GitHub] carbondata issue #2886: [CARBONDATA-3065]make inverted index false by defaut

2018-11-02 Thread akashrn5
Github user akashrn5 commented on the issue:

https://github.com/apache/carbondata/pull/2886
  
> @akashrn5 Please expose these properties from SDK and fileformat as well.

@ravipesala handled for SDK and fileformat, please review


---


[GitHub] carbondata issue #2886: [CARBONDATA-3065]make inverted index false by defaut

2018-11-01 Thread akashrn5
Github user akashrn5 commented on the issue:

https://github.com/apache/carbondata/pull/2886
  
@kevinjmh 
1. when you set a column in INVERTED_INDEX but not in SORT_COLUMNS, the data 
will not be sorted; only RLE will be applied on the data.
2. `isInvertedIndex` alone cannot say whether RLE is done; RLE will be applied 
in both the inverted and non-inverted cases, but after checking 
`isInvertedIndex` it decides whether to sort based on the isSort check (see 
the sketch below).
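
A hedged DDL sketch of the two cases (the table name is hypothetical; the 
property names are the ones used elsewhere in this thread):

```scala
// 'name' is in both SORT_COLUMNS and INVERTED_INDEX: sorted and inverted-indexed.
// 'city' is only in INVERTED_INDEX: per the explanation above, its data is not
// sorted and only RLE is applied.
sql(
  """
    CREATE TABLE ii_sort_demo (id INT, name STRING, city STRING)
    STORED BY 'org.apache.carbondata.format'
    TBLPROPERTIES ('SORT_COLUMNS'='name', 'INVERTED_INDEX'='name,city')
  """)
```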



---


[GitHub] carbondata pull request #2888: [CARBONDATA-3066]add documentation for writte...

2018-11-01 Thread akashrn5
Github user akashrn5 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2888#discussion_r230272135
  
--- Diff: docs/sdk-guide.md ---
@@ -429,6 +429,15 @@ public CarbonWriterBuilder 
withAvroInput(org.apache.avro.Schema avroSchema);
 public CarbonWriterBuilder withJsonInput(Schema carbonSchema);
 ```
 
+```
+/**
+* To support writing the ApplicationName which is writing the carbondata 
file
+* @param application name which is writing the carbondata files
+* @return CarbonWriterBuilder
--- End diff --

this is mandatory because the SDK can be used by different applications: one 
application can write and another can read, so we store this info. In the load 
case we take it from the spark application name; the SDK needs to provide it 
explicitly. A rough sketch follows.
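
A rough Scala sketch tying the writer and reader sides together (paths, the 
schema and the application name are placeholders; writtenBy() and 
getVersionDetails() are the APIs discussed in this thread):

```scala
import org.apache.carbondata.core.metadata.datatype.DataTypes
import org.apache.carbondata.sdk.file.{CarbonSchemaReader, CarbonWriter, Field, Schema}

val fields = Array(new Field("name", DataTypes.STRING), new Field("age", DataTypes.INT))
val writer = CarbonWriter.builder()
  .outputPath("/tmp/sdk_out")               // placeholder output path
  .withCsvInput(new Schema(fields))
  .writtenBy("MyIngestApp")                 // identifies the writing application
  .build()
writer.write(Array("bob", "25"))
writer.close()

// Later, any reader application can ask who wrote a file and with which version:
val versionDetails = CarbonSchemaReader.getVersionDetails("/tmp/sdk_out/<file>.carbondata")
println(versionDetails)
```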


---


[GitHub] carbondata pull request #2888: [CARBONDATA-3066]add documentation for writte...

2018-11-01 Thread akashrn5
Github user akashrn5 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2888#discussion_r230270106
  
--- Diff: docs/sdk-guide.md ---
@@ -124,7 +124,7 @@ public class TestSdkAvro {
 try {
   CarbonWriter writer = CarbonWriter.builder()
   .outputPath(path)
-  .withAvroInput(new 
org.apache.avro.Schema.Parser().parse(avroSchema)).build();
+  .withAvroInput(new 
org.apache.avro.Schema.Parser().parse(avroSchema))..writtenBy("SDK").build();
--- End diff --

done


---


[GitHub] carbondata pull request #2888: [CARBONDATA-3066]add documentation for writte...

2018-11-01 Thread akashrn5
Github user akashrn5 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2888#discussion_r230270097
  
--- Diff: docs/sdk-guide.md ---
@@ -686,6 +695,16 @@ Find example code at 
[CarbonReaderExample](https://github.com/apache/carbondata/
   public static Schema readSchemaInIndexFile(String indexFilePath);
 ```
 
+```
+  /**
+   * This method return the version details in formatted string by reading 
from carbondata file
+   * @param dataFilePath complete path including carbondata file name
+   * @return string with information of who has written this file in which 
carbondata project version
--- End diff --

done


---


[GitHub] carbondata pull request #2888: [CARBONDATA-3066]add documentation for writte...

2018-10-31 Thread akashrn5
GitHub user akashrn5 opened a pull request:

https://github.com/apache/carbondata/pull/2888

[CARBONDATA-3066]add documentation for writtenBy and getVersionDetails APIs 
in SDK

This PR adds the documentation for new APIs added in SDK 
builder API-   writtenBy()
reader API-   getVersionDetails()

Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [ ] Any interfaces changed?
 
 - [ ] Any backward compatibility impacted?
 
 - [ ] Document update required?

 - [ ] Testing done
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance 
test report.
- Any additional information to help reviewers in testing this 
change.
   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/akashrn5/incubator-carbondata sdk_doc

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2888.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2888


commit cf6b6ebe4b8f05695d8beea79ec677a6ce500044
Author: akashrn5 
Date:   2018-10-31T14:43:02Z

add documentation for writtenBy and getVersionDetails APIs in SDK




---


[GitHub] carbondata pull request #2886: [WIP]make inverted index false by defaut

2018-10-31 Thread akashrn5
GitHub user akashrn5 opened a pull request:

https://github.com/apache/carbondata/pull/2886

[WIP]make inverted index false by defaut

Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [ ] Any interfaces changed?
 
 - [ ] Any backward compatibility impacted?
 
 - [ ] Document update required?

 - [ ] Testing done
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance 
test report.
- Any additional information to help reviewers in testing this 
change.
   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/akashrn5/incubator-carbondata 
disable_inverted_index

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2886.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2886


commit c8b434294ee52ccd1a5445ce694f99f738908b0c
Author: akashrn5 
Date:   2018-10-31T08:43:48Z

make inverted index false by defaut




---


[GitHub] carbondata pull request #2882: [CARBONDATA-3060]Improve the command for cli ...

2018-10-30 Thread akashrn5
Github user akashrn5 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2882#discussion_r229300837
  
--- Diff: 
tools/cli/src/main/java/org/apache/carbondata/tool/DataSummary.java ---
@@ -314,23 +312,26 @@ private void printColumnStats(String columnName) 
throws IOException, MemoryExcep
   minPercent = String.format("%.1f", 
blocklet.getColumnChunk().getMinPercentage() * 100);
   maxPercent = String.format("%.1f", 
blocklet.getColumnChunk().getMaxPercentage() * 100);
   DataFile.ColumnChunk columnChunk = blocklet.columnChunk;
-  if (columnChunk.column.isDimensionColumn() && DataTypeUtil
+  if (columnChunk.column.hasEncoding(Encoding.DICTIONARY) || 
blocklet
+  .getColumnChunk().column.getColumnName().contains(".val") || 
blocklet
--- End diff --

this will be for a no-dictionary complex column; for complex columns min and 
max can be shown as NA, that should be ok, right?


---


[GitHub] carbondata pull request #2882: [CARBONDATA-3060]Improve the command for cli ...

2018-10-30 Thread akashrn5
Github user akashrn5 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2882#discussion_r229297954
  
--- Diff: tools/cli/src/main/java/org/apache/carbondata/tool/DataFile.java 
---
@@ -443,7 +444,8 @@ void computePercentage(byte[] shardMin, byte[] 
shardMax) {
  * @return result
  */
 private double computePercentage(byte[] data, byte[] min, byte[] max, 
ColumnSchema column) {
-  if (column.getDataType() == DataTypes.STRING) {
+  if (column.getDataType() == DataTypes.STRING || column.getDataType() 
== DataTypes.BOOLEAN
+  || column.hasEncoding(Encoding.DICTIONARY)) {
--- End diff --

yes, but min and max will be surrogate keys, right? Showing min and max as 
dictionary values is not useful.


---


[GitHub] carbondata pull request #2861: [CARBONDATA-3025]handle passing spark appname...

2018-10-30 Thread akashrn5
Github user akashrn5 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2861#discussion_r229294323
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/execution/datasources/SparkCarbonTableFormat.scala
 ---
@@ -175,6 +172,11 @@ with Serializable {
   dataSchema: StructType,
   context: TaskAttemptContext): OutputWriter = {
 val model = 
CarbonTableOutputFormat.getLoadModel(context.getConfiguration)
+val appName = 
context.getConfiguration.get(CarbonCommonConstants.CARBON_WRITTEN_BY_APPNAME)
+if (null != appName) {
--- End diff --

actually the appname will always be set; spark will always set the appname. 
This check was added for one of the test cases, I will remove it


---


[GitHub] carbondata pull request #2882: [CARBONDATA-3060]Improve the command for cli ...

2018-10-30 Thread akashrn5
GitHub user akashrn5 opened a pull request:

https://github.com/apache/carbondata/pull/2882

[CARBONDATA-3060]Improve the command for cli and fixed other issues

### Changes proposed in this PR:

**1. improve the syntax for CLI DDL; the command can now be given as** 
`CarbonCli for table <table_name> options('-cmd summary/benchmark -a -s -v -c -m')`

the options take one string, which is basically a command that the user 
can directly paste into a command prompt and run as a java command.
Now the user need not give -P either; internally, when the above command is 
run, we take the table path into consideration in the command-line arguments. 
An illustrative invocation follows.
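
An illustrative invocation (the table name and option string are examples 
only; the exact flags supported are those summarized above):

```scala
// Example only: run the CLI summary for a table from beeline / spark-sql.
sql("CarbonCli for table my_table options('-cmd summary -a -v')")
```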

**2. other issues are fixed in this PR are**
1. when numeric columns are included in dictionary
2. timestamp column's min and max details
3. help command was not working in beeline
4. for all the complex columns, like parent and child columns, min and max 
will be given as "NA"




Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [ ] Any interfaces changed?
 
 - [ ] Any backward compatibility impacted?
 
 - [ ] Document update required?

 - [ ] Testing done
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance 
test report.
- Any additional information to help reviewers in testing this 
change.
   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/akashrn5/incubator-carbondata improve

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2882.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2882


commit ca8061e1d6e0652c786c1231ad82efe17e504ee4
Author: akashrn5 
Date:   2018-10-29T12:44:24Z

Improve the command for cli and fixed other issues




---


[GitHub] carbondata pull request #2861: [CARBONDATA-3025]handle passing spark appname...

2018-10-30 Thread akashrn5
Github user akashrn5 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2861#discussion_r229235332
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/execution/datasources/SparkCarbonTableFormat.scala
 ---
@@ -175,6 +174,12 @@ with Serializable {
   dataSchema: StructType,
   context: TaskAttemptContext): OutputWriter = {
 val model = 
CarbonTableOutputFormat.getLoadModel(context.getConfiguration)
+val appName = 
context.getConfiguration.get(CarbonCommonConstants.CARBON_WRITTEN_BY_APPNAME)
+if (null != appName) {
+  CarbonProperties.getInstance()
+.addProperty(CarbonCommonConstants.CARBON_WRITTEN_BY_APPNAME,
+  appName)
--- End diff --

done


---


[GitHub] carbondata issue #2861: [HOTFIX]handle passing spark appname for partition t...

2018-10-26 Thread akashrn5
Github user akashrn5 commented on the issue:

https://github.com/apache/carbondata/pull/2861
  
retest this please


---


[GitHub] carbondata pull request #2861: [HOTFIX]handle passing spark appname for part...

2018-10-26 Thread akashrn5
GitHub user akashrn5 opened a pull request:

https://github.com/apache/carbondata/pull/2861

[HOTFIX]handle passing spark appname for partition table and file format

Data load with partition table file format fails, as the appname is not in 
carbon properties on the executor.
This PR sets the spark appname in carbon properties, which will be written 
to the carbondata footer. A rough sketch of the change follows.
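
A rough sketch of the change, assuming it mirrors the diff fragments quoted 
later in this thread (the constant and property APIs are taken from those 
fragments; the helper wrapper itself is illustrative):

```scala
import org.apache.hadoop.conf.Configuration

import org.apache.carbondata.core.constants.CarbonCommonConstants
import org.apache.carbondata.core.util.CarbonProperties

// On the executor, copy the Spark app name from the task's Hadoop configuration
// into CarbonProperties, so the footer writer (which reads CarbonProperties)
// can embed it in the carbondata file footer.
def propagateAppName(conf: Configuration): Unit = {
  val appName = conf.get(CarbonCommonConstants.CARBON_WRITTEN_BY_APPNAME)
  if (appName != null) {
    CarbonProperties.getInstance()
      .addProperty(CarbonCommonConstants.CARBON_WRITTEN_BY_APPNAME, appName)
  }
}
```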

Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [ ] Any interfaces changed?
 
 - [ ] Any backward compatibility impacted?
 
 - [ ] Document update required?

 - [ ] Testing done
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance 
test report.
- Any additional information to help reviewers in testing this 
change.
   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/akashrn5/incubator-carbondata written

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2861.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2861


commit 92a97277a96453e321a120eec2e1dc0b6d327c17
Author: akashrn5 
Date:   2018-10-26T14:04:04Z

handle passing spark appname for partition table and file format




---


[GitHub] carbondata issue #2829: [CARBONDATA-3025]add more metadata in carbon file fo...

2018-10-25 Thread akashrn5
Github user akashrn5 commented on the issue:

https://github.com/apache/carbondata/pull/2829
  
> @akashrn5 Instead of changing many classes to pass writtenBy and appName 
can't we set to CarbonProperties and in writer step we can get from the same 
and write to thrift??

handled 


---


[GitHub] carbondata pull request #2829: [CARBONDATA-3025]add more metadata in carbon ...

2018-10-25 Thread akashrn5
Github user akashrn5 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2829#discussion_r228124274
  
--- Diff: 
store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java 
---
@@ -371,8 +381,14 @@ public CarbonWriter build() throws IOException, 
InvalidLoadOptionException {
   "Writer type is not set, use withCsvInput() or withAvroInput() 
or withJsonInput()  "
   + "API based on input");
 }
+if (this.writtenByApp == null) {
--- End diff --

added


---


[GitHub] carbondata pull request #2829: [CARBONDATA-3025]add more metadata in carbon ...

2018-10-25 Thread akashrn5
Github user akashrn5 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2829#discussion_r228124241
  
--- Diff: format/src/main/thrift/carbondata.thrift ---
@@ -206,6 +206,7 @@ struct FileFooter3{
4: optional list<BlockletInfo3> blocklet_info_list3;   // Information about blocklets of all columns in this file for V3 format
 5: optional dictionary.ColumnDictionaryChunk dictionary; // Blocklet 
local dictionary
 6: optional bool is_sort; // True if the data is sorted in this file, 
it is used for compaction to decide whether to use merge sort or not
+7: optional map<string, string> extra_info; // written by is used to write who wrote the file, it can be Aplication name, or SDK etc and version in which this carbondata file is written etc
--- End diff --

modified


---


[GitHub] carbondata pull request #2829: [CARBONDATA-3025]add more metadata in carbon ...

2018-10-25 Thread akashrn5
Github user akashrn5 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2829#discussion_r228124157
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/loading/CarbonDataLoadConfiguration.java
 ---
@@ -460,4 +464,20 @@ public String getColumnCompressor() {
   public void setColumnCompressor(String columnCompressor) {
 this.columnCompressor = columnCompressor;
   }
+
+  public String getAppName() {
--- End diff --

handled as carbon property


---


[GitHub] carbondata pull request #2829: [CARBONDATA-3025]add more metadata in carbon ...

2018-10-25 Thread akashrn5
Github user akashrn5 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2829#discussion_r228123974
  
--- Diff: 
integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestCarbonFileInputFormatWithExternalCarbonTable.scala
 ---
@@ -56,7 +56,7 @@ class TestCarbonFileInputFormatWithExternalCarbonTable 
extends QueryTest with Be
   val builder = CarbonWriter.builder()
   val writer =
 builder.outputPath(writerPath + "/Fact/Part0/Segment_null")
-  .withCsvInput(Schema.parseJson(schema)).build()
+  .withCsvInput(Schema.parseJson(schema)).writtenBy("SDK").build()
--- End diff --

added class name for writtenBy


---


[GitHub] carbondata pull request #2829: [CARBONDATA-3025]add more metadata in carbon ...

2018-10-25 Thread akashrn5
Github user akashrn5 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2829#discussion_r228124063
  
--- Diff: 
integration/spark-datasource/src/test/scala/org/apache/spark/sql/carbondata/datasource/SparkCarbonDataSourceTest.scala
 ---
@@ -984,7 +984,7 @@ class SparkCarbonDataSourceTest extends FunSuite with 
BeforeAndAfterAll {
   val writer =
 builder.outputPath(path)
   .uniqueIdentifier(System.nanoTime()).withBlockSize(2)
-  .withCsvInput(new Schema(structType)).build()
+  .withCsvInput(new 
Schema(structType)).writtenBy("DataSource").build()
--- End diff --

ok


---


[GitHub] carbondata pull request #2829: [CARBONDATA-3025]add more metadata in carbon ...

2018-10-25 Thread akashrn5
Github user akashrn5 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2829#discussion_r228124017
  
--- Diff: 
integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestNonTransactionalCarbonTable.scala
 ---
@@ -139,13 +139,13 @@ class TestNonTransactionalCarbonTable extends 
QueryTest with BeforeAndAfterAll {
 .sortBy(sortColumns.toArray)
 .uniqueIdentifier(
   
System.currentTimeMillis).withBlockSize(2).withLoadOptions(options)
-.withCsvInput(Schema.parseJson(schema)).build()
+
.withCsvInput(Schema.parseJson(schema)).writtenBy("SDK").build()
--- End diff --

changed


---


[GitHub] carbondata pull request #2830: [CARBONDATA-3025]Added CLI enhancements

2018-10-25 Thread akashrn5
Github user akashrn5 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2830#discussion_r228123785
  
--- Diff: 
tools/cli/src/main/java/org/apache/carbondata/tool/DataSummary.java ---
@@ -226,25 +282,46 @@ private int getColumnIndex(String columnName) {
   private boolean collected = false;
 
   private void printColumnStats(String columnName) throws IOException, 
MemoryException {
-out.println();
-out.println("## Column Statistics for '" + columnName + "'");
+outPuts.add("");
+outPuts.add("## Column Statistics for '" + columnName + "'");
 collectStats(columnName);
 
 int columnIndex = getColumnIndex(columnName);
 String[] header = new String[]{"BLK", "BLKLT", "Meta Size", "Data 
Size",
-"LocalDict", "DictEntries", "DictSize", "AvgPageSize", "Min%", 
"Max%"};
+"LocalDict", "DictEntries", "DictSize", "AvgPageSize", "Min%", 
"Max%", "Min", "Max"};
 
-ShardPrinter printer = new ShardPrinter(header);
+ShardPrinter printer = new ShardPrinter(header, outPuts);
for (Map.Entry<String, DataFile> entry : dataFiles.entrySet()) {
   DataFile file = entry.getValue();
   for (DataFile.Blocklet blocklet : file.getAllBlocklets()) {
-String min, max;
+String min, max, minPercent, maxPercent;
--- End diff --

handled


---


[GitHub] carbondata pull request #2829: [CARBONDATA-3025]add more metadata in carbon ...

2018-10-25 Thread akashrn5
Github user akashrn5 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2829#discussion_r228123905
  
--- Diff: format/src/main/thrift/carbondata.thrift ---
@@ -206,6 +206,8 @@ struct FileFooter3{
4: optional list<BlockletInfo3> blocklet_info_list3;   // Information about blocklets of all columns in this file for V3 format
 5: optional dictionary.ColumnDictionaryChunk dictionary; // Blocklet 
local dictionary
 6: optional bool is_sort; // True if the data is sorted in this file, 
it is used for compaction to decide whether to use merge sort or not
+7: optional string written_by; // written by is used to write who 
wrote the file, it can be LOAD, or SDK etc
--- End diff --

added a map


---


[GitHub] carbondata pull request #2830: [CARBONDATA-3025]Added CLI enhancements

2018-10-25 Thread akashrn5
Github user akashrn5 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2830#discussion_r228123653
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonShowSummaryCommand.scala
 ---
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.command.management
+
+import java.util
+
+import scala.collection.JavaConverters._
+
+import org.apache.spark.sql.{CarbonEnv, Row, SparkSession}
+import org.apache.spark.sql.catalyst.expressions.{Attribute, 
AttributeReference}
+import org.apache.spark.sql.execution.command.{Checker, DataCommand}
+import org.apache.spark.sql.types.StringType
+
+import 
org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException
+import org.apache.carbondata.tool.CarbonCli
+
+case class CarbonShowSummaryCommand(
--- End diff --

done


---


[GitHub] carbondata pull request #2830: [CARBONDATA-3025]Added CLI enhancements

2018-10-25 Thread akashrn5
Github user akashrn5 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2830#discussion_r228123741
  
--- Diff: tools/cli/src/main/java/org/apache/carbondata/tool/CarbonCli.java 
---
@@ -40,6 +41,10 @@
 @InterfaceStability.Unstable
 public class CarbonCli {
 
private static ArrayList<String> outPuts;
+
+  private static boolean isPrintInConsole = true;
--- End diff --

added


---


[GitHub] carbondata pull request #2830: [CARBONDATA-3025]Added CLI enhancements

2018-10-25 Thread akashrn5
Github user akashrn5 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2830#discussion_r228123616
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/parser/CarbonSpark2SqlParser.scala
 ---
@@ -493,6 +493,23 @@ class CarbonSpark2SqlParser extends CarbonDDLSqlParser 
{
   showHistory.isDefined)
 }
 
+
+  protected lazy val cli: Parser[LogicalPlan] =
+(SHOW ~> SUMMARY ~> FOR ~> TABLE) ~ (ident <~ ".").? ~ ident ~
--- End diff --

changed


---


[GitHub] carbondata pull request #2853: [CARBONDATA_3025]make cli compilable with jav...

2018-10-25 Thread akashrn5
GitHub user akashrn5 opened a pull request:

https://github.com/apache/carbondata/pull/2853

[CARBONDATA_3025]make cli compilable with java 1.7

This PR makes the cli module compilable with java 1.7

Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [ ] Any interfaces changed?
 
 - [ ] Any backward compatibility impacted?
 
 - [ ] Document update required?

 - [ ] Testing done
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance 
test report.
- Any additional information to help reviewers in testing this 
change.
   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/akashrn5/incubator-carbondata cli1.7

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2853.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2853


commit d0b4c86b6c6928030d63539796201d4b6f284ce6
Author: akashrn5 
Date:   2018-10-24T12:05:00Z

make cli compilable with java 1.7




---


[GitHub] carbondata pull request #2829: [CARBONDATA-3025]add more metadata in carbon ...

2018-10-24 Thread akashrn5
Github user akashrn5 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2829#discussion_r228034556
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
 ---
@@ -1845,6 +1845,21 @@
   public static final int CARBON_MINMAX_ALLOWED_BYTE_COUNT_MIN = 10;
   public static final int CARBON_MINMAX_ALLOWED_BYTE_COUNT_MAX = 1000;
 
+  /**
+   * Written by detail to be written in carbondata footer for better 
maintanability
+   */
+  public static final String CARBON_WRITTEN_BY_FOOTER_INFO = "written_by";
+
+  /**
+   * carbon version detail to be written in carbondata footer for better 
maintanability
+   */
+  public static final String CARBON_VERSION_FOOTER_INFO = "version";
--- End diff --

yes, even this suits better


---


[GitHub] carbondata issue #2829: [CARBONDATA-3025]add more metadata in carbon file fo...

2018-10-22 Thread akashrn5
Github user akashrn5 commented on the issue:

https://github.com/apache/carbondata/pull/2829
  
retest this please


---


[GitHub] carbondata issue #2829: [CARBONDATA-3025]add more metadata in carbon file fo...

2018-10-22 Thread akashrn5
Github user akashrn5 commented on the issue:

https://github.com/apache/carbondata/pull/2829
  
retest this please


---


[GitHub] carbondata pull request #2829: [CARBONDATA-3025]add more metadata in carbon ...

2018-10-19 Thread akashrn5
Github user akashrn5 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2829#discussion_r226722751
  
--- Diff: format/src/main/thrift/carbondata.thrift ---
@@ -206,6 +206,7 @@ struct FileFooter3{
4: optional list<BlockletInfo3> blocklet_info_list3;   // Information about blocklets of all columns in this file for V3 format
 5: optional dictionary.ColumnDictionaryChunk dictionary; // Blocklet 
local dictionary
 6: optional bool is_sort; // True if the data is sorted in this file, 
it is used for compaction to decide whether to use merge sort or not
+7: optional map<string, string> extra_info; // written by is used to write who wrote the file, it can be Aplication name, or SDK etc and version in which this carbondata file is written etc
--- End diff --

for all the extra info, I created a map; I didn't get much of the suggestion. 
Currently it is a map<string, string>, and this suits adding extra metadata. 
About changing the test cases: since those are API-level changes, we need to 
change those test cases.
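
For illustration, a sketch of what the map-based field carries (the key names 
mirror the CARBON_WRITTEN_BY_FOOTER_INFO / CARBON_VERSION_FOOTER_INFO constants 
quoted in this thread; the values are placeholders):

```scala
import java.util

// Build the extra_info map written into FileFooter3: who wrote the file and
// with which carbondata version, with room for more keys later.
val extraInfo = new util.HashMap[String, String]()
extraInfo.put("written_by", "SDK")  // or the Spark application name
extraInfo.put("version", "1.5.x")   // placeholder version string
```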


---


[GitHub] carbondata pull request #2829: [CARBONDATA-3025]add more metadata in carbon ...

2018-10-18 Thread akashrn5
Github user akashrn5 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2829#discussion_r226233891
  
--- Diff: 
store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonSchemaReader.java 
---
@@ -76,6 +80,17 @@ public static Schema readSchemaInDataFile(String 
dataFilePath) throws IOExceptio
 return new Schema(columnSchemaList);
   }
 
+  public static String getVersionDetails(String dataFilePath) throws 
IOException {
--- End diff --

yes, this info is from the footer. In the SDK the API we expose is the schema 
reader, right? I thought maybe I can expose one more API there to get the 
version details; will this be ok? Or does it need to go in a new class for 
footer info? please suggest


---


[GitHub] carbondata pull request #2829: [CARBONDATA-3025]add more metadata in carbon ...

2018-10-18 Thread akashrn5
Github user akashrn5 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2829#discussion_r226229213
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/loading/CarbonDataLoadConfiguration.java
 ---
@@ -460,4 +464,20 @@ public String getColumnCompressor() {
   public void setColumnCompressor(String columnCompressor) {
 this.columnCompressor = columnCompressor;
   }
+
+  public String getAppName() {
+return appName;
+  }
+
+  public void setAppName(String appName) {
+this.appName = appName;
+  }
+
+  public String getVersion() {
+return version;
+  }
+
+  public void setVersion(String version) {
--- End diff --

it is carbon version


---


[GitHub] carbondata issue #2830: [CARBONDATA-3025]Added CLI enhancements

2018-10-18 Thread akashrn5
Github user akashrn5 commented on the issue:

https://github.com/apache/carbondata/pull/2830
  
> @akashrn5 , I think it's better to support showing a specified segment 
info too, if users want to get the details of one segment but there are many 
blocks in this segment, it's difficult for user to use.. maybe can add this 
feature later.

yes, segment level can be added later


---


[GitHub] carbondata issue #2830: [CARBONDATA-3025]Added CLI enhancements

2018-10-18 Thread akashrn5
Github user akashrn5 commented on the issue:

https://github.com/apache/carbondata/pull/2830
  
@zzcclp it shows all segment info; segment-level filtering is not there, but 
we can send a specific block path to get only that block's info


---


[GitHub] carbondata pull request #2828: [CARBONDATA-3025]Added DDL support for cli an...

2018-10-17 Thread akashrn5
Github user akashrn5 closed the pull request at:

https://github.com/apache/carbondata/pull/2828


---


[GitHub] carbondata pull request #2830: [CARBONDATA-3025]Added CLI enhancements

2018-10-17 Thread akashrn5
GitHub user akashrn5 opened a pull request:

https://github.com/apache/carbondata/pull/2830

[CARBONDATA-3025]Added CLI enhancements

Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [ ] Any interfaces changed?
 
 - [ ] Any backward compatibility impacted?
 
 - [ ] Document update required?

 - [ ] Testing done
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance 
test report.
- Any additional information to help reviewers in testing this 
change.
   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/akashrn5/incubator-carbondata integrate2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2830.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2830


commit 9b398144f83bf5378b4378ba1ddb47f50bf7a6da
Author: akashrn5 
Date:   2018-10-17T14:26:49Z

add more metadata in carbon file footer

commit e31d757ce8757ac276233c62a8e9d1947553fc4a
Author: akashrn5 
Date:   2018-10-17T14:28:59Z

Added CLI enhancements for better maintainability




---


[GitHub] carbondata pull request #2829: [CARBONDATA-3025]add more metadata in carbon ...

2018-10-17 Thread akashrn5
GitHub user akashrn5 opened a pull request:

https://github.com/apache/carbondata/pull/2829

[CARBONDATA-3025]add more metadata in carbon file footer

Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [ ] Any interfaces changed?
 
 - [ ] Any backward compatibility impacted?
 
 - [ ] Document update required?

 - [ ] Testing done
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance 
test report.
- Any additional information to help reviewers in testing this 
change.
   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/akashrn5/incubator-carbondata integrate1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2829.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2829


commit 9b398144f83bf5378b4378ba1ddb47f50bf7a6da
Author: akashrn5 
Date:   2018-10-17T14:26:49Z

add more metadata in carbon file footer




---


[GitHub] carbondata pull request #2828: Added DDL support for cli and added more info...

2018-10-17 Thread akashrn5
GitHub user akashrn5 opened a pull request:

https://github.com/apache/carbondata/pull/2828

Added DDL support for cli and added more info in carbon file footer

### **Changes Proposed in this PR:**

1. Add more info to the carbon file footer, like `written_by` (which will be 
the spark application_name) in case of insert into and load commands. To read 
this info one can use the CLI.
   For SDK, one API will be exposed to write this info in the footer and one 
API will be exposed to read it from the SDK.
   
2. The footer will have information about the `version` of carbon in which the 
file was written, which will be helpful for getting details, for compatibility 
etc.

Enhancement in CLI tool
3.  a new option called "`-v`" is added to get the written_by and version 
details, in addition to the existing options
4.  SQL support is added for the CLI
This is introduced so that the user can get the details in beeline 
itself, with no need to execute the tool separately. Currently the command is 
as below; based on comments we can change this DDL
`Show summary for table <table_name> 
options('command'='-cmd,summary,-a,-p,-b');`

5.  when we get details for column statistics, we will get the Min and Max 
percentage; if we add the actual min and max values, then it will be helpful 
for the developer:
```
BLK  BLKLT  Meta Size  Data Size  LocalDict  DictEntries  DictSize  AvgPageSize  Min%  Max%   Min  Max
0    0      2.90KB     4.87MB     false      0            0.0B      93.76KB      0.0   100.0  0    290
0    1      2.90KB     2.29MB     false      0            0.0B      93.76KB      0.0   100.0  1    292
1    0      2.90KB     4.87MB     false      0            0.0B      93.76KB      0.0   100.0  3    293
1    1      2.90KB     2.29MB     false      0            0.0B      93.76KB      0.0   100.0  4    295
2    0      2.90KB     5.52MB     false      0            0.0B      93.76KB      0.0   100.0  6    297
2    1      2.90KB     2.94MB     false      0            0.0B      93.76KB      0.0   100.0  8    298
2    2      830.0B     586.81KB   false      0            0.0B      83.71KB      0.0   100.0  9    299

```  
6.  Currently the CLI tool gets blocklet details for all the block files, so 
if we have a large number of carbondata files, then it takes a lot of time to 
get the details,
so a limit is added: by default, when we give the option "-b", only 4 
outputs will be given; if we want more, we can pass an option value for b as 
the limit number. 
Example:   "`-b 30`"   =>>  Then the limit will be increased to 30

7. One more new option is added, called "`-B`", which takes a mandatory 
option value as a block path. This is added just to get block details like the 
number of blocklets, number of pages, rows and size

Example:  "`-B /home/ss/ss/part-0-0_batchno0-0-0-1539782855178.carbondata`"
```
## Filtered Block Details for: 
part-0-0_batchno0-0-0-1539782855178.carbondata
BLKT  NumPages  NumRows  Size
0     4         20       10.0B
```

Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [x] Any interfaces changed?
 NA
 - [x] Any backward compatibility impacted?
Need to handle
 - [x] Document update required?
Yes, will be raised in separate PR
 - [x] Testing done
UTs are added
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance 
test report.
- Any additional information to help reviewers in testing this 
change.
   
 - [x] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 
NA



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/akashrn5/incubator-carbondata integrate

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2828.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2828


commit b747030671155067bf729e032ebdc88b87033534
Author: akashrn5 
Date:   2018-10-10T13:15:31Z

added DDL support for cli and add more info in carbon file footer




---


[GitHub] carbondata issue #2791: [WIP][HOTFIX]correct the exception handling in looku...

2018-10-04 Thread akashrn5
Github user akashrn5 commented on the issue:

https://github.com/apache/carbondata/pull/2791
  
i am marking it as WIP; I will check and whitelist all the other exceptions 
which can be thrown from spark


---


[GitHub] carbondata pull request #2791: [HOTFIX]correct the exception handling in loo...

2018-10-03 Thread akashrn5
Github user akashrn5 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2791#discussion_r222537606
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonFileMetastore.scala
 ---
@@ -23,6 +23,7 @@ import java.net.URI
 import scala.collection.mutable.ArrayBuffer
 
 import org.apache.hadoop.fs.permission.{FsAction, FsPermission}
+import org.apache.hadoop.hive.ql.metadata.HiveException
--- End diff --

scalastyle fails if I add the space, I have checked it



---


[GitHub] carbondata pull request #2791: [HOTFIX]correct the exception handling in loo...

2018-10-03 Thread akashrn5
Github user akashrn5 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2791#discussion_r222537366
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonFileMetastore.scala
 ---
@@ -208,7 +209,10 @@ class CarbonFileMetastore extends CarbonMetaStore {
 try {
   lookupRelation(tableIdentifier)(sparkSession)
 } catch {
-  case _: Exception =>
+  case ex: Exception =>
+if (ex.getCause.isInstanceOf[HiveException]) {
+  throw ex
+}
--- End diff --

here we are returning false directly after catching the exception, telling 
that the table does not exist; this is wrong. I have checked the flow: we 
might also get a HiveException (regarding permission) if the user is not 
allowed to access the table, and at that time too we return false and will 
not get the proper error. A sketch of the idea follows.
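
A sketch of the idea, adapted from the quoted diff (lookupRelation, 
tableIdentifier and sparkSession come from the surrounding CarbonFileMetastore 
code; which other exceptions to whitelist was still an open question at this 
point):

```scala
import org.apache.hadoop.hive.ql.metadata.HiveException

// Rethrow when the cause is a HiveException (e.g. a permission problem) instead
// of swallowing it and wrongly reporting that the table does not exist.
try {
  lookupRelation(tableIdentifier)(sparkSession)
  true
} catch {
  case ex: Exception if ex.getCause.isInstanceOf[HiveException] => throw ex
  case _: Exception => false
}
```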


---

