Re: describe formatted table command does not show block size

2016-10-20 Thread Harmeet Singh
Hey 杰,

Yes, I am checking the PR, so I suppose there is no need to raise the bug on
JIRA. But the PR is still not merged; is there a specific reason?



--
View this message in context: 
http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/describe-formatted-table-command-does-not-show-block-size-tp2098p2138.html
Sent from the Apache CarbonData Mailing List archive at Nabble.com.


Re: [jira] [Created] (CARBONDATA-332) Create successfully Database, tables and columns using carbon reserve keywords

2016-10-20 Thread Jay
Hi Harmeet,

I will raise a PR to solve this.


thanks
Jay




-- Original Message --
From: "Harmeet Singh (JIRA)"
Date: 2016-10-20 (Thu) 6:36
To: "dev"

Subject: [jira] [Created] (CARBONDATA-332) Create successfully Database, tables and
columns using carbon reserve keywords



Harmeet Singh created CARBONDATA-332:


 Summary: Create successfully Database, tables and columns using 
carbon reserve keywords
 Key: CARBONDATA-332
 URL: https://issues.apache.org/jira/browse/CARBONDATA-332
 Project: CarbonData
  Issue Type: Bug
Reporter: Harmeet Singh


Hey team, I am trying to create a database, tables, and columns named with Carbon
reserved keywords, and Carbon allows them to be created. I am expecting an
error; in Hive, the same statements fail with an error. Following are the steps:

Step1: 
0: jdbc:hive2://127.0.0.1:1> create database double;
+-+--+
| result  |
+-+--+
+-+--+
No rows selected (6.225 seconds)

Step 2: 
0: jdbc:hive2://127.0.0.1:1> use double;
+-+--+
| result  |
+-+--+
+-+--+
No rows selected (0.104 seconds)

Step 3:
0: jdbc:hive2://127.0.0.1:1> create table decimal(int int, string string) 
stored by 'carbondata';
+-+--+
| Result  |
+-+--+
+-+--+
No rows selected (2.372 seconds)

Step 4:
0: jdbc:hive2://127.0.0.1:1> show tables;
+------------+--------------+
| tableName  | isTemporary  |
+------------+--------------+
| decimal    | false        |
+------------+--------------+
1 row selected (0.071 seconds)

Step 5:
0: jdbc:hive2://127.0.0.1:1> desc decimal;
+-----------+------------+----------+
| col_name  | data_type  | comment  |
+-----------+------------+----------+
| string    | string     |          |
| int       | bigint     |          |
+-----------+------------+----------+
2 rows selected (0.556 seconds)

Step 6:
0: jdbc:hive2://127.0.0.1:1> load data inpath 
'hdfs://localhost:54310/home/harmeet/reservewords.csv' into table decimal;
+-+--+
| Result  |
+-+--+
+-+--+
No rows selected (0.863 seconds)

Step 7:
0: jdbc:hive2://127.0.0.1:1> select * from decimal;
+---------+------+
| string  | int  |
+---------+------+
| james   | 10   |
+---------+------+
1 row selected (0.413 seconds)




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Re: describe formatted table command does not show block size

2016-10-20 Thread ??
Hi Harmeet,

There is already PR #230 for this problem. Please check.


thanks




-- Original Message --
From: "Harmeet Singh"
Date: Thu, Oct 20, 2016 09:11 PM
To: "dev"

Subject:  Re: describe formatted table command does not show block size



Hey Ravi, can we raise the issue in JIRA?



--
View this message in context: 
http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/describe-formatted-table-command-does-not-show-block-size-tp2098p2104.html
Sent from the Apache CarbonData Mailing List archive at Nabble.com.

[GitHub] incubator-carbondata pull request #248: [CARBONDATA-328] Improve Code and Fi...

2016-10-20 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/248#discussion_r84404263
  
--- Diff: 
integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala
 ---
@@ -1114,36 +1121,36 @@ object CarbonDataRDDFactory extends Logging {
 
   if (isUpdationRequired) {
 try {
-// Update load metadate file after cleaning deleted nodes
-if (carbonTableStatusLock.lockWithRetries()) {
-  logger.info("Table status lock has been successfully acquired.")
+  // Update load metadate file after cleaning deleted nodes
+  if (carbonTableStatusLock.lockWithRetries()) {
+logger.info("Table status lock has been successfully 
acquired.")
 
-  // read latest table status again.
-  val latestMetadata = segmentStatusManager
-.readLoadMetadata(loadMetadataFilePath)
+// read latest table status again.
+val latestMetadata = segmentStatusManager
+  .readLoadMetadata(loadMetadataFilePath)
 
-  // update the metadata details from old to new status.
+// update the metadata details from old to new status.
 
-  val latestStatus = CarbonLoaderUtil
-.updateLoadMetadataFromOldToNew(details, latestMetadata)
+val latestStatus = CarbonLoaderUtil
--- End diff --

In this case, move `CarbonLoaderUtil` to the next line; do not break the line
between the object and the function.
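
For illustration, a minimal self-contained Scala sketch of the requested wrapping (the object here is a stand-in with a hypothetical signature, not the real Carbon class):

```
object WrapDemo extends App {
  object CarbonLoaderUtil {   // stand-in for the real utility class
    def updateLoadMetadataFromOldToNew(details: String, latest: String): String = latest
  }
  // break before the whole expression, keeping object and method together:
  val latestStatus =
    CarbonLoaderUtil.updateLoadMetadataFromOldToNew("old", "new")
  println(latestStatus)
}
```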


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #248: [CARBONDATA-328] Improve Code and Fi...

2016-10-20 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/248#discussion_r84404340
  
--- Diff: 
integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonGlobalDictionaryRDD.scala
 ---
@@ -385,22 +385,22 @@ class CarbonGlobalDictionaryGenerateRDD(
   distinctValues)
 sortIndexWriteTask.execute()
   }
-  val sortIndexWriteTime = (System.currentTimeMillis() - t4)
-  
CarbonTimeStatisticsFactory.getLoadStatisticsInstance().recordDicShuffleAndWriteTime()
+  val sortIndexWriteTime = System.currentTimeMillis() - t4
+  
CarbonTimeStatisticsFactory.getLoadStatisticsInstance.recordDicShuffleAndWriteTime()
   // After sortIndex writing, update dictionaryMeta
   dictWriteTask.updateMetaData()
   // clear the value buffer after writing dictionary data
   valuesBuffer.clear
   org.apache.carbondata.core.util.CarbonUtil
--- End diff --

remove package name
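
For illustration, a self-contained sketch of the suggested change; `MyUtil` is a stand-in for `org.apache.carbondata.core.util.CarbonUtil`, and the method is hypothetical since the actual call is cut off in the diff above:

```
// stand-in namespace and utility object
object deep { object nested { object MyUtil { def cleanup(): Unit = println("cleaning") } } }

import deep.nested.MyUtil

object ImportDemo extends App {
  MyUtil.cleanup()   // short name after one import, instead of deep.nested.MyUtil.cleanup()
}
```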


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #248: [CARBONDATA-328] Improve Code and Fi...

2016-10-20 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/248#discussion_r84404900
  
--- Diff: 
integration/spark/src/main/scala/org/apache/carbondata/spark/util/GlobalDictionaryUtil.scala
 ---
@@ -267,10 +267,10 @@ object GlobalDictionaryUtil extends Logging {
   }
 
   def isHighCardinalityColumn(columnCardinality: Int,
-  rowCount: Long,
-  model: DictionaryLoadModel): Boolean = {
+  rowCount: Long,
+  model: DictionaryLoadModel): Boolean = {
 (columnCardinality > model.highCardThreshold) && (rowCount > 0) &&
--- End diff --

move `(rowCount > 0)` to next line so that every condition is one line
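
For illustration, a sketch of the requested layout with one condition per line; `highCardThreshold` stands in for `model.highCardThreshold`, and any conditions after the trailing `&&` in the diff are truncated there, so they are omitted here:

```
object ConditionDemo extends App {
  def isHighCardinalityColumn(columnCardinality: Int,
      rowCount: Long,
      highCardThreshold: Int): Boolean = {
    (columnCardinality > highCardThreshold) &&
    (rowCount > 0)
  }
  println(isHighCardinalityColumn(100000, 200000L, 10000))
}
```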


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #248: [CARBONDATA-328] Improve Code and Fi...

2016-10-20 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/248#discussion_r84404173
  
--- Diff: 
integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala
 ---
@@ -1043,32 +1050,32 @@ object CarbonDataRDDFactory extends Logging {
 CarbonLoaderUtil.deleteSegment(carbonLoadModel, currentLoadCount)
 logInfo("clean up done**")
 logger.audit(s"Data load is failed for " +
-  
s"${carbonLoadModel.getDatabaseName}.${carbonLoadModel.getTableName}")
+ s"${ carbonLoadModel.getDatabaseName }.${ 
carbonLoadModel.getTableName }")
 logWarning("Cannot write load metadata file as data load failed")
 throw new Exception(errorMessage)
   } else {
-  val metadataDetails = status(0)._2
-  if (!isAgg) {
-val status = CarbonLoaderUtil
-  .recordLoadMetadata(currentLoadCount,
-metadataDetails,
-carbonLoadModel,
-loadStatus,
-loadStartTime
-  )
-if (!status) {
-  val errorMessage = "Dataload failed due to failure in table 
status updation."
-  logger.audit("Data load is failed for " +
-   
s"${carbonLoadModel.getDatabaseName}.${carbonLoadModel.getTableName}")
-  logger.error("Dataload failed due to failure in table status 
updation.")
-  throw new Exception(errorMessage)
-}
-  } else if (!carbonLoadModel.isRetentionRequest) {
-// TODO : Handle it
-logInfo("Database updated**")
+val metadataDetails = status(0)._2
+if (!isAgg) {
+  val status = CarbonLoaderUtil
+.recordLoadMetadata(currentLoadCount,
--- End diff --

move to previous line
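
For illustration, a sketch of the requested join, with a stand-in object and a hypothetical signature: the receiver and the method call stay on one line instead of breaking after the receiver.

```
object LoaderDemo extends App {
  object LoaderUtil {   // stand-in for CarbonLoaderUtil
    def recordLoadMetadata(loadCount: Int, status: String): Boolean = true
  }
  val ok = LoaderUtil.recordLoadMetadata(0, "SUCCESS")   // joined, not wrapped after the receiver
  println(ok)
}
```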


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #248: [CARBONDATA-328] Improve Code and Fi...

2016-10-20 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/248#discussion_r84404204
  
--- Diff: 
integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala
 ---
@@ -1114,36 +1121,36 @@ object CarbonDataRDDFactory extends Logging {
 
   if (isUpdationRequired) {
 try {
-// Update load metadate file after cleaning deleted nodes
-if (carbonTableStatusLock.lockWithRetries()) {
-  logger.info("Table status lock has been successfully acquired.")
+  // Update load metadate file after cleaning deleted nodes
+  if (carbonTableStatusLock.lockWithRetries()) {
+logger.info("Table status lock has been successfully 
acquired.")
 
-  // read latest table status again.
-  val latestMetadata = segmentStatusManager
-.readLoadMetadata(loadMetadataFilePath)
+// read latest table status again.
+val latestMetadata = segmentStatusManager
+  .readLoadMetadata(loadMetadataFilePath)
--- End diff --

move to previous line


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #248: [CARBONDATA-328] Improve Code and Fi...

2016-10-20 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/248#discussion_r84404111
  
--- Diff: 
integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala
 ---
@@ -505,129 +512,129 @@ object CarbonDataRDDFactory extends Logging {
   )
 }
 
-  val compactionThread = new Thread {
-override def run(): Unit = {
+val compactionThread = new Thread {
+  override def run(): Unit = {
 
+try {
+  // compaction status of the table which is triggered by the user.
+  var triggeredCompactionStatus = false
+  var exception: Exception = null
   try {
-// compaction status of the table which is triggered by the 
user.
-var triggeredCompactionStatus = false
-var exception : Exception = null
-try {
-  executeCompaction(carbonLoadModel: CarbonLoadModel,
-hdfsStoreLocation: String,
-compactionModel: CompactionModel,
-partitioner: Partitioner,
-executor, sqlContext, kettleHomePath, storeLocation
+executeCompaction(carbonLoadModel: CarbonLoadModel,
+  hdfsStoreLocation: String,
+  compactionModel: CompactionModel,
+  partitioner: Partitioner,
+  executor, sqlContext, kettleHomePath, storeLocation
+)
+triggeredCompactionStatus = true
+  }
+  catch {
+case e: Exception =>
+  logger.error("Exception in compaction thread " + 
e.getMessage)
+  exception = e
+  }
+  // continue in case of exception also, check for all the tables.
+  val isConcurrentCompactionAllowed = 
CarbonProperties.getInstance()
+
.getProperty(CarbonCommonConstants.ENABLE_CONCURRENT_COMPACTION,
+  CarbonCommonConstants.DEFAULT_ENABLE_CONCURRENT_COMPACTION
+).equalsIgnoreCase("true")
+
+  if (!isConcurrentCompactionAllowed) {
+logger.info("System level compaction lock is enabled.")
+val skipCompactionTables = ListBuffer[CarbonTableIdentifier]()
+var tableForCompaction = CarbonCompactionUtil
+  
.getNextTableToCompact(CarbonEnv.getInstance(sqlContext).carbonCatalog.metadata
+.tablesMeta.toArray, skipCompactionTables.toList.asJava
   )
-  triggeredCompactionStatus = true
-}
-catch {
-  case e: Exception =>
-logger.error("Exception in compaction thread " + 
e.getMessage)
-exception = e
-}
-// continue in case of exception also, check for all the 
tables.
-val isConcurrentCompactionAllowed = 
CarbonProperties.getInstance()
-  
.getProperty(CarbonCommonConstants.ENABLE_CONCURRENT_COMPACTION,
-CarbonCommonConstants.DEFAULT_ENABLE_CONCURRENT_COMPACTION
-  ).equalsIgnoreCase("true")
-
-if (!isConcurrentCompactionAllowed) {
-  logger.info("System level compaction lock is enabled.")
-  val skipCompactionTables = 
ListBuffer[CarbonTableIdentifier]()
-  var tableForCompaction = CarbonCompactionUtil
-
.getNextTableToCompact(CarbonEnv.getInstance(sqlContext).carbonCatalog.metadata
-  .tablesMeta.toArray, skipCompactionTables.toList.asJava
+while (null != tableForCompaction) {
+  logger
+.info("Compaction request has been identified for table " 
+ tableForCompaction
+  .carbonTable.getDatabaseName + "." + 
tableForCompaction.carbonTableIdentifier
+.getTableName
 )
-  while (null != tableForCompaction) {
-logger
-  .info("Compaction request has been identified for table 
" + tableForCompaction
-.carbonTable.getDatabaseName + "." + 
tableForCompaction.carbonTableIdentifier
-  .getTableName
-  )
-val table: CarbonTable = tableForCompaction.carbonTable
-val metadataPath = table.getMetaDataFilepath
-val compactionType = 
CarbonCompactionUtil.determineCompactionType(metadataPath)
-
-val newCarbonLoadModel = new CarbonLoadModel()
-prepareCarbonLoadModel(hdfsStoreLocation, table, 
newCarbonLoadModel)
-val tableCreationTime = 

[GitHub] incubator-carbondata pull request #248: [CARBONDATA-328] Improve Code and Fi...

2016-10-20 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/248#discussion_r84405181
  
--- Diff: 
integration/spark/src/main/scala/org/apache/carbondata/spark/util/GlobalDictionaryUtil.scala
 ---
@@ -839,10 +841,10 @@ object GlobalDictionaryUtil extends Logging {
   headers = headers.map(headerName => headerName.trim)
   // prune columns according to the CSV file header, dimension 
columns
   val (requireDimension, requireColumnNames) =
-pruneDimensions(dimensions, headers, headers)
+  pruneDimensions(dimensions, headers, headers)
--- End diff --

incorrect indentation


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #248: [CARBONDATA-328] Improve Code and Fi...

2016-10-20 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/248#discussion_r84404806
  
--- Diff: 
integration/spark/src/main/scala/org/apache/carbondata/spark/util/CommonUtil.scala
 ---
@@ -49,7 +49,7 @@ object CommonUtil {
   "Column group doesn't support Timestamp datatype:" + x)
   }
   // if invalid column is present
-  else if (dims.filter { dim => dim.column.equalsIgnoreCase(x) 
}.isEmpty) {
+  else if (!dims.exists { dim => dim.column.equalsIgnoreCase(x) }) {
--- End diff --

Use `!dims.exists(_.column.equalsIgnoreCase(x))` instead of the `{ ... }` block
form. Please modify all places.
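
For illustration, a self-contained sketch of the suggested form, with a stand-in `Dim` type:

```
object ExistsDemo extends App {
  case class Dim(column: String)   // stand-in for the Carbon dimension type
  val dims = Seq(Dim("name"), Dim("age"))
  val x = "AGE"
  // instead of: !dims.exists { dim => dim.column.equalsIgnoreCase(x) }
  val invalid = !dims.exists(_.column.equalsIgnoreCase(x))
  println(invalid)   // false, because "age" matches ignoring case
}
```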


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #248: [CARBONDATA-328] Improve Code and Fi...

2016-10-20 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/248#discussion_r84404075
  
--- Diff: 
integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala
 ---
@@ -505,129 +512,129 @@ object CarbonDataRDDFactory extends Logging {
   )
 }
 
-  val compactionThread = new Thread {
-override def run(): Unit = {
+val compactionThread = new Thread {
+  override def run(): Unit = {
 
+try {
+  // compaction status of the table which is triggered by the user.
+  var triggeredCompactionStatus = false
+  var exception: Exception = null
   try {
-// compaction status of the table which is triggered by the 
user.
-var triggeredCompactionStatus = false
-var exception : Exception = null
-try {
-  executeCompaction(carbonLoadModel: CarbonLoadModel,
-hdfsStoreLocation: String,
-compactionModel: CompactionModel,
-partitioner: Partitioner,
-executor, sqlContext, kettleHomePath, storeLocation
+executeCompaction(carbonLoadModel: CarbonLoadModel,
+  hdfsStoreLocation: String,
+  compactionModel: CompactionModel,
+  partitioner: Partitioner,
+  executor, sqlContext, kettleHomePath, storeLocation
+)
+triggeredCompactionStatus = true
+  }
+  catch {
+case e: Exception =>
+  logger.error("Exception in compaction thread " + 
e.getMessage)
+  exception = e
+  }
+  // continue in case of exception also, check for all the tables.
+  val isConcurrentCompactionAllowed = 
CarbonProperties.getInstance()
+
.getProperty(CarbonCommonConstants.ENABLE_CONCURRENT_COMPACTION,
+  CarbonCommonConstants.DEFAULT_ENABLE_CONCURRENT_COMPACTION
+).equalsIgnoreCase("true")
+
+  if (!isConcurrentCompactionAllowed) {
+logger.info("System level compaction lock is enabled.")
+val skipCompactionTables = ListBuffer[CarbonTableIdentifier]()
+var tableForCompaction = CarbonCompactionUtil
+  
.getNextTableToCompact(CarbonEnv.getInstance(sqlContext).carbonCatalog.metadata
+.tablesMeta.toArray, skipCompactionTables.toList.asJava
   )
-  triggeredCompactionStatus = true
-}
-catch {
-  case e: Exception =>
-logger.error("Exception in compaction thread " + 
e.getMessage)
-exception = e
-}
-// continue in case of exception also, check for all the 
tables.
-val isConcurrentCompactionAllowed = 
CarbonProperties.getInstance()
-  
.getProperty(CarbonCommonConstants.ENABLE_CONCURRENT_COMPACTION,
-CarbonCommonConstants.DEFAULT_ENABLE_CONCURRENT_COMPACTION
-  ).equalsIgnoreCase("true")
-
-if (!isConcurrentCompactionAllowed) {
-  logger.info("System level compaction lock is enabled.")
-  val skipCompactionTables = 
ListBuffer[CarbonTableIdentifier]()
-  var tableForCompaction = CarbonCompactionUtil
-
.getNextTableToCompact(CarbonEnv.getInstance(sqlContext).carbonCatalog.metadata
-  .tablesMeta.toArray, skipCompactionTables.toList.asJava
+while (null != tableForCompaction) {
+  logger
+.info("Compaction request has been identified for table " 
+ tableForCompaction
--- End diff --

Please use a proper format for the log message, and modify all places.
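
The request is terse; one reading of "proper format" is to build the message with the s-interpolator instead of chained `+`. A hypothetical sketch:

```
object LogDemo extends App {
  def compactionRequestMessage(databaseName: String, tableName: String): String =
    s"Compaction request has been identified for table $databaseName.$tableName"
  println(compactionRequestMessage("default", "test_table3"))
}
```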


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #248: [CARBONDATA-328] Improve Code and Fi...

2016-10-20 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/248#discussion_r84401743
  
--- Diff: 
integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala
 ---
@@ -505,129 +512,129 @@ object CarbonDataRDDFactory extends Logging {
   )
 }
 
-  val compactionThread = new Thread {
-override def run(): Unit = {
+val compactionThread = new Thread {
+  override def run(): Unit = {
 
+try {
+  // compaction status of the table which is triggered by the user.
+  var triggeredCompactionStatus = false
+  var exception: Exception = null
   try {
-// compaction status of the table which is triggered by the 
user.
-var triggeredCompactionStatus = false
-var exception : Exception = null
-try {
-  executeCompaction(carbonLoadModel: CarbonLoadModel,
-hdfsStoreLocation: String,
-compactionModel: CompactionModel,
-partitioner: Partitioner,
-executor, sqlContext, kettleHomePath, storeLocation
+executeCompaction(carbonLoadModel: CarbonLoadModel,
+  hdfsStoreLocation: String,
+  compactionModel: CompactionModel,
+  partitioner: Partitioner,
+  executor, sqlContext, kettleHomePath, storeLocation
+)
+triggeredCompactionStatus = true
+  }
+  catch {
+case e: Exception =>
+  logger.error("Exception in compaction thread " + 
e.getMessage)
+  exception = e
+  }
+  // continue in case of exception also, check for all the tables.
+  val isConcurrentCompactionAllowed = 
CarbonProperties.getInstance()
+
.getProperty(CarbonCommonConstants.ENABLE_CONCURRENT_COMPACTION,
+  CarbonCommonConstants.DEFAULT_ENABLE_CONCURRENT_COMPACTION
+).equalsIgnoreCase("true")
+
+  if (!isConcurrentCompactionAllowed) {
+logger.info("System level compaction lock is enabled.")
+val skipCompactionTables = ListBuffer[CarbonTableIdentifier]()
+var tableForCompaction = CarbonCompactionUtil
+  
.getNextTableToCompact(CarbonEnv.getInstance(sqlContext).carbonCatalog.metadata
+.tablesMeta.toArray, skipCompactionTables.toList.asJava
   )
-  triggeredCompactionStatus = true
-}
-catch {
-  case e: Exception =>
-logger.error("Exception in compaction thread " + 
e.getMessage)
-exception = e
-}
-// continue in case of exception also, check for all the 
tables.
-val isConcurrentCompactionAllowed = 
CarbonProperties.getInstance()
-  
.getProperty(CarbonCommonConstants.ENABLE_CONCURRENT_COMPACTION,
-CarbonCommonConstants.DEFAULT_ENABLE_CONCURRENT_COMPACTION
-  ).equalsIgnoreCase("true")
-
-if (!isConcurrentCompactionAllowed) {
-  logger.info("System level compaction lock is enabled.")
-  val skipCompactionTables = 
ListBuffer[CarbonTableIdentifier]()
-  var tableForCompaction = CarbonCompactionUtil
-
.getNextTableToCompact(CarbonEnv.getInstance(sqlContext).carbonCatalog.metadata
-  .tablesMeta.toArray, skipCompactionTables.toList.asJava
+while (null != tableForCompaction) {
+  logger
+.info("Compaction request has been identified for table " 
+ tableForCompaction
+  .carbonTable.getDatabaseName + "." + 
tableForCompaction.carbonTableIdentifier
+.getTableName
 )
-  while (null != tableForCompaction) {
-logger
-  .info("Compaction request has been identified for table 
" + tableForCompaction
-.carbonTable.getDatabaseName + "." + 
tableForCompaction.carbonTableIdentifier
-  .getTableName
-  )
-val table: CarbonTable = tableForCompaction.carbonTable
-val metadataPath = table.getMetaDataFilepath
-val compactionType = 
CarbonCompactionUtil.determineCompactionType(metadataPath)
-
-val newCarbonLoadModel = new CarbonLoadModel()
-prepareCarbonLoadModel(hdfsStoreLocation, table, 
newCarbonLoadModel)
-val tableCreationTime = 

[GitHub] incubator-carbondata pull request #248: [CARBONDATA-328] Improve Code and Fi...

2016-10-20 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/248#discussion_r84401600
  
--- Diff: 
integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala
 ---
@@ -505,129 +512,129 @@ object CarbonDataRDDFactory extends Logging {
   )
 }
 
-  val compactionThread = new Thread {
-override def run(): Unit = {
+val compactionThread = new Thread {
+  override def run(): Unit = {
 
--- End diff --

remove empty line


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #248: [CARBONDATA-328] Improve Code and Fi...

2016-10-20 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/248#discussion_r84401697
  
--- Diff: 
integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala
 ---
@@ -505,129 +512,129 @@ object CarbonDataRDDFactory extends Logging {
   )
 }
 
-  val compactionThread = new Thread {
-override def run(): Unit = {
+val compactionThread = new Thread {
+  override def run(): Unit = {
 
+try {
+  // compaction status of the table which is triggered by the user.
+  var triggeredCompactionStatus = false
+  var exception: Exception = null
   try {
-// compaction status of the table which is triggered by the 
user.
-var triggeredCompactionStatus = false
-var exception : Exception = null
-try {
-  executeCompaction(carbonLoadModel: CarbonLoadModel,
-hdfsStoreLocation: String,
-compactionModel: CompactionModel,
-partitioner: Partitioner,
-executor, sqlContext, kettleHomePath, storeLocation
+executeCompaction(carbonLoadModel: CarbonLoadModel,
+  hdfsStoreLocation: String,
+  compactionModel: CompactionModel,
+  partitioner: Partitioner,
+  executor, sqlContext, kettleHomePath, storeLocation
+)
+triggeredCompactionStatus = true
+  }
+  catch {
+case e: Exception =>
+  logger.error("Exception in compaction thread " + 
e.getMessage)
+  exception = e
+  }
+  // continue in case of exception also, check for all the tables.
+  val isConcurrentCompactionAllowed = 
CarbonProperties.getInstance()
+
.getProperty(CarbonCommonConstants.ENABLE_CONCURRENT_COMPACTION,
+  CarbonCommonConstants.DEFAULT_ENABLE_CONCURRENT_COMPACTION
+).equalsIgnoreCase("true")
+
+  if (!isConcurrentCompactionAllowed) {
+logger.info("System level compaction lock is enabled.")
+val skipCompactionTables = ListBuffer[CarbonTableIdentifier]()
+var tableForCompaction = CarbonCompactionUtil
+  
.getNextTableToCompact(CarbonEnv.getInstance(sqlContext).carbonCatalog.metadata
+.tablesMeta.toArray, skipCompactionTables.toList.asJava
   )
-  triggeredCompactionStatus = true
-}
-catch {
-  case e: Exception =>
-logger.error("Exception in compaction thread " + 
e.getMessage)
-exception = e
-}
-// continue in case of exception also, check for all the 
tables.
-val isConcurrentCompactionAllowed = 
CarbonProperties.getInstance()
-  
.getProperty(CarbonCommonConstants.ENABLE_CONCURRENT_COMPACTION,
-CarbonCommonConstants.DEFAULT_ENABLE_CONCURRENT_COMPACTION
-  ).equalsIgnoreCase("true")
-
-if (!isConcurrentCompactionAllowed) {
-  logger.info("System level compaction lock is enabled.")
-  val skipCompactionTables = 
ListBuffer[CarbonTableIdentifier]()
-  var tableForCompaction = CarbonCompactionUtil
-
.getNextTableToCompact(CarbonEnv.getInstance(sqlContext).carbonCatalog.metadata
-  .tablesMeta.toArray, skipCompactionTables.toList.asJava
+while (null != tableForCompaction) {
+  logger
+.info("Compaction request has been identified for table " 
+ tableForCompaction
+  .carbonTable.getDatabaseName + "." + 
tableForCompaction.carbonTableIdentifier
+.getTableName
 )
-  while (null != tableForCompaction) {
-logger
-  .info("Compaction request has been identified for table 
" + tableForCompaction
-.carbonTable.getDatabaseName + "." + 
tableForCompaction.carbonTableIdentifier
-  .getTableName
-  )
-val table: CarbonTable = tableForCompaction.carbonTable
-val metadataPath = table.getMetaDataFilepath
-val compactionType = 
CarbonCompactionUtil.determineCompactionType(metadataPath)
-
-val newCarbonLoadModel = new CarbonLoadModel()
-prepareCarbonLoadModel(hdfsStoreLocation, table, 
newCarbonLoadModel)
-val tableCreationTime = 

[GitHub] incubator-carbondata pull request #248: [CARBONDATA-328] Improve Code and Fi...

2016-10-20 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/248#discussion_r84400885
  
--- Diff: 
integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataLoadRDD.scala
 ---
@@ -439,42 +436,42 @@ class DataFileLoaderRDD[K, V](
   }
 
   override def getPreferredLocations(split: Partition): Seq[String] = {
-isTableSplitPartition match {
-  case true =>
-// for table split partition
-val theSplit = split.asInstanceOf[CarbonTableSplitPartition]
-val location = 
theSplit.serializableHadoopSplit.value.getLocations.asScala
-location
-  case false =>
-// for node partition
-val theSplit = split.asInstanceOf[CarbonNodePartition]
-val firstOptionLocation: Seq[String] = 
List(theSplit.serializableHadoopSplit)
-logInfo("Preferred Location for split : " + firstOptionLocation(0))
-val blockMap = new util.LinkedHashMap[String, Integer]()
-val tableBlocks = theSplit.blocksDetails
-tableBlocks.foreach(tableBlock => tableBlock.getLocations.foreach(
-  location => {
-if (!firstOptionLocation.exists(location.equalsIgnoreCase(_))) 
{
-  val currentCount = blockMap.get(location)
-  if (currentCount == null) {
-blockMap.put(location, 1)
-  } else {
-blockMap.put(location, currentCount + 1)
-  }
+if (isTableSplitPartition) {
+  // for table split partition
+  val theSplit = split.asInstanceOf[CarbonTableSplitPartition]
+  val location = 
theSplit.serializableHadoopSplit.value.getLocations.asScala
+  location
+} else {
+  // for node partition
+  val theSplit = split.asInstanceOf[CarbonNodePartition]
+  val firstOptionLocation: Seq[String] = 
List(theSplit.serializableHadoopSplit)
+  logInfo("Preferred Location for split : " + firstOptionLocation.head)
+  val blockMap = new util.LinkedHashMap[String, Integer]()
+  val tableBlocks = theSplit.blocksDetails
+  tableBlocks.foreach(tableBlock => tableBlock.getLocations.foreach(
+location => {
+  if (!firstOptionLocation.exists(location.equalsIgnoreCase)) {
+val currentCount = blockMap.get(location)
+if (currentCount == null) {
+  blockMap.put(location, 1)
+} else {
+  blockMap.put(location, currentCount + 1)
 }
   }
-)
-)
-
-val sortedList = 
blockMap.entrySet().asScala.toSeq.sortWith((nodeCount1, nodeCount2) => {
-  nodeCount1.getValue > nodeCount2.getValue
 }
-)
+  )
+  )
--- End diff --

indentation is not correct


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #248: [CARBONDATA-328] Improve Code and Fi...

2016-10-20 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/248#discussion_r84400407
  
--- Diff: 
integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataLoadRDD.scala
 ---
@@ -196,30 +196,29 @@ class DataFileLoaderRDD[K, V](
   sc.setLocalProperty("spark.scheduler.pool", "DDL")
 
   override def getPartitions: Array[Partition] = {
-isTableSplitPartition match {
-  case true =>
-// for table split partition
-var splits = Array[TableSplit]()
-if (carbonLoadModel.isDirectLoad) {
-  splits = 
CarbonQueryUtil.getTableSplitsForDirectLoad(carbonLoadModel.getFactFilePath,
-partitioner.nodeList, partitioner.partitionCount)
-}
-else {
-  splits = 
CarbonQueryUtil.getTableSplits(carbonLoadModel.getDatabaseName,
-carbonLoadModel.getTableName, null, partitioner)
-}
+if (isTableSplitPartition) {
+  // for table split partition
+  var splits = Array[TableSplit]()
+  if (carbonLoadModel.isDirectLoad) {
+splits = 
CarbonQueryUtil.getTableSplitsForDirectLoad(carbonLoadModel.getFactFilePath,
+  partitioner.nodeList, partitioner.partitionCount)
+  }
+  else {
--- End diff --

put to previous line, please modify all places
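
For illustration, a sketch of the requested brace style, with stand-in bodies for the two split-computation calls: `else` sits on the same line as the closing brace.

```
object ElseDemo extends App {
  def splitsFor(isDirectLoad: Boolean): String =
    if (isDirectLoad) {
      "splits for direct load"       // stand-in for getTableSplitsForDirectLoad(...)
    } else {
      "splits from table metadata"   // stand-in for getTableSplits(...)
    }
  println(splitsFor(true))
}
```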


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #248: [CARBONDATA-328] Improve Code and Fi...

2016-10-20 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/248#discussion_r84400353
  
--- Diff: 
integration/spark/src/main/scala/org/apache/carbondata/spark/CarbonFilters.scala
 ---
@@ -259,29 +259,29 @@ object CarbonFilters {
 Some(new NotEqualsExpression(transformExpression(a).get, 
transformExpression(l).get))
 
 case Not(In(a: Attribute, list))
- if !list.exists(!_.isInstanceOf[Literal]) =>
- if (list.exists(x => (isNullLiteral(x.asInstanceOf[Literal] {
-  Some(new FalseExpression(transformExpression(a).get))
- }
-else {
-  Some(new NotInExpression(transformExpression(a).get,
+  if !list.exists(!_.isInstanceOf[Literal]) =>
+  if (list.exists(x => isNullLiteral(x.asInstanceOf[Literal]))) {
+Some(new FalseExpression(transformExpression(a).get))
+  }
+  else {
--- End diff --

put to previous line


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #248: [CARBONDATA-328] Improve Code and Fi...

2016-10-20 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/248#discussion_r84401103
  
--- Diff: 
integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala
 ---
@@ -175,8 +177,12 @@ object CarbonDataRDDFactory extends Logging {
   }
 
   def configSplitMaxSize(context: SparkContext, filePaths: String,
-hadoopConfiguration: Configuration): Unit = {
-val defaultParallelism = if (context.defaultParallelism < 1) 1 else 
context.defaultParallelism
+  hadoopConfiguration: Configuration): Unit = {
--- End diff --

In this case, writing it in one line is OK; no need to modify. A better approach
is to use the max function instead of if-else.
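
For illustration, a sketch of the one-liner: `if (x < 1) 1 else x` clamps from below, which is `math.max`.

```
object ParallelismDemo extends App {
  def defaultParallelism(contextParallelism: Int): Int =
    math.max(1, contextParallelism)   // same behavior as: if (x < 1) 1 else x
  println(defaultParallelism(0))   // 1
  println(defaultParallelism(8))   // 8
}
```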


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #248: [CARBONDATA-328] Improve Code and Fi...

2016-10-20 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/248#discussion_r84401276
  
--- Diff: 
integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala
 ---
@@ -189,15 +195,15 @@ object CarbonDataRDDFactory extends Logging {
   }
   hadoopConfiguration.set(FileInputFormat.SPLIT_MAXSIZE, 
newSplitSize.toString)
   logInfo("totalInputSpaceConsumed : " + spaceConsumed +
-" , defaultParallelism : " + defaultParallelism)
+  " , defaultParallelism : " + defaultParallelism)
--- End diff --

change to `s"total... $spaceConsumed ... "` instead of string concat
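
For illustration, a runnable sketch of the suggested s-interpolator form (the values are hypothetical):

```
object InterpolationDemo extends App {
  val spaceConsumed = 1024L
  val defaultParallelism = 8
  println(s"totalInputSpaceConsumed : $spaceConsumed , defaultParallelism : $defaultParallelism")
}
```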


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #248: [CARBONDATA-328] Improve Code and Fi...

2016-10-20 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/248#discussion_r84400739
  
--- Diff: 
integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataLoadRDD.scala
 ---
@@ -439,42 +436,42 @@ class DataFileLoaderRDD[K, V](
   }
 
   override def getPreferredLocations(split: Partition): Seq[String] = {
-isTableSplitPartition match {
-  case true =>
-// for table split partition
-val theSplit = split.asInstanceOf[CarbonTableSplitPartition]
-val location = 
theSplit.serializableHadoopSplit.value.getLocations.asScala
-location
-  case false =>
-// for node partition
-val theSplit = split.asInstanceOf[CarbonNodePartition]
-val firstOptionLocation: Seq[String] = 
List(theSplit.serializableHadoopSplit)
-logInfo("Preferred Location for split : " + firstOptionLocation(0))
-val blockMap = new util.LinkedHashMap[String, Integer]()
-val tableBlocks = theSplit.blocksDetails
-tableBlocks.foreach(tableBlock => tableBlock.getLocations.foreach(
-  location => {
-if (!firstOptionLocation.exists(location.equalsIgnoreCase(_))) 
{
-  val currentCount = blockMap.get(location)
-  if (currentCount == null) {
-blockMap.put(location, 1)
-  } else {
-blockMap.put(location, currentCount + 1)
-  }
+if (isTableSplitPartition) {
+  // for table split partition
+  val theSplit = split.asInstanceOf[CarbonTableSplitPartition]
+  val location = 
theSplit.serializableHadoopSplit.value.getLocations.asScala
+  location
+} else {
+  // for node partition
+  val theSplit = split.asInstanceOf[CarbonNodePartition]
+  val firstOptionLocation: Seq[String] = 
List(theSplit.serializableHadoopSplit)
+  logInfo("Preferred Location for split : " + firstOptionLocation.head)
+  val blockMap = new util.LinkedHashMap[String, Integer]()
+  val tableBlocks = theSplit.blocksDetails
+  tableBlocks.foreach(tableBlock => tableBlock.getLocations.foreach(
--- End diff --

move `tableBlock.getLocations.foreach` to next line, and give proper 
indentation.
change `(` to `{` after foreach. the code standard is:
```
xxx.foreach { x => 
  x.yyy
}
```
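
Applying that standard to the nested loop in the diff, a self-contained sketch with stand-in types for the Carbon classes:

```
object ForeachDemo extends App {
  case class BlockDetails(getLocations: Seq[String])   // stand-in for the Carbon class
  val tableBlocks = Seq(BlockDetails(Seq("node1", "node2")))
  tableBlocks.foreach { tableBlock =>
    tableBlock.getLocations.foreach { location =>
      println(location)   // placeholder for the block-count bookkeeping
    }
  }
}
```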


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #248: [CARBONDATA-328] Improve Code and Fi...

2016-10-20 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/248#discussion_r84400383
  
--- Diff: 
integration/spark/src/main/scala/org/apache/carbondata/spark/CarbonFilters.scala
 ---
@@ -259,29 +259,29 @@ object CarbonFilters {
 Some(new NotEqualsExpression(transformExpression(a).get, 
transformExpression(l).get))
 
 case Not(In(a: Attribute, list))
- if !list.exists(!_.isInstanceOf[Literal]) =>
- if (list.exists(x => (isNullLiteral(x.asInstanceOf[Literal] {
-  Some(new FalseExpression(transformExpression(a).get))
- }
-else {
-  Some(new NotInExpression(transformExpression(a).get,
+  if !list.exists(!_.isInstanceOf[Literal]) =>
+  if (list.exists(x => isNullLiteral(x.asInstanceOf[Literal]))) {
+Some(new FalseExpression(transformExpression(a).get))
+  }
+  else {
+Some(new NotInExpression(transformExpression(a).get,
   new 
ListExpression(convertToJavaList(list.map(transformExpression(_).get)
-}
+  }
 case In(a: Attribute, list) if 
!list.exists(!_.isInstanceOf[Literal]) =>
   Some(new InExpression(transformExpression(a).get,
 new 
ListExpression(convertToJavaList(list.map(transformExpression(_).get)
 case Not(In(Cast(a: Attribute, _), list))
   if !list.exists(!_.isInstanceOf[Literal]) =>
-/* if any illogical expression comes in NOT IN Filter like
- NOT IN('scala',NULL) this will be treated as false expression and 
will
- always return no result. */
-  if (list.exists(x => (isNullLiteral(x.asInstanceOf[Literal] {
-  Some(new FalseExpression(transformExpression(a).get))
- }
-else {
-  Some(new NotInExpression(transformExpression(a).get, new 
ListExpression(
+  /* if any illogical expression comes in NOT IN Filter like
+   NOT IN('scala',NULL) this will be treated as false expression 
and will
+   always return no result. */
+  if (list.exists(x => isNullLiteral(x.asInstanceOf[Literal]))) {
+Some(new FalseExpression(transformExpression(a).get))
+  }
+  else {
--- End diff --

put to previous line


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #244: [CARBONDATA-300] Added Encoder proce...

2016-10-20 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/244#discussion_r84243237
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/newflow/encoding/impl/NonDictionaryFieldEncoderImpl.java
 ---
@@ -0,0 +1,54 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.carbondata.processing.newflow.encoding.impl;
+
+import java.nio.ByteBuffer;
+import java.nio.charset.Charset;
+
+import org.apache.carbondata.core.carbon.metadata.datatype.DataType;
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.util.DataTypeUtil;
+import org.apache.carbondata.processing.newflow.DataField;
+import org.apache.carbondata.processing.newflow.encoding.FieldEncoder;
+import org.apache.carbondata.processing.newflow.row.CarbonRow;
+
+public class NonDictionaryFieldEncoderImpl implements 
FieldEncoder {
+
+  private DataType dataType;
+
+  private int index;
+
+  public NonDictionaryFieldEncoderImpl(DataField dataField, int index) {
+this.dataType = dataField.getColumn().getDataType();
+this.index = index;
+  }
+
+  @Override public ByteBuffer encode(CarbonRow row) {
+String dimensionValue = row.getString(index);
+if (dataType != DataType.STRING) {
+  if (null == DataTypeUtil.normalizeIntAndLongValues(dimensionValue, 
dataType)) {
+dimensionValue = CarbonCommonConstants.MEMBER_DEFAULT_VAL;
+  }
+}
+ByteBuffer buffer = ByteBuffer
--- End diff --

move `ByteBuffer` to next line


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (CARBONDATA-332) Create successfully Database, tables and columns using carbon reserve keywords

2016-10-20 Thread Harmeet Singh (JIRA)
Harmeet Singh created CARBONDATA-332:


 Summary: Create successfully Database, tables and columns using 
carbon reserve keywords
 Key: CARBONDATA-332
 URL: https://issues.apache.org/jira/browse/CARBONDATA-332
 Project: CarbonData
  Issue Type: Bug
Reporter: Harmeet Singh


Hey team, I am trying to create a database, tables, and columns named with Carbon
reserved keywords, and Carbon allows them to be created. I am expecting an
error; in Hive, the same statements fail with an error. Following are the steps:

Step1: 
0: jdbc:hive2://127.0.0.1:1> create database double;
+-+--+
| result  |
+-+--+
+-+--+
No rows selected (6.225 seconds)

Step 2: 
0: jdbc:hive2://127.0.0.1:1> use double;
+-+--+
| result  |
+-+--+
+-+--+
No rows selected (0.104 seconds)

Step 3:
0: jdbc:hive2://127.0.0.1:1> create table decimal(int int, string string) 
stored by 'carbondata';
+-+--+
| Result  |
+-+--+
+-+--+
No rows selected (2.372 seconds)

Step 4:
0: jdbc:hive2://127.0.0.1:1> show tables;
+------------+--------------+
| tableName  | isTemporary  |
+------------+--------------+
| decimal    | false        |
+------------+--------------+
1 row selected (0.071 seconds)

Step 5:
0: jdbc:hive2://127.0.0.1:1> desc decimal;
+-----------+------------+----------+
| col_name  | data_type  | comment  |
+-----------+------------+----------+
| string    | string     |          |
| int       | bigint     |          |
+-----------+------------+----------+
2 rows selected (0.556 seconds)

Step 6:
0: jdbc:hive2://127.0.0.1:1> load data inpath 
'hdfs://localhost:54310/home/harmeet/reservewords.csv' into table decimal;
+-+--+
| Result  |
+-+--+
+-+--+
No rows selected (0.863 seconds)

Step 7:
0: jdbc:hive2://127.0.0.1:1> select * from decimal;
+---------+------+
| string  | int  |
+---------+------+
| james   | 10   |
+---------+------+
1 row selected (0.413 seconds)




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] incubator-carbondata pull request #249: [CARBONDATA-329] constant final clas...

2016-10-20 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/249#discussion_r84315712
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
 ---
@@ -19,866 +19,859 @@
 
 package org.apache.carbondata.core.constants;
 
-public final class CarbonCommonConstants {
+public interface CarbonCommonConstants {
--- End diff --

Why change it to an interface? I do not think it is a good approach to keep
constant values inside an interface.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: describe formatted table command does not show block size

2016-10-20 Thread Harmeet Singh
Hey Ravi, can we raise the issue in JIRA?



--
View this message in context: 
http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/describe-formatted-table-command-does-not-show-block-size-tp2098p2104.html
Sent from the Apache CarbonData Mailing List archive at Nabble.com.


Re: describe formatted table command does not show block size

2016-10-20 Thread ravipesala
Yes, it is better to show all table properties in the desc command.



--
View this message in context: 
http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/describe-formatted-table-command-does-not-show-block-size-tp2098p2103.html
Sent from the Apache CarbonData Mailing List archive at Nabble.com.


[GitHub] incubator-carbondata pull request #240: [CARBONDATA-298]Added InputProcessor...

2016-10-20 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/240#discussion_r84226906
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/newflow/AbstractDataLoadProcessorStep.java
 ---
@@ -114,11 +114,15 @@ protected CarbonRowBatch 
processRowBatch(CarbonRowBatch rowBatch) {
   /**
* It is called when task is called successfully.
*/
-  public abstract void finish();
+  public void finish() {
+// implementation classes can override to update the status.
+  }
 
   /**
* Closing of resources after step execution can be done here.
*/
-  public abstract void close();
+  public void close() {
+// implementation classes can override to close the resources if any 
available.
--- End diff --

The comment is not so clear. Is this called in case of failure?
In `finish()`, resources also need to be released, right?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: load data error

2016-10-20 Thread foryou2030
Try hdfs://name001:9000/carbondata/sample.csv instead of
hdfs:///name001:9000/carbondata/sample.csv. With three slashes the authority
is empty, so "name001:9000" is parsed as part of the path, which is why HDFS
rejects it as an invalid DFS filename.

Sent from my iPhone

> On 2016-10-20, at 10:52 AM, 仲景武 wrote:
> 
> 
> when running the command (Thrift server):
> 
> jdbc:hive2://taonongyuan.com:10099/default> load data inpath 'hdfs://name001:9000/carbondata/sample.csv' into table test_table3;
> 
> 
> it throws an exception:
> 
> Driver stacktrace: (state=,code=0)
> 0: jdbc:hive2://taonongyuan.com:10099/default> load data inpath 'hdfs:///name001:9000/carbondata/sample.csv' into table test_table3;
> Error: java.lang.IllegalArgumentException: Pathname /name001:9000/carbondata/sample.csv from hdfs:/name001:9000/carbondata/sample.csv is not a valid DFS filename. (state=,code=0)
> 0: jdbc:hive2://taonongyuan.com:10099/default> load data inpath 'hdfs://name001:9000/carbondata/sample.csv' into table test_table3;
> Error: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 5.0 failed 4 times, most recent failure: Lost task 0.3 in stage 5.0 (TID 18, data002): java.lang.IllegalArgumentException: Wrong FS: hdfs://name001:9000/user/hive/warehouse/carbon.store/default/test_table3/Metadata/fdd8c8c4-5cdd-4542-aab1-785be20b9f36.dictmeta, expected: file:///
> at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:645)
> at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:80)
> at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:529)
> at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:747)
> at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:524)
> at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:409)
> at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:140)
> at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:341)
> at org.apache.carbondata.core.datastorage.store.impl.FileFactory.getDataInputStream(FileFactory.java:146)
> at org.apache.carbondata.core.reader.ThriftReader.open(ThriftReader.java:79)
> at org.apache.carbondata.core.reader.CarbonDictionaryMetadataReaderImpl.openThriftReader(CarbonDictionaryMetadataReaderImpl.java:181)
> at org.apache.carbondata.core.reader.CarbonDictionaryMetadataReaderImpl.readLastEntryOfDictionaryMetaChunk(CarbonDictionaryMetadataReaderImpl.java:128)
> at org.apache.carbondata.core.cache.dictionary.AbstractDictionaryCache.readLastChunkFromDictionaryMetadataFile(AbstractDictionaryCache.java:129)
> at org.apache.carbondata.core.cache.dictionary.AbstractDictionaryCache.checkAndLoadDictionaryData(AbstractDictionaryCache.java:204)
> at org.apache.carbondata.core.cache.dictionary.ReverseDictionaryCache.getDictionary(ReverseDictionaryCache.java:181)
> at org.apache.carbondata.core.cache.dictionary.ReverseDictionaryCache.get(ReverseDictionaryCache.java:69)
> at org.apache.carbondata.core.cache.dictionary.ReverseDictionaryCache.get(ReverseDictionaryCache.java:40)
> at org.apache.carbondata.spark.load.CarbonLoaderUtil.getDictionary(CarbonLoaderUtil.java:508)
> at org.apache.carbondata.spark.load.CarbonLoaderUtil.getDictionary(CarbonLoaderUtil.java:514)
> at org.apache.carbondata.spark.rdd.CarbonGlobalDictionaryGenerateRDD$$anon$1.<init>(CarbonGlobalDictionaryRDD.scala:362)
> at org.apache.carbondata.spark.rdd.CarbonGlobalDictionaryGenerateRDD.compute(CarbonGlobalDictionaryRDD.scala:293)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
> at org.apache.spark.scheduler.Task.run(Task.scala:89)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> 
> 
> 
> On 2016-10-19, at 4:55 PM, 仲景武 wrote:
> 
> 
> hi, all
> 
> I have installed CarbonData successfully, following the document
> "https://cwiki.apache.org/confluence/display/CARBONDATA/"
> 
> but when loading data into a CarbonData table, it throws an exception:
> 
> 
> run command:
> cc.sql("load data local inpath '../carbondata/sample.csv' into table test_table")
> 
> errors:
> 
> org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: /home/bigdata/bigdata/carbondata/sample.csv
> at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:321)
> at 
> 

load data from local file system using beeline

2016-10-20 Thread Harmeet Singh
Hey team, I am using the Thrift server and the Beeline client for executing
CarbonData queries. I want to load data from a CSV into a CarbonData table. The
CSV file exists on our local system. When I try to execute the query, I get an
invalid path exception. Below is the example:

0: jdbc:hive2://127.0.0.1:1> load data local inpath
'/home/harmeet/CarbonDataInputs/sample3.csv' into table newone
options('quotechar'='`');
Error: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input
path does not exist: /home/harmeet/CarbonDataInputs/sample3.csv
(state=,code=0)

Please confirm whether this is an issue or whether CarbonData does not support
loading files from the local file system. If this is not yet a feature, then I
suggest adding it, so we can load data from a local CSV file instead of Hadoop.



--
View this message in context: 
http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/load-data-from-local-file-system-using-beeline-tp2101.html
Sent from the Apache CarbonData Mailing List archive at Nabble.com.


Unable to perform compaction,

2016-10-20 Thread prabhatkashyap
Hello,
There is some issue with compaction: auto and force compaction are not
working. In the Spark logs I got this error:

[error log missing from the archive]
--
View this message in context: 
http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Unable-to-perform-compaction-tp2099.html
Sent from the Apache CarbonData Mailing List archive at Nabble.com.


[GitHub] incubator-carbondata pull request #244: [CARBONDATA-300] Added Encoder proce...

2016-10-20 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/244#discussion_r84242215
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/newflow/steps/encodestep/EncoderProcessorStepImpl.java
 ---
@@ -0,0 +1,63 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.carbondata.processing.newflow.steps.encodestep;
+
+import 
org.apache.carbondata.processing.newflow.AbstractDataLoadProcessorStep;
+import 
org.apache.carbondata.processing.newflow.CarbonDataLoadConfiguration;
+import org.apache.carbondata.processing.newflow.DataField;
+import org.apache.carbondata.processing.newflow.encoding.RowEncoder;
+import 
org.apache.carbondata.processing.newflow.encoding.impl.RowEncoderImpl;
+import 
org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException;
+import org.apache.carbondata.processing.newflow.row.CarbonRow;
+
+/**
+ * Encodes data with dictionary values, composed into a bit/byte packed 
key.
+ * Non-dictionary values are packed as bytes, and complex types are also 
packed as bytes.
+ */
+public class EncoderProcessorStepImpl extends 
AbstractDataLoadProcessorStep {
--- End diff --

Actually this step transforms the row into a temporarily encoded row 
(3 elements: dictionary dimensions, no-dictionary dimensions, measures); the 
dimension byte array is what we want for the sorting step, so can we call this 
step `SortKeyPreparationStep`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


describe formatted table command does not show block size

2016-10-20 Thread Harmeet Singh
Hey team, CarbonData creates tables with a default block size of 1GB. But
when I try to show the block size value using desc formatted, the details are
shown, but the block size does not appear. I suggest that the block size
should be visible in the table details. Even when we manually define the
table block size, the size does not appear. Following are the details: 

*First table, automatic block size:* 

0: jdbc:hive2://127.0.0.1:1> create table testOne(name string, age int)
stored by 'carbondata';

0: jdbc:hive2://127.0.0.1:1> desc formatted testOne;
+------------------------------------------------------------------+
|                              result                              |
+------------------------------------------------------------------+
| name    string    DICTIONARY, KEY COLUMN                         |
| age     bigint    MEASURE                                        |
|                                                                  |
| ##Detailed Table Information                                     |
| Database Name : fordrop                                          |
| Table Name : testone                                             |
| CARBON Store Path : hdfs://localhost:54310/Opt/CarbonStore       |
|                                                                  |
| ##Detailed Column property                                       |
| NONE                                                             |
|                                                                  |
| ##Column Group Information                                       |


*Second table, manual block size:*

0: jdbc:hive2://127.0.0.1:1> create table testTwo(name string, age int)
stored by 'carbondata' tblproperties('table_blocksize'='1');

0: jdbc:hive2://127.0.0.1:1> desc formatted testTwo;
+------------------------------------------------------------------+
|                              result                              |
+------------------------------------------------------------------+
| name    string    DICTIONARY, KEY COLUMN                         |
| age     bigint    MEASURE                                        |
|                                                                  |
| ##Detailed Table Information

Drop Database cascade hive command not supported.

2016-10-20 Thread Harmeet Singh
Hey team, I have created a database and that database contains tables. I want
to drop the database and clean up its tables in one command using cascade, but
CarbonData gives an error. If I use Hive, it allows me to drop a database with
cascade. Below are the details: 

In *CarbonData*: 
0: jdbc:hive2://127.0.0.1:1> use databse fordrop;
Error: org.apache.spark.sql.AnalysisException: extraneous input 'fordrop'
expecting EOF near ''; line 1 pos 12 (state=,code=0)

0: jdbc:hive2://127.0.0.1:1> use fordrop;
+-+--+
| result  |
+-+--+
+-+--+
No rows selected (0.039 seconds)

0: jdbc:hive2://127.0.0.1:1> create table testOne(name string, age int)
stored by 'carbondata';
+-+--+
| Result  |
+-+--+
+-+--+
No rows selected (0.16 seconds)
0: jdbc:hive2://127.0.0.1:1> drop database fordrop cascade;
Error:
org.apache.carbondata.spark.exception.MalformedCarbonCommandException:
Unsupported cascade operation in drop database/schema command
(state=,code=0)

In *Hive*: 
hive> create database fordrop;
OK
Time taken: 8.679 seconds
hive> use fordrop;
OK
Time taken: 0.656 seconds
hive> create table testOne(name String, age int);
OK
Time taken: 2.854 seconds
hive> drop database fordrop cascade;
OK
Time taken: 5.303 seconds

This seems to be an issue. Please confirm?
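
As a possible interim workaround (an illustrative session; it assumes dropping
the tables one by one first, then the now-empty database):

0: jdbc:hive2://127.0.0.1:1> drop table fordrop.testOne;
0: jdbc:hive2://127.0.0.1:1> drop database fordrop;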




--
View this message in context: 
http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Drop-Database-dbname-cascade-hive-command-not-supported-tp2097.html
Sent from the Apache CarbonData Mailing List archive mailing list archive at 
Nabble.com.


[GitHub] incubator-carbondata pull request #247: [CARBONDATA-301] Added Sort processo...

2016-10-20 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/247#discussion_r84245645
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/newflow/steps/sort/SortProcessorStepImpl.java
 ---
@@ -0,0 +1,237 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.carbondata.processing.newflow.steps.sort;
+
+import java.io.File;
+import java.util.Iterator;
+import java.util.concurrent.Callable;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.TimeUnit;
+
+import org.apache.carbondata.common.CarbonIterator;
+import org.apache.carbondata.common.logging.LogService;
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.util.CarbonTimeStatisticsFactory;
+import 
org.apache.carbondata.processing.newflow.AbstractDataLoadProcessorStep;
+import 
org.apache.carbondata.processing.newflow.CarbonDataLoadConfiguration;
+import org.apache.carbondata.processing.newflow.DataField;
+import 
org.apache.carbondata.processing.newflow.constants.DataLoadProcessorConstants;
+import 
org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException;
+import org.apache.carbondata.processing.newflow.row.CarbonRow;
+import org.apache.carbondata.processing.newflow.row.CarbonRowBatch;
+import 
org.apache.carbondata.processing.sortandgroupby.exception.CarbonSortKeyAndGroupByException;
+import 
org.apache.carbondata.processing.sortandgroupby.sortdata.SortDataRows;
+import 
org.apache.carbondata.processing.sortandgroupby.sortdata.SortIntermediateFileMerger;
+import 
org.apache.carbondata.processing.sortandgroupby.sortdata.SortParameters;
+import 
org.apache.carbondata.processing.store.SingleThreadFinalSortFilesMerger;
+import 
org.apache.carbondata.processing.store.writer.exception.CarbonDataWriterException;
+import org.apache.carbondata.processing.util.CarbonDataProcessorUtil;
+
+/**
+ * It sorts the data and writes it to intermediate temp files. These 
files will be further read
+ * by the next step for writing to carbondata files.
+ */
+public class SortProcessorStepImpl extends AbstractDataLoadProcessorStep {
--- End diff --

In this step, it is better to abstract an interface called `Sorter`. We can 
move the current sort implementation (SortDataRows/SortIntermediateFileMerger) 
into a `MergeSortSorter` which implements `Sorter`, and we can add an 
`ExternalSorter` later for in-memory sort.

So in this class, there will be a member variable called sorter.

Or, we can create another PR to do it if it is hard to do in one PR.
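
A minimal sketch of the proposed abstraction (method names and signatures are 
illustrative, not the final API):

import java.util.Iterator;

import org.apache.carbondata.processing.newflow.row.CarbonRowBatch;
import org.apache.carbondata.processing.sortandgroupby.sortdata.SortParameters;

/**
 * Illustrative Sorter abstraction. MergeSortSorter (wrapping SortDataRows and
 * SortIntermediateFileMerger) and a future in-memory ExternalSorter would
 * implement this interface.
 */
public interface Sorter {

  /** Prepare temp folders and internal buffers. */
  void initialize(SortParameters parameters);

  /** Buffer a batch of rows, spilling sorted runs to temp files as needed. */
  void addRowBatch(CarbonRowBatch batch);

  /** Merge all sorted runs and return the globally sorted rows. */
  Iterator<CarbonRowBatch> sort();

  /** Release buffers and delete intermediate temp files. */
  void close();
}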


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #252: [CARBONDATA-331] Modify Compressor i...

2016-10-20 Thread jackylk
GitHub user jackylk opened a pull request:

https://github.com/apache/incubator-carbondata/pull/252

[CARBONDATA-331] Modify Compressor interface and add no compression option

Modifications in this PR:
1. Modify the compressor interface to make it easier to add new compressors.
2. Add a DummyCompressor that performs no compression.

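For illustration only (see the diff for the actual interface), the shape of
the change is roughly:

// Illustrative sketch -- not the actual PR code.
public interface Compressor {
  byte[] compressByte(byte[] unCompInput);
  byte[] unCompressByte(byte[] compInput);
}

// Pass-through implementation: returns its input unchanged, so data is
// stored uncompressed.
public class DummyCompressor implements Compressor {
  @Override public byte[] compressByte(byte[] unCompInput) {
    return unCompInput;
  }
  @Override public byte[] unCompressByte(byte[] compInput) {
    return compInput;
  }
}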

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jackylk/incubator-carbondata dummy-compress

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/252.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #252


commit e90b96da57c084bfdfa8f8a81f5774b7dd1bbe43
Author: jackylk 
Date:   2016-10-19T14:48:02Z

change Compressor interface

commit acf8f7a5c4f907353a2713bb0573ffc1f1fa2c77
Author: jackylk 
Date:   2016-10-19T18:17:31Z

add dummy compressor

commit f22ca1d6938a6c90a057c53c23455241391ffbee
Author: jackylk 
Date:   2016-10-20T01:47:59Z

add example and fix style




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #251: [CARBONDATA-302]Added Writer process...

2016-10-20 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/251#discussion_r84247597
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/newflow/steps/writer/DataWriterProcessorStepImpl.java
 ---
@@ -0,0 +1,374 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.carbondata.processing.newflow.steps.writer;
+
+import java.io.File;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+
+import org.apache.carbondata.common.logging.LogService;
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.carbon.CarbonTableIdentifier;
+import org.apache.carbondata.core.carbon.datastore.block.SegmentProperties;
+import org.apache.carbondata.core.carbon.metadata.CarbonMetadata;
+import org.apache.carbondata.core.carbon.metadata.schema.table.CarbonTable;
+import 
org.apache.carbondata.core.carbon.metadata.schema.table.column.ColumnSchema;
+import org.apache.carbondata.core.carbon.path.CarbonStorePath;
+import org.apache.carbondata.core.carbon.path.CarbonTablePath;
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.constants.IgnoreDictionary;
+import org.apache.carbondata.core.keygenerator.KeyGenerator;
+import org.apache.carbondata.core.util.CarbonProperties;
+import org.apache.carbondata.core.util.CarbonTimeStatisticsFactory;
+import org.apache.carbondata.core.util.CarbonUtil;
+import org.apache.carbondata.core.util.CarbonUtilException;
+import org.apache.carbondata.processing.datatypes.GenericDataType;
+import 
org.apache.carbondata.processing.newflow.AbstractDataLoadProcessorStep;
+import 
org.apache.carbondata.processing.newflow.CarbonDataLoadConfiguration;
+import org.apache.carbondata.processing.newflow.DataField;
+import 
org.apache.carbondata.processing.newflow.constants.DataLoadProcessorConstants;
+import 
org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException;
+import org.apache.carbondata.processing.newflow.row.CarbonRow;
+import org.apache.carbondata.processing.newflow.row.CarbonRowBatch;
+import org.apache.carbondata.processing.store.CarbonDataFileAttributes;
+import 
org.apache.carbondata.processing.store.CarbonFactDataHandlerColumnar;
+import org.apache.carbondata.processing.store.CarbonFactDataHandlerModel;
+import org.apache.carbondata.processing.store.CarbonFactHandler;
+import 
org.apache.carbondata.processing.store.writer.exception.CarbonDataWriterException;
+import org.apache.carbondata.processing.util.CarbonDataProcessorUtil;
+
+/**
+ * It reads data from the sorted files which are generated in the previous 
sort step,
+ * and writes the data to the carbondata file. It also generates the MDK key 
while writing to the carbondata file.
+ */
+public class DataWriterProcessorStepImpl extends 
AbstractDataLoadProcessorStep {
+
+  private static final LogService LOGGER =
+  
LogServiceFactory.getLogService(DataWriterProcessorStepImpl.class.getName());
+
+  private String storeLocation;
+
+  private boolean[] isUseInvertedIndex;
+
+  private int[] dimLens;
+
+  private int dimensionCount;
+
+  private List<ColumnSchema> wrapperColumnSchema;
+
+  private int[] colCardinality;
+
+  private SegmentProperties segmentProperties;
+
+  private KeyGenerator keyGenerator;
+
+  private CarbonFactHandler dataHandler;
+
+  private Map<Integer, GenericDataType> complexIndexMap;
+
+  private int noDictionaryCount;
+
+  private int complexDimensionCount;
+
+  private int measureCount;
+
+  private long readCounter;
+
+  private long writeCounter;
+
+  private int measureIndex = 
IgnoreDictionary.MEASURES_INDEX_IN_ROW.getIndex();
+
+  private int noDimByteArrayIndex = 

table schema does not match the "create table" command

2016-10-20 Thread shufflemylife
hi, I deployed carbon on YARN and tried to convert JSON data into carbondata 
using carbon-spark-sql.
When I run the following create table command:
create table q3_carbon (cookie string, freq int, key bigint) STORED BY 
'carbondata';
it returns successfully.


Then I run:
show create table q3_carbon;
It returns:
CREATE EXTERNAL TABLE `q3_carbon`(
  `col` array<string> COMMENT 'from deserializer')
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.MetadataTypedColumnsetSerDe'
WITH SERDEPROPERTIES (
  'tableName'='default.q3_carbon',
  'tablePath'='hdfs://###/Opt/CarbonStore/default/q3_carbon')
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.SequenceFileInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat'
LOCATION
  'hdfs://###:8020/apps/hive/warehouse/q3_carbon'
TBLPROPERTIES (
  'spark.sql.sources.provider'='carbondata',
  'transient_lastDdlTime'='1476953210')
The column schema does not match my create table command.


Then I tried to run the conversion SQL:
insert overwrite TABLE q3_carbon select cookie, freq, key from jsonTable;


It shows this ERROR message:
"Error in query: unresolved operator 'InsertIntoTable 
Relation[cookie#25,freq#26L,key#27L] 
CarbonDatasourceRelation(`default`.`q3_carbon`,None), Map(), true, false;


--
I'm using carbondata 0.2, on Spark 1.6.1 & Hadoop 2.7.1.
The data in table jsonTable looks like the following:
{"cookie":"FB438C406B85C41456D80CD2F75E","freq":2,"key":3158409}





[GitHub] incubator-carbondata pull request #244: [CARBONDATA-300] Added Encoder proce...

2016-10-20 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/244#discussion_r84240216
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/newflow/encoding/impl/RowEncoderImpl.java
 ---
@@ -0,0 +1,132 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.carbondata.processing.newflow.encoding.impl;
+
+import java.nio.ByteBuffer;
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.carbondata.core.cache.Cache;
+import org.apache.carbondata.core.cache.CacheProvider;
+import org.apache.carbondata.core.cache.CacheType;
+import org.apache.carbondata.core.cache.dictionary.Dictionary;
+import 
org.apache.carbondata.core.cache.dictionary.DictionaryColumnUniqueIdentifier;
+import org.apache.carbondata.core.constants.IgnoreDictionary;
+import org.apache.carbondata.core.util.CarbonTimeStatisticsFactory;
+import 
org.apache.carbondata.processing.newflow.CarbonDataLoadConfiguration;
+import org.apache.carbondata.processing.newflow.DataField;
+import 
org.apache.carbondata.processing.newflow.constants.DataLoadProcessorConstants;
+import org.apache.carbondata.processing.newflow.encoding.FieldEncoder;
+import org.apache.carbondata.processing.newflow.encoding.RowEncoder;
+import 
org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException;
+import org.apache.carbondata.processing.newflow.row.CarbonRow;
+import org.apache.carbondata.processing.util.RemoveDictionaryUtil;
+
+/**
+ *
+ */
+public class RowEncoderImpl implements RowEncoder {
+
+  private CarbonDataLoadConfiguration configuration;
+
+  private AbstractDictionaryFieldEncoderImpl[] dictionaryFieldEncoders;
+
+  private NonDictionaryFieldEncoderImpl[] nonDictionaryFieldEncoders;
+
+  private MeasureFieldEncoderImpl[] measureFieldEncoders;
+
+  public RowEncoderImpl(DataField[] fields, CarbonDataLoadConfiguration 
configuration) {
+this.configuration = configuration;
+CacheProvider cacheProvider = CacheProvider.getInstance();
+Cache<DictionaryColumnUniqueIdentifier, Dictionary> cache = 
cacheProvider
+.createCache(CacheType.REVERSE_DICTIONARY,
+configuration.getTableIdentifier().getStorePath());
+List<AbstractDictionaryFieldEncoderImpl> dictFieldEncoders = new 
ArrayList<>();
+List<NonDictionaryFieldEncoderImpl> nonDictFieldEncoders = new 
ArrayList<>();
+List<MeasureFieldEncoderImpl> measureFieldEncoderList = new 
ArrayList<>();
+
+long lruCacheStartTime = System.currentTimeMillis();
+
+for (int i = 0; i < fields.length; i++) {
+  FieldEncoder fieldEncoder = FieldEncoderFactory.getInstance()
+  .createFieldEncoder(fields[i], cache,
+  
configuration.getTableIdentifier().getCarbonTableIdentifier(), i);
+  if (fieldEncoder instanceof AbstractDictionaryFieldEncoderImpl) {
+dictFieldEncoders.add((AbstractDictionaryFieldEncoderImpl) 
fieldEncoder);
+  } else if (fieldEncoder instanceof NonDictionaryFieldEncoderImpl) {
+nonDictFieldEncoders.add((NonDictionaryFieldEncoderImpl) 
fieldEncoder);
+  } else if (fieldEncoder instanceof MeasureFieldEncoderImpl) {
+measureFieldEncoderList.add((MeasureFieldEncoderImpl)fieldEncoder);
+  }
+}
+CarbonTimeStatisticsFactory.getLoadStatisticsInstance()
+.recordLruCacheLoadTime((System.currentTimeMillis() - 
lruCacheStartTime) / 1000.0);
+dictionaryFieldEncoders =
+dictFieldEncoders.toArray(new 
AbstractDictionaryFieldEncoderImpl[dictFieldEncoders.size()]);
+nonDictionaryFieldEncoders = nonDictFieldEncoders
+.toArray(new 
NonDictionaryFieldEncoderImpl[nonDictFieldEncoders.size()]);
+measureFieldEncoders = measureFieldEncoderList
+.toArray(new 
MeasureFieldEncoderImpl[measureFieldEncoderList.size()]);
+
+  }
+
+  @Override public CarbonRow encode(CarbonRow row) throws 
CarbonDataLoadingException {
--- End 

[GitHub] incubator-carbondata pull request #244: [CARBONDATA-300] Added Encoder proce...

2016-10-20 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/244#discussion_r84238092
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/newflow/encoding/RowEncoder.java
 ---
@@ -0,0 +1,32 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.carbondata.processing.newflow.encoding;
+
+import 
org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException;
+import org.apache.carbondata.processing.newflow.row.CarbonRow;
+
+/**
+ * Encodes the row
+ */
+public interface RowEncoder {
--- End diff --

Calling this an Encoder is confusing; this step only replaces dictionary 
values with dictionary keys, right? It does not do the actual encoding, which 
is done in the final write step.

Suggest changing to `KeyedRowConverter`. See other comments in this PR.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #244: [CARBONDATA-300] Added Encoder proce...

2016-10-20 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/244#discussion_r84243728
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/newflow/steps/encodestep/EncoderProcessorStepImpl.java
 ---
@@ -0,0 +1,63 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.carbondata.processing.newflow.steps.encodestep;
+
+import 
org.apache.carbondata.processing.newflow.AbstractDataLoadProcessorStep;
+import 
org.apache.carbondata.processing.newflow.CarbonDataLoadConfiguration;
+import org.apache.carbondata.processing.newflow.DataField;
+import org.apache.carbondata.processing.newflow.encoding.RowEncoder;
+import 
org.apache.carbondata.processing.newflow.encoding.impl.RowEncoderImpl;
+import 
org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException;
+import org.apache.carbondata.processing.newflow.row.CarbonRow;
+
+/**
+ * Encodes data with dictionary values, composed into a bit/byte packed 
key.
+ * Non-dictionary values are packed as bytes, and complex types are also 
packed as bytes.
+ */
+public class EncoderProcessorStepImpl extends 
AbstractDataLoadProcessorStep {
+
+  private RowEncoder encoder;
+
+  public EncoderProcessorStepImpl(CarbonDataLoadConfiguration 
configuration,
+  AbstractDataLoadProcessorStep child) {
+super(configuration, child);
+  }
+
+  @Override public DataField[] getOutput() {
+return child.getOutput();
+  }
+
+  @Override public void initialize() throws CarbonDataLoadingException {
+encoder = new RowEncoderImpl(child.getOutput(), configuration);
+child.initialize();
+  }
+
+  @Override protected CarbonRow processRow(CarbonRow row) {
+return encoder.encode(row);
+  }
+
+  @Override public void finish() {
+encoder.finish();
+  }
+
+  @Override public void close() {
+
--- End diff --

in case of failure, should there be an `encoder.close()` call here?
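
Something like this, assuming `RowEncoder` gains a `close()` for cleanup (an
illustrative sketch, not the PR code):

  @Override public void close() {
    // Assumed cleanup hook: release resources held by the encoder (e.g.
    // dictionary cache handles), even when the load fails partway through.
    if (encoder != null) {
      encoder.close();
    }
  }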


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #240: [CARBONDATA-298]Added InputProcessor...

2016-10-20 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/240#discussion_r84227253
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/newflow/complexobjects/ArrayObject.java
 ---
@@ -0,0 +1,18 @@
+package org.apache.carbondata.processing.newflow.complexobjects;
--- End diff --

please add license header for all files


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #240: [CARBONDATA-298]Added InputProcessor...

2016-10-20 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/240#discussion_r84234651
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/newflow/steps/input/InputProcessorStepImpl.java
 ---
@@ -0,0 +1,171 @@
+package org.apache.carbondata.processing.newflow.steps.input;
+
+import java.util.ArrayList;
+import java.util.Iterator;
+import java.util.List;
+
+import org.apache.carbondata.common.CarbonIterator;
+import org.apache.carbondata.common.logging.LogService;
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.util.CarbonProperties;
+import 
org.apache.carbondata.processing.newflow.AbstractDataLoadProcessorStep;
+import 
org.apache.carbondata.processing.newflow.CarbonDataLoadConfiguration;
+import org.apache.carbondata.processing.newflow.DataField;
+import 
org.apache.carbondata.processing.newflow.constants.DataLoadProcessorConstants;
+import 
org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException;
+import org.apache.carbondata.processing.newflow.parser.CarbonParserFactory;
+import org.apache.carbondata.processing.newflow.parser.GenericParser;
+import org.apache.carbondata.processing.newflow.row.CarbonRow;
+import org.apache.carbondata.processing.newflow.row.CarbonRowBatch;
+
+/**
+ * It reads data from record reader and sends data to next step.
+ */
+public class InputProcessorStepImpl extends AbstractDataLoadProcessorStep {
+
+  private static final LogService LOGGER =
+  
LogServiceFactory.getLogService(InputProcessorStepImpl.class.getName());
+
+  private GenericParser[] genericParsers;
+
+  private List<CarbonIterator<Object[]>> inputIterators;
+
+  public InputProcessorStepImpl(CarbonDataLoadConfiguration configuration,
+  AbstractDataLoadProcessorStep child, List<CarbonIterator<Object[]>> 
inputIterators) {
+super(configuration, child);
+this.inputIterators = inputIterators;
+  }
+
+  @Override public DataField[] getOutput() {
+DataField[] fields = configuration.getDataFields();
+String[] header = configuration.getHeader();
+DataField[] output = new DataField[fields.length];
+int k = 0;
+for (int i = 0; i < header.length; i++) {
+  for (int j = 0; j < fields.length; j++) {
+if 
(header[j].equalsIgnoreCase(fields[j].getColumn().getColName())) {
+  output[k++] = fields[j];
+  break;
+}
+  }
+}
+return output;
+  }
+
+  @Override public void initialize() throws CarbonDataLoadingException {
+DataField[] output = getOutput();
+genericParsers = new GenericParser[output.length];
+for (int i = 0; i < genericParsers.length; i++) {
+  genericParsers[i] = 
CarbonParserFactory.createParser(output[i].getColumn(),
+  (String[]) configuration
+  
.getDataLoadProperty(DataLoadProcessorConstants.COMPLEX_DELIMITERS));
+}
+  }
+
+  private int getNumberOfCores() {
+int numberOfCores;
+try {
+  numberOfCores = Integer.parseInt(CarbonProperties.getInstance()
+  .getProperty(CarbonCommonConstants.NUM_CORES_LOADING,
+  CarbonCommonConstants.NUM_CORES_DEFAULT_VAL));
+} catch (NumberFormatException exc) {
+  numberOfCores = 
Integer.parseInt(CarbonCommonConstants.NUM_CORES_DEFAULT_VAL);
+}
+return numberOfCores;
+  }
+
+  private int getBatchSize() {
+int batchSize;
+try {
+  batchSize = Integer.parseInt(configuration
+  
.getDataLoadProperty(DataLoadProcessorConstants.DATA_LOAD_BATCH_SIZE,
+  
DataLoadProcessorConstants.DATA_LOAD_BATCH_SIZE_DEFAULT).toString());
+} catch (NumberFormatException exc) {
+  batchSize = 
Integer.parseInt(DataLoadProcessorConstants.DATA_LOAD_BATCH_SIZE_DEFAULT);
+}
+return batchSize;
+  }
+
+  @Override public Iterator<CarbonRowBatch>[] execute() {
+int batchSize = getBatchSize();
+List<CarbonIterator<Object[]>>[] readerIterators = 
partitionInputReaderIterators();
+Iterator<CarbonRowBatch>[] outIterators = new 
Iterator[readerIterators.length];
+for (int i = 0; i < outIterators.length; i++) {
+  outIterators[i] = new InputProcessorIterator(readerIterators[i], 
genericParsers, batchSize);
+}
+return outIterators;
+  }
+
+  private List<CarbonIterator<Object[]>>[] partitionInputReaderIterators() {
+int numberOfCores = getNumberOfCores();
+if (inputIterators.size() < numberOfCores) {
+  numberOfCores = inputIterators.size();
+}
+List<CarbonIterator<Object[]>>[] 

[GitHub] incubator-carbondata pull request #240: [CARBONDATA-298]Added InputProcessor...

2016-10-20 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/240#discussion_r84230036
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/newflow/steps/input/InputProcessorStepImpl.java
 ---
@@ -0,0 +1,171 @@
+package org.apache.carbondata.processing.newflow.steps.input;
+
+import java.util.ArrayList;
+import java.util.Iterator;
+import java.util.List;
+
+import org.apache.carbondata.common.CarbonIterator;
+import org.apache.carbondata.common.logging.LogService;
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.util.CarbonProperties;
+import 
org.apache.carbondata.processing.newflow.AbstractDataLoadProcessorStep;
+import 
org.apache.carbondata.processing.newflow.CarbonDataLoadConfiguration;
+import org.apache.carbondata.processing.newflow.DataField;
+import 
org.apache.carbondata.processing.newflow.constants.DataLoadProcessorConstants;
+import 
org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException;
+import org.apache.carbondata.processing.newflow.parser.CarbonParserFactory;
+import org.apache.carbondata.processing.newflow.parser.GenericParser;
+import org.apache.carbondata.processing.newflow.row.CarbonRow;
+import org.apache.carbondata.processing.newflow.row.CarbonRowBatch;
+
+/**
+ * It reads data from record reader and sends data to next step.
+ */
+public class InputProcessorStepImpl extends AbstractDataLoadProcessorStep {
+
+  private static final LogService LOGGER =
+  
LogServiceFactory.getLogService(InputProcessorStepImpl.class.getName());
+
+  private GenericParser[] genericParsers;
+
+  private List<CarbonIterator<Object[]>> inputIterators;
+
+  public InputProcessorStepImpl(CarbonDataLoadConfiguration configuration,
+  AbstractDataLoadProcessorStep child, List<CarbonIterator<Object[]>> 
inputIterators) {
+super(configuration, child);
+this.inputIterators = inputIterators;
+  }
+
+  @Override public DataField[] getOutput() {
+DataField[] fields = configuration.getDataFields();
+String[] header = configuration.getHeader();
+DataField[] output = new DataField[fields.length];
+int k = 0;
+for (int i = 0; i < header.length; i++) {
+  for (int j = 0; j < fields.length; j++) {
+if 
(header[j].equalsIgnoreCase(fields[j].getColumn().getColName())) {
+  output[k++] = fields[j];
+  break;
+}
+  }
+}
+return output;
+  }
+
+  @Override public void initialize() throws CarbonDataLoadingException {
+DataField[] output = getOutput();
+genericParsers = new GenericParser[output.length];
+for (int i = 0; i < genericParsers.length; i++) {
+  genericParsers[i] = 
CarbonParserFactory.createParser(output[i].getColumn(),
+  (String[]) configuration
+  
.getDataLoadProperty(DataLoadProcessorConstants.COMPLEX_DELIMITERS));
+}
+  }
+
+  private int getNumberOfCores() {
+int numberOfCores;
+try {
+  numberOfCores = Integer.parseInt(CarbonProperties.getInstance()
+  .getProperty(CarbonCommonConstants.NUM_CORES_LOADING,
+  CarbonCommonConstants.NUM_CORES_DEFAULT_VAL));
+} catch (NumberFormatException exc) {
+  numberOfCores = 
Integer.parseInt(CarbonCommonConstants.NUM_CORES_DEFAULT_VAL);
+}
+return numberOfCores;
+  }
+
+  private int getBatchSize() {
+int batchSize;
+try {
+  batchSize = Integer.parseInt(configuration
--- End diff --

move `configuration` to next line


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #240: [CARBONDATA-298]Added InputProcessor...

2016-10-20 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/240#discussion_r84233387
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/newflow/steps/input/InputProcessorStepImpl.java
 ---
@@ -0,0 +1,171 @@
+package org.apache.carbondata.processing.newflow.steps.input;
+
+import java.util.ArrayList;
+import java.util.Iterator;
+import java.util.List;
+
+import org.apache.carbondata.common.CarbonIterator;
+import org.apache.carbondata.common.logging.LogService;
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.util.CarbonProperties;
+import 
org.apache.carbondata.processing.newflow.AbstractDataLoadProcessorStep;
+import 
org.apache.carbondata.processing.newflow.CarbonDataLoadConfiguration;
+import org.apache.carbondata.processing.newflow.DataField;
+import 
org.apache.carbondata.processing.newflow.constants.DataLoadProcessorConstants;
+import 
org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException;
+import org.apache.carbondata.processing.newflow.parser.CarbonParserFactory;
+import org.apache.carbondata.processing.newflow.parser.GenericParser;
+import org.apache.carbondata.processing.newflow.row.CarbonRow;
+import org.apache.carbondata.processing.newflow.row.CarbonRowBatch;
+
+/**
+ * It reads data from record reader and sends data to next step.
+ */
+public class InputProcessorStepImpl extends AbstractDataLoadProcessorStep {
+
+  private static final LogService LOGGER =
+  
LogServiceFactory.getLogService(InputProcessorStepImpl.class.getName());
+
+  private GenericParser[] genericParsers;
+
+  private List<CarbonIterator<Object[]>> inputIterators;
+
+  public InputProcessorStepImpl(CarbonDataLoadConfiguration configuration,
+  AbstractDataLoadProcessorStep child, List<CarbonIterator<Object[]>> 
inputIterators) {
+super(configuration, child);
+this.inputIterators = inputIterators;
+  }
+
+  @Override public DataField[] getOutput() {
+DataField[] fields = configuration.getDataFields();
+String[] header = configuration.getHeader();
+DataField[] output = new DataField[fields.length];
+int k = 0;
+for (int i = 0; i < header.length; i++) {
+  for (int j = 0; j < fields.length; j++) {
+if 
(header[j].equalsIgnoreCase(fields[j].getColumn().getColName())) {
+  output[k++] = fields[j];
+  break;
+}
+  }
+}
+return output;
+  }
+
+  @Override public void initialize() throws CarbonDataLoadingException {
+DataField[] output = getOutput();
+genericParsers = new GenericParser[output.length];
+for (int i = 0; i < genericParsers.length; i++) {
+  genericParsers[i] = 
CarbonParserFactory.createParser(output[i].getColumn(),
+  (String[]) configuration
+  
.getDataLoadProperty(DataLoadProcessorConstants.COMPLEX_DELIMITERS));
+}
+  }
+
+  private int getNumberOfCores() {
+int numberOfCores;
+try {
+  numberOfCores = Integer.parseInt(CarbonProperties.getInstance()
+  .getProperty(CarbonCommonConstants.NUM_CORES_LOADING,
+  CarbonCommonConstants.NUM_CORES_DEFAULT_VAL));
+} catch (NumberFormatException exc) {
+  numberOfCores = 
Integer.parseInt(CarbonCommonConstants.NUM_CORES_DEFAULT_VAL);
+}
+return numberOfCores;
+  }
+
+  private int getBatchSize() {
+int batchSize;
+try {
+  batchSize = Integer.parseInt(configuration
+  
.getDataLoadProperty(DataLoadProcessorConstants.DATA_LOAD_BATCH_SIZE,
+  
DataLoadProcessorConstants.DATA_LOAD_BATCH_SIZE_DEFAULT).toString());
+} catch (NumberFormatException exc) {
+  batchSize = 
Integer.parseInt(DataLoadProcessorConstants.DATA_LOAD_BATCH_SIZE_DEFAULT);
+}
+return batchSize;
+  }
+
+  @Override public Iterator<CarbonRowBatch>[] execute() {
+int batchSize = getBatchSize();
+List<CarbonIterator<Object[]>>[] readerIterators = 
partitionInputReaderIterators();
+Iterator<CarbonRowBatch>[] outIterators = new 
Iterator[readerIterators.length];
+for (int i = 0; i < outIterators.length; i++) {
+  outIterators[i] = new InputProcessorIterator(readerIterators[i], 
genericParsers, batchSize);
+}
+return outIterators;
+  }
+
+  private List<CarbonIterator<Object[]>>[] partitionInputReaderIterators() {
+int numberOfCores = getNumberOfCores();
+if (inputIterators.size() < numberOfCores) {
+  numberOfCores = inputIterators.size();
+}
+List<CarbonIterator<Object[]>>[] 

[GitHub] incubator-carbondata pull request #240: [CARBONDATA-298]Added InputProcessor...

2016-10-20 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/240#discussion_r84227656
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/newflow/parser/CarbonParserFactory.java
 ---
@@ -0,0 +1,40 @@
+package org.apache.carbondata.processing.newflow.parser;
+
+import java.util.List;
+
+import 
org.apache.carbondata.core.carbon.metadata.schema.table.column.CarbonColumn;
+import 
org.apache.carbondata.core.carbon.metadata.schema.table.column.CarbonDimension;
+import 
org.apache.carbondata.processing.newflow.parser.impl.ArrayParserImpl;
+import 
org.apache.carbondata.processing.newflow.parser.impl.PrimitiveParserImpl;
+import 
org.apache.carbondata.processing.newflow.parser.impl.StructParserImpl;
+
+public class CarbonParserFactory {
+
+  public static GenericParser createParser(CarbonColumn carbonColumn, 
String[] complexDelimiters) {
+return createParser(carbonColumn, complexDelimiters, 0);
+  }
+
+  private static GenericParser createParser(CarbonColumn carbonColumn, 
String[] complexDelimiters,
--- End diff --

please add a function description to these two `createParser` functions; it 
is not clear what the `counter` is
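
For example, something along these lines (illustrative wording; the
description of `counter` is an assumption based on how the factory recurses):

  /**
   * Creates a parser for the given column; for complex columns (array/struct)
   * it recurses into the child columns.
   *
   * @param carbonColumn      column for which the parser is created
   * @param complexDelimiters delimiters used at each nesting level of a
   *                          complex type
   * @param counter           current nesting depth, assumed to select the
   *                          delimiter from complexDelimiters and to be
   *                          incremented when recursing into child columns
   */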


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #240: [CARBONDATA-298]Added InputProcessor...

2016-10-20 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/240#discussion_r84229694
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/newflow/constants/DataLoadProcessorConstants.java
 ---
@@ -33,4 +33,8 @@
   public static final String COMPLEX_DELIMITERS = "COMPLEX_DELIMITERS";
 
   public static final String DIMENSION_LENGTHS = "DIMENSION_LENGTHS";
+
--- End diff --

can you rename this class to a more meaningful name like `DataLoadOptions`? 
It will be exposed to the user as data load options, right?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #240: [CARBONDATA-298]Added InputProcessor...

2016-10-20 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/240#discussion_r84229327
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/newflow/complexobjects/StructObject.java
 ---
@@ -0,0 +1,19 @@
+package org.apache.carbondata.processing.newflow.complexobjects;
+
+public class StructObject {
+
+  private Object[] data;
+
+  public StructObject(Object[] data) {
+this.data = data;
+  }
+
+  public Object[] getData() {
+return data;
+  }
+
+  public void setData(Object[] data) {
--- End diff --

instead of just setting data, I think it is better to add a function like 
`addMember(Object member)`, so that `StructParserImpl` can make use of 
`addMember`
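
An illustrative sketch of that suggestion (not the actual code; the
copy-and-grow strategy is just for demonstration):

public class StructObject {

  private Object[] data;

  public StructObject(Object[] data) {
    this.data = data;
  }

  /** Appends one member, so StructParserImpl can build the struct up
   *  incrementally instead of replacing the whole array. */
  public void addMember(Object member) {
    Object[] grown = new Object[data.length + 1];
    System.arraycopy(data, 0, grown, 0, data.length);
    grown[data.length] = member;
    data = grown;
  }

  public Object[] getData() {
    return data;
  }
}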


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #240: [CARBONDATA-298]Added InputProcessor...

2016-10-20 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/240#discussion_r84236268
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/newflow/steps/input/InputProcessorStepImpl.java
 ---
@@ -0,0 +1,171 @@
+package org.apache.carbondata.processing.newflow.steps.input;
+
+import java.util.ArrayList;
+import java.util.Iterator;
+import java.util.List;
+
+import org.apache.carbondata.common.CarbonIterator;
+import org.apache.carbondata.common.logging.LogService;
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.util.CarbonProperties;
+import 
org.apache.carbondata.processing.newflow.AbstractDataLoadProcessorStep;
+import 
org.apache.carbondata.processing.newflow.CarbonDataLoadConfiguration;
+import org.apache.carbondata.processing.newflow.DataField;
+import 
org.apache.carbondata.processing.newflow.constants.DataLoadProcessorConstants;
+import 
org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException;
+import org.apache.carbondata.processing.newflow.parser.CarbonParserFactory;
+import org.apache.carbondata.processing.newflow.parser.GenericParser;
+import org.apache.carbondata.processing.newflow.row.CarbonRow;
+import org.apache.carbondata.processing.newflow.row.CarbonRowBatch;
+
+/**
+ * It reads data from record reader and sends data to next step.
+ */
+public class InputProcessorStepImpl extends AbstractDataLoadProcessorStep {
+
+  private static final LogService LOGGER =
+  
LogServiceFactory.getLogService(InputProcessorStepImpl.class.getName());
+
+  private GenericParser[] genericParsers;
+
+  private List<CarbonIterator<Object[]>> inputIterators;
+
+  public InputProcessorStepImpl(CarbonDataLoadConfiguration configuration,
+  AbstractDataLoadProcessorStep child, List<CarbonIterator<Object[]>> 
inputIterators) {
+super(configuration, child);
+this.inputIterators = inputIterators;
+  }
+
+  @Override public DataField[] getOutput() {
+DataField[] fields = configuration.getDataFields();
+String[] header = configuration.getHeader();
+DataField[] output = new DataField[fields.length];
+int k = 0;
+for (int i = 0; i < header.length; i++) {
+  for (int j = 0; j < fields.length; j++) {
+if 
(header[j].equalsIgnoreCase(fields[j].getColumn().getColName())) {
+  output[k++] = fields[j];
+  break;
+}
+  }
+}
+return output;
+  }
+
+  @Override public void initialize() throws CarbonDataLoadingException {
+DataField[] output = getOutput();
+genericParsers = new GenericParser[output.length];
+for (int i = 0; i < genericParsers.length; i++) {
+  genericParsers[i] = 
CarbonParserFactory.createParser(output[i].getColumn(),
+  (String[]) configuration
+  
.getDataLoadProperty(DataLoadProcessorConstants.COMPLEX_DELIMITERS));
+}
+  }
+
+  private int getNumberOfCores() {
+int numberOfCores;
+try {
+  numberOfCores = Integer.parseInt(CarbonProperties.getInstance()
+  .getProperty(CarbonCommonConstants.NUM_CORES_LOADING,
+  CarbonCommonConstants.NUM_CORES_DEFAULT_VAL));
+} catch (NumberFormatException exc) {
+  numberOfCores = 
Integer.parseInt(CarbonCommonConstants.NUM_CORES_DEFAULT_VAL);
+}
+return numberOfCores;
+  }
+
+  private int getBatchSize() {
+int batchSize;
+try {
+  batchSize = Integer.parseInt(configuration
+  
.getDataLoadProperty(DataLoadProcessorConstants.DATA_LOAD_BATCH_SIZE,
+  
DataLoadProcessorConstants.DATA_LOAD_BATCH_SIZE_DEFAULT).toString());
+} catch (NumberFormatException exc) {
+  batchSize = 
Integer.parseInt(DataLoadProcessorConstants.DATA_LOAD_BATCH_SIZE_DEFAULT);
+}
+return batchSize;
+  }
+
+  @Override public Iterator<CarbonRowBatch>[] execute() {
+int batchSize = getBatchSize();
+List<CarbonIterator<Object[]>>[] readerIterators = 
partitionInputReaderIterators();
+Iterator<CarbonRowBatch>[] outIterators = new 
Iterator[readerIterators.length];
+for (int i = 0; i < outIterators.length; i++) {
+  outIterators[i] = new InputProcessorIterator(readerIterators[i], 
genericParsers, batchSize);
+}
+return outIterators;
+  }
+
+  private List<CarbonIterator<Object[]>>[] partitionInputReaderIterators() {
+int numberOfCores = getNumberOfCores();
+if (inputIterators.size() < numberOfCores) {
+  numberOfCores = inputIterators.size();
+}
+List<CarbonIterator<Object[]>>[] 

[GitHub] incubator-carbondata pull request #240: [CARBONDATA-298]Added InputProcessor...

2016-10-20 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/240#discussion_r84234179
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/newflow/steps/input/InputProcessorStepImpl.java
 ---
@@ -0,0 +1,171 @@
+package org.apache.carbondata.processing.newflow.steps.input;
+
+import java.util.ArrayList;
+import java.util.Iterator;
+import java.util.List;
+
+import org.apache.carbondata.common.CarbonIterator;
+import org.apache.carbondata.common.logging.LogService;
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.util.CarbonProperties;
+import 
org.apache.carbondata.processing.newflow.AbstractDataLoadProcessorStep;
+import 
org.apache.carbondata.processing.newflow.CarbonDataLoadConfiguration;
+import org.apache.carbondata.processing.newflow.DataField;
+import 
org.apache.carbondata.processing.newflow.constants.DataLoadProcessorConstants;
+import 
org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException;
+import org.apache.carbondata.processing.newflow.parser.CarbonParserFactory;
+import org.apache.carbondata.processing.newflow.parser.GenericParser;
+import org.apache.carbondata.processing.newflow.row.CarbonRow;
+import org.apache.carbondata.processing.newflow.row.CarbonRowBatch;
+
+/**
+ * It reads data from record reader and sends data to next step.
+ */
+public class InputProcessorStepImpl extends AbstractDataLoadProcessorStep {
+
+  private static final LogService LOGGER =
+  
LogServiceFactory.getLogService(InputProcessorStepImpl.class.getName());
+
+  private GenericParser[] genericParsers;
+
+  private List<CarbonIterator<Object[]>> inputIterators;
+
+  public InputProcessorStepImpl(CarbonDataLoadConfiguration configuration,
+  AbstractDataLoadProcessorStep child, List<CarbonIterator<Object[]>> 
inputIterators) {
+super(configuration, child);
+this.inputIterators = inputIterators;
+  }
+
+  @Override public DataField[] getOutput() {
+DataField[] fields = configuration.getDataFields();
+String[] header = configuration.getHeader();
+DataField[] output = new DataField[fields.length];
+int k = 0;
+for (int i = 0; i < header.length; i++) {
+  for (int j = 0; j < fields.length; j++) {
+if 
(header[j].equalsIgnoreCase(fields[j].getColumn().getColName())) {
+  output[k++] = fields[j];
+  break;
+}
+  }
+}
+return output;
+  }
+
+  @Override public void initialize() throws CarbonDataLoadingException {
+DataField[] output = getOutput();
+genericParsers = new GenericParser[output.length];
+for (int i = 0; i < genericParsers.length; i++) {
+  genericParsers[i] = 
CarbonParserFactory.createParser(output[i].getColumn(),
+  (String[]) configuration
+  
.getDataLoadProperty(DataLoadProcessorConstants.COMPLEX_DELIMITERS));
+}
+  }
+
+  private int getNumberOfCores() {
+int numberOfCores;
+try {
+  numberOfCores = Integer.parseInt(CarbonProperties.getInstance()
+  .getProperty(CarbonCommonConstants.NUM_CORES_LOADING,
+  CarbonCommonConstants.NUM_CORES_DEFAULT_VAL));
+} catch (NumberFormatException exc) {
+  numberOfCores = 
Integer.parseInt(CarbonCommonConstants.NUM_CORES_DEFAULT_VAL);
+}
+return numberOfCores;
+  }
+
+  private int getBatchSize() {
+int batchSize;
+try {
+  batchSize = Integer.parseInt(configuration
+  
.getDataLoadProperty(DataLoadProcessorConstants.DATA_LOAD_BATCH_SIZE,
+  
DataLoadProcessorConstants.DATA_LOAD_BATCH_SIZE_DEFAULT).toString());
+} catch (NumberFormatException exc) {
+  batchSize = 
Integer.parseInt(DataLoadProcessorConstants.DATA_LOAD_BATCH_SIZE_DEFAULT);
+}
+return batchSize;
+  }
+
+  @Override public Iterator<CarbonRowBatch>[] execute() {
+int batchSize = getBatchSize();
+List<CarbonIterator<Object[]>>[] readerIterators = 
partitionInputReaderIterators();
+Iterator<CarbonRowBatch>[] outIterators = new 
Iterator[readerIterators.length];
+for (int i = 0; i < outIterators.length; i++) {
+  outIterators[i] = new InputProcessorIterator(readerIterators[i], 
genericParsers, batchSize);
--- End diff --

In this case, genericParser should be thread-safe; please add a comment in 
the `GenericParser` interface and ensure it
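
The note could look something like this (illustrative wording only):

/**
 * Parses the data according to the implementation (struct, array, map, ...).
 *
 * Implementations must be stateless after construction (i.e. thread-safe):
 * a single parser instance is shared by all InputProcessorIterator instances
 * returned from execute(), which may be consumed on different threads.
 */
public interface GenericParser<E> {
  E parse(String data);
  void addChildren(GenericParser<?> parser);
}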


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if 

[GitHub] incubator-carbondata pull request #240: [CARBONDATA-298]Added InputProcessor...

2016-10-20 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/240#discussion_r84232050
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/newflow/steps/input/InputProcessorStepImpl.java
 ---
@@ -0,0 +1,171 @@
+package org.apache.carbondata.processing.newflow.steps.input;
+
+import java.util.ArrayList;
+import java.util.Iterator;
+import java.util.List;
+
+import org.apache.carbondata.common.CarbonIterator;
+import org.apache.carbondata.common.logging.LogService;
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.util.CarbonProperties;
+import 
org.apache.carbondata.processing.newflow.AbstractDataLoadProcessorStep;
+import 
org.apache.carbondata.processing.newflow.CarbonDataLoadConfiguration;
+import org.apache.carbondata.processing.newflow.DataField;
+import 
org.apache.carbondata.processing.newflow.constants.DataLoadProcessorConstants;
+import 
org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException;
+import org.apache.carbondata.processing.newflow.parser.CarbonParserFactory;
+import org.apache.carbondata.processing.newflow.parser.GenericParser;
+import org.apache.carbondata.processing.newflow.row.CarbonRow;
+import org.apache.carbondata.processing.newflow.row.CarbonRowBatch;
+
+/**
+ * It reads data from record reader and sends data to next step.
+ */
+public class InputProcessorStepImpl extends AbstractDataLoadProcessorStep {
+
+  private static final LogService LOGGER =
+  
LogServiceFactory.getLogService(InputProcessorStepImpl.class.getName());
+
+  private GenericParser[] genericParsers;
+
+  private List<CarbonIterator<Object[]>> inputIterators;
+
+  public InputProcessorStepImpl(CarbonDataLoadConfiguration configuration,
+  AbstractDataLoadProcessorStep child, List<CarbonIterator<Object[]>> 
inputIterators) {
+super(configuration, child);
+this.inputIterators = inputIterators;
+  }
+
+  @Override public DataField[] getOutput() {
+DataField[] fields = configuration.getDataFields();
+String[] header = configuration.getHeader();
+DataField[] output = new DataField[fields.length];
+int k = 0;
+for (int i = 0; i < header.length; i++) {
+  for (int j = 0; j < fields.length; j++) {
+if 
(header[j].equalsIgnoreCase(fields[j].getColumn().getColName())) {
+  output[k++] = fields[j];
+  break;
+}
+  }
+}
+return output;
+  }
+
+  @Override public void initialize() throws CarbonDataLoadingException {
+DataField[] output = getOutput();
+genericParsers = new GenericParser[output.length];
+for (int i = 0; i < genericParsers.length; i++) {
+  genericParsers[i] = 
CarbonParserFactory.createParser(output[i].getColumn(),
+  (String[]) configuration
+  
.getDataLoadProperty(DataLoadProcessorConstants.COMPLEX_DELIMITERS));
+}
+  }
+
+  private int getNumberOfCores() {
+int numberOfCores;
+try {
+  numberOfCores = Integer.parseInt(CarbonProperties.getInstance()
+  .getProperty(CarbonCommonConstants.NUM_CORES_LOADING,
+  CarbonCommonConstants.NUM_CORES_DEFAULT_VAL));
+} catch (NumberFormatException exc) {
+  numberOfCores = 
Integer.parseInt(CarbonCommonConstants.NUM_CORES_DEFAULT_VAL);
+}
+return numberOfCores;
+  }
+
+  private int getBatchSize() {
--- End diff --

suggest changing this to `CarbonProperties.loadProcessBatchSize()`
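
i.e. something like this on `CarbonProperties` (a hypothetical helper,
mirroring the parsing logic in the diff above; where the value is read from
is up to the implementation):

  public int loadProcessBatchSize() {
    try {
      return Integer.parseInt(getProperty(
          DataLoadProcessorConstants.DATA_LOAD_BATCH_SIZE,
          DataLoadProcessorConstants.DATA_LOAD_BATCH_SIZE_DEFAULT));
    } catch (NumberFormatException e) {
      return Integer.parseInt(
          DataLoadProcessorConstants.DATA_LOAD_BATCH_SIZE_DEFAULT);
    }
  }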


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #240: [CARBONDATA-298]Added InputProcessor...

2016-10-20 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/240#discussion_r84233633
  
--- Diff: processing/src/main/java/org/apache/carbondata/processing/newflow/steps/input/InputProcessorStepImpl.java ---

[GitHub] incubator-carbondata pull request #240: [CARBONDATA-298]Added InputProcessor...

2016-10-20 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/240#discussion_r84228406
  
--- Diff: processing/src/main/java/org/apache/carbondata/processing/newflow/parser/GenericParser.java ---
@@ -0,0 +1,22 @@
+package org.apache.carbondata.processing.newflow.parser;
+
+/**
+ * Parses the data according to the implementation; implementing classes can
+ * handle struct, array or map datatypes.
+ */
+public interface GenericParser<E> {
+
+  /**
+   * Parses the data as per the delimiter.
+   * @param data
+   * @return
+   */
+  E parse(String data);
+
+  /**
+   * Adds a child parser to this parser.
+   * @param parser
+   */
+  void addChildren(GenericParser parser);
--- End diff --

This can be the behavior of a parser for complex types only, so I think you 
can create a sub-interface `ComplexTypeParser` extending `GenericParser`. Or, 
even better, you can remove it and add it in `ArrayParserImpl` and 
`StructParserImpl` only.
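
A minimal sketch of that suggestion, assuming the GenericParser<E> signature shown in the diff above; only the ComplexTypeParser name follows from this comment, the rest is illustrative:

    // Sketch only: complex types (array/struct) are the parsers that carry
    // children, so addChildren moves out of GenericParser into a sub-interface.
    public interface ComplexTypeParser<E> extends GenericParser<E> {
      void addChildren(GenericParser parser);
    }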




[GitHub] incubator-carbondata pull request #240: [CARBONDATA-298]Added InputProcessor...

2016-10-20 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/240#discussion_r84233734
  
--- Diff: processing/src/main/java/org/apache/carbondata/processing/newflow/steps/input/InputProcessorStepImpl.java ---
+  private List<CarbonIterator<Object[]>>[] partitionInputReaderIterators() {
+    int numberOfCores = getNumberOfCores();
--- End diff --

please add some comment in this function to describe the parallelism
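
One way the requested comment could read, sketched below: the javadoc restates the capping logic visible in the quoted diff, and since the quote is cut off at this point, the tail of the method body (the round-robin distribution) is an assumption, not a copy of the PR:

    /**
     * Partitions the input reader iterators across the configured number of
     * loading cores: each returned list feeds one InputProcessorIterator, so at
     * most NUM_CORES_LOADING batches are parsed in parallel. When there are
     * fewer input iterators than cores, parallelism drops to the iterator count.
     */
    private List<CarbonIterator<Object[]>>[] partitionInputReaderIterators() {
      int numberOfCores = getNumberOfCores();
      if (inputIterators.size() < numberOfCores) {
        numberOfCores = inputIterators.size();
      }
      // Assumed continuation: distribute the iterators round-robin into
      // numberOfCores buckets, one bucket per parallel reader.
      List<CarbonIterator<Object[]>>[] iterators = new List[numberOfCores];
      for (int i = 0; i < iterators.length; i++) {
        iterators[i] = new ArrayList<>();
      }
      for (int i = 0; i < inputIterators.size(); i++) {
        iterators[i % numberOfCores].add(inputIterators.get(i));
      }
      return iterators;
    }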



[GitHub] incubator-carbondata pull request #240: [CARBONDATA-298]Added InputProcessor...

2016-10-20 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/240#discussion_r84236755
  
--- Diff: processing/src/main/java/org/apache/carbondata/processing/newflow/steps/input/InputProcessorStepImpl.java ---

[GitHub] incubator-carbondata pull request #240: [CARBONDATA-298]Added InputProcessor...

2016-10-20 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/240#discussion_r84237159
  
--- Diff: processing/src/main/java/org/apache/carbondata/processing/newflow/steps/input/InputProcessorStepImpl.java ---

[GitHub] incubator-carbondata pull request #240: [CARBONDATA-298]Added InputProcessor...

2016-10-20 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/240#discussion_r84231719
  
--- Diff: processing/src/main/java/org/apache/carbondata/processing/newflow/steps/input/InputProcessorStepImpl.java ---
+  private int getNumberOfCores() {
--- End diff --

I think these functions can be shared across the project, so consider moving 
them into CarbonProperties directly, like `CarbonProperties.numberOfCores()` 
and `CarbonProperties.batchSize()`.
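
Sketched below is one shape those shared accessors might take; the names numberOfCores() and batchSize() come from this comment, the bodies simply mirror the private helpers in the quoted diff, and nothing here should be read as the actual CarbonProperties API:

    // Hypothetical instance methods on CarbonProperties (illustrative only).
    public int numberOfCores() {
      try {
        return Integer.parseInt(getProperty(CarbonCommonConstants.NUM_CORES_LOADING,
            CarbonCommonConstants.NUM_CORES_DEFAULT_VAL));
      } catch (NumberFormatException exc) {
        // Malformed configuration falls back to the default core count.
        return Integer.parseInt(CarbonCommonConstants.NUM_CORES_DEFAULT_VAL);
      }
    }

    public int batchSize() {
      try {
        return Integer.parseInt(getProperty(
            DataLoadProcessorConstants.DATA_LOAD_BATCH_SIZE,
            DataLoadProcessorConstants.DATA_LOAD_BATCH_SIZE_DEFAULT));
      } catch (NumberFormatException exc) {
        return Integer.parseInt(DataLoadProcessorConstants.DATA_LOAD_BATCH_SIZE_DEFAULT);
      }
    }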




[jira] [Created] (CARBONDATA-331) Support no compression option while loading

2016-10-20 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-331:
---

 Summary: Support no compression option while loading
 Key: CARBONDATA-331
 URL: https://issues.apache.org/jira/browse/CARBONDATA-331
 Project: CarbonData
  Issue Type: New Feature
Reporter: Jacky Li


Modify the compressor interface and add a DummyCompressor that performs no 
compression.
This interface can be extended later for adding new compressors.
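
For a rough sense of the shape this could take, here is a purely illustrative sketch; the issue only names DummyCompressor, so the interface and method names below are assumptions:

    // Hypothetical compressor abstraction; new codecs plug in as implementations.
    public interface Compressor {
      byte[] compress(byte[] input);
      byte[] decompress(byte[] input);
    }

    // The "no compression" option: bytes pass through unchanged.
    public class DummyCompressor implements Compressor {
      @Override public byte[] compress(byte[] input) {
        return input;
      }
      @Override public byte[] decompress(byte[] input) {
        return input;
      }
    }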



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)