[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3509: [CARBONDATA-3618] Update query should throw exception if key has more than one value

2019-12-27 Thread GitBox
ajantha-bhat commented on a change in pull request #3509: [CARBONDATA-3618] 
Update query should throw exception if key has more than one value
URL: https://github.com/apache/carbondata/pull/3509#discussion_r361782777
 
 

 ##
 File path: 
integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/mutation/CarbonProjectForUpdateCommand.scala
 ##
 @@ -135,7 +136,24 @@ private[sql] case class CarbonProjectForUpdateCommand(
    else {
      Dataset.ofRows(sparkSession, plan)
    }
-
+    if (CarbonProperties.getValidateUpdateKeyValueMapping) {
 
 Review comment:
   done


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3509: [CARBONDATA-3618] Update query should throw exception if key has more than one value

2019-12-27 Thread GitBox
ajantha-bhat commented on a change in pull request #3509: [CARBONDATA-3618] 
Update query should throw exception if key has more than one value
URL: https://github.com/apache/carbondata/pull/3509#discussion_r361782772
 
 

 ##
 File path: docs/configuration-parameters.md
 ##
 @@ -152,6 +152,7 @@ This section provides the details of all the configurations required for the Car
 | carbon.insert.storage.level | MEMORY_AND_DISK | Storage level to persist the dataset of an RDD/dataframe. Applicable when ***carbon.insert.persist.enable*** is **true**; if the user's executor has less memory, set this parameter to 'MEMORY_AND_DISK_SER' or another storage level suited to the environment. [See detail](http://spark.apache.org/docs/latest/rdd-programming-guide.html#rdd-persistence). |
 | carbon.update.persist.enable | true | Configuration to enable persisting the dataset of the RDD/dataframe. Enabling this will reduce the execution time of the UPDATE operation. |
 | carbon.update.storage.level | MEMORY_AND_DISK | Storage level to persist the dataset of an RDD/dataframe. Applicable when ***carbon.update.persist.enable*** is **true**; if the user's executor has less memory, set this parameter to 'MEMORY_AND_DISK_SER' or another storage level suited to the environment. [See detail](http://spark.apache.org/docs/latest/rdd-programming-guide.html#rdd-persistence). |
+| carbon.update.validate.key.to.value.mapping | true | By default this property is true, so UPDATE validates the key-to-value mapping. This validation may slightly degrade UPDATE query performance. If the user knows the key-to-value mapping is correct, the validation can be disabled for better update performance by setting this property to false. |
 
 Review comment:
   done
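For context, disabling the parameter added in this diff in `carbon.properties` could look like the following (illustrative snippet; only the property name and default come from the diff above):

```properties
# Disable key-to-value mapping validation for faster UPDATE queries
# (only safe when each update key is known to map to a single value)
carbon.update.validate.key.to.value.mapping=false
```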




[GitHub] [carbondata] niuge01 commented on issue #3532: [CARBONDATA-3557] Write flink streaming data to partition table

2019-12-27 Thread GitBox
niuge01 commented on issue #3532: [CARBONDATA-3557] Write flink streaming data 
to partition table
URL: https://github.com/apache/carbondata/pull/3532#issuecomment-569396166
 
 
   please test this




[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3509: [CARBONDATA-3618] Update query should throw exception if key has more than one value

2019-12-27 Thread GitBox
ajantha-bhat commented on a change in pull request #3509: [CARBONDATA-3618] 
Update query should throw exception if key has more than one value
URL: https://github.com/apache/carbondata/pull/3509#discussion_r361782285
 
 

 ##
 File path: 
integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/mutation/CarbonProjectForUpdateCommand.scala
 ##
 @@ -135,7 +136,24 @@ private[sql] case class CarbonProjectForUpdateCommand(
    else {
      Dataset.ofRows(sparkSession, plan)
    }
-
+    if (CarbonProperties.getValidateUpdateKeyValueMapping) {
+      // If more than one value present for the update key, should fail the update
+      val ds = dataSet.select(CarbonCommonConstants.CARBON_IMPLICIT_COLUMN_TUPLEID)
+        .groupBy(CarbonCommonConstants.CARBON_IMPLICIT_COLUMN_TUPLEID)
+        .count()
+        .select("count")
+        .filter(col("count") > lit(1))
+        .limit(1)
+        .collect()
 
 Review comment:
   @jackylk : yes, it is now in the same place you are suggesting.
   At line 135 we create `Dataset.ofRows(sparkSession, plan)`, and I collect the tupleId on top of that dataset.
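The check under discussion groups rows by the implicit tuple id and fails the update when any id occurs more than once. Outside Spark, the core of that check can be sketched with a plain frequency count (hypothetical `hasDuplicateKey` helper, not CarbonData code):

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class KeyValueMappingCheck {

  // Returns true if any key (e.g. a tuple id) occurs more than once,
  // mirroring the groupBy(...).count().filter(count > 1) logic in the diff.
  static boolean hasDuplicateKey(List<String> tupleIds) {
    Map<String, Integer> counts = new HashMap<>();
    for (String id : tupleIds) {
      if (counts.merge(id, 1, Integer::sum) > 1) {
        return true; // analogous to limit(1): stop at the first duplicate
      }
    }
    return false;
  }

  public static void main(String[] args) {
    System.out.println(hasDuplicateKey(Arrays.asList("0/0/0", "0/0/1"))); // false
    System.out.println(hasDuplicateKey(Arrays.asList("0/0/0", "0/0/0"))); // true
  }
}
```

In the PR itself the same short-circuit effect is obtained with `limit(1).collect()`, so Spark stops scanning once a single duplicate group is found.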




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3532: [CARBONDATA-3557] Write flink streaming data to partition table

2019-12-27 Thread GitBox
CarbonDataQA1 commented on issue #3532: [CARBONDATA-3557] Write flink streaming 
data to partition table
URL: https://github.com/apache/carbondata/pull/3532#issuecomment-569394437
 
 
   Build Failed  with Spark 2.1.0, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1309/
   




[GitHub] [carbondata] niuge01 commented on issue #3532: [CARBONDATA-3557] Write flink streaming data to partition table

2019-12-27 Thread GitBox
niuge01 commented on issue #3532: [CARBONDATA-3557] Write flink streaming data 
to partition table
URL: https://github.com/apache/carbondata/pull/3532#issuecomment-569393927
 
 
   please test this




[GitHub] [carbondata] niuge01 commented on a change in pull request #3532: [CARBONDATA-3557] Write flink streaming data to partition table

2019-12-27 Thread GitBox
niuge01 commented on a change in pull request #3532: [CARBONDATA-3557] Write 
flink streaming data to partition table
URL: https://github.com/apache/carbondata/pull/3532#discussion_r361780938
 
 

 ##
 File path: 
integration/flink/src/main/java/org/apache/carbon/flink/CarbonS3Writer.java
 ##
 @@ -139,15 +152,16 @@ public void commit() throws IOException {
         );
       }
       dataPath = dataPath + this.table.getDatabaseName() + CarbonCommonConstants.FILE_SEPARATOR +
-          this.table.getTableName() + CarbonCommonConstants.FILE_SEPARATOR +
-          this.writePartition + CarbonCommonConstants.FILE_SEPARATOR;
-      Map fileList =
-          this.uploadSegmentDataFiles(this.writePath + "Fact/Part0/Segment_null/", dataPath);
+          this.table.getTableName() + CarbonCommonConstants.FILE_SEPARATOR;
+      StageInput stageInput = this.uploadSegmentDataFiles(this.writePath, dataPath);
+      if (stageInput == null) {
+        return;
+      }
       try {
         String stageInputPath = CarbonTablePath.getStageDir(
             table.getAbsoluteTableIdentifier().getTablePath()) +
-            CarbonCommonConstants.FILE_SEPARATOR + this.writePartition;
-        StageManager.writeStageInput(stageInputPath, new StageInput(dataPath, fileList));
+            CarbonCommonConstants.FILE_SEPARATOR + UUID.randomUUID(); // TODO UUID
 
 Review comment:
   done




[GitHub] [carbondata] niuge01 commented on a change in pull request #3532: [CARBONDATA-3557] Write flink streaming data to partition table

2019-12-27 Thread GitBox
niuge01 commented on a change in pull request #3532: [CARBONDATA-3557] Write 
flink streaming data to partition table
URL: https://github.com/apache/carbondata/pull/3532#discussion_r361780902
 
 

 ##
 File path: 
integration/flink/src/test/scala/org/apache/carbon/flink/TestSource.scala
 ##
 @@ -1,25 +1,27 @@
 package org.apache.carbon.flink
 
+import java.util.Random
+
 import org.apache.flink.api.common.state.{ListState, ListStateDescriptor}
 import org.apache.flink.runtime.state.{FunctionInitializationContext, FunctionSnapshotContext}
 import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction
 import org.apache.flink.streaming.api.functions.source.SourceFunction
 
-abstract class TestSource(val dataCount: Int) extends SourceFunction[String] with CheckpointedFunction {
+abstract class TestSource(val dataCount: Int) extends SourceFunction[Array[AnyRef]] with CheckpointedFunction {
 
 Review comment:
   done




[GitHub] [carbondata] niuge01 commented on a change in pull request #3532: [CARBONDATA-3557] Write flink streaming data to partition table

2019-12-27 Thread GitBox
niuge01 commented on a change in pull request #3532: [CARBONDATA-3557] Write 
flink streaming data to partition table
URL: https://github.com/apache/carbondata/pull/3532#discussion_r361780868
 
 

 ##
 File path: 
core/src/main/java/org/apache/carbondata/core/statusmanager/StageInput.java
 ##
 @@ -39,6 +39,8 @@
*/
   private Map files;
 
+  private List locations;
 
 Review comment:
   done




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3539: [HOTFIX] Optimize array length in loop in scala code

2019-12-27 Thread GitBox
CarbonDataQA1 commented on issue #3539: [HOTFIX] Optimize array length in loop 
in scala code
URL: https://github.com/apache/carbondata/pull/3539#issuecomment-569393472
 
 
   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1331/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3539: [HOTFIX] Optimize array length in loop in scala code

2019-12-27 Thread GitBox
CarbonDataQA1 commented on issue #3539: [HOTFIX] Optimize array length in loop 
in scala code
URL: https://github.com/apache/carbondata/pull/3539#issuecomment-569393256
 
 
   Build Failed with Spark 2.2.1, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/1318/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3539: [HOTFIX] Optimize array length in loop in scala code

2019-12-27 Thread GitBox
CarbonDataQA1 commented on issue #3539: [HOTFIX] Optimize array length in loop 
in scala code
URL: https://github.com/apache/carbondata/pull/3539#issuecomment-569393017
 
 
   Build Failed  with Spark 2.1.0, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1308/
   




[GitHub] [carbondata] jackylk opened a new pull request #3539: [HOTFIX] Optimize array length in loop in scala code

2019-12-27 Thread GitBox
jackylk opened a new pull request #3539: [HOTFIX] Optimize array length in loop 
in scala code
URL: https://github.com/apache/carbondata/pull/3539
 
 
   Inspired by CARBONDATA-3626, this PR optimizes other places that fetch the array length inside for loops and while loops in Scala code.
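In Scala, `for (i <- 0 until arr.length)` builds a `Range` and invokes a closure per element, so the optimization named in the title reads the length once and loops with a plain counter. A minimal sketch of the hoisted-length pattern (written in Java for a self-contained example; illustrative, not code from the PR):

```java
public class LengthHoisting {

  // The hoisted pattern: read the length once into a local and loop with
  // a plain counter instead of re-deriving the bound on each pass.
  static long sum(int[] arr) {
    long total = 0;
    int len = arr.length; // evaluated once, before the loop
    int i = 0;
    while (i < len) {
      total += arr[i];
      i += 1;
    }
    return total;
  }

  public static void main(String[] args) {
    System.out.println(sum(new int[]{1, 2, 3, 4})); // 10
  }
}
```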
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later optimize insert.

2019-12-27 Thread GitBox
CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later 
optimize insert.
URL: https://github.com/apache/carbondata/pull/3538#issuecomment-569387388
 
 
   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1330/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later optimize insert.

2019-12-27 Thread GitBox
CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later 
optimize insert.
URL: https://github.com/apache/carbondata/pull/3538#issuecomment-569387151
 
 
   Build Failed with Spark 2.2.1, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/1317/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later optimize insert.

2019-12-27 Thread GitBox
CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later 
optimize insert.
URL: https://github.com/apache/carbondata/pull/3538#issuecomment-569383717
 
 
   Build Success with Spark 2.1.0, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1307/
   




[jira] [Created] (CARBONDATA-3631) StringIndexOutOfBoundsException When Inserting Select From a Parquet Table with Empty array/map

2019-12-27 Thread Xingjun Hao (Jira)
Xingjun Hao created CARBONDATA-3631:
---

 Summary: StringIndexOutOfBoundsException When Inserting Select 
From a Parquet Table with Empty array/map
 Key: CARBONDATA-3631
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3631
 Project: CarbonData
  Issue Type: Bug
Affects Versions: 1.6.1, 2.0.0
Reporter: Xingjun Hao
 Fix For: 2.0.0


sql("insert into datatype_array_parquet values(array())")
sql("insert into datatype_array_carbondata select f from datatype_array_parquet")

 
{code:java}
java.lang.StringIndexOutOfBoundsException: String index out of range: -1

at java.lang.AbstractStringBuilder.substring(AbstractStringBuilder.java:935)
at java.lang.StringBuilder.substring(StringBuilder.java:76)
at scala.collection.mutable.StringBuilder.substring(StringBuilder.scala:166)
at org.apache.carbondata.streaming.parser.FieldConverter$.objectToString(FieldConverter.scala:77)
at org.apache.carbondata.spark.util.CarbonScalaUtil$.getString(CarbonScalaUtil.scala:71)
{code}
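The trace suggests a trailing-delimiter strip applied to an empty value string: `substring(0, length - 1)` fails with index `-1` when nothing was appended, which is what an empty `array()`/`map()` triggers. A minimal sketch of the failure mode and its guard (hypothetical `joinSafe` helper, not the actual FieldConverter code):

```java
public class EmptyValueToString {

  // Stripping a trailing delimiter with substring(0, length - 1) throws
  // StringIndexOutOfBoundsException when nothing was appended
  // (length - 1 == -1). The guard below handles the empty case.
  static String joinSafe(String[] values, char delimiter) {
    StringBuilder sb = new StringBuilder();
    for (String v : values) {
      sb.append(v).append(delimiter);
    }
    if (sb.length() == 0) {
      return ""; // guard: empty collection, no trailing delimiter to strip
    }
    return sb.substring(0, sb.length() - 1);
  }

  public static void main(String[] args) {
    System.out.println(joinSafe(new String[]{"a", "b"}, '$'));  // a$b
    System.out.println(joinSafe(new String[0], '$').isEmpty()); // true
  }
}
```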



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3535: [WIP] Refactory data loading for partition table

2019-12-27 Thread GitBox
CarbonDataQA1 commented on issue #3535: [WIP] Refactory data loading for 
partition table
URL: https://github.com/apache/carbondata/pull/3535#issuecomment-569379596
 
 
   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1329/
   




[jira] [Resolved] (CARBONDATA-3519) Optimizations in write step to avoid unnecessary memory blk allocation/free

2019-12-27 Thread Jacky Li (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacky Li resolved CARBONDATA-3519.
--
Fix Version/s: 2.0.0
   Resolution: Fixed

> Optimizations in write step to avoid unnecessary memory blk allocation/free
> ---
>
> Key: CARBONDATA-3519
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3519
> Project: CarbonData
>  Issue Type: Improvement
>  Components: core
>Reporter: Venugopal Reddy K
>Priority: Minor
> Fix For: 2.0.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
>  +*Issue-1:*+
> *Context:*
> For a string column with local dictionary enabled, a column page of
> `UnsafeFixLengthColumnPage` with datatype `DataTypes.BYTE_ARRAY` is created for the
> `encodedPage`, along with the regular `actualPage` of `UnsafeVarLengthColumnPage`.
> The `capacity` field in `UnsafeFixLengthColumnPage` indicates the capacity of the
> allocated `memoryBlock` for the page. The `ensureMemory()` method is called while
> adding rows to check whether `totalLength + requestSize > capacity`; if there is no
> room for the next row, it allocates a new memoryBlock, copies over the old content
> (previous rows), and frees the old memoryBlock.
> *Problem:*
> When the `UnsafeFixLengthColumnPage` with datatype `DataTypes.BYTE_ARRAY` is created
> for the `encodedPage`, the `capacity` field is not assigned the allocated memory
> block size. Hence, for every row added to the tablePage, the *ensureMemory() check
> always fails*: a new column page memoryBlock is allocated, the old content (previous
> rows) is copied, and the old memoryBlock is freed. This *allocation of a new
> memoryBlock and freeing of the old one happens on every row addition* for string
> columns with local dictionary.
>  
> +*Issue-2:*+
> *Context:*
> In `VarLengthColumnPageBase`, a `rowOffset` column page of
> `UnsafeFixLengthColumnPage` with datatype `INT` maintains the data offset to each
> row of the variable-length columns. This `rowOffset` page is allocated to be the
> size of the page.
> *Problem:*
> If the page has 10 rows, its rowOffset page needs 11 entries, because 0 is always
> kept as the offset to the 1st row, so one additional entry is required in the
> rowOffset page [code pasted below for reference]. Otherwise, the *ensureMemory()
> check always fails for the last row* (the 10th row in this case) and *allocates a
> new rowOffset page memoryBlock, copies the old content (previous rows), and frees
> the old memoryBlock*. This *can happen for string columns with local dictionary,
> direct dictionary columns, and global dictionary columns*.
>  
> {code:java}
> public abstract class VarLengthColumnPageBase extends ColumnPage {
>   ...
>   @Override
>   public void putBytes(int rowId, byte[] bytes) {
>     ...
>     if (rowId == 0) {
>       rowOffset.putInt(0, 0); // offset to 1st row is 0
>     }
>     rowOffset.putInt(rowId + 1, rowOffset.getInt(rowId) + bytes.length);
>     putBytesAtRow(rowId, bytes);
>     totalLength += bytes.length;
>   }
>   ...
> }
> {code}
>  
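The n + 1 offsets layout described in Issue-2 can be sketched as follows (illustrative standalone code, not the CarbonData implementation): `offsets[0]` is always 0 and `offsets[rowId + 1]` marks the end of row `rowId`, which is why n rows need n + 1 entries.

```java
import java.util.Arrays;

public class RowOffsetPage {

  // n + 1 offsets for n variable-length rows: offsets[0] is always 0 and
  // offsets[rowId + 1] is the end of row rowId, mirroring the
  // rowOffset.putInt(rowId + 1, ...) call in the quoted snippet.
  static int[] buildOffsets(byte[][] rows) {
    int[] offsets = new int[rows.length + 1]; // the extra slot avoids last-row overflow
    offsets[0] = 0;
    for (int rowId = 0; rowId < rows.length; rowId++) {
      offsets[rowId + 1] = offsets[rowId] + rows[rowId].length;
    }
    return offsets;
  }

  public static void main(String[] args) {
    byte[][] rows = {{1, 2, 3}, {4}, {5, 6}};
    System.out.println(Arrays.toString(buildOffsets(rows))); // [0, 3, 4, 6]
  }
}
```

Sizing the offset array as n instead of n + 1 is exactly what makes the last-row `ensureMemory()` check fail and trigger the reallocate-copy-free cycle the fix removes.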





[GitHub] [carbondata] asfgit closed pull request #3524: [CARBONDATA-3519]Made optimizations in write step to avoid unnecessary memory blk allocation/free

2019-12-27 Thread GitBox
asfgit closed pull request #3524: [CARBONDATA-3519]Made optimizations in write 
step to avoid unnecessary memory blk allocation/free
URL: https://github.com/apache/carbondata/pull/3524
 
 
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3507: [CARBONDATA-3617] loadDataUsingGlobalSort should based on SortColumns…

2019-12-27 Thread GitBox
CarbonDataQA1 commented on issue #3507: [CARBONDATA-3617] 
loadDataUsingGlobalSort should based on SortColumns…
URL: https://github.com/apache/carbondata/pull/3507#issuecomment-569379152
 
 
   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1328/
   




[GitHub] [carbondata] jackylk commented on issue #3524: [CARBONDATA-3519]Made optimizations in write step to avoid unnecessary memory blk allocation/free

2019-12-27 Thread GitBox
jackylk commented on issue #3524: [CARBONDATA-3519]Made optimizations in write 
step to avoid unnecessary memory blk allocation/free
URL: https://github.com/apache/carbondata/pull/3524#issuecomment-569379036
 
 
   LGTM




[GitHub] [carbondata] jackylk commented on a change in pull request #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column

2019-12-27 Thread GitBox
jackylk commented on a change in pull request #3515: [CARBONDATA-3623]: Fixed 
global sort compaction failure on timestamp column
URL: https://github.com/apache/carbondata/pull/3515#discussion_r361772487
 
 

 ##
 File path: 
integration/spark2/src/main/scala/org/apache/carbondata/spark/rdd/CarbonTableCompactor.scala
 ##
 @@ -374,17 +365,32 @@ class CarbonTableCompactor(carbonLoadModel: CarbonLoadModel,
       sparkSession: SparkSession,
       carbonLoadModel: CarbonLoadModel,
       carbonMergerMapping: CarbonMergerMapping): Array[(String, Boolean)] = {
+    val carbonTable = carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable
     val splits = splitsOfSegments(
       sparkSession,
-      carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable,
+      carbonTable,
       carbonMergerMapping.validSegments)
-    val dataFrame = DataLoadProcessBuilderOnSpark.createInputDataFrame(
-      sparkSession,
-      carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable,
-      splits.asScala)
+    val dataFrame = try {
+      // segments to be compacted are set in the threadset() in carbon session, and unset in the end
 
 Review comment:
   please explain in the comment why it is required, not just what the operation is
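The set-then-unset pattern referenced by the diff comment (thread-local session parameters cleared at the end) can be sketched generically as follows (hypothetical names, not the CarbonSession API):

```java
import java.util.function.Supplier;

public class ThreadLocalParams {

  // Generic sketch of the pattern under discussion: a thread-local value is
  // set before building the plan and cleared in a finally block, so later
  // operations on the same thread never see stale segment ids.
  private static final ThreadLocal<String> SEGMENTS_TO_COMPACT = new ThreadLocal<>();

  static String withSegments(String segmentIds, Supplier<String> body) {
    SEGMENTS_TO_COMPACT.set(segmentIds);
    try {
      return body.get();
    } finally {
      SEGMENTS_TO_COMPACT.remove(); // unset in the end, even on failure
    }
  }

  static String currentSegments() {
    return SEGMENTS_TO_COMPACT.get();
  }

  public static void main(String[] args) {
    String seen = withSegments("0,1,2", ThreadLocalParams::currentSegments);
    System.out.println(seen);              // 0,1,2 while the body runs
    System.out.println(currentSegments()); // null after the call
  }
}
```

Clearing in `finally` is what guarantees the "unset in the end" part even when the dataframe construction throws.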




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3507: [CARBONDATA-3617] loadDataUsingGlobalSort should based on SortColumns…

2019-12-27 Thread GitBox
CarbonDataQA1 commented on issue #3507: [CARBONDATA-3617] 
loadDataUsingGlobalSort should based on SortColumns…
URL: https://github.com/apache/carbondata/pull/3507#issuecomment-569378887
 
 
   Build Success with Spark 2.2.1, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/1315/
   




[GitHub] [carbondata] jackylk commented on a change in pull request #3528: [CARBONDATA-3630] update should support limit 1 sub query and empty result subquery

2019-12-27 Thread GitBox
jackylk commented on a change in pull request #3528: [CARBONDATA-3630] update 
should support limit 1 sub query and empty result subquery
URL: https://github.com/apache/carbondata/pull/3528#discussion_r361772364
 
 

 ##
 File path: 
integration/spark2/src/main/scala/org/apache/spark/sql/parser/CarbonSpark2SqlParser.scala
 ##
 @@ -262,12 +316,16 @@ class CarbonSpark2SqlParser extends CarbonDDLSqlParser {
 }
   case _ => tab._1
 }
-
+val newSel = if (!StringUtils.isEmpty(constants)) {
 
 Review comment:
   explain the logic




[GitHub] [carbondata] jackylk commented on a change in pull request #3528: [CARBONDATA-3630] update should support limit 1 sub query and empty result subquery

2019-12-27 Thread GitBox
jackylk commented on a change in pull request #3528: [CARBONDATA-3630] update 
should support limit 1 sub query and empty result subquery
URL: https://github.com/apache/carbondata/pull/3528#discussion_r361772320
 
 

 ##
 File path: 
integration/spark2/src/main/scala/org/apache/spark/sql/parser/CarbonSpark2SqlParser.scala
 ##
 @@ -247,8 +248,61 @@ class CarbonSpark2SqlParser extends CarbonDDLSqlParser {
   case tab ~ columns ~ rest =>
 val (sel, where) = splitQuery(rest)
 val selectPattern = """^\s*select\s+""".r
+// comma separated string
 
 Review comment:
   explain in more detail what the logic is




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3535: [WIP] Refactory data loading for partition table

2019-12-27 Thread GitBox
CarbonDataQA1 commented on issue #3535: [WIP] Refactory data loading for 
partition table
URL: https://github.com/apache/carbondata/pull/3535#issuecomment-569378599
 
 
   Build Failed with Spark 2.2.1, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/1316/
   




[GitHub] [carbondata] jackylk commented on a change in pull request #3521: [doc_zh_cn] add a commercial inventory DB and carbon data query performance comparison doc chinese doc to carbondata

2019-12-27 Thread GitBox
jackylk commented on a change in pull request #3521: [doc_zh_cn] add a 
commercial inventory DB and carbon data query performance comparison doc 
chinese doc to carbondata
URL: https://github.com/apache/carbondata/pull/3521#discussion_r361772084
 
 

 ##
 File path: docs/zh_cn/某商业列存DB和CarbonData查询性能对比.md
 ##
 @@ -0,0 +1,111 @@
+
+
+## Query Performance Comparison: CarbonData Replacing a Commercial Columnar DB
+
+This document presents the query performance improvement that CarbonData delivered while replacing a commercial columnar-store DB, along with CarbonData's own strengths and characteristics. The figures are SQL query results under one domain-specific query pattern, and represent the performance comparison only under that pattern.
+
+
+
+
+
+## 1. Cluster Status Comparison
+
+| Cluster | Description |
+| ---- | ----- |
+| Commercial columnar DB cluster | 3 nodes, SSD disks |
 
 Review comment:
   please mention CPU and memory resources




[GitHub] [carbondata] jackylk commented on a change in pull request #3521: [doc_zh_cn] add a commercial inventory DB and carbon data query performance comparison doc chinese doc to carbondata

2019-12-27 Thread GitBox
jackylk commented on a change in pull request #3521: [doc_zh_cn] add a 
commercial inventory DB and carbon data query performance comparison doc 
chinese doc to carbondata
URL: https://github.com/apache/carbondata/pull/3521#discussion_r361772063
 
 

 ##
 File path: docs/zh_cn/某商业列存DB和CarbonData查询性能对比.md
 ##
 @@ -0,0 +1,111 @@
+
+
+## Query Performance Comparison: CarbonData Replacing a Commercial Columnar DB
+
+This document presents the query performance improvement that CarbonData delivered while replacing a commercial columnar-store DB, along with CarbonData's own strengths and characteristics. The figures are SQL query results under one domain-specific query pattern, and represent the performance comparison only under that pattern.
+
+
+
+
+
+## 1. Cluster Status Comparison
 
 Review comment:
   ```suggestion
   ## 1. Test Cluster
   ```




[GitHub] [carbondata] jackylk commented on a change in pull request #3521: [doc_zh_cn] add a commercial inventory DB and carbon data query performance comparison doc chinese doc to carbondata

2019-12-27 Thread GitBox
jackylk commented on a change in pull request #3521: [doc_zh_cn] add a 
commercial inventory DB and carbon data query performance comparison doc 
chinese doc to carbondata
URL: https://github.com/apache/carbondata/pull/3521#discussion_r361772063
 
 

 ##
 File path: docs/zh_cn/某商业列存DB和CarbonData查询性能对比.md
 ##
 @@ -0,0 +1,111 @@
+
+
+## Query Performance Comparison: CarbonData Replacing a Commercial Columnar DB
+
+This document presents the query performance improvement that CarbonData delivered while replacing a commercial columnar-store DB, along with CarbonData's own strengths and characteristics. The figures are SQL query results under one domain-specific query pattern, and represent the performance comparison only under that pattern.
+
+
+
+
+
+## 1. Cluster Status Comparison
 
 Review comment:
   ```suggestion
   ## 1. 测试环境
   ```




[GitHub] [carbondata] jackylk commented on a change in pull request #3521: [doc_zh_cn] add a commercial inventory DB and carbon data query performance comparison doc chinese doc to carbondata

2019-12-27 Thread GitBox
jackylk commented on a change in pull request #3521: [doc_zh_cn] add a 
commercial inventory DB and carbon data query performance comparison doc 
chinese doc to carbondata
URL: https://github.com/apache/carbondata/pull/3521#discussion_r361772009
 
 

 ##
 File path: docs/zh_cn/某商业列存DB和CarbonData查询性能对比.md
 ##
 @@ -0,0 +1,111 @@
+
+
+## CarbonData 替换某商业列存DB查询性能对比
 
 Review comment:
   ```suggestion
   ## CarbonData与商业列存DB查询性能对比
   ```




[GitHub] [carbondata] jackylk commented on a change in pull request #3521: [doc_zh_cn] add a commercial inventory DB and carbon data query performance comparison doc chinese doc to carbondata

2019-12-27 Thread GitBox
jackylk commented on a change in pull request #3521: [doc_zh_cn] add a 
commercial inventory DB and carbon data query performance comparison doc 
chinese doc to carbondata
URL: https://github.com/apache/carbondata/pull/3521#discussion_r361771821
 
 

 ##
 File path: docs/zh_cn/某商业列存DB和CarbonData查询性能对比.md
 ##
 @@ -0,0 +1,111 @@
+
+
+## CarbonData 替换某商业列存DB查询性能对比
+
+本文主要在于给用户呈现CarbonData在替换某商业列存DB过程中对于该DB的查询性能提升,CarbonData自身的优势和特点,本文的数据仅为基于某领域查询特点框架下SQL的查询结果,只代表该特定查询特点下的性能对比。
+
+
+
+
+
+## 1.集群状态对比
+
+| 集群 | 描述  |
+|  | - |
+| 某商业列存DB集群 | 3节点,SSD硬盘|
+| Hadoop集群   | 2个namenode,6个datanode,STAT硬盘,查询队列分配1/6的资源 |
 
 Review comment:
   Are these two clusters using the same resources? 




[GitHub] [carbondata] jackylk commented on a change in pull request #3509: [CARBONDATA-3618] Update query should throw exception if key has more than one value

2019-12-27 Thread GitBox
jackylk commented on a change in pull request #3509: [CARBONDATA-3618] Update 
query should throw exception if key has more than one value
URL: https://github.com/apache/carbondata/pull/3509#discussion_r361771360
 
 

 ##
 File path: 
integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/mutation/CarbonProjectForUpdateCommand.scala
 ##
 @@ -135,7 +136,24 @@ private[sql] case class CarbonProjectForUpdateCommand(
   else {
 Dataset.ofRows(sparkSession, plan)
   }
-
+  if (CarbonProperties.getValidateUpdateKeyValueMapping) {
+// If more than one value present for the update key, should fail 
the update
+val ds = 
dataSet.select(CarbonCommonConstants.CARBON_IMPLICIT_COLUMN_TUPLEID)
+  .groupBy(CarbonCommonConstants.CARBON_IMPLICIT_COLUMN_TUPLEID)
+  .count()
+  .select("count")
+  .filter(col("count") > lit(1))
+  .limit(1)
+  .collect()
 
 Review comment:
   @zzcclp @ajantha-bhat  One suggestion is to try to move this check to the 
join subquery in the update operation itself, then we do not need to do this 
check separately.
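
   The check quoted above groups the update's implicit tuple ids and fails the
   update if any id occurs more than once. As an illustration only (a Python
   stand-in for the Spark/Scala logic, not the project's code), the core idea
   can be sketched like this:

   ```python
   from collections import Counter

   def update_has_ambiguous_keys(tuple_ids):
       """Return True if any implicit tuple id occurs more than once,
       i.e. one target row would receive more than one update value."""
       counts = Counter(tuple_ids)
       return any(n > 1 for n in counts.values())

   # A duplicated tuple id means the update key maps to multiple values,
   # so the update should be rejected with an exception.
   print(update_has_ambiguous_keys(["row-1", "row-2", "row-1"]))  # True
   print(update_has_ambiguous_keys(["row-1", "row-2"]))           # False
   ```

   Moving this into the update's join subquery, as suggested, would avoid the
   separate `groupBy`/`count` pass over the dataset.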
   




[GitHub] [carbondata] jackylk commented on a change in pull request #3509: [CARBONDATA-3618] Update query should throw exception if key has more than one value

2019-12-27 Thread GitBox
jackylk commented on a change in pull request #3509: [CARBONDATA-3618] Update 
query should throw exception if key has more than one value
URL: https://github.com/apache/carbondata/pull/3509#discussion_r361771203
 
 

 ##
 File path: 
integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/mutation/CarbonProjectForUpdateCommand.scala
 ##
 @@ -135,7 +136,24 @@ private[sql] case class CarbonProjectForUpdateCommand(
   else {
 Dataset.ofRows(sparkSession, plan)
   }
-
+  if (CarbonProperties.getValidateUpdateKeyValueMapping) {
 
 Review comment:
   ```suggestion
 if (CarbonProperties.isUniqueValueCheckEnabled) {
   ```




[GitHub] [carbondata] jackylk commented on a change in pull request #3509: [CARBONDATA-3618] Update query should throw exception if key has more than one value

2019-12-27 Thread GitBox
jackylk commented on a change in pull request #3509: [CARBONDATA-3618] Update 
query should throw exception if key has more than one value
URL: https://github.com/apache/carbondata/pull/3509#discussion_r361771222
 
 

 ##
 File path: docs/configuration-parameters.md
 ##
 @@ -152,6 +152,7 @@ This section provides the details of all the 
configurations required for the Car
 | carbon.insert.storage.level | MEMORY_AND_DISK | Storage level to persist 
dataset of a RDD/dataframe. Applicable when ***carbon.insert.persist.enable*** 
is **true**, if user's executor has less memory, set this parameter to 
'MEMORY_AND_DISK_SER' or other storage level to correspond to different 
environment. [See 
detail](http://spark.apache.org/docs/latest/rdd-programming-guide.html#rdd-persistence).
 |
 | carbon.update.persist.enable | true | Configuration to enable the dataset of 
RDD/dataframe to persist data. Enabling this will reduce the execution time of 
UPDATE operation. |
 | carbon.update.storage.level | MEMORY_AND_DISK | Storage level to persist 
dataset of a RDD/dataframe. Applicable when ***carbon.update.persist.enable*** 
is **true**, if user's executor has less memory, set this parameter to 
'MEMORY_AND_DISK_SER' or other storage level to correspond to different 
environment. [See 
detail](http://spark.apache.org/docs/latest/rdd-programming-guide.html#rdd-persistence).
 |
+| carbon.update.validate.key.to.value.mapping | true | By default this 
property is true, so update will validate key value mapping. This validation 
might have slight degrade in performance of update query. If user knows that 
key value mapping is correct, can disable this validation for better update 
performance by setting this property to false. |
 
 Review comment:
   ```suggestion
   | carbon.update.check.unique.value | true | By default this property is 
true, so update will validate key value mapping. This validation might have 
slight degrade in performance of update query. If user knows that key value 
mapping is correct, can disable this validation for better update performance 
by setting this property to false. |
   ```
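
   For illustration, disabling the validation would be a one-line change in
   `carbon.properties` (shown with the property name currently in the PR; the
   rename suggested above is still under review):

   ```properties
   # carbon.properties
   # Skip the update key-to-value uniqueness validation for faster updates.
   # Only safe when the source data is known to map each key to one value.
   carbon.update.validate.key.to.value.mapping=false
   ```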




[GitHub] [carbondata] jackylk commented on a change in pull request #3509: [CARBONDATA-3618] Update query should throw exception if key has more than one value

2019-12-27 Thread GitBox
jackylk commented on a change in pull request #3509: [CARBONDATA-3618] Update 
query should throw exception if key has more than one value
URL: https://github.com/apache/carbondata/pull/3509#discussion_r361771222
 
 

 ##
 File path: docs/configuration-parameters.md
 ##
 @@ -152,6 +152,7 @@ This section provides the details of all the 
configurations required for the Car
 | carbon.insert.storage.level | MEMORY_AND_DISK | Storage level to persist 
dataset of a RDD/dataframe. Applicable when ***carbon.insert.persist.enable*** 
is **true**, if user's executor has less memory, set this parameter to 
'MEMORY_AND_DISK_SER' or other storage level to correspond to different 
environment. [See 
detail](http://spark.apache.org/docs/latest/rdd-programming-guide.html#rdd-persistence).
 |
 | carbon.update.persist.enable | true | Configuration to enable the dataset of 
RDD/dataframe to persist data. Enabling this will reduce the execution time of 
UPDATE operation. |
 | carbon.update.storage.level | MEMORY_AND_DISK | Storage level to persist 
dataset of a RDD/dataframe. Applicable when ***carbon.update.persist.enable*** 
is **true**, if user's executor has less memory, set this parameter to 
'MEMORY_AND_DISK_SER' or other storage level to correspond to different 
environment. [See 
detail](http://spark.apache.org/docs/latest/rdd-programming-guide.html#rdd-persistence).
 |
+| carbon.update.validate.key.to.value.mapping | true | By default this 
property is true, so update will validate key value mapping. This validation 
might have slight degrade in performance of update query. If user knows that 
key value mapping is correct, can disable this validation for better update 
performance by setting this property to false. |
 
 Review comment:
   ```suggestion
   | carbon.update.strict.check | true | By default this property is true, so 
update will validate key value mapping. This validation might have slight 
degrade in performance of update query. If user knows that key value mapping is 
correct, can disable this validation for better update performance by setting 
this property to false. |
   ```




[GitHub] [carbondata] jackylk commented on a change in pull request #3509: [CARBONDATA-3618] Update query should throw exception if key has more than one value

2019-12-27 Thread GitBox
jackylk commented on a change in pull request #3509: [CARBONDATA-3618] Update 
query should throw exception if key has more than one value
URL: https://github.com/apache/carbondata/pull/3509#discussion_r361771203
 
 

 ##
 File path: 
integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/mutation/CarbonProjectForUpdateCommand.scala
 ##
 @@ -135,7 +136,24 @@ private[sql] case class CarbonProjectForUpdateCommand(
   else {
 Dataset.ofRows(sparkSession, plan)
   }
-
+  if (CarbonProperties.getValidateUpdateKeyValueMapping) {
 
 Review comment:
   ```suggestion
 if (CarbonProperties.isUpdateStrictCheckEnabled) {
   ```




[GitHub] [carbondata] jackylk commented on a change in pull request #3532: [CARBONDATA-3557] Write flink streaming data to partition table

2019-12-27 Thread GitBox
jackylk commented on a change in pull request #3532: [CARBONDATA-3557] Write 
flink streaming data to partition table
URL: https://github.com/apache/carbondata/pull/3532#discussion_r361770999
 
 

 ##
 File path: 
integration/flink/src/test/scala/org/apache/carbon/flink/TestSource.scala
 ##
 @@ -1,25 +1,27 @@
 package org.apache.carbon.flink
 
+import java.util.Random
+
 import org.apache.flink.api.common.state.{ListState, ListStateDescriptor}
 import org.apache.flink.runtime.state.{FunctionInitializationContext, 
FunctionSnapshotContext}
 import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction
 import org.apache.flink.streaming.api.functions.source.SourceFunction
 
-abstract class TestSource(val dataCount: Int) extends SourceFunction[String] 
with CheckpointedFunction {
+abstract class TestSource(val dataCount: Int) extends 
SourceFunction[Array[AnyRef]] with CheckpointedFunction {
 
 Review comment:
   Please add more test cases to verify that the write output is correct.




[GitHub] [carbondata] jackylk commented on a change in pull request #3532: [CARBONDATA-3557] Write flink streaming data to partition table

2019-12-27 Thread GitBox
jackylk commented on a change in pull request #3532: [CARBONDATA-3557] Write 
flink streaming data to partition table
URL: https://github.com/apache/carbondata/pull/3532#discussion_r361770968
 
 

 ##
 File path: 
integration/flink/src/main/java/org/apache/carbon/flink/CarbonS3Writer.java
 ##
 @@ -139,15 +152,16 @@ public void commit() throws IOException {
 );
   }
   dataPath = dataPath + this.table.getDatabaseName() + 
CarbonCommonConstants.FILE_SEPARATOR +
-  this.table.getTableName() + CarbonCommonConstants.FILE_SEPARATOR +
-  this.writePartition + CarbonCommonConstants.FILE_SEPARATOR;
-  Map fileList =
-  this.uploadSegmentDataFiles(this.writePath + 
"Fact/Part0/Segment_null/", dataPath);
+  this.table.getTableName() + CarbonCommonConstants.FILE_SEPARATOR;
+  StageInput stageInput = this.uploadSegmentDataFiles(this.writePath, 
dataPath);
+  if (stageInput == null) {
+return;
+  }
   try {
 String stageInputPath = CarbonTablePath.getStageDir(
 table.getAbsoluteTableIdentifier().getTablePath()) +
-CarbonCommonConstants.FILE_SEPARATOR + this.writePartition;
-StageManager.writeStageInput(stageInputPath, new StageInput(dataPath, 
fileList));
+CarbonCommonConstants.FILE_SEPARATOR + UUID.randomUUID();// TODO 
UUID
 
 Review comment:
   remove TODO




[jira] [Resolved] (CARBONDATA-3626) Improve performance when load data into carbondata

2019-12-27 Thread Jacky Li (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacky Li resolved CARBONDATA-3626.
--
Fix Version/s: 2.0.0
   Resolution: Fixed

> Improve performance when load data into carbondata
> --
>
> Key: CARBONDATA-3626
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3626
> Project: CarbonData
>  Issue Type: Improvement
>  Components: spark-integration
>Reporter: Hong Shen
>Priority: Major
> Fix For: 2.0.0
>
> Attachments: image-2019-12-21-21-20-19-603.png, screenshot-1.png
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> I plan to use CarbonData to improve Spark SQL in our company, but I often
> find it takes a long time to load data when the carbon table has many
> fields.
> {code}
> carbon.sql("insert into TABLE table2  select * from table1")
> {code}
> For example, with a production table2 of more than 100 columns, when the
> above SQL is running, one task takes 10 min to load 200 MB of data (with
> snappy compression); the log is
> {code}
> 2019-12-21 17:31:29 INFO  UnsafeSortDataRows:416 - Time taken to sort row 
> page with size: 37975 is: 110
> 2019-12-21 17:31:35 INFO  UnsafeSortDataRows:416 - Time taken to sort row 
> page with size: 37978 is: 64
> 2019-12-21 17:31:42 INFO  UnsafeSortDataRows:416 - Time taken to sort row 
> page with size: 37977 is: 64
> 2019-12-21 17:31:48 INFO  UnsafeSortDataRows:416 - Time taken to sort row 
> page with size: 37972 is: 66
> 2019-12-21 17:31:54 INFO  UnsafeSortDataRows:416 - Time taken to sort row 
> page with size: 37979 is: 68
> 2019-12-21 17:32:00 INFO  UnsafeSortDataRows:416 - Time taken to sort row 
> page with size: 37978 is: 62
> 2019-12-21 17:32:07 INFO  UnsafeSortDataRows:416 - Time taken to sort row 
> page with size: 37981 is: 65
> 2019-12-21 17:32:13 INFO  UnsafeSortDataRows:395 - Time taken to sort row 
> page with size37972 and write is: 226: 
> location:/home/hadoop/nm-local-dir/usercache/042986/appcache/application_1571110627213_192937/carbon19a2dc8d381442129dd0c7d906e7f51f_10210001311/Fact/Part0/Segment_2/10210001311/sortrowtmp/table2_0_21949613867659265.sorttemp,
>  sort temp file size in MB is 5.350312232971191
> 2019-12-21 17:32:19 INFO  UnsafeSortDataRows:395 - Time taken to sort row 
> page with size37982 and write is: 172: 
> location:/home/hadoop/nm-local-dir/usercache/042986/appcache/application_1571110627213_192937/carbon19a2dc8d381442129dd0c7d906e7f51f_10210001311/Fact/Part0/Segment_2/10210001311/sortrowtmp/table2_0_21949620209578293.sorttemp,
>  sort temp file size in MB is 5.293270111083984
> 2019-12-21 17:32:26 INFO  UnsafeSortDataRows:395 - Time taken to sort row 
> page with size37974 and write is: 175: 
> location:/home/hadoop/nm-local-dir/usercache/042986/appcache/application_1571110627213_192937/carbon19a2dc8d381442129dd0c7d906e7f51f_10210001311/Fact/Part0/Segment_2/10210001311/sortrowtmp/table2_0_21949626542521877.sorttemp,
>  sort temp file size in MB is 5.349262237548828
> ... ...
> {code}
> The task's jstack is often like below:
> {code}
> "Executor task launch worker for task 164" #77 daemon prio=5 os_prio=0 
> tid=0x2ab5768c3800 nid=0xb895 runnable [0x2ab578afd000]
>java.lang.Thread.State: RUNNABLE
> at 
> scala.collection.LinearSeqOptimized$class.length(LinearSeqOptimized.scala:54)
> at scala.collection.immutable.List.length(List.scala:84)
> at 
> org.apache.spark.sql.execution.datasources.CarbonOutputWriter.writeCarbon(SparkCarbonTableFormat.scala:360)
> at 
> org.apache.spark.sql.execution.datasources.AbstractCarbonOutputWriter$class.write(SparkCarbonTableFormat.scala:234)
> at 
> org.apache.spark.sql.execution.datasources.CarbonOutputWriter.write(SparkCarbonTableFormat.scala:239)
> at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$DynamicPartitionWriteTask$$anonfun$execute$7.apply(FileFormatWriter.scala:717)
> at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$DynamicPartitionWriteTask$$anonfun$execute$7.apply(FileFormatWriter.scala:661)
> at scala.collection.Iterator$class.foreach(Iterator.scala:893)
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
> at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$DynamicPartitionWriteTask.execute(FileFormatWriter.scala:661)
> at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:334)
> at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:332)
> 
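
The jstack above points at `scala.collection.immutable.List.length`
(`LinearSeqOptimized.length`), which traverses the whole list on every call;
invoking it once per written row makes the load path effectively quadratic.
A minimal Python illustration of that cost pattern (a hypothetical linked
list, not CarbonData code):

```python
class Node:
    """Minimal singly linked list node, standing in for Scala's List."""
    def __init__(self, value, nxt=None):
        self.value = value
        self.next = nxt

def length(head):
    # O(n) traversal on every call, like LinearSeqOptimized.length
    # in the jstack above.
    n = 0
    while head is not None:
        n += 1
        head = head.next
    return n

# Build a 3-element list: 1 -> 2 -> 3
head = Node(1, Node(2, Node(3)))

# Anti-pattern: calling length(head) inside a per-row loop costs O(n)
# per call, O(n * rows) overall. Caching it once keeps the loop linear.
cached_len = length(head)
print(cached_len)  # 3
```

Hoisting the length computation out of the per-row write path is the kind of
fix that resolves this class of slowdown.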

[GitHub] [carbondata] jackylk commented on a change in pull request #3532: [CARBONDATA-3557] Write flink streaming data to partition table

2019-12-27 Thread GitBox
jackylk commented on a change in pull request #3532: [CARBONDATA-3557] Write 
flink streaming data to partition table
URL: https://github.com/apache/carbondata/pull/3532#discussion_r361770939
 
 

 ##
 File path: 
core/src/main/java/org/apache/carbondata/core/statusmanager/StageInput.java
 ##
 @@ -39,6 +39,8 @@
*/
   private Map files;
 
+  private List locations;
 
 Review comment:
   Please add a comment.




[GitHub] [carbondata] asfgit closed pull request #3525: [CARBONDATA-3626] Improve performance when load data into carbon table with lots of columns

2019-12-27 Thread GitBox
asfgit closed pull request #3525: [CARBONDATA-3626] Improve performance when 
load data into carbon table with lots of columns
URL: https://github.com/apache/carbondata/pull/3525
 
 
   




[jira] [Commented] (CARBONDATA-3626) Improve performance when load data into carbondata

2019-12-27 Thread Jacky Li (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-3626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17004372#comment-17004372
 ] 

Jacky Li commented on CARBONDATA-3626:
--

Thanks for reporting this issue

> Improve performance when load data into carbondata
> --
>
> Key: CARBONDATA-3626
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3626
> Project: CarbonData
>  Issue Type: Improvement
>  Components: spark-integration
>Reporter: Hong Shen
>Priority: Major
> Attachments: image-2019-12-21-21-20-19-603.png, screenshot-1.png
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> I plan to use CarbonData to improve Spark SQL in our company, but I often
> find it takes a long time to load data when the carbon table has many
> fields.
> {code}
> carbon.sql("insert into TABLE table2  select * from table1")
> {code}
> For example, with a production table2 of more than 100 columns, when the
> above SQL is running, one task takes 10 min to load 200 MB of data (with
> snappy compression); the log is
> {code}
> 2019-12-21 17:31:29 INFO  UnsafeSortDataRows:416 - Time taken to sort row 
> page with size: 37975 is: 110
> 2019-12-21 17:31:35 INFO  UnsafeSortDataRows:416 - Time taken to sort row 
> page with size: 37978 is: 64
> 2019-12-21 17:31:42 INFO  UnsafeSortDataRows:416 - Time taken to sort row 
> page with size: 37977 is: 64
> 2019-12-21 17:31:48 INFO  UnsafeSortDataRows:416 - Time taken to sort row 
> page with size: 37972 is: 66
> 2019-12-21 17:31:54 INFO  UnsafeSortDataRows:416 - Time taken to sort row 
> page with size: 37979 is: 68
> 2019-12-21 17:32:00 INFO  UnsafeSortDataRows:416 - Time taken to sort row 
> page with size: 37978 is: 62
> 2019-12-21 17:32:07 INFO  UnsafeSortDataRows:416 - Time taken to sort row 
> page with size: 37981 is: 65
> 2019-12-21 17:32:13 INFO  UnsafeSortDataRows:395 - Time taken to sort row 
> page with size37972 and write is: 226: 
> location:/home/hadoop/nm-local-dir/usercache/042986/appcache/application_1571110627213_192937/carbon19a2dc8d381442129dd0c7d906e7f51f_10210001311/Fact/Part0/Segment_2/10210001311/sortrowtmp/table2_0_21949613867659265.sorttemp,
>  sort temp file size in MB is 5.350312232971191
> 2019-12-21 17:32:19 INFO  UnsafeSortDataRows:395 - Time taken to sort row 
> page with size37982 and write is: 172: 
> location:/home/hadoop/nm-local-dir/usercache/042986/appcache/application_1571110627213_192937/carbon19a2dc8d381442129dd0c7d906e7f51f_10210001311/Fact/Part0/Segment_2/10210001311/sortrowtmp/table2_0_21949620209578293.sorttemp,
>  sort temp file size in MB is 5.293270111083984
> 2019-12-21 17:32:26 INFO  UnsafeSortDataRows:395 - Time taken to sort row 
> page with size37974 and write is: 175: 
> location:/home/hadoop/nm-local-dir/usercache/042986/appcache/application_1571110627213_192937/carbon19a2dc8d381442129dd0c7d906e7f51f_10210001311/Fact/Part0/Segment_2/10210001311/sortrowtmp/table2_0_21949626542521877.sorttemp,
>  sort temp file size in MB is 5.349262237548828
> ... ...
> {code}
> The task's jstack is often like below:
> {code}
> "Executor task launch worker for task 164" #77 daemon prio=5 os_prio=0 
> tid=0x2ab5768c3800 nid=0xb895 runnable [0x2ab578afd000]
>java.lang.Thread.State: RUNNABLE
> at 
> scala.collection.LinearSeqOptimized$class.length(LinearSeqOptimized.scala:54)
> at scala.collection.immutable.List.length(List.scala:84)
> at 
> org.apache.spark.sql.execution.datasources.CarbonOutputWriter.writeCarbon(SparkCarbonTableFormat.scala:360)
> at 
> org.apache.spark.sql.execution.datasources.AbstractCarbonOutputWriter$class.write(SparkCarbonTableFormat.scala:234)
> at 
> org.apache.spark.sql.execution.datasources.CarbonOutputWriter.write(SparkCarbonTableFormat.scala:239)
> at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$DynamicPartitionWriteTask$$anonfun$execute$7.apply(FileFormatWriter.scala:717)
> at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$DynamicPartitionWriteTask$$anonfun$execute$7.apply(FileFormatWriter.scala:661)
> at scala.collection.Iterator$class.foreach(Iterator.scala:893)
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
> at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$DynamicPartitionWriteTask.execute(FileFormatWriter.scala:661)
> at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:334)
> at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:332)
> at 

[GitHub] [carbondata] jackylk commented on issue #3525: [CARBONDATA-3626] Improve performance when load data into carbon table with lots of columns

2019-12-27 Thread GitBox
jackylk commented on issue #3525: [CARBONDATA-3626] Improve performance when 
load data into carbon table with lots of columns
URL: https://github.com/apache/carbondata/pull/3525#issuecomment-569376290
 
 
   LGTM.
   Thanks for contributing!




[GitHub] [carbondata] jackylk commented on a change in pull request #3530: [CARBONDATA-3629] Fix Select query failure on aggregation of same column on MV

2019-12-27 Thread GitBox
jackylk commented on a change in pull request #3530: [CARBONDATA-3629] Fix 
Select query failure on aggregation of same column on MV
URL: https://github.com/apache/carbondata/pull/3530#discussion_r361770632
 
 

 ##
 File path: docs/datamap/mv-datamap-guide.md
 ##
 @@ -91,7 +91,7 @@ EXPLAIN SELECT a, sum(b) from maintable group by a;
12. NO_INVERTED_INDEX
13. COLUMN_COMPRESSOR
 
- * All columns of main table at once cannot participate in mv datamap table 
creation
+ * Creating MV datamap with select query containing only project of all 
columns of maintable is unsupported 
 
 Review comment:
   Can you add an example of this unsupported case to the doc?




[GitHub] [carbondata] jackylk commented on a change in pull request #3530: [CARBONDATA-3629] Fix Select query failure on aggregation of same column on MV

2019-12-27 Thread GitBox
jackylk commented on a change in pull request #3530: [CARBONDATA-3629] Fix 
Select query failure on aggregation of same column on MV
URL: https://github.com/apache/carbondata/pull/3530#discussion_r361770549
 
 

 ##
 File path: 
datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVUtil.scala
 ##
 @@ -49,23 +49,28 @@ class MVUtil {
   case select: Select =>
 select.children.map {
   case groupBy: GroupBy =>
-getFieldsFromProject(groupBy.outputList, groupBy.predicateList, 
logicalRelation)
+getFieldsFromProject(groupBy.outputList, groupBy.predicateList,
+  logicalRelation, groupBy.flagSpec)
   case _: ModularRelation =>
-getFieldsFromProject(select.outputList, select.predicateList, 
logicalRelation)
+getFieldsFromProject(select.outputList, select.predicateList,
+  logicalRelation, select.flagSpec)
 }.head
   case groupBy: GroupBy =>
 groupBy.child match {
   case select: Select =>
-getFieldsFromProject(groupBy.outputList, select.predicateList, 
logicalRelation)
+getFieldsFromProject(groupBy.outputList, select.predicateList,
+  logicalRelation, select.flagSpec)
   case _: ModularRelation =>
-getFieldsFromProject(groupBy.outputList, groupBy.predicateList, 
logicalRelation)
+getFieldsFromProject(groupBy.outputList, groupBy.predicateList,
+  logicalRelation, groupBy.flagSpec)
 }
 }
   }
 
   def getFieldsFromProject(outputList: Seq[NamedExpression],
   predicateList: Seq[Expression],
-  logicalRelation: Seq[LogicalRelation]): mutable.LinkedHashMap[Field, 
DataMapField] = {
+  logicalRelation: Seq[LogicalRelation],
+  flagSpec: Seq[Seq[Any]]): mutable.LinkedHashMap[Field, DataMapField] = {
 
 Review comment:
    Please add comments describing the parameters.




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3535: [WIP] Refactory data loading for partition table

2019-12-27 Thread GitBox
CarbonDataQA1 commented on issue #3535: [WIP] Refactory data loading for 
partition table
URL: https://github.com/apache/carbondata/pull/3535#issuecomment-569375557
 
 
   Build Success with Spark 2.1.0, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1306/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3507: [CARBONDATA-3617] loadDataUsingGlobalSort should based on SortColumns…

2019-12-27 Thread GitBox
CarbonDataQA1 commented on issue #3507: [CARBONDATA-3617] 
loadDataUsingGlobalSort should based on SortColumns…
URL: https://github.com/apache/carbondata/pull/3507#issuecomment-569375347
 
 
   Build Success with Spark 2.1.0, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1305/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3502: [CARBONATA-3605] Remove global dictionary feature

2019-12-27 Thread GitBox
CarbonDataQA1 commented on issue #3502: [CARBONATA-3605] Remove global 
dictionary feature
URL: https://github.com/apache/carbondata/pull/3502#issuecomment-569304818
 
 
   Build Failed with Spark 2.2.1, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/1314/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3502: [CARBONATA-3605] Remove global dictionary feature

2019-12-27 Thread GitBox
CarbonDataQA1 commented on issue #3502: [CARBONATA-3605] Remove global 
dictionary feature
URL: https://github.com/apache/carbondata/pull/3502#issuecomment-569304756
 
 
   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1327/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3502: [CARBONATA-3605] Remove global dictionary feature

2019-12-27 Thread GitBox
CarbonDataQA1 commented on issue #3502: [CARBONATA-3605] Remove global 
dictionary feature
URL: https://github.com/apache/carbondata/pull/3502#issuecomment-569304362
 
 
   Build Failed  with Spark 2.1.0, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1304/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later optimize insert.

2019-12-27 Thread GitBox
CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later 
optimize insert.
URL: https://github.com/apache/carbondata/pull/3538#issuecomment-569304318
 
 
   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1326/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later optimize insert.

2019-12-27 Thread GitBox
CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later 
optimize insert.
URL: https://github.com/apache/carbondata/pull/3538#issuecomment-569303524
 
 
   Build Success with Spark 2.2.1, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/1313/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column

2019-12-27 Thread GitBox
CarbonDataQA1 commented on issue #3515: [CARBONDATA-3623]: Fixed global sort 
compaction failure on timestamp column
URL: https://github.com/apache/carbondata/pull/3515#issuecomment-569294657
 
 
   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1325/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column

2019-12-27 Thread GitBox
CarbonDataQA1 commented on issue #3515: [CARBONDATA-3623]: Fixed global sort 
compaction failure on timestamp column
URL: https://github.com/apache/carbondata/pull/3515#issuecomment-569294139
 
 
   Build Failed with Spark 2.2.1, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/1312/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later optimize insert.

2019-12-27 Thread GitBox
CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later 
optimize insert.
URL: https://github.com/apache/carbondata/pull/3538#issuecomment-569293083
 
 
   Build Success with Spark 2.1.0, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1303/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3537: Metacache issues

2019-12-27 Thread GitBox
CarbonDataQA1 commented on issue #3537: Metacache issues
URL: https://github.com/apache/carbondata/pull/3537#issuecomment-569286118
 
 
   Build Success with Spark 2.2.1, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/1308/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column

2019-12-27 Thread GitBox
CarbonDataQA1 commented on issue #3515: [CARBONDATA-3623]: Fixed global sort 
compaction failure on timestamp column
URL: https://github.com/apache/carbondata/pull/3515#issuecomment-569285983
 
 
   Build Failed with Spark 2.2.1, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/1310/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later optimize insert.

2019-12-27 Thread GitBox
CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later 
optimize insert.
URL: https://github.com/apache/carbondata/pull/3538#issuecomment-569283318
 
 
   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1324/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later optimize insert.

2019-12-27 Thread GitBox
CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later 
optimize insert.
URL: https://github.com/apache/carbondata/pull/3538#issuecomment-569283314
 
 
   Build Failed with Spark 2.2.1, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/1311/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later optimize insert.

2019-12-27 Thread GitBox
CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later 
optimize insert.
URL: https://github.com/apache/carbondata/pull/3538#issuecomment-569282513
 
 
   Build Success with Spark 2.1.0, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1301/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column

2019-12-27 Thread GitBox
CarbonDataQA1 commented on issue #3515: [CARBONDATA-3623]: Fixed global sort 
compaction failure on timestamp column
URL: https://github.com/apache/carbondata/pull/3515#issuecomment-569282473
 
 
   Build Success with Spark 2.1.0, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1302/
   




[GitHub] [carbondata] akkio-97 commented on a change in pull request #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column

2019-12-27 Thread GitBox
akkio-97 commented on a change in pull request #3515: [CARBONDATA-3623]: Fixed 
global sort compaction failure on timestamp column
URL: https://github.com/apache/carbondata/pull/3515#discussion_r361673785
 
 

 ##
 File path: integration/spark2/src/main/scala/org/apache/carbondata/spark/rdd/CarbonTableCompactor.scala
 ##
 @@ -399,30 +393,24 @@ class CarbonTableCompactor(carbonLoadModel: CarbonLoadModel,
   def dataFrameOfSegments(
       sparkSession: SparkSession,
       carbonTable: CarbonTable,
-      segments: Array[Segment]
-  ): DataFrame = {
-    val columns = carbonTable
-      .getCreateOrderColumn()
-      .asScala
-      .map(_.getColName)
-      .toArray
-    val schema = SparkTypeConverter.createSparkSchema(carbonTable, columns)
-    val rdd: RDD[Row] = new CarbonScanRDD[CarbonRow](
-      sparkSession,
-      columnProjection = new CarbonProjection(columns),
-      null,
-      carbonTable.getAbsoluteTableIdentifier,
-      carbonTable.getTableInfo.serialize,
-      carbonTable.getTableInfo,
-      new CarbonInputMetrics,
-      null,
-      null,
-      classOf[CarbonRowReadSupport],
-      splitsOfSegments(sparkSession, carbonTable, segments))
-      .map { row =>
-        new GenericRow(row.getData.asInstanceOf[Array[Any]])
-      }
-    sparkSession.createDataFrame(rdd, schema)
+      segments: Array[Segment]): DataFrame = {
+    try {
+      CarbonSession
+        .threadSet(CarbonCommonConstants.CARBON_INPUT_SEGMENTS +
 
 Review comment:
   done




[GitHub] [carbondata] akkio-97 commented on a change in pull request #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column

2019-12-27 Thread GitBox
akkio-97 commented on a change in pull request #3515: [CARBONDATA-3623]: Fixed 
global sort compaction failure on timestamp column
URL: https://github.com/apache/carbondata/pull/3515#discussion_r361670362
 
 

 ##
 File path: integration/spark2/src/main/scala/org/apache/carbondata/spark/rdd/CarbonTableCompactor.scala
 ##
 @@ -399,30 +393,24 @@ class CarbonTableCompactor(carbonLoadModel: CarbonLoadModel,
   def dataFrameOfSegments(
       sparkSession: SparkSession,
       carbonTable: CarbonTable,
-      segments: Array[Segment]
-  ): DataFrame = {
-    val columns = carbonTable
-      .getCreateOrderColumn()
-      .asScala
-      .map(_.getColName)
-      .toArray
-    val schema = SparkTypeConverter.createSparkSchema(carbonTable, columns)
-    val rdd: RDD[Row] = new CarbonScanRDD[CarbonRow](
-      sparkSession,
-      columnProjection = new CarbonProjection(columns),
-      null,
-      carbonTable.getAbsoluteTableIdentifier,
-      carbonTable.getTableInfo.serialize,
-      carbonTable.getTableInfo,
-      new CarbonInputMetrics,
-      null,
-      null,
-      classOf[CarbonRowReadSupport],
-      splitsOfSegments(sparkSession, carbonTable, segments))
-      .map { row =>
-        new GenericRow(row.getData.asInstanceOf[Array[Any]])
-      }
-    sparkSession.createDataFrame(rdd, schema)
+      segments: Array[Segment]): DataFrame = {
+    try {
+      CarbonSession
+        .threadSet(CarbonCommonConstants.CARBON_INPUT_SEGMENTS +
+          carbonTable.getDatabaseName + CarbonCommonConstants.POINT +
+          carbonTable.getTableName,
+          segments.map(s => s.getSegmentNo).mkString(","))
+      val logicalPlan = sparkSession
+        .sql(s"select * from ${ carbonTable.getTableName }")
 
 Review comment:
   done
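
   The replacement shown in the diff routes the compaction read through a thread-scoped session property (`carbon.input.segments.<db>.<table>`) instead of constructing a `CarbonScanRDD` directly. As a standalone illustration of how such thread-confined properties behave, here is a minimal sketch using a hypothetical `SessionParams` object backed by a `ThreadLocal` — this mimics the `CarbonSession.threadSet`/`threadUnset` pattern but is not CarbonData's actual implementation:

   ```scala
   // Hypothetical thread-scoped property store, mimicking the
   // CarbonSession.threadSet / threadUnset calls seen in the diff above.
   object SessionParams {
     private val props = new ThreadLocal[Map[String, String]] {
       override def initialValue(): Map[String, String] = Map.empty
     }

     // Set a property visible only to the calling thread.
     def threadSet(key: String, value: String): Unit =
       props.set(props.get + (key -> value))

     // Remove the property for the calling thread.
     def threadUnset(key: String): Unit =
       props.set(props.get - key)

     def get(key: String): Option[String] = props.get.get(key)
   }

   // Usage: restrict "queries" on this thread to segments 0 and 2,
   // the way compaction restricts its scan to the segments being merged.
   val key = "carbon.input.segments.default.t1"
   SessionParams.threadSet(key, Seq("0", "2").mkString(","))
   println(SessionParams.get(key).getOrElse("*"))
   SessionParams.threadUnset(key)
   println(SessionParams.get(key).getOrElse("*"))
   ```

   Because the property lives in a `ThreadLocal`, concurrent compactions on other threads are unaffected, which is why the review asks that the property be unset in a `finally` block rather than left behind.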




[GitHub] [carbondata] ajantha-bhat opened a new pull request #3538: [WIP] Separate Insert and load to later optimize insert.

2019-12-27 Thread GitBox
ajantha-bhat opened a new pull request #3538: [WIP] Separate Insert and load to 
later optimize insert.
URL: https://github.com/apache/carbondata/pull/3538
 
 
   [WIP] Separate Insert and load to later optimize insert.
   
   1. separated load and insert command; load command is used only for load 
DDL, rest of the flow uses insert command
   2. remove update and dataframe argument from load command
   3. currently code is duplicated for separation. later once insert flow is 
optimized, common code will be extracted
   
   
   Be sure to do all of the following checklist to help us incorporate 
   your contribution quickly and easily:
   
- [ ] Any interfaces changed?

- [ ] Any backward compatibility impacted?

- [ ] Document update required?
   
- [ ] Testing done
   Please provide details on 
   - Whether new unit test cases have been added or why no new tests 
are required?
   - How it is tested? Please attach test report.
   - Is it a performance related change? Please attach the performance 
test report.
   - Any additional information to help reviewers in testing this 
change.
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 
   
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3456: Bump solr.version from 6.3.0 to 8.3.0 in /datamap/lucene

2019-12-27 Thread GitBox
CarbonDataQA1 commented on issue #3456: Bump solr.version from 6.3.0 to 8.3.0 
in /datamap/lucene
URL: https://github.com/apache/carbondata/pull/3456#issuecomment-569276164
 
 
   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1321/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3537: Metacache issues

2019-12-27 Thread GitBox
CarbonDataQA1 commented on issue #3537: Metacache issues
URL: https://github.com/apache/carbondata/pull/3537#issuecomment-569274070
 
 
   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1320/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3502: [CARBONATA-3605] Remove global dictionary feature

2019-12-27 Thread GitBox
CarbonDataQA1 commented on issue #3502: [CARBONATA-3605] Remove global 
dictionary feature
URL: https://github.com/apache/carbondata/pull/3502#issuecomment-569270459
 
 
   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1322/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3502: [CARBONATA-3605] Remove global dictionary feature

2019-12-27 Thread GitBox
CarbonDataQA1 commented on issue #3502: [CARBONATA-3605] Remove global 
dictionary feature
URL: https://github.com/apache/carbondata/pull/3502#issuecomment-569269862
 
 
   Build Failed with Spark 2.2.1, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/1309/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3535: [WIP] Refactory data loading for partition table

2019-12-27 Thread GitBox
CarbonDataQA1 commented on issue #3535: [WIP] Refactory data loading for 
partition table
URL: https://github.com/apache/carbondata/pull/3535#issuecomment-569269032
 
 
   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1318/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3502: [CARBONATA-3605] Remove global dictionary feature

2019-12-27 Thread GitBox
CarbonDataQA1 commented on issue #3502: [CARBONATA-3605] Remove global 
dictionary feature
URL: https://github.com/apache/carbondata/pull/3502#issuecomment-569266789
 
 
   Build Failed  with Spark 2.1.0, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1300/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3530: [CARBONDATA-3629] Fix Select query failure on aggregation of same column on MV

2019-12-27 Thread GitBox
CarbonDataQA1 commented on issue #3530: [CARBONDATA-3629] Fix Select query 
failure on aggregation of same column on MV
URL: https://github.com/apache/carbondata/pull/3530#issuecomment-569266788
 
 
   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1316/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3537: Metacache issues

2019-12-27 Thread GitBox
CarbonDataQA1 commented on issue #3537: Metacache issues
URL: https://github.com/apache/carbondata/pull/3537#issuecomment-569262987
 
 
   Build Success with Spark 2.1.0, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1299/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3530: [CARBONDATA-3629] Fix Select query failure on aggregation of same column on MV

2019-12-27 Thread GitBox
CarbonDataQA1 commented on issue #3530: [CARBONDATA-3629] Fix Select query 
failure on aggregation of same column on MV
URL: https://github.com/apache/carbondata/pull/3530#issuecomment-569260178
 
 
   Build Success with Spark 2.2.1, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/1305/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3536: Metacache issues issues

2019-12-27 Thread GitBox
CarbonDataQA1 commented on issue #3536: Metacache issues issues
URL: https://github.com/apache/carbondata/pull/3536#issuecomment-569259774
 
 
   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1319/
   




[GitHub] [carbondata] vikramahuja1001 opened a new pull request #3537: Metacache issues

2019-12-27 Thread GitBox
vikramahuja1001 opened a new pull request #3537: Metacache issues
URL: https://github.com/apache/carbondata/pull/3537
 
 
   Be sure to do all of the following checklist to help us incorporate 
   your contribution quickly and easily:
   
- [ ] Any interfaces changed?

- [ ] Any backward compatibility impacted?

- [ ] Document update required?
   
- [ ] Testing done
   Please provide details on 
   - Whether new unit test cases have been added or why no new tests 
are required?
   - How it is tested? Please attach test report.
   - Is it a performance related change? Please attach the performance 
test report.
   - Any additional information to help reviewers in testing this 
change.
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 
   
   




[GitHub] [carbondata] vikramahuja1001 closed pull request #3536: Metacache issues issues

2019-12-27 Thread GitBox
vikramahuja1001 closed pull request #3536: Metacache issues issues
URL: https://github.com/apache/carbondata/pull/3536
 
 
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3535: [WIP] Refactory data loading for partition table

2019-12-27 Thread GitBox
CarbonDataQA1 commented on issue #3535: [WIP] Refactory data loading for 
partition table
URL: https://github.com/apache/carbondata/pull/3535#issuecomment-569257436
 
 
   Build Failed with Spark 2.2.1, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/1307/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3535: [WIP] Refactory data loading for partition table

2019-12-27 Thread GitBox
CarbonDataQA1 commented on issue #3535: [WIP] Refactory data loading for 
partition table
URL: https://github.com/apache/carbondata/pull/3535#issuecomment-569257438
 
 
   Build Failed  with Spark 2.1.0, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1298/
   




[GitHub] [carbondata] QiangCai opened a new pull request #3535: [WIP] Refactory data loading for partition table

2019-12-27 Thread GitBox
QiangCai opened a new pull request #3535: [WIP] Refactory data loading for 
partition table
URL: https://github.com/apache/carbondata/pull/3535
 
 
   Be sure to do all of the following checklist to help us incorporate 
   your contribution quickly and easily:
   
- [ ] Any interfaces changed?

- [ ] Any backward compatibility impacted?

- [ ] Document update required?
   
- [ ] Testing done
   Please provide details on 
   - Whether new unit test cases have been added or why no new tests 
are required?
   - How it is tested? Please attach test report.
   - Is it a performance related change? Please attach the performance 
test report.
   - Any additional information to help reviewers in testing this 
change.
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 
   
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3502: [CARBONATA-3605] Remove global dictionary feature

2019-12-27 Thread GitBox
CarbonDataQA1 commented on issue #3502: [CARBONATA-3605] Remove global 
dictionary feature
URL: https://github.com/apache/carbondata/pull/3502#issuecomment-569250811
 
 
   Build Failed with Spark 2.2.1, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/1306/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3502: [CARBONATA-3605] Remove global dictionary feature

2019-12-27 Thread GitBox
CarbonDataQA1 commented on issue #3502: [CARBONATA-3605] Remove global 
dictionary feature
URL: https://github.com/apache/carbondata/pull/3502#issuecomment-569250645
 
 
   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1317/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3502: [CARBONATA-3605] Remove global dictionary feature

2019-12-27 Thread GitBox
CarbonDataQA1 commented on issue #3502: [CARBONATA-3605] Remove global 
dictionary feature
URL: https://github.com/apache/carbondata/pull/3502#issuecomment-569250385
 
 
   Build Failed  with Spark 2.1.0, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1297/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3481: [CARBONDATA-3548]Geospatial Support: add hash id create,query condition analyze and generate hash id list

2019-12-27 Thread GitBox
CarbonDataQA1 commented on issue #3481: [CARBONDATA-3548]Geospatial Support: 
add hash id create,query condition analyze and generate hash id list
URL: https://github.com/apache/carbondata/pull/3481#issuecomment-569248355
 
 
   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1315/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3530: [CARBONDATA-3629] Fix Select query failure on aggregation of same column on MV

2019-12-27 Thread GitBox
CarbonDataQA1 commented on issue #3530: [CARBONDATA-3629] Fix Select query 
failure on aggregation of same column on MV
URL: https://github.com/apache/carbondata/pull/3530#issuecomment-569248356
 
 
   Build Success with Spark 2.1.0, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1296/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3481: [CARBONDATA-3548]Geospatial Support: add hash id create,query condition analyze and generate hash id list

2019-12-27 Thread GitBox
CarbonDataQA1 commented on issue #3481: [CARBONDATA-3548]Geospatial Support: 
add hash id create,query condition analyze and generate hash id list
URL: https://github.com/apache/carbondata/pull/3481#issuecomment-569244437
 
 
   Build Success with Spark 2.2.1, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/1304/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3481: [CARBONDATA-3548]Geospatial Support: add hash id create,query condition analyze and generate hash id list

2019-12-27 Thread GitBox
CarbonDataQA1 commented on issue #3481: [CARBONDATA-3548]Geospatial Support: 
add hash id create,query condition analyze and generate hash id list
URL: https://github.com/apache/carbondata/pull/3481#issuecomment-569234688
 
 
   Build Success with Spark 2.1.0, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1295/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3481: [CARBONDATA-3548]Geospatial Support: add hash id create,query condition analyze and generate hash id list

2019-12-27 Thread GitBox
CarbonDataQA1 commented on issue #3481: [CARBONDATA-3548]Geospatial Support: 
add hash id create,query condition analyze and generate hash id list
URL: https://github.com/apache/carbondata/pull/3481#issuecomment-569231305
 
 
   Build Success with Spark 2.1.0, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1294/
   




[GitHub] [carbondata] VenuReddy2103 commented on issue #3481: [CARBONDATA-3548]Geospatial Support: add hash id create,query condition analyze and generate hash id list

2019-12-27 Thread GitBox
VenuReddy2103 commented on issue #3481: [CARBONDATA-3548]Geospatial Support: 
add hash id create,query condition analyze and generate hash id list
URL: https://github.com/apache/carbondata/pull/3481#issuecomment-569231139
 
 
   retest this please




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3502: [CARBONATA-3605] Remove global dictionary feature

2019-12-27 Thread GitBox
CarbonDataQA1 commented on issue #3502: [CARBONATA-3605] Remove global 
dictionary feature
URL: https://github.com/apache/carbondata/pull/3502#issuecomment-569230321
 
 
   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1313/
   




[GitHub] [carbondata] MarvinLitt commented on a change in pull request #3481: [CARBONDATA-3548]Geospatial Support: add hash id create,query condition analyze and generate hash id list

2019-12-27 Thread GitBox
MarvinLitt commented on a change in pull request #3481: 
[CARBONDATA-3548]Geospatial Support: add hash id create,query condition analyze 
and generate hash id list
URL: https://github.com/apache/carbondata/pull/3481#discussion_r361620991
 
 

 ##
 File path: geo/src/main/java/org/apache/carbondata/geo/GeoHashImpl.java
 ##
 @@ -0,0 +1,400 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.geo;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+
+import org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException;
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.util.CustomIndex;
+
+import org.apache.commons.lang3.StringUtils;
+
+import org.apache.log4j.Logger;
+
+/**
+ * GeoHash custom implementation.
+ * This class extends {@link CustomIndex}. It provides methods to:
+ * 1. Extract the sub-properties of the geohash type index handler, such as type,
+ * source columns, grid size, origin, and the min and max longitude and latitude
+ * of the data. Validates them and stores them in the instance.
+ * 2. Generate the index column value from the longitude and latitude column values.
+ * 3. Process the custom UDF filter queries based on the longitude and latitude
+ * columns.
+ */
+public class GeoHashImpl extends CustomIndex<List<Long[]>> {
+  private static final Logger LOGGER =
+  LogServiceFactory.getLogService(GeoHashImpl.class.getName());
+
+  // conversion factor of angle to radian
+  private static final double CONVERT_FACTOR = 180.0;
+  // Earth radius
+  private static final double EARTH_RADIUS = 6371004.0;
+  // Latitude of coordinate origin
+  private double oriLatitude;
+  // User defined maximum longitude of map
+  private double userDefineMaxLongitude;
+  // User defined maximum latitude of map
+  private double userDefineMaxLatitude;
+  // User defined map minimum longitude
+  private double userDefineMinLongitude;
+  // User defined map minimum latitude
+  private double userDefineMinLatitude;
+  // The maximum longitude of the completed map after calculation
+  private double calculateMaxLongitude;
+  // The maximum latitude of the completed map after calculation
+  private double calculateMaxLatitude;
+  // Grid length is in meters
+  private int gridSize;
+  // cos value of latitude of origin of coordinate
+  private double mCos;
+  // The degree of Y axis corresponding to each grid size length
+  private double deltaY;
+  // Each grid size length should be the degree of X axis
+  private double deltaX;
+  // Degree * coefficient of Y axis corresponding to each grid size length
+  private double deltaYByRatio;
+  // Each grid size length should be X-axis Degree * coefficient
+  private double deltaXByRatio;
+  // The number of cuts made over the whole area (one horizontal and one
+  // vertical per level), which is the depth of the quad tree
+  private int cutLevel;
+  // Used to convert the double-typed latitude and longitude to int for calculation
+  private int conversionRatio;
+  // Constant coefficient
+  private double lon0ByRation;
+  // Constant coefficient
+  private double lat0ByRation;
+
+
+  /**
+   * Initialize the geohash index handler instance.
+   * The properties are of the form:
+   * TBLPROPERTIES ('INDEX_HANDLER'='mygeohash',
+   * 'INDEX_HANDLER.mygeohash.type'='geohash',
+   * 'INDEX_HANDLER.mygeohash.sourcecolumns'='longitude, latitude',
+   * 'INDEX_HANDLER.mygeohash.gridSize'=''
+   * 'INDEX_HANDLER.mygeohash.minLongitude'=''
+   * 'INDEX_HANDLER.mygeohash.maxLongitude'=''
+   * 'INDEX_HANDLER.mygeohash.minLatitude'=''
+   * 'INDEX_HANDLER.mygeohash.maxLatitude'=''
+   * 'INDEX_HANDLER.mygeohash.orilatitude''')
+   * @param handlerName the class name of the generating algorithm
+   * @param properties input properties; see the description above
+   * @throws Exception
+   */
+  @Override
+  public void init(String handlerName, Map<String, String> properties) throws Exception {
+    String options = properties.get(CarbonCommonConstants.INDEX_HANDLER);
+    if (StringUtils.isEmpty(options)) {
+      throw new MalformedCarbonCommandException(
+  
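The grid-size fields declared above (`gridSize`, `deltaY`, `deltaX`, `mCos`) encode how a cell length in meters maps to degrees of latitude and longitude. A minimal sketch of that geometry, assuming a spherical Earth and reusing the class's `EARTH_RADIUS` and `CONVERT_FACTOR` constants; `GridGeometry` and its method names are illustrative only and not part of the patch, and the PR's actual formulas may differ:

```java
// Illustrative sketch only: GridGeometry is a hypothetical helper, not part of the PR.
public final class GridGeometry {
  // Conversion factor of angle to radian, as in GeoHashImpl
  private static final double CONVERT_FACTOR = 180.0;
  // Earth radius in meters, as in GeoHashImpl
  private static final double EARTH_RADIUS = 6371004.0;

  // Degrees of latitude spanned by one grid cell of gridSize meters:
  // one meter along a meridian corresponds to 180 / (PI * R) degrees.
  static double deltaYForGridSize(int gridSize) {
    return gridSize * CONVERT_FACTOR / (Math.PI * EARTH_RADIUS);
  }

  // Degrees of longitude spanned by one grid cell at a given latitude:
  // a parallel at latitude phi has radius R * cos(phi), so the same cell
  // length covers more degrees of longitude away from the equator.
  static double deltaXForGridSize(int gridSize, double oriLatitudeDegrees) {
    double mCos = Math.cos(oriLatitudeDegrees * Math.PI / CONVERT_FACTOR);
    return deltaYForGridSize(gridSize) / mCos;
  }

  public static void main(String[] args) {
    // A 50 m grid spans roughly 4.5e-4 degrees of latitude.
    System.out.println(deltaYForGridSize(50));
    // At 40 degrees north the longitude span of the same cell is wider.
    System.out.println(deltaXForGridSize(50, 40.0));
  }
}
```

This only illustrates why the class keeps both `deltaY` and a latitude-dependent `deltaX`; the exact coefficients (`deltaYByRatio`, `conversionRatio`) in the PR are not reproduced here.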

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3502: [CARBONATA-3605] Remove global dictionary feature

2019-12-27 Thread GitBox
CarbonDataQA1 commented on issue #3502: [CARBONATA-3605] Remove global 
dictionary feature
URL: https://github.com/apache/carbondata/pull/3502#issuecomment-569229675
 
 
   Build Failed with Spark 2.2.1, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/1302/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3502: [CARBONATA-3605] Remove global dictionary feature

2019-12-27 Thread GitBox
CarbonDataQA1 commented on issue #3502: [CARBONATA-3605] Remove global 
dictionary feature
URL: https://github.com/apache/carbondata/pull/3502#issuecomment-569229182
 
 
   Build Failed  with Spark 2.1.0, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1293/
   




[GitHub] [carbondata] asfgit closed pull request #3534: [HOTFIX] Fix UDF, Hex SQL Functions test case for binary

2019-12-27 Thread GitBox
asfgit closed pull request #3534: [HOTFIX] Fix UDF, Hex SQL Functions test case 
for binary
URL: https://github.com/apache/carbondata/pull/3534
 
 
   




[GitHub] [carbondata] jackylk commented on issue #3534: [HOTFIX] Fix UDF, Hex SQL Functions test case for binary

2019-12-27 Thread GitBox
jackylk commented on issue #3534: [HOTFIX] Fix UDF, Hex SQL Functions test case 
for binary
URL: https://github.com/apache/carbondata/pull/3534#issuecomment-569224916
 
 
   LGTM

