[GitHub] [carbondata] QiangCai commented on pull request #3813: [CARBONDATA-3876] Update cli test case

2020-06-28 Thread GitBox


QiangCai commented on pull request #3813:
URL: https://github.com/apache/carbondata/pull/3813#issuecomment-650867384


   LGTM



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Resolved] (CARBONDATA-3878) Should get the last modified time from 'tablestatus' file instead of segment file to reduce file operation 'getLastModifiedTime'

2020-06-28 Thread David Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Cai resolved CARBONDATA-3878.
---
Resolution: Fixed

> Should get the last modified time from 'tablestatus' file instead of segment 
> file to reduce file operation 'getLastModifiedTime' 
> -
>
> Key: CARBONDATA-3878
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3878
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: David Cai
>Assignee: David Cai
>Priority: Minor
>  Time Spent: 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (CARBONDATA-3878) Should get the last modified time from 'tablestatus' file instead of segment file to reduce file operation 'getLastModifiedTime'

2020-06-28 Thread David Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Cai reassigned CARBONDATA-3878:
-

Assignee: David Cai

> Should get the last modified time from 'tablestatus' file instead of segment 
> file to reduce file operation 'getLastModifiedTime' 
> -
>
> Key: CARBONDATA-3878
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3878
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: David Cai
>Assignee: David Cai
>Priority: Minor
>  Time Spent: 50m
>  Remaining Estimate: 0h
>






[GitHub] [carbondata] asfgit closed pull request #3814: [CARBONDATA-3878] Get last modified time from 'tablestatus' file entry instead of segment file to reduce file operation 'getLastModifiedTime'

2020-06-28 Thread GitBox


asfgit closed pull request #3814:
URL: https://github.com/apache/carbondata/pull/3814


   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3800: [CARBONDATA-3877] Reduce read tablestatus overhead during inserting into partition table

2020-06-28 Thread GitBox


CarbonDataQA1 commented on pull request #3800:
URL: https://github.com/apache/carbondata/pull/3800#issuecomment-650794400


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1512/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3800: [CARBONDATA-3877] Reduce read tablestatus overhead during inserting into partition table

2020-06-28 Thread GitBox


CarbonDataQA1 commented on pull request #3800:
URL: https://github.com/apache/carbondata/pull/3800#issuecomment-650794297


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3240/
   







[GitHub] [carbondata] asfgit closed pull request #3798: [CARBONDATA-3875] Support show segments with stage

2020-06-28 Thread GitBox


asfgit closed pull request #3798:
URL: https://github.com/apache/carbondata/pull/3798


   







[GitHub] [carbondata] jackylk commented on pull request #3798: [CARBONDATA-3875] Support show segments with stage

2020-06-28 Thread GitBox


jackylk commented on pull request #3798:
URL: https://github.com/apache/carbondata/pull/3798#issuecomment-650778971


   LGTM







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3814: [CARBONDATA-3878] Get last modified time from 'tablestatus' file entry instead of segment file to reduce file operation 'getLastModifiedTime'

2020-06-28 Thread GitBox


CarbonDataQA1 commented on pull request #3814:
URL: https://github.com/apache/carbondata/pull/3814#issuecomment-650770839


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3239/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3814: [CARBONDATA-3878] Get last modified time from 'tablestatus' file entry instead of segment file to reduce file operation 'getLastModifiedTime'

2020-06-28 Thread GitBox


CarbonDataQA1 commented on pull request #3814:
URL: https://github.com/apache/carbondata/pull/3814#issuecomment-650770324


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1511/
   







[GitHub] [carbondata] marchpure commented on a change in pull request #3798: [CARBONDATA-3875] Support show segments with stage

2020-06-28 Thread GitBox


marchpure commented on a change in pull request #3798:
URL: https://github.com/apache/carbondata/pull/3798#discussion_r446657621



##
File path: 
integration/spark/src/main/scala/org/apache/carbondata/api/CarbonStore.scala
##
@@ -63,13 +68,101 @@ object CarbonStore {
 }
 
 if (limit.isDefined) {
-  val lim = Integer.parseInt(limit.get)
-  segmentsMetadataDetails.slice(0, lim)
+  segmentsMetadataDetails.slice(0, limit.get)
 } else {
   segmentsMetadataDetails
 }
   }
 
+  /**
+   * Read stage files and return input files
+   */
+  def readStages(
+  tablePath: String,
+  configuration: Configuration): Seq[StageInput] = {
+val stageFiles = listStageFiles(
+  CarbonTablePath.getStageDir(tablePath), configuration)
+var output = Collections.synchronizedList(new util.ArrayList[StageInput]())
+output.addAll(readStageInput(stageFiles._1,
+  StageInput.StageStatus.Unload).asJavaCollection)
+output.addAll(readStageInput(stageFiles._2,
+  StageInput.StageStatus.Loading).asJavaCollection)
+Collections.sort(output, new Comparator[StageInput]() {
+  def compare(stageInput1: StageInput, stageInput2: StageInput): Int = {
+(stageInput2.getCreateTime - stageInput1.getCreateTime).intValue()
+  }
+})
+output.asScala
+  }
+
+  /**
+   * Read stage files and return input files
+   */
+  def readStageInput(
+  stageFiles: Seq[CarbonFile],
+  status: StageInput.StageStatus): Seq[StageInput] = {
+val gson = new Gson()
+val output = Collections.synchronizedList(new util.ArrayList[StageInput]())
+stageFiles.map { stage =>
+  val filePath = stage.getAbsolutePath
+  val stream = FileFactory.getDataInputStream(filePath)
+  try {
+val stageInput = gson.fromJson(new InputStreamReader(stream), 
classOf[StageInput])
+stageInput.setCreateTime(stage.getLastModifiedTime)
+stageInput.setStatus(status)
+output.add(stageInput)
+  } finally {
+stream.close()
+  }
+}
+output.asScala
+  }
+
+  /*
+   * Collect all stage files and matched success files.
+   * A stage file without success file will not be collected
+   */
+  def listStageFiles(

Review comment:
   not only for test, it will be used in CarbonStore.scala
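A side note on the `readStages` code quoted above: its comparator subtracts two `Long` create times and truncates with `intValue()`, which can overflow and misorder entries when timestamps are far apart. A standalone sketch of an overflow-safe "newest first" sort (`StageInput` here is a stand-in case class, not the CarbonData class):

```scala
// Standalone sketch: overflow-safe "newest first" ordering of stage inputs.
// StageInput is a stand-in for the real class; only createTime matters here.
object StageSortSketch {
  final case class StageInput(createTime: Long)

  // Ordering[Long].reverse compares without subtraction, so it cannot overflow.
  def sortNewestFirst(stages: Seq[StageInput]): Seq[StageInput] =
    stages.sortBy(_.createTime)(Ordering[Long].reverse)
}
```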

##
File path: docs/segment-management-on-carbondata.md
##
@@ -32,7 +32,7 @@ concept which helps to maintain consistency of data and easy 
transaction managem
 
   ```
   SHOW [HISTORY] SEGMENTS
-  [FOR TABLE | ON] [db_name.]table_name [LIMIT number_of_segments]
+  [FOR TABLE | ON] [db_name.]table_name [INCLUDE STAGE] [LIMIT 
number_of_segments]

Review comment:
   modified

##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonShowSegmentsAsSelectCommand.scala
##
@@ -35,7 +36,8 @@ case class CarbonShowSegmentsAsSelectCommand(
 databaseNameOp: Option[String],
 tableName: String,
 query: String,
-limit: Option[String],
+limit: Option[Int],
+includeStage: Boolean = false,

Review comment:
   modified









[GitHub] [carbondata] marchpure commented on a change in pull request #3798: [CARBONDATA-3875] Support show segments with stage

2020-06-28 Thread GitBox


marchpure commented on a change in pull request #3798:
URL: https://github.com/apache/carbondata/pull/3798#discussion_r446657569



##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonShowSegmentsCommand.scala
##
@@ -17,34 +17,40 @@
 
 package org.apache.spark.sql.execution.command.management
 
+import scala.collection.JavaConverters._
+
+import org.apache.hadoop.conf.Configuration
 import org.apache.spark.sql.{CarbonEnv, Row, SparkSession}
 import org.apache.spark.sql.catalyst.expressions.{Attribute, 
AttributeReference}
 import org.apache.spark.sql.execution.command.{Checker, DataCommand}
 import org.apache.spark.sql.types.StringType
 
-import org.apache.carbondata.api.CarbonStore.{getDataAndIndexSize, 
getLoadStartTime, getLoadTimeTaken, getPartitions, readSegments}
+import org.apache.carbondata.api.CarbonStore.{getDataAndIndexSize, 
getLoadStartTime, getLoadTimeTaken, getPartitions, readSegments, readStages}
 import org.apache.carbondata.common.Strings
 import 
org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException
-import org.apache.carbondata.core.statusmanager.LoadMetadataDetails
+import org.apache.carbondata.core.statusmanager.{LoadMetadataDetails, 
StageInput}
+import org.apache.carbondata.core.util.path.CarbonTablePath
+
 
 case class CarbonShowSegmentsCommand(
 databaseNameOp: Option[String],
 tableName: String,
-limit: Option[String],
+limit: Option[Int],
+includeStage: Boolean = false,

Review comment:
   modified

##
File path: 
integration/spark/src/main/scala/org/apache/carbondata/api/CarbonStore.scala
##
@@ -63,13 +68,101 @@ object CarbonStore {
 }
 
 if (limit.isDefined) {
-  val lim = Integer.parseInt(limit.get)
-  segmentsMetadataDetails.slice(0, lim)
+  segmentsMetadataDetails.slice(0, limit.get)
 } else {
   segmentsMetadataDetails
 }
   }
 
+  /**
+   * Read stage files and return input files
+   */
+  def readStages(
+  tablePath: String,
+  configuration: Configuration): Seq[StageInput] = {
+val stageFiles = listStageFiles(
+  CarbonTablePath.getStageDir(tablePath), configuration)
+var output = Collections.synchronizedList(new util.ArrayList[StageInput]())
+output.addAll(readStageInput(stageFiles._1,
+  StageInput.StageStatus.Unload).asJavaCollection)
+output.addAll(readStageInput(stageFiles._2,
+  StageInput.StageStatus.Loading).asJavaCollection)
+Collections.sort(output, new Comparator[StageInput]() {
+  def compare(stageInput1: StageInput, stageInput2: StageInput): Int = {
+(stageInput2.getCreateTime - stageInput1.getCreateTime).intValue()
+  }
+})
+output.asScala
+  }
+
+  /**
+   * Read stage files and return input files
+   */
+  def readStageInput(
+  stageFiles: Seq[CarbonFile],
+  status: StageInput.StageStatus): Seq[StageInput] = {
+val gson = new Gson()
+val output = Collections.synchronizedList(new util.ArrayList[StageInput]())
+stageFiles.map { stage =>
+  val filePath = stage.getAbsolutePath
+  val stream = FileFactory.getDataInputStream(filePath)
+  try {
+val stageInput = gson.fromJson(new InputStreamReader(stream), 
classOf[StageInput])
+stageInput.setCreateTime(stage.getLastModifiedTime)
+stageInput.setStatus(status)
+output.add(stageInput)
+  } finally {
+stream.close()
+  }
+}
+output.asScala
+  }
+
+  /*
+   * Collect all stage files and matched success files.
+   * A stage file without success file will not be collected
+   */
+  def listStageFiles(
+loadDetailsDir: String,
+hadoopConf: Configuration): (Array[CarbonFile], Array[CarbonFile]) = {
+val dir = FileFactory.getCarbonFile(loadDetailsDir, hadoopConf)
+if (dir.exists()) {
+  var allFiles = dir.listFiles()
+  val successFiles = allFiles.filter { file =>
+file.getName.endsWith(CarbonTablePath.SUCCESS_FILE_SUBFIX)
+  }.map { file =>
+(file.getName.substring(0, file.getName.indexOf(".")), file)
+  }.toMap
+  val loadingFiles = allFiles.filter { file =>
+file.getName.endsWith(CarbonTablePath.LOADING_FILE_SUBFIX)
+  }.map { file =>
+(file.getName.substring(0, file.getName.indexOf(".")), file)
+  }.toMap
+
+  allFiles = allFiles.filter { file =>
+!file.getName.endsWith(CarbonTablePath.SUCCESS_FILE_SUBFIX)
+  }.filter { file =>
+!file.getName.endsWith(CarbonTablePath.LOADING_FILE_SUBFIX)
+  }
+
+  val unloadedFiles = allFiles.filter { file =>
+successFiles.contains(file.getName)

Review comment:
   modified
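The pairing rule described in the comment above (a stage file is collected only when a matching success file exists) can be sketched standalone. The suffix values below are assumptions mirroring the `SUBFIX` constants in the diff, and matching is done consistently on base names, which is the point this review thread raised:

```scala
// Standalone sketch of the stage/success pairing rule; the ".success" and
// ".loading" suffixes are assumptions, not the actual constant values.
object StagePairingSketch {
  val SuccessSuffix = ".success"
  val LoadingSuffix = ".loading"

  // Base name = file name up to the first dot; whole name if there is none.
  def baseName(name: String): String = {
    val i = name.indexOf('.')
    if (i < 0) name else name.substring(0, i)
  }

  // Keep only stage files whose base name has a matching success file.
  def collectUnloaded(names: Seq[String]): Seq[String] = {
    val success = names.filter(_.endsWith(SuccessSuffix)).map(baseName).toSet
    names
      .filterNot(n => n.endsWith(SuccessSuffix) || n.endsWith(LoadingSuffix))
      .filter(n => success.contains(baseName(n)))
  }
}
```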

##
File path: 
integration/spark/src/main/scala/org/apache/carbondata/api/CarbonStore.scala
##
@@ -63,13 +68,101 @@ object CarbonStore {
 }
 
 if (limit.isDefined) {
-  val lim = Integer.parseInt(limit.get)
-  

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3798: [CARBONDATA-3875] Support show segments with stage

2020-06-28 Thread GitBox


CarbonDataQA1 commented on pull request #3798:
URL: https://github.com/apache/carbondata/pull/3798#issuecomment-650761126


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3237/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3798: [CARBONDATA-3875] Support show segments with stage

2020-06-28 Thread GitBox


CarbonDataQA1 commented on pull request #3798:
URL: https://github.com/apache/carbondata/pull/3798#issuecomment-650760914


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1509/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3813: [CARBONDATA-3876] Update cli test case

2020-06-28 Thread GitBox


CarbonDataQA1 commented on pull request #3813:
URL: https://github.com/apache/carbondata/pull/3813#issuecomment-650759599


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1508/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3813: [CARBONDATA-3876] Update cli test case

2020-06-28 Thread GitBox


CarbonDataQA1 commented on pull request #3813:
URL: https://github.com/apache/carbondata/pull/3813#issuecomment-650759563


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3236/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3800: [CARBONDATA-3877] Reduce read tablestatus overhead during inserting into partition table

2020-06-28 Thread GitBox


CarbonDataQA1 commented on pull request #3800:
URL: https://github.com/apache/carbondata/pull/3800#issuecomment-650756474


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1510/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3800: [CARBONDATA-3877] Reduce read tablestatus overhead during inserting into partition table

2020-06-28 Thread GitBox


CarbonDataQA1 commented on pull request #3800:
URL: https://github.com/apache/carbondata/pull/3800#issuecomment-650755888


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3238/
   







[GitHub] [carbondata] niuge01 commented on pull request #3814: [CARBONDATA-3878] Get last modified time from 'tablestatus' file entry instead of segment file to reduce file operation 'getLastModifiedTime'

2020-06-28 Thread GitBox


niuge01 commented on pull request #3814:
URL: https://github.com/apache/carbondata/pull/3814#issuecomment-650745072


   LGTM







[GitHub] [carbondata] marchpure closed pull request #3794: [WIP] refresh segment index cache performance issue

2020-06-28 Thread GitBox


marchpure closed pull request #3794:
URL: https://github.com/apache/carbondata/pull/3794


   







[GitHub] [carbondata] QiangCai opened a new pull request #3814: [CARBONDATA-3878] Get last modified time from 'tablestatus' file entry instead of segment file to reduce file operation 'getLastModifiedTime'

2020-06-28 Thread GitBox


QiangCai opened a new pull request #3814:
URL: https://github.com/apache/carbondata/pull/3814


### Why is this PR needed?
When a table has many segments, running the file operation 
'getLastModifiedTime' on every segment file takes a long time.

### What changes were proposed in this PR?
Get last modified time from 'tablestatus' file entry instead of segment 
file to reduce file operation 'getLastModifiedTime'
   
### Does this PR introduce any user interface change?
- No
   
### Is any new testcase added?
- No
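The change proposed above can be sketched as follows; `LoadEntry` and its field names are illustrative stand-ins, not the actual CarbonData classes:

```scala
// Illustrative sketch: read per-segment modification times from the
// already-loaded tablestatus entries (one in-memory pass) instead of
// issuing one getLastModifiedTime file operation per segment file.
object LastModifiedSketch {
  final case class LoadEntry(segmentId: String, lastModifiedTime: Long)

  def lastModifiedTimes(entries: Seq[LoadEntry]): Map[String, Long] =
    entries.map(e => e.segmentId -> e.lastModifiedTime).toMap
}
```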
   
   
   







[jira] [Created] (CARBONDATA-3878) Should get the last modified time from 'tablestatus' file instead of segment file to reduce file operation 'getLastModifiedTime'

2020-06-28 Thread David Cai (Jira)
David Cai created CARBONDATA-3878:
-

 Summary: Should get the last modified time from 'tablestatus' file 
instead of segment file to reduce file operation 'getLastModifiedTime' 
 Key: CARBONDATA-3878
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3878
 Project: CarbonData
  Issue Type: Improvement
Reporter: David Cai








[jira] [Created] (CARBONDATA-3877) Reduce read tablestatus overhead during inserting into partition table

2020-06-28 Thread Xingjun Hao (Jira)
Xingjun Hao created CARBONDATA-3877:
---

 Summary: Reduce read tablestatus overhead during inserting into 
partition table
 Key: CARBONDATA-3877
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3877
 Project: CarbonData
  Issue Type: Improvement
  Components: spark-integration
Affects Versions: 2.0.0
Reporter: Xingjun Hao
 Fix For: 2.0.2


Currently, inserting into a partition table triggers many tablestatus read 
operations. When the table status file is stored in an object store, reading 
it may fail (with an IOException or JsonSyntaxException) while the file is 
being modified, which leads to a high failure rate for concurrent inserts 
into a partition table.
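One common mitigation for the transient read failures described above is a bounded retry around the tablestatus read. This is a generic sketch, not the approach the PR takes; the read function is a stand-in for parsing the tablestatus file:

```scala
import scala.util.{Failure, Success, Try}

// Generic sketch: retry a read up to `attempts` times, rethrowing the last
// failure if every attempt fails.
object RetryReadSketch {
  @annotation.tailrec
  def readWithRetry[T](attempts: Int)(read: () => T): T =
    Try(read()) match {
      case Success(v)                 => v
      case Failure(_) if attempts > 1 => readWithRetry(attempts - 1)(read)
      case Failure(e)                 => throw e
    }
}
```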





[jira] [Created] (CARBONDATA-3876) Update cli test case

2020-06-28 Thread Manhua Jiang (Jira)
Manhua Jiang created CARBONDATA-3876:


 Summary: Update cli test case
 Key: CARBONDATA-3876
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3876
 Project: CarbonData
  Issue Type: Improvement
Reporter: Manhua Jiang








[GitHub] [carbondata] kevinjmh opened a new pull request #3813: [CARBONDATA-3876] Update cli test case

2020-06-28 Thread GitBox


kevinjmh opened a new pull request #3813:
URL: https://github.com/apache/carbondata/pull/3813


### Why is this PR needed?
The test case wants to validate the min/max range of non-sorted columns. To 
make the result deterministic, all columns are set as sort columns.

### What changes were proposed in this PR?
   set all columns to be sort columns, and update test output
   
### Does this PR introduce any user interface change?
- No
- Yes. (please explain the change and update document)
   
### Is any new testcase added?
- No
- Yes
   
   
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3798: [CARBONDATA-3875] Support show segments include stage

2020-06-28 Thread GitBox


CarbonDataQA1 commented on pull request #3798:
URL: https://github.com/apache/carbondata/pull/3798#issuecomment-650731827


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1507/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3798: [CARBONDATA-3875] Support show segments include stage

2020-06-28 Thread GitBox


CarbonDataQA1 commented on pull request #3798:
URL: https://github.com/apache/carbondata/pull/3798#issuecomment-650730601


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3235/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3804: [CARBONDATA-3871] Optimize performance when getting row from heap

2020-06-28 Thread GitBox


CarbonDataQA1 commented on pull request #3804:
URL: https://github.com/apache/carbondata/pull/3804#issuecomment-650724764


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1506/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3804: [CARBONDATA-3871] Optimize performance when getting row from heap

2020-06-28 Thread GitBox


CarbonDataQA1 commented on pull request #3804:
URL: https://github.com/apache/carbondata/pull/3804#issuecomment-650724384


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3234/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3812: [wip]fix global sort compaction issue

2020-06-28 Thread GitBox


CarbonDataQA1 commented on pull request #3812:
URL: https://github.com/apache/carbondata/pull/3812#issuecomment-650723297


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3233/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3812: [wip]fix global sort compaction issue

2020-06-28 Thread GitBox


CarbonDataQA1 commented on pull request #3812:
URL: https://github.com/apache/carbondata/pull/3812#issuecomment-650722692


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1505/
   







[jira] [Updated] (CARBONDATA-3865) Implement delete and update feature in carbondata SDK.

2020-06-28 Thread Karanpreet Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karanpreet Singh updated CARBONDATA-3865:
-
Attachment: Implement delete and update feature in carbondata SDK_V2.pdf

> Implement delete and update feature in carbondata SDK.
> --
>
> Key: CARBONDATA-3865
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3865
> Project: CarbonData
>  Issue Type: New Feature
>Reporter: Karanpreet Singh
>Priority: Major
> Attachments: Implement delete and update feature in carbondata 
> SDK.pdf, Implement delete and update feature in carbondata SDK_V2.pdf
>
>
> Please find the design document attached.





[GitHub] [carbondata] QiangCai commented on pull request #3801: adding comments with issue numbers.

2020-06-28 Thread GitBox


QiangCai commented on pull request #3801:
URL: https://github.com/apache/carbondata/pull/3801#issuecomment-650709366


   If it helps us understand the code, we can add the issue number into the 
code in a few cases; otherwise there is no need to.
   But this PR is not that case: it added all issue numbers into the code 
directly without checking whether each one is needed.
   
   Please close this PR without merging.
   
   







[GitHub] [carbondata] akashrn5 opened a new pull request #3812: [wip]fix global sort compaction issue

2020-06-28 Thread GitBox


akashrn5 opened a new pull request #3812:
URL: https://github.com/apache/carbondata/pull/3812


### Why is this PR needed?


### What changes were proposed in this PR?
   
   
### Does this PR introduce any user interface change?
- No
- Yes. (please explain the change and update document)
   
### Is any new testcase added?
- No
- Yes
   
   
   







[GitHub] [carbondata] jackylk commented on a change in pull request #3798: [CARBONDATA-3875] Support show segments include stage

2020-06-28 Thread GitBox


jackylk commented on a change in pull request #3798:
URL: https://github.com/apache/carbondata/pull/3798#discussion_r446609866



##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonShowSegmentsCommand.scala
##
@@ -17,34 +17,40 @@
 
 package org.apache.spark.sql.execution.command.management
 
+import scala.collection.JavaConverters._
+
+import org.apache.hadoop.conf.Configuration
 import org.apache.spark.sql.{CarbonEnv, Row, SparkSession}
 import org.apache.spark.sql.catalyst.expressions.{Attribute, 
AttributeReference}
 import org.apache.spark.sql.execution.command.{Checker, DataCommand}
 import org.apache.spark.sql.types.StringType
 
-import org.apache.carbondata.api.CarbonStore.{getDataAndIndexSize, 
getLoadStartTime, getLoadTimeTaken, getPartitions, readSegments}
+import org.apache.carbondata.api.CarbonStore.{getDataAndIndexSize, 
getLoadStartTime, getLoadTimeTaken, getPartitions, readSegments, readStages}
 import org.apache.carbondata.common.Strings
 import 
org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException
-import org.apache.carbondata.core.statusmanager.LoadMetadataDetails
+import org.apache.carbondata.core.statusmanager.{LoadMetadataDetails, 
StageInput}
+import org.apache.carbondata.core.util.path.CarbonTablePath
+
 
 case class CarbonShowSegmentsCommand(
 databaseNameOp: Option[String],
 tableName: String,
-limit: Option[String],
+limit: Option[Int],
+includeStage: Boolean = false,

Review comment:
   add as last param









[GitHub] [carbondata] jackylk commented on a change in pull request #3798: [CARBONDATA-3875] Support show segments include stage

2020-06-28 Thread GitBox


jackylk commented on a change in pull request #3798:
URL: https://github.com/apache/carbondata/pull/3798#discussion_r446609807



##
File path: 
integration/spark/src/main/scala/org/apache/carbondata/api/CarbonStore.scala
##
@@ -63,13 +68,101 @@ object CarbonStore {
 }
 
 if (limit.isDefined) {
-  val lim = Integer.parseInt(limit.get)
-  segmentsMetadataDetails.slice(0, lim)
+  segmentsMetadataDetails.slice(0, limit.get)
 } else {
   segmentsMetadataDetails
 }
   }
 
+  /**
+   * Read stage files and return input files
+   */
+  def readStages(
+  tablePath: String,
+  configuration: Configuration): Seq[StageInput] = {
+val stageFiles = listStageFiles(
+  CarbonTablePath.getStageDir(tablePath), configuration)
+var output = Collections.synchronizedList(new util.ArrayList[StageInput]())
+output.addAll(readStageInput(stageFiles._1,
+  StageInput.StageStatus.Unload).asJavaCollection)
+output.addAll(readStageInput(stageFiles._2,
+  StageInput.StageStatus.Loading).asJavaCollection)
+Collections.sort(output, new Comparator[StageInput]() {
+  def compare(stageInput1: StageInput, stageInput2: StageInput): Int = {
+(stageInput2.getCreateTime - stageInput1.getCreateTime).intValue()
+  }
+})
+output.asScala
+  }
+
+  /**
+   * Read stage files and return input files
+   */
+  def readStageInput(
+  stageFiles: Seq[CarbonFile],
+  status: StageInput.StageStatus): Seq[StageInput] = {
+val gson = new Gson()
+val output = Collections.synchronizedList(new util.ArrayList[StageInput]())
+stageFiles.map { stage =>
+  val filePath = stage.getAbsolutePath
+  val stream = FileFactory.getDataInputStream(filePath)
+  try {
+val stageInput = gson.fromJson(new InputStreamReader(stream), 
classOf[StageInput])
+stageInput.setCreateTime(stage.getLastModifiedTime)
+stageInput.setStatus(status)
+output.add(stageInput)
+  } finally {
+stream.close()
+  }
+}
+output.asScala
+  }
+
+  /*
+   * Collect all stage files and matched success files.
+   * A stage file without success file will not be collected
+   */
+  def listStageFiles(
+loadDetailsDir: String,
+hadoopConf: Configuration): (Array[CarbonFile], Array[CarbonFile]) = {
+val dir = FileFactory.getCarbonFile(loadDetailsDir, hadoopConf)
+if (dir.exists()) {
+  var allFiles = dir.listFiles()
+  val successFiles = allFiles.filter { file =>
+file.getName.endsWith(CarbonTablePath.SUCCESS_FILE_SUBFIX)
+  }.map { file =>
+(file.getName.substring(0, file.getName.indexOf(".")), file)
+  }.toMap
+  val loadingFiles = allFiles.filter { file =>
+file.getName.endsWith(CarbonTablePath.LOADING_FILE_SUBFIX)
+  }.map { file =>
+(file.getName.substring(0, file.getName.indexOf(".")), file)
+  }.toMap
+
+  allFiles = allFiles.filter { file =>
+!file.getName.endsWith(CarbonTablePath.SUCCESS_FILE_SUBFIX)
+  }.filter { file =>
+!file.getName.endsWith(CarbonTablePath.LOADING_FILE_SUBFIX)
+  }
+
+  val unloadedFiles = allFiles.filter { file =>
+successFiles.contains(file.getName)

Review comment:
   Maybe you can first construct the expected file name with "file.getName 
+ CarbonTablePath.LOADING_FILE_SUBFIX", and then still use endsWith. I think it 
could be a little faster.
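A minimal sketch of what that suggestion might look like. The suffix value and object name below are illustrative stand-ins, not the actual `CarbonTablePath` constants:

```scala
object StageFileFilter {
  // Stand-in for CarbonTablePath.SUCCESS_FILE_SUBFIX.
  val SuccessSubfix = ".success"

  // Keep only stage files whose "<name><SuccessSubfix>" companion exists.
  // The expected companion name is built once per file, instead of
  // substring-splitting every name to use as a Map key.
  def unloadedFiles(allFiles: Seq[String]): Seq[String] = {
    val successNames = allFiles.filter(_.endsWith(SuccessSubfix)).toSet
    allFiles
      .filterNot(_.endsWith(SuccessSubfix))
      .filter(name => successNames.contains(name + SuccessSubfix))
  }
}
```

For example, `StageFileFilter.unloadedFiles(Seq("s0", "s0.success", "s1"))` keeps only `"s0"`, since `"s1"` has no success companion.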









[GitHub] [carbondata] jackylk commented on a change in pull request #3798: [CARBONDATA-3875] Support show segments include stage

2020-06-28 Thread GitBox


jackylk commented on a change in pull request #3798:
URL: https://github.com/apache/carbondata/pull/3798#discussion_r446609417



##
File path: 
integration/spark/src/main/scala/org/apache/carbondata/api/CarbonStore.scala
##
@@ -63,13 +68,101 @@ object CarbonStore {
 }
 
 if (limit.isDefined) {
-  val lim = Integer.parseInt(limit.get)
-  segmentsMetadataDetails.slice(0, lim)
+  segmentsMetadataDetails.slice(0, limit.get)
 } else {
   segmentsMetadataDetails
 }
   }
 
+  /**
+   * Read stage files and return input files
+   */
+  def readStages(
+  tablePath: String,
+  configuration: Configuration): Seq[StageInput] = {
+val stageFiles = listStageFiles(
+  CarbonTablePath.getStageDir(tablePath), configuration)
+var output = Collections.synchronizedList(new util.ArrayList[StageInput]())
+output.addAll(readStageInput(stageFiles._1,
+  StageInput.StageStatus.Unload).asJavaCollection)
+output.addAll(readStageInput(stageFiles._2,
+  StageInput.StageStatus.Loading).asJavaCollection)
+Collections.sort(output, new Comparator[StageInput]() {
+  def compare(stageInput1: StageInput, stageInput2: StageInput): Int = {
+(stageInput2.getCreateTime - stageInput1.getCreateTime).intValue()
+  }
+})
+output.asScala
+  }
+
+  /**
+   * Read stage files and return input files
+   */
+  def readStageInput(
+  stageFiles: Seq[CarbonFile],
+  status: StageInput.StageStatus): Seq[StageInput] = {
+val gson = new Gson()
+val output = Collections.synchronizedList(new util.ArrayList[StageInput]())
+stageFiles.map { stage =>
+  val filePath = stage.getAbsolutePath
+  val stream = FileFactory.getDataInputStream(filePath)
+  try {
+val stageInput = gson.fromJson(new InputStreamReader(stream), 
classOf[StageInput])
+stageInput.setCreateTime(stage.getLastModifiedTime)
+stageInput.setStatus(status)
+output.add(stageInput)
+  } finally {
+stream.close()
+  }
+}
+output.asScala
+  }
+
+  /*
+   * Collect all stage files and matched success files.
+   * A stage file without success file will not be collected
+   */
+  def listStageFiles(
+loadDetailsDir: String,
+hadoopConf: Configuration): (Array[CarbonFile], Array[CarbonFile]) = {
+val dir = FileFactory.getCarbonFile(loadDetailsDir, hadoopConf)
+if (dir.exists()) {
+  var allFiles = dir.listFiles()
+  val successFiles = allFiles.filter { file =>
+file.getName.endsWith(CarbonTablePath.SUCCESS_FILE_SUBFIX)
+  }.map { file =>
+(file.getName.substring(0, file.getName.indexOf(".")), file)
+  }.toMap
+  val loadingFiles = allFiles.filter { file =>
+file.getName.endsWith(CarbonTablePath.LOADING_FILE_SUBFIX)
+  }.map { file =>
+(file.getName.substring(0, file.getName.indexOf(".")), file)
+  }.toMap
+
+  allFiles = allFiles.filter { file =>

Review comment:
   Create another variable for this instead of reassigning `allFiles`.
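A small sketch of the refactor being suggested, with illustrative suffix values standing in for the `CarbonTablePath` constants:

```scala
object StageFiles {
  // Stand-ins for CarbonTablePath.SUCCESS_FILE_SUBFIX / LOADING_FILE_SUBFIX.
  val SuccessSubfix = ".success"
  val LoadingSubfix = ".loading"

  // Instead of reassigning the `allFiles` var, bind the filtered result to a
  // new immutable val whose name says what it now holds.
  def stageDataFiles(allFiles: Seq[String]): Seq[String] = {
    val withoutMarkerFiles = allFiles.filterNot { f =>
      f.endsWith(SuccessSubfix) || f.endsWith(LoadingSubfix)
    }
    withoutMarkerFiles
  }
}
```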









[GitHub] [carbondata] jackylk commented on a change in pull request #3798: [CARBONDATA-3875] Support show segments include stage

2020-06-28 Thread GitBox


jackylk commented on a change in pull request #3798:
URL: https://github.com/apache/carbondata/pull/3798#discussion_r446609223



##
File path: 
integration/spark/src/main/scala/org/apache/carbondata/api/CarbonStore.scala
##
@@ -63,13 +68,101 @@ object CarbonStore {
 }
 
 if (limit.isDefined) {
-  val lim = Integer.parseInt(limit.get)
-  segmentsMetadataDetails.slice(0, lim)
+  segmentsMetadataDetails.slice(0, limit.get)
 } else {
   segmentsMetadataDetails
 }
   }
 
+  /**
+   * Read stage files and return input files
+   */
+  def readStages(
+  tablePath: String,
+  configuration: Configuration): Seq[StageInput] = {
+val stageFiles = listStageFiles(
+  CarbonTablePath.getStageDir(tablePath), configuration)
+var output = Collections.synchronizedList(new util.ArrayList[StageInput]())
+output.addAll(readStageInput(stageFiles._1,
+  StageInput.StageStatus.Unload).asJavaCollection)
+output.addAll(readStageInput(stageFiles._2,
+  StageInput.StageStatus.Loading).asJavaCollection)
+Collections.sort(output, new Comparator[StageInput]() {
+  def compare(stageInput1: StageInput, stageInput2: StageInput): Int = {
+(stageInput2.getCreateTime - stageInput1.getCreateTime).intValue()
+  }
+})
+output.asScala
+  }
+
+  /**
+   * Read stage files and return input files
+   */
+  def readStageInput(
+  stageFiles: Seq[CarbonFile],
+  status: StageInput.StageStatus): Seq[StageInput] = {
+val gson = new Gson()
+val output = Collections.synchronizedList(new util.ArrayList[StageInput]())
+stageFiles.map { stage =>
+  val filePath = stage.getAbsolutePath
+  val stream = FileFactory.getDataInputStream(filePath)
+  try {
+val stageInput = gson.fromJson(new InputStreamReader(stream), 
classOf[StageInput])
+stageInput.setCreateTime(stage.getLastModifiedTime)
+stageInput.setStatus(status)
+output.add(stageInput)
+  } finally {
+stream.close()
+  }
+}
+output.asScala
+  }
+
+  /*
+   * Collect all stage files and matched success files.
+   * A stage file without success file will not be collected
+   */
+  def listStageFiles(

Review comment:
   It seems this is only used for testing; can you add it to the test folder 
instead of adding a new API?









[GitHub] [carbondata] jackylk commented on a change in pull request #3798: [CARBONDATA-3875] Support show segments include stage

2020-06-28 Thread GitBox


jackylk commented on a change in pull request #3798:
URL: https://github.com/apache/carbondata/pull/3798#discussion_r446608882



##
File path: docs/segment-management-on-carbondata.md
##
@@ -32,7 +32,7 @@ concept which helps to maintain consistency of data and easy 
transaction managem
 
   ```
   SHOW [HISTORY] SEGMENTS
-  [FOR TABLE | ON] [db_name.]table_name [LIMIT number_of_segments]
+  [FOR TABLE | ON] [db_name.]table_name [INCLUDE STAGE] [LIMIT 
number_of_segments]

Review comment:
   Can the user use "WITH STAGE" together with a segment query?









[GitHub] [carbondata] jackylk commented on a change in pull request #3798: [CARBONDATA-3875] Support show segments include stage

2020-06-28 Thread GitBox


jackylk commented on a change in pull request #3798:
URL: https://github.com/apache/carbondata/pull/3798#discussion_r446608764



##
File path: docs/segment-management-on-carbondata.md
##
@@ -32,7 +32,7 @@ concept which helps to maintain consistency of data and easy 
transaction managem
 
   ```
   SHOW [HISTORY] SEGMENTS
-  [FOR TABLE | ON] [db_name.]table_name [LIMIT number_of_segments]
+  [FOR TABLE | ON] [db_name.]table_name [INCLUDE STAGE] [LIMIT 
number_of_segments]

Review comment:
   I suggest using "WITH STAGE".









[GitHub] [carbondata] jackylk commented on a change in pull request #3798: [CARBONDATA-3875] Support show segments include stage

2020-06-28 Thread GitBox


jackylk commented on a change in pull request #3798:
URL: https://github.com/apache/carbondata/pull/3798#discussion_r446607939



##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonShowSegmentsAsSelectCommand.scala
##
@@ -35,7 +36,8 @@ case class CarbonShowSegmentsAsSelectCommand(
 databaseNameOp: Option[String],
 tableName: String,
 query: String,
-limit: Option[String],
+limit: Option[Int],
+includeStage: Boolean = false,

Review comment:
   Add the new parameter at the last position.




