[GitHub] carbondata issue #1436: [WIP][CARBONDATA-1617] Merging carbonindex files wit...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1436 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1320/ ---
[GitHub] carbondata issue #1436: [WIP][CARBONDATA-1617] Merging carbonindex files wit...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1436 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/684/ ---
[GitHub] carbondata issue #1436: [WIP][CARBONDATA-1617] Merging carbonindex files wit...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1436 retest this please ---
[GitHub] carbondata issue #1436: [WIP][CARBONDATA-1617] Merging carbonindex files wit...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1436 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/683/ ---
[GitHub] carbondata issue #1437: [CARBONDATA-1618] Fix issue of not support table com...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1437 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1319/ ---
[GitHub] carbondata issue #1426: DOCUMENTATION for SORT_SCOPE
Github user sgururajshetty commented on the issue: https://github.com/apache/carbondata/pull/1426 @chenliang613 please review ---
[GitHub] carbondata issue #1437: [CARBONDATA-1618] Fix issue of not support table com...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1437 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/682/ ---
[jira] [Updated] (CARBONDATA-1619) Loading data to a carbondata table with overwrite=true many times will cause NullPointerException
[ https://issues.apache.org/jira/browse/CARBONDATA-1619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wyp updated CARBONDATA-1619: Description: If you load data into a carbondata table with {{overwrite=true}} many times, it will cause a {{NullPointerException}}. The following is the code snippet:

{code}
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.1.0
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_77)
Type in expressions to have them evaluated.
Type :help for more information.

scala> import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.SparkSession

scala> import org.apache.spark.sql.CarbonSession._
import org.apache.spark.sql.CarbonSession._

scala> val carbon = SparkSession.builder().config(sc.getConf).getOrCreateCarbonSession("hdfs://mycluster/user/wyp/carb")
17/10/26 12:58:25 WARN spark.SparkContext: Using an existing SparkContext; some configuration may not take effect.
17/10/26 12:58:25 WARN util.CarbonProperties: main The custom block distribution value "null" is invalid. Using the default value "false
17/10/26 12:58:25 WARN util.CarbonProperties: main The enable vector reader value "null" is invalid. Using the default value "true
17/10/26 12:58:25 WARN util.CarbonProperties: main The value "LOCALLOCK" configured for key carbon.lock.type is invalid for current file system. Use the default value HDFSLOCK instead.
17/10/26 12:58:43 WARN metastore.ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
carbon: org.apache.spark.sql.SparkSession = org.apache.spark.sql.CarbonSession@718b9d56

scala> carbon.sql("CREATE TABLE temp.my_table(id bigint) STORED BY 'carbondata'")
17/10/26 12:59:03 AUDIT command.CreateTable: [l-sparkcluster1.test.com][wyp][Thread-1]Creating Table with Database name [temp] and Table name [my_table]
17/10/26 12:59:03 WARN hive.HiveExternalCatalog: Couldn't find corresponding Hive SerDe for data source provider org.apache.spark.sql.CarbonSource. Persisting data source table `temp`.`my_table` into Hive metastore in Spark SQL specific format, which is NOT compatible with Hive.
17/10/26 12:59:03 AUDIT command.CreateTable: [l-sparkcluster1.test.com][wyp][Thread-1]Table created with Database name [temp] and Table name [my_table]
res0: org.apache.spark.sql.DataFrame = []

scala> carbon.sql("insert overwrite table temp.my_table select id from co.order_common_p where dt = '2010-10'")
17/10/26 12:59:23 AUDIT rdd.CarbonDataRDDFactory$: [l-sparkcluster1.test.com][wyp][Thread-1]Data load request has been received for table temp.my_table
17/10/26 12:59:23 WARN util.CarbonDataProcessorUtil: main sort scope is set to LOCAL_SORT
17/10/26 12:59:26 AUDIT rdd.CarbonDataRDDFactory$: [l-sparkcluster1.test.com][wyp][Thread-1]Data load is successful for temp.my_table
res1: org.apache.spark.sql.DataFrame = []

scala> carbon.sql("insert overwrite table temp.my_table select id from co.order_common_p where dt = '2010-10'")
17/10/26 12:59:33 AUDIT rdd.CarbonDataRDDFactory$: [l-sparkcluster1.test.com][wyp][Thread-1]Data load request has been received for table temp.my_table
17/10/26 12:59:33 WARN util.CarbonDataProcessorUtil: main sort scope is set to LOCAL_SORT
17/10/26 12:59:52 AUDIT rdd.CarbonDataRDDFactory$: [l-sparkcluster1.test.com][wyp][Thread-1]Data load is successful for temp.my_table
res2: org.apache.spark.sql.DataFrame = []

scala> carbon.sql("insert overwrite table temp.my_table select id from co.order_common_p where dt = '2012-10'")
17/10/26 13:00:05 AUDIT rdd.CarbonDataRDDFactory$: [l-sparkcluster1.test.com][wyp][Thread-1]Data load request has been received for table temp.my_table
17/10/26 13:00:05 WARN util.CarbonDataProcessorUtil: main sort scope is set to LOCAL_SORT
17/10/26 13:00:08 ERROR filesystem.AbstractDFSCarbonFile: main Exception occurred:File does not exist: hdfs://mycluster/user/wyp/carb/temp/my_table/Fact/Part0/Segment_0
17/10/26 13:00:09 ERROR command.LoadTable: main java.lang.NullPointerException
	at org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile.isDirectory(AbstractDFSCarbonFile.java:88)
	at org.apache.carbondata.core.util.CarbonUtil.deleteRecursive(CarbonUtil.java:364)
	at org.apache.carbondata.core.util.CarbonUtil.access$100(CarbonUtil.java:93)
	at org.apache.carbondata.core.util.CarbonUtil$2.run(CarbonUtil.java:326)
	at org.apache.carbondata.core.util.CarbonUtil$2.run(CarbonUtil.java:322)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
	at org.apache.carbondata.core.util.CarbonUtil.deleteFoldersAndFiles(CarbonUtil.java:322)
	at org.apache.carbondata.spark.load.CarbonLoaderUtil.recordLoadMetadata(CarbonLoaderUt
{code}
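The trace above fails inside CarbonUtil.deleteRecursive: the previous overwrite has already removed Segment_0, so the file status comes back null and isDirectory dereferences it. A minimal sketch of the missing guard, using java.io.File in place of the HDFS-backed CarbonFile (an assumption for illustration; this deleteRecursive is a hypothetical stand-in, not the actual CarbonData method):

```java
import java.io.File;

public class SafeDelete {
    // Hypothetical stand-in for a recursive delete that must tolerate a
    // path which no longer exists (e.g. already removed by a prior
    // overwrite) instead of throwing a NullPointerException.
    static boolean deleteRecursive(File path) {
        if (path == null || !path.exists()) {
            return true; // nothing left to delete: treat as success
        }
        if (path.isDirectory()) {
            File[] children = path.listFiles();
            if (children != null) { // listFiles() returns null on I/O error
                for (File child : children) {
                    if (!deleteRecursive(child)) {
                        return false;
                    }
                }
            }
        }
        return path.delete();
    }

    public static void main(String[] args) {
        // A missing segment directory is handled instead of crashing.
        System.out.println(deleteRecursive(new File("/tmp/no-such-segment-dir"))); // prints true
    }
}
```

The same existence check before recursing is the general pattern for any filesystem whose status lookup can return null or throw FileNotFoundException.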
[GitHub] carbondata issue #1437: [CARBONDATA-1618] Fix issue of not support table com...
Github user chenerlu commented on the issue: https://github.com/apache/carbondata/pull/1437 retest this please ---
[jira] [Created] (CARBONDATA-1619) Loading data to a carbondata table with overwrite=true many times will cause NullPointerException
wyp created CARBONDATA-1619: --- Summary: Loading data to a carbondata table with overwrite=true many times will cause NullPointerException Key: CARBONDATA-1619 URL: https://issues.apache.org/jira/browse/CARBONDATA-1619 Project: CarbonData Issue Type: Bug Components: sql Affects Versions: 1.2.0 Reporter: wyp If you load data into a carbondata table with {{overwrite=true}} many times, it will cause a {{NullPointerException}}.
[GitHub] carbondata pull request #716: [CARBONDATA-840] improve limit query performan...
Github user lionelcao closed the pull request at: https://github.com/apache/carbondata/pull/716 ---
[GitHub] carbondata issue #1437: [CARBONDATA-1618] Fix issue of not support table com...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1437 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1318/ ---
[GitHub] carbondata issue #1437: [CARBONDATA-1618] Fix issue of not support table com...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1437 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/681/ ---
[GitHub] carbondata pull request #1437: [CARBONDATA-1618] Fix issue of not support ta...
GitHub user chenerlu opened a pull request: https://github.com/apache/carbondata/pull/1437 [CARBONDATA-1618] Fix issue of not support table comment Background: Currently carbon does not support a table comment when creating a table. This PR will support table comment. You can merge this pull request into a Git repository by running: $ git pull https://github.com/chenerlu/incubator-carbondata tablecomment Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/1437.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1437 ---
[jira] [Created] (CARBONDATA-1618) Fix issue of not supporting table comment
chenerlu created CARBONDATA-1618: Summary: Fix issue of not supporting table comment Key: CARBONDATA-1618 URL: https://issues.apache.org/jira/browse/CARBONDATA-1618 Project: CarbonData Issue Type: Bug Reporter: chenerlu Assignee: chenerlu -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (CARBONDATA-1593) Add partition to table cause NoSuchTableException
[ https://issues.apache.org/jira/browse/CARBONDATA-1593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wyp updated CARBONDATA-1593: Description: When I run the following code snippet, I get a NoSuchTableException:

{code}
scala> import org.apache.spark.sql.SparkSession
scala> import org.apache.spark.sql.CarbonSession._
scala> val carbon = SparkSession.builder().config(sc.getConf).getOrCreateCarbonSession("hdfs://mycluster/user/wyp/carbon")
scala> carbon.sql("CREATE TABLE temp.order_common(id bigint, order_no string,create_time timestamp) partitioned by (dt string) STORED BY 'carbondata' tblproperties('partition_type'='RANGE','RANGE_INFO'='2010,2011')")
scala> carbon.sql("ALTER TABLE temp.order_common ADD PARTITION('2012')")
org.apache.spark.sql.catalyst.analysis.NoSuchTableException: Table or view 'order_common' not found in database 'default';
	at org.apache.spark.sql.hive.client.HiveClient$$anonfun$getTable$1.apply(HiveClient.scala:76)
	at org.apache.spark.sql.hive.client.HiveClient$$anonfun$getTable$1.apply(HiveClient.scala:76)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.sql.hive.client.HiveClient$class.getTable(HiveClient.scala:76)
	at org.apache.spark.sql.hive.client.HiveClientImpl.getTable(HiveClientImpl.scala:78)
	at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable$1.apply(HiveExternalCatalog.scala:110)
	at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable$1.apply(HiveExternalCatalog.scala:110)
	at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:95)
	at org.apache.spark.sql.hive.HiveExternalCatalog.org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable(HiveExternalCatalog.scala:109)
	at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:601)
	at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:601)
	at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:95)
	at org.apache.spark.sql.hive.HiveExternalCatalog.getTable(HiveExternalCatalog.scala:600)
	at org.apache.spark.sql.hive.HiveMetastoreCatalog.lookupRelation(HiveMetastoreCatalog.scala:106)
	at org.apache.spark.sql.hive.HiveSessionCatalog.lookupRelation(HiveSessionCatalog.scala:69)
	at org.apache.spark.sql.hive.CarbonSessionCatalog.lookupRelation(CarbonSessionState.scala:83)
	at org.apache.spark.sql.internal.CatalogImpl.refreshTable(CatalogImpl.scala:461)
	at org.apache.spark.sql.execution.command.AlterTableSplitPartitionCommand.processSchema(carbonTableSchema.scala:283)
	at org.apache.spark.sql.execution.command.AlterTableSplitPartitionCommand.run(carbonTableSchema.scala:229)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:87)
	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:87)
	at org.apache.spark.sql.Dataset.<init>(Dataset.scala:185)
	at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64)
	at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:592)
	... 50 elided
{code}

but the partition {{2012}} has already been added to table {{temp.order_common}}:

{code}
scala> carbon.sql("show partitions temp.order_common").show(100, 100)
+---------------------+
|            partition|
+---------------------+
|      0, dt = DEFAULT|
|        1, dt < 2010 |
|2, 2010 <= dt < 2011 |
|3, 2011 <= dt < 2012 |
+---------------------+
{code}

My Spark version is 2.1.0, Carbondata is 1.2.0.
[GitHub] carbondata issue #1418: [WIP] Support db location
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1418 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1317/ ---
[GitHub] carbondata issue #1418: [WIP] Support db location
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1418 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/680/ ---
[GitHub] carbondata issue #1436: [WIP][CARBONDATA-1617] Merging carbonindex files wit...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1436 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1316/ ---
[GitHub] carbondata issue #1436: [WIP][CARBONDATA-1617] Merging carbonindex files wit...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1436 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/679/ ---
[GitHub] carbondata pull request #1436: [WIP][CARBONDATA-1617] Merging carbonindex fi...
GitHub user ravipesala opened a pull request: https://github.com/apache/carbondata/pull/1436 [WIP][CARBONDATA-1617] Merging carbonindex files within segment

Be sure to do all of the following to help us incorporate your contribution quickly and easily:
- [x] Make sure the PR title is formatted like: `[CARBONDATA-] Description of pull request`
- [x] Make sure to add PR description including

Problem: The first-time query of carbon becomes very slow, because many small carbonindex files have to be read and cached in the driver the first time. Many carbonindex files are created when loading data in a large cluster. For example, if the cluster size is 100 nodes, then for each load 100 index files are created per segment, so after 100 loads the number of carbonindex files becomes 10,000. Reading all of these files from the driver is slow because of the many namenode calls and IO operations.

Solution: Merge the carbonindex files in two levels so that we can reduce the IO calls to the namenode and improve read performance. Merge within a segment: merge the carbonindex files into a single file immediately after the load completes within the segment. It would be named a .carbonindexmerge file. It is actually not a true data merge but a simple file merge, so the current structure of carbonindex files does not change. While reading, we just read one file instead of many carbonindex files within the segment.

- [ ] Any interfaces changed?
- [ ] Any backward compatibility impacted?
- [ ] Document update required?
- [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change.
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
--- You can merge this pull request into a Git repository by running: $ git pull https://github.com/ravipesala/incubator-carbondata carbon-index-merge Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/1436.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1436 commit bbabb705b8dc81a6569b66ee3bb1765de04348c7 Author: ravipesala Date: 2017-10-25T05:43:22Z Merging carbonindex files within segment ---
[jira] [Created] (CARBONDATA-1617) Merging carbonindex files for each segment.
Ravindra Pesala created CARBONDATA-1617: --- Summary: Merging carbonindex files for each segment. Key: CARBONDATA-1617 URL: https://issues.apache.org/jira/browse/CARBONDATA-1617 Project: CarbonData Issue Type: New Feature Reporter: Ravindra Pesala Hi, Problem: The first-time query of carbon becomes very slow, because many small carbonindex files have to be read and cached in the driver the first time. Many carbonindex files are created when loading data in a large cluster. For example, if the cluster size is 100 nodes, then for each load 100 index files are created per segment, so after 100 loads the number of carbonindex files becomes 10,000. Reading all of these files from the driver is slow because of the many namenode calls and IO operations. Solution: Merge the carbonindex files in two levels so that we can reduce the IO calls to the namenode and improve read performance. Merge within a segment: merge the carbonindex files into a single file immediately after the load completes within the segment. It would be named a .carbonindexmerge file. It is actually not a true data merge but a simple file merge, so the current structure of carbonindex files does not change. While reading, we just read one file instead of many carbonindex files within the segment. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
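The "simple file merge" described above can be pictured as byte-level concatenation plus a small entry table (name and length per file), so each original index blob stays recoverable from one big file. The class, method names, and layout below are a hypothetical illustration, not CarbonData's actual .carbonindexmerge format:

```java
import java.io.*;
import java.nio.file.*;
import java.util.*;

// Hypothetical sketch: merge many small index files into one file.
// Not true data merging, just concatenation with an entry table, so the
// content of each original carbonindex file is unchanged and sliceable.
public class IndexFileMerge {
    public static void merge(List<Path> indexFiles, Path merged) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(merged)))) {
            out.writeInt(indexFiles.size());              // number of entries
            for (Path p : indexFiles) {
                byte[] bytes = Files.readAllBytes(p);
                out.writeUTF(p.getFileName().toString()); // original file name
                out.writeInt(bytes.length);               // payload length
                out.write(bytes);                         // payload, verbatim
            }
        }
    }

    // One read call recovers every original (name -> bytes) pair.
    public static Map<String, byte[]> read(Path merged) throws IOException {
        Map<String, byte[]> result = new LinkedHashMap<>();
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(merged)))) {
            int n = in.readInt();
            for (int i = 0; i < n; i++) {
                String name = in.readUTF();
                byte[] bytes = new byte[in.readInt()];
                in.readFully(bytes);
                result.put(name, bytes);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("segment");
        Path a = Files.write(dir.resolve("part0.carbonindex"), "index-a".getBytes());
        Path b = Files.write(dir.resolve("part1.carbonindex"), "index-b".getBytes());
        Path merged = dir.resolve("segment.carbonindexmerge");
        merge(Arrays.asList(a, b), merged);
        // The driver now issues one namenode open instead of one per file.
        System.out.println(read(merged).keySet()); // prints [part0.carbonindex, part1.carbonindex]
    }
}
```

This is why the scheme cuts driver-side first-query cost: the number of namenode calls per segment drops from one per index file to one, while the bytes read stay the same.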
[jira] [Created] (CARBONDATA-1615) DELETE SEGMENT BY DATE should ignore the streaming segment
Jacky Li created CARBONDATA-1615: Summary: DELETE SEGMENT BY DATE should ignore the streaming segment Key: CARBONDATA-1615 URL: https://issues.apache.org/jira/browse/CARBONDATA-1615 Project: CarbonData Issue Type: Sub-task Reporter: Jacky Li -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (CARBONDATA-1616) Add document for streaming ingestion usage
Jacky Li created CARBONDATA-1616: Summary: Add document for streaming ingestion usage Key: CARBONDATA-1616 URL: https://issues.apache.org/jira/browse/CARBONDATA-1616 Project: CarbonData Issue Type: Sub-task Reporter: Jacky Li Fix For: 1.3.0 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (CARBONDATA-1614) SHOW SEGMENT should include the streaming property
Jacky Li created CARBONDATA-1614: Summary: SHOW SEGMENT should include the streaming property Key: CARBONDATA-1614 URL: https://issues.apache.org/jira/browse/CARBONDATA-1614 Project: CarbonData Issue Type: Sub-task Reporter: Jacky Li -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (CARBONDATA-1613) streaming table should support INSERT OVERWRITE
Jacky Li created CARBONDATA-1613: Summary: streaming table should support INSERT OVERWRITE Key: CARBONDATA-1613 URL: https://issues.apache.org/jira/browse/CARBONDATA-1613 Project: CarbonData Issue Type: Sub-task Reporter: Jacky Li INSERT OVERWRITE should take care of streaming segment when executing the command -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (CARBONDATA-1612) Block DELETE SEGMENT BY ID for streaming table
[ https://issues.apache.org/jira/browse/CARBONDATA-1612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacky Li updated CARBONDATA-1612: - Fix Version/s: 1.3.0 > Block DELETE SEGMENT BY ID for streaming table > -- > > Key: CARBONDATA-1612 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1612 > Project: CarbonData > Issue Type: Sub-task >Reporter: Jacky Li > Fix For: 1.3.0 > > > Streaming segment should be managed by carbon internally and it should not be > deleted by user -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (CARBONDATA-1612) Block DELETE SEGMENT BY ID for streaming table
Jacky Li created CARBONDATA-1612: Summary: Block DELETE SEGMENT BY ID for streaming table Key: CARBONDATA-1612 URL: https://issues.apache.org/jira/browse/CARBONDATA-1612 Project: CarbonData Issue Type: Sub-task Reporter: Jacky Li Streaming segment should be managed by carbon internally and it should not be deleted by user -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (CARBONDATA-1611) Block UPDATE/DELETE command for streaming table
Jacky Li created CARBONDATA-1611: Summary: Block UPDATE/DELETE command for streaming table Key: CARBONDATA-1611 URL: https://issues.apache.org/jira/browse/CARBONDATA-1611 Project: CarbonData Issue Type: Sub-task Reporter: Jacky Li Fix For: 1.3.0 In streaming table, row file format is used, which is not updatable. So UPDATE/DELETE command should be rejected -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (CARBONDATA-1610) ALTER TABLE set streaming property
Jacky Li created CARBONDATA-1610: Summary: ALTER TABLE set streaming property Key: CARBONDATA-1610 URL: https://issues.apache.org/jira/browse/CARBONDATA-1610 Project: CarbonData Issue Type: Sub-task Reporter: Jacky Li Fix For: 1.3.0 For existing table, user should be able to use ALTER TABLE source SET TBLPROPERTIES('streaming'='true') to set the table property so that this table can be streaming ingested -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[GitHub] carbondata issue #1435: [WIP]add data size and index size in table status fi...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1435 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1315/ ---
[GitHub] carbondata issue #1385: [CARBONDATA-1492] Alter add and remove struct Column...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1385 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1314/ ---
[GitHub] carbondata issue #1418: [WIP] Support db location
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1418 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1313/ ---
[GitHub] carbondata issue #1435: [WIP]add data size and index size in table status fi...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1435 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/678/ ---
[GitHub] carbondata issue #1385: [CARBONDATA-1492] Alter add and remove struct Column...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1385 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/677/ ---
[GitHub] carbondata issue #1418: [WIP] Support db location
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1418 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/676/ ---
[GitHub] carbondata issue #1435: [WIP]add data size and index size in table status fi...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1435 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1312/ ---
[GitHub] carbondata issue #1385: [CARBONDATA-1492] Alter add and remove struct Column...
Github user dhatchayani commented on the issue: https://github.com/apache/carbondata/pull/1385 Retest this please ---
[GitHub] carbondata issue #1435: [WIP]add data size and index size in table status fi...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1435 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/675/ ---
[GitHub] carbondata pull request #1435: [WIP]add data size and index size in table sta...
GitHub user akashrn5 opened a pull request: https://github.com/apache/carbondata/pull/1435 [WIP]add data size and index size in table status file Be sure to do all of the following to help us incorporate your contribution quickly and easily: - [ ] Make sure the PR title is formatted like: `[CARBONDATA-] Description of pull request` - [ ] Make sure to add PR description including - the root cause/problem statement - What is the implemented solution - [ ] Any interfaces changed? - [ ] Any backward compatibility impacted? - [ ] Document update required? - [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. --- You can merge this pull request into a Git repository by running: $ git pull https://github.com/akashrn5/incubator-carbondata file_size Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/1435.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1435 commit 168722091558902200c20526cd19d5ab7d0c4c6f Author: akashrn5 Date: 2017-10-25T09:57:37Z add carbondata size and index size in table status file ---
[GitHub] carbondata issue #1434: [CARBONDATA-1593]Add partition to table cause NoSuch...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1434 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1311/ ---
[GitHub] carbondata issue #1434: [CARBONDATA-1593]Add partition to table cause NoSuch...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1434 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/674/ ---
[GitHub] carbondata issue #1434: [CARBONDATA-1593]Add partition to table cause NoSuch...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/1434 add to whitelist ---
[GitHub] carbondata issue #1434: [CARBONDATA-1593]Add partition to table cause NoSuch...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1434 Can one of the admins verify this patch? ---
[GitHub] carbondata issue #1434: [CARBONDATA-1593]Add partition to table cause NoSuch...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1434 Can one of the admins verify this patch? ---
[GitHub] carbondata pull request #1434: [CARBONDATA-1593]Add partition to table cause...
GitHub user 397090770 opened a pull request: https://github.com/apache/carbondata/pull/1434 [CARBONDATA-1593]Add partition to table cause NoSuchTableException `AlterTableSplitCarbonPartition`'s `processSchema` method doesn't provide db info to `catalog.refreshTable`, which causes a `NoSuchTableException` when we add partitions to a carbondata table. See [CARBONDATA-1593](https://issues.apache.org/jira/browse/CARBONDATA-1593) Be sure to do all of the following to help us incorporate your contribution quickly and easily: - [ ] Make sure the PR title is formatted like: `[CARBONDATA-] Description of pull request` - [ ] Make sure to add PR description including - the root cause/problem statement - What is the implemented solution - [ ] Any interfaces changed? - [ ] Any backward compatibility impacted? - [ ] Document update required? - [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. --- You can merge this pull request into a Git repository by running: $ git pull https://github.com/397090770/carbondata CARBONDATA-1593 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/1434.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1434 commit d10d3fea025e37351c6dd8852c6c8e5742c6a71e Author: wyp Date: 2017-10-25T08:43:47Z [CARBONDATA-1593]Add partition to table cause NoSuchTableException ---
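[Editor's note] A minimal sketch of the fix idea described in the PR above. This is not the actual patch: the helper name `qualifiedTableName` is hypothetical and chosen for illustration. The underlying point from the report is that `refreshTable` was called with the bare table name, so Spark resolved it against the current database (`default`) instead of the table's own database; qualifying the name with its database avoids the `NoSuchTableException`.

```scala
// Hypothetical helper: build a db-qualified table name before refreshing,
// so the catalog does not fall back to the current database ("default").
def qualifiedTableName(dbName: Option[String], tableName: String): String =
  dbName match {
    case Some(db) => s"$db.$tableName" // e.g. "temp.order_common"
    case None     => tableName         // no db info: current database applies
  }

// With a qualified name, a refresh such as
//   spark.catalog.refreshTable(qualifiedTableName(Some("temp"), "order_common"))
// would resolve against database "temp" rather than "default".
```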
[GitHub] carbondata issue #1432: [WIP][CARBONDATA-1608]Support Column Comment for Cre...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1432 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1310/ ---
[jira] [Updated] (CARBONDATA-1593) Add partition to table cause NoSuchTableException
[ https://issues.apache.org/jira/browse/CARBONDATA-1593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wyp updated CARBONDATA-1593: Description: When I run the following code snippet, I get NoSuchTableException: {code} scala> import org.apache.spark.sql.SparkSession scala> import org.apache.spark.sql.CarbonSession._ scala> val carbon = SparkSession.builder().config(sc.getConf).getOrCreateCarbonSession("hdfs://mycluster/user/wyp/carbon") scala> carbon.sql("CREATE TABLE temp.order(id bigint, order_no string,create_time timestamp) partitioned by (dt string) STORED BY 'carbondata' tblproperties('partition_type'='RANGE','RANGE_INFO'='2010,2011')") scala> carbon.sql("ALTER TABLE temp.order_common ADD PARTITION('2012')") org.apache.spark.sql.catalyst.analysis.NoSuchTableException: Table or view 'order_common' not found in database 'default'; at org.apache.spark.sql.hive.client.HiveClient$$anonfun$getTable$1.apply(HiveClient.scala:76) at org.apache.spark.sql.hive.client.HiveClient$$anonfun$getTable$1.apply(HiveClient.scala:76) at scala.Option.getOrElse(Option.scala:121) at org.apache.spark.sql.hive.client.HiveClient$class.getTable(HiveClient.scala:76) at org.apache.spark.sql.hive.client.HiveClientImpl.getTable(HiveClientImpl.scala:78) at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable$1.apply(HiveExternalCatalog.scala:110) at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable$1.apply(HiveExternalCatalog.scala:110) at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:95) at org.apache.spark.sql.hive.HiveExternalCatalog.org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable(HiveExternalCatalog.scala:109) at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:601) at 
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:601) at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:95) at org.apache.spark.sql.hive.HiveExternalCatalog.getTable(HiveExternalCatalog.scala:600) at org.apache.spark.sql.hive.HiveMetastoreCatalog.lookupRelation(HiveMetastoreCatalog.scala:106) at org.apache.spark.sql.hive.HiveSessionCatalog.lookupRelation(HiveSessionCatalog.scala:69) at org.apache.spark.sql.hive.CarbonSessionCatalog.lookupRelation(CarbonSessionState.scala:83) at org.apache.spark.sql.internal.CatalogImpl.refreshTable(CatalogImpl.scala:461) at org.apache.spark.sql.execution.command.AlterTableSplitPartitionCommand.processSchema(carbonTableSchema.scala:283) at org.apache.spark.sql.execution.command.AlterTableSplitPartitionCommand.run(carbonTableSchema.scala:229) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56) at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114) at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132) at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113) at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:87) at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:87) at org.apache.spark.sql.Dataset.(Dataset.scala:185) at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64) at 
org.apache.spark.sql.SparkSession.sql(SparkSession.scala:592) ... 50 elided {code} but partition {{2012}} was already added to table {{temp.order_common}}: {code} scala> carbon.sql("show partitions temp.order_common").show(100, 100) +--+ | partition| +--+ | 0, dt = DEFAULT| | 1, dt < 2010 | |2, 2010 <= dt < 2011 | |3, 2011 <= dt < 2012 | +--+ {code} My Spark version is 2.1.0, Carbondata is 1.2.0. was: When I run the following code snippet, I get NoSuchTableException: {code} scala> import org.apache.spark.sql.SparkSession scala> import org.apache.spark.sql.CarbonSession._ scala> val carbon = SparkSession.builder().config(sc.getConf).getOrCreateCarbonSession("hdfs://mycluster/user/wyp/carbon") scala> carbon.sql("CREATE TABLE temp.order(id bigint, order_no string,create_time timestamp) partitioned by (dt string) STORED BY 'carbondata' tblproperties('partition_type'='RANGE','
[GitHub] carbondata issue #1418: [WIP] Support db location
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/1418 SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/1309/ ---
[GitHub] carbondata issue #1432: [WIP][CARBONDATA-1608]Support Column Comment for Cre...
Github user akashrn5 commented on the issue: https://github.com/apache/carbondata/pull/1432 @manishgupta88 please review ---
[GitHub] carbondata issue #1432: [WIP][CARBONDATA-1608]Support Column Comment for Cre...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/1432 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/673/ ---