[jira] [Created] (CARBONDATA-4278) Avoid refetching all indexes to get segment properties
Mahesh Raju Somalaraju created CARBONDATA-4278: -- Summary: Avoid refetching all indexes to get segment properties Key: CARBONDATA-4278 URL: https://issues.apache.org/jira/browse/CARBONDATA-4278 Project: CarbonData Issue Type: Bug Reporter: Mahesh Raju Somalaraju h1. Avoid refetching all indexes to get segment properties 1) When the block index is already available, there is no need to rebuild it from the available segments and partition locations. 2) Call getSegmentProperties directly when the block index is available. -- This message was sent by Atlassian Jira (v8.3.4#803005)
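For illustration, a minimal, self-contained Scala sketch of the optimization described above. All names here are placeholders invented for the example, not the actual CarbonData API:

{code:scala}
object SegmentPropertiesLookup {

  // Placeholder types standing in for CarbonData's block index and
  // segment-properties structures.
  case class SegmentProperties(columns: Seq[String])
  case class BlockIndex(segmentNo: String, properties: SegmentProperties)

  // Expensive fallback: rebuild the block index by scanning segment and
  // partition locations (stubbed out for this sketch).
  def buildIndexFromSegmentAndPartitions(segmentNo: String): BlockIndex =
    BlockIndex(segmentNo, SegmentProperties(Seq("col1", "col2")))

  // The fix described in the issue: when a block index for the segment is
  // already cached, read segment properties from it directly instead of
  // refetching all indexes.
  def getSegmentProperties(segmentNo: String,
                           indexCache: Map[String, BlockIndex]): SegmentProperties =
    indexCache.get(segmentNo)
      .map(_.properties)
      .getOrElse(buildIndexFromSegmentAndPartitions(segmentNo).properties)
}
{code}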
[jira] [Issue Comment Deleted] (CARBONDATA-4273) Cannot create table with partitions in Spark in EMR
[ https://issues.apache.org/jira/browse/CARBONDATA-4273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Indhumathi updated CARBONDATA-4273: --- Comment: was deleted (was: Can you tell me in which file system environment you are facing this issue? In Hadoop FileSystem (HDFS), or are you running it locally?) > Cannot create table with partitions in Spark in EMR > --- > > Key: CARBONDATA-4273 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4273 > Project: CarbonData > Issue Type: Bug > Components: spark-integration >Affects Versions: 2.2.0 > Environment: Release label:emr-5.24.1 > Hadoop distribution:Amazon 2.8.5 > Applications: > Hive 2.3.4, Pig 0.17.0, Hue 4.4.0, Flink 1.8.0, Spark 2.4.2, Presto 0.219, > JupyterHub 0.9.6 > Jar compiled with: > apache-carbondata:2.2.0 > spark:2.4.5 > hadoop:2.8.3 >Reporter: Bigicecream >Priority: Critical > Labels: EMR, spark > > > When trying to create a table like this: > {code:sql} > CREATE TABLE IF NOT EXISTS will_not_work( > timestamp string, > name string > ) > PARTITIONED BY (dt string, hr string) > STORED AS carbondata > LOCATION 's3a://my-bucket/CarbonDataTests/will_not_work' > {code} > I get the following error: > {noformat} > org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException: > Partition is not supported for external table > at > org.apache.spark.sql.parser.CarbonSparkSqlParserUtil$.buildTableInfoFromCatalogTable(CarbonSparkSqlParserUtil.scala:219) > at > org.apache.spark.sql.CarbonSource$.createTableInfo(CarbonSource.scala:235) > at > org.apache.spark.sql.CarbonSource$.createTableMeta(CarbonSource.scala:394) > at > org.apache.spark.sql.execution.command.table.CarbonCreateDataSourceTableCommand.processMetadata(CarbonCreateDataSourceTableCommand.scala:69) > at > org.apache.spark.sql.execution.command.MetadataCommand$$anonfun$run$1.apply(package.scala:137) > at > org.apache.spark.sql.execution.command.MetadataCommand$$anonfun$run$1.apply(package.scala:137) > at > org.apache.spark.sql.execution.command.Auditable$class.runWithAudit(package.scala:118) > at > org.apache.spark.sql.execution.command.MetadataCommand.runWithAudit(package.scala:134) > at > org.apache.spark.sql.execution.command.MetadataCommand.run(package.scala:137) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79) > at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:194) > at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:194) > at org.apache.spark.sql.Dataset$$anonfun$53.apply(Dataset.scala:3364) > at > org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78) > at > org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125) > at > org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73) > at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3363) > at org.apache.spark.sql.Dataset.<init>(Dataset.scala:194) > at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:79) > at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:643) > ... 64 elided > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CARBONDATA-4273) Cannot create table with partitions in Spark in EMR
[ https://issues.apache.org/jira/browse/CARBONDATA-4273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17405273#comment-17405273 ] Indhumathi commented on CARBONDATA-4273: Can you tell me in which file system environment you are facing this issue? In Hadoop FileSystem (HDFS), or are you running it locally? > Cannot create table with partitions in Spark in EMR > --- > > Key: CARBONDATA-4273 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4273 > Project: CarbonData > Issue Type: Bug > Components: spark-integration >Affects Versions: 2.2.0 > Environment: Release label:emr-5.24.1 > Hadoop distribution:Amazon 2.8.5 > Applications: > Hive 2.3.4, Pig 0.17.0, Hue 4.4.0, Flink 1.8.0, Spark 2.4.2, Presto 0.219, > JupyterHub 0.9.6 > Jar compiled with: > apache-carbondata:2.2.0 > spark:2.4.5 > hadoop:2.8.3 >Reporter: Bigicecream >Priority: Critical > Labels: EMR, spark > > > When trying to create a table like this: > {code:sql} > CREATE TABLE IF NOT EXISTS will_not_work( > timestamp string, > name string > ) > PARTITIONED BY (dt string, hr string) > STORED AS carbondata > LOCATION 's3a://my-bucket/CarbonDataTests/will_not_work' > {code} > I get the following error: > {noformat} > org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException: > Partition is not supported for external table > at > org.apache.spark.sql.parser.CarbonSparkSqlParserUtil$.buildTableInfoFromCatalogTable(CarbonSparkSqlParserUtil.scala:219) > at > org.apache.spark.sql.CarbonSource$.createTableInfo(CarbonSource.scala:235) > at > org.apache.spark.sql.CarbonSource$.createTableMeta(CarbonSource.scala:394) > at > org.apache.spark.sql.execution.command.table.CarbonCreateDataSourceTableCommand.processMetadata(CarbonCreateDataSourceTableCommand.scala:69) > at > org.apache.spark.sql.execution.command.MetadataCommand$$anonfun$run$1.apply(package.scala:137) > at > org.apache.spark.sql.execution.command.MetadataCommand$$anonfun$run$1.apply(package.scala:137) > at > org.apache.spark.sql.execution.command.Auditable$class.runWithAudit(package.scala:118) > at > org.apache.spark.sql.execution.command.MetadataCommand.runWithAudit(package.scala:134) > at > org.apache.spark.sql.execution.command.MetadataCommand.run(package.scala:137) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79) > at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:194) > at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:194) > at org.apache.spark.sql.Dataset$$anonfun$53.apply(Dataset.scala:3364) > at > org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78) > at > org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125) > at > org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73) > at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3363) > at org.apache.spark.sql.Dataset.<init>(Dataset.scala:194) > at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:79) > at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:643) > ... 64 elided > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (CARBONDATA-4180) create maintable and do insert before creation of si on maintable, then query on si column from presto does not hit SI
[ https://issues.apache.org/jira/browse/CARBONDATA-4180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahesh Raju Somalaraju closed CARBONDATA-4180. -- The reported scenario now works fine, so closing the JIRA. > create maintable and do insert before creation of si on maintable, then query > on si column from presto does not hit SI > -- > > Key: CARBONDATA-4180 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4180 > Project: CarbonData > Issue Type: Bug >Reporter: Mahesh Raju Somalaraju >Priority: Minor > Time Spent: 1h 40m > Remaining Estimate: 0h > > create maintable and do insert before creation of si on maintable, then query > on si column from presto does not hit SI > > steps: > 1) create maintable > 2) insert the data > 3) create SI > 4) query from presto on the si column > Expectation: > It should hit the SI table and fetch the results. -- This message was sent by Atlassian Jira (v8.3.4#803005)
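For illustration, a spark-shell sketch of the reported steps. The table, column, and index names are hypothetical, and step 4 is shown as an equivalent Spark query since the Presto query in the reported scenario is issued from the Presto CLI:

{code:scala}
// 1) create maintable
spark.sql("CREATE TABLE maintable (id INT, name STRING) STORED AS carbondata")
// 2) insert the data (before the SI exists)
spark.sql("INSERT INTO maintable VALUES (1, 'a'), (2, 'b')")
// 3) create SI on the main table after the insert
spark.sql("CREATE INDEX idx_name ON TABLE maintable (name) AS 'carbondata'")
// 4) query on the SI column; the expectation is that the query is
//    rewritten to hit the SI table idx_name
spark.sql("SELECT * FROM maintable WHERE name = 'a'").show()
{code}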
[jira] [Commented] (CARBONDATA-4273) Cannot create table with partitions in Spark in EMR
[ https://issues.apache.org/jira/browse/CARBONDATA-4273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17405241#comment-17405241 ] Brijoo Bopanna commented on CARBONDATA-4273: Thanks for sharing this issue; we will check and reply. > Cannot create table with partitions in Spark in EMR > --- > > Key: CARBONDATA-4273 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4273 > Project: CarbonData > Issue Type: Bug > Components: spark-integration >Affects Versions: 2.2.0 > Environment: Release label:emr-5.24.1 > Hadoop distribution:Amazon 2.8.5 > Applications: > Hive 2.3.4, Pig 0.17.0, Hue 4.4.0, Flink 1.8.0, Spark 2.4.2, Presto 0.219, > JupyterHub 0.9.6 > Jar compiled with: > apache-carbondata:2.2.0 > spark:2.4.5 > hadoop:2.8.3 >Reporter: Bigicecream >Priority: Critical > Labels: EMR, spark > > > When trying to create a table like this: > {code:sql} > CREATE TABLE IF NOT EXISTS will_not_work( > timestamp string, > name string > ) > PARTITIONED BY (dt string, hr string) > STORED AS carbondata > LOCATION 's3a://my-bucket/CarbonDataTests/will_not_work' > {code} > I get the following error: > {noformat} > org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException: > Partition is not supported for external table > at > org.apache.spark.sql.parser.CarbonSparkSqlParserUtil$.buildTableInfoFromCatalogTable(CarbonSparkSqlParserUtil.scala:219) > at > org.apache.spark.sql.CarbonSource$.createTableInfo(CarbonSource.scala:235) > at > org.apache.spark.sql.CarbonSource$.createTableMeta(CarbonSource.scala:394) > at > org.apache.spark.sql.execution.command.table.CarbonCreateDataSourceTableCommand.processMetadata(CarbonCreateDataSourceTableCommand.scala:69) > at > org.apache.spark.sql.execution.command.MetadataCommand$$anonfun$run$1.apply(package.scala:137) > at > org.apache.spark.sql.execution.command.MetadataCommand$$anonfun$run$1.apply(package.scala:137) > at > org.apache.spark.sql.execution.command.Auditable$class.runWithAudit(package.scala:118) > at > org.apache.spark.sql.execution.command.MetadataCommand.runWithAudit(package.scala:134) > at > org.apache.spark.sql.execution.command.MetadataCommand.run(package.scala:137) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79) > at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:194) > at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:194) > at org.apache.spark.sql.Dataset$$anonfun$53.apply(Dataset.scala:3364) > at > org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78) > at > org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125) > at > org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73) > at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3363) > at org.apache.spark.sql.Dataset.<init>(Dataset.scala:194) > at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:79) > at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:643) > ... 64 elided > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CARBONDATA-4277) Compatibility Issue of GeoSpatial table of CarbonData 2.1.0 in CarbonData 2.2.0 (Spark 2.4.5 and Spark 3.1.1)
[ https://issues.apache.org/jira/browse/CARBONDATA-4277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] PURUJIT CHAUGULE updated CARBONDATA-4277: - Priority: Major (was: Minor) > Compatibility Issue of GeoSpatial table of CarbonData 2.1.0 in CarbonData > 2.2.0 (Spark 2.4.5 and Spark 3.1.1) > - > > Key: CARBONDATA-4277 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4277 > Project: CarbonData > Issue Type: Bug >Affects Versions: 2.2.0 > Environment: Spark 2.4.5 > Spark 3.1.1 >Reporter: PURUJIT CHAUGULE >Priority: Major > > > > *Issue 1 : Load on a geospatial table created in CarbonData 2.1.0 fails in > 2.2.0 (Spark 2.4.5 and 3.1.1)* > *STEPS:-* > # create table in CarbonData 2.1.0 : create table > source_index_2_1_0(TIMEVALUE BIGINT,LONGITUDE long,LATITUDE long) STORED AS > carbondata TBLPROPERTIES > ('SPATIAL_INDEX.mygeohash.type'='geohash','SPATIAL_INDEX.mygeohash.sourcecolumns'='longitude, > > latitude','SPATIAL_INDEX.mygeohash.originLatitude'='39.930753','SPATIAL_INDEX.mygeohash.gridSize'='50','SPATIAL_INDEX.mygeohash.minLongitude'='116.176090','SPATIAL_INDEX.mygeohash.maxLongitude'='116.736367','SPATIAL_INDEX.mygeohash.minLatitude'='39.930753','SPATIAL_INDEX.mygeohash.maxLatitude'='40.179415','SPATIAL_INDEX'='mygeohash','SPATIAL_INDEX.mygeohash.conversionRatio'='100'); > # LOAD DATA INPATH 'hdfs://hacluster/chetan/f_lcov_50basic_data.csv' INTO > TABLE source_index_2_1_0 OPTIONS('HEADER'='true','DELIMITER'='|', > 'QUOTECHAR'='|'); > # Copy the table store and place it in the HDFS of the CarbonData 2.2.0 (Spark 2.4.5 and > Spark 3.1.1) clusters > # refresh table source_index_2_1_0; > # 0: jdbc:hive2://10.21.19.14:23040/default> LOAD DATA INPATH > 'hdfs://hacluster/chetan/f_lcov_50basic_data.csv' INTO TABLE > source_index_2_1_0 OPTIONS('HEADER'='true','DELIMITER'='|', 'QUOTECHAR'='|'); > Error: org.apache.hive.service.cli.HiveSQLException: Error running query: > java.lang.Exception: DataLoad failure: Data Loading failed for table > source_index_2_1_0 > at > org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:361) > at > org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.$anonfun$run$2(SparkExecuteStatementOperation.scala:263) > at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > at > org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties(SparkOperation.scala:78) > at > org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties$(SparkOperation.scala:62) > at > org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.withLocalProperties(SparkExecuteStatementOperation.scala:43) > at > org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:263) > at > org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:258) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1746) > at > org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2.run(SparkExecuteStatementOperation.scala:272) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at >
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.lang.Exception: DataLoad failure: Data Loading failed for > table source_index_2_1_0 > at > org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.loadCarbonData(CarbonDataRDDFactory.scala:460) > at > org.apache.spark.sql.execution.command.management.CarbonLoadDataCommand.loadData(CarbonLoadDataCommand.scala:226) > at > org.apache.spark.sql.execution.command.management.CarbonLoadDataCommand.processData(CarbonLoadDataCommand.scala:163) > at > org.apache.spark.sql.execution.command.AtomicRunnableCommand.$anonfun$run$3(package.scala:162) > at > org.apache.spark.sql.execution.command.Auditable.runWithAudit(package.scala:118) > at > org.apache.spark.sql.execution.command.Auditable.runWithAudit$(package.scala:114) > at > org.apache.spark.sql.execution.command.AtomicRunnableCommand.runWithAudit(package.scala:155) > at > org.apache.spark.sql.execution.command.AtomicRunnableCommand.run(package.scala:168) >
[jira] [Updated] (CARBONDATA-4277) Compatibility Issue of GeoSpatial table of CarbonData 2.1.0 in CarbonData 2.2.0 (Spark 2.4.5 and Spark 3.1.1)
[ https://issues.apache.org/jira/browse/CARBONDATA-4277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] PURUJIT CHAUGULE updated CARBONDATA-4277: - Description: *Issue 1 : Load on a geospatial table created in CarbonData 2.1.0 fails in 2.2.0 (Spark 2.4.5 and 3.1.1)* *STEPS:-* # create table in CarbonData 2.1.0 : create table source_index_2_1_0(TIMEVALUE BIGINT,LONGITUDE long,LATITUDE long) STORED AS carbondata TBLPROPERTIES ('SPATIAL_INDEX.mygeohash.type'='geohash','SPATIAL_INDEX.mygeohash.sourcecolumns'='longitude, latitude','SPATIAL_INDEX.mygeohash.originLatitude'='39.930753','SPATIAL_INDEX.mygeohash.gridSize'='50','SPATIAL_INDEX.mygeohash.minLongitude'='116.176090','SPATIAL_INDEX.mygeohash.maxLongitude'='116.736367','SPATIAL_INDEX.mygeohash.minLatitude'='39.930753','SPATIAL_INDEX.mygeohash.maxLatitude'='40.179415','SPATIAL_INDEX'='mygeohash','SPATIAL_INDEX.mygeohash.conversionRatio'='100'); # LOAD DATA INPATH 'hdfs://hacluster/chetan/f_lcov_50basic_data.csv' INTO TABLE source_index_2_1_0 OPTIONS('HEADER'='true','DELIMITER'='|', 'QUOTECHAR'='|'); # Copy the table store and place it in the HDFS of the CarbonData 2.2.0 (Spark 2.4.5 and Spark 3.1.1) clusters # refresh table source_index_2_1_0; # 0: jdbc:hive2://10.21.19.14:23040/default> LOAD DATA INPATH 'hdfs://hacluster/chetan/f_lcov_50basic_data.csv' INTO TABLE source_index_2_1_0 OPTIONS('HEADER'='true','DELIMITER'='|', 'QUOTECHAR'='|'); Error: org.apache.hive.service.cli.HiveSQLException: Error running query: java.lang.Exception: DataLoad failure: Data Loading failed for table source_index_2_1_0 at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:361) at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.$anonfun$run$2(SparkExecuteStatementOperation.scala:263) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties(SparkOperation.scala:78) at org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties$(SparkOperation.scala:62) at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.withLocalProperties(SparkExecuteStatementOperation.scala:43) at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:263) at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:258) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1746) at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2.run(SparkExecuteStatementOperation.scala:272) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.Exception: DataLoad failure: Data Loading failed for table source_index_2_1_0 at org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.loadCarbonData(CarbonDataRDDFactory.scala:460) at
org.apache.spark.sql.execution.command.management.CarbonLoadDataCommand.loadData(CarbonLoadDataCommand.scala:226) at org.apache.spark.sql.execution.command.management.CarbonLoadDataCommand.processData(CarbonLoadDataCommand.scala:163) at org.apache.spark.sql.execution.command.AtomicRunnableCommand.$anonfun$run$3(package.scala:162) at org.apache.spark.sql.execution.command.Auditable.runWithAudit(package.scala:118) at org.apache.spark.sql.execution.command.Auditable.runWithAudit$(package.scala:114) at org.apache.spark.sql.execution.command.AtomicRunnableCommand.runWithAudit(package.scala:155) at org.apache.spark.sql.execution.command.AtomicRunnableCommand.run(package.scala:168) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68) at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79) at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:228) at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3687) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103) at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163) at
[jira] [Updated] (CARBONDATA-4277) Compatibility Issue of GeoSpatial table of CarbonData 2.1.0 in CarbonData 2.2.0 (Spark 2.4.5 and Spark 3.1.1)
[ https://issues.apache.org/jira/browse/CARBONDATA-4277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] PURUJIT CHAUGULE updated CARBONDATA-4277: - Summary: Compatibility Issue of GeoSpatial table of CarbonData 2.1.0 in CarbonData 2.2.0 (Spark 2.4.5 and Spark 3.1.1) (was: Compatibility Issue of GeoSpatial table of CarbonData 2.1.0 in CarbonData 2.2.0 (Spark 2.4.5 and Spark 3.1.1))) > Compatibility Issue of GeoSpatial table of CarbonData 2.1.0 in CarbonData > 2.2.0 (Spark 2.4.5 and Spark 3.1.1) > - > > Key: CARBONDATA-4277 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4277 > Project: CarbonData > Issue Type: Bug >Affects Versions: 2.2.0 > Environment: Spark 2.4.5 > Spark 3.1.1 >Reporter: PURUJIT CHAUGULE >Priority: Minor > > > > *Issue 1 : Load on a geo table created in CarbonData 2.1.0 fails in 2.2.0 (Spark 2.4.5 and 3.1.1) > * > *STEPS:-* > # create table in CarbonData 2.1.0 : create table > source_index_2_1_0(TIMEVALUE BIGINT,LONGITUDE long,LATITUDE long) STORED AS > carbondata TBLPROPERTIES > ('SPATIAL_INDEX.mygeohash.type'='geohash','SPATIAL_INDEX.mygeohash.sourcecolumns'='longitude, > > latitude','SPATIAL_INDEX.mygeohash.originLatitude'='39.930753','SPATIAL_INDEX.mygeohash.gridSize'='50','SPATIAL_INDEX.mygeohash.minLongitude'='116.176090','SPATIAL_INDEX.mygeohash.maxLongitude'='116.736367','SPATIAL_INDEX.mygeohash.minLatitude'='39.930753','SPATIAL_INDEX.mygeohash.maxLatitude'='40.179415','SPATIAL_INDEX'='mygeohash','SPATIAL_INDEX.mygeohash.conversionRatio'='100'); > # LOAD DATA INPATH 'hdfs://hacluster/chetan/f_lcov_50basic_data.csv' INTO > TABLE source_index_2_1_0 OPTIONS('HEADER'='true','DELIMITER'='|', > 'QUOTECHAR'='|'); > # Copy the table store and place it in the HDFS of the CarbonData 2.2.0 (Spark 2.4.5 and > Spark 3.1.1) clusters > # refresh table source_index_2_1_0; > # 0: jdbc:hive2://10.21.19.14:23040/default> LOAD DATA INPATH > 'hdfs://hacluster/chetan/f_lcov_50basic_data.csv' INTO TABLE > source_index_2_1_0 OPTIONS('HEADER'='true','DELIMITER'='|', 'QUOTECHAR'='|'); > Error: org.apache.hive.service.cli.HiveSQLException: Error running query: > java.lang.Exception: DataLoad failure: Data Loading failed for table > source_index_2_1_0 > at > org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:361) > at > org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.$anonfun$run$2(SparkExecuteStatementOperation.scala:263) > at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > at > org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties(SparkOperation.scala:78) > at > org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties$(SparkOperation.scala:62) > at > org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.withLocalProperties(SparkExecuteStatementOperation.scala:43) > at > org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:263) > at > org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:258) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1746) > at >
org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2.run(SparkExecuteStatementOperation.scala:272) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.lang.Exception: DataLoad failure: Data Loading failed for > table source_index_2_1_0 > at > org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.loadCarbonData(CarbonDataRDDFactory.scala:460) > at > org.apache.spark.sql.execution.command.management.CarbonLoadDataCommand.loadData(CarbonLoadDataCommand.scala:226) > at > org.apache.spark.sql.execution.command.management.CarbonLoadDataCommand.processData(CarbonLoadDataCommand.scala:163) > at > org.apache.spark.sql.execution.command.AtomicRunnableCommand.$anonfun$run$3(package.scala:162) > at > org.apache.spark.sql.execution.command.Auditable.runWithAudit(package.scala:118) > at >
[jira] [Created] (CARBONDATA-4277) Compatibility Issue of GeoSpatial table of CarbonData 2.1.0 in CarbonData 2.2.0 (Spark 2.4.5 and Spark 3.1.1))
PURUJIT CHAUGULE created CARBONDATA-4277: Summary: Compatibility Issue of GeoSpatial table of CarbonData 2.1.0 in CarbonData 2.2.0 (Spark 2.4.5 and Spark 3.1.1)) Key: CARBONDATA-4277 URL: https://issues.apache.org/jira/browse/CARBONDATA-4277 Project: CarbonData Issue Type: Bug Affects Versions: 2.2.0 Environment: Spark 2.4.5 Spark 3.1.1 Reporter: PURUJIT CHAUGULE *Issue 1 : Load on a geo table created in CarbonData 2.1.0 fails in 2.2.0 (Spark 2.4.5 and 3.1.1)* *STEPS:-* # create table in CarbonData 2.1.0 : create table source_index_2_1_0(TIMEVALUE BIGINT,LONGITUDE long,LATITUDE long) STORED AS carbondata TBLPROPERTIES ('SPATIAL_INDEX.mygeohash.type'='geohash','SPATIAL_INDEX.mygeohash.sourcecolumns'='longitude, latitude','SPATIAL_INDEX.mygeohash.originLatitude'='39.930753','SPATIAL_INDEX.mygeohash.gridSize'='50','SPATIAL_INDEX.mygeohash.minLongitude'='116.176090','SPATIAL_INDEX.mygeohash.maxLongitude'='116.736367','SPATIAL_INDEX.mygeohash.minLatitude'='39.930753','SPATIAL_INDEX.mygeohash.maxLatitude'='40.179415','SPATIAL_INDEX'='mygeohash','SPATIAL_INDEX.mygeohash.conversionRatio'='100'); # LOAD DATA INPATH 'hdfs://hacluster/chetan/f_lcov_50basic_data.csv' INTO TABLE source_index_2_1_0 OPTIONS('HEADER'='true','DELIMITER'='|', 'QUOTECHAR'='|'); # Copy the table store and place it in the HDFS of the CarbonData 2.2.0 (Spark 2.4.5 and Spark 3.1.1) clusters # refresh table source_index_2_1_0; # 0: jdbc:hive2://10.21.19.14:23040/default> LOAD DATA INPATH 'hdfs://hacluster/chetan/f_lcov_50basic_data.csv' INTO TABLE source_index_2_1_0 OPTIONS('HEADER'='true','DELIMITER'='|', 'QUOTECHAR'='|'); Error: org.apache.hive.service.cli.HiveSQLException: Error running query: java.lang.Exception: DataLoad failure: Data Loading failed for table source_index_2_1_0 at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:361) at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.$anonfun$run$2(SparkExecuteStatementOperation.scala:263) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties(SparkOperation.scala:78) at org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties$(SparkOperation.scala:62) at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.withLocalProperties(SparkExecuteStatementOperation.scala:43) at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:263) at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:258) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1746) at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2.run(SparkExecuteStatementOperation.scala:272) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.Exception: DataLoad failure: Data Loading failed for table source_index_2_1_0 at
org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.loadCarbonData(CarbonDataRDDFactory.scala:460) at org.apache.spark.sql.execution.command.management.CarbonLoadDataCommand.loadData(CarbonLoadDataCommand.scala:226) at org.apache.spark.sql.execution.command.management.CarbonLoadDataCommand.processData(CarbonLoadDataCommand.scala:163) at org.apache.spark.sql.execution.command.AtomicRunnableCommand.$anonfun$run$3(package.scala:162) at org.apache.spark.sql.execution.command.Auditable.runWithAudit(package.scala:118) at org.apache.spark.sql.execution.command.Auditable.runWithAudit$(package.scala:114) at org.apache.spark.sql.execution.command.AtomicRunnableCommand.runWithAudit(package.scala:155) at org.apache.spark.sql.execution.command.AtomicRunnableCommand.run(package.scala:168) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68) at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79) at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:228) at
[jira] [Resolved] (CARBONDATA-4234) Alter change datatype at nested levels
[ https://issues.apache.org/jira/browse/CARBONDATA-4234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Indhumathi resolved CARBONDATA-4234. Fix Version/s: 2.3.0 Resolution: Fixed > Alter change datatype at nested levels > -- > > Key: CARBONDATA-4234 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4234 > Project: CarbonData > Issue Type: Sub-task >Reporter: Akshay >Priority: Major > Fix For: 2.3.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (CARBONDATA-4199) Support renaming of map columns including nested levels
[ https://issues.apache.org/jira/browse/CARBONDATA-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Indhumathi resolved CARBONDATA-4199. Fix Version/s: 2.3.0 Resolution: Fixed > Support renaming of map columns including nested levels > --- > > Key: CARBONDATA-4199 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4199 > Project: CarbonData > Issue Type: Sub-task >Reporter: Akshay >Priority: Minor > Fix For: 2.3.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (CARBONDATA-4198) Support adding of single-level and multi-level map columns
[ https://issues.apache.org/jira/browse/CARBONDATA-4198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Indhumathi resolved CARBONDATA-4198. Fix Version/s: 2.3.0 Resolution: Fixed > Support adding of single-level and multi-level map columns > -- > > Key: CARBONDATA-4198 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4198 > Project: CarbonData > Issue Type: Sub-task >Reporter: Akshay >Priority: Minor > Fix For: 2.3.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (CARBONDATA-4164) Support adding of multi-level complex columns(array/struct)
[ https://issues.apache.org/jira/browse/CARBONDATA-4164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Indhumathi resolved CARBONDATA-4164. Fix Version/s: 2.3.0 Resolution: Fixed > Support adding of multi-level complex columns(array/struct) > --- > > Key: CARBONDATA-4164 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4164 > Project: CarbonData > Issue Type: Sub-task > Components: spark-integration >Reporter: Akshay >Priority: Major > Fix For: 2.3.0 > > Time Spent: 8h > Remaining Estimate: 0h > > Add multi-level (up to 3 nested levels) complex columns (only array and struct) > to a carbon table. For example - > Command - > ALTER TABLE <table_name> ADD COLUMNS(arr array<array<int>>) -- This message was sent by Atlassian Jira (v8.3.4#803005)
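For illustration, the same command issued from spark-shell. The table name and the array element type are assumptions for the example; the angle-bracket contents of the original command were lost in the mail archive:

{code:scala}
// Adds a two-level nested array column; complex_tbl and the int element
// type are illustrative, not taken from the original report.
spark.sql("ALTER TABLE complex_tbl ADD COLUMNS(arr array<array<int>>)")
{code}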
[jira] [Updated] (CARBONDATA-4276) writestream fail when csv is copied to readstream hdfs path in Spark 2.4.5
[ https://issues.apache.org/jira/browse/CARBONDATA-4276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chetan Bhat updated CARBONDATA-4276: Description: *With a Carbon 2.2.0 Spark 2.4.5 cluster* *Steps:* *+In HDFS, execute the following commands:+* cd /opt/HA/C10/install/hadoop/datanode/bin/ ./hdfs dfs -rm -r /tmp/stream_test/checkpoint_all_data ./hdfs dfs -mkdir -p /tmp/stream_test/\{checkpoint_all_data,bad_records_all_data} ./hdfs dfs -mkdir -p /Priyesh/streaming/csv/ ./hdfs dfs -cp /chetan/100_olap_C20.csv /Priyesh/streaming/csv/ ./hdfs dfs -cp /Priyesh/streaming/csv/100_olap_C20.csv /Priyesh/streaming/csv/100_olap_C21.csv *+From Spark-beeline / Spark-SQL / Spark-shell, execute:+* DROP TABLE IF EXISTS all_datatypes_2048; create table all_datatypes_2048 (imei string,deviceInformationId int,MAC string,deviceColor string,device_backColor string,modelId string,marketName string,AMSize string,ROMSize string,CUPAudit string,CPIClocked string,series string,productionDate timestamp,bomCode string,internalModels string, deliveryTime string, channelsId string, channelsName string , deliveryAreaId string, deliveryCountry string, deliveryProvince string, deliveryCity string,deliveryDistrict string, deliveryStreet string, oxSingleNumber string, ActiveCheckTime string, ActiveAreaId string, ActiveCountry string, ActiveProvince string, Activecity string, ActiveDistrict string, ActiveStreet string, ActiveOperatorId string, Active_releaseId string, Active_EMUIVersion string, Active_operaSysVersion string, Active_BacVerNumber string, Active_BacFlashVer string, Active_webUIVersion string, Active_webUITypeCarrVer string,Active_webTypeDataVerNumber string, Active_operatorsVersion string, Active_phonePADPartitionedVersions string, Latest_YEAR int, Latest_MONTH int, Latest_DAY Decimal(30,10), Latest_HOUR string, Latest_areaId string, Latest_country string, Latest_province string, Latest_city string, Latest_district string, Latest_street string, Latest_releaseId string, Latest_EMUIVersion string, Latest_operaSysVersion string, Latest_BacVerNumber string, Latest_BacFlashVer string, Latest_webUIVersion string, Latest_webUITypeCarrVer string, Latest_webTypeDataVerNumber string, Latest_operatorsVersion string, Latest_phonePADPartitionedVersions string, Latest_operatorId string, gamePointDescription string,gamePointId double,contractNumber BigInt) stored as carbondata TBLPROPERTIES('table_blocksize'='2048','streaming'='true', 'sort_columns'='imei'); *+From Spark-shell, execute:+* import org.apache.spark.sql.streaming._ import org.apache.spark.sql.streaming.Trigger.ProcessingTime val df_j=spark.readStream.text("hdfs://hacluster/Priyesh/streaming/csv/*.csv") df_j.writeStream.format("carbondata").option("dbName","ranjan").option("carbon.stream.parser","org.apache.carbondata.streaming.parser.CSVStreamParserImp").option("checkpointLocation", "hdfs://hacluster/tmp/stream_test/checkpoint_all_data").option("bad_records_action","hdfs://hacluster/tmp/stream_test/bad_records_all_data").option("tableName","all_datatypes_2048").trigger(ProcessingTime(6000)).option("carbon.streaming.auto.handoff.enabled","true").option("carbon.streaming.segment.max.size",102400).start show segments for table all_datatypes_2048; *Issue 1:* *+when the csv file is copied into the hdfs folder for the first time after streaming has started, writeStream fails with this error:+* scala> df_j.writeStream.format("carbondata").option("dbName","ranjan").option("carbon.stream.parser","org.apache.carbondata.streaming.parser.CSVStreamParserImp").option("checkpointLocation",
"hdfs://hacluster/tmp/stream_test/checkpoint_all_data").option("bad_records_action","hdfs://hacluster/tmp/stream_test/bad_records_all_data").option("tableName","all_datatypes_2048").trigger(ProcessingTime(6000)).option("carbon.streaming.auto.handoff.enabled","true").option("carbon.streaming.segment.max.size",102400).start 21/08/26 12:53:11 WARN CarbonProperties: The enable mv value "null" is invalid. Using the default value "true" 21/08/26 12:53:11 WARN CarbonProperties: The value "LOCALLOCK" configured for key carbon.lock.type is invalid for current file system. Use the default value HDFSLOCK instead. 21/08/26 12:53:12 WARN HiveConf: HiveConf of name hive.metastore.rdb.password.decode.enable does not exist 21/08/26 12:53:12 WARN HiveConf: HiveConf of name hive.metastore.db.ssl.enabled does not exist 21/08/26 12:53:13 WARN HiveConf: HiveConf of name hive.metastore.rdb.password.decode.enable does not exist 21/08/26 12:53:13 WARN HiveConf: HiveConf of name hive.metastore.db.ssl.enabled does not exist 21/08/26 12:53:14 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException res0: org.apache.spark.sql.streaming.StreamingQuery = org.apache.spark.sql.execution.streaming.StreamingQueryWrapper@ad038f8 scala> 21/08/26 13:00:49 WARN DFSClient: DataStreamer Exception
[jira] [Updated] (CARBONDATA-4276) writestream fail when csv is copied to readstream hdfs path in Spark 2.4.5
[ https://issues.apache.org/jira/browse/CARBONDATA-4276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chetan Bhat updated CARBONDATA-4276: Summary: writestream fail when csv is copied to readstream hdfs path in Spark 2.4.5 (was: writestream fail when csv is copied to readstream hdfs path) > writestream fail when csv is copied to readstream hdfs path in Spark 2.4.5 > -- > > Key: CARBONDATA-4276 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4276 > Project: CarbonData > Issue Type: Bug > Components: data-load >Affects Versions: 2.2.0 > Environment: Spark 2.4.5 >Reporter: PRIYESH RANJAN >Priority: Minor > > *Steps:* > *+In HDFS, execute the following commands:+* > cd /opt/HA/C10/install/hadoop/datanode/bin/ > ./hdfs dfs -rm -r /tmp/stream_test/checkpoint_all_data > ./hdfs dfs -mkdir -p > /tmp/stream_test/\{checkpoint_all_data,bad_records_all_data} > ./hdfs dfs -mkdir -p /Priyesh/streaming/csv/ > ./hdfs dfs -cp /chetan/100_olap_C20.csv /Priyesh/streaming/csv/ > ./hdfs dfs -cp /Priyesh/streaming/csv/100_olap_C20.csv > /Priyesh/streaming/csv/100_olap_C21.csv > > *+From Spark-beeline / Spark-SQL / Spark-shell, execute:+* > DROP TABLE IF EXISTS all_datatypes_2048; > create table all_datatypes_2048 (imei string,deviceInformationId int,MAC > string,deviceColor string,device_backColor string,modelId string,marketName > string,AMSize string,ROMSize string,CUPAudit string,CPIClocked string,series > string,productionDate timestamp,bomCode string,internalModels string, > deliveryTime string, channelsId string, channelsName string , deliveryAreaId > string, deliveryCountry string, deliveryProvince string, deliveryCity > string,deliveryDistrict string, deliveryStreet string, oxSingleNumber string, > ActiveCheckTime string, ActiveAreaId string, ActiveCountry string, > ActiveProvince string, Activecity string, ActiveDistrict string, ActiveStreet > string, ActiveOperatorId string, Active_releaseId string, Active_EMUIVersion > string, Active_operaSysVersion string, Active_BacVerNumber string, > Active_BacFlashVer string, Active_webUIVersion string, > Active_webUITypeCarrVer string,Active_webTypeDataVerNumber string, > Active_operatorsVersion string, Active_phonePADPartitionedVersions string, > Latest_YEAR int, Latest_MONTH int, Latest_DAY Decimal(30,10), Latest_HOUR > string, Latest_areaId string, Latest_country string, Latest_province string, > Latest_city string, Latest_district string, Latest_street string, > Latest_releaseId string, Latest_EMUIVersion string, Latest_operaSysVersion > string, Latest_BacVerNumber string, Latest_BacFlashVer string, > Latest_webUIVersion string, Latest_webUITypeCarrVer string, > Latest_webTypeDataVerNumber string, Latest_operatorsVersion string, > Latest_phonePADPartitionedVersions string, Latest_operatorId string, > gamePointDescription string,gamePointId double,contractNumber BigInt) stored > as carbondata TBLPROPERTIES('table_blocksize'='2048','streaming'='true', > 'sort_columns'='imei'); > > *+From Spark-shell, execute:+* > import org.apache.spark.sql.streaming._ > import org.apache.spark.sql.streaming.Trigger.ProcessingTime > val df_j=spark.readStream.text("hdfs://hacluster/Priyesh/streaming/csv/*.csv") > df_j.writeStream.format("carbondata").option("dbName","ranjan").option("carbon.stream.parser","org.apache.carbondata.streaming.parser.CSVStreamParserImp").option("checkpointLocation", > >
"hdfs://hacluster/tmp/stream_test/checkpoint_all_data").option("bad_records_action","hdfs://hacluster/tmp/stream_test/bad_records_all_data").option("tableName","all_datatypes_2048").trigger(ProcessingTime(6000)).option("carbon.streaming.auto.handoff.enabled","true").option("carbon.streaming.segment.max.size",102400).start > show segments for table all_datatypes_2048; > > *Issue 1:* > *+when the csv file is copied into the hdfs folder for the first time after > streaming has started, writeStream fails with this error:+* > scala> > df_j.writeStream.format("carbondata").option("dbName","ranjan").option("carbon.stream.parser","org.apache.carbondata.streaming.parser.CSVStreamParserImp").option("checkpointLocation", > > "hdfs://hacluster/tmp/stream_test/checkpoint_all_data").option("bad_records_action","hdfs://hacluster/tmp/stream_test/bad_records_all_data").option("tableName","all_datatypes_2048").trigger(ProcessingTime(6000)).option("carbon.streaming.auto.handoff.enabled","true").option("carbon.streaming.segment.max.size",102400).start > 21/08/26 12:53:11 WARN CarbonProperties: The enable mv value "null" is > invalid. Using the default value "true" > 21/08/26 12:53:11 WARN CarbonProperties: The value "LOCALLOCK" configured for > key carbon.lock.type is invalid for current file system. Use the default > value HDFSLOCK instead. > 21/08/26 12:53:12
[jira] [Created] (CARBONDATA-4276) writestream fail when csv is copied to readstream hdfs path
PRIYESH RANJAN created CARBONDATA-4276: -- Summary: writestream fail when csv is copied to readstream hdfs path Key: CARBONDATA-4276 URL: https://issues.apache.org/jira/browse/CARBONDATA-4276 Project: CarbonData Issue Type: Bug Components: data-load Affects Versions: 2.2.0 Environment: Spark 2.4.5 Reporter: PRIYESH RANJAN *Steps:* *+In HDFS, execute the following commands:+* cd /opt/HA/C10/install/hadoop/datanode/bin/ ./hdfs dfs -rm -r /tmp/stream_test/checkpoint_all_data ./hdfs dfs -mkdir -p /tmp/stream_test/\{checkpoint_all_data,bad_records_all_data} ./hdfs dfs -mkdir -p /Priyesh/streaming/csv/ ./hdfs dfs -cp /chetan/100_olap_C20.csv /Priyesh/streaming/csv/ ./hdfs dfs -cp /Priyesh/streaming/csv/100_olap_C20.csv /Priyesh/streaming/csv/100_olap_C21.csv *+From Spark-beeline / Spark-SQL / Spark-shell, execute:+* DROP TABLE IF EXISTS all_datatypes_2048; create table all_datatypes_2048 (imei string,deviceInformationId int,MAC string,deviceColor string,device_backColor string,modelId string,marketName string,AMSize string,ROMSize string,CUPAudit string,CPIClocked string,series string,productionDate timestamp,bomCode string,internalModels string, deliveryTime string, channelsId string, channelsName string , deliveryAreaId string, deliveryCountry string, deliveryProvince string, deliveryCity string,deliveryDistrict string, deliveryStreet string, oxSingleNumber string, ActiveCheckTime string, ActiveAreaId string, ActiveCountry string, ActiveProvince string, Activecity string, ActiveDistrict string, ActiveStreet string, ActiveOperatorId string, Active_releaseId string, Active_EMUIVersion string, Active_operaSysVersion string, Active_BacVerNumber string, Active_BacFlashVer string, Active_webUIVersion string, Active_webUITypeCarrVer string,Active_webTypeDataVerNumber string, Active_operatorsVersion string, Active_phonePADPartitionedVersions string, Latest_YEAR int, Latest_MONTH int, Latest_DAY Decimal(30,10), Latest_HOUR string, Latest_areaId string, Latest_country string, Latest_province string, Latest_city string, Latest_district string, Latest_street string, Latest_releaseId string, Latest_EMUIVersion string, Latest_operaSysVersion string, Latest_BacVerNumber string, Latest_BacFlashVer string, Latest_webUIVersion string, Latest_webUITypeCarrVer string, Latest_webTypeDataVerNumber string, Latest_operatorsVersion string, Latest_phonePADPartitionedVersions string, Latest_operatorId string, gamePointDescription string,gamePointId double,contractNumber BigInt) stored as carbondata TBLPROPERTIES('table_blocksize'='2048','streaming'='true', 'sort_columns'='imei'); *+From Spark-shell, execute:+* import org.apache.spark.sql.streaming._ import org.apache.spark.sql.streaming.Trigger.ProcessingTime val df_j=spark.readStream.text("hdfs://hacluster/Priyesh/streaming/csv/*.csv") df_j.writeStream.format("carbondata").option("dbName","ranjan").option("carbon.stream.parser","org.apache.carbondata.streaming.parser.CSVStreamParserImp").option("checkpointLocation", "hdfs://hacluster/tmp/stream_test/checkpoint_all_data").option("bad_records_action","hdfs://hacluster/tmp/stream_test/bad_records_all_data").option("tableName","all_datatypes_2048").trigger(ProcessingTime(6000)).option("carbon.streaming.auto.handoff.enabled","true").option("carbon.streaming.segment.max.size",102400).start show segments for table all_datatypes_2048; *Issue 1:* *+when the csv file is copied into the hdfs folder for the first time after streaming has started, writeStream fails with this error:+* scala>
df_j.writeStream.format("carbondata").option("dbName","ranjan").option("carbon.stream.parser","org.apache.carbondata.streaming.parser.CSVStreamParserImp").option("checkpointLocation", "hdfs://hacluster/tmp/stream_test/checkpoint_all_data").option("bad_records_action","hdfs://hacluster/tmp/stream_test/bad_records_all_data").option("tableName","all_datatypes_2048").trigger(ProcessingTime(6000)).option("carbon.streaming.auto.handoff.enabled","true").option("carbon.streaming.segment.max.size",102400).start 21/08/26 12:53:11 WARN CarbonProperties: The enable mv value "null" is invalid. Using the default value "true" 21/08/26 12:53:11 WARN CarbonProperties: The value "LOCALLOCK" configured for key carbon.lock.type is invalid for current file system. Use the default value HDFSLOCK instead. 21/08/26 12:53:12 WARN HiveConf: HiveConf of name hive.metastore.rdb.password.decode.enable does not exist 21/08/26 12:53:12 WARN HiveConf: HiveConf of name hive.metastore.db.ssl.enabled does not exist 21/08/26 12:53:13 WARN HiveConf: HiveConf of name hive.metastore.rdb.password.decode.enable does not exist 21/08/26 12:53:13 WARN HiveConf: HiveConf of name hive.metastore.db.ssl.enabled does not exist 21/08/26 12:53:14 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
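For readability, the readStream/writeStream sequence from the steps above, reflowed into multi-line form; every option name and value is unchanged from the report:

{code:scala}
import org.apache.spark.sql.streaming._
import org.apache.spark.sql.streaming.Trigger.ProcessingTime

// Read the CSV files from HDFS as a text stream.
val df_j = spark.readStream.text("hdfs://hacluster/Priyesh/streaming/csv/*.csv")

// Write the stream into the carbondata table; segments are handed off
// automatically once they exceed carbon.streaming.segment.max.size.
df_j.writeStream
  .format("carbondata")
  .option("dbName", "ranjan")
  .option("carbon.stream.parser", "org.apache.carbondata.streaming.parser.CSVStreamParserImp")
  .option("checkpointLocation", "hdfs://hacluster/tmp/stream_test/checkpoint_all_data")
  .option("bad_records_action", "hdfs://hacluster/tmp/stream_test/bad_records_all_data")
  .option("tableName", "all_datatypes_2048")
  .trigger(ProcessingTime(6000))
  .option("carbon.streaming.auto.handoff.enabled", "true")
  .option("carbon.streaming.segment.max.size", 102400)
  .start
{code}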