[jira] [Created] (CARBONDATA-4278) Avoid refetching all indexes to get segment properties

2021-08-26 Thread Mahesh Raju Somalaraju (Jira)
Mahesh Raju Somalaraju created CARBONDATA-4278:
--

 Summary: Avoid refetching all indexes to get segment properties
 Key: CARBONDATA-4278
 URL: https://issues.apache.org/jira/browse/CARBONDATA-4278
 Project: CarbonData
  Issue Type: Bug
Reporter: Mahesh Raju Somalaraju


h1. Avoid refetching all indexes to get segment properties

 

1) When the block index is already available, there is no need to rebuild it 
from the available segments and partition locations.

2) Call getSegmentProperties directly when the block index is available, as 
sketched below.
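
A minimal sketch of the idea in Scala (hypothetical type and method names, not 
the actual CarbonData classes):

{code:scala}
// Sketch only: when a block index for a segment is already loaded, read its
// SegmentProperties directly instead of rebuilding the index from segment
// and partition locations. BlockIndex/SegmentProperties are stand-in types.
trait SegmentProperties
trait BlockIndex { def getSegmentProperties: SegmentProperties }

class SegmentPropertiesLookup(
    loadedIndexes: Map[String, BlockIndex],   // block indexes already in memory
    rebuildIndex: String => BlockIndex) {     // expensive path: re-read all index files

  def segmentProperties(segmentId: String): SegmentProperties =
    loadedIndexes.get(segmentId)
      .getOrElse(rebuildIndex(segmentId))     // rebuild only when nothing is loaded
      .getSegmentProperties
}
{code}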



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Issue Comment Deleted] (CARBONDATA-4273) Cannot create table with partitions in Spark in EMR

2021-08-26 Thread Indhumathi (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi updated CARBONDATA-4273:
---
Comment: was deleted

(was: Can you tell me in which file system environment you are facing this 
issue? Hadoop FileSystem, or are you running it locally?)

> Cannot create table with partitions in Spark in EMR
> ---
>
> Key: CARBONDATA-4273
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4273
> Project: CarbonData
>  Issue Type: Bug
>  Components: spark-integration
>Affects Versions: 2.2.0
> Environment: Release label:emr-5.24.1
> Hadoop distribution:Amazon 2.8.5
> Applications:
> Hive 2.3.4, Pig 0.17.0, Hue 4.4.0, Flink 1.8.0, Spark 2.4.2, Presto 0.219, 
> JupyterHub 0.9.6
> Jar compiled with:
> apache-carbondata:2.2.0
> spark:2.4.5
> hadoop:2.8.3
>Reporter: Bigicecream
>Priority: Critical
>  Labels: EMR, spark
>
>  
> When trying to create a table like this:
> {code:sql}
> CREATE TABLE IF NOT EXISTS will_not_work(
> timestamp string,
> name string
> )
> PARTITIONED BY (dt string, hr string)
> STORED AS carbondata
> LOCATION 's3a://my-bucket/CarbonDataTests/will_not_work'
> {code}
> I get the following error:
> {noformat}
> org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException: 
> Partition is not supported for external table
>   at 
> org.apache.spark.sql.parser.CarbonSparkSqlParserUtil$.buildTableInfoFromCatalogTable(CarbonSparkSqlParserUtil.scala:219)
>   at 
> org.apache.spark.sql.CarbonSource$.createTableInfo(CarbonSource.scala:235)
>   at 
> org.apache.spark.sql.CarbonSource$.createTableMeta(CarbonSource.scala:394)
>   at 
> org.apache.spark.sql.execution.command.table.CarbonCreateDataSourceTableCommand.processMetadata(CarbonCreateDataSourceTableCommand.scala:69)
>   at 
> org.apache.spark.sql.execution.command.MetadataCommand$$anonfun$run$1.apply(package.scala:137)
>   at 
> org.apache.spark.sql.execution.command.MetadataCommand$$anonfun$run$1.apply(package.scala:137)
>   at 
> org.apache.spark.sql.execution.command.Auditable$class.runWithAudit(package.scala:118)
>   at 
> org.apache.spark.sql.execution.command.MetadataCommand.runWithAudit(package.scala:134)
>   at 
> org.apache.spark.sql.execution.command.MetadataCommand.run(package.scala:137)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
>   at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:194)
>   at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:194)
>   at org.apache.spark.sql.Dataset$$anonfun$53.apply(Dataset.scala:3364)
>   at 
> org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
>   at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3363)
>   at org.apache.spark.sql.Dataset.<init>(Dataset.scala:194)
>   at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:79)
>   at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:643)
>   ... 64 elided
> {noformat}
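
One possible workaround to try, assuming the error comes from CarbonData 
classifying any CREATE TABLE with an explicit LOCATION as an external table (a 
sketch, not a verified fix; the table name is a placeholder):

{code:scala}
// Unverified workaround sketch: omitting LOCATION creates a managed table,
// which is not treated as external and therefore allows PARTITIONED BY.
spark.sql("""
  CREATE TABLE IF NOT EXISTS may_work(
    timestamp string,
    name string
  )
  PARTITIONED BY (dt string, hr string)
  STORED AS carbondata
""")
{code}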



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CARBONDATA-4273) Cannot create table with partitions in Spark in EMR

2021-08-26 Thread Indhumathi (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-4273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17405273#comment-17405273
 ] 

Indhumathi commented on CARBONDATA-4273:


Can you tell me in which file system environment you are facing this issue? 
Hadoop FileSystem, or are you running it locally?

> Cannot create table with partitions in Spark in EMR
> ---
>
> Key: CARBONDATA-4273
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4273
> Project: CarbonData
>  Issue Type: Bug
>  Components: spark-integration
>Affects Versions: 2.2.0
> Environment: Release label:emr-5.24.1
> Hadoop distribution:Amazon 2.8.5
> Applications:
> Hive 2.3.4, Pig 0.17.0, Hue 4.4.0, Flink 1.8.0, Spark 2.4.2, Presto 0.219, 
> JupyterHub 0.9.6
> Jar compiled with:
> apache-carbondata:2.2.0
> spark:2.4.5
> hadoop:2.8.3
>Reporter: Bigicecream
>Priority: Critical
>  Labels: EMR, spark
>
>  
> When trying to create a table like this:
> {code:sql}
> CREATE TABLE IF NOT EXISTS will_not_work(
> timestamp string,
> name string
> )
> PARTITIONED BY (dt string, hr string)
> STORED AS carbondata
> LOCATION 's3a://my-bucket/CarbonDataTests/will_not_work'
> {code}
> I get the following error:
> {noformat}
> org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException: 
> Partition is not supported for external table
>   at 
> org.apache.spark.sql.parser.CarbonSparkSqlParserUtil$.buildTableInfoFromCatalogTable(CarbonSparkSqlParserUtil.scala:219)
>   at 
> org.apache.spark.sql.CarbonSource$.createTableInfo(CarbonSource.scala:235)
>   at 
> org.apache.spark.sql.CarbonSource$.createTableMeta(CarbonSource.scala:394)
>   at 
> org.apache.spark.sql.execution.command.table.CarbonCreateDataSourceTableCommand.processMetadata(CarbonCreateDataSourceTableCommand.scala:69)
>   at 
> org.apache.spark.sql.execution.command.MetadataCommand$$anonfun$run$1.apply(package.scala:137)
>   at 
> org.apache.spark.sql.execution.command.MetadataCommand$$anonfun$run$1.apply(package.scala:137)
>   at 
> org.apache.spark.sql.execution.command.Auditable$class.runWithAudit(package.scala:118)
>   at 
> org.apache.spark.sql.execution.command.MetadataCommand.runWithAudit(package.scala:134)
>   at 
> org.apache.spark.sql.execution.command.MetadataCommand.run(package.scala:137)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
>   at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:194)
>   at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:194)
>   at org.apache.spark.sql.Dataset$$anonfun$53.apply(Dataset.scala:3364)
>   at 
> org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
>   at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3363)
>   at org.apache.spark.sql.Dataset.<init>(Dataset.scala:194)
>   at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:79)
>   at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:643)
>   ... 64 elided
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (CARBONDATA-4180) create maintable and do insert before creation of si on maintable, then query on si column from presto does not hit SI

2021-08-26 Thread Mahesh Raju Somalaraju (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahesh Raju Somalaraju closed CARBONDATA-4180.
--

On retest, the reported scenario works fine, so closing the JIRA.

> create maintable and do insert before creation of si on maintable, then query 
> on si column from presto does not hit SI
> --
>
> Key: CARBONDATA-4180
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4180
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Mahesh Raju Somalaraju
>Priority: Minor
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Create the maintable and insert data before creating an SI on the maintable; 
> a query on the SI column from Presto then does not hit the SI.
>  
> steps:
> 1) create maintable
> 2) insert the data
> 3) create SI
> 4) query from presto on si column
> Expectation:
> It should hit the SI table and fetch the results.
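
A sketch of the four steps as spark-shell calls (table, column, and index 
names are hypothetical; step 4 runs from the Presto CLI and is shown as a 
comment):

{code:scala}
// Steps 1-3 on the Spark side (hypothetical names).
spark.sql("CREATE TABLE maintable(id INT, name STRING, city STRING) STORED AS carbondata")
spark.sql("INSERT INTO maintable VALUES (1, 'abc', 'shenzhen')")
spark.sql("CREATE INDEX si_city ON TABLE maintable(city) AS 'carbondata'")

// Step 4, from the Presto CLI, is expected to hit si_city:
//   SELECT * FROM maintable WHERE city = 'shenzhen';
{code}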



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CARBONDATA-4273) Cannot create table with partitions in Spark in EMR

2021-08-26 Thread Brijoo Bopanna (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-4273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17405241#comment-17405241
 ] 

Brijoo Bopanna commented on CARBONDATA-4273:


Thanks for sharing this issue; we will check and reply.

> Cannot create table with partitions in Spark in EMR
> ---
>
> Key: CARBONDATA-4273
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4273
> Project: CarbonData
>  Issue Type: Bug
>  Components: spark-integration
>Affects Versions: 2.2.0
> Environment: Release label:emr-5.24.1
> Hadoop distribution:Amazon 2.8.5
> Applications:
> Hive 2.3.4, Pig 0.17.0, Hue 4.4.0, Flink 1.8.0, Spark 2.4.2, Presto 0.219, 
> JupyterHub 0.9.6
> Jar compiled with:
> apache-carbondata:2.2.0
> spark:2.4.5
> hadoop:2.8.3
>Reporter: Bigicecream
>Priority: Critical
>  Labels: EMR, spark
>
>  
> When trying to create a table like this:
> {code:sql}
> CREATE TABLE IF NOT EXISTS will_not_work(
> timestamp string,
> name string
> )
> PARTITIONED BY (dt string, hr string)
> STORED AS carbondata
> LOCATION 's3a://my-bucket/CarbonDataTests/will_not_work'
> {code}
> I get the following error:
> {noformat}
> org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException: 
> Partition is not supported for external table
>   at 
> org.apache.spark.sql.parser.CarbonSparkSqlParserUtil$.buildTableInfoFromCatalogTable(CarbonSparkSqlParserUtil.scala:219)
>   at 
> org.apache.spark.sql.CarbonSource$.createTableInfo(CarbonSource.scala:235)
>   at 
> org.apache.spark.sql.CarbonSource$.createTableMeta(CarbonSource.scala:394)
>   at 
> org.apache.spark.sql.execution.command.table.CarbonCreateDataSourceTableCommand.processMetadata(CarbonCreateDataSourceTableCommand.scala:69)
>   at 
> org.apache.spark.sql.execution.command.MetadataCommand$$anonfun$run$1.apply(package.scala:137)
>   at 
> org.apache.spark.sql.execution.command.MetadataCommand$$anonfun$run$1.apply(package.scala:137)
>   at 
> org.apache.spark.sql.execution.command.Auditable$class.runWithAudit(package.scala:118)
>   at 
> org.apache.spark.sql.execution.command.MetadataCommand.runWithAudit(package.scala:134)
>   at 
> org.apache.spark.sql.execution.command.MetadataCommand.run(package.scala:137)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
>   at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:194)
>   at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:194)
>   at org.apache.spark.sql.Dataset$$anonfun$53.apply(Dataset.scala:3364)
>   at 
> org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
>   at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3363)
>   at org.apache.spark.sql.Dataset.<init>(Dataset.scala:194)
>   at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:79)
>   at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:643)
>   ... 64 elided
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (CARBONDATA-4277) Compatibility Issue of GeoSpatial table of CarbonData 2.1.0 in CarbonData 2.2.0 (Spark 2.4.5 and Spark 3.1.1)

2021-08-26 Thread PURUJIT CHAUGULE (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

PURUJIT CHAUGULE updated CARBONDATA-4277:
-
Priority: Major  (was: Minor)

> Compatibility Issue of GeoSpatial table of CarbonData 2.1.0 in CarbonData 
> 2.2.0 (Spark 2.4.5 and Spark 3.1.1)
> -
>
> Key: CARBONDATA-4277
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4277
> Project: CarbonData
>  Issue Type: Bug
>Affects Versions: 2.2.0
> Environment: Spark 2.4.5
> Spark 3.1.1
>Reporter: PURUJIT CHAUGULE
>Priority: Major
>
>  
>  
> *Issue 1: Load into a geospatial table created in 2.1.0 fails in 2.2.0 (Spark 
> 2.4.5 and 3.1.1)*
> *STEPS:-*
>  # create table in CarbonData 2.1.0 : create table 
> source_index_2_1_0(TIMEVALUE BIGINT,LONGITUDE long,LATITUDE long) STORED AS 
> carbondata TBLPROPERTIES 
> ('SPATIAL_INDEX.mygeohash.type'='geohash','SPATIAL_INDEX.mygeohash.sourcecolumns'='longitude,
>  
> latitude','SPATIAL_INDEX.mygeohash.originLatitude'='39.930753','SPATIAL_INDEX.mygeohash.gridSize'='50','SPATIAL_INDEX.mygeohash.minLongitude'='116.176090','SPATIAL_INDEX.mygeohash.maxLongitude'='116.736367','SPATIAL_INDEX.mygeohash.minLatitude'='39.930753','SPATIAL_INDEX.mygeohash.maxLatitude'='40.179415','SPATIAL_INDEX'='mygeohash','SPATIAL_INDEX.mygeohash.conversionRatio'='100');
>  # LOAD DATA INPATH 'hdfs://hacluster/chetan/f_lcov_50basic_data.csv' INTO 
> TABLE source_index_2_1_0 OPTIONS('HEADER'='true','DELIMITER'='|', 
> 'QUOTECHAR'='|');
>  # Take the store of the table and place it in the HDFS of the CarbonData 
> 2.2.0 (Spark 2.4.5 and Spark 3.1.1) clusters
>  # refresh table source_index_2_1_0;
>  # 0: jdbc:hive2://10.21.19.14:23040/default> LOAD DATA INPATH 
> 'hdfs://hacluster/chetan/f_lcov_50basic_data.csv' INTO TABLE 
> source_index_2_1_0 OPTIONS('HEADER'='true','DELIMITER'='|', 'QUOTECHAR'='|');
> Error: org.apache.hive.service.cli.HiveSQLException: Error running query: 
> java.lang.Exception: DataLoad failure: Data Loading failed for table 
> source_index_2_1_0
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:361)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.$anonfun$run$2(SparkExecuteStatementOperation.scala:263)
>  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties(SparkOperation.scala:78)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties$(SparkOperation.scala:62)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.withLocalProperties(SparkExecuteStatementOperation.scala:43)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:263)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:258)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1746)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2.run(SparkExecuteStatementOperation.scala:272)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
>  Caused by: java.lang.Exception: DataLoad failure: Data Loading failed for 
> table source_index_2_1_0
>  at 
> org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.loadCarbonData(CarbonDataRDDFactory.scala:460)
>  at 
> org.apache.spark.sql.execution.command.management.CarbonLoadDataCommand.loadData(CarbonLoadDataCommand.scala:226)
>  at 
> org.apache.spark.sql.execution.command.management.CarbonLoadDataCommand.processData(CarbonLoadDataCommand.scala:163)
>  at 
> org.apache.spark.sql.execution.command.AtomicRunnableCommand.$anonfun$run$3(package.scala:162)
>  at 
> org.apache.spark.sql.execution.command.Auditable.runWithAudit(package.scala:118)
>  at 
> org.apache.spark.sql.execution.command.Auditable.runWithAudit$(package.scala:114)
>  at 
> org.apache.spark.sql.execution.command.AtomicRunnableCommand.runWithAudit(package.scala:155)
>  at 
> org.apache.spark.sql.execution.command.AtomicRunnableCommand.run(package.scala:168)
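
For readability, the step-1 table definition reformatted as a spark.sql call 
(same columns and property values as the one-liner above):

{code:scala}
spark.sql("""
  CREATE TABLE source_index_2_1_0 (
    TIMEVALUE BIGINT, LONGITUDE LONG, LATITUDE LONG
  ) STORED AS carbondata
  TBLPROPERTIES (
    'SPATIAL_INDEX' = 'mygeohash',
    'SPATIAL_INDEX.mygeohash.type' = 'geohash',
    'SPATIAL_INDEX.mygeohash.sourcecolumns' = 'longitude, latitude',
    'SPATIAL_INDEX.mygeohash.originLatitude' = '39.930753',
    'SPATIAL_INDEX.mygeohash.gridSize' = '50',
    'SPATIAL_INDEX.mygeohash.minLongitude' = '116.176090',
    'SPATIAL_INDEX.mygeohash.maxLongitude' = '116.736367',
    'SPATIAL_INDEX.mygeohash.minLatitude' = '39.930753',
    'SPATIAL_INDEX.mygeohash.maxLatitude' = '40.179415',
    'SPATIAL_INDEX.mygeohash.conversionRatio' = '100'
  )
""")
{code}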
> 

[jira] [Updated] (CARBONDATA-4277) Compatibility Issue of GeoSpatial table of CarbonData 2.1.0 in CarbonData 2.2.0 (Spark 2.4.5 and Spark 3.1.1)

2021-08-26 Thread PURUJIT CHAUGULE (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

PURUJIT CHAUGULE updated CARBONDATA-4277:
-
Description: 
 

 

*Issue 1: Load into a geospatial table created in 2.1.0 fails in 2.2.0 (Spark 
2.4.5 and 3.1.1)*

*STEPS:-*
 # create table in CarbonData 2.1.0 : create table source_index_2_1_0(TIMEVALUE 
BIGINT,LONGITUDE long,LATITUDE long) STORED AS carbondata TBLPROPERTIES 
('SPATIAL_INDEX.mygeohash.type'='geohash','SPATIAL_INDEX.mygeohash.sourcecolumns'='longitude,
 
latitude','SPATIAL_INDEX.mygeohash.originLatitude'='39.930753','SPATIAL_INDEX.mygeohash.gridSize'='50','SPATIAL_INDEX.mygeohash.minLongitude'='116.176090','SPATIAL_INDEX.mygeohash.maxLongitude'='116.736367','SPATIAL_INDEX.mygeohash.minLatitude'='39.930753','SPATIAL_INDEX.mygeohash.maxLatitude'='40.179415','SPATIAL_INDEX'='mygeohash','SPATIAL_INDEX.mygeohash.conversionRatio'='100');
 # LOAD DATA INPATH 'hdfs://hacluster/chetan/f_lcov_50basic_data.csv' INTO 
TABLE source_index_2_1_0 OPTIONS('HEADER'='true','DELIMITER'='|', 
'QUOTECHAR'='|');
 # Take the store of the table and place it in the HDFS of the CarbonData 
2.2.0 (Spark 2.4.5 and Spark 3.1.1) clusters
 # refresh table source_index_2_1_0;
 # 0: jdbc:hive2://10.21.19.14:23040/default> LOAD DATA INPATH 
'hdfs://hacluster/chetan/f_lcov_50basic_data.csv' INTO TABLE source_index_2_1_0 
OPTIONS('HEADER'='true','DELIMITER'='|', 'QUOTECHAR'='|');

Error: org.apache.hive.service.cli.HiveSQLException: Error running query: 
java.lang.Exception: DataLoad failure: Data Loading failed for table 
source_index_2_1_0
 at 
org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:361)
 at 
org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.$anonfun$run$2(SparkExecuteStatementOperation.scala:263)
 at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
 at 
org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties(SparkOperation.scala:78)
 at 
org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties$(SparkOperation.scala:62)
 at 
org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.withLocalProperties(SparkExecuteStatementOperation.scala:43)
 at 
org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:263)
 at 
org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:258)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1746)
 at 
org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2.run(SparkExecuteStatementOperation.scala:272)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)
 Caused by: java.lang.Exception: DataLoad failure: Data Loading failed for 
table source_index_2_1_0
 at 
org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.loadCarbonData(CarbonDataRDDFactory.scala:460)
 at 
org.apache.spark.sql.execution.command.management.CarbonLoadDataCommand.loadData(CarbonLoadDataCommand.scala:226)
 at 
org.apache.spark.sql.execution.command.management.CarbonLoadDataCommand.processData(CarbonLoadDataCommand.scala:163)
 at 
org.apache.spark.sql.execution.command.AtomicRunnableCommand.$anonfun$run$3(package.scala:162)
 at 
org.apache.spark.sql.execution.command.Auditable.runWithAudit(package.scala:118)
 at 
org.apache.spark.sql.execution.command.Auditable.runWithAudit$(package.scala:114)
 at 
org.apache.spark.sql.execution.command.AtomicRunnableCommand.runWithAudit(package.scala:155)
 at 
org.apache.spark.sql.execution.command.AtomicRunnableCommand.run(package.scala:168)
 at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
 at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
 at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
 at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:228)
 at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3687)
 at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
 at 
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
 at 

[jira] [Updated] (CARBONDATA-4277) Compatibility Issue of GeoSpatial table of CarbonData 2.1.0 in CarbonData 2.2.0 (Spark 2.4.5 and Spark 3.1.1)

2021-08-26 Thread PURUJIT CHAUGULE (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

PURUJIT CHAUGULE updated CARBONDATA-4277:
-
Summary: Compatibility Issue of GeoSpatial table of CarbonData 2.1.0 in 
CarbonData 2.2.0 (Spark 2.4.5 and Spark 3.1.1)  (was: Compatibility Issue of 
GeoSpatial table of CarbonData 2.1.0 in CarbonData 2.2.0 (Spark 2.4.5 and Spark 
3.1.1)))

> Compatibility Issue of GeoSpatial table of CarbonData 2.1.0 in CarbonData 
> 2.2.0 (Spark 2.4.5 and Spark 3.1.1)
> -
>
> Key: CARBONDATA-4277
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4277
> Project: CarbonData
>  Issue Type: Bug
>Affects Versions: 2.2.0
> Environment: Spark 2.4.5
> Spark 3.1.1
>Reporter: PURUJIT CHAUGULE
>Priority: Minor
>
>  
>  
> *Issue 1: Load into a geo table created in 2.1.0 fails in 2.2.0 (Spark 2.4.5 
> and 3.1.1)*
> *STEPS:-*
>  # create table in CarbonData 2.1.0 : create table 
> source_index_2_1_0(TIMEVALUE BIGINT,LONGITUDE long,LATITUDE long) STORED AS 
> carbondata TBLPROPERTIES 
> ('SPATIAL_INDEX.mygeohash.type'='geohash','SPATIAL_INDEX.mygeohash.sourcecolumns'='longitude,
>  
> latitude','SPATIAL_INDEX.mygeohash.originLatitude'='39.930753','SPATIAL_INDEX.mygeohash.gridSize'='50','SPATIAL_INDEX.mygeohash.minLongitude'='116.176090','SPATIAL_INDEX.mygeohash.maxLongitude'='116.736367','SPATIAL_INDEX.mygeohash.minLatitude'='39.930753','SPATIAL_INDEX.mygeohash.maxLatitude'='40.179415','SPATIAL_INDEX'='mygeohash','SPATIAL_INDEX.mygeohash.conversionRatio'='100');
>  # LOAD DATA INPATH 'hdfs://hacluster/chetan/f_lcov_50basic_data.csv' INTO 
> TABLE source_index_2_1_0 OPTIONS('HEADER'='true','DELIMITER'='|', 
> 'QUOTECHAR'='|');
>  # Take the store of the table and place it in the HDFS of the CarbonData 
> 2.2.0 (Spark 2.4.5 and Spark 3.1.1) clusters
>  # refresh table source_index_2_1_0;
>  # 0: jdbc:hive2://10.21.19.14:23040/default> LOAD DATA INPATH 
> 'hdfs://hacluster/chetan/f_lcov_50basic_data.csv' INTO TABLE 
> source_index_2_1_0 OPTIONS('HEADER'='true','DELIMITER'='|', 'QUOTECHAR'='|');
> Error: org.apache.hive.service.cli.HiveSQLException: Error running query: 
> java.lang.Exception: DataLoad failure: Data Loading failed for table 
> source_index_2_1_0
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:361)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.$anonfun$run$2(SparkExecuteStatementOperation.scala:263)
>  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties(SparkOperation.scala:78)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties$(SparkOperation.scala:62)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.withLocalProperties(SparkExecuteStatementOperation.scala:43)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:263)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:258)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1746)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2.run(SparkExecuteStatementOperation.scala:272)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.Exception: DataLoad failure: Data Loading failed for 
> table source_index_2_1_0
>  at 
> org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.loadCarbonData(CarbonDataRDDFactory.scala:460)
>  at 
> org.apache.spark.sql.execution.command.management.CarbonLoadDataCommand.loadData(CarbonLoadDataCommand.scala:226)
>  at 
> org.apache.spark.sql.execution.command.management.CarbonLoadDataCommand.processData(CarbonLoadDataCommand.scala:163)
>  at 
> org.apache.spark.sql.execution.command.AtomicRunnableCommand.$anonfun$run$3(package.scala:162)
>  at 
> org.apache.spark.sql.execution.command.Auditable.runWithAudit(package.scala:118)
>  at 
> 

[jira] [Created] (CARBONDATA-4277) Compatibility Issue of GeoSpatial table of CarbonData 2.1.0 in CarbonData 2.2.0 (Spark 2.4.5 and Spark 3.1.1))

2021-08-26 Thread PURUJIT CHAUGULE (Jira)
PURUJIT CHAUGULE created CARBONDATA-4277:


 Summary: Compatibility Issue of GeoSpatial table of CarbonData 
2.1.0 in CarbonData 2.2.0 (Spark 2.4.5 and Spark 3.1.1))
 Key: CARBONDATA-4277
 URL: https://issues.apache.org/jira/browse/CARBONDATA-4277
 Project: CarbonData
  Issue Type: Bug
Affects Versions: 2.2.0
 Environment: Spark 2.4.5
Spark 3.1.1
Reporter: PURUJIT CHAUGULE


 

 

*Issue 1: Load into a geo table created in 2.1.0 fails in 2.2.0 (Spark 2.4.5 
and 3.1.1)*

*STEPS:-*
 # create table in CarbonData 2.1.0 : create table source_index_2_1_0(TIMEVALUE 
BIGINT,LONGITUDE long,LATITUDE long) STORED AS carbondata TBLPROPERTIES 
('SPATIAL_INDEX.mygeohash.type'='geohash','SPATIAL_INDEX.mygeohash.sourcecolumns'='longitude,
 
latitude','SPATIAL_INDEX.mygeohash.originLatitude'='39.930753','SPATIAL_INDEX.mygeohash.gridSize'='50','SPATIAL_INDEX.mygeohash.minLongitude'='116.176090','SPATIAL_INDEX.mygeohash.maxLongitude'='116.736367','SPATIAL_INDEX.mygeohash.minLatitude'='39.930753','SPATIAL_INDEX.mygeohash.maxLatitude'='40.179415','SPATIAL_INDEX'='mygeohash','SPATIAL_INDEX.mygeohash.conversionRatio'='100');
 # LOAD DATA INPATH 'hdfs://hacluster/chetan/f_lcov_50basic_data.csv' INTO 
TABLE source_index_2_1_0 OPTIONS('HEADER'='true','DELIMITER'='|', 
'QUOTECHAR'='|');
 # Take the store of the table and place it in the HDFS of the CarbonData 
2.2.0 (Spark 2.4.5 and Spark 3.1.1) clusters
 # refresh table source_index_2_1_0;
 # 0: jdbc:hive2://10.21.19.14:23040/default> LOAD DATA INPATH 
'hdfs://hacluster/chetan/f_lcov_50basic_data.csv' INTO TABLE source_index_2_1_0 
OPTIONS('HEADER'='true','DELIMITER'='|', 'QUOTECHAR'='|');

Error: org.apache.hive.service.cli.HiveSQLException: Error running query: 
java.lang.Exception: DataLoad failure: Data Loading failed for table 
source_index_2_1_0
 at 
org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:361)
 at 
org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.$anonfun$run$2(SparkExecuteStatementOperation.scala:263)
 at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
 at 
org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties(SparkOperation.scala:78)
 at 
org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties$(SparkOperation.scala:62)
 at 
org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.withLocalProperties(SparkExecuteStatementOperation.scala:43)
 at 
org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:263)
 at 
org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:258)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1746)
 at 
org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2.run(SparkExecuteStatementOperation.scala:272)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.Exception: DataLoad failure: Data Loading failed for table 
source_index_2_1_0
 at 
org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.loadCarbonData(CarbonDataRDDFactory.scala:460)
 at 
org.apache.spark.sql.execution.command.management.CarbonLoadDataCommand.loadData(CarbonLoadDataCommand.scala:226)
 at 
org.apache.spark.sql.execution.command.management.CarbonLoadDataCommand.processData(CarbonLoadDataCommand.scala:163)
 at 
org.apache.spark.sql.execution.command.AtomicRunnableCommand.$anonfun$run$3(package.scala:162)
 at 
org.apache.spark.sql.execution.command.Auditable.runWithAudit(package.scala:118)
 at 
org.apache.spark.sql.execution.command.Auditable.runWithAudit$(package.scala:114)
 at 
org.apache.spark.sql.execution.command.AtomicRunnableCommand.runWithAudit(package.scala:155)
 at 
org.apache.spark.sql.execution.command.AtomicRunnableCommand.run(package.scala:168)
 at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
 at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
 at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
 at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:228)
 at 

[jira] [Resolved] (CARBONDATA-4234) Alter change datatype at nested levels

2021-08-26 Thread Indhumathi (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi resolved CARBONDATA-4234.

Fix Version/s: 2.3.0
   Resolution: Fixed

> Alter change datatype at nested levels
> --
>
> Key: CARBONDATA-4234
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4234
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: Akshay
>Priority: Major
> Fix For: 2.3.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-4199) Support renaming of map columns including nested levels

2021-08-26 Thread Indhumathi (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi resolved CARBONDATA-4199.

Fix Version/s: 2.3.0
   Resolution: Fixed

> Support renaming of map columns including nested levels
> ---
>
> Key: CARBONDATA-4199
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4199
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: Akshay
>Priority: Minor
> Fix For: 2.3.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-4198) Support adding of single-level and multi-level map columns

2021-08-26 Thread Indhumathi (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi resolved CARBONDATA-4198.

Fix Version/s: 2.3.0
   Resolution: Fixed

> Support adding of single-level and multi-level map columns
> --
>
> Key: CARBONDATA-4198
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4198
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: Akshay
>Priority: Minor
> Fix For: 2.3.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-4164) Support adding of multi-level complex columns(array/struct)

2021-08-26 Thread Indhumathi (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi resolved CARBONDATA-4164.

Fix Version/s: 2.3.0
   Resolution: Fixed

> Support adding of multi-level complex columns(array/struct)
> ---
>
> Key: CARBONDATA-4164
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4164
> Project: CarbonData
>  Issue Type: Sub-task
>  Components: spark-integration
>Reporter: Akshay
>Priority: Major
> Fix For: 2.3.0
>
>  Time Spent: 8h
>  Remaining Estimate: 0h
>
> Add multi-level (up to 3 nested levels) complex columns (only array and 
> struct) to a carbon table. For example - 
> Command - 
> ALTER TABLE <table_name> ADD COLUMNS(arr array<array<int>>)
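
Hedged examples of the kind of commands these resolved sub-tasks enable 
(sketches with placeholder table/column names; the exact accepted syntax may 
differ from what the tickets implemented):

{code:scala}
// Placeholder names; illustrative only.
spark.sql("ALTER TABLE t ADD COLUMNS(arr array<array<int>>)")      // nested array/struct (CARBONDATA-4164)
spark.sql("ALTER TABLE t ADD COLUMNS(m map<string, array<int>>)")  // single/multi-level map (CARBONDATA-4198)
{code}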



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (CARBONDATA-4276) writestream fail when csv is copied to readstream hdfs path in Spark 2.4.5

2021-08-26 Thread Chetan Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Bhat updated CARBONDATA-4276:

Description: 
*With Carbon 2.2.0 Spark 2.4.5 cluster* 

*Steps:*

*+In hdfs, execute the following commands:+*

 cd /opt/HA/C10/install/hadoop/datanode/bin/
 ./hdfs dfs -rm -r /tmp/stream_test/checkpoint_all_data
 ./hdfs dfs -mkdir -p 
/tmp/stream_test/{checkpoint_all_data,bad_records_all_data}
 ./hdfs dfs -mkdir -p /Priyesh/streaming/csv/
 ./hdfs dfs -cp /chetan/100_olap_C20.csv /Priyesh/streaming/csv/

./hdfs dfs -cp /Priyesh/streaming/csv/100_olap_C20.csv 
/Priyesh/streaming/csv/100_olap_C21.csv

 

*+From Spark-beeline/Spark-sql/Spark-shell, execute:+*

DROP TABLE IF EXISTS all_datatypes_2048;
 create table all_datatypes_2048 (imei string,deviceInformationId int,MAC 
string,deviceColor string,device_backColor string,modelId string,marketName 
string,AMSize string,ROMSize string,CUPAudit string,CPIClocked string,series 
string,productionDate timestamp,bomCode string,internalModels string, 
deliveryTime string, channelsId string, channelsName string , deliveryAreaId 
string, deliveryCountry string, deliveryProvince string, deliveryCity 
string,deliveryDistrict string, deliveryStreet string, oxSingleNumber string, 
ActiveCheckTime string, ActiveAreaId string, ActiveCountry string, 
ActiveProvince string, Activecity string, ActiveDistrict string, ActiveStreet 
string, ActiveOperatorId string, Active_releaseId string, Active_EMUIVersion 
string, Active_operaSysVersion string, Active_BacVerNumber string, 
Active_BacFlashVer string, Active_webUIVersion string, Active_webUITypeCarrVer 
string,Active_webTypeDataVerNumber string, Active_operatorsVersion string, 
Active_phonePADPartitionedVersions string, Latest_YEAR int, Latest_MONTH int, 
Latest_DAY Decimal(30,10), Latest_HOUR string, Latest_areaId string, 
Latest_country string, Latest_province string, Latest_city string, 
Latest_district string, Latest_street string, Latest_releaseId string, 
Latest_EMUIVersion string, Latest_operaSysVersion string, Latest_BacVerNumber 
string, Latest_BacFlashVer string, Latest_webUIVersion string, 
Latest_webUITypeCarrVer string, Latest_webTypeDataVerNumber string, 
Latest_operatorsVersion string, Latest_phonePADPartitionedVersions string, 
Latest_operatorId string, gamePointDescription string,gamePointId 
double,contractNumber BigInt) stored as carbondata 
TBLPROPERTIES('table_blocksize'='2048','streaming'='true', 
'sort_columns'='imei');

 

*+From Spark-shell, execute:+*

import org.apache.spark.sql.streaming._
 import org.apache.spark.sql.streaming.Trigger.ProcessingTime

val df_j=spark.readStream.text("hdfs://hacluster/Priyesh/streaming/csv/*.csv")

df_j.writeStream.format("carbondata").option("dbName","ranjan").option("carbon.stream.parser","org.apache.carbondata.streaming.parser.CSVStreamParserImp").option("checkpointLocation",
 
"hdfs://hacluster/tmp/stream_test/checkpoint_all_data").option("bad_records_action","hdfs://hacluster/tmp/stream_test/bad_records_all_data").option("tableName","all_datatypes_2048").trigger(ProcessingTime(6000)).option("carbon.streaming.auto.handoff.enabled","true").option("carbon.streaming.segment.max.size",102400).start

show segments for table all_datatypes_2048;
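
For readability, the same readStream/writeStream sequence reformatted (options 
and values identical to the one-liners above):

{code:scala}
import org.apache.spark.sql.streaming.Trigger.ProcessingTime

val df_j = spark.readStream.text("hdfs://hacluster/Priyesh/streaming/csv/*.csv")

df_j.writeStream
  .format("carbondata")
  .option("dbName", "ranjan")
  .option("carbon.stream.parser", "org.apache.carbondata.streaming.parser.CSVStreamParserImp")
  .option("checkpointLocation", "hdfs://hacluster/tmp/stream_test/checkpoint_all_data")
  .option("bad_records_action", "hdfs://hacluster/tmp/stream_test/bad_records_all_data")
  .option("tableName", "all_datatypes_2048")
  .trigger(ProcessingTime(6000))
  .option("carbon.streaming.auto.handoff.enabled", "true")
  .option("carbon.streaming.segment.max.size", 102400)
  .start()
{code}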

 

*Issue 1:*

*+When the csv file is copied to the hdfs folder for the first time after 
streaming has started, writestream fails with the following error:+*

scala> 
df_j.writeStream.format("carbondata").option("dbName","ranjan").option("carbon.stream.parser","org.apache.carbondata.streaming.parser.CSVStreamParserImp").option("checkpointLocation",
 
"hdfs://hacluster/tmp/stream_test/checkpoint_all_data").option("bad_records_action","hdfs://hacluster/tmp/stream_test/bad_records_all_data").option("tableName","all_datatypes_2048").trigger(ProcessingTime(6000)).option("carbon.streaming.auto.handoff.enabled","true").option("carbon.streaming.segment.max.size",102400).start
 21/08/26 12:53:11 WARN CarbonProperties: The enable mv value "null" is 
invalid. Using the default value "true"
 21/08/26 12:53:11 WARN CarbonProperties: The value "LOCALLOCK" configured for 
key carbon.lock.type is invalid for current file system. Use the default value 
HDFSLOCK instead.
 21/08/26 12:53:12 WARN HiveConf: HiveConf of name 
hive.metastore.rdb.password.decode.enable does not exist
 21/08/26 12:53:12 WARN HiveConf: HiveConf of name 
hive.metastore.db.ssl.enabled does not exist
 21/08/26 12:53:13 WARN HiveConf: HiveConf of name 
hive.metastore.rdb.password.decode.enable does not exist
 21/08/26 12:53:13 WARN HiveConf: HiveConf of name 
hive.metastore.db.ssl.enabled does not exist
 21/08/26 12:53:14 WARN ObjectStore: Failed to get database global_temp, 
returning NoSuchObjectException
 res0: org.apache.spark.sql.streaming.StreamingQuery = 
org.apache.spark.sql.execution.streaming.StreamingQueryWrapper@ad038f8

scala> 21/08/26 13:00:49 WARN DFSClient: DataStreamer Exception
 

[jira] [Updated] (CARBONDATA-4276) writestream fail when csv is copied to readstream hdfs path in Spark 2.4.5

2021-08-26 Thread Chetan Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Bhat updated CARBONDATA-4276:

Summary: writestream fail when csv is copied to readstream hdfs path in 
Spark 2.4.5  (was: writestream fail when csv is copied to readstream hdfs path)

> writestream fail when csv is copied to readstream hdfs path in Spark 2.4.5
> --
>
> Key: CARBONDATA-4276
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4276
> Project: CarbonData
>  Issue Type: Bug
>  Components: data-load
>Affects Versions: 2.2.0
> Environment: Spark 2.4.5
>Reporter: PRIYESH RANJAN
>Priority: Minor
>
> *Steps:*
> *+In hdfs, execute the following commands:+*
>  cd /opt/HA/C10/install/hadoop/datanode/bin/
> ./hdfs dfs -rm -r /tmp/stream_test/checkpoint_all_data
> ./hdfs dfs -mkdir -p 
> /tmp/stream_test/{checkpoint_all_data,bad_records_all_data}
> ./hdfs dfs -mkdir -p /Priyesh/streaming/csv/
> ./hdfs dfs -cp /chetan/100_olap_C20.csv /Priyesh/streaming/csv/
> ./hdfs dfs -cp /Priyesh/streaming/csv/100_olap_C20.csv 
> /Priyesh/streaming/csv/100_olap_C21.csv
>  
> *+From Spark-beeline/Spark-sql/Spark-shell, execute:+*
> DROP TABLE IF EXISTS all_datatypes_2048;
> create table all_datatypes_2048 (imei string,deviceInformationId int,MAC 
> string,deviceColor string,device_backColor string,modelId string,marketName 
> string,AMSize string,ROMSize string,CUPAudit string,CPIClocked string,series 
> string,productionDate timestamp,bomCode string,internalModels string, 
> deliveryTime string, channelsId string, channelsName string , deliveryAreaId 
> string, deliveryCountry string, deliveryProvince string, deliveryCity 
> string,deliveryDistrict string, deliveryStreet string, oxSingleNumber string, 
> ActiveCheckTime string, ActiveAreaId string, ActiveCountry string, 
> ActiveProvince string, Activecity string, ActiveDistrict string, ActiveStreet 
> string, ActiveOperatorId string, Active_releaseId string, Active_EMUIVersion 
> string, Active_operaSysVersion string, Active_BacVerNumber string, 
> Active_BacFlashVer string, Active_webUIVersion string, 
> Active_webUITypeCarrVer string,Active_webTypeDataVerNumber string, 
> Active_operatorsVersion string, Active_phonePADPartitionedVersions string, 
> Latest_YEAR int, Latest_MONTH int, Latest_DAY Decimal(30,10), Latest_HOUR 
> string, Latest_areaId string, Latest_country string, Latest_province string, 
> Latest_city string, Latest_district string, Latest_street string, 
> Latest_releaseId string, Latest_EMUIVersion string, Latest_operaSysVersion 
> string, Latest_BacVerNumber string, Latest_BacFlashVer string, 
> Latest_webUIVersion string, Latest_webUITypeCarrVer string, 
> Latest_webTypeDataVerNumber string, Latest_operatorsVersion string, 
> Latest_phonePADPartitionedVersions string, Latest_operatorId string, 
> gamePointDescription string,gamePointId double,contractNumber BigInt) stored 
> as carbondata TBLPROPERTIES('table_blocksize'='2048','streaming'='true', 
> 'sort_columns'='imei');
>  
> *+From Spark-shell, execute:+*
> import org.apache.spark.sql.streaming._
> import org.apache.spark.sql.streaming.Trigger.ProcessingTime
> val df_j=spark.readStream.text("hdfs://hacluster/Priyesh/streaming/csv/*.csv")
> df_j.writeStream.format("carbondata").option("dbName","ranjan").option("carbon.stream.parser","org.apache.carbondata.streaming.parser.CSVStreamParserImp").option("checkpointLocation",
>  
> "hdfs://hacluster/tmp/stream_test/checkpoint_all_data").option("bad_records_action","hdfs://hacluster/tmp/stream_test/bad_records_all_data").option("tableName","all_datatypes_2048").trigger(ProcessingTime(6000)).option("carbon.streaming.auto.handoff.enabled","true").option("carbon.streaming.segment.max.size",102400).start
> show segments for table all_datatypes_2048;
>  
> *Issue 1:*
> *+When the csv file is copied to the hdfs folder for the first time after 
> streaming has started, writestream fails with the following error:+*
> scala> 
> df_j.writeStream.format("carbondata").option("dbName","ranjan").option("carbon.stream.parser","org.apache.carbondata.streaming.parser.CSVStreamParserImp").option("checkpointLocation",
>  
> "hdfs://hacluster/tmp/stream_test/checkpoint_all_data").option("bad_records_action","hdfs://hacluster/tmp/stream_test/bad_records_all_data").option("tableName","all_datatypes_2048").trigger(ProcessingTime(6000)).option("carbon.streaming.auto.handoff.enabled","true").option("carbon.streaming.segment.max.size",102400).start
> 21/08/26 12:53:11 WARN CarbonProperties: The enable mv value "null" is 
> invalid. Using the default value "true"
> 21/08/26 12:53:11 WARN CarbonProperties: The value "LOCALLOCK" configured for 
> key carbon.lock.type is invalid for current file system. Use the default 
> value HDFSLOCK instead.
> 21/08/26 12:53:12 

[jira] [Created] (CARBONDATA-4276) writestream fail when csv is copied to readstream hdfs path

2021-08-26 Thread PRIYESH RANJAN (Jira)
PRIYESH RANJAN created CARBONDATA-4276:
--

 Summary: writestream fail when csv is copied to readstream hdfs 
path
 Key: CARBONDATA-4276
 URL: https://issues.apache.org/jira/browse/CARBONDATA-4276
 Project: CarbonData
  Issue Type: Bug
  Components: data-load
Affects Versions: 2.2.0
 Environment: Spark 2.4.5
Reporter: PRIYESH RANJAN


*Steps:*

*+In hdfs, execute the following commands:+*

 cd /opt/HA/C10/install/hadoop/datanode/bin/
./hdfs dfs -rm -r /tmp/stream_test/checkpoint_all_data
./hdfs dfs -mkdir -p 
/tmp/stream_test/{checkpoint_all_data,bad_records_all_data}
./hdfs dfs -mkdir -p /Priyesh/streaming/csv/
./hdfs dfs -cp /chetan/100_olap_C20.csv /Priyesh/streaming/csv/

./hdfs dfs -cp /Priyesh/streaming/csv/100_olap_C20.csv 
/Priyesh/streaming/csv/100_olap_C21.csv

 

*+From Spark-beeline/Spark-sql/Spark-shell, execute:+*

DROP TABLE IF EXISTS all_datatypes_2048;
create table all_datatypes_2048 (imei string,deviceInformationId int,MAC 
string,deviceColor string,device_backColor string,modelId string,marketName 
string,AMSize string,ROMSize string,CUPAudit string,CPIClocked string,series 
string,productionDate timestamp,bomCode string,internalModels string, 
deliveryTime string, channelsId string, channelsName string , deliveryAreaId 
string, deliveryCountry string, deliveryProvince string, deliveryCity 
string,deliveryDistrict string, deliveryStreet string, oxSingleNumber string, 
ActiveCheckTime string, ActiveAreaId string, ActiveCountry string, 
ActiveProvince string, Activecity string, ActiveDistrict string, ActiveStreet 
string, ActiveOperatorId string, Active_releaseId string, Active_EMUIVersion 
string, Active_operaSysVersion string, Active_BacVerNumber string, 
Active_BacFlashVer string, Active_webUIVersion string, Active_webUITypeCarrVer 
string,Active_webTypeDataVerNumber string, Active_operatorsVersion string, 
Active_phonePADPartitionedVersions string, Latest_YEAR int, Latest_MONTH int, 
Latest_DAY Decimal(30,10), Latest_HOUR string, Latest_areaId string, 
Latest_country string, Latest_province string, Latest_city string, 
Latest_district string, Latest_street string, Latest_releaseId string, 
Latest_EMUIVersion string, Latest_operaSysVersion string, Latest_BacVerNumber 
string, Latest_BacFlashVer string, Latest_webUIVersion string, 
Latest_webUITypeCarrVer string, Latest_webTypeDataVerNumber string, 
Latest_operatorsVersion string, Latest_phonePADPartitionedVersions string, 
Latest_operatorId string, gamePointDescription string,gamePointId 
double,contractNumber BigInt) stored as carbondata 
TBLPROPERTIES('table_blocksize'='2048','streaming'='true', 
'sort_columns'='imei');

 

*+From Spark-shell, execute:+*

import org.apache.spark.sql.streaming._
import org.apache.spark.sql.streaming.Trigger.ProcessingTime

val df_j=spark.readStream.text("hdfs://hacluster/Priyesh/streaming/csv/*.csv")

df_j.writeStream.format("carbondata").option("dbName","ranjan").option("carbon.stream.parser","org.apache.carbondata.streaming.parser.CSVStreamParserImp").option("checkpointLocation",
 
"hdfs://hacluster/tmp/stream_test/checkpoint_all_data").option("bad_records_action","hdfs://hacluster/tmp/stream_test/bad_records_all_data").option("tableName","all_datatypes_2048").trigger(ProcessingTime(6000)).option("carbon.streaming.auto.handoff.enabled","true").option("carbon.streaming.segment.max.size",102400).start

show segments for table all_datatypes_2048;

 

*Issue 1:*

*+When the csv file is copied to the hdfs folder for the first time after 
streaming has started, writestream fails with the following error:+*

scala> 
df_j.writeStream.format("carbondata").option("dbName","ranjan").option("carbon.stream.parser","org.apache.carbondata.streaming.parser.CSVStreamParserImp").option("checkpointLocation",
 
"hdfs://hacluster/tmp/stream_test/checkpoint_all_data").option("bad_records_action","hdfs://hacluster/tmp/stream_test/bad_records_all_data").option("tableName","all_datatypes_2048").trigger(ProcessingTime(6000)).option("carbon.streaming.auto.handoff.enabled","true").option("carbon.streaming.segment.max.size",102400).start
21/08/26 12:53:11 WARN CarbonProperties: The enable mv value "null" is invalid. 
Using the default value "true"
21/08/26 12:53:11 WARN CarbonProperties: The value "LOCALLOCK" configured for 
key carbon.lock.type is invalid for current file system. Use the default value 
HDFSLOCK instead.
21/08/26 12:53:12 WARN HiveConf: HiveConf of name 
hive.metastore.rdb.password.decode.enable does not exist
21/08/26 12:53:12 WARN HiveConf: HiveConf of name hive.metastore.db.ssl.enabled 
does not exist
21/08/26 12:53:13 WARN HiveConf: HiveConf of name 
hive.metastore.rdb.password.decode.enable does not exist
21/08/26 12:53:13 WARN HiveConf: HiveConf of name hive.metastore.db.ssl.enabled 
does not exist
21/08/26 12:53:14 WARN ObjectStore: Failed to get database global_temp, 
returning NoSuchObjectException