[jira] [Commented] (CARBONDATA-4273) Cannot create table with partitions in Spark in EMR
[ https://issues.apache.org/jira/browse/CARBONDATA-4273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17405241#comment-17405241 ]

Brijoo Bopanna commented on CARBONDATA-4273:
--------------------------------------------

Thanks for sharing this issue, we will check and reply.

> Cannot create table with partitions in Spark in EMR
> ---------------------------------------------------
>
> Key: CARBONDATA-4273
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4273
> Project: CarbonData
> Issue Type: Bug
> Components: spark-integration
> Affects Versions: 2.2.0
> Environment: Release label: emr-5.24.1
> Hadoop distribution: Amazon 2.8.5
> Applications: Hive 2.3.4, Pig 0.17.0, Hue 4.4.0, Flink 1.8.0, Spark 2.4.2, Presto 0.219, JupyterHub 0.9.6
> Jar compiled with:
> apache-carbondata: 2.2.0
> spark: 2.4.5
> hadoop: 2.8.3
> Reporter: Bigicecream
> Priority: Critical
> Labels: EMR, spark
>
> When trying to create a table like this:
> {code:sql}
> CREATE TABLE IF NOT EXISTS will_not_work(
>   timestamp string,
>   name string
> )
> PARTITIONED BY (dt string, hr string)
> STORED AS carbondata
> LOCATION 's3a://my-bucket/CarbonDataTests/will_not_work'
> {code}
> I get the following error:
> {noformat}
> org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException: Partition is not supported for external table
>   at org.apache.spark.sql.parser.CarbonSparkSqlParserUtil$.buildTableInfoFromCatalogTable(CarbonSparkSqlParserUtil.scala:219)
>   at org.apache.spark.sql.CarbonSource$.createTableInfo(CarbonSource.scala:235)
>   at org.apache.spark.sql.CarbonSource$.createTableMeta(CarbonSource.scala:394)
>   at org.apache.spark.sql.execution.command.table.CarbonCreateDataSourceTableCommand.processMetadata(CarbonCreateDataSourceTableCommand.scala:69)
>   at org.apache.spark.sql.execution.command.MetadataCommand$$anonfun$run$1.apply(package.scala:137)
>   at org.apache.spark.sql.execution.command.MetadataCommand$$anonfun$run$1.apply(package.scala:137)
>   at org.apache.spark.sql.execution.command.Auditable$class.runWithAudit(package.scala:118)
>   at org.apache.spark.sql.execution.command.MetadataCommand.runWithAudit(package.scala:134)
>   at org.apache.spark.sql.execution.command.MetadataCommand.run(package.scala:137)
>   at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
>   at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
>   at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
>   at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:194)
>   at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:194)
>   at org.apache.spark.sql.Dataset$$anonfun$53.apply(Dataset.scala:3364)
>   at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
>   at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
>   at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
>   at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3363)
>   at org.apache.spark.sql.Dataset.<init>(Dataset.scala:194)
>   at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:79)
>   at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:643)
>   ... 64 elided
> {noformat}

-- This message was sent by Atlassian Jira (v8.3.4#803005)
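Editorial note, not from the report: the stack trace shows CarbonData treating the statement as an external-table creation because an explicit LOCATION is given, and `buildTableInfoFromCatalogTable` rejects partitions on external tables. A hedged workaround sketch, assuming the table does not have to live at that exact S3 path: omit LOCATION so the table is created as a managed CarbonData table (table name below is hypothetical).

{code:sql}
-- Sketch, not a verified fix: a managed (no LOCATION) CarbonData table
-- does not take the "Partition is not supported for external table" path.
CREATE TABLE IF NOT EXISTS may_work(
  timestamp string,
  name string
)
PARTITIONED BY (dt string, hr string)
STORED AS carbondata;
{code}

Whether the data can then be kept under the warehouse location instead of the fixed s3a path depends on the deployment, so this is only a way to confirm that the LOCATION clause is what triggers the error.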
[jira] [Assigned] (CARBONDATA-2877) CarbonDataWriterException when loading data to carbon table with large number of rows/columns from Spark-Submit
[ https://issues.apache.org/jira/browse/CARBONDATA-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brijoo Bopanna reassigned CARBONDATA-2877:
------------------------------------------

Assignee: Brijoo Bopanna (was: kumar vishal)

> CarbonDataWriterException when loading data to carbon table with large number of rows/columns from Spark-Submit
> ---------------------------------------------------------------------------------------------------------------
>
> Key: CARBONDATA-2877
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2877
> Project: CarbonData
> Issue Type: Bug
> Components: data-load
> Affects Versions: 1.4.1
> Environment: Spark 2.1
> Reporter: Chetan Bhat
> Assignee: Brijoo Bopanna
> Priority: Major
>
> Steps:
> From spark-submit, the user creates a table with a large number of columns (around 100) and tries to load around 3 lakh (300,000) records into the table.
> Spark-submit command: spark-submit --master yarn --num-executors 3 --executor-memory 75g --driver-memory 10g --executor-cores 12 --class
> Actual issue: data loading fails with CarbonDataWriterException.
> Executor YARN UI log:
> {noformat}
> org.apache.spark.util.TaskCompletionListenerException: org.apache.carbondata.core.datastore.exception.CarbonDataWriterException
> Previous exception in task: Error while initializing data handler :
>   org.apache.carbondata.processing.loading.steps.DataWriterProcessorStepImpl.execute(DataWriterProcessorStepImpl.java:141)
>   org.apache.carbondata.processing.loading.DataLoadExecutor.execute(DataLoadExecutor.java:51)
>   org.apache.carbondata.spark.rdd.NewCarbonDataLoadRDD$$anon$1.<init>(NewCarbonDataLoadRDD.scala:221)
>   org.apache.carbondata.spark.rdd.NewCarbonDataLoadRDD.internalCompute(NewCarbonDataLoadRDD.scala:197)
>   org.apache.carbondata.spark.rdd.CarbonRDD.compute(CarbonRDD.scala:78)
>   org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>   org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>   org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
>   org.apache.spark.scheduler.Task.run(Task.scala:99)
>   org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:325)
>   java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   java.lang.Thread.run(Thread.java:748)
>   at org.apache.spark.TaskContextImpl.invokeListeners(TaskContextImpl.scala:138)
>   at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:116)
>   at org.apache.spark.scheduler.Task.run(Task.scala:109)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:325)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {noformat}
> Expected: the data loading should be successful from spark-submit, similar to that in Beeline.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2833) NPE when we do a insert over a insert failure operation
Brijoo Bopanna created CARBONDATA-2833:
-------------------------------------------

Summary: NPE when we do a insert over a insert failure operation
Key: CARBONDATA-2833
URL: https://issues.apache.org/jira/browse/CARBONDATA-2833
Project: CarbonData
Issue Type: Bug
Reporter: Brijoo Bopanna

{noformat}
0: jdbc:hive2://10.18.5.188:23040/default> CREATE TABLE
0: jdbc:hive2://10.18.5.188:23040/default> IF NOT EXISTS test_table(
0: jdbc:hive2://10.18.5.188:23040/default> id string,
0: jdbc:hive2://10.18.5.188:23040/default> name string,
0: jdbc:hive2://10.18.5.188:23040/default> city string,
0: jdbc:hive2://10.18.5.188:23040/default> age Int)
0: jdbc:hive2://10.18.5.188:23040/default> STORED BY 'carbondata';
+---------+
| Result  |
+---------+
+---------+
No rows selected (0.191 seconds)
0: jdbc:hive2://10.18.5.188:23040/default> desc test_table;
+-----------+------------+----------+
| col_name  | data_type  | comment  |
+-----------+------------+----------+
| id        | string     | NULL     |
| name      | string     | NULL     |
| city      | string     | NULL     |
| age       | int        | NULL     |
+-----------+------------+----------+
4 rows selected (0.081 seconds)
0: jdbc:hive2://10.18.5.188:23040/default> insert into ab select 'berb','abc','ggg','1';
Error: java.lang.Exception: Data load failed due to bad record: The value with column name a and column data type INT is not a valid INT type. Please enable bad record logger to know the detail reason. (state=,code=0)
0: jdbc:hive2://10.18.5.188:23040/default> insert into ab select 'berb','abc','ggg','1';
*Error: java.lang.NullPointerException (state=,code=0)*
0: jdbc:hive2://10.18.5.188:23040/default> insert into test_table select 'berb','abc','ggg',1;
+---------+
| Result  |
+---------+
+---------+
No rows selected (1.127 seconds)
0: jdbc:hive2://10.18.5.188:23040/default> show tables;
+-----------+-------------+--------------+
| database  | tableName   | isTemporary  |
+-----------+-------------+--------------+
| praveen   | a           | false        |
| praveen   | ab          | false        |
| praveen   | bbc         | false        |
| praveen   | test_table  | false        |
+-----------+-------------+--------------+
4 rows selected (0.041 seconds)
0: jdbc:hive2://10.18.5.188:23040/default> desc ab;
+-----------+------------+----------+
| col_name  | data_type  | comment  |
+-----------+------------+----------+
| a         | int        | NULL     |
| b         | string     | NULL     |
+-----------+------------+----------+
2 rows selected (0.074 seconds)
{noformat}
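Editorial note, not from the report: the first error message itself says "Please enable bad record logger to know the detail reason". A hedged sketch of how that is typically done as session properties in CarbonData; the property names below come from CarbonData's data-management documentation and may differ across versions, so verify them before use.

{code:sql}
-- Sketch only: enable bad-record logging for the session, then retry the
-- failing insert so the rejected value for column a is recorded instead of
-- only raising the generic exception.
SET carbon.options.bad.records.logger.enable=true;
SET carbon.options.bad.records.action=REDIRECT;
insert into ab select 'berb','abc','ggg','1';
{code}

With logging enabled, the detailed reason for the first failure should be available, which may also help isolate the state that leads to the NullPointerException on the immediate retry.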