[jira] [Commented] (CARBONDATA-4273) Cannot create table with partitions in Spark in EMR

2021-08-26 Thread Brijoo Bopanna (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-4273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17405241#comment-17405241
 ] 

Brijoo Bopanna commented on CARBONDATA-4273:


Thanks for reporting this issue; we will check and reply.
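A possible workaround to try in the meantime (a sketch, not yet verified on our side): the stack trace shows the partition check firing for external tables, and specifying LOCATION in Spark makes the table external. Creating the table without an explicit LOCATION, so that it lives under the configured warehouse path, may avoid that code path:

{code:sql}
-- Sketch of a possible workaround: omit LOCATION so the table is managed
-- rather than external (the path then comes from the warehouse configuration).
CREATE TABLE IF NOT EXISTS will_not_work(
  timestamp string,
  name string
)
PARTITIONED BY (dt string, hr string)
STORED AS carbondata;
{code}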

> Cannot create table with partitions in Spark in EMR
> ---
>
> Key: CARBONDATA-4273
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4273
> Project: CarbonData
>  Issue Type: Bug
>  Components: spark-integration
>Affects Versions: 2.2.0
> Environment: Release label:emr-5.24.1
> Hadoop distribution:Amazon 2.8.5
> Applications:
> Hive 2.3.4, Pig 0.17.0, Hue 4.4.0, Flink 1.8.0, Spark 2.4.2, Presto 0.219, 
> JupyterHub 0.9.6
> Jar complied with:
> apache-carbondata:2.2.0
> spark:2.4.5
> hadoop:2.8.3
>Reporter: Bigicecream
>Priority: Critical
>  Labels: EMR, spark
>
>  
> When trying to create a table like this:
> {code:sql}
> CREATE TABLE IF NOT EXISTS will_not_work(
> timestamp string,
> name string
> )
> PARTITIONED BY (dt string, hr string)
> STORED AS carbondata
> LOCATION 's3a://my-bucket/CarbonDataTests/will_not_work'
> {code}
> I get the following error:
> {noformat}
> org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException: 
> Partition is not supported for external table
>   at 
> org.apache.spark.sql.parser.CarbonSparkSqlParserUtil$.buildTableInfoFromCatalogTable(CarbonSparkSqlParserUtil.scala:219)
>   at 
> org.apache.spark.sql.CarbonSource$.createTableInfo(CarbonSource.scala:235)
>   at 
> org.apache.spark.sql.CarbonSource$.createTableMeta(CarbonSource.scala:394)
>   at 
> org.apache.spark.sql.execution.command.table.CarbonCreateDataSourceTableCommand.processMetadata(CarbonCreateDataSourceTableCommand.scala:69)
>   at 
> org.apache.spark.sql.execution.command.MetadataCommand$$anonfun$run$1.apply(package.scala:137)
>   at 
> org.apache.spark.sql.execution.command.MetadataCommand$$anonfun$run$1.apply(package.scala:137)
>   at 
> org.apache.spark.sql.execution.command.Auditable$class.runWithAudit(package.scala:118)
>   at 
> org.apache.spark.sql.execution.command.MetadataCommand.runWithAudit(package.scala:134)
>   at 
> org.apache.spark.sql.execution.command.MetadataCommand.run(package.scala:137)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
>   at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:194)
>   at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:194)
>   at org.apache.spark.sql.Dataset$$anonfun$53.apply(Dataset.scala:3364)
>   at 
> org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
>   at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3363)
>   at org.apache.spark.sql.Dataset.<init>(Dataset.scala:194)
>   at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:79)
>   at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:643)
>   ... 64 elided
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (CARBONDATA-2877) CarbonDataWriterException when loading data to carbon table with large number of rows/columns from Spark-Submit

2018-09-11 Thread Brijoo Bopanna (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brijoo Bopanna reassigned CARBONDATA-2877:
--

Assignee: Brijoo Bopanna  (was: kumar vishal)

> CarbonDataWriterException when loading data to carbon table with large number 
> of rows/columns from Spark-Submit
> ---
>
> Key: CARBONDATA-2877
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2877
> Project: CarbonData
>  Issue Type: Bug
>  Components: data-load
>Affects Versions: 1.4.1
> Environment: Spark 2.1
>Reporter: Chetan Bhat
>Assignee: Brijoo Bopanna
>Priority: Major
>
> Steps:
> From Spark-Submit, the user creates a table with a large number of columns 
> (around 100) and tries to load around 3 lakh (300,000) records into the table.
> Spark-submit command - spark-submit --master yarn --num-executors 3 
> --executor-memory 75g --driver-memory 10g --executor-cores 12 --class
> Actual Issue : Data loading fails with CarbonDataWriterException.
> Executor yarn UI log-
> org.apache.spark.util.TaskCompletionListenerException: 
> org.apache.carbondata.core.datastore.exception.CarbonDataWriterException
> Previous exception in task: Error while initializing data handler : 
>  
> org.apache.carbondata.processing.loading.steps.DataWriterProcessorStepImpl.execute(DataWriterProcessorStepImpl.java:141)
>  
> org.apache.carbondata.processing.loading.DataLoadExecutor.execute(DataLoadExecutor.java:51)
>  
> org.apache.carbondata.spark.rdd.NewCarbonDataLoadRDD$$anon$1.<init>(NewCarbonDataLoadRDD.scala:221)
>  
> org.apache.carbondata.spark.rdd.NewCarbonDataLoadRDD.internalCompute(NewCarbonDataLoadRDD.scala:197)
>  org.apache.carbondata.spark.rdd.CarbonRDD.compute(CarbonRDD.scala:78)
>  org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>  org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>  org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
>  org.apache.spark.scheduler.Task.run(Task.scala:99)
>  org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:325)
>  
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  java.lang.Thread.run(Thread.java:748)
>  at 
> org.apache.spark.TaskContextImpl.invokeListeners(TaskContextImpl.scala:138)
>  at 
> org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:116)
>  at org.apache.spark.scheduler.Task.run(Task.scala:109)
>  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:325)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
>  
> Expected: The data loading should succeed from Spark-submit, just as it does 
> from Beeline.
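The root cause is cut off in this log, so the following is only a hypothetical first check, not a confirmed fix: writer initialization during load can be sensitive to the unsafe working memory configured for CarbonData, especially for wide tables. The property names below are from the CarbonData configuration reference; the values are illustrative only:

{noformat}
# carbon.properties -- illustrative values, not taken from this issue
carbon.unsafe.working.memory.in.mb=2048
carbon.number.of.cores.while.loading=6
{noformat}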



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (CARBONDATA-2833) NPE when we do an insert over a failed insert operation

2018-08-06 Thread Brijoo Bopanna (JIRA)
Brijoo Bopanna created CARBONDATA-2833:
--

 Summary: NPE when we do an insert over a failed insert operation
 Key: CARBONDATA-2833
 URL: https://issues.apache.org/jira/browse/CARBONDATA-2833
 Project: CarbonData
  Issue Type: Bug
Reporter: Brijoo Bopanna


jdbc:hive2://10.18.5.188:23040/default> CREATE TABLE

0: jdbc:hive2://10.18.5.188:23040/default> IF NOT EXISTS test_table(

0: jdbc:hive2://10.18.5.188:23040/default> id string,

0: jdbc:hive2://10.18.5.188:23040/default> name string,

0: jdbc:hive2://10.18.5.188:23040/default> city string,

0: jdbc:hive2://10.18.5.188:23040/default> age Int)

0: jdbc:hive2://10.18.5.188:23040/default> STORED BY 'carbondata';

+---------+
| Result  |
+---------+
+---------+

No rows selected (0.191 seconds)

0: jdbc:hive2://10.18.5.188:23040/default>

0: jdbc:hive2://10.18.5.188:23040/default>

0: jdbc:hive2://10.18.5.188:23040/default>

0: jdbc:hive2://10.18.5.188:23040/default> desc test_table

0: jdbc:hive2://10.18.5.188:23040/default> ;

+-----------+------------+----------+
| col_name  | data_type  | comment  |
+-----------+------------+----------+
| id        | string     | NULL     |
| name      | string     | NULL     |
| city      | string     | NULL     |
| age       | int        | NULL     |
+-----------+------------+----------+

4 rows selected (0.081 seconds)

0: jdbc:hive2://10.18.5.188:23040/default> insert into ab select 
'berb','abc','ggg','1';

Error: java.lang.Exception: Data load failed due to bad record: The value with 
column name a and column data type INT is not a valid INT type.Please enable 
bad record logger to know the detail reason. (state=,code=0)

0: jdbc:hive2://10.18.5.188:23040/default> insert into ab select 
'berb','abc','ggg','1';

*Error: java.lang.NullPointerException (state=,code=0)*

0: jdbc:hive2://10.18.5.188:23040/default> insert into test_table select 
'berb','abc','ggg',1;

+---------+
| Result  |
+---------+
+---------+

No rows selected (1.127 seconds)

0: jdbc:hive2://10.18.5.188:23040/default> show tables

0: jdbc:hive2://10.18.5.188:23040/default> ;

+-----------+-------------+--------------+
| database  | tableName   | isTemporary  |
+-----------+-------------+--------------+
| praveen   | a           | false        |
| praveen   | ab          | false        |
| praveen   | bbc         | false        |
| praveen   | test_table  | false        |
+-----------+-------------+--------------+

4 rows selected (0.041 seconds)

0: jdbc:hive2://10.18.5.188:23040/default>

0: jdbc:hive2://10.18.5.188:23040/default> desc ab

0: jdbc:hive2://10.18.5.188:23040/default> ;

+-----------+------------+----------+
| col_name  | data_type  | comment  |
+-----------+------------+----------+
| a         | int        | NULL     |
| b         | string     | NULL     |
+-----------+------------+----------+

2 rows selected (0.074 seconds)
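The first error above suggests enabling the bad record logger to surface the underlying reason instead of the follow-up NPE. A sketch of how that is typically done for a CarbonData load (option names are from the CarbonData DML documentation; the input path is a placeholder):

{code:sql}
-- Sketch: enable bad-record logging so the offending value is reported;
-- 'REDIRECT' writes bad rows out to a raw CSV log instead of failing silently.
LOAD DATA INPATH 'hdfs://hacluster/path/to/data.csv' INTO TABLE ab
OPTIONS('BAD_RECORDS_LOGGER_ENABLE'='true',
        'BAD_RECORDS_ACTION'='REDIRECT');
{code}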


