[ https://issues.apache.org/jira/browse/CARBONDATA-1985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16342995#comment-16342995 ]

Sangeeta Gulia commented on CARBONDATA-1985:
--------------------------------------------

[~geetikagupta] Hive shows the same behavior, so this is an invalid bug. 
Please close the issue.

To verify, you can create a partitioned Hive table:

CREATE TABLE uniqdata_hive1(ACTIVE_EMUI_VERSION string, DOB timestamp,
DOJ timestamp, BIGINT_COLUMN1 bigint, BIGINT_COLUMN2 bigint,
DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 decimal(36,10),
Double_COLUMN1 double, Double_COLUMN2 double, INTEGER_COLUMN1 int)
PARTITIONED BY (cust_id int, cust_name string)
STORED AS parquet;

Then insert into it using a static partition:

insert into uniqdata_hive1 partition(cust_id='1', cust_name='CUST_NAME_00002')
select * from uniqdata_hive limit 10;

Below are the commands and their output for reference:

0: jdbc:hive2://localhost:10000> CREATE TABLE 
uniqdata_hive1(ACTIVE_EMUI_VERSION string, DOB timestamp,
0: jdbc:hive2://localhost:10000> DOJ timestamp, BIGINT_COLUMN1 
bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 decimal(30,10),
0: jdbc:hive2://localhost:10000> DECIMAL_COLUMN2 decimal(36,10),Double_COLUMN1 
double, Double_COLUMN2 double,
0: jdbc:hive2://localhost:10000> INTEGER_COLUMN1 int) Partitioned by (cust_id 
int, cust_name string) stored as parquet;
+---------+--+
| Result |
+---------+--+
+---------+--+
No rows selected (0.305 seconds)
0: jdbc:hive2://localhost:10000> insert into uniqdata_hive1 
partition(cust_id='1',cust_name='CUST_NAME_00002') select * from uniqdata_hive 
limit 10;
Error: org.apache.spark.sql.AnalysisException: Cannot insert into table 
`default`.`uniqdata_hive1` because the number of columns are different: need 10 
columns, but query has 12 columns.; (state=,code=0)
0: jdbc:hive2://localhost:10000>
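
For completeness, the insert should go through once the select list matches the 10 non-partition columns of the target table. A minimal sketch, assuming the uniqdata_hive schema from the original report (cust_id and cust_name are supplied as static partition values, so they are left out of the projection):

insert into uniqdata_hive1 partition(cust_id='1', cust_name='CUST_NAME_00002')
select ACTIVE_EMUI_VERSION, DOB, DOJ, BIGINT_COLUMN1, BIGINT_COLUMN2,
       DECIMAL_COLUMN1, DECIMAL_COLUMN2, Double_COLUMN1, Double_COLUMN2,
       INTEGER_COLUMN1
from uniqdata_hive limit 10;

The same projection-based workaround should apply to the carbon table in the report, since the analyzer check that fails (10 target columns vs. 12 query columns) is identical in both cases.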

> Insert into failed for multi partitioned table for static partition
> -------------------------------------------------------------------
>
>                 Key: CARBONDATA-1985
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-1985
>             Project: CarbonData
>          Issue Type: Bug
>          Components: data-query
>    Affects Versions: 1.3.0
>         Environment: spark2.1
>            Reporter: Geetika Gupta
>            Priority: Major
>             Fix For: 1.3.0
>
>         Attachments: 2000_UniqData.csv
>
>
> I created a table using:
> CREATE TABLE uniqdata_int_string(ACTIVE_EMUI_VERSION string, DOB timestamp,
> DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 
> decimal(30,10),
> DECIMAL_COLUMN2 decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,
> INTEGER_COLUMN1 int) Partitioned by (cust_id int, cust_name string) STORED BY 
> 'org.apache.carbondata.format' TBLPROPERTIES ("TABLE_BLOCKSIZE"= "256 MB")
> Hive create and load table command:
> CREATE TABLE uniqdata_hive (CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION 
> string, DOB timestamp,
> DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 
> decimal(30,10),
> DECIMAL_COLUMN2 decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,
> INTEGER_COLUMN1 int)ROW FORMAT DELIMITED
> FIELDS TERMINATED BY ',';
> LOAD DATA LOCAL INPATH 'file:///home/geetika/Downloads/2000_UniqData.csv' 
> into table UNIQDATA_HIVE;  
> Insert into table command:
> insert into uniqdata_int_string 
> partition(cust_id='1',cust_name='CUST_NAME_00002') select * from 
> uniqdata_hive limit 10;
> Output:
> Error: java.lang.IndexOutOfBoundsException: Index: 4, Size: 4 (state=,code=0)
> Here are the logs:
> 18/01/04 16:24:45 ERROR CarbonLoadDataCommand: pool-23-thread-6 
> org.apache.spark.sql.AnalysisException: Cannot insert into table 
> `28dec`.`uniqdata_int_string` because the number of columns are different: 
> need 10 columns, but query has 12 columns.;
>       at 
> org.apache.spark.sql.execution.datasources.PreprocessTableInsertion.org$apache$spark$sql$execution$datasources$PreprocessTableInsertion$$preprocess(rules.scala:222)
>       at 
> org.apache.spark.sql.execution.datasources.PreprocessTableInsertion$$anonfun$apply$3.applyOrElse(rules.scala:280)
>       at 
> org.apache.spark.sql.execution.datasources.PreprocessTableInsertion$$anonfun$apply$3.applyOrElse(rules.scala:272)
>       at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:288)
>       at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:288)
>       at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
>       at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:287)
>       at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:277)
>       at 
> org.apache.spark.sql.execution.datasources.PreprocessTableInsertion.apply(rules.scala:272)
>       at 
> org.apache.spark.sql.execution.datasources.PreprocessTableInsertion.apply(rules.scala:207)
>       at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:85)
>       at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:82)
>       at 
> scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:124)
>       at scala.collection.immutable.List.foldLeft(List.scala:84)
>       at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:82)
>       at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:74)
>       at scala.collection.immutable.List.foreach(List.scala:381)
>       at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:74)
>       at 
> org.apache.spark.sql.hive.CarbonAnalyzer.execute(CarbonSessionState.scala:242)
>       at 
> org.apache.spark.sql.hive.CarbonAnalyzer.execute(CarbonSessionState.scala:237)
>       at 
> org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:64)
>       at 
> org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:62)
>       at 
> org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:50)
>       at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:63)
>       at 
> org.apache.spark.sql.execution.command.management.CarbonLoadDataCommand.loadDataWithPartition(CarbonLoadDataCommand.scala:641)
>       at 
> org.apache.spark.sql.execution.command.management.CarbonLoadDataCommand.loadData(CarbonLoadDataCommand.scala:431)
>       at 
> org.apache.spark.sql.execution.command.management.CarbonLoadDataCommand.processData(CarbonLoadDataCommand.scala:223)
>       at 
> org.apache.spark.sql.execution.command.DataCommand.run(package.scala:71)
>       at 
> org.apache.spark.sql.execution.command.management.CarbonInsertIntoCommand.processData(CarbonInsertIntoCommand.scala:48)
>       at 
> org.apache.spark.sql.execution.command.DataCommand.run(package.scala:71)
>       at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
>       at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
>       at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
>       at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
>       at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
>       at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
>       at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>       at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
>       at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
>       at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:87)
>       at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:87)
>       at org.apache.spark.sql.Dataset.<init>(Dataset.scala:185)
>       at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64)
>       at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:592)
>       at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:699)
>       at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:220)
>       at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:163)
>       at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:160)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:422)
>       at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
>       at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1.run(SparkExecuteStatementOperation.scala:173)
>       at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>       at java.lang.Thread.run(Thread.java:748)
> P.S: Load command using static partition was successful
> LOAD DATA INPATH 'hdfs://localhost:54311/Files/2000_UniqData.csv' into table 
> uniqdata_int_string partition(cust_id='1', cust_name='CUST_NAME_00002') 
> OPTIONS ('FILEHEADER'='CUST_ID,CUST_NAME ,ACTIVE_EMUI_VERSION,DOB,DOJ, 
> BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1, 
> Double_COLUMN2,INTEGER_COLUMN1','BAD_RECORDS_ACTION'='FORCE');



