Hong Shen created CARBONDATA-3642:
-------------------------------------

             Summary: Improve error msg when string length exceed 32000
                 Key: CARBONDATA-3642
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-3642
             Project: CarbonData
          Issue Type: Improvement
          Components: spark-integration
            Reporter: Hong Shen


When I run a production SQL statement such as {code} insert overwrite TABLE table1 select * from table2 {code}, where table1 is a carbon table, it fails with the following error message:
{code}
Previous exception in task: Dataload failed, String length cannot exceed 32000 characters
        org.apache.carbondata.streaming.parser.FieldConverter$.objectToString(FieldConverter.scala:53)
        org.apache.carbondata.spark.util.CarbonScalaUtil$.getString(CarbonScalaUtil.scala:71)
        org.apache.carbondata.spark.rdd.NewRddIterator$$anonfun$next$1.apply$mcVI$sp(NewCarbonDataLoadRDD.scala:360)
        scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
        org.apache.carbondata.spark.rdd.NewRddIterator.next(NewCarbonDataLoadRDD.scala:359)
        org.apache.carbondata.spark.load.DataLoadProcessorStepOnSpark$$anon$1.next(DataLoadProcessorStepOnSpark.scala:66)
        org.apache.carbondata.spark.load.DataLoadProcessorStepOnSpark$$anon$1.next(DataLoadProcessorStepOnSpark.scala:61)
        org.apache.carbondata.spark.load.DataLoadProcessorStepOnSpark$$anon$4.next(DataLoadProcessorStepOnSpark.scala:179)
        org.apache.carbondata.spark.load.DataLoadProcessorStepOnSpark$$anon$4.next(DataLoadProcessorStepOnSpark.scala:170)
        scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:462)
        scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
        scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
        org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
        org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
        org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614)
        org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:216)
        org.apache.spark.sql.execution.SortExec$$anonfun$1.apply(SortExec.scala:109)
        org.apache.spark.sql.execution.SortExec$$anonfun$1.apply(SortExec.scala:102)
        org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$26.apply(RDD.scala:830)
        org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$26.apply(RDD.scala:830)
        org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
        org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
        org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
        org.apache.spark.scheduler.Task.run(Task.scala:109)
        org.apache.spark.executor.Executor$TaskRunner$$anon$2.run(Executor.scala:379)
        java.security.AccessController.doPrivileged(Native Method)
        javax.security.auth.Subject.doAs(Subject.java:360)
        org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1787)
        org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:376)
        java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
        java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:621)
        java.lang.Thread.run(Thread.java:849)
                at org.apache.spark.TaskContextImpl.invokeListeners(TaskContextImpl.scala:139)
                at org.apache.spark.TaskContextImpl.markTaskFailed(TaskContextImpl.scala:107)
                at org.apache.spark.scheduler.Task.run(Task.scala:114)
                ... 8 more
{code}
Since table1 has 61 columns, it is difficult to find which column exceeds the length limit. Here are the columns in table1:
{code}
`user_id`           string 
`user_type_id`      bigint 
`loged_time`        string 
`log_time`          string 
`stay_second`       string
`product_id`        string
`product_version`   string
`biz_id`            string
`biz_app_id`        string
`biz_app_name`      string
`bu_app_id`         string
`bu_app_name`       string
`spm`               string
`spm_a`             string
`spm_b`             string
`spm_name`          string
`activity_id`       string
`page_id`           string
`scm`               string
`new_scm`           string
`scm_sys_name`      string
`session_id`        string
`user_session_id`   string
`parent_spm`        string
`parent_spm_a`      string
`parent_spm_b`      string
`parent_page_id`    string
`chinfo`            string
`new_chinfo`        string
`channel`           string
`landing_page_spm`  string
`public_id`         string
`utdid`             string
`tcid`              string
`ucid`              string
`device_model`      string
`os_version`        string
`network`           string
`inner_version`     string
`app_channel`       string
`language`          string
`ip`                string
`ip_country_name`   string
`ip_province_name`  string
`ip_city_name`      string
`city_id`           string
`city_name`         string
`province_id`       string
`province_name`     string
`country_id`        string
`country_abbr_name` string
`base_exinfo`       string
`exinfo1`           string
`exinfo2`           string
`exinfo3`           string
`exinfo4`           string
`exinfo5`           string
`env_type`          string
`log_type`          string
`behavior_id`       string
`experiment_ids`    string
{code}

If the error message included the column index or column name, it would be much more user-friendly.
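
To illustrate the kind of message requested, here is a minimal sketch in Java. It is not CarbonData's actual code: the class `StringLengthCheck`, the method `checkedString`, and the exact message wording are hypothetical, and only the 32000-character limit and the message prefix are taken from the error above. The sketch assumes the caller already knows the index and name of the column being converted.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of CarbonData.
class StringLengthCheck {
    // Limit quoted in the reported error message.
    static final int MAX_CHARS = 32000;

    // Converts one field to a string and, on failure, names the offending column.
    static String checkedString(Object value, int columnIndex, String columnName) {
        if (value == null) {
            return null;
        }
        String s = value.toString();
        if (s.length() > MAX_CHARS) {
            throw new IllegalArgumentException(
                "Dataload failed, String length cannot exceed " + MAX_CHARS
                + " characters (column index " + columnIndex
                + ", column name `" + columnName
                + "`, actual length " + s.length() + ")");
        }
        return s;
    }

    public static void main(String[] args) {
        // Three of the 61 columns, with one oversized value in the last one.
        List<String> columns = Arrays.asList("user_id", "user_type_id", "base_exinfo");
        char[] big = new char[MAX_CHARS + 1];
        Arrays.fill(big, 'x');
        Object[] row = {"u1", 42L, new String(big)};
        try {
            for (int i = 0; i < row.length; i++) {
                checkedString(row[i], i, columns.get(i));
            }
        } catch (IllegalArgumentException e) {
            // The message now identifies which column failed.
            System.out.println(e.getMessage());
        }
    }
}
```

With a message of this shape, the failing column can be read directly from the task log instead of being searched for among all 61 columns.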



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
