[jira] [Commented] (CARBONDATA-1445) if 'carbon.update.persist.enable'='false', it will fail to update data

2017-09-06 Thread Ashwini K (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16155211#comment-16155211
 ] 

Ashwini K commented on CARBONDATA-1445:
---

This issue was fixed as part of JIRA#1293 and PR 
https://github.com/apache/carbondata/pull/1161/

> if 'carbon.update.persist.enable'='false', it will fail to update data 
> ---
>
> Key: CARBONDATA-1445
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1445
> Project: CarbonData
>  Issue Type: Bug
>  Components: data-load, spark-integration, sql
>Affects Versions: 1.2.0
> Environment: CarbonData master branch, Spark 2.1.1
>Reporter: Zhichao  Zhang
>Assignee: Ashwini K
>Priority: Minor
>
> When updating data, if 'carbon.update.persist.enable' is set to 'false', the 
> update will fail.
> I debugged the code and found that in the method LoadTable.processData, 
> 'dataFrameWithTupleId' calls the UDF 'getTupleId()', which is defined in 
> CarbonEnv.init() as 'sparkSession.udf.register("getTupleId", () => "")'. It 
> therefore returns a blank string to 'CarbonUpdateUtil.getRequiredFieldFromTID', 
> and an ArrayIndexOutOfBoundsException occurs.
> *the plans (logical and physical) for dataFrameWithTupleId :*
> == Parsed Logical Plan ==
> 'Project [unresolvedalias('stringField3, None), unresolvedalias('intField, 
> None), unresolvedalias('longField, None), unresolvedalias('int2Field, None), 
> unresolvedalias('stringfield1-updatedColumn, None), 
> unresolvedalias('stringfield2-updatedColumn, None), UDF('tupleId) AS 
> segId#286]
> +- Project [stringField3#113, intField#114, longField#115L, int2Field#116, 
> UDF:getTupleId() AS tupleId#262, concat(stringField1#111, _test) AS 
> stringfield1-updatedColumn#263, concat(stringField2#112, _test) AS 
> stringfield2-updatedColumn#264]
>+- Filter (isnotnull(stringField3#113) && (stringField3#113 = 1))
>   +- 
> Relation[stringField1#111,stringField2#112,stringField3#113,intField#114,longField#115L,int2Field#116]
>  CarbonDatasourceHadoopRelation [ Database name :default, Table name 
> :study_carbondata, Schema 
> :Some(StructType(StructField(stringField1,StringType,true), 
> StructField(stringField2,StringType,true), 
> StructField(stringField3,StringType,true), 
> StructField(intField,IntegerType,true), StructField(longField,LongType,true), 
> StructField(int2Field,IntegerType,true))) ]
> == Analyzed Logical Plan ==
> stringField3: string, intField: int, longField: bigint, int2Field: int, 
> stringfield1-updatedColumn: string, stringfield2-updatedColumn: string, 
> segId: string
> Project [stringField3#113, intField#114, longField#115L, int2Field#116, 
> stringfield1-updatedColumn#263, stringfield2-updatedColumn#264, 
> UDF(tupleId#262) AS segId#286]
> +- Project [stringField3#113, intField#114, longField#115L, int2Field#116, 
> UDF:getTupleId() AS tupleId#262, concat(stringField1#111, _test) AS 
> stringfield1-updatedColumn#263, concat(stringField2#112, _test) AS 
> stringfield2-updatedColumn#264]
>+- Filter (isnotnull(stringField3#113) && (stringField3#113 = 1))
>   +- 
> Relation[stringField1#111,stringField2#112,stringField3#113,intField#114,longField#115L,int2Field#116]
>  CarbonDatasourceHadoopRelation [ Database name :default, Table name 
> :study_carbondata, Schema 
> :Some(StructType(StructField(stringField1,StringType,true), 
> StructField(stringField2,StringType,true), 
> StructField(stringField3,StringType,true), 
> StructField(intField,IntegerType,true), StructField(longField,LongType,true), 
> StructField(int2Field,IntegerType,true))) ]
> == Optimized Logical Plan ==
> CarbonDictionaryCatalystDecoder [CarbonDecoderRelation(Map(int2Field#116 -> 
> int2Field#116, longField#115L -> longField#115L, stringField2#112 -> 
> stringField2#112, stringField1#111 -> stringField1#111, stringField3#113 -> 
> stringField3#113, intField#114 -> 
> intField#114),CarbonDatasourceHadoopRelation [ Database name :default, Table 
> name :study_carbondata, Schema 
> :Some(StructType(StructField(stringField1,StringType,true), 
> StructField(stringField2,StringType,true), 
> StructField(stringField3,StringType,true), 
> StructField(intField,IntegerType,true), StructField(longField,LongType,true), 
> StructField(int2Field,IntegerType,true))) ])], 
> ExcludeProfile(ArrayBuffer(stringField2#112, stringField1#111)), 
> CarbonAliasDecoderRelation(), true
> +- Project [stringField3#113, intField#114, longField#115, int2Field#116, 
> concat(stringField1#111, _test) AS stringfield1-updatedColumn#263, 
> concat(stringField2#112, _test) AS stringfield2-updatedColumn#264, 
> UDF(UDF:getTupleId()) AS segId#286]
>+- Filter (isnotnull(stringField3#113) && (stringField3#113 = 1))
>   +- 
> Relation[stringField1#111,stringField2#112,stringField3#113,intField#114,longField#1
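The failure mode described in the issue can be sketched in isolation: a tuple-ID string is split on a delimiter and indexed by field position, so a blank string from the stub `getTupleId()` UDF makes the index lookup blow up. This is a hypothetical illustration, not CarbonData's actual `CarbonUpdateUtil.getRequiredFieldFromTID` implementation; the tuple-ID layout and field names below are assumptions for demonstration only.

```java
// Hypothetical sketch of why a blank tuple ID causes
// ArrayIndexOutOfBoundsException. The "/"-separated tuple-ID layout
// here is illustrative, not CarbonData's exact format.
public class TupleIdSketch {

    // Mimics getRequiredFieldFromTID-style positional field extraction.
    static String getField(String tupleId, int index) {
        // In Java, "".split("/") returns a one-element array {""},
        // so any index >= 1 throws ArrayIndexOutOfBoundsException --
        // the symptom reported in this issue.
        return tupleId.split("/")[index];
    }

    public static void main(String[] args) {
        // A well-formed tuple ID resolves normally.
        System.out.println(getField("0/part-0/0/0/0", 0)); // prints "0"

        // A blank string, as returned by the stub
        // 'sparkSession.udf.register("getTupleId", () => "")', fails.
        try {
            getField("", 1);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("AIOOBE on blank tuple id");
        }
    }
}
```

This matches the reported stack behavior: the exception originates from indexing the split result, not from the UDF call itself, which is why the plans above look well-formed even though execution fails.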

[jira] [Commented] (CARBONDATA-1445) if 'carbon.update.persist.enable'='false', it will fail to update data

2017-09-04 Thread Zhichao Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16153118#comment-16153118
 ] 

Zhichao  Zhang commented on CARBONDATA-1445:


Anyone can take a look at this issue?

> if 'carbon.update.persist.enable'='false', it will fail to update data 
> ---
>
> [quoted issue description and plans identical to the message above]