Hi

We have already arranged to fix this issue and will raise a pull request as
soon as possible. Thanks for your feedback.

Regards
Liang


yixu2001 wrote
> dev,
> This issue has caused serious trouble in our production environment. I would
> appreciate it if you could let me know whether there is a plan to fix it.
> 
> 
> yixu2001
>  
> From: BabuLal
> Date: 2018-03-23 00:20
> To: dev
> Subject: Re: Getting [Problem in loading segment blocks] error after doing
> multi update operations
> Hi all,
> I am able to reproduce the same exception in my cluster (the trace is listed
> below).
> ------ 
> scala> carbon.sql("select count(*) from public.c_compact4").show
> 2018-03-22 20:40:33,105 | WARN  | main | main
> spark.sql.sources.options.keys
> expected, but read nothing |
> org.apache.carbondata.common.logging.impl.StandardLogService.logWarnMessage(StandardLogService.java:168)
> org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute,
> tree:
> Exchange SinglePartition
> +- *HashAggregate(keys=[], functions=[partial_count(1)],
> output=[count#1443L])
>    +- *BatchedScan CarbonDatasourceHadoopRelation [ Database name :public,
> Table name :c_compact4, Schema
> :Some(StructType(StructField(id,StringType,true),
> StructField(qqnum,StringType,true), StructField(nick,StringType,true),
> StructField(age,StringType,true), StructField(gender,StringType,true),
> StructField(auth,StringType,true), StructField(qunnum,StringType,true),
> StructField(mvcc,StringType,true))) ] public.c_compact4[]
>   at
> org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56)
>   at
> org.apache.spark.sql.execution.exchange.ShuffleExchange.doExecute(ShuffleExchange.scala:112)
>   at
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
>   at
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
>   at
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
>   at
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>   at
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
>   at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
>   at
> org.apache.spark.sql.execution.InputAdapter.inputRDDs(WholeStageCodegenExec.scala:235)
>   at
> org.apache.spark.sql.execution.aggregate.HashAggregateExec.inputRDDs(HashAggregateExec.scala:141)
>   at
> org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:372)
>   at
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
>   at
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
>   at
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
>   at
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>   at
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
>   at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
>   at
> org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:225)
>   at
> org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:308)
>   at
> org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:113)
>   at
> org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1.apply(Dataset.scala:2386)
>   at
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
>   at org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2788)
>   at
> org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$execute$1(Dataset.scala:2385)
>   at
> org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collect(Dataset.scala:2392)
>   at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2128)
>   at
> org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2127)
>   at org.apache.spark.sql.Dataset.withTypedCallback(Dataset.scala:2818)
>   at org.apache.spark.sql.Dataset.head(Dataset.scala:2127)
>   at org.apache.spark.sql.Dataset.take(Dataset.scala:2342)
>   at org.apache.spark.sql.Dataset.showString(Dataset.scala:248)
>   at org.apache.spark.sql.Dataset.show(Dataset.scala:638)
>   at org.apache.spark.sql.Dataset.show(Dataset.scala:597)
>   at org.apache.spark.sql.Dataset.show(Dataset.scala:606)
>   ... 48 elided
> Caused by: java.io.IOException: Problem in loading segment blocks.
>   at
> org.apache.carbondata.core.indexstore.BlockletDataMapIndexStore.getAll(BlockletDataMapIndexStore.java:153)
>   at
> org.apache.carbondata.core.indexstore.blockletindex.BlockletDataMapFactory.getDataMaps(BlockletDataMapFactory.java:76)
>   at
> org.apache.carbondata.core.datamap.TableDataMap.prune(TableDataMap.java:72)
>   at
> org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getDataBlocksOfSegment(CarbonTableInputFormat.java:739)
>   at
> org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getSplits(CarbonTableInputFormat.java:666)
>   at
> org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getSplits(CarbonTableInputFormat.java:426)
>   at
> org.apache.carbondata.spark.rdd.CarbonScanRDD.getPartitions(CarbonScanRDD.scala:107)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
>   at scala.Option.getOrElse(Option.scala:121)
>   at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
>   at
> org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
>   at scala.Option.getOrElse(Option.scala:121)
>   at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
>   at
> org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
>   at scala.Option.getOrElse(Option.scala:121)
>   at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
>   at org.apache.spark.ShuffleDependency.<init>(Dependency.scala:91)
>   at
> org.apache.spark.sql.execution.exchange.ShuffleExchange$.prepareShuffleDependency(ShuffleExchange.scala:273)
>   at
> org.apache.spark.sql.execution.exchange.ShuffleExchange.prepareShuffleDependency(ShuffleExchange.scala:84)
>   at
> org.apache.spark.sql.execution.exchange.ShuffleExchange$$anonfun$doExecute$1.apply(ShuffleExchange.scala:121)
>   at
> org.apache.spark.sql.execution.exchange.ShuffleExchange$$anonfun$doExecute$1.apply(ShuffleExchange.scala:112)
>   at
> org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:52)
>   ... 81 more
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 0
>   at
> org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile.getLocations(AbstractDFSCarbonFile.java:509)
>   at
> org.apache.carbondata.core.indexstore.BlockletDataMapIndexStore.getAll(BlockletDataMapIndexStore.java:142)
>  
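> The failing frame above presumably reads the HDFS block locations of each
> delete delta file; for a zero-length file HDFS returns an empty location
> array, which would produce exactly this ArrayIndexOutOfBoundsException: 0.
> A minimal sketch of that behaviour with the plain Hadoop FileSystem API
> (not CarbonData's internal code; the path points at one of the 0-byte
> delete delta files listed below):
>  
>   import org.apache.hadoop.conf.Configuration
>   import org.apache.hadoop.fs.{FileSystem, Path}
>  
>   val fs = FileSystem.get(new Configuration())
>   // one of the 0-byte delete delta files from the store listing below
>   val p = new Path("/user/hive/warehouse/carbon.store/public/c_compact4/" +
>     "Fact/Part0/Segment_0/part-0-0_batchno0-0-1521723886214.deletedelta")
>   val status = fs.getFileStatus(p)
>   val locations = fs.getFileBlockLocations(status, 0, status.getLen)
>   // locations is empty for the 0-byte file, so this throws
>   // java.lang.ArrayIndexOutOfBoundsException: 0
>   val hosts = locations(0).getHosts
>  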
> ---------------- Store location ----------------
> linux-49:/opt/babu # hadoop fs -ls
> /user/hive/warehouse/carbon.store/public/c_compact4/Fact/Part0/Segment_0/*.deletedelta
> -rw-rw-r--+  3 hdfs hive     177216 2018-03-22 18:20
> /user/hive/warehouse/carbon.store/public/c_compact4/Fact/Part0/Segment_0/part-0-0_batchno0-0-1521723019528.deletedelta
> -rw-r--r--   3 hdfs hive          0 2018-03-22 19:35
> /user/hive/warehouse/carbon.store/public/c_compact4/Fact/Part0/Segment_0/part-0-0_batchno0-0-1521723886214.deletedelta
> -rw-rw-r--+  3 hdfs hive      87989 2018-03-22 18:20
> /user/hive/warehouse/carbon.store/public/c_compact4/Fact/Part0/Segment_0/part-0-1_batchno0-0-1521723019528.deletedelta
> -rw-r--r--   3 hdfs hive          0 2018-03-22 19:35
> /user/hive/warehouse/carbon.store/public/c_compact4/Fact/Part0/Segment_0/part-0-1_batchno0-0-1521723886214.deletedelta
> -rw-rw-r--+  3 hdfs hive      87989 2018-03-22 18:20
> /user/hive/warehouse/carbon.store/public/c_compact4/Fact/Part0/Segment_0/part-0-2_batchno0-0-1521723019528.deletedelta
> -rw-r--r--   3 hdfs hive          0 2018-03-22 19:35
> /user/hive/warehouse/carbon.store/public/c_compact4/Fact/Part0/Segment_0/part-0-2_batchno0-0-1521723886214.deletedelta
>  
>  
> -----------------------------------------------------------
>  
> Reproduction technique:
> The delete delta file is created successfully, but writing its content fails
> during Horizontal Compaction. (I set a space quota in HDFS with
> setSpaceQuota so that the file could still be created while the write to it
> would fail.)
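> If setting a space quota is not convenient, the resulting on-disk state can
> also be simulated by simply creating an empty .deletedelta file in the
> segment directory (a rough sketch only; it only mimics the 0-byte file on
> HDFS and does not touch the tablestatus file; the file name is hypothetical):
>  
>   import org.apache.hadoop.conf.Configuration
>   import org.apache.hadoop.fs.{FileSystem, Path}
>  
>   val fs = FileSystem.get(new Configuration())
>   // hypothetical file name; the segment directory is the one listed above
>   val empty = new Path("/user/hive/warehouse/carbon.store/public/c_compact4/" +
>     "Fact/Part0/Segment_0/part-0-0_batchno0-0-<timestamp>.deletedelta")
>   fs.create(empty).close()   // leaves a 0-byte delete delta file behind
>  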
> *The following points need to be handled to fix this issue.*
>  
> 1. When Horizontal Compaction fails, the 0-byte delete delta file should be
> deleted; currently it is not. This is part of the cleanup after a failed
> Horizontal Compaction.
> 2. A 0-byte delete delta file should not be considered while reading (we can
> discuss this solution further); currently the tablestatus file still has the
> entry for that delete delta timestamp. A rough sketch of points 1 and 2
> follows this list.
> 3. If a delete operation is in progress, the file has been created (the name
> node has an entry for it) but the data has not yet been flushed, and a
> select query is triggered at the same time, the query will fail, so this
> scenario also needs to be handled.
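> A rough sketch of points 1 and 2, using the plain Hadoop FileSystem API
> rather than CarbonData's actual classes (the segment path is taken from the
> listing above):
>  
>   import org.apache.hadoop.conf.Configuration
>   import org.apache.hadoop.fs.{FileSystem, Path, PathFilter}
>  
>   val fs = FileSystem.get(new Configuration())
>   val segmentDir = new Path("/user/hive/warehouse/carbon.store/public/" +
>     "c_compact4/Fact/Part0/Segment_0")
>   val deltaFilter = new PathFilter {
>     override def accept(p: Path): Boolean = p.getName.endsWith(".deletedelta")
>   }
>   val deltas = fs.listStatus(segmentDir, deltaFilter)
>  
>   // point 1: clean up 0-byte delete delta files left by a failed compaction
>   deltas.filter(_.getLen == 0).foreach(s => fs.delete(s.getPath, false))
>  
>   // point 2: while reading, only consider non-empty delete delta files
>   val readableDeltas = deltas.filter(_.getLen > 0).map(_.getPath)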
>  
> @dev: Please let me know if any other details are needed.
>  
> Thanks
> Babu
>  
>  
>  




