Thank you, Ravindra!

Versions: my CarbonData version is 1.0, Spark 1.6.3, Hadoop 2.7.1, Hive 1.1.0.

One of the container logs:

17/03/25 22:07:09 ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL 15: SIGTERM
17/03/25 22:07:09 INFO storage.DiskBlockManager: Shutdown hook called
17/03/25 22:07:09 INFO util.ShutdownHookManager: Shutdown hook called
17/03/25 22:07:09 INFO util.ShutdownHookManager: Deleting directory /data1/hadoop/hd_space/tmp/nm-local-dir/usercache/storm/appcache/application_1490340325187_0042/spark-84b305f9-af7b-4f58-a809-700345a84109
17/03/25 22:07:10 ERROR impl.ParallelReadMergeSorterImpl: pool-23-thread-2
java.io.IOException: Error reading file: hdfs://xxxx_table_tmp/dt=2017-01-01/pt=ios/000006_0
    at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(RecordReaderImpl.java:1046)
    at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger$OriginalReaderPair.next(OrcRawRecordMerger.java:263)
    at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.next(OrcRawRecordMerger.java:547)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$1.next(OrcInputFormat.java:1234)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$1.next(OrcInputFormat.java:1218)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$NullKeyRecordReader.next(OrcInputFormat.java:1150)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$NullKeyRecordReader.next(OrcInputFormat.java:1136)
    at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:249)
    at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:211)
    at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
    at org.apache.carbondata.spark.rdd.NewRddIterator.hasNext(NewCarbonDataLoadRDD.scala:412)
    at org.apache.carbondata.processing.newflow.steps.InputProcessorStepImpl$InputProcessorIterator.internalHasNext(InputProcessorStepImpl.java:163)
    at org.apache.carbondata.processing.newflow.steps.InputProcessorStepImpl$InputProcessorIterator.getBatch(InputProcessorStepImpl.java:221)
    at org.apache.carbondata.processing.newflow.steps.InputProcessorStepImpl$InputProcessorIterator.next(InputProcessorStepImpl.java:183)
    at org.apache.carbondata.processing.newflow.steps.InputProcessorStepImpl$InputProcessorIterator.next(InputProcessorStepImpl.java:117)
    at org.apache.carbondata.processing.newflow.steps.DataConverterProcessorStepImpl$1.next(DataConverterProcessorStepImpl.java:80)
    at org.apache.carbondata.processing.newflow.steps.DataConverterProcessorStepImpl$1.next(DataConverterProcessorStepImpl.java:73)
    at org.apache.carbondata.processing.newflow.sort.impl.ParallelReadMergeSorterImpl$SortIteratorThread.call(ParallelReadMergeSorterImpl.java:196)
    at org.apache.carbondata.processing.newflow.sort.impl.ParallelReadMergeSorterImpl$SortIteratorThread.call(ParallelReadMergeSorterImpl.java:177)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Filesystem closed
    at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:808)
    at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:868)
    at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:934)
    at java.io.DataInputStream.readFully(DataInputStream.java:195)
    at org.apache.hadoop.hive.ql.io.orc.MetadataReader.readStripeFooter(MetadataReader.java:112)
    at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.readStripeFooter(RecordReaderImpl.java:228)
    at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.beginReadStripe(RecordReaderImpl.java:805)
    at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.readStripe(RecordReaderImpl.java:776)
    at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.advanceStripe(RecordReaderImpl.java:986)
    at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.advanceToNextRow(RecordReaderImpl.java:1019)
    at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(RecordReaderImpl.java:1042)
    ... 26 more

I will try setting enable.unsafe.sort=true and removing the BUCKETCOLUMNS property, and then try again.
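For reference, the change I plan to try looks roughly like this (only a sketch; the core count is just an example value I picked for my 5-node cluster, not a value recommended in this thread):

# carbon.properties on each node
# enable the unsafe sort path suggested above
enable.unsafe.sort=true
# number of cores CarbonData may use per machine while loading
carbon.number.of.cores.while.loading=5

Since unsafe sort is reported as not supported together with bucketed columns, I will also recreate the table without the bucketing properties (a sketch of that DDL is at the end of this mail).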
At 2017-03-25 20:55:03, "Ravindra Pesala" <ravi.pes...@gmail.com> wrote:
>Hi,
>
>CarbonData launches one job per node to sort the data at node level and
>avoid shuffling. Internally it uses threads for parallel loading. Please
>use the carbon.number.of.cores.while.loading property in the carbon.properties
>file to set the number of cores it should use per machine while loading.
>CarbonData sorts the data at each node level to maintain the B-tree for
>each node per segment. It improves query performance by filtering
>faster if we have a B-tree at node level instead of at each block level.
>
>1. Which version of CarbonData are you using?
>2. There are memory issues in the CarbonData 1.0 version which are fixed in the
>current master.
>3. You can improve the performance by enabling enable.unsafe.sort=true in the
>carbon.properties file. But it is not supported if bucketing of columns is
>enabled. We are planning to support unsafe sort load for bucketing as well in the
>next version.
>
>Please send the executor log so we can learn more about the error you are facing.
>
>Regards,
>Ravindra
>
>On 25 March 2017 at 16:18, ww...@163.com <ww...@163.com> wrote:
>
>> Hello!
>>
>> *0. The failure*
>> When I insert into the carbon table, I encounter a failure. The failure is as
>> follows:
>>
>> Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times, most
>> recent failure: Lost task 0.3 in stage 2.0 (TID 1007, hd26):
>> ExecutorLostFailure (executor 1 exited caused by one of the running tasks)
>> Reason: Slave lost
>> Driver stacktrace:
>>
>> the stage:
>>
>> *Steps:*
>> *1. Start spark-shell*
>> ./bin/spark-shell \
>> --master yarn-client \
>> --num-executors 5 \ (I tried setting this parameter from 10 to 20, but the second job still has only 5 tasks)
>> --executor-cores 5 \
>> --executor-memory 20G \
>> --driver-memory 8G \
>> --queue root.default \
>> --jars /xxx.jar
>>
>> // spark-default.conf: spark.default.parallelism=320
>>
>> import org.apache.spark.sql.CarbonContext
>> val cc = new CarbonContext(sc, "hdfs://xxxx/carbonData/CarbonStore")
>>
>> *2. Create table*
>> cc.sql("CREATE TABLE IF NOT EXISTS xxxx_table (dt String, pt String, lst String, plat String, sty String, is_pay String, is_vip String, is_mpack String, scene String, status String, nw String, isc String, area String, spttag String, province String, isp String, city String, tv String, hwm String, pip String, fo String, sh String, mid String, user_id String, play_pv Int, spt_cnt Int, prg_spt_cnt Int) row format delimited fields terminated by '|' STORED BY 'carbondata' TBLPROPERTIES ('DICTIONARY_EXCLUDE'='pip,sh,mid,fo,user_id', 'DICTIONARY_INCLUDE'='dt,pt,lst,plat,sty,is_pay,is_vip,is_mpack,scene,status,nw,isc,area,spttag,province,isp,city,tv,hwm', 'NO_INVERTED_INDEX'='lst,plat,hwm,pip,sh,mid', 'BUCKETNUMBER'='10', 'BUCKETCOLUMNS'='fo')")
>>
>> // Note: the "fo" column is set as BUCKETCOLUMNS in order to join another table.
>> // The column's distinct values are as follows:
>>
>> *3. Insert into table* (xxxx_table_tmp is a Hive external ORC table with 2,000,000,000 records)
>> cc.sql("insert into xxxx_table select dt,pt,lst,plat,sty,is_pay,is_vip,is_mpack,scene,status,nw,isc,area,spttag,province,isp,city,tv,hwm,pip,fo,sh,mid,user_id,play_pv,spt_cnt,prg_spt_cnt from xxxx_table_tmp where dt='2017-01-01'")
>>
>> *4. Spark splits the SQL into two jobs; the first finished successfully, but the
>> second failed:*
>>
>> *5. The second job stage:*
>>
>> *Questions:*
>> 1. Why does the second job have only 5 tasks while the first job has 994 tasks?
>> (Note: my Hadoop cluster has 5 datanodes.) I guess this caused the failure.
>> 2. In the sources I found DataLoadPartitionCoalescer.class. Does it mean that
>> "one datanode has only one partition, and then there is only one task on that
>> datanode"?
>> 3. In the ExampleUtils class, "carbon.table.split.partition.enable" is set as
>> follows, but I cannot find "carbon.table.split.partition.enable" in other parts
>> of the project. I set "carbon.table.split.partition.enable" to true, but the
>> second job still has only 5 tasks. How should this property be used?
>> ExampleUtils:
>> // whether to use table split partition
>> // true -> use table split partition, support multiple partition loading
>> // false -> use node split partition, support data load by host partition
>> CarbonProperties.getInstance().addProperty("carbon.table.split.partition.enable", "false")
>> 4. Inserting into the carbon table took 3 hours but eventually failed. How can
>> I speed it up?
>> 5. In spark-shell I tried setting the num-executors parameter from 10 to 20, but
>> the second job still has only 5 tasks. Is the other parameter,
>> executor-memory = 20G, enough?
>>
>> I need your help! Thank you very much!
>>
>> ww...@163.com
>>
>> ------------------------------
>> ww...@163.com
>>
>
>
>--
>Thanks & Regards,
>Ravi
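To make that retry concrete: below is roughly the DDL I intend to use when recreating the table. It is just the CREATE TABLE from my first mail with the 'BUCKETNUMBER' and 'BUCKETCOLUMNS' properties removed (a sketch, not a tested statement):

// Same schema, dictionary and inverted-index settings as before; only the bucketing
// is removed, since enable.unsafe.sort is reported as unsupported with bucketed columns.
cc.sql("CREATE TABLE IF NOT EXISTS xxxx_table (dt String, pt String, lst String, plat String, sty String, is_pay String, is_vip String, is_mpack String, scene String, status String, nw String, isc String, area String, spttag String, province String, isp String, city String, tv String, hwm String, pip String, fo String, sh String, mid String, user_id String, play_pv Int, spt_cnt Int, prg_spt_cnt Int) row format delimited fields terminated by '|' STORED BY 'carbondata' TBLPROPERTIES ('DICTIONARY_EXCLUDE'='pip,sh,mid,fo,user_id', 'DICTIONARY_INCLUDE'='dt,pt,lst,plat,sty,is_pay,is_vip,is_mpack,scene,status,nw,isc,area,spttag,province,isp,city,tv,hwm', 'NO_INVERTED_INDEX'='lst,plat,hwm,pip,sh,mid')")

I understand this gives up the bucketing on the fo column that was added for the join with the other table, so I will re-check that join once the load succeeds.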