Re: Re: Getting [Problem in loading segment blocks] error after doing multi update operations

2018-03-21 Thread yixu2001
dev 
Log info:
First of all, writing a delete delta file first creates an empty file and then 
performs write, flush, and close; an exception during write, flush, or close 
leaves an empty file behind (refer to 
org.apache.carbondata.core.writer.CarbonDeleteDeltaWriterImpl#write(org.apache.carbondata.core.mutate.DeleteDeltaBlockDetails)).
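
For illustration, here is a minimal sketch of that create-then-write pattern 
using the Hadoop FileSystem API (this is not the actual 
CarbonDeleteDeltaWriterImpl source; the path and payload are hypothetical):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val fs = FileSystem.get(new Configuration())
val path = new Path("/tmp/part-0-0_batchno0-0-0.deletedelta") // hypothetical path
val out = fs.create(path) // the file now exists (empty) on the target file system
try {
  out.write("{\"deleted_rows\":[]}".getBytes("UTF-8")) // hypothetical payload
  out.flush()
} finally {
  // a failure during write/flush/close (e.g. an expired lease) still leaves
  // the empty or partial .deletedelta file behind
  out.close()
}
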
1. As for a and b, we added logs, and the exception happens during close.
WARN DFSClient: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
 No lease on 
/user/ip_crm/public/offer_prod_inst_rel_cab/Fact/Part0/Segment_0/part-8-4_batchno0-0-1518490201583.deletedelta
 (inode 1306621743): File does not exist. Holder 
DFSClient_NONMAPREDUCE_-754557169_117 does not have any open files.
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3439)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:3242)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3080)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3040)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:789)
...
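
To check for this condition on the cluster, a diagnostic sketch along these 
lines (using the segment path from the log above) could list zero-length 
delete delta files:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// list .deletedelta files that are zero bytes long, i.e. left empty by a failed write
val fs = FileSystem.get(new Configuration())
val segmentDir = new Path("/user/ip_crm/public/offer_prod_inst_rel_cab/Fact/Part0/Segment_0")
fs.listStatus(segmentDir)
  .filter(s => s.getPath.getName.endsWith(".deletedelta") && s.getLen == 0)
  .foreach(s => println(s.getPath))
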
2. For c, yes, you can send us your jar (Spark 2.1.1 + Hadoop 2.7.2). Mail: 
93224...@qq.com


yixu2001
 
From: Liang Chen
Date: 2018-03-20 22:06
To: dev
Subject: Re: Getting [Problem in loading segment blocks] error after doing 
multi update operations
Hi
 
Thanks for your feedback.
Let me first reproduce this issue and check the details.
 
Regards
Liang
 
 
yixu2001 wrote
> I'm using carbondata1.3 + spark2.1.1 + hadoop2.7.1 to do multi-update
> operations.
> Here are the steps to reproduce:
> 
> import org.apache.spark.sql.SparkSession
> import org.apache.spark.sql.CarbonSession._
> val cc =
> SparkSession.builder().config(sc.getConf).getOrCreateCarbonSession("hdfs://ns1/user/ip_crm")
>   
> // create table
> cc.sql("CREATE TABLE IF NOT EXISTS public.c_compact3 (id string,qqnum
> string,nick string,age string,gender string,auth string,qunnum string,mvcc
> string) STORED BY 'carbondata' TBLPROPERTIES ('SORT_COLUMNS'='id')").show;
> // data prepare
> import org.apache.spark.sql.types._
> import org.apache.spark.sql.Row
> val schema =
> StructType(StructField("id",StringType,true)::StructField("qqnum",StringType,true)::StructField("nick",StringType,true)::StructField("age",StringType,true)::StructField("gender",StringType,true)::StructField("auth",StringType,true)::StructField("qunnum",StringType,true)::StructField("mvcc",IntegerType,true)::Nil)
> val data = cc.sparkContext.parallelize(1 to 5000,4).map { i =>
> Row.fromSeq(Seq(i.toString,i.toString.concat("").concat(i.toString),"2009-05-27",i.toString.concat("c").concat(i.toString),"1","1",i.toString.concat("dd").concat(i.toString),1))
> }
> cc.createDataFrame(data, schema).createOrReplaceTempView("ddd")
> cc.sql("insert into public.c_compact3 select * from ddd").show;
> 
> // update the table multiple times in a while loop
> import scala.util.Random
> var bcnum = 1;
> while (true) {
>   bcnum = 1 + bcnum;
>   println(bcnum);
>   println("1");
>   var randomNmber = Random.nextInt(1000)
>   cc.sql(s"DROP TABLE IF EXISTS cache_compact3").show;
>   cc.sql(s"cache table cache_compact3 as select * from public.c_compact3 " +
>     s"where pmod(cast(id as int),1000)=$randomNmber").show(100, false);
>   cc.sql("select count(*) from cache_compact3").show;
>   cc.sql("update public.c_compact3 a set " +
>     "(a.id,a.qqnum,a.nick,a.age,a.gender,a.auth,a.qunnum,a.mvcc)=(select " +
>     "b.id,b.qqnum,b.nick,b.age,b.gender,b.auth,b.qunnum,b.mvcc from " +
>     "cache_compact3 b where b.id=a.id)").show;
>   println("2");
>   Thread.sleep(3);
> }
> 
> After about 30 loops, the [Problem in loading segment blocks] error happened.
> Then performing a select count operation on the table gives an exception
> like the following:
> 
> scala>cc.sql("select count(*) from  public.c_compact3").show;
> 18/02/25 08:49:46 AUDIT CarbonMetaStoreFactory:
> [hdd340][ip_crm][Thread-1]File based carbon metastore is enabled
> Exchange SinglePartition
> +- *HashAggregate(keys=[], functions=[partial_count(1)],
> output=[count#33L])
>+- *BatchedScan CarbonDatasourceHadoopRelation [ Database name :public,
> Table name :c_compact3, Schema
> :Some(StructType(StructField(id,StringType,true),
> StructField(qqnum,StringType,true), StructField(nick,StringType,true),
> StructField(age,StringType,true), StructField(gender,StringType,true),
> StructField(auth,StringType,true), StructField(qunnum,StringType,true),
> StructField(mvcc,StringType,true))) ] public.c_compact3[]
> 
>   at
> org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56)
>   at
> org.apache.spark.sql.execution.exchange.ShuffleExchange.doExecute(ShuffleExchange.scala:112)
>   at
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
>   

Getting Different Encoding in timestamp and date datatype.

2018-03-21 Thread Jatin Demla
Hi,

While analyzing an issue related to a filter query on a timestamp (or date)
column, I found that for the Timestamp datatype the encoding list has only
the INVERTED_INDEX encoding, whereas for the Date datatype the encoding list
contains DIRECT_DICTIONARY, DICTIONARY and INVERTED_INDEX.

Is it correct to have different encoding lists for the date and timestamp
datatypes?
-- 
Thanks & Regards
Jatin


Re: Getting Different Encoding in timestamp and date datatype.

2018-03-21 Thread David CaiQiang
Hi Jatin, Timestamp column is non-dictionary by default. After adding the
Timestamp column to the table property 'dictionary_include', it will have
the same encoding list.
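
For illustration, a minimal DDL sketch (hypothetical table and column names, 
assuming a CarbonSession named cc as in the earlier thread) that puts a 
timestamp column into the dictionary via that property:

// event_time gets dictionary encoding because it is listed in DICTIONARY_INCLUDE
cc.sql("""
  CREATE TABLE IF NOT EXISTS public.ts_encoding_demo (
    id STRING,
    event_time TIMESTAMP,
    event_date DATE
  )
  STORED BY 'carbondata'
  TBLPROPERTIES ('DICTIONARY_INCLUDE'='event_time')
""").show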





-
Best Regards
David Cai


Re: Getting Different Encoding in timestamp and date datatype.

2018-03-21 Thread Jatin Demla
Hi David

Thanks for the reply.
As per the earlier implementation, both the date and timestamp data types
were treated as direct dictionary types, and for both the keys were generated
using the DirectDictionaryKeyGenerator.
Now the behavior has changed and timestamp is treated as non-dictionary by
default. I am not clear why this behavior was changed. Can you please
clarify my doubt?


On Thu, Mar 22, 2018 at 9:06 AM, David CaiQiang 
wrote:

> Hi Jatin, Timestamp column is non-dictionary by default. After adding the
> Timestamp column to the table property 'dictionary_include', it will have
> the same encoding list.
>
>
>
>
>
> -
> Best Regards
> David Cai
> --
> Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.
> n5.nabble.com/
>



-- 
Thanks & Regards
Jatin


CarbonData

2018-03-21 Thread Flying
On http://carbondata.apache.org/timeseries-datamap-guide.html, "granualrity" 
appears to be a typo for "granularity".

Re: Getting Different Encoding in timestamp and date datatype.

2018-03-21 Thread David CaiQiang
The direct dictionary ignores the milliseconds of the timestamp data.
If milliseconds are not needed, the direct dictionary uses an integer
representation to improve compression.
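
As a rough illustration of that idea (not the CarbonData internal key 
generation, just the principle): dropping millisecond precision lets a 
timestamp be represented by a second-resolution integer surrogate, which 
compresses better.

import java.sql.Timestamp

// hypothetical example value; the millisecond part is discarded by the division
val ts = Timestamp.valueOf("2018-03-21 10:15:30.123")
val seconds = ts.getTime / 1000L   // epoch seconds, .123 dropped
val surrogate = seconds.toInt      // compact integer surrogate
println(s"$ts -> $surrogate")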



-
Best Regards
David Cai