Re: Block B-tree loading failed add debug information

2017-11-06 Thread yixu2001
dev 
 When is a patch expected to be released? We can test it. 
 
From: yixu2001
Date: 2017-10-30 18:11
To: dev
Subject: Block B-tree loading failed add debug information
dev 


Environment: Spark 2.1.1, CarbonData 1.1.1, Hadoop 2.7.2


Added debug information.

Block B-tree loading failed.

Why does the CarbonUtil.calculateMetaSize calculation result in getBlockLength=0 
and getBlockOffset=8301549?

Caused by: 
org.apache.carbondata.core.datastore.exception.IndexBuilderException: Invalid 
carbon data file: 
hdfs://ns1/user/e_carbon/public/carbon.store/e_carbon/prod_inst_his1023c/Fact/Part0/Segment_1.1/part-0-172_batchno0-0-1508833127408.carbondata
 :getBlockLength=0 getBlockOffset=8301549 requiredMetaSize=-8301549 isV1=false 
getVersion=ColumnarFormatV3
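
For reference, the numbers in the error message are consistent with a simple subtraction. The following is only an illustrative sketch of that arithmetic (not the actual CarbonData source):

// Illustrative only: reproduces the values reported in the exception message.
val blockLength = 0L        // getBlockLength reported for the block
val blockOffset = 8301549L  // getBlockOffset reported for the block
val requiredMetaSize = blockLength - blockOffset   // 0 - 8301549 = -8301549
// A non-positive requiredMetaSize is reported as an invalid carbon data file,
// so the real question is why getBlockLength is 0 for this block.
if (requiredMetaSize <= 0) {
  println(s"Invalid carbon data file: getBlockLength=$blockLength " +
    s"getBlockOffset=$blockOffset requiredMetaSize=$requiredMetaSize")
}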




1. Debug information:
scala> cc.sql("select prod_inst_id,count(*) from e_carbon.prod_inst_his1023c 
group by prod_inst_id having count(*)>1").show
[Stage 0:=>(157 + 50) / 
283]17/10/30 10:39:24 WARN scheduler.TaskSetManager: Lost task 252.0 in stage 
0.0 (TID 201, HDD010, executor 22): 
org.apache.carbondata.core.datastore.exception.IndexBuilderException: Block 
B-tree loading failed
at 
org.apache.carbondata.core.datastore.BlockIndexStore.fillLoadedBlocks(BlockIndexStore.java:264)
at 
org.apache.carbondata.core.datastore.BlockIndexStore.getAll(BlockIndexStore.java:189)
at 
org.apache.carbondata.core.scan.executor.impl.AbstractQueryExecutor.initQuery(AbstractQueryExecutor.java:131)
at 
org.apache.carbondata.core.scan.executor.impl.AbstractQueryExecutor.getBlockExecutionInfos(AbstractQueryExecutor.java:186)
at 
org.apache.carbondata.core.scan.executor.impl.VectorDetailQueryExecutor.execute(VectorDetailQueryExecutor.java:36)
at 
org.apache.carbondata.spark.vectorreader.VectorizedCarbonRecordReader.initialize(VectorizedCarbonRecordReader.java:112)
at 
org.apache.carbondata.spark.rdd.CarbonScanRDD.compute(CarbonScanRDD.scala:204)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.ExecutionException: 
org.apache.carbondata.core.datastore.exception.IndexBuilderException: Invalid 
carbon data file: 
hdfs://ns1/user/e_carbon/public/carbon.store/e_carbon/prod_inst_his1023c/Fact/Part0/Segment_1.1/part-0-172_batchno0-0-1508833127408.carbondata
 getBlockLength=0 getBlockOffset=8301549 requiredMetaSize=-8301549  
getVersion=ColumnarFormatV3
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at 
org.apache.carbondata.core.datastore.BlockIndexStore.fillLoadedBlocks(BlockIndexStore.java:254)
... 21 more
Caused by: 
org.apache.carbondata.core.datastore.exception.IndexBuilderException: Invalid 
carbon data file: 
hdfs://ns1/user/e_carbon/public/carbon.store/e_carbon/prod_inst_his1023c/Fact/Part0/Segment_1.1/part-0-172_batchno0-0-1508833127408.carbondata=lianch:getBlockLength=0
 getBlockOffset=8301549 requiredMetaSize=-8301549 isV1=false 
getVersion=ColumnarFormatV3
at 
org.apache.carbondata.core.datastore.AbstractBlockIndexStoreCache.checkAndLoadTableBlocks(AbstractBlockIndexStoreCache.java:116)
at 
org.apache.carbondata.core.datastore.BlockIndexStore.loadBlock(BlockIndexStore.java:304)
at 
org.apache.carbondata.core.datastore.BlockIndexStore.get(BlockIndexStore.java:109)
at 
org.apache.carbondata.core.datastore.BlockIndexStore$BlockLoaderThread.call(BlockIndexStore.java:294)
at 
org.apache.carbondata.core.datastore.BlockIndexStore$BlockLoaderThread.call(BlockIndexStore.java:284)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
... 3 more

[Stage 0:==>   (223 + 50) / 
283]17/10/30 10:39:26 ERROR scheduler.TaskSetManager: Task 252 in stage 0.0 
failed 10 times; aborting job
17/10/30 10:39:26 WARN scheduler.TaskSetManager: Lost task 61.0 in stage 0.0 
(TID 184, HDD012, executor 7): TaskKilled (killed intentionally)
17/10/30 10:39:26 WARN scheduler.TaskSetManager: Lost task 71.0 in stage 0.0 
(TID 212, HDD008, executor 18): TaskKilled (killed 

Re: Re: Update statement failed with "Multiple input rows matched for same row" in version 1.2.0,

2017-11-06 Thread yixu2001
dev 
When is a patch expected to be released? We can test it.


yixu2001
 
From: yixu2001
Date: 2017-11-07 15:47
To: dev
Subject: Re: Re: Update statement failed with "Multiple input rows matched for 
same row" in version 1.2.0,
dev 
 Hello, is there no solution yet? Can the next version, CarbonData 1.2.1, be released?


yixu2001
 
From: yixu2001
Date: 2017-10-27 16:55
To: dev
CC: sounak; chenliang6136
Subject: Re: Re: Update statement failed with "Multiple input rows matched for 
same row" in version 1.2.0,
dev 
 
Thank you.
 We hope it will be resolved in the new release, CarbonData 1.2.1.
In addition, the mailing-list problem "Is query slowing down due to the 
fragmentations produced by the many update operations?" is also a very 
important problem in our selection process. We would be grateful if it could also 
be resolved in CarbonData 1.2.1.


yixu2001
 
From: yixu2001
Date: 2017-10-21 11:12
To: dev
Subject: Re: Re: Update statement failed with "Multiple input rows matched for 
same row" in version 1.2.0,
dev 
 
testdata (CSV)
Link: http://pan.baidu.com/s/1boVAqeF  Password: qtdh
 
 
yixu2001
From: Raghunandan S
Date: 2017-10-20 16:56
To: dev
Subject: Re: Re: Update statement failed with "Multiple input rows matched for 
same row" in version 1.2.0,
Should be fine. Do we need to create any login to access it?
On Fri, 20 Oct 2017 at 1:31 PM, yixu2001  wrote:
> dev
> Can I share it via Baidu SkyDrive?
>
>
> yixu2001
>
> From: sounak
> Date: 2017-10-20 16:14
> To: dev
> Subject: Re: Re: Update statement failed with "Multiple input rows matched
> for same row" in version 1.2.0,
> As the tupleIds are also unique, the only option left is to get hold of
> your data and reproducing it in-house. Is it possible to generate your data
> in-house or get it?
>
> On Fri, Oct 20, 2017 at 12:14 PM, yixu2001  wrote:
>
> > dev
> >
> > scala> cc.sql(" select  c.tupleId,count(*)  from (select a.remark,a.id,
> > getTupleId() as tupleId from c_indextest1 a,c_indextest2 b where a.id=
> b.id)
> > c group by c.tupleId having count(*)>1").show;
> > +-------+--------+
> > |tupleId|count(1)|
> > +-------+--------+
> > +-------+--------+
> >
> > no Multiple input rows
> >
> >
> > yixu2001
> >
> > From: sounak
> > Date: 2017-10-20 10:57
> > To: dev
> > Subject: Re: Re: Update statement failed with "Multiple input rows
> matched
> > for same row" in version 1.2.0,
> > Slight rectification in typo of the query syntax
> >
> > " select a.remark,a.id, getTupleId() as *tupleId* from c_indextest1 a,
> > c_indextest2
> > b where a.id=b.id;"
> >
> > On Fri, Oct 20, 2017 at 7:54 AM, sounak  wrote:
> >
> > > Internally, we call a UDF to generate the TupleId and based on that
> > > tupleId we decide if the row is duplicated or not. You can run a
> slightly
> > > tweaked query
> > >
> > > " select a.remark,a.id, getTupleId() as TupleId from c_indextest1 a,
> > c_indextest2
> > > b where a.id=b.id; "
> > >
> > > In order to prompt Multiple Input Rows most probably the getTupleId()
> > will
> > > give duplicate entries.
> > >
> > > In between your given query runs fine with the data i have generated in
> > > house. So this is data dependent. So it will be good if we get
> > > *origin_data* or the script which has build the data for origin_data.
> > >
> > > On Fri, Oct 20, 2017 at 7:28 AM, yixu2001  wrote:
> > >
> > >> dev
> > >>one row record
> > >>
> > >>
> > >> yixu2001
> > >>
> > >> From: Liang Chen
> > >> Date: 2017-10-19 22:26
> > >> To: dev
> > >> Subject: Re: Re: Update statement failed with "Multiple input rows
> > >> matched for same row" in version 1.2.0,
> > >> Hi
> > >>
> > >> Execute the below query, return one row record or multiple row
> records ?
> > >> -
> > >> select a.remark from  c_indextest1 a where a.id=b.id
> > >>
> > >> Regards
> > >> Liang
> > >>
> > >>
> > >> yixu2001 wrote
> > >> > dev
> > >> >  You can follow the steps below to reproduce the problem.
> > >> > table c_indextest2 has 17 million (1700w) records and table
> > >> > c_indextest1 has about 300,000 (30w) records.
> > >> >
> > >> > step 1:
> > >> > cc.sql("CREATE TABLE IF NOT EXISTS c_indextest2 (id string ,remark
> > >> string)
> > >> > STORED BY 'carbondata'").show;
> > >> >
> > >> > step 2: origin_data is an existing table with 17 million (1700w)
> > >> > records, and the fields in the table do not matter.
> > >> > cc.sql("insert into c_indextest2 select row_number() over(partition
> by
> > >> > a.PKid order by a.pkid) id,a.remark from  (SELECT '1' PKID,'dfsdd'
> > >> remark
> > >> > from origin_data limit 1700) a").show;
> > >> >
> > >> > step 3:
> > >> > cc.sql("CREATE TABLE IF NOT EXISTS c_indextest1 (id string ,remark
> > >> string)
> > >> > STORED BY 'carbondata'").show;
> > >> >
> > >> > step 4:
> > >> > cc.sql("insert into c_indextest1 select * from  c_indextest2 where
> > >> > pmod(cast(ID as int),50)=43").show;
> > >> >
> > >> > setp 5:
> > >> > cc.sql("update c_indextest2 b set 

Re: Re: After MAJOR index lost

2017-11-06 Thread yixu2001
dev 
 When is a patch expected to be released? We can test it.


yixu2001
 
From: yixu2001
Date: 2017-11-07 15:14
To: dev
Subject: Re: Re: After MAJOR index lost
dev 
Running multiple updates generated ten thousand delta files:
cc.sql("update e_carbon.prod_inst_his_c  A set 
(a.ETL_date,a.prod_inst_id,a.OWNER_CUST_ID,a.ACC_PROD_INST_ID,a.DVERSION,a.GTID,a.IND,a.ODS_STATE,A.SRC,a.kafka_date,a.PRODUCT_ID,a.ADDRESS_ID,a.PAYMENT_MODE_CD,a.PRODUCT_PASSWORD,a.IMPORTANT_LEVEL,a.AREA_CODE,a.ACC_NBR,a.EXCH_ID,a.COMMON_REGION_ID,a.REMARK,a.PAY_CYCLE,a.BEGIN_RENT_TIME,a.STOP_RENT_TIME,a.FINISH_TIME,a.STOP_STATUS,a.STATUS_CD,a.CREATE_DATE,a.STATUS_DATE,a.UPDATE_DATE,a.PROC_SERIAL,a.USE_CUST_ID,a.EXT_PROD_INST_ID,a.ADDRESS_DESC,a.AREA_ID,a.UPDATE_STAFF,a.CREATE_STAFF,a.REC_UPDATE_DATE,a.ACCOUNT,a.VERSION,a.COMMUNITY_ID,a.EXT_ACC_PROD_INST_ID,a.DISTRIBUTOR_ID,a.SHARDING_ID)=(select
 
b.etl_date,b.prod_inst_id,b.OWNER_CUST_ID,b.ACC_PROD_INST_ID,B.DVERSION,b.GTID,b.IND,B.ODS_STATE,B.SRC,b.kafka_date,b.PRODUCT_ID,b.ADDRESS_ID,b.PAYMENT_MODE_CD,b.PRODUCT_PASSWORD,b.IMPORTANT_LEVEL,b.AREA_CODE,b.ACC_NBR,b.EXCH_ID,b.COMMON_REGION_ID,b.REMARK,b.PAY_CYCLE,b.BEGIN_RENT_TIME,b.STOP_RENT_TIME,b.FINISH_TIME,b.STOP_STATUS,b.STATUS_CD,b.CREATE_DATE,b.STATUS_DATE,b.UPDATE_DATE,b.PROC_SERIAL,b.USE_CUST_ID,b.EXT_PROD_INST_ID,b.ADDRESS_DESC,b.AREA_ID,b.UPDATE_STAFF,b.CREATE_STAFF,b.REC_UPDATE_DATE,b.ACCOUNT,b.VERSION,b.COMMUNITY_ID,b.EXT_ACC_PROD_INST_ID,b.DISTRIBUTOR_ID,b.SHARDING_ID
 from cache_prod_inst_his_u b where a.his_id=b.his_id)").show;


yixu2001
 
From: Liang Chen
Date: 2017-11-02 02:29
To: dev
Subject: Re: After MAJOR index lost
Hi
 
Yes, I checked the log message; it looks like there are some issues.
Can you share the steps to reproduce:
How many machines did you use for the data load, and how many times did you load? 
 
Regards
Liang
 
 
yixu2001 wrote
> dev 
> environment  spark.2.1.1 carbondata 1.1.1  hadoop 2.7.2
> 
> run  ALTER table  e_carbon.prod_inst_all_c COMPACT 'MAJOR'
> CLEAN FILES FOR TABLE  e_carbon.prod_inst_all_c
> 
> 17/10/30 14:59:21 ERROR filesystem.AbstractDFSCarbonFile: main Exception
> occurred:File does not exist:
> hdfs://ns1/user/e_carbon/public/carbon.store/e_carbon/prod_inst_his_c/Fact/Part0/Segment_0/624_batchno0-0-1509233731459.carbonindex
> 
> 
> 623_batchno0-0-1509233118616.carbonindex and
> 625_batchno0-0-1509233731459.carbonindex are present, but
> 624_batchno0-0-1509233731459.carbonindex between them is lost.
> 
> -rw-r--r--   3 e_carbon e_carbon_group   6750 2017-10-29 07:17
> /user/e_carbon/public/carbon.store/e_carbon/prod_inst_his_c/Fact/Part0/Segment_0/621_batchno0-0-1509231670521.carbonindex
> -rw-r--r--   3 e_carbon e_carbon_group  11320 2017-10-29 07:19
> /user/e_carbon/public/carbon.store/e_carbon/prod_inst_his_c/Fact/Part0/Segment_0/622_batchno0-0-1509232641994.carbonindex
> -rw-r--r--   3 e_carbon e_carbon_group   6858 2017-10-29 07:35
> /user/e_carbon/public/carbon.store/e_carbon/prod_inst_his_c/Fact/Part0/Segment_0/623_batchno0-0-1509233118616.carbonindex
> -rw-r--r--   3 e_carbon e_carbon_group  11423 2017-10-29 07:37
> /user/e_carbon/public/carbon.store/e_carbon/prod_inst_his_c/Fact/Part0/Segment_0/625_batchno0-0-1509233731459.carbonindex
> 
> scala> cc.sql("select his_id,count(*) from e_carbon.prod_inst_his_c group
> by his_id having count(*)>1").show
> 17/10/30 14:59:21 ERROR filesystem.AbstractDFSCarbonFile: main Exception
> occurred:File does not exist:
> hdfs://ns1/user/e_carbon/public/carbon.store/e_carbon/prod_inst_his_c/Fact/Part0/Segment_0/624_batchno0-0-1509233731459.carbonindex
> org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute,
> tree:
> Exchange hashpartitioning(his_id#1818, 100)
> +- *HashAggregate(keys=[his_id#1818], functions=[partial_count(1),
> partial_count(1)], output=[his_id#1818, count#1967L, count#1968L])
>+- *BatchedScan CarbonDatasourceHadoopRelation [ Database name
> :e_carbon, Table name :prod_inst_his_c, Schema
> :Some(StructType(StructField(his_id,StringType,true),
> StructField(ETL_date,StringType,true),
> StructField(prod_inst_id,StringType,true),
> StructField(owner_cust_id,StringType,true),
> StructField(acc_prod_inst_id,StringType,true),
> StructField(DVERSION,StringType,true), StructField(GTID,StringType,true),
> StructField(IND,StringType,true), StructField(ODS_STATE,StringType,true),
> StructField(SRC,StringType,true), StructField(kafka_date,StringType,true),
> StructField(product_id,StringType,true),
> StructField(address_id,StringType,true),
> StructField(payment_mode_cd,StringType,true),
> StructField(product_password,StringType,true),
> StructField(important_level,StringType,true),
> StructField(area_code,StringType,true),
> StructField(acc_nbr,StringType,true),
> StructField(exch_id,StringType,true),
> StructField(common_region_id,StringType,true),
> StructField(remark,StringType,true),
> StructField(pay_cycle,StringType,true),
> StructField(begin_rent_time,StringType,true),
> StructField(stop_rent_time,StringType,true),
> 

Re: Re: Delegation Token can be issued only with kerberos or web authentication" will occur in yarn cluster

2017-11-06 Thread yixu2001
dev 
 Hello, is there any recent progress on this?


yixu2001
 
From: yixu2001
Date: 2017-11-06 16:43
To: dev
CC: prnaresh.naresh; 郭海涛
Subject: Re: Re: Delegation Token can be issued only with kerberos or web 
authentication" will occur in yarn cluster
dev 
 I have copied core-site.xml to the spark2 conf folder, but it does not work.
I will share the jar I am using (please get the jar file from the link below).

download link: https://pan.baidu.com/s/1b6AG5S  password: rww2
 Would you help me confirm whether the jar is suitable?
If the jar is not suitable, could you please send me a suitable one to try? 


yixu2001
 
From: Naresh P R
Date: 2017-11-04 01:26
To: dev
Subject: Re: Re: Delegation Token can be issued only with kerberos or web 
authentication" will occur in yarn cluster
Hi yixu2001,
 
From hadoop code, i could see IOException("Delegation Token can be issued
only with kerberos or web authentication") is thrown only if authentication
method is set as "SIMPLE".
 
private boolean isAllowedDelegationTokenOp() throws IOException {
  // Delegation token operations are allowed only when security is disabled or the
  // connection was authenticated via Kerberos / Kerberos-SSL / certificate.
  AuthenticationMethod authMethod = this.getConnectionAuthenticationMethod();
  return !UserGroupInformation.isSecurityEnabled()
      || authMethod == AuthenticationMethod.KERBEROS
      || authMethod == AuthenticationMethod.KERBEROS_SSL
      || authMethod == AuthenticationMethod.CERTIFICATE;
}

Token<DelegationTokenIdentifier> getDelegationToken(Text renewer)
    throws IOException {
  // ...
  if (!this.isAllowedDelegationTokenOp()) {
    throw new IOException("Delegation Token can be issued only"
        + " with kerberos or web authentication");
  }
  // ...
}
 
Can you try to execute the queries after copying core-site.xml from the hadoop
conf folder to the spark2 conf folder and to the classpath of spark-submit?
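
As an additional, purely illustrative check (not part of the original reply), the effective client-side authentication settings can be printed from spark-shell; if this reports SIMPLE while HDFS is kerberized, the wrong core-site.xml is being picked up:

scala> import org.apache.hadoop.security.UserGroupInformation
scala> // isSecurityEnabled reflects hadoop.security.authentication from the loaded core-site.xml
scala> println("security enabled: " + UserGroupInformation.isSecurityEnabled)
scala> println("auth method: " + UserGroupInformation.getLoginUser.getAuthenticationMethod)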
 
From the provided logs, i could
see carbondata_2.11-1.1.1-bdd-hadoop2.7.2.jar size is 9607344bytes, please
make sure it has only carbondata classes.
 
I could see Carbon explicitly calling
"TokenCache.obtainTokensForNamenodes" which
is throwing this exception.
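
For context, a minimal sketch of that call (illustrative only; the path is taken from the logs above and the Job object is a placeholder, not Carbon's actual code):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.mapreduce.Job
import org.apache.hadoop.mapreduce.security.TokenCache

val job = Job.getInstance(new Configuration())
// Requests HDFS delegation tokens for the given paths; on a kerberized HDFS this
// is the call that fails with "Delegation Token can be issued only with kerberos
// or web authentication" when the client-side authentication method is SIMPLE.
TokenCache.obtainTokensForNamenodes(
  job.getCredentials,
  Array(new Path("hdfs://ns1/user/e_carbon/public/carbon.store")),
  job.getConfiguration)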
 
If the above mentioned steps didn't work, you can raise a JIRA to investigate
this further.

Regards,
Naresh P R
 
On Fri, Nov 3, 2017 at 3:10 PM, yixu2001  wrote:
 
> prnaresh.naresh, dev
>
>  The carbon jar I used does not include hadoop classes & core-site.xml.
> The attachment includes the jar list used while submitting the
> spark job; please confirm it.
> --
> yixu2001
>
>
> *From:* Naresh P R 
> *Date:* 2017-11-03 16:07
> *To:* yixu2001 
> *Subject:* Re: Re: Delegation Token can be issued only with kerberos or
> web authentication" will occur in yarn cluster
> Hi yixu2001,
>
> Are you using carbon shaded jar with hadoop classes & core-site.xml
> included in carbon jar ?
>
> If so, can you try to use carbondata individual component jars while
> submtting spark job?
>
> As per my understanding, this happens if client core-site.xml has
> hadoop.security.authentication=simple & hdfs is kerberized.
>
> You can also enable verbose to see hadoop jars used in the error trace
> while querying carbon tables.
>
> Also i am not sure whether CarbonData is tested in HDP kerberos cluster.
> ---
> Regards,
> Naresh P R
>
>
> On Fri, Nov 3, 2017 at 8:36 AM, yixu2001  wrote:
>
>> Naresh P R:
>>  Since attachments cannot be uploaded to the mailing list, I have
>> added the attachments to this mail for you; please check them.
>>  Our platform is installed with HDP 2.4, but Spark 2.1 is
>> not included in HDP 2.4, so we additionally installed the
>> Apache version of Spark 2.1.
>> --
>> yixu2001
>>
>>
>> *From:* Naresh P R 
>> *Date:* 2017-11-02 22:02
>> *To:* dev 
>> *Subject:* Re: Re: Delegation Token can be issued only with kerberos or
>> web authentication" will occur in yarn cluster
>> Hi yixu,
>>
>> I am not able to see any attachment in your previous mail.
>> ---
>> Regards,
>> Naresh P R
>>
>> On Thu, Nov 2, 2017 at 4:40 PM, yixu2001  wrote:
>>
>>> dev
>>>  Please refer to the attachment "cluster carbon error2.txt"
>>> for the log trace.
>>> In this log, I try 2 query statements:
>>> select * from e_carbon.prod_inst_his (prod_inst_his is
>>> a hive table); it succeeds.
>>> select * from e_carbon.prod_inst_his_c (prod_inst_his_c
>>> is a carbon table); it fails.
>>>
>>> I pass the principal in my start script; please refer to the
>>> attachment "testCluster.sh".
>>>
>>> I have set hive.server2.enable.doAs = false in the above test,
>>> and I have printed it in the log.
>>> --
>>> yixu2001
>>>
>>>
>>> *From:* Naresh P R 
>>> *Date:* 2017-11-01 19:40
>>> *To:* dev 
>>> *Subject:* Re: Delegation Token can be issued only with kerberos or web
>>> authentication" will occur in yarn cluster
>>> Hi,
>>>
>>> Ideally kerberos authentication should work with carbon table, Can you
>>> share us log trace to analyze 

Re: Re: Update statement failed with "Multiple input rows matched for same row" in version 1.2.0,

2017-11-06 Thread yixu2001
dev 
 Hello, is there no solution yet? Can the next version, CarbonData 1.2.1, be released?


yixu2001
 
From: yixu2001
Date: 2017-10-27 16:55
To: dev
CC: sounak; chenliang6136
Subject: Re: Re: Update statement failed with "Multiple input rows matched for 
same row" in version 1.2.0,
dev 
 
Thank you.
 We hope it will be resolved in the new release, CarbonData 1.2.1.
In addition, the mailing-list problem "Is query slowing down due to the 
fragmentations produced by the many update operations?" is also a very 
important problem in our selection process. We would be grateful if it could also 
be resolved in CarbonData 1.2.1.


yixu2001
 
From: yixu2001
Date: 2017-10-21 11:12
To: dev
Subject: Re: Re: Update statement failed with "Multiple input rows matched for 
same row" in version 1.2.0,
dev 
 
testdata (CSV)
Link: http://pan.baidu.com/s/1boVAqeF  Password: qtdh
 
 
yixu2001
From: Raghunandan S
Date: 2017-10-20 16:56
To: dev
Subject: Re: Re: Update statement failed with "Multiple input rows matched for 
same row" in version 1.2.0,
Should be fine. Do we need to create any login to access it?
On Fri, 20 Oct 2017 at 1:31 PM, yixu2001  wrote:
> dev
> Can I share it via Baidu SkyDrive?
>
>
> yixu2001
>
> From: sounak
> Date: 2017-10-20 16:14
> To: dev
> Subject: Re: Re: Update statement failed with "Multiple input rows matched
> for same row" in version 1.2.0,
> As the tupleIds are also unique, the only option left is to get hold of
> your data and reproducing it in-house. Is it possible to generate your data
> in-house or get it?
>
> On Fri, Oct 20, 2017 at 12:14 PM, yixu2001  wrote:
>
> > dev
> >
> > scala> cc.sql(" select  c.tupleId,count(*)  from (select a.remark,a.id,
> > getTupleId() as tupleId from c_indextest1 a,c_indextest2 b where a.id=
> b.id)
> > c group by c.tupleId having count(*)>1").show;
> > +-------+--------+
> > |tupleId|count(1)|
> > +-------+--------+
> > +-------+--------+
> >
> > no Multiple input rows
> >
> >
> > yixu2001
> >
> > From: sounak
> > Date: 2017-10-20 10:57
> > To: dev
> > Subject: Re: Re: Update statement failed with "Multiple input rows
> matched
> > for same row" in version 1.2.0,
> > Slight rectification in typo of the query syntax
> >
> > " select a.remark,a.id, getTupleId() as *tupleId* from c_indextest1 a,
> > c_indextest2
> > b where a.id=b.id;"
> >
> > On Fri, Oct 20, 2017 at 7:54 AM, sounak  wrote:
> >
> > > Internally, we call a UDF to generate the TupleId and based on that
> > > tupleId we decide if the row is duplicated or not. You can run a
> slightly
> > > tweaked query
> > >
> > > " select a.remark,a.id, getTupleId() as TupleId from c_indextest1 a,
> > c_indextest2
> > > b where a.id=b.id; "
> > >
> > > In order to prompt Multiple Input Rows most probably the getTupleId()
> > will
> > > give duplicate entries.
> > >
> > > In between your given query runs fine with the data i have generated in
> > > house. So this is data dependent. So it will be good if we get
> > > *origin_data* or the script which has build the data for origin_data.
> > >
> > > On Fri, Oct 20, 2017 at 7:28 AM, yixu2001  wrote:
> > >
> > >> dev
> > >>one row record
> > >>
> > >>
> > >> yixu2001
> > >>
> > >> From: Liang Chen
> > >> Date: 2017-10-19 22:26
> > >> To: dev
> > >> Subject: Re: Re: Update statement failed with "Multiple input rows
> > >> matched for same row" in version 1.2.0,
> > >> Hi
> > >>
> > >> Execute the below query, return one row record or multiple row
> records ?
> > >> -
> > >> select a.remark from  c_indextest1 a where a.id=b.id
> > >>
> > >> Regards
> > >> Liang
> > >>
> > >>
> > >> yixu2001 wrote
> > >> > dev
> > >> >  You can follow the steps below to reproduce the problem.
> > >> > table c_indextest2 has 17 million (1700w) records and table
> > >> > c_indextest1 has about 300,000 (30w) records.
> > >> >
> > >> > step 1:
> > >> > cc.sql("CREATE TABLE IF NOT EXISTS c_indextest2 (id string ,remark
> > >> string)
> > >> > STORED BY 'carbondata'").show;
> > >> >
> > >> > step 2: origin_data is an existing table with 17 million (1700w)
> > >> > records, and the fields in the table do not matter.
> > >> > cc.sql("insert into c_indextest2 select row_number() over(partition
> by
> > >> > a.PKid order by a.pkid) id,a.remark from  (SELECT '1' PKID,'dfsdd'
> > >> remark
> > >> > from origin_data limit 1700) a").show;
> > >> >
> > >> > step 3:
> > >> > cc.sql("CREATE TABLE IF NOT EXISTS c_indextest1 (id string ,remark
> > >> string)
> > >> > STORED BY 'carbondata'").show;
> > >> >
> > >> > step 4:
> > >> > cc.sql("insert into c_indextest1 select * from  c_indextest2 where
> > >> > pmod(cast(ID as int),50)=43").show;
> > >> >
> > >> > setp 5:
> > >> > cc.sql("update c_indextest2 b set (b.remark)=(select a.remark from
> > >> > c_indextest1 a where a.id=b.id)").show;
> > >> >
> > >> >
> > >> > yixu2001
> > >> >
> > >> > From: sounak
> > >> > Date: 2017-10-19 18:26
> > >> > To: dev
> > >> > Subject: Re: Re: 

Re: Re: After MAJOR index lost

2017-11-06 Thread yixu2001
dev 
Running multiple updates generated ten thousand delta files:
cc.sql("update e_carbon.prod_inst_his_c  A set 
(a.ETL_date,a.prod_inst_id,a.OWNER_CUST_ID,a.ACC_PROD_INST_ID,a.DVERSION,a.GTID,a.IND,a.ODS_STATE,A.SRC,a.kafka_date,a.PRODUCT_ID,a.ADDRESS_ID,a.PAYMENT_MODE_CD,a.PRODUCT_PASSWORD,a.IMPORTANT_LEVEL,a.AREA_CODE,a.ACC_NBR,a.EXCH_ID,a.COMMON_REGION_ID,a.REMARK,a.PAY_CYCLE,a.BEGIN_RENT_TIME,a.STOP_RENT_TIME,a.FINISH_TIME,a.STOP_STATUS,a.STATUS_CD,a.CREATE_DATE,a.STATUS_DATE,a.UPDATE_DATE,a.PROC_SERIAL,a.USE_CUST_ID,a.EXT_PROD_INST_ID,a.ADDRESS_DESC,a.AREA_ID,a.UPDATE_STAFF,a.CREATE_STAFF,a.REC_UPDATE_DATE,a.ACCOUNT,a.VERSION,a.COMMUNITY_ID,a.EXT_ACC_PROD_INST_ID,a.DISTRIBUTOR_ID,a.SHARDING_ID)=(select
 
b.etl_date,b.prod_inst_id,b.OWNER_CUST_ID,b.ACC_PROD_INST_ID,B.DVERSION,b.GTID,b.IND,B.ODS_STATE,B.SRC,b.kafka_date,b.PRODUCT_ID,b.ADDRESS_ID,b.PAYMENT_MODE_CD,b.PRODUCT_PASSWORD,b.IMPORTANT_LEVEL,b.AREA_CODE,b.ACC_NBR,b.EXCH_ID,b.COMMON_REGION_ID,b.REMARK,b.PAY_CYCLE,b.BEGIN_RENT_TIME,b.STOP_RENT_TIME,b.FINISH_TIME,b.STOP_STATUS,b.STATUS_CD,b.CREATE_DATE,b.STATUS_DATE,b.UPDATE_DATE,b.PROC_SERIAL,b.USE_CUST_ID,b.EXT_PROD_INST_ID,b.ADDRESS_DESC,b.AREA_ID,b.UPDATE_STAFF,b.CREATE_STAFF,b.REC_UPDATE_DATE,b.ACCOUNT,b.VERSION,b.COMMUNITY_ID,b.EXT_ACC_PROD_INST_ID,b.DISTRIBUTOR_ID,b.SHARDING_ID
 from cache_prod_inst_his_u b where a.his_id=b.his_id)").show;


yixu2001
 
From: Liang Chen
Date: 2017-11-02 02:29
To: dev
Subject: Re: After MAJOR index lost
Hi
 
Yes, I checked the log message; it looks like there are some issues.
Can you share the steps to reproduce:
How many machines did you use for the data load, and how many times did you load? 
 
Regards
Liang
 
 
yixu2001 wrote
> dev 
> environment  spark.2.1.1 carbondata 1.1.1  hadoop 2.7.2
> 
> run  ALTER table  e_carbon.prod_inst_all_c COMPACT 'MAJOR'
> CLEAN FILES FOR TABLE  e_carbon.prod_inst_all_c
> 
> 17/10/30 14:59:21 ERROR filesystem.AbstractDFSCarbonFile: main Exception
> occurred:File does not exist:
> hdfs://ns1/user/e_carbon/public/carbon.store/e_carbon/prod_inst_his_c/Fact/Part0/Segment_0/624_batchno0-0-1509233731459.carbonindex
> 
> 
> 623_batchno0-0-1509233118616.carbonindex and
> 625_batchno0-0-1509233731459.carbonindex are present, but
> 624_batchno0-0-1509233731459.carbonindex between them is lost.
> 
> -rw-r--r--   3 e_carbon e_carbon_group   6750 2017-10-29 07:17
> /user/e_carbon/public/carbon.store/e_carbon/prod_inst_his_c/Fact/Part0/Segment_0/621_batchno0-0-1509231670521.carbonindex
> -rw-r--r--   3 e_carbon e_carbon_group  11320 2017-10-29 07:19
> /user/e_carbon/public/carbon.store/e_carbon/prod_inst_his_c/Fact/Part0/Segment_0/622_batchno0-0-1509232641994.carbonindex
> -rw-r--r--   3 e_carbon e_carbon_group   6858 2017-10-29 07:35
> /user/e_carbon/public/carbon.store/e_carbon/prod_inst_his_c/Fact/Part0/Segment_0/623_batchno0-0-1509233118616.carbonindex
> -rw-r--r--   3 e_carbon e_carbon_group  11423 2017-10-29 07:37
> /user/e_carbon/public/carbon.store/e_carbon/prod_inst_his_c/Fact/Part0/Segment_0/625_batchno0-0-1509233731459.carbonindex
> 
> scala> cc.sql("select his_id,count(*) from e_carbon.prod_inst_his_c group
> by his_id having count(*)>1").show
> 17/10/30 14:59:21 ERROR filesystem.AbstractDFSCarbonFile: main Exception
> occurred:File does not exist:
> hdfs://ns1/user/e_carbon/public/carbon.store/e_carbon/prod_inst_his_c/Fact/Part0/Segment_0/624_batchno0-0-1509233731459.carbonindex
> org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute,
> tree:
> Exchange hashpartitioning(his_id#1818, 100)
> +- *HashAggregate(keys=[his_id#1818], functions=[partial_count(1),
> partial_count(1)], output=[his_id#1818, count#1967L, count#1968L])
>+- *BatchedScan CarbonDatasourceHadoopRelation [ Database name
> :e_carbon, Table name :prod_inst_his_c, Schema
> :Some(StructType(StructField(his_id,StringType,true),
> StructField(ETL_date,StringType,true),
> StructField(prod_inst_id,StringType,true),
> StructField(owner_cust_id,StringType,true),
> StructField(acc_prod_inst_id,StringType,true),
> StructField(DVERSION,StringType,true), StructField(GTID,StringType,true),
> StructField(IND,StringType,true), StructField(ODS_STATE,StringType,true),
> StructField(SRC,StringType,true), StructField(kafka_date,StringType,true),
> StructField(product_id,StringType,true),
> StructField(address_id,StringType,true),
> StructField(payment_mode_cd,StringType,true),
> StructField(product_password,StringType,true),
> StructField(important_level,StringType,true),
> StructField(area_code,StringType,true),
> StructField(acc_nbr,StringType,true),
> StructField(exch_id,StringType,true),
> StructField(common_region_id,StringType,true),
> StructField(remark,StringType,true),
> StructField(pay_cycle,StringType,true),
> StructField(begin_rent_time,StringType,true),
> StructField(stop_rent_time,StringType,true),
> StructField(finish_time,StringType,true),
> StructField(stop_status,StringType,true),
> StructField(status_cd,StringType,true),
> StructField(create_date,StringType,true),
> 

Re: Error while creating table in carbondata

2017-11-06 Thread Bhavya Aggarwal
Hi,

I think the problem is that the signatures of the CatalogTable class in open
source Spark and Cloudera Spark do not match; there is an additional parameter
in the Cloudera Spark version, marked in the listing below. We may have to try
building CarbonData against the Cloudera Spark version to make it work.

case class CatalogTable(
    identifier: TableIdentifier,
    tableType: CatalogTableType,
    storage: CatalogStorageFormat,
    schema: StructType,
    provider: Option[String] = None,
    partitionColumnNames: Seq[String] = Seq.empty,
    bucketSpec: Option[BucketSpec] = None,
    owner: String = "",
    createTime: Long = System.currentTimeMillis,
    lastAccessTime: Long = -1,
    properties: Map[String, String] = Map.empty,
    stats: Option[Statistics] = None,
    viewOriginalText: Option[String] = None,
    viewText: Option[String] = None,
    comment: Option[String] = None,
    unsupportedFeatures: Seq[String] = Seq.empty,
    tracksPartitionsInCatalog: Boolean = false,
    schemaPreservesCase: Boolean = true) {  // schemaPreservesCase appears to be the additional Cloudera-only parameter (the one originally highlighted)
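
As an illustrative way to confirm the mismatch (not from the original mail), the jar that CatalogTable is actually loaded from, and its real copy signature, can be printed in the Cloudera spark-shell and compared with the signature in the NoSuchMethodError:

scala> val ct = classOf[org.apache.spark.sql.catalyst.catalog.CatalogTable]
scala> println(ct.getProtectionDomain.getCodeSource.getLocation)  // which jar provides CatalogTable
scala> ct.getDeclaredMethods.filter(_.getName == "copy").foreach(m => println(m.toGenericString))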


Thanks and regards
Bhavya

On Tue, Nov 7, 2017 at 7:17 AM, Lionel CL  wrote:

> mvn -DskipTests -Pspark-2.1 clean package
> The pom file was changed as described in the former email.
>
>
>
> On 2017/11/6 at 7:47 PM, "Bhavya Aggarwal" wrote:
>
> >Hi,
> >
> >Can you please let me know how are you building the Carbondata assembly
> >jar, or which command you are running to build carbondata.
> >
> >Regards
> >Bhavya
> >
> >On Mon, Nov 6, 2017 at 2:18 PM, Lionel CL  wrote:
> >
> >> Yes, there is a catalyst jar under the path
> /opt/cloudera/parcels/SPARK2/
> >> lib/spark2/jars/
> >>
> >> spark-catalyst_2.11-2.1.0.cloudera1.jar
> >>
> >>
> >>
> >>
> >>
> >>
> >>
>> On 2017/11/6 at 4:12 PM, "Bhavya Aggarwal" wrote:
> >>
> >> >Hi,
> >> >
> >> >Can you please check if you have spark-catalyst jar in $SPARK_HOME/jars
> >> >folder for your  cloudera version, if its not there please try to
> include
> >> >it and retry.
> >> >
> >> >Thanks and regards
> >> >Bhavya
> >> >
> >> >On Sun, Nov 5, 2017 at 7:24 PM, Lionel CL 
> wrote:
> >> >
> >> >> I have the same problem in CDH 5.8.0
> >> >> spark2 version is 2.1.0.cloudera1
> >> >> carbondata version 1.2.0.
> >> >>
> >> >> There's no error occurred when using open source version spark.
> >> >>
> >> >> 2.6.0-cdh5.8.0
> >> >> 2.1.0.cloudera1
> >> >> 2.11
> >> >> 2.11.8
> >> >>
> >> >>
> >> >> scala> cc.sql("create table t111(vin string) stored by 'carbondata'")
> >> >> 17/11/03 10:22:03 AUDIT command.CreateTable: [][][Thread-1]Creating
> >> Table
> >> >> with Database name [default] and Table name [t111]
> >> >> java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.
> >> >> catalog.CatalogTable.copy(Lorg/apache/spark/sql/catalyst/
> >> >> TableIdentifier;Lorg/apache/spark/sql/catalyst/catalog/
> >> >> CatalogTableType;Lorg/apache/spark/sql/catalyst/catalog/
> >> >> CatalogStorageFormat;Lorg/apache/spark/sql/types/StructT
> >> >> ype;Lscala/Option;Lscala/collection/Seq;Lscala/Option;
> >> >> Ljava/lang/String;JJLscala/collection/immutable/Map;
> >> >> Lscala/Option;Lscala/Option;Lscala/Option;Lscala/Option;
> >> >> Lscala/collection/Seq;Z)Lorg/apache/spark/sql/catalyst/
> >> >> catalog/CatalogTable;
> >> >>   at org.apache.spark.sql.CarbonSource$.updateCatalogTableWithCar
> >> >> bonSchema(CarbonSource.scala:253)
> >> >>   at org.apache.spark.sql.execution.command.DDLStrategy.apply(
> >> >> DDLStrategy.scala:135)
> >> >>   at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun
> >> >> $1.apply(QueryPlanner.scala:62)
> >> >>   at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun
> >> >> $1.apply(QueryPlanner.scala:62)
> >> >>   at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
> >> >>   at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
> >> >>   at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
> >> >>
> >> >>
> >> On 2017/11/1 at 1:58 AM, "chenliang613" <chenliang6...@gmail.com> wrote:
> >> >>
> >> >> Hi
> >> >>
> >> >> Did you use open source spark version?
> >> >>
> >> >> Can you provide more detail info :
> >> >> 1. which carbondata version and spark version, you used ?
> >> >> 2. Can you share with us , reproduce script and steps.
> >> >>
> >> >> Regards
> >> >> Liang
> >> >>
> >> >>
> >> >> hujianjun wrote
> >> >> scala> carbon.sql("CREATE TABLE IF NOT EXISTS carbon_table(id
> >> string,name
> >> >> string,city string,age Int)STORED BY 'carbondata'")
> >> >> 17/10/23 19:13:52 AUDIT command.CarbonCreateTableCommand:
> >> >> [master][root][Thread-1]Creating Table with Database name
> [clb_carbon]
> >> and
> >> >> Table name [carbon_table]
> >> >> java.lang.NoSuchMethodError:
> >> >> org.apache.spark.sql.catalyst.catalog.CatalogTable.copy(Lorg
> >> >> /apache/spark/sql/catalyst/TableIdentifier;Lorg/apache/
> >> >> spark/sql/catalyst/catalog/CatalogTableType;Lorg/apache/
> 

Re: Error while creating table in carbondata

2017-11-06 Thread Lionel CL
mvn -DskipTests -Pspark-2.1 clean package
The pom file was changed as described in the former email.



On 2017/11/6 at 7:47 PM, "Bhavya Aggarwal" wrote:

>Hi,
>
>Can you please let me know how are you building the Carbondata assembly
>jar, or which command you are running to build carbondata.
>
>Regards
>Bhavya
>
>On Mon, Nov 6, 2017 at 2:18 PM, Lionel CL  wrote:
>
>> Yes, there is a catalyst jar under the path /opt/cloudera/parcels/SPARK2/
>> lib/spark2/jars/
>>
>> spark-catalyst_2.11-2.1.0.cloudera1.jar
>>
>>
>>
>>
>>
>>
>>
>> On 2017/11/6 at 4:12 PM, "Bhavya Aggarwal" wrote:
>>
>> >Hi,
>> >
>> >Can you please check if you have spark-catalyst jar in $SPARK_HOME/jars
>> >folder for your  cloudera version, if its not there please try to include
>> >it and retry.
>> >
>> >Thanks and regards
>> >Bhavya
>> >
>> >On Sun, Nov 5, 2017 at 7:24 PM, Lionel CL  wrote:
>> >
>> >> I have the same problem in CDH 5.8.0
>> >> spark2 version is 2.1.0.cloudera1
>> >> carbondata version 1.2.0.
>> >>
>> >> No error occurred when using the open source version of Spark.
>> >>
>> >> Hadoop: 2.6.0-cdh5.8.0
>> >> Spark: 2.1.0.cloudera1
>> >> Scala (binary): 2.11
>> >> Scala: 2.11.8
>> >>
>> >>
>> >> scala> cc.sql("create table t111(vin string) stored by 'carbondata'")
>> >> 17/11/03 10:22:03 AUDIT command.CreateTable: [][][Thread-1]Creating
>> Table
>> >> with Database name [default] and Table name [t111]
>> >> java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.
>> >> catalog.CatalogTable.copy(Lorg/apache/spark/sql/catalyst/
>> >> TableIdentifier;Lorg/apache/spark/sql/catalyst/catalog/
>> >> CatalogTableType;Lorg/apache/spark/sql/catalyst/catalog/
>> >> CatalogStorageFormat;Lorg/apache/spark/sql/types/StructT
>> >> ype;Lscala/Option;Lscala/collection/Seq;Lscala/Option;
>> >> Ljava/lang/String;JJLscala/collection/immutable/Map;
>> >> Lscala/Option;Lscala/Option;Lscala/Option;Lscala/Option;
>> >> Lscala/collection/Seq;Z)Lorg/apache/spark/sql/catalyst/
>> >> catalog/CatalogTable;
>> >>   at org.apache.spark.sql.CarbonSource$.updateCatalogTableWithCar
>> >> bonSchema(CarbonSource.scala:253)
>> >>   at org.apache.spark.sql.execution.command.DDLStrategy.apply(
>> >> DDLStrategy.scala:135)
>> >>   at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun
>> >> $1.apply(QueryPlanner.scala:62)
>> >>   at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun
>> >> $1.apply(QueryPlanner.scala:62)
>> >>   at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
>> >>   at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
>> >>   at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
>> >>
>> >>
>> >> On 2017/11/1 at 1:58 AM, "chenliang613" <chenliang6...@gmail.com> wrote:
>> >>
>> >> Hi
>> >>
>> >> Did you use open source spark version?
>> >>
>> >> Can you provide more detail info :
>> >> 1. which carbondata version and spark version, you used ?
>> >> 2. Can you share with us , reproduce script and steps.
>> >>
>> >> Regards
>> >> Liang
>> >>
>> >>
>> >> hujianjun wrote
>> >> scala> carbon.sql("CREATE TABLE IF NOT EXISTS carbon_table(id
>> string,name
>> >> string,city string,age Int)STORED BY 'carbondata'")
>> >> 17/10/23 19:13:52 AUDIT command.CarbonCreateTableCommand:
>> >> [master][root][Thread-1]Creating Table with Database name [clb_carbon]
>> and
>> >> Table name [carbon_table]
>> >> java.lang.NoSuchMethodError:
>> >> org.apache.spark.sql.catalyst.catalog.CatalogTable.copy(Lorg
>> >> /apache/spark/sql/catalyst/TableIdentifier;Lorg/apache/
>> >> spark/sql/catalyst/catalog/CatalogTableType;Lorg/apache/
>> >> spark/sql/catalyst/catalog/CatalogStorageFormat;Lorg/
>> >> apache/spark/sql/types/StructType;Lscala/Option;Lscala/
>> >> collection/Seq;Lscala/Option;Ljava/lang/String;JJLscala/
>> >> collection/immutable/Map;Lscala/Option;Lscala/Option;
>> >> Lscala/Option;Lscala/Option;Lscala/collection/Seq;Z)Lorg/
>> >> apache/spark/sql/catalyst/catalog/CatalogTable;
>> >>at
>> >> org.apache.spark.sql.CarbonSource$.updateCatalogTableWithCar
>> >> bonSchema(CarbonSource.scala:253)
>> >>at
>> >> org.apache.spark.sql.execution.strategy.DDLStrategy.apply(
>> >> DDLStrategy.scala:154)
>> >>at
>> >> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun
>> >> $1.apply(QueryPlanner.scala:62)
>> >>at
>> >> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun
>> >> $1.apply(QueryPlanner.scala:62)
>> >>at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
>> >>at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
>> >>at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
>> >>at
>> >> org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(Que
>> >> ryPlanner.scala:92)
>> >>at
>> >> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun
>> >> $2$$anonfun$apply$2.apply(QueryPlanner.scala:77)
>> >>at
>> >> 

Re: [Discussion] Support pre-aggregate table to improve OLAP performance

2017-11-06 Thread Ravindra Pesala
Hi Bill,

Please find my comments.

1. We are not supporting join queries in this design, so there will always be
one parent table for an aggregate table. We may consider join queries for
creating aggregation tables in the future.

2. The aggregation column name will be created internally, and it will be like
agg_parentcolumnname.

3. Yes, if we create an agg table on a dictionary column of the parent table, it
uses the same parent dictionary. The aggregation table does not generate any
dictionary files.

4. time-series.eventtime is the time column of the main table; there should
be at least one timestamp column on the main table to create
timeseries tables. In the design, the granularity is replaced with a hierarchy,
meaning the user can give a time hierarchy such as minute, hour, day, so three
aggregation tables (minute, hour and day) will be created automatically and
data will be loaded into them for every load (see the sketch after point 8).

5. In the new design v1.1 this has now changed; please check it.

6. As I mentioned above, in the new v1.1 design it was changed to a hierarchy, so
the user can define his own time hierarchy.

7. OK, we will discuss and check whether we can expose this SORT_COLUMNS
configuration on the aggregation table. Even if we don't support it now, we can
expose it in the future.

8. Yes, merge index is applicable to aggregation tables as well.
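
To make the proposed usage concrete, below is a minimal sketch that reuses the syntax already quoted in this thread; the table and column names are illustrative and, as noted in the proposal, the final SQL syntax is subject to change.

// Create a pre-aggregate table on the (assumed) main table "sales".
cc.sql("CREATE TABLE agg_sales STORED BY 'carbondata' " +
  "TBLPROPERTIES ('aggregate.parent'='sales') " +
  "AS SELECT user_id, user_name AS name, sum(quantity) AS c2, avg(price) " +
  "FROM sales GROUP BY user_id, user_name").show

// A group-by query on the main table like the one below is the kind of query the
// pre-aggregate table (and the minute/hour/day timeseries rollups from point 4)
// is intended to answer transparently.
cc.sql("SELECT user_id, sum(quantity) FROM sales GROUP BY user_id").show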

Regards,
Ravindra.

On 3 November 2017 at 09:05, bill.zhou  wrote:

> Hi Jacky & Ravindra, I have a few more queries about this design; thank you
> very much for clarifying them.
>
>
> 1. If we support creating aggregation tables from a join of two or more tables,
> how do we set aggregate.parent? Can it be something like
> 'aggregate.parent'='fact1,dim1,dim1'?
> 2. What are the agg table column names? For the following create command, will
> they be user_id, name, c2, price?
> CREATE TABLE agg_sales
> STORED BY 'carbondata'
> TBLPROPERTIES ('aggregate.parent'='sales')
> AS SELECT user_id,user_name as name, sum(quantity) as c2, avg(price) FROM
> sales GROUP BY user_id.
> 3. If we create a dictionary column in the agg table, will the dictionary
> file be the same one as the main table's?
>
> 4. For the rollup/main table creation: what do timeseries.eventtime and
> granularity mean? Which columns can belong to this?
> 5. For the rollup/main table creation: what does
> 'timeseries.aggtype'='quantity:sum, max' mean? Does it mean the column
> quantity only supports sum and max?
>
> 6. "In both the above cases carbon generates the 4 pre-aggregation tables
> automatically for year, month, day and hour (their table name will be
> prefixed with agg_sales)." -- In the above case I only see the column hour;
> how are year, month and day generated?
>
> 7. "In internal implementation, carbon will create these tables with
> SORT_COLUMNS='group by column defined above', so that filter/group-by
> queries on the main table will be faster because they can leverage the
> index in pre-aggregate tables." -- I suggest the user should be able to
> control the sort column order.
> 8. Is merge index supported for the agg table? -- It would be useful.
>
>
> Jacky Li wrote
> > Hi community,
> >
> > In traditional data warehouse, pre-aggregate table or cube is a common
> > technology to improve OLAP query performance. To take carbondata support
> > for OLAP to next level, I’d like to propose pre-aggregate table support
> in
> > carbondata.
> >
> > Please refer to CARBONDATA-1516
> > (https://issues.apache.org/jira/browse/CARBONDATA-1516) and the
> > design document attached to the JIRA ticket.
> >
> > This design is still in initial phase, proposed usage and SQL syntax are
> > subject to change. Please provide your comment to improve this feature.
> > Any suggestion on the design from community is welcomed.
> >
> > Regards,
> > Jacky Li
>
>
>
>
>
> --
> Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.
> n5.nabble.com/
>



-- 
Thanks & Regards,
Ravi


Re: Error while creating table in carbondata

2017-11-06 Thread Bhavya Aggarwal
Hi,

Can you please let me know how are you building the Carbondata assembly
jar, or which command you are running to build carbondata.

Regards
Bhavya

On Mon, Nov 6, 2017 at 2:18 PM, Lionel CL  wrote:

> Yes, there is a catalyst jar under the path /opt/cloudera/parcels/SPARK2/
> lib/spark2/jars/
>
> spark-catalyst_2.11-2.1.0.cloudera1.jar
>
>
>
>
>
>
>
> On 2017/11/6 at 4:12 PM, "Bhavya Aggarwal" wrote:
>
> >Hi,
> >
> >Can you please check if you have spark-catalyst jar in $SPARK_HOME/jars
> >folder for your  cloudera version, if its not there please try to include
> >it and retry.
> >
> >Thanks and regards
> >Bhavya
> >
> >On Sun, Nov 5, 2017 at 7:24 PM, Lionel CL  wrote:
> >
> >> I have the same problem in CDH 5.8.0
> >> spark2 version is 2.1.0.cloudera1
> >> carbondata version 1.2.0.
> >>
> >> No error occurred when using the open source version of Spark.
> >>
> >> Hadoop: 2.6.0-cdh5.8.0
> >> Spark: 2.1.0.cloudera1
> >> Scala (binary): 2.11
> >> Scala: 2.11.8
> >>
> >>
> >> scala> cc.sql("create table t111(vin string) stored by 'carbondata'")
> >> 17/11/03 10:22:03 AUDIT command.CreateTable: [][][Thread-1]Creating
> Table
> >> with Database name [default] and Table name [t111]
> >> java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.
> >> catalog.CatalogTable.copy(Lorg/apache/spark/sql/catalyst/
> >> TableIdentifier;Lorg/apache/spark/sql/catalyst/catalog/
> >> CatalogTableType;Lorg/apache/spark/sql/catalyst/catalog/
> >> CatalogStorageFormat;Lorg/apache/spark/sql/types/StructT
> >> ype;Lscala/Option;Lscala/collection/Seq;Lscala/Option;
> >> Ljava/lang/String;JJLscala/collection/immutable/Map;
> >> Lscala/Option;Lscala/Option;Lscala/Option;Lscala/Option;
> >> Lscala/collection/Seq;Z)Lorg/apache/spark/sql/catalyst/
> >> catalog/CatalogTable;
> >>   at org.apache.spark.sql.CarbonSource$.updateCatalogTableWithCar
> >> bonSchema(CarbonSource.scala:253)
> >>   at org.apache.spark.sql.execution.command.DDLStrategy.apply(
> >> DDLStrategy.scala:135)
> >>   at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun
> >> $1.apply(QueryPlanner.scala:62)
> >>   at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun
> >> $1.apply(QueryPlanner.scala:62)
> >>   at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
> >>   at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
> >>   at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
> >>
> >>
> >> On 2017/11/1 at 1:58 AM, "chenliang613" <chenliang6...@gmail.com> wrote:
> >>
> >> Hi
> >>
> >> Did you use open source spark version?
> >>
> >> Can you provide more detail info :
> >> 1. which carbondata version and spark version, you used ?
> >> 2. Can you share with us , reproduce script and steps.
> >>
> >> Regards
> >> Liang
> >>
> >>
> >> hujianjun wrote
> >> scala> carbon.sql("CREATE TABLE IF NOT EXISTS carbon_table(id
> string,name
> >> string,city string,age Int)STORED BY 'carbondata'")
> >> 17/10/23 19:13:52 AUDIT command.CarbonCreateTableCommand:
> >> [master][root][Thread-1]Creating Table with Database name [clb_carbon]
> and
> >> Table name [carbon_table]
> >> java.lang.NoSuchMethodError:
> >> org.apache.spark.sql.catalyst.catalog.CatalogTable.copy(Lorg
> >> /apache/spark/sql/catalyst/TableIdentifier;Lorg/apache/
> >> spark/sql/catalyst/catalog/CatalogTableType;Lorg/apache/
> >> spark/sql/catalyst/catalog/CatalogStorageFormat;Lorg/
> >> apache/spark/sql/types/StructType;Lscala/Option;Lscala/
> >> collection/Seq;Lscala/Option;Ljava/lang/String;JJLscala/
> >> collection/immutable/Map;Lscala/Option;Lscala/Option;
> >> Lscala/Option;Lscala/Option;Lscala/collection/Seq;Z)Lorg/
> >> apache/spark/sql/catalyst/catalog/CatalogTable;
> >>at
> >> org.apache.spark.sql.CarbonSource$.updateCatalogTableWithCar
> >> bonSchema(CarbonSource.scala:253)
> >>at
> >> org.apache.spark.sql.execution.strategy.DDLStrategy.apply(
> >> DDLStrategy.scala:154)
> >>at
> >> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun
> >> $1.apply(QueryPlanner.scala:62)
> >>at
> >> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun
> >> $1.apply(QueryPlanner.scala:62)
> >>at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
> >>at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
> >>at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
> >>at
> >> org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(Que
> >> ryPlanner.scala:92)
> >>at
> >> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun
> >> $2$$anonfun$apply$2.apply(QueryPlanner.scala:77)
> >>at
> >> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun
> >> $2$$anonfun$apply$2.apply(QueryPlanner.scala:74)
> >>at
> >> scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(T
> >> raversableOnce.scala:157)
> >>at
> >> scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(T
> >> raversableOnce.scala:157)
> >>at 

Re: Error while creating table in carbondata

2017-11-06 Thread Lionel CL
Yes, there is a catalyst jar under the path 
/opt/cloudera/parcels/SPARK2/lib/spark2/jars/

spark-catalyst_2.11-2.1.0.cloudera1.jar







On 2017/11/6 at 4:12 PM, "Bhavya Aggarwal" wrote:

>Hi,
>
>Can you please check if you have spark-catalyst jar in $SPARK_HOME/jars
>folder for your  cloudera version, if its not there please try to include
>it and retry.
>
>Thanks and regards
>Bhavya
>
>On Sun, Nov 5, 2017 at 7:24 PM, Lionel CL  wrote:
>
>> I have the same problem in CDH 5.8.0
>> spark2 version is 2.1.0.cloudera1
>> carbondata version 1.2.0.
>>
>> No error occurred when using the open source version of Spark.
>>
>> Hadoop: 2.6.0-cdh5.8.0
>> Spark: 2.1.0.cloudera1
>> Scala (binary): 2.11
>> Scala: 2.11.8
>>
>>
>> scala> cc.sql("create table t111(vin string) stored by 'carbondata'")
>> 17/11/03 10:22:03 AUDIT command.CreateTable: [][][Thread-1]Creating Table
>> with Database name [default] and Table name [t111]
>> java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.
>> catalog.CatalogTable.copy(Lorg/apache/spark/sql/catalyst/
>> TableIdentifier;Lorg/apache/spark/sql/catalyst/catalog/
>> CatalogTableType;Lorg/apache/spark/sql/catalyst/catalog/
>> CatalogStorageFormat;Lorg/apache/spark/sql/types/StructT
>> ype;Lscala/Option;Lscala/collection/Seq;Lscala/Option;
>> Ljava/lang/String;JJLscala/collection/immutable/Map;
>> Lscala/Option;Lscala/Option;Lscala/Option;Lscala/Option;
>> Lscala/collection/Seq;Z)Lorg/apache/spark/sql/catalyst/
>> catalog/CatalogTable;
>>   at org.apache.spark.sql.CarbonSource$.updateCatalogTableWithCar
>> bonSchema(CarbonSource.scala:253)
>>   at org.apache.spark.sql.execution.command.DDLStrategy.apply(
>> DDLStrategy.scala:135)
>>   at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun
>> $1.apply(QueryPlanner.scala:62)
>>   at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun
>> $1.apply(QueryPlanner.scala:62)
>>   at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
>>   at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
>>   at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
>>
>>
>> On 2017/11/1 at 1:58 AM, "chenliang613" <chenliang6...@gmail.com> wrote:
>>
>> Hi
>>
>> Did you use open source spark version?
>>
>> Can you provide more detail info :
>> 1. which carbondata version and spark version, you used ?
>> 2. Can you share with us , reproduce script and steps.
>>
>> Regards
>> Liang
>>
>>
>> hujianjun wrote
>> scala> carbon.sql("CREATE TABLE IF NOT EXISTS carbon_table(id string,name
>> string,city string,age Int)STORED BY 'carbondata'")
>> 17/10/23 19:13:52 AUDIT command.CarbonCreateTableCommand:
>> [master][root][Thread-1]Creating Table with Database name [clb_carbon] and
>> Table name [carbon_table]
>> java.lang.NoSuchMethodError:
>> org.apache.spark.sql.catalyst.catalog.CatalogTable.copy(Lorg
>> /apache/spark/sql/catalyst/TableIdentifier;Lorg/apache/
>> spark/sql/catalyst/catalog/CatalogTableType;Lorg/apache/
>> spark/sql/catalyst/catalog/CatalogStorageFormat;Lorg/
>> apache/spark/sql/types/StructType;Lscala/Option;Lscala/
>> collection/Seq;Lscala/Option;Ljava/lang/String;JJLscala/
>> collection/immutable/Map;Lscala/Option;Lscala/Option;
>> Lscala/Option;Lscala/Option;Lscala/collection/Seq;Z)Lorg/
>> apache/spark/sql/catalyst/catalog/CatalogTable;
>>at
>> org.apache.spark.sql.CarbonSource$.updateCatalogTableWithCar
>> bonSchema(CarbonSource.scala:253)
>>at
>> org.apache.spark.sql.execution.strategy.DDLStrategy.apply(
>> DDLStrategy.scala:154)
>>at
>> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun
>> $1.apply(QueryPlanner.scala:62)
>>at
>> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun
>> $1.apply(QueryPlanner.scala:62)
>>at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
>>at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
>>at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
>>at
>> org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(Que
>> ryPlanner.scala:92)
>>at
>> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun
>> $2$$anonfun$apply$2.apply(QueryPlanner.scala:77)
>>at
>> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun
>> $2$$anonfun$apply$2.apply(QueryPlanner.scala:74)
>>at
>> scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(T
>> raversableOnce.scala:157)
>>at
>> scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(T
>> raversableOnce.scala:157)
>>at scala.collection.Iterator$class.foreach(Iterator.scala:893)
>>at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
>>at
>> scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157)
>>at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1336)
>>at
>> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun
>> $2.apply(QueryPlanner.scala:74)
>>at
>> 

Re: Re: Delegation Token can be issued only with kerberos or web authentication" will occur in yarn cluster

2017-11-06 Thread yixu2001
dev 
 I have copied core-site.xml to the spark2 conf folder, but it does not work.
I will share the jar I am using (please get the jar file from the link below).

download link: https://pan.baidu.com/s/1b6AG5S  password: rww2
 Would you help me confirm whether the jar is suitable?
If the jar is not suitable, could you please send me a suitable one to try? 


yixu2001
 
From: Naresh P R
Date: 2017-11-04 01:26
To: dev
Subject: Re: Re: Delegation Token can be issued only with kerberos or web 
authentication" will occur in yarn cluster
Hi yixu2001,
 
From hadoop code, i could see IOException("Delegation Token can be issued
only with kerberos or web authentication") is thrown only if authentication
method is set as "SIMPLE".
 
private boolean isAllowedDelegationTokenOp() throws IOException {
  // Delegation token operations are allowed only when security is disabled or the
  // connection was authenticated via Kerberos / Kerberos-SSL / certificate.
  AuthenticationMethod authMethod = this.getConnectionAuthenticationMethod();
  return !UserGroupInformation.isSecurityEnabled()
      || authMethod == AuthenticationMethod.KERBEROS
      || authMethod == AuthenticationMethod.KERBEROS_SSL
      || authMethod == AuthenticationMethod.CERTIFICATE;
}

Token<DelegationTokenIdentifier> getDelegationToken(Text renewer)
    throws IOException {
  // ...
  if (!this.isAllowedDelegationTokenOp()) {
    throw new IOException("Delegation Token can be issued only"
        + " with kerberos or web authentication");
  }
  // ...
}
 
Can you try to execute the queries after copying core-site.xml from the hadoop
conf folder to the spark2 conf folder and to the classpath of spark-submit?
 
From the provided logs, i could
see carbondata_2.11-1.1.1-bdd-hadoop2.7.2.jar size is 9607344bytes, please
make sure it has only carbondata classes.
 
I could see Carbon explicitly calling
"TokenCache.obtainTokensForNamenodes" which
is throwing this exception.
 
If the above mentioned steps didn't work, you can raise a JIRA to investigate
this further.

Regards,
Naresh P R
 
On Fri, Nov 3, 2017 at 3:10 PM, yixu2001  wrote:
 
> prnaresh.naresh, dev
>
>  The carbon jar I used does not include hadoop classes & core-site.xml.
> The attachment includes the jar list used while submitting the
> spark job; please confirm it.
> --
> yixu2001
>
>
> *From:* Naresh P R 
> *Date:* 2017-11-03 16:07
> *To:* yixu2001 
> *Subject:* Re: Re: Delegation Token can be issued only with kerberos or
> web authentication" will occur in yarn cluster
> Hi yixu2001,
>
> Are you using carbon shaded jar with hadoop classes & core-site.xml
> included in carbon jar ?
>
> If so, can you try to use carbondata individual component jars while
> submtting spark job?
>
> As per my understanding, this happens if client core-site.xml has
> hadoop.security.authentication=simple & hdfs is kerberized.
>
> You can also enable verbose to see hadoop jars used in the error trace
> while querying carbon tables.
>
> Also i am not sure whether CarbonData is tested in HDP kerberos cluster.
> ---
> Regards,
> Naresh P R
>
>
> On Fri, Nov 3, 2017 at 8:36 AM, yixu2001  wrote:
>
>> Naresh P R:
>>  Since attachments cannot be uploaded to the mailing list, I have
>> added the attachments to this mail for you; please check them.
>>  Our platform is installed with HDP 2.4, but Spark 2.1 is
>> not included in HDP 2.4, so we additionally installed the
>> Apache version of Spark 2.1.
>> --
>> yixu2001
>>
>>
>> *From:* Naresh P R 
>> *Date:* 2017-11-02 22:02
>> *To:* dev 
>> *Subject:* Re: Re: Delegation Token can be issued only with kerberos or
>> web authentication" will occur in yarn cluster
>> Hi yixu,
>>
>> I am not able to see any attachment in your previous mail.
>> ---
>> Regards,
>> Naresh P R
>>
>> On Thu, Nov 2, 2017 at 4:40 PM, yixu2001  wrote:
>>
>>> dev
>>>  Please refer to the attachment "cluster carbon error2.txt"
>>> for the log trace.
>>> In this log, I try 2 query statements:
>>> select * from e_carbon.prod_inst_his (prod_inst_his is
>>> a hive table); it succeeds.
>>> select * from e_carbon.prod_inst_his_c (prod_inst_his_c
>>> is a carbon table); it fails.
>>>
>>> I pass the principal in my start script; please refer to the
>>> attachment "testCluster.sh".
>>>
>>> I have set hive.server2.enable.doAs = false in the above test,
>>> and I have printed it in the log.
>>> --
>>> yixu2001
>>>
>>>
>>> *From:* Naresh P R 
>>> *Date:* 2017-11-01 19:40
>>> *To:* dev 
>>> *Subject:* Re: Delegation Token can be issued only with kerberos or web
>>> authentication" will occur in yarn cluster
>>> Hi,
>>>
>>> Ideally kerberos authentication should work with carbon table, Can you
>>> share us log trace to analyze further more?
>>>
>>> how are you passing the principal in yarn cluster ?
>>>
>>> can you try to set hive.server2.enable.doAs = false & run query on carbon
>>> table ?
>>> 
>>> Regards,
>>> Naresh P R
>>>
>>> On Wed, Nov 1, 2017 at 3:33 PM,