Re: Block B-tree loading failed: add debug information
dev: When is a patch expected to be released? We can test it.

From: yixu2001
Date: 2017-10-30 18:11
To: dev
Subject: Block B-tree loading failed: add debug information

dev

Environment: Spark 2.1.1, CarbonData 1.1.1, Hadoop 2.7.2

Added debug information for the "Block B-tree loading failed" error. Why does the CarbonUtil.calculateMetaSize calculation produce getBlockLength=0 and getBlockOffset=8301549?

Caused by: org.apache.carbondata.core.datastore.exception.IndexBuilderException: Invalid carbon data file: hdfs://ns1/user/e_carbon/public/carbon.store/e_carbon/prod_inst_his1023c/Fact/Part0/Segment_1.1/part-0-172_batchno0-0-1508833127408.carbondata :getBlockLength=0 getBlockOffset=8301549 requiredMetaSize=-8301549 isV1=false getVersion=ColumnarFormatV3

1. Debug information:

scala> cc.sql("select prod_inst_id,count(*) from e_carbon.prod_inst_his1023c group by prod_inst_id having count(*)>1").show
[Stage 0:=>(157 + 50) / 283]17/10/30 10:39:24 WARN scheduler.TaskSetManager: Lost task 252.0 in stage 0.0 (TID 201, HDD010, executor 22): org.apache.carbondata.core.datastore.exception.IndexBuilderException: Block B-tree loading failed
    at org.apache.carbondata.core.datastore.BlockIndexStore.fillLoadedBlocks(BlockIndexStore.java:264)
    at org.apache.carbondata.core.datastore.BlockIndexStore.getAll(BlockIndexStore.java:189)
    at org.apache.carbondata.core.scan.executor.impl.AbstractQueryExecutor.initQuery(AbstractQueryExecutor.java:131)
    at org.apache.carbondata.core.scan.executor.impl.AbstractQueryExecutor.getBlockExecutionInfos(AbstractQueryExecutor.java:186)
    at org.apache.carbondata.core.scan.executor.impl.VectorDetailQueryExecutor.execute(VectorDetailQueryExecutor.java:36)
    at org.apache.carbondata.spark.vectorreader.VectorizedCarbonRecordReader.initialize(VectorizedCarbonRecordReader.java:112)
    at org.apache.carbondata.spark.rdd.CarbonScanRDD.compute(CarbonScanRDD.scala:204)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
    at org.apache.spark.scheduler.Task.run(Task.scala:99)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.ExecutionException: org.apache.carbondata.core.datastore.exception.IndexBuilderException: Invalid carbon data file: hdfs://ns1/user/e_carbon/public/carbon.store/e_carbon/prod_inst_his1023c/Fact/Part0/Segment_1.1/part-0-172_batchno0-0-1508833127408.carbondata getBlockLength=0 getBlockOffset=8301549 requiredMetaSize=-8301549 getVersion=ColumnarFormatV3
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    at org.apache.carbondata.core.datastore.BlockIndexStore.fillLoadedBlocks(BlockIndexStore.java:254)
    ... 21 more
Caused by: org.apache.carbondata.core.datastore.exception.IndexBuilderException: Invalid carbon data file: hdfs://ns1/user/e_carbon/public/carbon.store/e_carbon/prod_inst_his1023c/Fact/Part0/Segment_1.1/part-0-172_batchno0-0-1508833127408.carbondata=lianch:getBlockLength=0 getBlockOffset=8301549 requiredMetaSize=-8301549 isV1=false getVersion=ColumnarFormatV3
    at org.apache.carbondata.core.datastore.AbstractBlockIndexStoreCache.checkAndLoadTableBlocks(AbstractBlockIndexStoreCache.java:116)
    at org.apache.carbondata.core.datastore.BlockIndexStore.loadBlock(BlockIndexStore.java:304)
    at org.apache.carbondata.core.datastore.BlockIndexStore.get(BlockIndexStore.java:109)
    at org.apache.carbondata.core.datastore.BlockIndexStore$BlockLoaderThread.call(BlockIndexStore.java:294)
    at org.apache.carbondata.core.datastore.BlockIndexStore$BlockLoaderThread.call(BlockIndexStore.java:284)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    ... 3 more
[Stage 0:==> (223 + 50) / 283]17/10/30 10:39:26 ERROR scheduler.TaskSetManager: Task 252 in stage 0.0 failed 10 times; aborting job
17/10/30 10:39:26 WARN scheduler.TaskSetManager: Lost task 61.0 in stage 0.0 (TID 184, HDD012, executor 7): TaskKilled (killed intentionally)
17/10/30 10:39:26 WARN scheduler.TaskSetManager: Lost task 71.0 in stage 0.0 (TID 212, HDD008, executor 18): TaskKilled (killed
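The negative requiredMetaSize in the error can be reproduced with simple arithmetic. Below is a minimal sketch, assuming the reader derives the footer metadata size as blockLength minus blockOffset; the method names are illustrative, not CarbonData's actual code. With the reported values (getBlockLength=0, getBlockOffset=8301549), the result is negative, which is why the file is rejected as invalid: a zero block length with a non-zero offset indicates a truncated file or a stale index entry.

```java
// Sketch of the metadata-size validation that appears to fail here.
// Assumption: required metadata size = blockLength - blockOffset, so a
// blockLength of 0 with a non-zero offset yields a negative size.
public class MetaSizeCheck {
    static long requiredMetaSize(long blockLength, long blockOffset) {
        return blockLength - blockOffset;
    }

    static boolean isValid(long blockLength, long blockOffset) {
        // A zero or negative size means the footer cannot be read.
        return requiredMetaSize(blockLength, blockOffset) > 0;
    }

    public static void main(String[] args) {
        long blockLength = 0L;        // value reported in the error
        long blockOffset = 8301549L;  // value reported in the error
        System.out.println(requiredMetaSize(blockLength, blockOffset)); // prints -8301549
        System.out.println(isValid(blockLength, blockOffset));          // prints false
    }
}
```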
Re: Re: Update statement failed with "Multiple input rows matched for same row" in version 1.2.0,
dev: When is a patch expected to be released? We can test it.

yixu2001

From: yixu2001
Date: 2017-11-07 15:47
To: dev
Subject: Re: Re: Update statement failed with "Multiple input rows matched for same row" in version 1.2.0,

dev

Hello, there is no solution yet. Can the next version, CarbonData 1.2.1, be released?

yixu2001

From: yixu2001
Date: 2017-10-27 16:55
To: dev
CC: sounak; chenliang6136
Subject: Re: Re: Update statement failed with "Multiple input rows matched for same row" in version 1.2.0,

dev

Thank you. We hope it will be resolved in a new release, CarbonData 1.2.1. In addition, the problem raised on the mailing list, "Is query slowing down due to the fragmentation produced by the many update operations?", is also a very important issue in our selection process. We would be grateful if it could also be resolved in CarbonData 1.2.1.

yixu2001

From: yixu2001
Date: 2017-10-21 11:12
To: dev
Subject: Re: Re: Update statement failed with "Multiple input rows matched for same row" in version 1.2.0,

dev

Test data (CSV): link: http://pan.baidu.com/s/1boVAqeF password: qtdh

yixu2001

From: Raghunandan S
Date: 2017-10-20 16:56
To: dev
Subject: Re: Re: Update statement failed with "Multiple input rows matched for same row" in version 1.2.0,

Should be fine. Do we need to create any login to access it?

On Fri, 20 Oct 2017 at 1:31 PM, yixu2001 wrote:
> dev
> Can I share it via Baidu SkyDrive?
>
> yixu2001
>
> From: sounak
> Date: 2017-10-20 16:14
> To: dev
> Subject: Re: Re: Update statement failed with "Multiple input rows matched
> for same row" in version 1.2.0,
> As the tupleIds are also unique, the only option left is to get hold of
> your data and reproduce it in-house. Is it possible to generate your data
> in-house or get it?
> On Fri, Oct 20, 2017 at 12:14 PM, yixu2001 wrote:
>
> > dev
> >
> > scala> cc.sql(" select c.tupleId,count(*) from (select a.remark,a.id,
> > getTupleId() as tupleId from c_indextest1 a,c_indextest2 b where a.id=b.id)
> > c group by c.tupleId having count(*)>1").show;
> > +-------+--------+
> > |tupleId|count(1)|
> > +-------+--------+
> > +-------+--------+
> >
> > No multiple input rows.
> >
> > yixu2001
> >
> > From: sounak
> > Date: 2017-10-20 10:57
> > To: dev
> > Subject: Re: Re: Update statement failed with "Multiple input rows matched
> > for same row" in version 1.2.0,
> > Slight rectification of a typo in the query syntax:
> >
> > " select a.remark,a.id, getTupleId() as *tupleId* from c_indextest1 a,
> > c_indextest2 b where a.id=b.id;"
> >
> > On Fri, Oct 20, 2017 at 7:54 AM, sounak wrote:
> >
> > > Internally, we call a UDF to generate the tupleId, and based on that
> > > tupleId we decide whether the row is duplicated or not. You can run a
> > > slightly tweaked query:
> > >
> > > " select a.remark,a.id, getTupleId() as TupleId from c_indextest1 a,
> > > c_indextest2 b where a.id=b.id; "
> > >
> > > For "Multiple input rows" to be reported, getTupleId() most probably
> > > returns duplicate entries.
> > >
> > > Meanwhile, your query runs fine with the data I have generated
> > > in-house, so this is data dependent. It would be good if we could get
> > > *origin_data* or the script that built the data for origin_data.
> > >
> > > On Fri, Oct 20, 2017 at 7:28 AM, yixu2001 wrote:
> > >
> > >> dev
> > >> one row record
> > >>
> > >> yixu2001
> > >>
> > >> From: Liang Chen
> > >> Date: 2017-10-19 22:26
> > >> To: dev
> > >> Subject: Re: Re: Update statement failed with "Multiple input rows
> > >> matched for same row" in version 1.2.0,
> > >> Hi
> > >>
> > >> Does executing the query below return one row or multiple rows?
> > >> -
> > >> select a.remark from c_indextest1 a where a.id=b.id
> > >>
> > >> Regards
> > >> Liang
> > >>
> > >> yixu2001 wrote
> > >> > dev
> > >> > You can follow the steps below to reproduce the problem.
> > >> > Table c_indextest2 has about 17 million (1700w) records and table
> > >> > c_indextest1 has about 300,000 (30w) records.
> > >> >
> > >> > step 1:
> > >> > cc.sql("CREATE TABLE IF NOT EXISTS c_indextest2 (id string ,remark string)
> > >> > STORED BY 'carbondata'").show;
> > >> >
> > >> > step 2: origin_data is an existing table with 1700w records; the fields in
> > >> > the table do not matter.
> > >> > cc.sql("insert into c_indextest2 select row_number() over(partition by
> > >> > a.PKid order by a.pkid) id,a.remark from (SELECT '1' PKID,'dfsdd' remark
> > >> > from origin_data limit 1700) a").show;
> > >> >
> > >> > step 3:
> > >> > cc.sql("CREATE TABLE IF NOT EXISTS c_indextest1 (id string ,remark string)
> > >> > STORED BY 'carbondata'").show;
> > >> >
> > >> > step 4:
> > >> > cc.sql("insert into c_indextest1 select * from c_indextest2 where
> > >> > pmod(cast(ID as int),50)=43").show;
> > >> >
> > >> > step 5:
> > >> > cc.sql("update c_indextest2 b set (b.remark)=(select a.remark from
> > >> > c_indextest1 a where a.id=b.id)").show;
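The diagnostic query in this thread is just a group-by on the generated tupleId. The same duplicate check can be sketched outside SQL; this is an illustrative sketch of the idea, not CarbonData's implementation, and the tupleId strings are made up:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch of the check behind "Multiple input rows matched for same row":
// if any tupleId produced by the update's join appears more than once,
// the update is ambiguous and must be rejected.
public class TupleIdCheck {
    static List<String> findDuplicates(List<String> tupleIds) {
        Map<String, Integer> counts = new HashMap<>();
        for (String id : tupleIds) {
            counts.merge(id, 1, Integer::sum); // count occurrences of each tupleId
        }
        return counts.entrySet().stream()
                .filter(e -> e.getValue() > 1)
                .map(Map.Entry::getKey)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Unique tupleIds: the update is well-defined (the empty result above).
        System.out.println(findDuplicates(List.of("0/0/0", "0/0/1", "0/0/2"))); // prints []
        // A repeated tupleId is what triggers the error.
        System.out.println(findDuplicates(List.of("0/0/0", "0/0/1", "0/0/0"))); // prints [0/0/0]
    }
}
```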
Re: Re: After MAJOR index lost
dev: When is a patch expected to be released? We can test it.

yixu2001

From: yixu2001
Date: 2017-11-07 15:14
To: dev
Subject: Re: Re: After MAJOR index lost

dev

Running multiple updates generated ten thousand delta files:

cc.sql("update e_carbon.prod_inst_his_c A set (a.ETL_date,a.prod_inst_id,a.OWNER_CUST_ID,a.ACC_PROD_INST_ID,a.DVERSION,a.GTID,a.IND,a.ODS_STATE,A.SRC,a.kafka_date,a.PRODUCT_ID,a.ADDRESS_ID,a.PAYMENT_MODE_CD,a.PRODUCT_PASSWORD,a.IMPORTANT_LEVEL,a.AREA_CODE,a.ACC_NBR,a.EXCH_ID,a.COMMON_REGION_ID,a.REMARK,a.PAY_CYCLE,a.BEGIN_RENT_TIME,a.STOP_RENT_TIME,a.FINISH_TIME,a.STOP_STATUS,a.STATUS_CD,a.CREATE_DATE,a.STATUS_DATE,a.UPDATE_DATE,a.PROC_SERIAL,a.USE_CUST_ID,a.EXT_PROD_INST_ID,a.ADDRESS_DESC,a.AREA_ID,a.UPDATE_STAFF,a.CREATE_STAFF,a.REC_UPDATE_DATE,a.ACCOUNT,a.VERSION,a.COMMUNITY_ID,a.EXT_ACC_PROD_INST_ID,a.DISTRIBUTOR_ID,a.SHARDING_ID)=(select b.etl_date,b.prod_inst_id,b.OWNER_CUST_ID,b.ACC_PROD_INST_ID,B.DVERSION,b.GTID,b.IND,B.ODS_STATE,B.SRC,b.kafka_date,b.PRODUCT_ID,b.ADDRESS_ID,b.PAYMENT_MODE_CD,b.PRODUCT_PASSWORD,b.IMPORTANT_LEVEL,b.AREA_CODE,b.ACC_NBR,b.EXCH_ID,b.COMMON_REGION_ID,b.REMARK,b.PAY_CYCLE,b.BEGIN_RENT_TIME,b.STOP_RENT_TIME,b.FINISH_TIME,b.STOP_STATUS,b.STATUS_CD,b.CREATE_DATE,b.STATUS_DATE,b.UPDATE_DATE,b.PROC_SERIAL,b.USE_CUST_ID,b.EXT_PROD_INST_ID,b.ADDRESS_DESC,b.AREA_ID,b.UPDATE_STAFF,b.CREATE_STAFF,b.REC_UPDATE_DATE,b.ACCOUNT,b.VERSION,b.COMMUNITY_ID,b.EXT_ACC_PROD_INST_ID,b.DISTRIBUTOR_ID,b.SHARDING_ID from cache_prod_inst_his_u b where a.his_id=b.his_id)").show;

yixu2001

From: Liang Chen
Date: 2017-11-02 02:29
To: dev
Subject: Re: After MAJOR index lost

Hi

Yes, I checked the log message; it looks like there are some issues. Can you share the reproduction steps: how many machines did you use for data loading, and how many times did you load?

Regards
Liang

yixu2001 wrote
> dev
> Environment: Spark 2.1.1, CarbonData 1.1.1, Hadoop 2.7.2
>
> run ALTER table e_carbon.prod_inst_all_c COMPACT 'MAJOR'
> CLEAN FILES FOR TABLE e_carbon.prod_inst_all_c
>
> 17/10/30 14:59:21 ERROR filesystem.AbstractDFSCarbonFile: main Exception
> occurred:File does not exist:
> hdfs://ns1/user/e_carbon/public/carbon.store/e_carbon/prod_inst_his_c/Fact/Part0/Segment_0/624_batchno0-0-1509233731459.carbonindex
>
> 624_batchno0-0-1509233731459.carbonindex is missing between
> 623_batchno0-0-1509233118616.carbonindex and
> 625_batchno0-0-1509233731459.carbonindex:
>
> -rw-r--r-- 3 e_carbon e_carbon_group 6750 2017-10-29 07:17 /user/e_carbon/public/carbon.store/e_carbon/prod_inst_his_c/Fact/Part0/Segment_0/621_batchno0-0-1509231670521.carbonindex
> -rw-r--r-- 3 e_carbon e_carbon_group 11320 2017-10-29 07:19 /user/e_carbon/public/carbon.store/e_carbon/prod_inst_his_c/Fact/Part0/Segment_0/622_batchno0-0-1509232641994.carbonindex
> -rw-r--r-- 3 e_carbon e_carbon_group 6858 2017-10-29 07:35 /user/e_carbon/public/carbon.store/e_carbon/prod_inst_his_c/Fact/Part0/Segment_0/623_batchno0-0-1509233118616.carbonindex
> -rw-r--r-- 3 e_carbon e_carbon_group 11423 2017-10-29 07:37 /user/e_carbon/public/carbon.store/e_carbon/prod_inst_his_c/Fact/Part0/Segment_0/625_batchno0-0-1509233731459.carbonindex
>
> scala> cc.sql("select his_id,count(*) from e_carbon.prod_inst_his_c group by his_id having count(*)>1").show
> 17/10/30 14:59:21 ERROR filesystem.AbstractDFSCarbonFile: main Exception
> occurred:File does not exist:
> hdfs://ns1/user/e_carbon/public/carbon.store/e_carbon/prod_inst_his_c/Fact/Part0/Segment_0/624_batchno0-0-1509233731459.carbonindex
> org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
> Exchange hashpartitioning(his_id#1818, 100)
> +- *HashAggregate(keys=[his_id#1818], functions=[partial_count(1),
> partial_count(1)], output=[his_id#1818, count#1967L, count#1968L])
>    +- *BatchedScan
CarbonDatasourceHadoopRelation [ Database name > :e_carbon, Table name :prod_inst_his_c, Schema > :Some(StructType(StructField(his_id,StringType,true), > StructField(ETL_date,StringType,true), > StructField(prod_inst_id,StringType,true), > StructField(owner_cust_id,StringType,true), > StructField(acc_prod_inst_id,StringType,true), > StructField(DVERSION,StringType,true), StructField(GTID,StringType,true), > StructField(IND,StringType,true), StructField(ODS_STATE,StringType,true), > StructField(SRC,StringType,true), StructField(kafka_date,StringType,true), > StructField(product_id,StringType,true), > StructField(address_id,StringType,true), > StructField(payment_mode_cd,StringType,true), > StructField(product_password,StringType,true), > StructField(important_level,StringType,true), > StructField(area_code,StringType,true), > StructField(acc_nbr,StringType,true), > StructField(exch_id,StringType,true), > StructField(common_region_id,StringType,true), > StructField(remark,StringType,true), > StructField(pay_cycle,StringType,true), > StructField(begin_rent_time,StringType,true), > StructField(stop_rent_time,StringType,true), >
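The HDFS listing in this thread shows index files 621, 622, 623 and 625 but no 624. A simple gap check over the numeric prefixes of the `.carbonindex` file names finds such holes; this is a hypothetical diagnostic sketch, not part of CarbonData:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical diagnostic: given the numeric prefixes of the .carbonindex
// files present in a segment, report any missing numbers in the sequence.
public class IndexGapCheck {
    static List<Integer> findGaps(int[] present) {
        int[] sorted = present.clone();
        Arrays.sort(sorted);
        List<Integer> gaps = new ArrayList<>();
        // Walk the full range and record every number that is not present.
        for (int n = sorted[0]; n <= sorted[sorted.length - 1]; n++) {
            if (Arrays.binarySearch(sorted, n) < 0) {
                gaps.add(n);
            }
        }
        return gaps;
    }

    public static void main(String[] args) {
        // Prefixes taken from the HDFS listing in the report above.
        System.out.println(findGaps(new int[]{621, 622, 623, 625})); // prints [624]
    }
}
```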
Re: Re: Delegation Token can be issued only with kerberos or web authentication" will occur in yarn cluster
dev

Hello, is there any recent progress?

yixu2001

From: yixu2001
Date: 2017-11-06 16:43
To: dev
CC: prnaresh.naresh; 郭海涛
Subject: Re: Re: Delegation Token can be issued only with kerberos or web authentication" will occur in yarn cluster

dev

I have copied core-site.xml to the spark2 conf folder; it does not work. I will share the jar I am using (download link: https://pan.baidu.com/s/1b6AG5S password: rww2). Would you help me confirm whether the jar is suitable? If it is not, could you please send me a suitable one to try?

yixu2001

From: Naresh P R
Date: 2017-11-04 01:26
To: dev
Subject: Re: Re: Delegation Token can be issued only with kerberos or web authentication" will occur in yarn cluster

Hi yixu2001,

From the Hadoop code, I can see that IOException("Delegation Token can be issued only with kerberos or web authentication") is thrown only if the authentication method is set to "SIMPLE":

private boolean isAllowedDelegationTokenOp() throws IOException {
    AuthenticationMethod authMethod = this.getConnectionAuthenticationMethod();
    return !UserGroupInformation.isSecurityEnabled()
        || authMethod == AuthenticationMethod.KERBEROS
        || authMethod == AuthenticationMethod.KERBEROS_SSL
        || authMethod == AuthenticationMethod.CERTIFICATE;
}

Token getDelegationToken(Text renewer) throws IOException {
    if (!this.isAllowedDelegationTokenOp()) {
        throw new IOException("Delegation Token can be issued only with kerberos or web authentication");
    }
}

Can you try to execute the queries after copying core-site.xml from the hadoop-conf folder to the spark2 conf folder and to the classpath of spark-submit?

From the provided logs, I can see that carbondata_2.11-1.1.1-bdd-hadoop2.7.2.jar is 9607344 bytes; please make sure it contains only CarbonData classes. I can see Carbon explicitly calling "TokenCache.obtainTokensForNamenodes", which is throwing this exception.

If the above-mentioned steps didn't work, you can raise a JIRA to investigate further.
Regards,
Naresh P R

On Fri, Nov 3, 2017 at 3:10 PM, yixu2001 wrote:
> prnaresh.naresh, dev
>
> The carbon jar I used does not include hadoop classes & core-site.xml.
> The attachment includes the jar list used while submitting the
> spark job; please confirm it.
> --
> yixu2001
>
> *From:* Naresh P R
> *Date:* 2017-11-03 16:07
> *To:* yixu2001
> *Subject:* Re: Re: Delegation Token can be issued only with kerberos or
> web authentication" will occur in yarn cluster
> Hi yixu2001,
>
> Are you using a carbon shaded jar with hadoop classes & core-site.xml
> included in the carbon jar?
>
> If so, can you try to use the individual CarbonData component jars while
> submitting the spark job?
>
> As per my understanding, this happens if the client core-site.xml has
> hadoop.security.authentication=simple & hdfs is kerberized.
>
> You can also enable verbose mode to see the hadoop jars used in the error
> trace while querying carbon tables.
>
> Also, I am not sure whether CarbonData has been tested in an HDP kerberos
> cluster.
> ---
> Regards,
> Naresh P R
>
> On Fri, Nov 3, 2017 at 8:36 AM, yixu2001 wrote:
>
>> Naresh P R:
>> As the attachments cannot be uploaded to the mailing list, I have
>> attached them to this mail for you; please check.
>> Our platform is installed with HDP 2.4, but spark 2.1 is not
>> included in HDP 2.4; we are using spark 2.1 from an additional
>> install of the Apache version.
>> --
>> yixu2001
>>
>> *From:* Naresh P R
>> *Date:* 2017-11-02 22:02
>> *To:* dev
>> *Subject:* Re: Re: Delegation Token can be issued only with kerberos or
>> web authentication" will occur in yarn cluster
>> Hi yixu,
>>
>> I am not able to see any attachment in your previous mail.
>> ---
>> Regards,
>> Naresh P R
>>
>> On Thu, Nov 2, 2017 at 4:40 PM, yixu2001 wrote:
>>
>>> dev
>>> Please refer to the attachment "cluster carbon error2.txt"
>>> for the log trace.
>>> In this log, I try 2 query statements:
>>> select * from e_carbon.prod_inst_his (prod_inst_his is a
>>> hive table; it succeeds)
>>> select * from e_carbon.prod_inst_his_c (prod_inst_his_c is
>>> a carbon table; it fails)
>>>
>>> I pass the principal in my start script; please refer to the
>>> attachment "testCluster.sh".
>>>
>>> I have set hive.server2.enable.doAs = false in the above test
>>> and I have printed it in the log.
>>> --
>>> yixu2001
>>>
>>> *From:* Naresh P R
>>> *Date:* 2017-11-01 19:40
>>> *To:* dev
>>> *Subject:* Re: Delegation Token can be issued only with kerberos or web
>>> authentication" will occur in yarn cluster
>>> Hi,
>>>
>>> Ideally kerberos authentication should work with a carbon table. Can you
>>> share the log trace with us to analyze
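The Hadoop check quoted earlier in this thread explains the failure mode: when security is enabled but the connection authenticated with SIMPLE (e.g. a client whose core-site.xml says hadoop.security.authentication=simple talking to a kerberized HDFS), requesting a delegation token fails. The sketch below restates that logic in a self-contained form; the enum and method are simplified stand-ins for Hadoop's internals, for illustration only:

```java
import java.io.IOException;

// Simplified restatement of Hadoop's isAllowedDelegationTokenOp() check
// quoted above: SIMPLE auth on a secured cluster is rejected.
public class TokenOpCheck {
    enum AuthMethod { SIMPLE, KERBEROS, KERBEROS_SSL, CERTIFICATE }

    static boolean isAllowedDelegationTokenOp(boolean securityEnabled, AuthMethod m) {
        return !securityEnabled
                || m == AuthMethod.KERBEROS
                || m == AuthMethod.KERBEROS_SSL
                || m == AuthMethod.CERTIFICATE;
    }

    static void getDelegationToken(boolean securityEnabled, AuthMethod m) throws IOException {
        if (!isAllowedDelegationTokenOp(securityEnabled, m)) {
            throw new IOException(
                "Delegation Token can be issued only with kerberos or web authentication");
        }
    }

    public static void main(String[] args) {
        // Kerberized cluster + SIMPLE client auth: the exact failure in this thread.
        System.out.println(isAllowedDelegationTokenOp(true, AuthMethod.SIMPLE));   // prints false
        System.out.println(isAllowedDelegationTokenOp(true, AuthMethod.KERBEROS)); // prints true
        // With security disabled, any method is allowed.
        System.out.println(isAllowedDelegationTokenOp(false, AuthMethod.SIMPLE));  // prints true
    }
}
```

This is why copying the cluster's core-site.xml (with the kerberos setting) into the Spark conf folder and classpath is the suggested fix: it changes the client-side authentication method away from SIMPLE.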
Re: Error while creating table in carbondata
Hi,

I think the problem is that the class signatures of open-source Spark and Cloudera Spark do not match for the CatalogTable class; there is an additional parameter in the Cloudera Spark version (highlighted below). We may have to try building CarbonData against the Cloudera Spark version to make it work.

*case class CatalogTable(
    identifier: TableIdentifier,
    tableType: CatalogTableType,
    storage: CatalogStorageFormat,
    schema: StructType,
    provider: Option[String] = None,
    partitionColumnNames: Seq[String] = Seq.empty,
    bucketSpec: Option[BucketSpec] = None,
    owner: String = "",
    createTime: Long = System.currentTimeMillis,
    lastAccessTime: Long = -1,
    properties: Map[String, String] = Map.empty,
    stats: Option[Statistics] = None,
    viewOriginalText: Option[String] = None,
    viewText: Option[String] = None,
    comment: Option[String] = None,
    unsupportedFeatures: Seq[String] = Seq.empty,
    tracksPartitionsInCatalog: Boolean = false,
    schemaPreservesCase: Boolean = true) {*

Thanks and regards
Bhavya

On Tue, Nov 7, 2017 at 7:17 AM, Lionel CL wrote:
> mvn -DskipTests -Pspark-2.1 clean package
> The pom file was changed as provided in the former email.
>
> On 2017-11-06 7:47 PM, "Bhavya Aggarwal" wrote:
>
> >Hi,
> >
> >Can you please let me know how you are building the CarbonData assembly
> >jar, or which command you are running to build CarbonData.
> >
> >Regards
> >Bhavya
> >
> >On Mon, Nov 6, 2017 at 2:18 PM, Lionel CL wrote:
> >
> >> Yes, there is a catalyst jar under the path
> >> /opt/cloudera/parcels/SPARK2/lib/spark2/jars/
> >>
> >> spark-catalyst_2.11-2.1.0.cloudera1.jar
> >>
> >> On 2017-11-06 4:12 PM, "Bhavya Aggarwal" wrote:
> >>
> >> >Hi,
> >> >
> >> >Can you please check if you have the spark-catalyst jar in the
> >> >$SPARK_HOME/jars folder for your Cloudera version; if it's not there,
> >> >please try to include it and retry.
> >> >
> >> >Thanks and regards
> >> >Bhavya
> >> >
> >> >On Sun, Nov 5, 2017 at 7:24 PM, Lionel CL wrote:
> >> >
> >> >> I have the same problem in CDH 5.8.0.
> >> >> The spark2 version is 2.1.0.cloudera1,
> >> >> the carbondata version 1.2.0.
> >> >>
> >> >> No error occurred when using the open-source version of Spark.
> >> >>
> >> >> 2.6.0-cdh5.8.0
> >> >> 2.1.0.cloudera1
> >> >> 2.11
> >> >> 2.11.8
> >> >>
> >> >> scala> cc.sql("create table t111(vin string) stored by 'carbondata'")
> >> >> 17/11/03 10:22:03 AUDIT command.CreateTable: [][][Thread-1]Creating Table with Database name [default] and Table name [t111]
> >> >> java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.catalog.CatalogTable.copy(Lorg/apache/spark/sql/catalyst/TableIdentifier;Lorg/apache/spark/sql/catalyst/catalog/CatalogTableType;Lorg/apache/spark/sql/catalyst/catalog/CatalogStorageFormat;Lorg/apache/spark/sql/types/StructType;Lscala/Option;Lscala/collection/Seq;Lscala/Option;Ljava/lang/String;JJLscala/collection/immutable/Map;Lscala/Option;Lscala/Option;Lscala/Option;Lscala/Option;Lscala/collection/Seq;Z)Lorg/apache/spark/sql/catalyst/catalog/CatalogTable;
> >> >> at org.apache.spark.sql.CarbonSource$.updateCatalogTableWithCarbonSchema(CarbonSource.scala:253)
> >> >> at org.apache.spark.sql.execution.command.DDLStrategy.apply(DDLStrategy.scala:135)
> >> >> at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:62)
> >> >> at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:62)
> >> >> at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
> >> >> at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
> >> >> at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
> >> >>
> >> >> On 2017/11/1 1:58 AM, "chenliang613" <chenliang6...@gmail.com> wrote:
> >> >>
> >> >> Hi
> >> >>
> >> >> Did you use the open-source Spark version?
> >> >>
> >> >> Can you provide more detailed info:
> >> >> 1. Which CarbonData version and Spark version did you use?
> >> >> 2. Can you share the reproduce script and steps with us?
> >> >>
> >> >> Regards
> >> >> Liang
> >> >>
> >> >> hujianjun wrote
> >> >> scala> carbon.sql("CREATE TABLE IF NOT EXISTS carbon_table(id string,name string,city string,age Int)STORED BY 'carbondata'")
> >> >> 17/10/23 19:13:52 AUDIT command.CarbonCreateTableCommand: [master][root][Thread-1]Creating Table with Database name [clb_carbon] and Table name [carbon_table]
> >> >> java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.catalog.CatalogTable.copy(Lorg/apache/spark/sql/catalyst/TableIdentifier;Lorg/apache/spark/sql/catalyst/catalog/CatalogTableType;Lorg/apache/
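The NoSuchMethodError quoted above can be made concrete by counting the parameters in the JVM method descriptor it reports and comparing with the Cloudera CatalogTable definition. A minimal Python sketch, hedged: the count_params helper below is written purely for this illustration and is not part of CarbonData, Spark, or any library.

```python
def count_params(descriptor: str) -> int:
    """Count the parameters in a JVM method descriptor such as '(Lscala/Option;JJZ)V'."""
    body = descriptor[descriptor.index("(") + 1 : descriptor.index(")")]
    count, i = 0, 0
    while i < len(body):
        while body[i] == "[":        # array dimensions prefix the element type
            i += 1
        if body[i] == "L":           # object type: runs up to the ';'
            i = body.index(";", i) + 1
        else:                        # primitive type: a single character (J = long, Z = boolean, ...)
            i += 1
        count += 1
    return count

# Descriptor from the NoSuchMethodError in the thread: the copy() call compiled
# against open-source Spark 2.1 passes 17 arguments, but the Cloudera CatalogTable
# shown earlier has 18 fields (it adds schemaPreservesCase), so its generated
# copy() has a different signature and the lookup fails at runtime.
failing = ("(Lorg/apache/spark/sql/catalyst/TableIdentifier;"
           "Lorg/apache/spark/sql/catalyst/catalog/CatalogTableType;"
           "Lorg/apache/spark/sql/catalyst/catalog/CatalogStorageFormat;"
           "Lorg/apache/spark/sql/types/StructType;"
           "Lscala/Option;Lscala/collection/Seq;Lscala/Option;"
           "Ljava/lang/String;JJLscala/collection/immutable/Map;"
           "Lscala/Option;Lscala/Option;Lscala/Option;Lscala/Option;"
           "Lscala/collection/Seq;Z)"
           "Lorg/apache/spark/sql/catalyst/catalog/CatalogTable;")
print(count_params(failing))  # 17
```

The same counting can be done on any descriptor printed by a NoSuchMethodError, which makes it easy to spot which vendor build added or removed a parameter.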
Re: Error while creating table in carbondata
mvn -DskipTests -Pspark-2.1 clean package
The pom file was changed as provided in the former email.

On 2017/11/6 7:47 PM, "Bhavya Aggarwal" wrote:
>Hi,
>
>Can you please let me know how you are building the CarbonData assembly
>jar, or which command you are running to build CarbonData?
>
>Regards
>Bhavya
>
>On Mon, Nov 6, 2017 at 2:18 PM, Lionel CL wrote:
>
>> Yes, there is a catalyst jar under the path /opt/cloudera/parcels/SPARK2/lib/spark2/jars/
>>
>> spark-catalyst_2.11-2.1.0.cloudera1.jar
>>
>> On 2017/11/6 4:12 PM, "Bhavya Aggarwal" wrote:
>>
>> >Hi,
>> >
>> >Can you please check if you have the spark-catalyst jar in the $SPARK_HOME/jars
>> >folder for your Cloudera version? If it's not there, please try to include
>> >it and retry.
>> >
>> >Thanks and regards
>> >Bhavya
>> >
>> >On Sun, Nov 5, 2017 at 7:24 PM, Lionel CL wrote:
>> >
>> >> I have the same problem in CDH 5.8.0.
>> >> The spark2 version is 2.1.0.cloudera1,
>> >> the carbondata version 1.2.0.
>> >>
>> >> No error occurred when using the open-source version of Spark.
>> >>
>> >> 2.6.0-cdh5.8.0
>> >> 2.1.0.cloudera1
>> >> 2.11
>> >> 2.11.8
>> >>
>> >> scala> cc.sql("create table t111(vin string) stored by 'carbondata'")
>> >> 17/11/03 10:22:03 AUDIT command.CreateTable: [][][Thread-1]Creating Table with Database name [default] and Table name [t111]
>> >> java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.catalog.CatalogTable.copy(Lorg/apache/spark/sql/catalyst/TableIdentifier;Lorg/apache/spark/sql/catalyst/catalog/CatalogTableType;Lorg/apache/spark/sql/catalyst/catalog/CatalogStorageFormat;Lorg/apache/spark/sql/types/StructType;Lscala/Option;Lscala/collection/Seq;Lscala/Option;Ljava/lang/String;JJLscala/collection/immutable/Map;Lscala/Option;Lscala/Option;Lscala/Option;Lscala/Option;Lscala/collection/Seq;Z)Lorg/apache/spark/sql/catalyst/catalog/CatalogTable;
>> >> at org.apache.spark.sql.CarbonSource$.updateCatalogTableWithCarbonSchema(CarbonSource.scala:253)
>> >> at org.apache.spark.sql.execution.command.DDLStrategy.apply(DDLStrategy.scala:135)
>> >> at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:62)
>> >> at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:62)
>> >> at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
>> >> at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
>> >> at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
>> >>
>> >> On 2017/11/1 1:58 AM, "chenliang613" <chenliang6...@gmail.com> wrote:
>> >>
>> >> Hi
>> >>
>> >> Did you use the open-source Spark version?
>> >>
>> >> Can you provide more detailed info:
>> >> 1. Which CarbonData version and Spark version did you use?
>> >> 2. Can you share the reproduce script and steps with us?
>> >>
>> >> Regards
>> >> Liang
>> >>
>> >> hujianjun wrote
>> >> scala> carbon.sql("CREATE TABLE IF NOT EXISTS carbon_table(id string,name string,city string,age Int)STORED BY 'carbondata'")
>> >> 17/10/23 19:13:52 AUDIT command.CarbonCreateTableCommand: [master][root][Thread-1]Creating Table with Database name [clb_carbon] and Table name [carbon_table]
>> >> java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.catalog.CatalogTable.copy(Lorg/apache/spark/sql/catalyst/TableIdentifier;Lorg/apache/spark/sql/catalyst/catalog/CatalogTableType;Lorg/apache/spark/sql/catalyst/catalog/CatalogStorageFormat;Lorg/apache/spark/sql/types/StructType;Lscala/Option;Lscala/collection/Seq;Lscala/Option;Ljava/lang/String;JJLscala/collection/immutable/Map;Lscala/Option;Lscala/Option;Lscala/Option;Lscala/Option;Lscala/collection/Seq;Z)Lorg/apache/spark/sql/catalyst/catalog/CatalogTable;
>> >> at org.apache.spark.sql.CarbonSource$.updateCatalogTableWithCarbonSchema(CarbonSource.scala:253)
>> >> at org.apache.spark.sql.execution.strategy.DDLStrategy.apply(DDLStrategy.scala:154)
>> >> at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:62)
>> >> at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:62)
>> >> at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
>> >> at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
>> >> at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
>> >> at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:92)
>> >> at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:77)
>> >> at
Re: [Discussion] Support pre-aggregate table to improve OLAP performance
Hi Bill,

Please find my comments.

1. We are not supporting join queries in this design, so there will always be one parent table for an aggregate table. We may consider join queries for creating aggregation tables in the future.
2. The aggregation column name will be created internally and it would be like agg_parentcolumnname.
3. Yes, if we create an agg table on a dictionary column of the parent table, then it uses the same parent dictionary. The aggregation table does not generate any dictionary files.
4. timeseries.eventtime is the time column of the main table; there should be at least one timestamp column on the main table to create timeseries tables. In the design, the granularity is replaced with a hierarchy, which means the user can give a time hierarchy like minute, hour, day; three aggregation tables (minute, hour and day) will then be created automatically, and data will be loaded into them for every load.
5. In the new design v1.1 this has changed; please check the same.
6. As I mentioned above, in the new v1.1 design it was changed to a hierarchy, so the user can define his own time hierarchy.
7. OK, we will discuss and check whether we can expose this SORT_COLUMNS configuration on the aggregation table. Even if we don't support it now, we can expose it in the future.
8. Yes, merge index is applicable to the aggregation table as well.

Regards,
Ravindra.

On 3 November 2017 at 09:05, bill.zhou wrote:
> hi Jacky & Ravindra, I have a few more queries about this design; thank you
> very much if you can clarify them.
>
> 1. If we support creating aggregation tables from a join of two or more tables,
> how do we set aggregate.parent? Can it be like
> 'aggregate.parent'='fact1,dim1,dim1'?
> 2. What are the agg table column names? For the following create command will they
> be user_id, name, c2, price?
> CREATE TABLE agg_sales
> STORED BY 'carbondata'
> TBLPROPERTIES ('aggregate.parent'='sales')
> AS SELECT user_id, user_name as name, sum(quantity) as c2, avg(price) FROM
> sales GROUP BY user_id
> 3. If we create a dictionary column in the agg table, will the dictionary
> file be the same one as the main table's?
>
> 4. For rollup table main table creation: what is the meaning of
> timeseries.eventtime and granularity? Which columns can belong to this?
> 5. For rollup table main table creation: what is the meaning of
> 'timeseries.aggtype'='quantity:sum,max'? Does it mean the column quantity only
> supports sum and max?
>
> 6. "In both the above cases carbon generates the 4 pre-aggregation tables
> automatically for year, month, day and hour (their table names will be prefixed
> with agg_sales)." -- In the above case I only see the column hour; how are
> year, month and day generated?
>
> 7. "In the internal implementation, carbon will create these tables with
> SORT_COLUMNS='group by columns defined above', so that filter/group-by queries
> on the main table will be faster because they can leverage the index in
> pre-aggregate tables." -- I suggest the user be able to control the sort column order.
> 8. Is merge index supported for the agg table? -- It would be useful.
>
> Jacky Li wrote
> > Hi community,
> >
> > In traditional data warehouses, a pre-aggregate table or cube is a common
> > technique to improve OLAP query performance. To take carbondata's support
> > for OLAP to the next level, I'd like to propose pre-aggregate table support
> > in carbondata.
> >
> > Please refer to CARBONDATA-1516
> > (https://issues.apache.org/jira/browse/CARBONDATA-1516) and the
> > design document attached in the JIRA ticket.
> >
> > This design is still in the initial phase; the proposed usage and SQL syntax
> > are subject to change. Please provide your comments to improve this feature.
> > Any suggestion on the design from the community is welcome.
> >
> > Regards,
> > Jacky Li
>
> --
> Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

--
Thanks & Regards,
Ravi
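As a conceptual aside on the agg_sales example in the thread: a pre-aggregate table stores one row per group-by key, so a GROUP BY query scans far fewer rows than the fact table. A hedged Python sketch of the idea (the data and structures are illustrative, not CarbonData internals); note that avg(price) is kept internally as a running sum and count, so a later incremental load can be merged without re-reading the raw rows:

```python
from collections import defaultdict

# Raw fact rows of a hypothetical `sales` table: (user_id, user_name, quantity, price)
sales = [
    (1, "alice", 2, 10.0),
    (1, "alice", 3, 20.0),
    (2, "bob",   1,  5.0),
]

# Build agg_sales: SELECT user_id, user_name AS name, sum(quantity) AS c2, avg(price)
#                  FROM sales GROUP BY user_id
# avg is accumulated as (sum, count) so it stays mergeable across loads.
acc = defaultdict(lambda: [None, 0, 0.0, 0])   # name, sum(qty), sum(price), row count
for uid, name, qty, price in sales:
    a = acc[uid]
    a[0] = name
    a[1] += qty
    a[2] += price
    a[3] += 1

# Final pre-aggregated rows: user_id -> (name, sum(quantity), avg(price))
agg_sales = {uid: (name, qty_sum, price_sum / n)
             for uid, (name, qty_sum, price_sum, n) in acc.items()}
print(agg_sales[1])  # ('alice', 5, 15.0)
```

A group-by query can then be answered from agg_sales (2 rows here) instead of the fact table (3 rows); on real data the ratio is usually far larger.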
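The time hierarchy from point 4 can be sketched the same way: each granularity (minute, hour, day) is just the event-time column truncated to that level before grouping, yielding one rollup table per level. A hedged Python illustration, not CarbonData code; the event data is invented:

```python
from datetime import datetime

def truncate(ts: datetime, level: str) -> datetime:
    """Truncate an event timestamp to one level of the time hierarchy."""
    if level == "minute":
        return ts.replace(second=0, microsecond=0)
    if level == "hour":
        return ts.replace(minute=0, second=0, microsecond=0)
    if level == "day":
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    raise ValueError(f"unknown level: {level}")

# One rollup table per hierarchy level, aggregating sum(quantity) per time bucket.
events = [(datetime(2017, 11, 3, 9, 5, 30), 2),
          (datetime(2017, 11, 3, 9, 5, 45), 3),
          (datetime(2017, 11, 3, 10, 0, 0), 1)]
rollups = {level: {} for level in ("minute", "hour", "day")}
for ts, qty in events:
    for level, table in rollups.items():
        key = truncate(ts, level)
        table[key] = table.get(key, 0) + qty

print(rollups["day"][datetime(2017, 11, 3)])  # 6
```

This also shows why the coarser tables are cheap to maintain: each can be derived from the next finer level rather than from the raw data.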
Re: Error while creating table in carbondata
Hi,

Can you please let me know how you are building the CarbonData assembly jar, or which command you are running to build CarbonData?

Regards
Bhavya

On Mon, Nov 6, 2017 at 2:18 PM, Lionel CL wrote:
> Yes, there is a catalyst jar under the path /opt/cloudera/parcels/SPARK2/lib/spark2/jars/
>
> spark-catalyst_2.11-2.1.0.cloudera1.jar
>
> On 2017/11/6 4:12 PM, "Bhavya Aggarwal" wrote:
>
> >Hi,
> >
> >Can you please check if you have the spark-catalyst jar in the $SPARK_HOME/jars
> >folder for your Cloudera version? If it's not there, please try to include
> >it and retry.
> >
> >Thanks and regards
> >Bhavya
> >
> >On Sun, Nov 5, 2017 at 7:24 PM, Lionel CL wrote:
> >
> >> I have the same problem in CDH 5.8.0.
> >> The spark2 version is 2.1.0.cloudera1,
> >> the carbondata version 1.2.0.
> >>
> >> No error occurred when using the open-source version of Spark.
> >>
> >> 2.6.0-cdh5.8.0
> >> 2.1.0.cloudera1
> >> 2.11
> >> 2.11.8
> >>
> >> scala> cc.sql("create table t111(vin string) stored by 'carbondata'")
> >> 17/11/03 10:22:03 AUDIT command.CreateTable: [][][Thread-1]Creating Table with Database name [default] and Table name [t111]
> >> java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.catalog.CatalogTable.copy(Lorg/apache/spark/sql/catalyst/TableIdentifier;Lorg/apache/spark/sql/catalyst/catalog/CatalogTableType;Lorg/apache/spark/sql/catalyst/catalog/CatalogStorageFormat;Lorg/apache/spark/sql/types/StructType;Lscala/Option;Lscala/collection/Seq;Lscala/Option;Ljava/lang/String;JJLscala/collection/immutable/Map;Lscala/Option;Lscala/Option;Lscala/Option;Lscala/Option;Lscala/collection/Seq;Z)Lorg/apache/spark/sql/catalyst/catalog/CatalogTable;
> >> at org.apache.spark.sql.CarbonSource$.updateCatalogTableWithCarbonSchema(CarbonSource.scala:253)
> >> at org.apache.spark.sql.execution.command.DDLStrategy.apply(DDLStrategy.scala:135)
> >> at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:62)
> >> at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:62)
> >> at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
> >> at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
> >> at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
> >>
> >> On 2017/11/1 1:58 AM, "chenliang613" <chenliang6...@gmail.com> wrote:
> >>
> >> Hi
> >>
> >> Did you use the open-source Spark version?
> >>
> >> Can you provide more detailed info:
> >> 1. Which CarbonData version and Spark version did you use?
> >> 2. Can you share the reproduce script and steps with us?
> >>
> >> Regards
> >> Liang
> >>
> >> hujianjun wrote
> >> scala> carbon.sql("CREATE TABLE IF NOT EXISTS carbon_table(id string,name string,city string,age Int)STORED BY 'carbondata'")
> >> 17/10/23 19:13:52 AUDIT command.CarbonCreateTableCommand: [master][root][Thread-1]Creating Table with Database name [clb_carbon] and Table name [carbon_table]
> >> java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.catalog.CatalogTable.copy(Lorg/apache/spark/sql/catalyst/TableIdentifier;Lorg/apache/spark/sql/catalyst/catalog/CatalogTableType;Lorg/apache/spark/sql/catalyst/catalog/CatalogStorageFormat;Lorg/apache/spark/sql/types/StructType;Lscala/Option;Lscala/collection/Seq;Lscala/Option;Ljava/lang/String;JJLscala/collection/immutable/Map;Lscala/Option;Lscala/Option;Lscala/Option;Lscala/Option;Lscala/collection/Seq;Z)Lorg/apache/spark/sql/catalyst/catalog/CatalogTable;
> >> at org.apache.spark.sql.CarbonSource$.updateCatalogTableWithCarbonSchema(CarbonSource.scala:253)
> >> at org.apache.spark.sql.execution.strategy.DDLStrategy.apply(DDLStrategy.scala:154)
> >> at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:62)
> >> at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:62)
> >> at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
> >> at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
> >> at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
> >> at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:92)
> >> at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:77)
> >> at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:74)
> >> at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
> >> at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
> >> at
Re: Error while creating table in carbondata
Yes, there is a catalyst jar under the path /opt/cloudera/parcels/SPARK2/lib/spark2/jars/

spark-catalyst_2.11-2.1.0.cloudera1.jar

On 2017/11/6 4:12 PM, "Bhavya Aggarwal" wrote:
>Hi,
>
>Can you please check if you have the spark-catalyst jar in the $SPARK_HOME/jars
>folder for your Cloudera version? If it's not there, please try to include
>it and retry.
>
>Thanks and regards
>Bhavya
>
>On Sun, Nov 5, 2017 at 7:24 PM, Lionel CL wrote:
>
>> I have the same problem in CDH 5.8.0.
>> The spark2 version is 2.1.0.cloudera1,
>> the carbondata version 1.2.0.
>>
>> No error occurred when using the open-source version of Spark.
>>
>> 2.6.0-cdh5.8.0
>> 2.1.0.cloudera1
>> 2.11
>> 2.11.8
>>
>> scala> cc.sql("create table t111(vin string) stored by 'carbondata'")
>> 17/11/03 10:22:03 AUDIT command.CreateTable: [][][Thread-1]Creating Table with Database name [default] and Table name [t111]
>> java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.catalog.CatalogTable.copy(Lorg/apache/spark/sql/catalyst/TableIdentifier;Lorg/apache/spark/sql/catalyst/catalog/CatalogTableType;Lorg/apache/spark/sql/catalyst/catalog/CatalogStorageFormat;Lorg/apache/spark/sql/types/StructType;Lscala/Option;Lscala/collection/Seq;Lscala/Option;Ljava/lang/String;JJLscala/collection/immutable/Map;Lscala/Option;Lscala/Option;Lscala/Option;Lscala/Option;Lscala/collection/Seq;Z)Lorg/apache/spark/sql/catalyst/catalog/CatalogTable;
>> at org.apache.spark.sql.CarbonSource$.updateCatalogTableWithCarbonSchema(CarbonSource.scala:253)
>> at org.apache.spark.sql.execution.command.DDLStrategy.apply(DDLStrategy.scala:135)
>> at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:62)
>> at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:62)
>> at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
>> at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
>> at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
>>
>> On 2017/11/1 1:58 AM, "chenliang613" <chenliang6...@gmail.com> wrote:
>>
>> Hi
>>
>> Did you use the open-source Spark version?
>>
>> Can you provide more detailed info:
>> 1. Which CarbonData version and Spark version did you use?
>> 2. Can you share the reproduce script and steps with us?
>>
>> Regards
>> Liang
>>
>> hujianjun wrote
>> scala> carbon.sql("CREATE TABLE IF NOT EXISTS carbon_table(id string,name string,city string,age Int)STORED BY 'carbondata'")
>> 17/10/23 19:13:52 AUDIT command.CarbonCreateTableCommand: [master][root][Thread-1]Creating Table with Database name [clb_carbon] and Table name [carbon_table]
>> java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.catalog.CatalogTable.copy(Lorg/apache/spark/sql/catalyst/TableIdentifier;Lorg/apache/spark/sql/catalyst/catalog/CatalogTableType;Lorg/apache/spark/sql/catalyst/catalog/CatalogStorageFormat;Lorg/apache/spark/sql/types/StructType;Lscala/Option;Lscala/collection/Seq;Lscala/Option;Ljava/lang/String;JJLscala/collection/immutable/Map;Lscala/Option;Lscala/Option;Lscala/Option;Lscala/Option;Lscala/collection/Seq;Z)Lorg/apache/spark/sql/catalyst/catalog/CatalogTable;
>> at org.apache.spark.sql.CarbonSource$.updateCatalogTableWithCarbonSchema(CarbonSource.scala:253)
>> at org.apache.spark.sql.execution.strategy.DDLStrategy.apply(DDLStrategy.scala:154)
>> at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:62)
>> at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:62)
>> at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
>> at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
>> at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
>> at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:92)
>> at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:77)
>> at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:74)
>> at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
>> at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
>> at scala.collection.Iterator$class.foreach(Iterator.scala:893)
>> at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
>> at scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157)
>> at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1336)
>> at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:74)
>> at
Re: Re: "Delegation Token can be issued only with kerberos or web authentication" will occur in yarn cluster
dev

I have copied core-site.xml to the spark2 conf folder; it does not work.
I will share the jar I am using (download link: https://pan.baidu.com/s/1b6AG5S password: rww2).
Would you help me confirm whether the jar is suitable? If the jar is not suitable, could you please send me a suitable one to try?

yixu2001

From: Naresh P R
Date: 2017-11-04 01:26
To: dev
Subject: Re: Re: "Delegation Token can be issued only with kerberos or web authentication" will occur in yarn cluster

Hi yixu2001,

From the hadoop code, I could see that IOException("Delegation Token can be issued only with kerberos or web authentication") is thrown only if the authentication method is set as "SIMPLE".

private boolean isAllowedDelegationTokenOp() throws IOException {
    AuthenticationMethod authMethod = this.getConnectionAuthenticationMethod();
    return !UserGroupInformation.isSecurityEnabled()
        || authMethod == AuthenticationMethod.KERBEROS
        || authMethod == AuthenticationMethod.KERBEROS_SSL
        || authMethod == AuthenticationMethod.CERTIFICATE;
}

Token getDelegationToken(Text renewer) throws IOException {
    if (!this.isAllowedDelegationTokenOp()) {
        throw new IOException("Delegation Token can be issued only with kerberos or web authentication");
    }
}

Can you try to execute queries after copying core-site.xml from the hadoop-conf folder to the spark2 conf folder and the classpath of spark-submit?

From the provided logs, I could see the carbondata_2.11-1.1.1-bdd-hadoop2.7.2.jar size is 9607344 bytes; please make sure it contains only carbondata classes. I could see Carbon explicitly calling "TokenCache.obtainTokensForNamenodes", which is throwing this exception.

If the above-mentioned steps didn't work, you can raise a JIRA to investigate this further.

Regards,
Naresh P R

On Fri, Nov 3, 2017 at 3:10 PM, yixu2001 wrote:
> prnaresh.naresh, dev
>
> The carbon jar I used does not include hadoop classes & core-site.xml.
> The attachment includes the jar list used while submitting the
> spark job; please confirm it.
> --
> yixu2001
>
> *From:* Naresh P R
> *Date:* 2017-11-03 16:07
> *To:* yixu2001
> *Subject:* Re: Re: "Delegation Token can be issued only with kerberos or web authentication" will occur in yarn cluster
> Hi yixu2001,
>
> Are you using a carbon shaded jar with hadoop classes & core-site.xml
> included in the carbon jar?
>
> If so, can you try to use the carbondata individual component jars while
> submitting the spark job?
>
> As per my understanding, this happens if the client core-site.xml has
> hadoop.security.authentication=simple & hdfs is kerberized.
>
> You can also enable verbose mode to see the hadoop jars used in the error
> trace while querying carbon tables.
>
> Also, I am not sure whether CarbonData has been tested in an HDP kerberos cluster.
> ---
> Regards,
> Naresh P R
>
> On Fri, Nov 3, 2017 at 8:36 AM, yixu2001 wrote:
>
>> Naresh P R:
>> As the attachments cannot be uploaded to the mailing list, I have
>> added the attachments to this mail for you; please check them.
>> Our platform is installed with HDP 2.4, but spark 2.1 is
>> not included in HDP 2.4, so we are using spark 2.1 with an additional
>> installation of the apache version.
>> --
>> yixu2001
>>
>> *From:* Naresh P R
>> *Date:* 2017-11-02 22:02
>> *To:* dev
>> *Subject:* Re: Re: "Delegation Token can be issued only with kerberos or web authentication" will occur in yarn cluster
>> Hi yixu,
>>
>> I am not able to see any attachment in your previous mail.
>> ---
>> Regards,
>> Naresh P R
>>
>> On Thu, Nov 2, 2017 at 4:40 PM, yixu2001 wrote:
>>
>>> dev
>>> Please refer to the attachment "cluster carbon error2.txt"
>>> for the log trace.
>>> In this log, I try 2 query statements:
>>> select * from e_carbon.prod_inst_his (prod_inst_his is a hive table); it succeeds.
>>> select * from e_carbon.prod_inst_his_c (prod_inst_his_c is a carbon table); it fails.
>>>
>>> I pass the principal in my start script; please refer to the
>>> attachment "testCluster.sh".
>>>
>>> I have set hive.server2.enable.doAs = false in the above test,
>>> and I have printed it in the log.
>>> --
>>> yixu2001
>>>
>>> *From:* Naresh P R
>>> *Date:* 2017-11-01 19:40
>>> *To:* dev
>>> *Subject:* Re: "Delegation Token can be issued only with kerberos or web
>>> authentication" will occur in yarn cluster
>>> Hi,
>>>
>>> Ideally kerberos authentication should work with a carbon table. Can you
>>> share the log trace with us to analyze this further?
>>>
>>> How are you passing the principal in the yarn cluster?
>>>
>>> Can you try to set hive.server2.enable.doAs = false & run the query on the
>>> carbon table?
>>>
>>> Regards,
>>> Naresh P R
>>>
>>> On Wed, Nov 1, 2017 at 3:33 PM,
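Naresh's diagnosis above is that the client's effective hadoop.security.authentication is "simple" while HDFS is kerberized. One quick sanity check is to read that property straight out of the core-site.xml that is actually on the client/executor classpath. A minimal Python sketch (the XML fragment below is illustrative, not taken from the reporter's cluster):

```python
import xml.etree.ElementTree as ET

def hadoop_property(core_site_xml: str, name: str):
    """Return the value of a named property from a Hadoop core-site.xml string."""
    root = ET.fromstring(core_site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None

# Illustrative core-site.xml fragment; on a kerberized cluster the client must
# also resolve this to "kerberos", otherwise delegation-token requests fail with
# the "Delegation Token can be issued only with kerberos..." IOException.
core_site = """<configuration>
  <property>
    <name>hadoop.security.authentication</name>
    <value>kerberos</value>
  </property>
</configuration>"""

print(hadoop_property(core_site, "hadoop.security.authentication"))  # kerberos
```

Running this against each core-site.xml visible to spark-submit (and any copy shaded inside a fat jar) shows quickly whether a stray "simple" configuration is being picked up.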