RE: How to increase split number for Fact distinct columns when using spark engine?(picture added)

陈熹（chenxi07）-技术产品中心 Mon, 05 Nov 2018 00:44:23 -0800

Hi,

Thank you for your suggestion!
I checked the source code. The 'kylin.engine.spark.rdd-partition-cut-mb' parameter
is only used by the Spark cubing job, not by the fact table distinct columns job.
It may be worth adding support for this parameter to the fact table distinct columns job.
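
To illustrate the idea, here is a minimal sketch of how a size-based cut parameter like this one could translate into a partition count. The function name, the clamp bounds, and the sample sizes are my own assumptions for illustration; this is not Kylin's actual code.

```python
import math


def estimate_partitions(input_size_mb: float,
                        cut_mb: float,
                        min_partitions: int = 1,
                        max_partitions: int = 5000) -> int:
    """Hypothetical helper: derive a partition count from the input size.

    cut_mb plays the role of a setting such as
    'kylin.engine.spark.rdd-partition-cut-mb'; the min/max bounds are
    illustrative defaults, not values confirmed from the Kylin source.
    """
    if cut_mb <= 0:
        raise ValueError("cut_mb must be positive")
    # One partition per cut_mb of input, rounded up.
    raw = math.ceil(input_size_mb / cut_mb)
    # Clamp the result so tiny inputs still get a partition and huge
    # inputs do not create an unbounded number of tasks.
    return max(min_partitions, min(max_partitions, raw))


if __name__ == "__main__":
    # A smaller cut size yields more partitions for the same input.
    print(estimate_partitions(1000, 100))
    print(estimate_partitions(1000, 10))
```

If the distinct columns step honored such a setting, lowering the cut size would raise the number of splits instead of leaving it fixed at two.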


--
Best regards,

Xi Chen


From: Support DrakosData <[email protected]>
Sent: Monday, November 5, 2018 4:01 PM
To: [email protected]; 陈熹（chenxi07）-技术产品中心 <[email protected]>
Subject: Re: How to increase split number for Fact distinct columns when using 
spark engine?(picture added)


Hi Xi Chen



I think you are referring to the 'kylin.engine.spark.rdd-partition-cut-mb' parameter.

(howto_optimize_build.html was written for version 2.1 or 2.2, I don't remember
which, but its concepts are still very important; I recommend reading it.)


On 5/11/18 8:52, 陈熹（chenxi07）-技术产品中心 wrote:

Hi, shaofeng:

  Thank you for your reply.

  I've checked the doc; it does not cover tuning for the Spark engine.



--

Best regards,



Xi Chen



-----Original Message-----
From: ShaoFeng Shi <[email protected]><mailto:[email protected]>
Sent: Monday, November 5, 2018 3:30 PM
To: dev <[email protected]><mailto:[email protected]>
Subject: Re: How to increase split number for Fact distinct columns when using spark engine?(picture added)



Please check this doc:

https://kylin.apache.org/docs/howto/howto_optimize_build.html



陈熹（chenxi07）-技术产品中心 <[email protected]><mailto:[email protected]> wrote on Mon, Nov 5, 2018 at 3:25 PM:



Hi:



       I'm sorry, the picture is broken again.

       I am uploading it as an attachment this time.







--
Best regards,

Xi Chen


*From:* 陈熹（chenxi07）-技术产品中心 <[email protected]><mailto:[email protected]>
*Sent:* Monday, November 5, 2018 3:04 PM
*To:* [email protected]<mailto:[email protected]>
*Subject:* How to increase split number for Fact distinct columns when using spark engine?(picture added)







Hi, ALL:



       I'm using the Spark engine to build a cube.

I found that the bottleneck of the build time lies in step #3, "Extract Fact Table Distinct Columns".

When I look into the Spark application, I find there are only two splits, regardless of how large the input sequence file is.

How can I increase the number of splits for this step?

I'm new to Spark, and any help would be greatly appreciated!







P.S. Spark job of #3 Step Name: Extract Fact Table Distinct Columns.



--
Best regards,

Xi Chen

--

Best regards,



Shaofeng Shi 史少锋

--



========================

DRAKOSDATA: Apache Kylin Experts

Kyligence Partner Spain (Madrid)

========================



This message and any attachments are confidential and privileged and intended 
for the use of the addressee only.

You are entitled to exercise your rights of access, rectification, cancellation 
and opposition by addressing such written application to address 
[email protected]<mailto:[email protected]>
