Re: Re: repartitionAndSortWithinPartitions task shuffle phase is very slow

2015-10-26 Thread
I have replaced the default Java serialization with Kryo. It indeed reduces the shuffle size, and performance has improved; however, the shuffle speed remains unchanged. I am quite new to Spark, so does anyone have an idea about which direction I should take to find the root cause? 周千昊 <
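One way to see why a leaner serializer can shrink the shuffle without speeding it up: the bytes per record drop, but the record count (and thus the sorting, spilling, and network round trips) stays the same. A plain-Python sketch of the size effect, using pickle as a stand-in for generic Java serialization and fixed-layout struct packing as a stand-in for a registered Kryo class (both are illustrative stand-ins, not Spark APIs):

```python
import pickle
import struct

records = [(i, float(i) * 0.5) for i in range(1000)]

# Generic serializer (like Java serialization): carries type info per record.
generic = sum(len(pickle.dumps(r)) for r in records)

# Schema-aware packing (the idea behind Kryo class registration):
# a fixed 16-byte layout per (long, double) record.
compact = sum(len(struct.pack("<qd", k, v)) for k, v in records)

assert compact < generic     # smaller shuffle payload...
assert len(records) == 1000  # ...but the same number of records still moves
```

So if the bottleneck is per-record processing or disk/network latency rather than raw bytes, shrinking the payload alone may not change shuffle time much.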

Re: Re: repartitionAndSortWithinPartitions task shuffle phase is very slow

2015-10-23 Thread
ta, cause we kinda copied the MR implementation > into Spark. > > Let us know if more info is needed. > > On Fri, Oct 23, 2015 at 10:24 AM, 周千昊 <qhz...@apache.org> wrote: > > > +kylin dev list > > > > 周千昊 <qhz...@apache.org> wrote on Fri, Oct 23, 2015 at 10:20 AM: > >

repartitionAndSortWithinPartitions task shuffle phase is very slow

2015-10-22 Thread
Hi, Spark community, I have an application which I am trying to migrate from MR to Spark. It will do some calculations from Hive and output to an hfile, which will be bulk loaded into an HBase table; details as follows: Rdd input = getSourceInputFromHive() Rdd>
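For reference, the contract of repartitionAndSortWithinPartitions can be modeled in a few lines of plain Python (a sketch of the semantics, not Spark's implementation): each record is routed to a partition by a partitioner on its key, and keys are sorted only within each partition, never globally, which matches the per-reducer sorted input that HFile bulk loading expects from MR:

```python
def repartition_and_sort_within_partitions(pairs, num_partitions, partitioner):
    """Model of the Spark operator: route by key, then sort each partition."""
    parts = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        parts[partitioner(key) % num_partitions].append((key, value))
    # Sorting happens per partition only; there is no global order.
    return [sorted(p) for p in parts]

data = [(3, "c"), (1, "a"), (4, "d"), (2, "b")]
parts = repartition_and_sort_within_partitions(data, 2, lambda k: k)
# Even keys land in partition 0, odd keys in partition 1, each sorted by key.
```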

Re: repartitionAndSortWithinPartitions task shuffle phase is very slow

2015-10-22 Thread
+kylin dev list 周千昊 <qhz...@apache.org> wrote on Fri, Oct 23, 2015 at 10:20 AM: > Hi, Reynold > Using glom() is because it is easy to adapt to the calculation logic > already implemented in MR. And to be clear, we are still in POC. > Since the results show there is almost no

Re: repartitionAndSortWithinPartitions task shuffle phase is very slow

2015-10-22 Thread
do you do a glom? It seems unnecessarily expensive to materialize each > partition in memory. > > > On Thu, Oct 22, 2015 at 2:02 AM, 周千昊 <qhz...@apache.org> wrote: > >> Hi, spark community >> I have an application which I try to migrate from MR to Spark. >&
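Reynold's point can be seen in a small plain-Python model (illustrative only): a glom-style step turns a lazily streamed partition into one in-memory list, while a mapPartitions-style pass can consume the same iterator record by record without ever holding the whole partition:

```python
def glom_partition(records):
    # glom: materialize the entire partition as a single in-memory list.
    return list(records)

def stream_partition(records, fn):
    # mapPartitions-style: consume the iterator lazily, one record at a time,
    # so peak memory is one record instead of the whole partition.
    for r in records:
        yield fn(r)

assert glom_partition(iter(range(5))) == [0, 1, 2, 3, 4]
assert list(stream_partition(iter(range(5)), lambda x: x * x)) == [0, 1, 4, 9, 16]
```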

Re: please help with ClassNotFoundException

2015-08-14 Thread
in production? Spark 1.3 is better than Spark 1.4. -- Original Message -- *From:* 周千昊; z.qian...@gmail.com; *Sent:* Friday, Aug 14, 2015, 11:14 AM *To:* Sea 261810...@qq.com; dev@spark.apache.org dev@spark.apache.org; *Subject:* Re: please help with ClassNotFoundException Hi Sea I have

avoid creating small objects

2015-08-14 Thread
Hi, all. What I want to do is: 1. read from some source; 2. do some calculation to get a byte array; 3. write the byte array to HDFS. In Hadoop, I can share an ImmutableBytesWritable and use System.arraycopy; it will prevent the application from creating a lot of small

Re: avoid creating small objects

2015-08-14 Thread
I am thinking of creating a shared object outside the closure and using this object to hold the byte array. Will this work? 周千昊 qhz...@apache.org wrote on Fri, Aug 14, 2015 at 4:02 PM: Hi, all. What I want to do is: 1. read from some source 2. do some calculation to get a byte array 3. write
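One caveat worth noting: a driver-side object captured by a closure is serialized and shipped with each task, so every task deserializes its own copy; a reusable buffer therefore has to be created per partition (e.g. inside a mapPartitions-style function), not shared from outside the closure. A plain-Python sketch of the per-partition reuse pattern (function and parameter names here are illustrative, not a Spark API):

```python
import hashlib

def digest_partition(records, width=8):
    # One scratch buffer allocated per partition and overwritten per record,
    # instead of a fresh byte array for every record (the MR Writable pattern).
    buf = bytearray(width)
    out = []
    for rec in records:
        buf[:] = rec.to_bytes(width, "big")   # overwrite in place
        out.append(hashlib.md5(buf).hexdigest())
    return out
```

The buffer lives exactly as long as one partition's processing, which keeps allocation pressure low without relying on cross-task sharing that Spark's closure serialization would silently break.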

please help with ClassNotFoundException

2015-08-13 Thread
Hi, I am using Spark 1.4 and an issue has occurred. I am trying to use the aggregate function: JavaRDD<String> rdd = some rdd; HashMap<Long, TypeA> zeroValue = new HashMap(); // add initial key-value pair for zeroValue rdd.aggregate(zeroValue, new
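For context, the contract of aggregate can be modeled in plain Python (a sketch of the semantics, not Spark's code): each partition is folded with seqOp starting from a copy of zeroValue, and the per-partition partial results are then merged with combOp:

```python
from copy import deepcopy

def aggregate(partitions, zero_value, seq_op, comb_op):
    # Model of RDD.aggregate: fold each partition from its own copy of
    # zeroValue, then merge the partial results with comb_op.
    partials = []
    for part in partitions:
        acc = deepcopy(zero_value)
        for record in part:
            acc = seq_op(acc, record)
        partials.append(acc)
    result = deepcopy(zero_value)
    for partial in partials:
        result = comb_op(result, partial)
    return result

# Count words by first letter across two partitions, into a shared map type.
parts = [["apple", "ant"], ["bee"]]
counts = aggregate(parts, {},
                   lambda m, w: {**m, w[0]: m.get(w[0], 0) + 1},
                   lambda a, b: {k: a.get(k, 0) + b.get(k, 0) for k in {*a, *b}})
```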

Re: please help with ClassNotFoundException

2015-08-13 Thread
Hi Sea, is it the same issue as https://issues.apache.org/jira/browse/SPARK-8368? Sea 261810...@qq.com wrote on Thu, Aug 13, 2015 at 6:52 PM: Are you using 1.4.0? If yes, use 1.4.1 -- Original Message -- *From:* 周千昊; qhz...@apache.org; *Sent:* Thursday, Aug 13, 2015, 6:04 PM *To:* devdev

Re: please help with ClassNotFoundException

2015-08-13 Thread
Hi Sea, I have updated Spark to 1.4.1; however, the problem still exists. Any idea? Sea 261810...@qq.com wrote on Fri, Aug 14, 2015 at 12:36 AM: Yes, I guess so. I have seen this bug before. -- Original Message -- *From:* 周千昊; z.qian...@gmail.com; *Sent:* Thursday, Aug 13, 2015, 9:30 PM *To