I have replaced the default Java serialization with Kryo.
It has indeed reduced the shuffle size and improved performance;
however, the shuffle speed remains unchanged.
I am quite new to Spark. Does anyone have an idea about which
direction I should take to find the root cause?
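In case it helps to rule out serialization as the bottleneck: these are the standard Kryo settings (real Spark config keys, shown as a spark-defaults.conf fragment; `com.example.MyRecord` is a made-up placeholder class):

```
spark.serializer                 org.apache.spark.serializer.KryoSerializer
# Registering classes keeps Kryo from writing full class names per object
spark.kryo.classesToRegister     com.example.MyRecord
```

Since the shuffle bytes already shrank while the shuffle time did not, the remaining cost may be network- or disk-bound rather than CPU-bound; comparing the shuffle read/write metrics in the web UI against the data sizes can help confirm that.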
周千昊 <
ta, cause we kinda copied MR implementation
> into Spark.
>
> Let us know if more info is needed.
>
> On Fri, Oct 23, 2015 at 10:24 AM, 周千昊 <qhz...@apache.org> wrote:
>
> > +kylin dev list
> >
> > 周千昊 <qhz...@apache.org> wrote on Fri, Oct 23, 2015 at 10:20 AM:
> >
Hi, spark community
I have an application which I am trying to migrate from MR to Spark.
It will do some calculations on Hive data and output to HFiles, which
will be bulk loaded into an HBase table, details as follows:
Rdd input = getSourceInputFromHive()
Rdd>
> Hi, Reynold
> Using glom() is because it is easy to adapt to the calculation logic
> already implemented in MR. And to be clear, we are still in POC.
> Since the results show there is almost no
Why do you do a glom? It seems unnecessarily expensive to materialize each
> partition in memory.
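For readers following along: glom() turns an RDD of T into an RDD of arrays, one array per partition, which is why every partition must fit in memory at once. A minimal local sketch of the semantics (plain Java, no cluster; the contiguous-chunk partitioning is a stand-in for however the real RDD is partitioned):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class GlomSketch {
    // Splits 'data' into numPartitions contiguous chunks, then "gloms":
    // each partition's elements are materialized as one in-memory list,
    // mirroring RDD<T>.glom() => RDD<List<T>> semantics.
    static <T> List<List<T>> glom(List<T> data, int numPartitions) {
        List<List<T>> partitions = new ArrayList<>();
        int n = data.size();
        for (int p = 0; p < numPartitions; p++) {
            int start = p * n / numPartitions;
            int end = (p + 1) * n / numPartitions;
            partitions.add(new ArrayList<>(data.subList(start, end)));
        }
        return partitions;
    }

    public static void main(String[] args) {
        System.out.println(glom(Arrays.asList(1, 2, 3, 4, 5), 2));
        // [[1, 2], [3, 4, 5]]
    }
}
```

This makes the cost visible: a 2 GB partition becomes a single 2 GB array, where a streamed per-element map would never hold it whole.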
>
>
> On Thu, Oct 22, 2015 at 2:02 AM, 周千昊 <qhz...@apache.org> wrote:
>
>> Hi, spark community
>> I have an application which I try to migrate from MR to Spark.
in production? Spark 1.3 is better than Spark 1.4.
-- Original Message --
*From:* 周千昊 <z.qian...@gmail.com>
*Sent:* Friday, Aug 14, 2015, 11:14 AM
*To:* Sea <261810...@qq.com>; dev@spark.apache.org
*Subject:* Re: please help with ClassNotFoundException
Hi Sea
I have
Hi,
All I want to do is this:
1. read from some source
2. do some calculation to get some byte array
3. write the byte array to HDFS
In Hadoop, I can share an ImmutableBytesWritable and do some
System.arraycopy; it keeps the application from creating a lot of
small objects.
I am thinking of creating a shared object outside the closure and using
this object to hold the byte array.
Will this work?
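A plain-Java sketch of the MR-style reuse idea described above (the record data is invented for illustration). Note that in Spark this pattern is only safe per task, e.g. inside mapPartitions: the closure is serialized per task, so a shared object created on the driver will not actually be shared across executors.

```java
import java.nio.charset.StandardCharsets;

public class BufferReuseSketch {
    // MR-style object reuse: every record is copied into ONE shared buffer
    // with System.arraycopy instead of allocating a fresh byte[] per record.
    // Returns the number of valid bytes now sitting in the buffer.
    static int copyInto(byte[] shared, String record) {
        byte[] src = record.getBytes(StandardCharsets.UTF_8);
        System.arraycopy(src, 0, shared, 0, src.length);
        return src.length;
    }

    public static void main(String[] args) {
        byte[] shared = new byte[64]; // reused across all records
        for (String record : new String[] {"one", "two", "three"}) {
            int len = copyInto(shared, record);
            // stand-in for "write the byte array to HDFS": the bytes must be
            // consumed BEFORE the next record overwrites the shared buffer
            System.out.println(new String(shared, 0, len, StandardCharsets.UTF_8));
        }
    }
}
```

The caveat in the comment is the usual trap with reuse: anything that retains a reference to the shared buffer (e.g. collecting it into a list) will see it mutated by later records.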
Hi,
I am using Spark 1.4 and have run into an issue.
I am trying to use the aggregate function:
JavaRDD<String> rdd = some rdd;
HashMap<Long, TypeA> zeroValue = new HashMap<>();
// add initial key-value pair for zeroValue
rdd.aggregate(zeroValue,
new
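For what it's worth, here is a local simulation of what rdd.aggregate(zeroValue, seqOp, combOp) computes (plain Java, no Spark; the word-count logic and partition layout are invented for illustration). In Spark each task deserializes its own copy of zeroValue, which the sketch imitates by copying the map per partition:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

public class AggregateSketch {
    // Local simulation of JavaRDD.aggregate: each partition folds its
    // elements into a fresh copy of zeroValue with seqOp, then combOp
    // merges the per-partition accumulators into one result.
    static Map<String, Integer> aggregate(
            List<List<String>> partitions,
            Map<String, Integer> zeroValue,
            BiFunction<Map<String, Integer>, String, Map<String, Integer>> seqOp,
            BiFunction<Map<String, Integer>, Map<String, Integer>, Map<String, Integer>> combOp) {
        Map<String, Integer> result = new HashMap<>(zeroValue);
        for (List<String> part : partitions) {
            Map<String, Integer> acc = new HashMap<>(zeroValue); // fresh copy per partition
            for (String s : part) acc = seqOp.apply(acc, s);
            result = combOp.apply(result, acc);
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> parts = Arrays.asList(
                Arrays.asList("spark", "hive"), Arrays.asList("spark"));
        Map<String, Integer> counts = aggregate(parts, new HashMap<>(),
                (m, s) -> { m.merge(s, 1, Integer::sum); return m; },
                (m1, m2) -> { m2.forEach((k, v) -> m1.merge(k, v, Integer::sum)); return m1; });
        System.out.println(counts.get("spark")); // 2
    }
}
```

Copying zeroValue per partition matters: if the partitions shared one mutable HashMap, the per-partition folds would clobber each other.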
Hi Sea
Is it the same issue as https://issues.apache.org/jira/browse/SPARK-8368
Sea <261810...@qq.com> wrote on Thu, Aug 13, 2015 at 6:52 PM:
Are you using 1.4.0? If yes, use 1.4.1
-- Original Message --
*From:* 周千昊 <qhz...@apache.org>
*Sent:* Thursday, Aug 13, 2015, 6:04 PM
*To:* dev@spark.apache.org
Hi Sea
I have updated Spark to 1.4.1; however, the problem still exists. Any
idea?
Sea <261810...@qq.com> wrote on Fri, Aug 14, 2015 at 12:36 AM:
Yes, I guess so. I saw this bug before.
-- Original Message --
*From:* 周千昊 <z.qian...@gmail.com>
*Sent:* Thursday, Aug 13, 2015, 9:30 PM
*To:*