> *From:* Ryan <ryan.hd....@gmail.com>
> *Sent:* 17 April 2017 16:48:47
> *To:* 莫涛
> *Cc:* user
> *Subject:* Re: 答复: 答复: How to store 10M records in HDFS to speed up further
> filtering?
>
> How about the event timeline on executors? It seems adding more executors
> could help.
>
1. I found a JIRA (https://issues.apache.org/jira/browse/SPARK-11621) that states …
From: Jörn Franke <jornfra...@gmail.com>
Sent: 17 April 2017 22:37:48
To: 莫涛
Cc: user@spark.apache.org
Subject: Re: 答复: How to store 10M records in HDFS to speed up further filtering?

Yes, 5 MB is a difficult size: too small for HDFS, too big for Parquet/ORC.
Maybe you can put the data in a HAR and store the id and path in ORC/Parquet.
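The lookup side of that HAR-plus-index suggestion could look like the sketch below. It is plain Python with a dict standing in for the Parquet/ORC index, and the paths are hypothetical; in practice the blobs would first be packed with the `hadoop archive` tool so that they resolve under the `har://` scheme, and the id-to-path table would be a small Parquet/ORC file queried with Spark.

```python
# Hypothetical sketch: an id -> path index for records packed into a
# Hadoop Archive (HAR). In the real setup this index would be a small
# Parquet/ORC table queried with Spark; a dict stands in here only to
# show the access pattern.
index = {
    1001: "har:///data/archived/records.har/part-0/rec-1001.bin",
    1002: "har:///data/archived/records.har/part-0/rec-1002.bin",
}

def paths_for(ids):
    """Resolve requested ids to archive paths, skipping unknown ids."""
    return [index[i] for i in ids if i in index]

# Only ids present in the index resolve to a path; 9999 is dropped.
print(paths_for([1001, 9999]))
```

Filtering then reads only the resolved paths instead of scanning all 10M records, and the HAR keeps the NameNode from having to track 10M small files.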
On 17. Apr 2017, at 10:52, 莫涛 <mo...@sensetime.com> wrote:

> …depends on the distribution of the given ID list. No partition could be
> skipped in the worst case.
>
> Mo Tao
>
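Mo Tao's worst-case point can be illustrated with a small sketch, in plain Python with hypothetical partition counts: if the records are bucketed into partitions by id range, the number of partitions a filter must read depends entirely on how the requested ids are distributed.

```python
# Sketch (plain Python, no Spark; numbers are hypothetical): 10M ids
# bucketed by range into 100 partitions of 100k ids each.
NUM_PARTITIONS = 100
RECORDS_PER_PART = 100_000

def partitions_touched(ids):
    """Distinct partitions holding at least one requested id."""
    return {i // RECORDS_PER_PART for i in ids}

# Clustered ids: only 1 of the 100 partitions must be read.
clustered = range(0, 1000)
print(len(partitions_touched(clustered)))   # 1

# Worst case: one id per partition, so nothing can be skipped.
scattered = range(0, 10_000_000, RECORDS_PER_PART)
print(len(partitions_touched(scattered)))   # 100
```

So range-based partitioning only pays off when the query's ID lists are clustered; a uniformly scattered ID list forces a read of every partition.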
> ----------
> *From:* Ryan <ryan.hd@gmail.com>
> *Sent:* 17 April 2017 15:42:46
> *To:* 莫涛
> *Cc:* user
> *Subject:* Re: 答复: How to store 10M records in HDFS to speed up further filtering?
1. Per my understanding …