Re: Re: Re: Re: How to store 10M records in HDFS to speed up further filtering?

2017-04-20 Thread Ryan
> *From:* Ryan <ryan.hd....@gmail.com> > *Sent:* 2017-04-17 16:48:47 > *To:* 莫涛 > *Cc:* user > *Subject:* Re: Re: Re: How to store 10M records in HDFS to speed up further > filtering? > > How about the event timeline on executors? It seems adding more executors > could help. …
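
A minimal sketch of acting on the "more executors" suggestion above, assuming a cluster manager such as YARN with dynamic allocation off; the executor counts and sizes are illustrative placeholders, not values taken from this thread:

    // Sketch only: request more executors up front. All numbers are placeholders.
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("id-filtering")
      .config("spark.executor.instances", "20")  // more executors, per the suggestion above
      .config("spark.executor.cores", "4")
      .config("spark.executor.memory", "8g")
      .getOrCreate()

The executor event timeline itself is inspected in the Spark web UI (Jobs and Stages pages) rather than in code.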

Re: Re: Re: How to store 10M records in HDFS to speed up further filtering?

2017-04-20 Thread 莫涛
…@gmail.com> Sent: 2017-04-17 16:48:47 To: 莫涛 Cc: user Subject: Re: Re: Re: How to store 10M records in HDFS to speed up further filtering? How about the event timeline on executors? It seems adding more executors could help. 1. I found a JIRA (https://issues.apache.org/jira/browse/SPARK-11621) that states…
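
The JIRA text is cut off above, so nothing here restates it; the following is only a sketch of filtering a large ORC dataset by ID with predicate pushdown enabled, using a hypothetical path and column name:

    // Sketch: filter ORC data by ID with predicate pushdown enabled.
    // "/data/records_orc" and the "id" column are hypothetical.
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("orc-id-filter").getOrCreate()
    import spark.implicits._

    spark.conf.set("spark.sql.orc.filterPushdown", "true")

    val records = spark.read.orc("/data/records_orc")
    val wanted  = records.filter($"id".isin("id_00001", "id_00002"))
    wanted.count()

Whether stripes can actually be skipped depends on how the IDs are laid out within the files, which is the concern raised later in the thread.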

Re: Re: Re: How to store 10M records in HDFS to speed up further filtering?

2017-04-20 Thread 莫涛
From: Jörn Franke <jornfra...@gmail.com> Sent: 2017-04-17 22:37:48 To: 莫涛 Cc: user@spark.apache.org Subject: Re: Re: How to store 10M records in HDFS to speed up further filtering? Yes, 5 MB is a difficult size: too small for HDFS, too big…

Re: Re: Re: How to store 10M records in HDFS to speed up further filtering?

2017-04-20 Thread Alonso Isidoro Roman
> -- > *From:* Jörn Franke <jornfra...@gmail.com> > *Sent:* 2017-04-17 22:37:48 > *To:* 莫涛 > *Cc:* user@spark.apache.org > *Subject:* Re: Re: How to store 10M records in HDFS to speed up further > filtering? > > Yes, 5 MB is a difficult size…

Re: Re: How to store 10M records in HDFS to speed up further filtering?

2017-04-20 Thread 莫涛
…user@spark.apache.org Subject: Re: Re: How to store 10M records in HDFS to speed up further filtering? Yes, 5 MB is a difficult size: too small for HDFS, too big for parquet/orc. Maybe you can put the data in a HAR and store id, path in orc/parquet. On 17 Apr 2017, at 10:52, 莫涛 <mo...@sensetime.com> …
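
As a rough sketch of the HAR-plus-index idea above (all paths, the archive name, and the id-from-filename rule are assumptions, not details from this thread): the small binary records are packed into one Hadoop archive, and a compact (id, path) index is written as Parquet so the heavy files are only opened for matching IDs.

    // Step 1 (shell, shown here as a comment): pack the small files into one HAR.
    //   hadoop archive -archiveName records.har -p /data/raw /data/archived
    // Step 2: write an (id, path) index as Parquet. The id is assumed to be
    // encoded in each file name, which is a guess for illustration only.
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("har-index").getOrCreate()
    import spark.implicits._

    val index = spark.sparkContext
      .binaryFiles("har:///data/archived/records.har/*")            // (path, content) pairs
      .map { case (path, _) => (path.split('/').last.stripSuffix(".bin"), path) }
      .toDF("id", "path")

    index.write.mode("overwrite").parquet("/data/record_index")

Filtering then becomes a cheap scan of the small Parquet index followed by reads of only the matching HAR entries.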

Re: Re: Re: How to store 10M records in HDFS to speed up further filtering?

2017-04-17 Thread Ryan
…depends on the distribution of the > given ID list. No partition could be skipped in the worst case. > > Mo Tao > > ---------- > *From:* Ryan <ryan.hd@gmail.com> > *Sent:* 2017-04-17 15:42:46 > *To:* 莫涛 > *Cc:* user > *Subject:* …
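
To make the "worst case" point above concrete, here is a sketch (hypothetical paths, and an assumed two-character ID prefix) of partitioning the index by an ID prefix: a clustered ID list prunes down to a few partitions, while IDs spread across every prefix force Spark to read them all.

    // Sketch: partition the (id, path) index by an assumed id prefix.
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.substring

    val spark = SparkSession.builder().appName("prefix-partitioning").getOrCreate()
    import spark.implicits._

    val index = spark.read.parquet("/data/record_index")            // hypothetical path
      .withColumn("id_prefix", substring($"id", 1, 2))

    index.write.partitionBy("id_prefix").parquet("/data/record_index_by_prefix")

    // A lookup list sharing one prefix prunes to a single partition...
    val clustered = spark.read.parquet("/data/record_index_by_prefix")
      .filter($"id_prefix" === "a0" && $"id".isin("a01", "a02"))
    // ...but IDs scattered over all prefixes skip nothing: the worst case above.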

Re: Re: How to store 10M records in HDFS to speed up further filtering?

2017-04-17 Thread 莫涛
…of the given ID list. No partition could be skipped in the worst case. Mo Tao From: Ryan <ryan.hd@gmail.com> Sent: 2017-04-17 15:42:46 To: 莫涛 Cc: user Subject: Re: Re: How to store 10M records in HDFS to speed up further filtering? 1. Per my understanding…