Re: Re: Re: How to store 10M records in HDFS to speed up further filtering?

2017-04-20 Thread
ail.com> Sent: April 17, 2017 16:48:47 To: 莫涛 Cc: user Subject: Re: Re: Re: How to store 10M records in HDFS to speed up further filtering? How about the event timeline on the executors? It seems adding more executors could help. 1. I found a JIRA (https://issues.apache.org/jira/browse/SPARK-11621) that state

Re: Re: Re: How to store 10M records in HDFS to speed up further filtering?

2017-04-20 Thread
It's a Hadoop archive. https://hadoop.apache.org/docs/r1.2.1/hadoop_archives.html From: Alonso Isidoro Roman <alons...@gmail.com> Sent: April 20, 2017 17:03:33 To: 莫涛 Cc: Jörn Franke; user@spark.apache.org Subject: Re: Re: Re: How to store 10M records in HDFS to sp

Re: Re: How to store 10M records in HDFS to speed up further filtering?

2017-04-20 Thread
t I expected: "only the requested BINARY are scanned". Moreover, HAR provides direct access to each record via hdfs shell commands. Thank you very much! From: Jörn Franke <jornfra...@gmail.com> Sent: April 17, 2017 22:37:48 To: 莫涛 Cc: user@spark.ap
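As a minimal sketch of the HAR workflow the thread settled on (paths and file names here are hypothetical, following the Hadoop Archives guide linked above):

```shell
# Pack the small record files under /data/records into one archive.
# Syntax: hadoop archive -archiveName <name>.har -p <parent> <src> <dest>
hadoop archive -archiveName records.har -p /data records /archives

# The archive is addressable through the har:// URI scheme, so individual
# records remain directly accessible from the hdfs shell:
hdfs dfs -ls -R har:///archives/records.har
hdfs dfs -cat har:///archives/records.har/records/part-0
```

This avoids the NameNode pressure of 10M small files while still allowing per-record access, which is the property the original poster confirmed.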

Re: How to store 10M records in HDFS to speed up further filtering?

2017-04-17 Thread
best knowledge, HBase works best for records of around hundreds of KB, and it requires extra work from the cluster administrator. So this would be the last option. Thanks! Mo Tao From: Jörn Franke <jornfra...@gmail.com> Sent: April 17, 2017 15:59:28 To: 莫涛 Cc

Re: Re: How to store 10M records in HDFS to speed up further filtering?

2017-04-17 Thread
f the given ID list. No partition could be skipped in the worst case. Mo Tao From: Ryan <ryan.hd@gmail.com> Sent: April 17, 2017 15:42:46 To: 莫涛 Cc: user Subject: Re: Re: How to store 10M records in HDFS to speed up further filtering? 1. Per my understanding
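The worst case described here can be sketched in plain Python (names are illustrative, not from the original mails): partition pruning relies on per-partition min/max statistics, so an ID list scattered across the whole key space leaves no partition skippable.

```python
# Sketch: min/max partition pruning against a requested ID list.
# A partition can be skipped only if no wanted ID falls inside its range.
def prunable_partitions(partition_stats, wanted_ids):
    """Return indices of partitions that can be skipped for this ID list."""
    skipped = []
    for i, (lo, hi) in enumerate(partition_stats):
        if not any(lo <= w <= hi for w in wanted_ids):
            skipped.append(i)
    return skipped

# Four partitions covering contiguous ID ranges.
stats = [(0, 249), (250, 499), (500, 749), (750, 999)]

# A clustered ID list lets most partitions be skipped...
print(prunable_partitions(stats, [10, 20, 30]))        # -> [1, 2, 3]

# ...but IDs scattered across the key space skip nothing: the worst case.
print(prunable_partitions(stats, [10, 400, 600, 900]))  # -> []
```

This is why sorting or bucketing the data by ID matters: it makes requested IDs cluster into few partitions instead of touching all of them.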

Re: How to store 10M records in HDFS to speed up further filtering?

2017-04-17 Thread
ng I'm looking for! Could you kindly provide some links for reference? I found nothing in the Spark documentation about indexes or bloom filters working inside a partition. Thanks very much! Mo Tao From: Ryan <ryan.hd@gmail.com> Sent: April 17, 2017 14:32:00 To: 莫涛 Cc: u
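To illustrate the per-partition bloom-filter idea raised here, a minimal pure-Python sketch (illustrative only; formats such as ORC and Parquet ship production implementations): a small filter stored with each partition lets a reader reject most partitions that cannot contain a given ID without scanning their records. False positives are possible; false negatives are not.

```python
import hashlib

class BloomFilter:
    """Toy bloom filter: k hash positions over a fixed-size bit array."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # the bit array, kept as a Python int

    def _positions(self, item):
        # Derive num_hashes positions by salting a single hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= (1 << pos)

    def might_contain(self, item):
        # True means "maybe present"; False means "definitely absent".
        return all(self.bits & (1 << pos) for pos in self._positions(item))

# One filter per partition, built over that partition's record IDs.
bf = BloomFilter()
for record_id in ["id-001", "id-002", "id-003"]:
    bf.add(record_id)

print(bf.might_contain("id-002"))   # True: the record was added
print(bf.might_contain("id-999"))   # almost certainly False: skip this partition
```

A `False` answer lets the reader skip the partition outright, which is exactly the "only the requested BINARY are scanned" behaviour the poster was after.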

Re: Re: how to generate a column using mapPartition and then add it back to the df?

2016-08-08 Thread
Hi guha, Thanks a lot! This is exactly what I want and I'll try to implement it. MoTao From: ayan guha <guha.a...@gmail.com> Sent: August 8, 2016 18:05:37 To: 莫涛 Cc: ndj...@gmail.com; user@spark.apache.org Subject: Re: Re: how to generate a column using mapPa

Re: how to generate a column using mapPartition and then add it back to the df?

2016-08-08 Thread
as possible as I can. Best From: ndj...@gmail.com <ndj...@gmail.com> Sent: August 8, 2016 17:16:27 To: 莫涛 Cc: user@spark.apache.org Subject: Re: how to generate a column using mapPartition and then add it back to the df? Hi MoTao, What about broadcasting the model? Cheers,
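The point of the broadcast-plus-mapPartitions suggestion can be sketched in plain Python standing in for Spark (function and variable names here are hypothetical): the model is materialized once per partition, not once per record.

```python
# Counts how many times the "broadcast" model is deserialized.
load_count = 0

def load_model():
    # Stands in for unpickling a broadcast model on an executor.
    global load_count
    load_count += 1
    return lambda x: x * 2  # trivial stand-in model

def score_partition(records):
    model = load_model()      # per-partition setup: happens once
    for r in records:
        yield model(r)        # every record in the partition reuses it

partitions = [[1, 2, 3], [4, 5, 6]]
result = [list(score_partition(p)) for p in partitions]
print(result)       # [[2, 4, 6], [8, 10, 12]]
print(load_count)   # 2 -- one load per partition, not one per record
```

In actual Spark the same shape is `rdd.mapPartitions(score_partition)` with the model wrapped in `sc.broadcast(...)`, so each executor holds a single read-only copy instead of shipping the model with every task closure.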