Github user allwefantasy commented on the pull request:

    https://github.com/apache/spark/pull/1983#issuecomment-55238978
  
    @witgo Thanks for sharing that tip.
    I'm still running into one problem. Yesterday you asked how many words my 240k documents
    contain; I counted, and it is about 24 million, computed as
    (parsedData.map(f:Document=>f.content.size).sum()), with a vocabulary of about 80k distinct words.
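    For reference, here is a minimal sketch of how those numbers are computed, assuming a
    hypothetical Document case class whose content field holds the tokenized words (the actual
    type behind parsedData in my job may differ):

        import org.apache.spark.rdd.RDD

        // Hypothetical document type: `content` holds the tokenized words.
        case class Document(content: Seq[String])

        def corpusStats(parsedData: RDD[Document]): (Long, Long) = {
          // Total word occurrences across all documents (~24M here); the same idea as the
          // map(...).sum() one-liner above, written with fold so it needs no numeric-RDD implicits.
          val totalWords = parsedData.map(_.content.size.toLong).fold(0L)(_ + _)
          // Vocabulary ("word space") size: number of distinct words (~80k here).
          val vocabSize = parsedData.flatMap(_.content).distinct().count()
          (totalWords, vocabSize)
        }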
 
    The first (initialization) iteration is very fast and finishes in about a minute. From the
    second iteration on, each task has to serialize roughly 26 MB of data, and after the
    "Cleaned broadcast" message the spark-shell stops responding. Opening a stage page such as
    http://csdn-hdp-nn-01:4040/stages/stage/?id=11 shows every task as RUNNING, and when I checked
    each worker, the old-generation heap and so on all looked normal, yet the CPUs are almost idle,
    as if no tasks were actually running. Have you run into this problem?

