Github user allwefantasy commented on the pull request:

    https://github.com/apache/spark/pull/1983#issuecomment-55129263
  
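    @witgo Then the mistake was mine: I misunderstood the `content` field of
    `Document`. I thought `content` was a fixed-dimension vector in which each
    position stands for one word and the value at that position is the number
    of times the word occurs in the document. In other words, I mistook it for
    a term-frequency vector. That is why, in my program, the `content` of every
    `Document` was a fixed-length vector (for example, 20000 entries), which
    made each document very large. If it works the way you described, then
    `content` is sparse enough that both the computation and the memory usage
    will be much smaller. Thanks a lot.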
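    The sketch below contrasts the two layouts discussed above. It is only an
    illustration under stated assumptions, not the `Document` class from the
    pull request. The helper names `dense_tf` and `sparse_tf` and the sample
    word indices are hypothetical. The dense form allocates one slot per
    vocabulary word, so its length is always the vocabulary size. The sparse
    form stores only the words that actually occur.

```python
from collections import Counter

VOCAB_SIZE = 20000  # fixed vector length used as the example in the comment

def dense_tf(word_ids, vocab_size=VOCAB_SIZE):
    """Hypothetical dense term-frequency vector: one slot per vocabulary word."""
    vec = [0] * vocab_size
    for w in word_ids:
        vec[w] += 1  # count each occurrence of word index w
    return vec

def sparse_tf(word_ids):
    """Hypothetical sparse form: only the words that occur, mapped to their counts."""
    return dict(Counter(word_ids))

# A hypothetical short document, given as vocabulary indices.
doc = [17, 42, 17, 999, 42, 17]
dense = dense_tf(doc)
sparse = sparse_tf(doc)
print(len(dense))  # 20000 slots, almost all of them zero
print(sparse)      # {17: 3, 42: 2, 999: 1}
```

    Both layouts carry the same counts. The dense one, however, costs
    O(vocabulary size) per document, while the sparse one costs O(number of
    distinct words in the document).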

