[
https://issues.apache.org/jira/browse/SPARK-5261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14481132#comment-14481132
]
Sean Owen commented on SPARK-5261:
In the new code you pasted, I don't see a difference
[
https://issues.apache.org/jira/browse/SPARK-5261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14481378#comment-14481378
]
Guoqiang Li commented on SPARK-5261:
I'm sorry, the "after" one's minCount is 100.
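The thread keeps coming back to minCount. In Spark MLlib's Word2Vec, a token must occur at least minCount times to be kept in the model's vocabulary (the default is 5), so raising it shrinks the vocabulary. The sketch below estimates how many distinct tokens a given minCount would keep for a plain-text corpus. It is only an illustration. The helper name `vocab_size_at_min_count` is hypothetical, and splitting on whitespace is an assumption that may not match the tokenization actually fed to Spark.

```shell
# Hypothetical helper (not part of Spark): prints how many distinct
# whitespace-separated tokens in FILE occur at least MIN_COUNT times,
# i.e. roughly the vocabulary a minCount filter would keep.
# Usage: vocab_size_at_min_count FILE MIN_COUNT
vocab_size_at_min_count() {
  # One token per line; squeeze runs of whitespace into a single newline.
  tr -s '[:space:]' '\n' < "$1" |
    awk -v min="$2" '
      NF { count[$1]++ }                    # skip empty lines, count tokens
      END {
        kept = 0
        for (w in count) if (count[w] >= min) kept++
        print kept                          # vocabulary size at this minCount
      }'
}
```

For example, `vocab_size_at_min_count corpus.txt 5` and `vocab_size_at_min_count corpus.txt 100` (with `corpus.txt` standing in for your normalized training text) show how much the vocabulary shrinks between the default minCount and the value of 100.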
[
https://issues.apache.org/jira/browse/SPARK-5261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14396112#comment-14396112
]
Sean Owen commented on SPARK-5261:
I think they both come down to a minCount that is too
[
https://issues.apache.org/jira/browse/SPARK-5261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334021#comment-14334021
]
Xiangrui Meng commented on SPARK-5261:
Could you try a larger minCount to reduce the
[
https://issues.apache.org/jira/browse/SPARK-5261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292823#comment-14292823
]
Guoqiang Li commented on SPARK-5261:
[~lewuathe]
{code}
normalize_text() {
awk
[
https://issues.apache.org/jira/browse/SPARK-5261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292750#comment-14292750
]
Kai Sasaki commented on SPARK-5261:
[~gq] Can you provide us with the data set? I tried with